Re: problem with cache flush routine for G5?

From: Benjamin Herrenschmidt (benh_at_kernel.crashing.org)
Date: 03/06/04

  • Next message: Mike Fedyk: "Re: questions about io scheduler"
    To: Chris Friesen <cfriesen@nortelnetworks.com>
    Date:	Sat, 06 Mar 2004 10:53:55 +1100
    
    

    > This OS allows runtime patching of code. After changing the
    > instruction(s), it then has to make sure that the icache doesn't contain
    > stale instructions.
    >
    > The original code was written for ppc hardware that had the ability to
    > flush the whole dcache and invalidate the whole icache, all at once, so
    > that's what they used.

    That's very inefficient.

    > The code doesn't track the address/size of what
    > was changed. For our existing products, we are using the 74xx series,
    > and they've got hardware cache flush/invalidate as well, so we just kept
    > using that.

    Ouch... that _VERY_ inefficient... and the HW flush on the 74xx may
    be broken on some models afaik =P

    > For the 970 however, that hardware mechanisms seem to be
    > absent, which started me on this whole path.
    >
    > After doing some digging in the 970fx specs, it seems that we may not
    > need to explicitly force a store of the L1 dcache at all. According to
    > the docs, the L1 dcache is unconditionally store-through. Thus, for a
    > brute-force implementation we should be able to just invalidate the
    > whole icache, do the appropriate sync/isync, and it should pick up the
    > changed instructions from the L2 cache. Do you see any problems with
    > this? Do I actually still need the store?

    It's unclear if stores will get straight to the coherency domain or not,
    they may still be stuffed in a store queue... Though a sync would
    probably flush it. Still, you should really try to get your code fixes
    to just dcb{f,st}/icbi on the right instruction.

    > Of course, the proper fix is to change the code in the OS running on the
    > emulator to track the addresses that got changed and just do the minimal
    > work required.
    >
    > Chris

    -- 
    Benjamin Herrenschmidt <benh@kernel.crashing.org>
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at  http://www.tux.org/lkml/
    

  • Next message: Mike Fedyk: "Re: questions about io scheduler"

    Relevant Pages

    • RE: Problem Setting Up x.509 Certificates for WSE2.0
      ... and it turns out that the instructions and WSE2 itself ... "Other People" store on the certificate import dialog on the WinXPPro machine ... > certificate store and the local machine certificate store snap-ins. ... > CurrentUser/Other People ...
      (microsoft.public.dotnet.framework.webservices.enhancements)
    • Re: How to cast using MSVC++ Intrinsics
      ... to load the SIMD register from the stack on a quadword floating point load. ... What would be nice is to have init to 0 or -1 instructions. ... Store, Load, Store, etc. ...
      (comp.lang.asm.x86)
    • Problem Setting Up x.509 Certificates for WSE2.0
      ... I am having trouble following the instructions provided for the sample apps ... certificate store and the local machine certificate store snap-ins. ... MsdnWse2SecuritySamplesClient.cer (Client's public key) -> ...
      (microsoft.public.dotnet.framework.webservices.enhancements)
    • Re: [PATCH 2/2] Markers Implementation for Preempt RCU Boost Tracing
      ... counting instructions) ... taken areas of code - blowing up its icache footprint considerably. ... cache-blow-up aspect is more severe in practice at 10-20%. ... it away into another portion of the kernel image. ...
      (Linux-Kernel)
    • Re: Intel x86 memory model question
      ... processor 3 loads from Y ... show that PC alone wasn't sufficient to guarantee that if processor 3 saw the store into Y by processor 2 that it would see the store into X by processor 1. ... have such dependencies than it is to change threading libraries in OS ... "I/O, locking, and/or serializing instructions" can synchronize is? ...
      (comp.arch)