Network buffer hang was Re: [PATCH] 2.6 workaround for Athlon/Opteron prefetch errata

From: Andi Kleen (ak_at_colin2.muc.de)
Date: 09/11/03

    Date:	Thu, 11 Sep 2003 14:09:56 +0200
    To: dada1 <dada1@cosmosbay.com>
    
    

    On Thu, Sep 11, 2003 at 07:58:46AM +0200, dada1 wrote:
    >
    > >
    > > I don't have any. But it would be very similar to the in kernel checking
    > > code (see the is_prefetch function in my patches). Just you feed it
    > > the fields from sigcontext in the signal handler and replace __get_user
    > > with a normal memory access.
    >
    > OK will try... but how test it... sounds not easy.
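A rough user-space sketch of the check described in the quoted text: skip any instruction prefixes, then look for the two-byte prefetch opcodes 0F 0D (3DNow! PREFETCH/PREFETCHW) and 0F 18 (SSE PREFETCHh). The name looks_like_prefetch, the prefix list and the 15-byte scan limit are assumptions made for illustration; this is not the is_prefetch code from the patch. In a real SIGSEGV handler the bytes would be read from the instruction pointer saved in the signal context.

```c
#include <stddef.h>

/* Hypothetical helper (not the kernel's is_prefetch): returns 1 if the
 * bytes at `ip` start with an x86 prefetch instruction, 0 otherwise.
 * `len` is the number of bytes that may safely be read from `ip`.
 * REX prefixes (64-bit mode) are deliberately not handled here. */
static int looks_like_prefetch(const unsigned char *ip, size_t len)
{
    size_t i;
    size_t max = len < 15 ? len : 15;   /* an x86 instruction is at most 15 bytes */

    for (i = 0; i < max; i++) {
        switch (ip[i]) {
        /* segment overrides, operand/address size, lock and rep prefixes */
        case 0x26: case 0x2E: case 0x36: case 0x3E:
        case 0x64: case 0x65: case 0x66: case 0x67:
        case 0xF0: case 0xF2: case 0xF3:
            continue;                   /* keep scanning past the prefix */
        case 0x0F:
            /* two-byte opcode: 0F 0D and 0F 18 are the prefetch forms */
            if (i + 1 >= len)
                return 0;
            return ip[i + 1] == 0x0D || ip[i + 1] == 0x18;
        default:
            return 0;                   /* any other opcode is not a prefetch */
        }
    }
    return 0;
}
```

A handler built on this would treat a fault as spurious only when the function returns 1 for the faulting instruction, and fall back to the default action otherwise.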

    From your description of the symptoms it sounds like a waste of time.

    >
    > Well, the program is using more than 2 GB of RAM... the core is not written to
    > disk as the machine hangs just *after*

    This bug doesn't cause kernel hangs, just "ordinary" segfaults

    (unless the kernel triggers it, but that happens only very rarely)

    > >
    > > If it is a different instruction it is unrelated.
    > >
    > > It would also only happen if you ever prefetch unmapped addresses.
    >
    > NULL for example ?

    Yes, NULL would work. But only when there are byte-sized reads of parts
    of the prefetched memory that are not aligned to the cache line size,
    hit the second part of a cache line, and happen in exactly the right
    timing window after the prefetch

    (see Richard's description - it really is quite hard to trigger)

    >
    > Typical example of code ;
    >
    > T_cell *ptr, *next ;
    > for (ptr = list.head ; ptr != NULL ; ptr = next) {
    > next = ptr->next ;
    > prefetch(next) ;
    > some_work(ptr) ;
    > }
    >
    > I may replace NULL with &FakeMappedData (always present in memory)

    That's certainly safer, but slightly slower (need one more register in
    the loop)

    If it doesn't trigger I would probably not bother.
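A minimal sketch of the sentinel variant discussed above, assuming a GCC or Clang compiler for __builtin_prefetch. The T_cell layout, the value field and walk_list are made up for illustration; FakeMappedData is the name from the quoted message. The list ends at an always-mapped dummy cell, so the prefetch never targets address 0, at the cost of comparing against an address instead of zero.

```c
#include <stddef.h>

typedef struct T_cell {
    struct T_cell *next;
    int value;                 /* assumed payload, for illustration only */
} T_cell;

/* Always-mapped dummy cell that terminates every list instead of NULL.
 * It points at itself so that prefetching past the end stays mapped. */
static T_cell FakeMappedData = { &FakeMappedData, 0 };

/* Walks the list and sums the payloads; the sum stands in for some_work(). */
static long walk_list(T_cell *head)
{
    long sum = 0;
    T_cell *ptr, *next;

    for (ptr = head; ptr != &FakeMappedData; ptr = next) {
        next = ptr->next;
        __builtin_prefetch(next);   /* next is a real cell or the sentinel */
        sum += ptr->value;
    }
    return sum;
}
```

An empty list is represented by passing &FakeMappedData as the head.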

    >
    > >
    > > That sounds like an unrelated issue.
    > >
    > > When user space crashes on this the kernel is unaffected.
    >
    > This is not a kernel crash. But total freeze as all memory is used by
    > network buffers, in no more than 10 seconds.

    Ok, but then you have to diagnose this freeze. I'm not sure why you
    think it must be this prefetch thingy. If the prefetch issue was
    hit then you would just get a normal segfault, not a kernel hang.

    e.g. you could write some kind of reduced test case for it and
    post it to the netdev mailing list (netdev@oss.sgi.com)

    I'm cc'ing it for you.

    > This application receives small TCP messages (about 30 bytes), but the
    > network stack allocates 4KB buffers to store these little messages.

    Most drivers only allocate MTU size in their receive ring
    (normally 1.5K on ethernet). This is rounded to 2K by the memory allocator.

    But most drivers support a rx_copybreak parameter. When the received
    packet is smaller than rx_copybreak it is copied to a freshly allocated
    buffer with the right size.
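The copybreak idea can be modelled in user space roughly as follows. This is a simplified sketch, not code from any real driver: rx_receive, struct rx_result and the two constants are hypothetical, and malloc stands in for the kernel's buffer allocation. Small packets are copied into a right-sized buffer and the large ring buffer stays in the ring; large packets are passed up as they are and the ring slot is refilled.

```c
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE  2048   /* assumed ring buffer size (MTU rounded up) */
#define RX_COPYBREAK  256   /* assumed threshold; real drivers choose their own */

struct rx_result {
    unsigned char *data;    /* buffer handed up the stack (caller frees it) */
    size_t alloc_size;      /* bytes allocated for that buffer */
    int ring_buf_reused;    /* 1 if the ring buffer stayed in the ring */
};

/* Processes one received packet of pkt_len bytes sitting in *ring_slot. */
static struct rx_result rx_receive(unsigned char **ring_slot, size_t pkt_len)
{
    struct rx_result r = { NULL, 0, 0 };

    if (pkt_len < RX_COPYBREAK) {
        /* Small packet: copy into a right-sized buffer, keep the ring buffer. */
        r.data = malloc(pkt_len ? pkt_len : 1);
        if (r.data) {
            memcpy(r.data, *ring_slot, pkt_len);
            r.alloc_size = pkt_len;
            r.ring_buf_reused = 1;
        }
        return r;               /* data == NULL means the packet was dropped */
    }

    /* Large packet: hand the ring buffer itself up and refill the slot.
     * If the refill fails the slot is left NULL for the caller to handle. */
    r.data = *ring_slot;
    r.alloc_size = RX_BUF_SIZE;
    *ring_slot = malloc(RX_BUF_SIZE);
    return r;
}
```

With 30-byte messages every packet takes the first branch in this model, so each queued message holds about 30 bytes rather than a full 2K buffer.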

    In addition the 2.4 stack also supports garbage collection in the TCP
    receive buffers. This means even when a driver doesn't do the rx_copybreak
    trick and the receive queue of a socket fills up it will copy the data
    to fresh, right sized packets by itself.

    Another safeguard in this scenario is that the network stack has internal
    limits that are supposed to avoid this. These are: each socket has a
    fixed receive buffer size, and when more data arrives (including packet
    metadata and normal wastage) than the receive buffer allows, it is
    still dropped. In addition TCP has a global memory limit that also kicks
    in. And the network stack has a global queue limit that prevents
    too much data from being queued from the driver to the higher-level
    parts (/proc/sys/net/core/netdev_max_backlog). Sometimes the queueing
    can also be controlled at the driver level with driver-specific
    knobs.
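A toy model of the per-socket limit described above, to show why the charged size matters more than the payload size. All names (sock_model, sock_queue_packet, packets_until_drop) and all numbers are assumptions for illustration; the kernel's real accounting is more involved. Each packet is charged its full buffer plus metadata, and packets are dropped once the charge would exceed the receive buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket's receive accounting. */
struct sock_model {
    size_t rcvbuf;       /* per-socket receive buffer limit, in bytes */
    size_t rmem_alloc;   /* bytes currently charged to the socket */
};

/* Charges one packet of `truesize` bytes (buffer plus metadata).
 * Returns 1 if the packet was queued, 0 if it was dropped. */
static int sock_queue_packet(struct sock_model *sk, size_t truesize)
{
    if (sk->rmem_alloc + truesize > sk->rcvbuf)
        return 0;
    sk->rmem_alloc += truesize;
    return 1;
}

/* Counts how many equally sized packets fit before the first drop. */
static size_t packets_until_drop(size_t rcvbuf, size_t truesize)
{
    struct sock_model sk = { rcvbuf, 0 };
    size_t n = 0;

    if (truesize == 0)
        return 0;            /* avoid looping forever on a zero charge */
    while (sock_queue_packet(&sk, truesize))
        n++;
    return n;
}
```

Under an assumed 256 bytes of metadata per packet and a 64K receive buffer, packets_until_drop(65536, 2048 + 256) gives 28 while packets_until_drop(65536, 30 + 256) gives 229, so right-sized buffers let far more 30-byte messages queue before the limit is hit.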

    All of this can be tuned by sysctls in /proc/sys. See
    Documentation/networking/ip-sysctl.txt for more details.

    Also the latest 2.6 kernel finally has a writable /proc/sys/vm/min_free_kbytes
    again. This controls the amount of memory kept free for interrupts.
    Increase that.
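To inspect the current values of these knobs, a small reader along the following lines may help. It is only a sketch: read_sysctl and print_net_limits are hypothetical helpers, and tcp_mem and tcp_rmem are picked here as examples of the TCP memory and per-socket receive buffer sysctls; whether every file exists depends on the kernel version. Raising a value means writing to the same file as root.

```c
#include <stdio.h>
#include <string.h>

/* Reads the first line of a sysctl file into buf, without the newline.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
static int read_sysctl(const char *path, char *buf, size_t buflen)
{
    FILE *f;

    if (buflen == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)buflen, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline, if any */
    return 0;
}

/* Prints the limits discussed above, or a note when one is unavailable. */
static void print_net_limits(void)
{
    static const char *paths[] = {
        "/proc/sys/net/core/netdev_max_backlog",
        "/proc/sys/net/ipv4/tcp_mem",
        "/proc/sys/net/ipv4/tcp_rmem",
        "/proc/sys/vm/min_free_kbytes",
    };
    char value[128];
    size_t i;

    for (i = 0; i < sizeof paths / sizeof paths[0]; i++) {
        if (read_sysctl(paths[i], value, sizeof value) == 0)
            printf("%s = %s\n", paths[i], value);
        else
            printf("%s: not available\n", paths[i]);
    }
}
```

Calling print_net_limits() from main before and after tuning gives a quick record of what was changed.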

    > I posted a test application some days ago about this problem and got no
    > answers/feedback.

    Did you post it to netdev? On linux-kernel such things often get
    lost in the noise.

    Also I would contact the driver maintainer; it could really be a driver
    issue.

    -Andi


