2 NFS problems

From: Stanley, Jon (Jon.Stanley_at_savvis.net)
Date: 04/25/04

    Date:	Sat, 24 Apr 2004 19:28:12 -0500
    To: <linux-kernel@vger.kernel.org>
    
    

    I have two distinct problems, possibly related to NFS, in the Linux
    kernel. Any assistance would be appreciated. Any flames that say "you
    should have looked such-and-such place" are welcome as well, so long as
    they include pointers to said places :-)

    1) A system will become unusable, with the following in
    /var/log/messages:

    Apr 24 22:16:35 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 22:16:35 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
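
For context on the message above: in the kernel page allocator, an order-n request asks for 2^n physically contiguous pages, so "4-order" means 16 contiguous pages. The short sketch below works out the size in bytes. The 4 KiB page size is an assumption (typical for i386); the post does not state the architecture.

```python
# Size of a buddy-allocator request of a given order.
# PAGE_SIZE is an assumption (4 KiB, typical for i386); it is not stated in the post.
PAGE_SIZE = 4096


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Return the number of contiguous bytes an order-`order` allocation asks for."""
    return (1 << order) * page_size


if __name__ == "__main__":
    print(order_to_bytes(4))  # 65536, i.e. 16 pages of 4 KiB
```

A request of this size can fail because of memory fragmentation even when plenty of individual pages are still free.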

    These are preceded by the messages that make me believe that NFS is
    either the cause or a victim of this (I'm not sure which):

    Apr 24 22:11:59 <hostname> kernel: nfs: task 7513 can't get a request
    slot

    Here's the full log from the beginning of this event (began at 13:00,
    within several hours it had degraded the system to the point that it was
    not usable any longer - i.e. no ssh, etc.)

    Apr 24 13:00:45 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:00:50 <hostname> last message repeated 7 times
    Apr 24 13:00:56 <hostname> kernel: nfs: server <nfs server> not
    responding, still trying
    Apr 24 13:00:56 <hostname> kernel: nfs: server <nfs server> not
    responding, still trying
    Apr 24 13:00:56 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:01:25 <hostname> last message repeated 13 times
    <irrelevant line removed>
    Apr 24 13:01:36 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:01:36 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    <irrelevant line removed>
    Apr 24 13:01:59 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:01:59 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    <irrelevant line removed>
    Apr 24 13:02:04 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:02:04 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    <irrelevant line removed>
    Apr 24 13:02:15 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:02:16 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    <irrelevant line removed>
    Apr 24 13:02:38 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:02:38 <hostname> kernel: __alloc_pages: 4-order allocation
    failed.
    Apr 24 13:02:46 <hostname> kernel: nfs: task 7463 can't get a request
    slot
    Apr 24 13:02:46 <hostname> kernel: nfs: task 7464 can't get a request
    slot
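
A log excerpt like the one above can be tallied with a short script, which makes it easier to see how the allocation failures and the NFS messages are distributed. What follows is only a sketch: the category names are hypothetical labels, the matching is based solely on the message text quoted in this post, and it assumes the standard syslog "last message repeated N times" line refers to the entry immediately before it.

```python
import re
from collections import Counter

# A syslog entry starts with a timestamp such as "Apr 24 13:00:45 ".
STAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")
REPEAT = re.compile(r"last message repeated (\d+) times")
# Redaction marker used in this post; treated as an unknown entry.
PLACEHOLDER = "<irrelevant line removed>"


def classify(entry):
    """Map a log entry to a (hypothetical) category name, or None."""
    if "allocation failed" in entry:
        return "alloc_failed"
    if "not responding" in entry:
        return "nfs_not_responding"
    if "can't get a request slot" in entry:
        return "nfs_no_request_slot"
    return None


def summarize(text):
    """Count events per category, rejoining lines wrapped by the mailer."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if STAMP.match(line) or line == PLACEHOLDER or not entries:
            entries.append(line)
        else:
            # Continuation of a wrapped entry, e.g. "failed." or "slot".
            entries[-1] += " " + line

    counts = Counter()
    previous = None
    for entry in entries:
        repeat = REPEAT.search(entry)
        if repeat:
            # Attribute the repeats to the entry just before this one.
            if previous is not None:
                counts[previous] += int(repeat.group(1))
            continue
        previous = classify(entry)
        if previous is not None:
            counts[previous] += 1
    return counts
```

Calling `summarize(open("/var/log/messages").read())` returns a `Counter` keyed by category. Grouping the same counts per minute would be a natural extension if the timing of the NFS timeouts relative to the allocation failures matters.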

    This is running RedHat 7.1, with kernel version 2.4.2-2smp (yes, I know
    it's antique - I'm also looking for known bugs that fit this particular
    presentation). The NFS mount is coming from an EMC NS600G. Let me know
    if more information is required.

    2) The second problem, occurring on a separate system, is a little
    harder to articulate, and there are no logs to go along with it, but
    here goes:

    This system, which has mounts from EMC Celerra storage, occasionally has a
    problem whereby one filesystem will become inaccessible (it mounts about
    65 different filesystems across 14 datamovers, IIRC). The problem is
    actually happening right now, and here's the best description:

    When you try to run df, it will just hang forever, requiring you to kill
    your terminal session in order to get out of it. Same when you try to
    do anything with this filesystem. There are multiple machines mounting
    these filesystems, and it does not happen on the same filesystem to all
    of them. Unfortunately, they are all running the same version of the
    kernel (2.4.9-34smp) and distribution (RH 7.2). It also does not happen
    to every filesystem mounted off that datamover - just one of them. The
    combination of all of these factors leads to a very baffling problem.
    Unfortunately, the customer will not allow us to upgrade the kernel
    without extensive testing on their side first. A reboot of the server
    always fixes the problem, for a time.
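
One way to find out which of the many mounts is the hung one, without losing a terminal to a blocked df, is to probe each mount point from a child process and give up after a timeout. The sketch below assumes a modern Python 3 interpreter (not something the systems in this post would ship) and a Linux-style /proc/mounts; the function names and the 5-second timeout are hypothetical choices.

```python
import subprocess
import sys
import time


def nfs_mount_points(mounts_file="/proc/mounts"):
    """Return the mount points of all NFS filesystems listed in mounts_file."""
    points = []
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
                points.append(fields[1])
    return points


def probe(path, timeout=5.0):
    """Return "ok", "error" or "timeout" for a statvfs() call on path."""
    # statvfs runs in a child so that a hung mount blocks the child, not us.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = child.poll()
        if status is not None:
            return "ok" if status == 0 else "error"
        time.sleep(0.05)
    # The kill may not take effect while the child is in uninterruptible
    # sleep, so we deliberately do not wait for it to exit.
    child.kill()
    return "timeout"


if __name__ == "__main__":
    for mount_point in nfs_mount_points():
        print(mount_point, probe(mount_point))
```

On a hard-mounted filesystem the probing child can stay blocked until the server responds again, so each timeout may leave one stuck process behind. The parent, however, keeps running and reports which mount point timed out.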

    Any suggestions/comments/constructive flames welcome :-)

    -Jon

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

