Re: 2.6.12.2 dies after 24 hours

From: Rob Mueller (robm_at_fastmail.fm)
Date: 07/13/05

  • Next message: Chuck Ebbert: "Re: 2.6.13-rc2-mm2"
    To: "Lars Roland" <lroland@gmail.com>, "Bron Gondwana" <brong@fastmail.fm>
    Date:	Wed, 13 Jul 2005 10:27:10 +1000
    
    

    > > We're also applying the attached patch. There's a bug in reiserfs that
    > > gets tickled by our huge MMAP usage (it's amazing what really busy
    > > Cyrus daemons can do to a server, ouch). It's fixed in generic_write,
    > > so we take the few percent performance hit for something that doesn't
    > > break!
    >
    > Interesting - When I got the problem it was on mail servers under high
    > load (handling 60.000 emails pr. hour) with reiserfs as file system. I
    > have seen this problem on 5 different servers so I am confident that
    > it is not hardware failure.
    >
    > Sometimes the server load just rises and then the server dies other
    > times the load rises but the kernel manages to get it back alive
    > filling up syslog with messages like this

    Sounds like a different issue. The patch Bron included before fixes (or at
    least reduces to the point where it fixes it for us) a problem where
    processes get stuck in D state and are unkillable. A reboot is required to
    remove them. Apparently this is a known bug in ReiserFS (see messages
    below). As noted, the same bug exists in ext3. There appears to have been
    some patches to try and fix it for both reiserfs and ext3, but I'm not sure
    if they're in the mainline kernel yet.

    http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2056.html
    http://hulllug.principalhosting.net/archive/index.php/t-22774.html

    Rob

    ----- Original Message -----
    From: "Vladimir Saveliev" <vs@namesys.com>
    To: "Jeremy Howard" <jhoward@fastmail.fm>
    Cc: "Hans Reiser" <reiser@namesys.com>; <reiserfs-dev@namesys.com>
    Sent: Friday, October 08, 2004 4:57 PM
    Subject: Re: URGENT: Need fix for problem with copy_from_user inside a
    transaction

    > Hello
    >
    > On Thu, 2004-10-07 at 21:35, Jeremy Howard wrote:
    >> On Fri, 01 Oct 2004 09:13:34 -0700, "Hans Reiser" <reiser@namesys.com>
    >> said:
    >> > Chris, can you comment please?
    >>
    >> Also... can you guys suggest any ways to minimise the problem, e.g.
    >> external vs internal journal? metadata vs full journalling? changing the
    >> elevator? ...
    >
    > Would you please check whether the attached patch for
    > ./fs/reiserfs/file.c fixes the problem?
    >
    > Quick benchmark shows that using generic_file_write does not hurt
    > reiserfs performance too much comparing to using original
    > reiserfs_file_write.
    >
    > ---Sequential Output (nosync)--- ---Sequential
    > Input-- --Rnd Seek-
    > -Per Char- --Block--- -Rewrite-- -Per
    > Char- --Block--- --04k (03)-
    > MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
    > K/sec %CPU /sec %CPU
    > generic_file_write 256 16690 99.7 28332 29.9 11528 16.8 12411 77.2
    > 28888 17.3 237.4 2.1
    > reiserfs_file_write 256 15647 86.8 30875 22.1 10822 14.1 11631 72.1
    > 29184 16.1 250.0 2.0
    >
    >
    >>
    >> > Vladimir Saveliev wrote:
    >> >
    >> > >Hello
    >> > >
    >> > >On Fri, 2004-10-01 at 01:53, Hans Reiser wrote:
    >> > >
    >> > >
    >> > >>vs, this is not enough of an answer to share your understanding of
    >> > >>the
    >> > >>problem. Please say much more.
    >> > >>
    >> > >>
    >> > >
    >> > >Reiserfs_write_file locks set of pages, and then tries to copy data to
    >> > >them. If it is to copy data from one of pages which are locked and if
    >> > >that page is not uptodate, pagefault requires to lock that page, but
    >> > >as
    >> > >it is locked already - process deadlocks with itself.
    >> > >
    >> > >
    >> > Is this when copying from one file in a formatted node to another file
    >> > in that node?
    >> >
    >> > >As Chris said - fix is not trivial. Also, it is known that he did
    >> > >already something about it, so, I thought that it would make sence to
    >> > >find first what is his state at this problem.
    >> > >
    >> > >
    >
    > Content-Disposition: attachment; filename=file.c.diff2
    >
    > --- file.c~ 2004-10-02 12:29:33.223660850 +0400
    > +++ file.c 2004-10-08 10:03:03.001561661 +0400
    > @@ -1137,6 +1137,8 @@
    > return result;
    > }
    >
    > + return generic_file_write(file, buf, count, ppos);
    > +
    > if ( unlikely((ssize_t) count < 0 ))
    > return -EINVAL;
    >

    ----- Original Message -----
    From: "Chris Mason" <mason@suse.com>
    To: "Hans Reiser" <reiser@namesys.com>
    Cc: "Vladimir Saveliev" <vs@namesys.com>; "Oleg Drokin"
    <green@linuxhacker.ru>; "Jeremy Howard" <jhoward@fastmail.fm>;
    <reiserfs-dev@namesys.com>
    Sent: Saturday, October 09, 2004 1:05 AM
    Subject: Re: URGENT: Need fix for problem with copy_from_user inside
    atransaction

    > On Fri, 2004-10-08 at 07:46 -0700, Hans Reiser wrote:
    >> No, this is not the right answer.
    >
    >> >>>--- file.c~ 2004-10-02 12:29:33.223660850 +0400
    >> >>>+++ file.c 2004-10-08 10:03:03.001561661 +0400
    >> >>>@@ -1137,6 +1137,8 @@
    >> >>> return result;
    >> >>> }
    >> >>>
    >> >>>+ return generic_file_write(file, buf, count, ppos);
    >> >>>+
    >> >>> if ( unlikely((ssize_t) count < 0 ))
    >> >>> return -EINVAL;
    >
    > It's not right because ext3 has exactly the same bug. The only real
    > solution is to change the order in which things happen during
    > file_write.
    >
    > Right now, we do this:
    >
    > prepare_write() allocates space, starts a transaction
    > copy_from_user() can deadlock here because a transaction is running
    > commit_write() end the transaction
    >
    > This is the same in generic_file_write and reiserfs_file_write, although
    > reiserfs_file_write doesn't call reiserfs_prepare_write, it has the same
    > basic structure in terms of when the transaction starts and ends.
    >
    > The solution is to move most of the work from reiserfs_prepare_write
    > into reiserfs_commit_write. I've made a number of different patches for
    > this, all have had problems, but I'm working through it and will have a
    > real tested fix.
    >
    > Jeremy the best thing you can do right now is to mount your filesystems
    > with -o nolargeio=1, and use the temporary workarounds I sent you
    > before. The -o nolargeio=1 will reduce the amount of work we try to do
    > with each pass in reiserfs_file_write (note that Vladimir's patch above
    > will have very similar effects).
    >
    > -chris

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Chuck Ebbert: "Re: 2.6.13-rc2-mm2"

    Relevant Pages

    • Re: UPHClean v1.5e on W2KAS
      ... >seems that UPHClean will fix the problem. ... >has installed this service on their servers and was there any ... >performance hit or problems? ...
      (microsoft.public.win2000.advanced_server)
    • [2/3] POHMELFS: Documentation.
      ... * Client is able to switch between different servers (if one goes down, ... * Read requests balancing between multiple servers. ... Each transaction contains all information needed to process given command ...
      (Linux-Kernel)
    • Re: POP3 E-mail Issue After Swing Migration
      ... only be using POP3 and SMTP servers under corporate control. ... changing it in the ISA Server Management Console, Server, Configuration, ... how in the heck do I fix this?! ...
      (microsoft.public.windows.server.sbs)
    • RE: Truncated message
      ... I was going to do the same fix this weekend. ... > This is becoming annoying as it is happening on all of our servers. ... > click OK to shutdown the system and reboot into Directory Services Restore ... > ended up having to rebuild the server. ...
      (microsoft.public.win2000.active_directory)
    • Re: BGP & reverse dns
      ... I used the list because I needed to know if there was a quick fix here, ... I know nothing about BGP routing... ... >responsible and whether there is a problem with their name servers. ...
      (freebsd-hackers)