Re: unmount oops in log_do_checkpoint



On Tue, Jan 17, 2006 at 11:23:53PM +0100, Jan Kara wrote:
> > On Tue, Jan 17, 2006 at 05:32:35PM +0100, Jan Kara wrote:
> > > > On Tue, Jan 17, 2006 at 12:59:45PM +0100, Nick Piggin wrote:
> > > >
> > > > Maybe it is because people haven't been turning on their debugging options,
> > > > tsk tsk ;) It only oopses when DEBUG_SLAB and DEBUG_PAGEALLOC are both
> > > > enabled. And only then when the jbd patch is not reverted. Weird.
> > > Hmm, that's really strange, maybe we have some use-after-free
> > > problem or so... I'll see what I can do :).
> > >
> >
> > Are you able to reproduce? If not I can test patches...
> Hmm, I was not able to reproduce the problem even with those debug
> options set :(. As I'm looking into the code it seems that somebody
> managed to free the transaction but did not clear the
> j_checkpoint_transactions pointer. It's even stranger that it's during
> umount time when there should not be many processes playing with the JBD
> structures on that filesystem.
> Attached is the patch that fixes two minor possible problems I've
> found. Neither of them should be causing your oops but one never knows
> :). Also turn on the JBD debugging in config. Maybe it spits something
> useful. If you still see the same oops, I'll create some debugging
> patch.

This patch does the trick. It has survived several reboots now, whereas
without the patch it oopsed 100% of the time. Thanks!

I have also attached a full jbd debug output and oops for the vanilla
2.6.16-rc1 case, just in case that helps.

> BTW: the oops during umount is after some activity on the filesystem
> or you just mount & umount?
>

A plain mount and unmount doesn't seem to trigger it, nor does a bit of
filesystem activity. I haven't tracked down exactly what *does* trigger it,
but booting then rebooting does it every time.

Nick

Attachment: dmesg.bad.gz
Description: application/gunzip


