Oopses / ReiserFS superblock corruption with 2.6.9

From: Marek Szuba (scriptkiddie_at_wp.pl)
Date: 11/16/04

    Date:	Tue, 16 Nov 2004 02:11:00 +0100
    To: linux-kernel@vger.kernel.org
    
    
    

    Hello again,

    During two weeks of running 2.6.9 I was hit by two oopses, one of them
    with rather annoying and potentially disastrous consequences. I have to
    say I'm rather disappointed with this, having not seen an oops for
    almost a year... Anyhow, here's what happened:

    1. The first oops occurred when I attempted to log in under X (X.org
    6.8). The WM (blackbox) started successfully, but when Esetroot tried to
    set the background image, the X server crashed and I was returned to
    the wdm prompt - a new one though, as it was located on the 8th console
    rather than on the 7th. The problem was reproducible and only went away
    after I'd rebooted the box. Here is the error message:

    Unable to handle kernel paging request at virtual address 02014742
     printing eip:
    c0164823
    *pde = 00000000
    Oops: 0000 [#1]
    PREEMPT
    Modules linked in: mga parport_pc lp parport ohci1394 ieee1394
    emu10k1_gp snd_emu10k1 snd_rawmidi snd_pcm snd_timer snd_seq_device
    snd_ac97_codec snd_page_alloc snd_util_mem snd_hwdep snd soundcore
    hpt366 wacom joydev usbhid uhci_hcd usbcore evdev 8139too mii crc32
    ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables w83781d
    eeprom i2c_sensor i2c_isa i2c_piix4 i2c_core analog gameport rtc
    nls_iso8859_2 nls_cp852 vfat fat nls_base
    CPU: 0
    EIP: 0060:[poll_freewait+35/80] Not tainted VLI
    EFLAGS: 00013212 (2.6.9)
    EIP is at poll_freewait+0x23/0x50
    eax: 00000000 ebx: 0201472a ecx: c14fb260 edx: c150f1f8
    esi: 000f0008 edi: 000f0000 ebp: 00020000 esp: ecf69ee0
    ds: 007b es: 007b ss: 0068
    Process X (pid: 1958, threadinfo=ecf68000 task=ef55faa0)
    Stack: 00000000 00000000 00000012 c0164bcf ecf69f40 00000000 00000000 00000000
           00020000 00000345 0003f80a 00000000 00000000 0003f80a ecf68000 ef7f7124
           ef7f7104 ef7f70e4 ef7f7184 ef7f7164 ef7f7144 0001c7d3 00000001 00000000
    Call Trace:
     [do_select+431/720] do_select+0x1af/0x2d0
     [__pollwait+0/208] __pollwait+0x0/0xd0
     [sys_select+763/1328] sys_select+0x2fb/0x530
     [syscall_call+7/11] syscall_call+0x7/0xb
    Code: c3 8d b4 26 00 00 00 00 57 56 53 8b 44 24 10 8b 78 04 85 ff 74 3a
    89 f6 8b 5f 04 8d 77 08 8d 76 00 8d bc 27 00 00 00 00 83 eb 1c <8b> 43
    18 8d 53 04 e8 42 18 fb ff 8b 03 e8 cb d9 fe ff 39 f3 77
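
The numbers in the oops above can be cross-checked by hand. The sketch below (Python, not part of the original report) does so under two assumptions: that the bytes at the `<8b>` marker are the faulting instruction, and that only the simple `mov r32, [r32+disp8]` encoding needs decoding. The helper names `parse_symbol` and `decode_mov_disp8` are hypothetical and exist only for this illustration.

```python
import re

# Values copied verbatim from the oops text above.
FAULT_ADDR = 0x02014742                 # "virtual address 02014742"
EBX = 0x0201472A                        # "ebx: 0201472a"
FAULT_BYTES = bytes.fromhex("8b4318")   # bytes starting at the <8b> marker

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def parse_symbol(entry):
    """Split 'name+0xOFF/0xSIZE' into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", entry)
    if m is None:
        raise ValueError(f"unrecognised trace entry: {entry!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def decode_mov_disp8(code):
    """Decode only 'mov r32, [r32+disp8]' (opcode 0x8b, ModRM mod=01, no SIB)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B or (modrm >> 6) != 0b01 or (modrm & 7) == 4:
        raise ValueError("not a simple mov r32, [r32+disp8]")
    if disp >= 0x80:                    # disp8 is a signed byte
        disp -= 0x100
    return REGS[(modrm >> 3) & 7], REGS[modrm & 7], disp


name, offset, size = parse_symbol("poll_freewait+0x23/0x50")
dest, base, disp = decode_mov_disp8(FAULT_BYTES)

print(f"{name}: offset {offset} of {size} bytes")
print(f"mov {dest}, [{base}{disp:+#x}] reads from {EBX + disp:#010x}")
print("matches fault address:", EBX + disp == FAULT_ADDR)
```

The hex offsets agree with the decimal form `[poll_freewait+35/80]`, and ebx plus the 0x18 displacement equals the reported fault address. That is consistent with ebx holding a bad pointer when poll_freewait dereferenced it, but it does not by itself say what corrupted the pointer.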

    2. The second error showed up when, all of a sudden, I could no longer
    get any programs to run. While at first there were only ReiserFS
    warnings on the debug console, eventually one of the programs (a shell
    script, to be exact) triggered a "kernel BUG" error in preempt and I was
    asked to reboot. The relevant snippet of the log is attached to this
    message in bzip2-compressed form to conserve bandwidth.

    After the reboot things got even more interesting! The system would go
    through almost the whole booting procedure only to generate a
    ReiserFS-related oops (which I cannot quote because it didn't get logged
    anywhere), hang and become completely unresponsive the moment it tried
    to access the gpm executable, located on the same partition the
    aforementioned filesystem warnings referred to; again the problem was
    reproducible, but didn't go away after shutting the machine down for
    some time.

    First I ran memtest to check whether the memory chips I had installed a
    few weeks earlier were the source of the problem, even though they had
    been thoroughly tested with that tool right after installation; once
    again they came out clean. I then brought the system up with a rescue
    disc and ran reiserfsck on all partitions. Partition hda6 came out
    clean! No such luck with the root partition though - I was told
    immediately that the superblock was corrupted, which a quick look at
    the output of df confirmed (I only roughly counted the digits, but even
    so I could see that my 2 GB on that partition had magically expanded
    into at least *tens of terabytes*)... Luckily it seems the data itself
    was intact, as everything went back to seemingly normal after rebuilding
    the superblock, not to mention that I was able to tar the contents of
    the whole partition in the first place.

    All said and written, I went back to 2.6.7 for the time being - its
    security flaws don't bother me much (the important boxes still run 2.4,
    just in case) and it was the last 2.6 kernel which didn't offer me
    unwanted surprises once in a while. To think of such things happening in
    a supposedly stable kernel... tsk, tsk! Of course I am aware that 2.4.9
    is said to have been a much greater mess; still, somehow the last
    problems I had with that branch were before 2.4.0 (even the famous
    "don't use" release managed to run on one of my boxes for a full day,
    not to mention compiling the next version), whereas I constantly get hit
    with 2.6.x problems (AFAIR the first version I was actually able to use
    on a regular basis was 2.6.4).

    Anyway, hopefully the information I've provided will be useful in
    debugging the problem. If you need any more data, please let me know.

    Best regards,
    Marek

    
    



