Re: Mild filesystem corruption on ext4 (no journal)



Aioanei Rares wrote:
Alan Jenkins wrote:
Hi,

I run ext4 without a journal on my cheap netbook with a 4 gig SSD. I suspect "without a journal" is significant; I don't think I'm doing anything else strange.

When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable), the locale breaks every reboot, and I have to repair it by running locale-gen. This happened now when I only upgraded libc, in order to play with signalfd(). It also happened before, when I upgraded the entire machine to debian unstable (which I later reverted).

The problem is that /usr/lib/locale/locale-archive gets corrupted when I reboot. The exact corruption differs with each reboot (i.e. the md5sum differs). Last time, the first ~70K was overwritten with data from xorg.log and my web browsing history. I have copies of the original and corrupted state which I can send, the full file is 1.3 megs, but I can limit it to the first 70K, since that's all that was corrupted.
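For anyone comparing a known-good copy against a corrupted one, the extent of the damage can be located with something like the sketch below. It is only an illustration: the function name differing_ranges and the temporary file names are made up here, and nothing in it is specific to ext4 or to locale-archive.

```python
import os
import tempfile


def differing_ranges(path_a, path_b, chunk_size=4096):
    """Return (start, end) byte ranges, end exclusive, where two files differ.

    A length mismatch is reported as a difference running to the end of
    the longer file.
    """
    ranges = []
    offset = 0      # absolute offset of the current chunk
    start = None    # start of the differing run in progress, if any
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            n = max(len(a), len(b))
            for i in range(n):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and start is None:
                    start = offset + i
                elif same and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges


if __name__ == "__main__":
    # Hypothetical demo data: a "good" file and a copy whose head was overwritten.
    workdir = tempfile.mkdtemp()
    good = os.path.join(workdir, "good.bin")
    bad = os.path.join(workdir, "bad.bin")
    with open(good, "wb") as f:
        f.write(b"x" * 10000)
    with open(bad, "wb") as f:
        f.write(b"y" * 5000 + b"x" * 5000)
    print(differing_ranges(good, bad))
```

Run against the saved original and a corrupted copy, this would show whether the damage is one contiguous run at the start of the file or several scattered runs, which may hint at whether whole blocks are being misdirected.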

To try and rule out a faulty userspace program, I marked the file as read-only (chmod a-w) and immutable (chattr +i). After a reboot, the file was still read-only and immutable, yet it still became corrupted.

Also, I ran md5sum in the shutdown scripts, after mounting the root filesystem read-only (which is also preceded by a sync in a different script). This showed that the file did not appear corrupted at this point. (Though maybe it was OK in the page cache, but corrupted on disk.)
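The page-cache caveat could be probed with a checksum that first asks the kernel to drop the file's cached pages. The sketch below is an assumption-laden illustration, not what the shutdown script actually ran: md5_of and its drop_cache parameter are hypothetical names, and posix_fadvise(POSIX_FADV_DONTNEED) is only advisory, so the kernel may still serve the data from memory.

```python
import hashlib
import os


def md5_of(path, drop_cache=False, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    With drop_cache=True, first hint the kernel to discard cached pages
    for this file so the read is more likely to come from the device.
    This is a hint only; it does not guarantee an on-disk read.
    """
    digest = hashlib.md5()
    fd = os.open(path, os.O_RDONLY)
    try:
        if drop_cache and hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # not supported for this file; fall back to a plain read
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

If md5_of(path) and md5_of(path, drop_cache=True) ever disagreed at that point in shutdown, it would suggest that the cached contents and the on-disk contents had already diverged before the reboot.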

The locale-archive file is read by the libc locale routines using mmap(). The mapping is read only and is not modified. It seems likely that some process has it mapped when the kernel shuts down.

I tried reproducing this by writing a minimal daemon which maps a copy of the locale-archive file, and starting it just before the filesystem is remounted read-only. It didn't work though; this copy of the locale-archive file remained uncorrupted.
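A test daemon of that kind might look roughly like the sketch below. It is not the program that was actually used, and it may not match how libc maps the archive; the names map_read_only, touch_pages and hold_mapping are invented for illustration. It maps a file read-only, faults in every page, and then sleeps so the mapping is still live at shutdown.

```python
import mmap
import os
import signal


def map_read_only(path):
    """Map the whole file read-only and return the mmap object."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # ACCESS_READ gives a read-only mapping; writes through it are refused.
        mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    return mapping


def touch_pages(mapping):
    """Read one byte from each page so every page is faulted in.

    Returns the sum of the bytes read, only so the reads have a visible result.
    """
    total = 0
    for offset in range(0, len(mapping), mmap.PAGESIZE):
        total += mapping[offset]
    return total


def hold_mapping(path):
    """Map the file, touch it, then block until a signal arrives."""
    mapping = map_read_only(path)
    touch_pages(mapping)
    signal.pause()  # keep the process, and the mapping, alive
```

Calling hold_mapping() on a copy of locale-archive from an init script would approximate the experiment described above.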

I forced a fsck on boot, and the filesystem was reported to be clean. I am currently running with e2fsprogs v1.41.6 (from debian unstable), and a custom-built kernel, 2.6.30-rc7.

Thanks in advance!
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

I suspect, although I might be wrong, that this is not a kernel-related
problem.

"To try and rule out a faulty userspace program, I marked the file as read-only (chmod a-w) and immutable (chattr +i). After a reboot, the file was still read-only and immutable, yet it still became corrupted."

Since the immutable bit is not respected, I tend to think it is a kernel problem. Unless the filesystem isn't getting unmounted/flushed properly for some reason... but I thought the modern kernel had that covered.

I agree it is very suspicious this happens only after upgrading libc. I'll see if I can find an individual change in libc locale-handling that might trigger this.

Thanks
Alan


