Re: [RFD] Incremental fsck



Andi Kleen wrote:
> Theodore Tso <tytso@xxxxxxx> writes:
> > Now, there are good reasons for doing periodic checks every N mounts
> > and after M months. And it has to do with PC class hardware. (Ted's
> > aphorism: "PC class hardware is cr*p").
>
> If these reasons are good ones (some skepticism here) then the correct
> way to really handle this would be to do regular background scrubbing
> during runtime; ideally with metadata checksums so that you can actually
> detect all corruption.
>
> But since fsck is so slow and disks are so big this whole thing
> is a ticking time bomb now. e.g. it is not uncommon to require tens
> of minutes or even hours of fsck time and some server that reboots
> only every few months will eat that when it happens to reboot.
> This means you get a quite long downtime.
>
> Has there been some thought about an incremental fsck?

While an _incremental_ fsck isn't so easy for existing filesystem types,
what is pretty easy to automate is making a read-only snapshot of a
filesystem via LVM/DM and then running e2fsck against that. The kernel
and filesystem have hooks to flush the changes from cache and make the
on-disk state consistent.

You can then set the ext[234] superblock mount count and last check
time via tune2fs if all is well, or schedule an outage if
inconsistencies are found.

There is a copy of a script that does this at:
http://osdir.com/ml/linux.lvm.devel/2003-04/msg00001.html

Note that it might need some tweaks to run with DM/LVM2 commands/output,
but is mostly what is needed.
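
For illustration only, the snapshot / check / update sequence described
above might be sketched as follows. This is not the script linked above.
The volume group and logical volume names, the snapshot name "fsck_snap"
and the 1G snapshot size are all assumptions, and the commands need root
and an LVM-backed ext[234] filesystem to do anything useful.

```python
import subprocess


def snapshot_check_commands(vg, lv, snap="fsck_snap", snap_size="1G"):
    """Build the command lines for one snapshot-and-check pass.

    vg, lv, snap and snap_size are hypothetical values; adjust for the system.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap}"
    return {
        # Create a snapshot of the (possibly mounted) origin volume.
        "create": ["lvcreate", "-s", "-L", snap_size, "-n", snap, origin],
        # Forced check that answers "no" to every question, so nothing is modified.
        "check": ["e2fsck", "-f", "-n", snapshot],
        # Reset mount count and last-check time on the real filesystem.
        "mark_clean": ["tune2fs", "-C", "0", "-T", "now", origin],
        # Drop the snapshot once the check has finished.
        "remove": ["lvremove", "-f", snapshot],
    }


def check_via_snapshot(vg, lv, run=subprocess.run):
    """Return True if e2fsck found the snapshot clean, False otherwise."""
    cmds = snapshot_check_commands(vg, lv)
    run(cmds["create"], check=True)
    try:
        # e2fsck exits with 0 only when no errors were found.
        clean = run(cmds["check"]).returncode == 0
    finally:
        # Always remove the snapshot, even if the check itself blew up.
        run(cmds["remove"], check=True)
    if clean:
        run(cmds["mark_clean"], check=True)
    return clean
```

If check_via_snapshot() returns False, the real filesystem is left
untouched and an outage for a proper fsck can be scheduled.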

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



