Re: Filesystem corrupts after power failure



Markvr wrote:
Aragorn wrote:
On Thursday 31 August 2006 16:32, Markvr stood up and addressed the masses
in /comp.os.linux.misc/ as follows...:

I've seen several Linux boxes get their filesystems corrupted after a
sudden power failure. Usually the partitions and the data are still
there, but for whatever reason the system won't mount and boot correctly,
and I've had to end up rebuilding the box from scratch, which is a pain in
the arse as they aren't always on site.

This is normal because the kernel keeps certain file handles open during
normal operation. After an unexpected shutdown due to a sudden loss of
power, the buffers will therefore not have been synchronized with the disk
and not all files will have been closed.

I'm not a Linux expert, but it seems that sometimes GRUB gets
stuffed, or the partitions lose their labels or whatever, so they won't
mount properly.

GRUB requires a filesystem to read from, as opposed to LILO, which reads a
binary logical block address to find its second stage loader and the
kernels. Therefore, if the filesystem is corrupted, GRUB is clueless.

Linux seems to handle power failures far more poorly than Win 2K/XP, as
I've known very few Windows boxes to corrupt this much after a power failure.

That's because Windows already corrupts itself during normal operation. It
takes a power loss to obtain the same effect in GNU/Linux.

I've tried using the sync option in fstab, but it slows the box down too
much (a simple file copy goes from 7 sec to 1 min 23 sec).

That all depends on what file you are trying to copy, and how your system is
laid out. A good server administrator also splits off several parts of the
UNIX directory structure onto separate partitions and has several of those
mounted read-only during normal operation.

We tend to use software RAID; does that have issues with power failures?

Possibly more so than a non-RAID set-up.

At the moment we're using ext3, is there a more reliable filesystem that
will handle power failures better?

No sufficiently advanced filesystem is intended to cope with power outages,
but I think /ext3/ may still be the most reliable in that respect. SGI's
XFS is also a very good filesystem with regard to power outage risks, but
XFS caches aggressively and only commits the data to the physical disk at
the last moment to provide for the highest possible speed and efficiency.
Therefore a power failure will make XFS lose your most recent data, and
often not-so-recent data as well, depending on the kernel and XFS driver
version.

Would using "sync" for the /boot and / partitions stop it from
corrupting the tables?

It is always advised to use /sync/ on the root filesystem for safety
reasons. As */boot*, */usr* and */opt* should be mounted read-only during
normal operation, /sync/ is irrelevant there, but may be preferred
over /async/ whenever those filesystems do need to be written to.
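
For illustration, an /etc/fstab along those lines might look roughly like
this. The device names and the partition layout are made up for the
example; only the mount options in the fourth column matter here.

```
# <device>   <mount>  <type>  <options>      <dump> <pass>
# Hypothetical devices; adjust to the actual disk layout.
/dev/sda1    /boot    ext3    ro             1      2
/dev/sda2    /        ext3    defaults,sync  1      1
/dev/sda3    /usr     ext3    ro             1      2
/dev/sda5    /var     ext3    defaults       1      2
```

When one of the read-only filesystems needs updating, "mount -o remount,rw
/usr" makes it writable temporarily and "mount -o remount,ro /usr" puts it
back afterwards.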

These aren't massive enterprise systems, they're mainly firewalls and
small mail servers, so is ReiserFS or any of the others better?

/reiserfs/ is known to sometimes suffer severe filesystem corruption after a
power loss - as opposed to XFS, which will only lose the data - but
overall, it is very reliable.

Any advice is appreciated,

What you need is a UPS. Running a server without one is irresponsible. ;-)


Hi, thanks for the advice. Yeah, hmmm, UPSes! Most of these servers are on customers' sites, and the customers have a habit of pressing the wrong buttons etc., or a power failure overnight will outlast the UPS.

Then what you do is get serial data from the UPS that you can use to perform an orderly shutdown.

At least a

sync; sync; halt; :-)
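
A rough sketch of what that could look like, assuming the UPS is managed by
Network UPS Tools (NUT), which nobody in this thread has actually named. It
polls the UPS with NUT's upsc client and halts once the status shows "on
battery" (OB) together with "low battery" (LB). The UPS name
myups@localhost and the 30-second interval are placeholders.

```shell
#!/bin/sh
# Sketch only: assumes NUT is installed and a UPS is configured.
# "myups@localhost" is a hypothetical UPS name.
UPS="${UPS:-myups@localhost}"

# Succeed only when the status string contains both OB (on battery)
# and LB (low battery), in any order.
should_shutdown() {
    case " $1 " in
        *" OB "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" LB "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the UPS every 30 seconds; flush buffers and halt when needed.
poll_ups() {
    while sleep 30; do
        status=$(upsc "$UPS" ups.status 2>/dev/null) || continue
        if should_shutdown "$status"; then
            sync
            shutdown -h now
        fi
    done
}

# Only start polling when asked to explicitly.
if [ "${1:-}" = "--run" ]; then
    poll_ups
fi
```

NUT's own upsmon daemon does this job more robustly, so the loop above is
mainly useful for seeing what the logic amounts to.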


Or the backup power supply in a datacentre we co-lo in manages to fail ...outrageous!

The partitions are split out into /, /boot, /var, /home, /usr and /tmp.

I'll try sync'ing the / and /boot partitions and leave the others unsynced.
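
After editing fstab and remounting, it may be worth confirming which
options the kernel actually applied. A small sketch, assuming a Linux
/proc/mounts; the helper names mount_opts and has_opt are made up for
this example.

```shell
#!/bin/sh
# Print the option string for a mount point, read from a file in
# /proc/mounts format (defaults to /proc/mounts itself).
# If the mount point is listed more than once, the last entry wins.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 }
                    END { if (opts != "") print opts }' "${2:-/proc/mounts}"
}

# Succeed if the given option (e.g. sync or ro) is set on the mount point.
has_opt() {
    case ",$(mount_opts "$1" "${3:-/proc/mounts}")," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, "has_opt / sync && echo root is sync" and "has_opt /boot ro
&& echo boot is read-only" report on the live system.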


I would tend not to do that... just have a recovery CD from which you can boot a minimal kernel and fsck the main drives. You can probably make that something that lives permanently in the CD slot, so it gets invoked whenever the thing reboots.
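
As a sketch of what the boot script on such a CD might run, assuming ext3
partitions as mentioned earlier in the thread: the device list is
hypothetical, and the exit-status handling follows the bit mask documented
in fsck(8), where 0 means clean, 1 and 2 mean errors were corrected, and 4
or above means something is still wrong.

```shell
#!/bin/sh
# Sketch for a rescue-CD boot script. The device list is hypothetical.
DEVICES="${DEVICES:-/dev/sda1 /dev/sda2}"

# Turn an fsck exit status into "clean", "fixed" or "failed".
fsck_verdict() {
    if [ "$1" -ge 4 ]; then
        echo failed
    elif [ "$1" -ne 0 ]; then
        echo fixed
    else
        echo clean
    fi
}

# Check every device in "preen" mode (-p repairs safe problems
# automatically) and report a verdict per device.
check_all() {
    for dev in $DEVICES; do
        e2fsck -p "$dev"
        rc=$?
        echo "$dev: $(fsck_verdict "$rc")"
    done
}

# Only touch the disks when asked to explicitly.
if [ "${1:-}" = "--run" ]; then
    check_all
fi
```

The filesystems must not be mounted while e2fsck runs, which is the case
when booting from the CD.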

Hmm. I wonder if you could make the GRUB bit RO in some way also?

