Re: fsck'd
- From: "John Fleming" <wa9als@xxxxxxxxx>
- Date: Fri, 7 Mar 2008 20:38:07 -0500
On 3/7/08, Douglas A. Tutty <dtutty@xxxxxxxxxxxxx> wrote:
On Fri, Mar 07, 2008 at 08:29:08AM -0500, John Fleming wrote:
Background - I had a well-established LAMP server that was giving some
filesystem errors on boot, with the "hit control-D to continue or give root
password to fix manually" message. It would go ahead and work normally if I
hit Control-D.
You should have fixed it manually. Control-D is in case the person
sitting there doesn't have the root password and is generic to Debian's
single-user mode.
However, I wanted to try to get rid of the error and need for human
intervention in the event of the need for a remote reboot, so I tried
to fix the errors with fsck. Somehow I ended up with a badly trashed
filesystem and inability to reboot.
After much gnashing of teeth and consideration of my options, I
installed a new etch system. I installed several benign packages like
apache. I ran update and dist-upgrade to bring the system up to date.
When I ran the upgrade, it told me that it was trying to install an
identical kernel image.
Well, it was a new kernel image with the same version code so that the
new modules would be going in the same directory as the old modules.
The new modules won't work with the old kernel so that if you do
anything to trigger a module load, bad things can happen, which is why
you reboot as soon as the upgrade is complete.
It explained some things about what it was doing about modules, and
then said to be sure to reboot. I did that, but then it gave me that
now-familiar message about how the filesystem has errors, hit
control-D to continue...
I booted with Knoppix, made sure my filesystem on /dev/hda1 was NOT
mounted, and ran fsck -f. It did the 5 passes without mention of
errors. I ran it a second time with same results. However, when I
boot from /dev/hda1, I still get the error about a filesystem with
errors!
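
The "make sure it is not mounted" step can be checked rather than assumed. The following is a minimal sketch, assuming a Linux system with /proc/mounts; the function name is_mounted is made up for illustration. The root device can appear under another name there (for example /dev/root), so a "not found" result is only a hint, not proof.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeeds (exit 0) if DEVICE appears as the first
# field of any line in MOUNTS_FILE (default: /proc/mounts).
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit(found ? 0 : 1) }' "$mounts"
}
```

From a rescue CD this might be used as `is_mounted /dev/hda1 && echo "still mounted, do not run fsck"`.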
Trying to rebuild the server as it was is painful enough - Why would I
be having these filesystem errors? The HDD is relatively new.
Any other way to try to get rid of the boot error before I reinstall
etch again? I hate to do that because I don't understand how these
errors originate, so I don't know why I shouldn't expect them to crop
up again at some point later after another fresh install.
Why does the fsck during boot find errors when the fsck run via
knoppix on the same filesystem return clean?
Don't know why. Here's how I'd proceed:
1. boot with the kernel command line: init=/bin/sh since debian's
single-user mode gives you most filesystems already mounted.
2. run fsck (read the man page to give you the options appropriate
to your root fs); run it on all your filesystems.
3. shutdown -h and power-cycle.
4. run aptitude update then upgrade anything required.
5. reboot. Watch the screen for any errors on shutdown that would
suggest that the system isn't, e.g. remounting the / fs ro
before halt/reboot. If in doubt, set up a serial console and
log the output or set up the console output to go to a printer.
6. If you still have problems, boot knoppix (I use grml) and run
fsck. If this is ext2/3, I'd run -c -c so that the entire disk
gets read to force the drive firmware to re-map any bad sectors.
While this is running, I'd be watching /var/log/syslog for any
errors from the drive.
7. Ensure that you have smartmontools installed and run a long
SMART test, and when it's complete, check the results on the
drive.
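
Steps 2, 6 and 7 above might look roughly like the commands below. This is only a sketch under assumptions: the filesystem is ext2/3 on /dev/hda1 (the device named earlier in the thread), the whole disk is /dev/hda, the filesystem is not mounted, and e2fsprogs and smartmontools are installed. The run helper, the DRY_RUN variable and the three function names are hypothetical and exist only for illustration; by default the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch only: device names are assumptions based on the thread.
FS_DEV="${FS_DEV:-/dev/hda1}"     # filesystem to check (must NOT be mounted)
DISK_DEV="${DISK_DEV:-/dev/hda}"  # whole disk, used for the SMART test
DRY_RUN="${DRY_RUN:-1}"           # hypothetical switch: 1 = print only

# Hypothetical helper: print the command, run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

check_fs() {
    # Step 2: -f forces a full check even if the filesystem is marked clean.
    run e2fsck -f "$FS_DEV"
    # Step 6: -c given twice requests a non-destructive read-write
    # bad-block scan of the whole filesystem.
    run e2fsck -f -c -c "$FS_DEV"
}

start_smart_test() {
    # Step 7: start the long self-test; it runs inside the drive.
    run smartctl -t long "$DISK_DEV"
}

show_smart_results() {
    # Step 7, later: read the attributes and self-test log once it is done.
    run smartctl -a "$DISK_DEV"
}
```

Calling check_fs with the defaults only prints the two e2fsck command lines; setting DRY_RUN=0 first would actually run them. The long self-test can take hours, so show_smart_results is meant to be called well after start_smart_test.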
---
If all else fails, plan for a reinstall (ensure that you have backups).
Then boot knoppix and run wipe on the drive. This fully exercises the
drive to exorcise any gremlins. Then install etch minimal (don't select
any tasks), ensure that aptitude is installed (if it isn't, then apt-get
install aptitude), get aptitude set up the way you like with only necessary
packages marked as manual, the rest as automatic, then do an update and
upgrade before you install any other packages.
At each stage, do a shutdown -rF.
Doug.
Doug, thanks for the good ideas - I learned some things from your
considered response. I ended up finally reinstalling etch again.
I've now captured the pertinent part of the boot messages and will
copy below. Why does the filesystem check clean once and then come up
with errors the 2nd time? You mentioned that I should fix it manually
- Well, if I enter the root password at the prompt and try to run fsck
manually, it warns me about the damage I might do to the MOUNTED
filesystem. I mentioned in my earlier post that if I boot into
Knoppix and run fsck, it comes back CLEAN. So I can't seem to repair
it with Knoppix fsck, yet I get the error when I boot from my
/dev/hda1 - the second time in the fsck sequence. Can you shed any
light on this?
Here is the pertinent boot sequence:
Checking root filesystem...fsck 1.40-WIP (14-Nov-2006)
/dev/hda1: clean, 126072/19218432 files, 1420493/38409399 blocks
done.
Setting up system clock..
Cleaning up ifupdown....
Loading kernel modules...loop: loaded (max 8 devices)
done.
Loading device-mapper supportdevice-mapper: ioctl: 4.7.0-ioctl
(2006-06-24) initialized: dm-devel@xxxxxxxxxx
Checking file systems...fsck 1.40-WIP (14-Nov-2006)
/ contains a file system with errors, check forced.
/:
Inodes that were part of a corrupted orphan linked list found.
/: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
fsck died with exit status 4
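
For reading the last line of that log: the fsck exit status is a sum of flag values documented in fsck(8), and 4 means errors were left uncorrected. A small sketch that decodes a status follows; the function name decode_fsck_status is made up for illustration, and the messages paraphrase the man page.

```shell
#!/bin/sh
# decode_fsck_status STATUS
# Hypothetical helper: print one line per flag set in an fsck exit status.
decode_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each entry is "bit value:meaning", as listed in fsck(8).
    for pair in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${pair%%:*}
        text=${pair#*:}
        if [ $((status & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
}
```

For the status in the log, `decode_fsck_status 4` prints "filesystem errors left uncorrected", which matches the "RUN fsck MANUALLY" message just above it.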
THANKS! - John