Re: linux hard drive failed, clicking on bootup
- From: "Michael Paoli" <michael1cat@xxxxxxxxx>
- Date: 7 Feb 2007 01:49:47 -0800
Many good points have already been covered in other replies. I'll
try to mostly touch on points I didn't see mentioned, or that were
covered only lightly or in a way that missed something I wanted to
say.
Mostly some suggestions on how to better handle such a scenario -
including things which are no longer feasible.
Jeff wrote:
this. Problem started when I reboot my machine after a simple video
card swap.
Had some error and it appeared the Inode got messed up. Typically
A) Backup, backup, backup
B) at least occasionally test/verify backups
C) off-site backups
D) don't forget important meta-data (e.g. enough information so that
you could completely rebuild a replacement system, potentially on
fairly differing hardware, with nothing more than one's off-site
backups and the data/documentation stored with them).
E) hard drives do and will fail; this can mostly only be predicted
statistically; most notably, any drive can fail at any time (with or
without "reason" or "cause" - for practical purposes, an individual
disk failure isn't really "preventable").
F) backups fail too - have enough redundancy in media that some
reasonably expectable failure rate (e.g. 1.5% of media - larger
percentages for media that was last written a fairly long while
ago) can be tolerated without losing data.
G) the order of the above is somewhat arbitrary (it's not necessarily
strictly by importance or dependencies).
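For point B, one simple way to test/verify is to keep a checksum manifest alongside the backup and re-check it now and then. This is only a minimal sketch: the function names and paths are made up for illustration, and it assumes GNU coreutils sha256sum is available.

```sh
# make_manifest DIR MANIFEST
#   Record a SHA-256 checksum for every regular file under DIR.
#   Keep MANIFEST outside DIR so it doesn't end up listing itself.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST
#   Re-read every file under DIR and compare it against MANIFEST.
#   Exit status is non-zero if any file is missing or differs.
verify_manifest() {
    ( cd "$1" && sha256sum -c --quiet - ) < "$2"
}

# Usage (hypothetical paths):
#   make_manifest   /home          /root/home.sha256
#   verify_manifest /mnt/restored  /root/home.sha256
```

Verifying against a restored copy (rather than the original) also exercises the restore procedure itself.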
"Inode got messed up."? Why do you suspect inode? Just one? That
doesn't sound consistent with the other symptom(s) you describe,
most notably "hard drive failed, clicking". If there's a hard drive
problem there may well be problem(s) reading inode(s) and other data,
but what the operating system would more generally be having (and
reporting) is a read (or write) problem.
Linux is good
at fixing it so I just reboot, but it wouldnt reboot. Next I
LINUX is good at many things, ... but it's rather limited in "fixing"
one's hardware, if the hardware is broken/defective (though it can
work around some known bugs, support some updates of firmware and
such, etc.)
If at all feasible, look at diagnostics first - and save/capture them
if feasible. Knowing more about the nature of the failure will
generally make it more feasible to better isolate the problem, and
also take steps that are more likely to lead to a successful recovery
(at least to the extent feasible).
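As a rough sketch of "save/capture them": a small wrapper that runs a diagnostic command and keeps its output (stdout and stderr) in a file. The function name and the /root/diag directory are just illustrative; the log directory should be on a disk other than the suspect one, and smartctl assumes smartmontools is installed.

```sh
# capture LOGDIR LABEL COMMAND [ARGS...]
#   Run COMMAND and save everything it prints to LOGDIR/LABEL.txt.
#   Returns COMMAND's own exit status.
capture() {
    cap_dir=$1
    cap_label=$2
    shift 2
    mkdir -p "$cap_dir" || return 1
    "$@" > "$cap_dir/$cap_label.txt" 2>&1
}

# Usage (as root; /dev/hdb is the suspect drive from this thread):
#   capture /root/diag dmesg dmesg
#   capture /root/diag smart smartctl -a /dev/hdb
#   capture /root/diag fdisk fdisk -l /dev/hdb
```

None of those three commands write to the suspect drive, so they're reasonable to run before deciding on anything more invasive.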
If one's rather uncertain as to how to proceed, or what may be the
best way to proceed, it may be highly useful to gather some "expert",
or at least relative expert opinions/advice before proceeding - at
least if one is rather to quite concerned about the data, and one can
take the time to get more advice/assistance.
installed fedora 6 on a new
500Gb drive and put the 'bad' drive as slave. After several attempts
Perhaps, but not necessarily - and probably not - the best move.
Adding new hardware, particularly when closely associated with failed
or suspect hardware may only complicate fault isolation. Such action
is essentially adding a new variable, before one's well isolated the
initial failure cause. In this particular scenario, a logical first
step, if one were inclined to make change on the hardware, would be to
reverse the video card changes that were done just before the failure.
If that happened to make all the problems go away, then it wasn't the
hard drive, was it? (Unless the hard drive was flawed in a way that
only showed up when the change in video hardware was done.)
to boot up
(it wouldnt recognize the bad drive and not even boot), it finally
boot up. I ran
e2fsck -b (multiple of 16384) /dev/hdb2
and e2fsck happily started to find errors, which I said 'y' to about
100 fix,
clear and inode stuff. e2fsck ran for about 4 hours (160Gb drive).
That could be a very bad move. fsck (and e2fsck, etc.) fix logical
problems with filesystems. They don't correct hardware problems. The
actions of fsck and similar are primarily to restore (if feasible for
fsck/e2fsck) logical consistency of the filesystem - they typically
aren't optimized for preserving as much of one's data as possible on
a flaky or failed/failing hard drive.
A typical better approach (at least in many cases) would be to:
* try a full read test of the filesystem, partition, volume, or full
hard drive, as relevant and appropriate. If zero errors are
encountered, that portion of the drive/data read is probably in
relatively good shape.
* if the above goes well, or one is curious or possibly wants to try
troubleshooting in a slightly different order, one may try doing an
fsck/e2fsck - but with the -n option, so no writing is done to the
filesystem device. If that gives no errors, or only some rather to
quite trivial errors, one's data may be in rather to quite good
shape. The fsck/e2fsck -n check, however, only checks filesystem
logical consistency - it doesn't read all the data of all the files,
so there may still be problems - and there may also be problems with
locations on the disk that don't presently hold data.
Note also that it's safe to do fsck/e2fsck -n on a mounted
filesystem - but if the filesystem is mounted rw, one may get
diagnostics of "problems" that aren't actually problems (fsck/e2fsck
presumes the filesystem isn't in use - if things are changing as it
passes over the data or the filesystem wasn't synced and marked
clean, fsck/e2fsck will generally find at least some "errors").
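The two read-only checks above might look roughly like this in shell. It's a sketch, not a recipe: read_test is a made-up helper name, /dev/hdb2 is the partition from the original post, and the -f flag (force a check even if the filesystem is marked clean) is my addition.

```sh
# read_test SOURCE
#   Read SOURCE (a device or a file) end to end and discard the data.
#   Exit status is non-zero if dd reported a read error; dd's own
#   messages go to stderr, and the kernel log will usually say more.
read_test() {
    dd if="$1" of=/dev/null bs=64k
}

# Usage (as root):
#   read_test /dev/hdb2 && echo "no read errors"
#   e2fsck -fn /dev/hdb2    # -n: answer "no" to everything, never write
```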
If the filesystem read test (reading the full device end-to-end) gave
no errors, then in most cases proceeding to fsck/e2fsck would
generally be the next logical course of action. If feasible, it may
however be useful to copy the entire filesystem device elsewhere (and
not to the same disk if the disk is potentially still suspect), and
work on that copy, rather than the original. If one hasn't already
done an fsck/e2fsck -n, it's probably advisable to do that first -
that will generally give one an overview of "how bad it looks". When
then proceeding to use fsck/e2fsck without the -n option, it's
typically advisable to include the -y option. Unless one is rather to
quite familiar with the filesystem details, one typically doesn't want
to be manually picking between y and n on all the questions one would
be prompted for if neither the -n nor the -y option were specified.
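Copying the filesystem device elsewhere and working on the copy could be sketched as below. The helper name and the /mnt/good path are hypothetical; the destination needs to be on a different, healthy disk with enough free space. With conv=noerror,sync, dd carries on past read errors and pads each unreadable block with zeros, so offsets in the image stay aligned with the original.

```sh
# image_device SOURCE IMAGE
#   Copy SOURCE to IMAGE block by block. On a read error the whole
#   4k block is zero-filled in IMAGE and the copy continues. Note that
#   conv=sync also pads a short final block up to 4k.
image_device() {
    dd if="$1" of="$2" bs=4k conv=noerror,sync
}

# Usage (as root):
#   image_device /dev/hdb2 /mnt/good/hdb2.img
#   e2fsck -fn /mnt/good/hdb2.img   # overview of "how bad it looks"
#   e2fsck -fy /mnt/good/hdb2.img   # repair the copy, not the original
```

A small block size loses less data per bad spot but is slower; that trade-off is a judgment call.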
Also, before using fsck without the -n option on the filesystem - or
at least on the original filesystem - it may, if feasible, be desirable
to mount the filesystem read-only, and copy the data via something
that reads the files on the filesystem (e.g. tar, cpio, etc.). In
some/many cases that may get you much of your data. In many cases, if
there are problems reading parts of the drive (e.g. specific blocks),
such an approach will often give you diagnostics about the particular
file(s) that are having problems - you may not be able to get those,
or all of those, but you might be able to get everything, or most
everything else. Note also, however, if there are problems with
directories, you may fail to get some or all of the contents,
recursively, of those directories. This can also be tried both before
an fsck/e2fsck without the -n option, and also after (the two results
may be quite similar to identical, or may be quite substantially
different - either one may provide better results, and which one is
"better" may even vary depending on particular file(s)/directories on
the filesystem).
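A file-level copy from a read-only mount could look something like the following. Again only a sketch: copy_tree and the mount points are made-up names, and the error log is where tar's complaints about unreadable files end up.

```sh
# copy_tree SRC DEST LOG
#   Copy everything under SRC into DEST using a tar pipeline.
#   Messages from the reading tar (e.g. files it could not read)
#   are written to LOG; files that can be read are still copied.
copy_tree() {
    mkdir -p "$2" || return 1
    tar -cf - -C "$1" . 2>"$3" | tar -xpf - -C "$2"
}

# Usage (as root; paths are hypothetical):
#   mkdir -p /mnt/bad
#   mount -o ro /dev/hdb2 /mnt/bad
#   copy_tree /mnt/bad /mnt/good/rescued /mnt/good/rescue-errors.log
#
# If the filesystem is ext3, a plain "ro" mount may still replay the
# journal (which writes to the disk); "mount -o ro,noload" skips that.
```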
It didnt
appear to finish and seemed like it was hung. The /var/log/messages
file
had a ton of errors for /dev/hdb which I figured were normal
considering its
bad shape. I reboot the system figuring I would re-run e2fsck.....
All that would typically be the case with hardware problems on the
hard drive. The I/O errors can greatly slow down, or even hang
fsck/e2fsck. Diagnostics naming the device itself (e.g. /dev/hdb)
would typically be from the kernel and indicative of hardware
read/write problems, rather than being fsck/e2fsck diagnostics
(though fsck/e2fsck may also complain).
The good drive is /dev/hda and the bad one is /dev/hdb...so on reboot,
all I get is about 5-10 repeated clicking sounds on the bad drive and
it
either wont boot at all with the bad drive on the IDE bus OR it boot
once
but no longer can see /dev/hdb at all.
Likely a hardware problem, and probably a failed drive, but if you
haven't already, you might want to consider some of the hardware
troubleshooting / problem isolation steps I mentioned earlier (and
others have mentioned some similar approaches and areas of
consideration).
I have some very important stuff Id like to recover.
See my comments further above about "expert", "advice" and "concerned
about the data".
If the data on the hard drive is quite important/valuable, it may be
advisable to use, or consider using, data recovery services -
particularly if there's critical/important data that can't be
recovered from backup(s) and/or recreated. Note however that the
probability of successful recovery, and the amount of data recovered,
may be quite substantially lower if one did write operations (e.g.
fsck/e2fsck without the -n option, remounting rw, etc.) to the drive
after the first indications of problems, and the costs for such
recovery or attempts thereof may be significantly higher.
Suggestions (other than the obvious, give up and chaulk it up to
experience)? :)
0) Read this post, or similar materials, before taking the actions one
did before making the original post.
1) [re]read the information in this post (and other similarly
available information). At least some/much of that
advice/information is still applicable or at least partly
applicable even after what your hard drive has thus far been
through.
2) For drives with stiction problems, some other more specific
techniques might be useful, but your description doesn't sound like
a stiction problem (sounds like one or more times your drive still
spun up again, but still had other serious issues). (Of 4 drives I
encountered stiction problems on several months ago, I was able to
get 2 of those 4 drives successfully working again ... at least
long enough to rather securely wipe their data, anyway.)