Re: [SLE] Problems with initrd after mkinitrd



On Wednesday 28 December 2005 04:50, Patrick Freeman wrote:
> Ok so I looked -- that is cylinder 2082 (sorry about being a dolt but I
> have a tendency to *fuzz* up that which I don't absolutely need to know
> after I've checked it out)

Not a problem, Patrick... I tend to priority-focus on minute details, too.

> I am very interested in what you had said in another post about C,H,S
> being defaulted in the BIOS...

Just to clarify what you've paraphrased here: I only proposed it as
a possibility worth investigating. From the spec ***:

"4MB Flash ROM with AMI* BIOS, Multiboot BBS (BIOS Boot Specification)
[with] IDE drive auto-configure"

I've been tripped up by these built-in 'auto' IDE configuration utilities
before, specifically in the area of CHS<>LBA address translations. If it is
suspect, it deserves looking at if only to rule it out.
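
To make the CHS<>LBA concern concrete, here is a minimal Python sketch of the
standard translation arithmetic. The geometry (255 heads, 63 sectors per
track) is an assumption chosen for illustration, not something read from your
BIOS; the function names are hypothetical too. The 1023 limit is the 10-bit
cylinder field of the classic INT 13h CHS interface.

```python
# Hypothetical geometry: 255 heads x 63 sectors/track is a common translated
# geometry, but the real values must come from the BIOS and the drive.
HEADS = 255
SECTORS_PER_TRACK = 63
INT13_MAX_CYLINDER = 1023  # classic INT 13h CHS has a 10-bit cylinder field


def chs_to_lba(c, h, s, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert a CHS triple (sector numbers are 1-based) to an LBA."""
    if c < 0 or not (0 <= h < heads) or not (1 <= s <= spt):
        raise ValueError("CHS value out of range for this geometry")
    return (c * heads + h) * spt + (s - 1)


def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert an LBA back to a CHS triple under the given geometry."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1


def reachable_via_classic_chs(c):
    """True if the cylinder fits in the 10-bit INT 13h cylinder field."""
    return 0 <= c <= INT13_MAX_CYLINDER


if __name__ == "__main__":
    lba = chs_to_lba(2082, 0, 1)
    print("LBA of first sector on cylinder 2082:", lba)
    print("Same LBA seen with 16 heads:", lba_to_chs(lba, heads=16))
    print("Reachable via classic CHS:", reachable_via_classic_chs(2082))
```

The point of the second print is that the same block maps to a different
cylinder as soon as the two sides disagree about the geometry, which is the
kind of mismatch an 'auto' configuration could introduce.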

> I'm sorry -- I haven't been clear enough. The second flavor occurs on
> *all* drives after updates (where *all* means that the sample set can be
> of either type (fresh-install or imaged))

This contradicts facts that I believed we'd already established. It is a
proverbial "monkey wrench" that fundamentally changes the equation. :-/

I *thought* the purpose of dividing up your test systems into "cloned" vs.
"native installed" was to compare those susceptible to the 'flavor 2'
failures against the "healthy." Now, there is no "healthy!"

If *every* system, native installed and cloned, is susceptible to the post
mkinitrd boot failures, the fact that a drive started out as a clone might be
*exacerbating* the problem, somehow, but the cloning itself *can't* be the
root cause or even a prerequisite. This has two ramifications:

- it brings the possibility back to life that these drives (or the entire IDE
subsystems, for that matter) have an inherent but unidentified
susceptibility. IOW, all of the hardware related possibilities are back on
the table unless and until each is specifically and rigorously tested and
ruled out.

- it also greatly increases the likelihood that the software you're compiling
and installing... or the process you're using to install it... is at fault.

The only obvious nexus I can see is your locally compiled driver. It *is* tied
into the *storage* subsystem (the point of failure), isn't it?

Look, Patrick, I know it seems confusing when you can make the same changes on
many systems and have some succumb and others not, but my previous point
concerning "magnification" comes into play: these systems cannot be exactly
identical, or they'd all fall over or all run.

From that perspective, studying the differences between 'flavor 1' and 'flavor
2' boot failures is a distraction because it leaves the major questions
unanswered...

Is it possible for you to just rip out the locally compiled driver, substitute
pieces of the storage subsystem as needed and run some trials? If the boot
failures disappear, you've at least isolated the problem. That is the first
step in identifying and fixing it. Alternatively, you could dual purpose
these trials as preliminary work towards migrating to less problematic
hardware... hopefully to components that won't need the custom driver.

> I have the used blocks ( -D ) output from debugreiserfs from both a
> working and a hung system. ...

I think comparing a healthy 'native' drive to one drive from each flavor of
boot failure would provide the most fruitful forensic data. Time to recruit
the assistance of a real filesystems expert, maybe even a hard drive engineer...
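
As a starting point for that comparison, here is a rough Python sketch that
diffs two sets of used block numbers. It assumes you have already reduced
each debugreiserfs dump to one decimal block number per line; that
intermediate format and the function names are my own assumptions, not what
the tool emits directly.

```python
def load_blocks(lines):
    """Collect block numbers, assuming one decimal number per line.

    Lines that are not purely numeric are skipped rather than guessed at.
    """
    return {int(line) for line in lines if line.strip().isdigit()}


def compare_blocks(healthy, failing):
    """Summarize which used blocks differ between two drives."""
    return {
        "only_healthy": sorted(healthy - failing),
        "only_failing": sorted(failing - healthy),
        "common": len(healthy & failing),
    }


if __name__ == "__main__":
    # Made-up sample data, purely to show the shape of the result.
    healthy = load_blocks(["10\n", "11\n", "12\n"])
    failing = load_blocks(["11\n", "12\n", "99\n"])
    print(compare_blocks(healthy, failing))
```

Blocks that show up only on the failing drive are the ones worth checking
first against wherever the boot loader expects to find the initrd.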

> ... But this still wouldn't answer your question, I think, since it
> wouldn't tell us if the BIOS can properly address and read those blocks.

In my mind, the likelihood that a BIOS setting or limitation is at fault has
greatly diminished. It would be nice to rule it out, and that is easy enough
to do with the right knowledge... :-) See my comment, above.

OK, Patrick, that's the extent of my brain capacity on this problem at this
time. I'll keep abreast of your progress by following this thread, but unless
I see some additional and definitive clues, or maybe some real test results,
there isn't much more that I can add. Have fun and good luck!

regards,

- Carl
