Re: Fedora Core 2/Windows XP dual boot: selecting Linux doesn't work

On Thu, 15 Dec 2005 09:03:57 +0100, <g.devries@xxxxxxxxxx> wrote:
(fd0) /dev/fd0
(hd0) /dev/hde
(hd1) /dev/hdc
* grub is installed on the /root partition, which is the first
partition on hdc: hdc1, or (hd1,0) in grub notation.
* I make sure that in grub.conf, I use the correct references. Also, I
get rid of the LABEL:
title Fedora Core
   root (hd1,0)
   kernel /vmlinuz-2.6.5-1.358 ro root=/dev/hdc2
   initrd /initrd-2.6.5-1.358.img
title Windows XP
   rootnoverify (hd0,0)
   chainloader +1
* I run "grub-install /dev/hdc1".
* I copy the boot record to a file: "dd if=/dev/hdc1 of=linuxboot.bin
bs=512 count=1".
* I boot to XP, put this linuxboot.bin on C: and add the line
C:\linuxboot.bin="Linux" to boot.ini.

This all looks perfect to me.

Now when I boot the machine, and choose "Linux", I get a black screen
with the word "GRUB" in the upper left corner, followed by a blinking
cursor. Nothing else happens...

Ouuuch. :(

Putting out the string "GRUB " is about the first thing that stage1 does.
Stage1 is the unpatched 512-byte file, that is patched and copied to
the boot record during setup.

This seems to show that NTLOADER is doing the right thing: it is
loading the file and running its contents.  I would believe it loads
it into the right place in memory too.  What could it be that went
wrong?

The next thing stage1 does is to load a sector from a disk number and
address that has been patched into it. This should be the start of
stage1_5 (or stage2 if stage1_5 is not used).  How to find that out,
short of...

Hey, could you do "od -t x1 linuxboot.bin"? I want the byte at
offset 0x40 into the file, that is 0100 octal. This byte is initialized
to 0xff in the stage1 file, but patched to 0x81 in your setup, to
say "we are booting from disk (hd1)."  Also, the four bytes starting
at offset 0x44 should be the (little-endian) 32-bit number indicating
what sector to read off said drive. It should point suitably into
the first partition. I would expect something like 64 (0x40),
since the first partition starts in sector 63 (1-based).
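If reading the od dump by eye is tedious, the same two fields can be pulled out with a few lines of Python. This is only a rough sketch: the offsets 0x40 and 0x44 are the ones mentioned above, and the function name is made up for illustration.

```python
import struct

# Offsets as described in this post; treat them as assumptions and
# check them against the stage1 source of your grub version.
BOOT_DRIVE_OFFSET = 0x40     # one byte: BIOS drive number to load from
START_SECTOR_OFFSET = 0x44   # four bytes, little-endian: sector to read


def read_stage1_fields(sector: bytes):
    """Return (boot_drive, start_sector) from a 512-byte boot record."""
    if len(sector) < 512:
        raise ValueError("need a full 512-byte boot sector")
    boot_drive = sector[BOOT_DRIVE_OFFSET]
    # "<I" means little-endian unsigned 32-bit integer.
    (start_sector,) = struct.unpack_from("<I", sector, START_SECTOR_OFFSET)
    return boot_drive, start_sector
```

To use it on the saved file, something like read_stage1_fields(open("linuxboot.bin", "rb").read()) should do. A first value of 0xff would mean the drive byte was never patched, and 0x81 would mean (hd1).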

When running grub-install there is a --debug setting, which should
give a fairly complete account of what it does.  I have not used it
much myself, so I don't know exactly how useful the information is.
I hope you find the time to try it, and post the final part of the
output, so we could all have a chance to learn.  After all, you
don't reinstall grub in new configurations every day.


I saw the comment about Lilo "just works".  I haven't used lilo for
many years, so others should tell, but I just don't know how lilo
manages to load from a disk without knowing in advance which one.  Could
it be that it "just works" in the simple cases, where no "surprising"
disk enumerations occur?  I positively know that lilo too needs to
patch the boot sector it installs with disk addresses of sectors to
load, and I suppose it needs to patch the drive to read from too.
Lilo too depends on Bios calls to do the reading, so it needs to know
if it should pass 0x80 in register dl, or 0x81, etc.  0x80
invariably is the drive that has been designated as boot drive in the
bios setup.  The standard Microsoft MBR depends on that, so that is
nailed.  Traditionally it used to be the primary master, as there
used to be no setting in the Bios setup for this.  If there is no
primary master installed, the first disk found was used.  The search
order used to depend on how the partial Bioses on additional disk
controllers are chained together and with the main Bios.  The main
Bios searches the primary master, primary slave, secondary master,
secondary slave.  Since we got boot drive designations in the bios
setup, the designated disk is taken out of this order and placed
first.  Lilo and Grub cannot find this out, at least not in any way
I know about.
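To make the numbering rule concrete, here is a toy model of it in Python. It is a simplification under my own assumptions: it only knows the four classic IDE positions, ignores disks on add-on controllers (such as an hde), and the function name is invented for this sketch.

```python
# Probe order of the main Bios as described above.
PROBE_ORDER = [
    "primary master",
    "primary slave",
    "secondary master",
    "secondary slave",
]


def bios_drive_numbers(installed, boot_disk=None):
    """Map each installed disk to the BIOS drive number it would get.

    installed: collection of position names taken from PROBE_ORDER.
    boot_disk: the disk designated as boot drive in the bios setup,
               or None for the traditional behaviour.
    """
    ordered = [disk for disk in PROBE_ORDER if disk in installed]
    if boot_disk is not None:
        if boot_disk not in ordered:
            raise ValueError("boot disk is not installed")
        # The designated disk is taken out of the order and placed first.
        ordered.remove(boot_disk)
        ordered.insert(0, boot_disk)
    # BIOS hard disk numbers start at 0x80 and count upwards.
    return {disk: 0x80 + i for i, disk in enumerate(ordered)}
```

With disks on primary master and secondary master, the default gives 0x80 to the primary master. Designating the secondary master as boot drive swaps the two numbers, which is exactly the kind of change the boot loader cannot see at install time.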

I believe I have seen a Phoenix Bios specification saying that
the dl register should contain the boot drive when the boot code
is started, but the standard MS boot code discards that and loads
the "active flag" from the partition table into register dl. The
active flag is usually specified as 0x80.
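For anyone who wants to look at the active flag directly, the following sketch reads it from a saved copy of the MBR. It assumes the classic layout: four 16-byte partition entries starting at offset 0x1BE, the flag in the first byte of each entry, the starting sector in bytes 8 to 11 (little-endian), and the 0x55 0xAA signature at the end of the sector. The function name is my own.

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE   # start of the four partition entries
ENTRY_SIZE = 16                  # bytes per partition entry


def partition_entries(mbr: bytes):
    """Return a list of (active_flag, type, start_sector) for an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid 512-byte MBR")
    entries = []
    for i in range(4):
        off = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        active_flag = mbr[off]          # 0x80 = active, 0x00 = inactive
        part_type = mbr[off + 4]        # partition type byte
        (start_sector,) = struct.unpack_from("<I", mbr, off + 8)
        entries.append((active_flag, part_type, start_sector))
    return entries
```

The MBR can be saved the same way as the boot record, with dd on the whole disk instead of a partition, bs=512 count=1.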

In the present case, you _are_ booting from drive 0x80, but chaining
into something on 0x81. The NTLOADER does not know that, since it is
told to use a file, not a partition.

The more I think about it the more I suspect there is some kind of
operator error involved here, like using a different name when
saving the linuxboot.bin file the second time, and not adjusting
the boot.ini correspondingly.  Oh, there have been some use
cases on my own computers that I have not been able to resolve,
so I am not 100% knowledgeable. (But I know more now than I did
then :)