Re: trouble booting system with I2O hardware RAID

From: Aleksandar Milivojevic (amilivojevic_at_pbl.ca)
Date: 04/11/05

    Date: Mon, 11 Apr 2005 14:00:36 -0500
    To: For users of Fedora Core releases <fedora-list@redhat.com>
    
    

    Aleksandar Milivojevic wrote:
    > Basically, the install process seems to go fine, however the machine
    > doesn't want to boot after it.
    >
    > The system in question has one of I2O Adaptec RAID controllers. I've
    > configured LVM with one volume group and several volumes. If I boot
    > into the rescue mode, all looks fine and dandy. Anaconda finds the
    > installation, and I can access all volumes.
    >
    > However, when doing "real" boot, it gets into trouble. All required
    > modules are loaded from initrd image (as far as I can tell). The I2O
    > modules are able to locate the RAID devices (I see all partitions
    > reported: /dev/i2o/hda1 (empty, unused), /dev/i2o/hdb1 (/boot), and
    > /dev/i2o/hdb2 (rest of the system under LVM)). The only thing different
    > from rescue mode is that i2o/hda and i2o/hdb are reversed (this is
    > strange, but it shouldn't affect things since /boot partition has a
    > label "/boot", and all the rest is under LVM, so everything should be
    > device name independent). I have no idea why i2o device drivers are
    > detecting volumes in different order when loaded from initrd image
    > during boot, and by Anaconda during installation.
    >
    > The last couple of messages printed on the screen are:
    >
    > Creating root device
    > Mounting root file system
    > kjournald starting. Commit interval 5 seconds
    > EXT3-fs: mounted filesystem with ordered data mode.
    > mount: error 2 mounting none
    > Switching to new root
    > WARNING: can't access (null)
    > exec of init ((null)) failed!!!: 14
    > umount /initrd/dev failed: 2
    > Kernel panic - not syncing: Attempted to kill init!

    Ah, found it... I was bitten by that nonsense called file system
    labels. Again. It may even be that the LVM volume information was
    also read from the wrong place. The problem isn't I2O related, and can
    probably happen with any other hardware configuration.

    I'll summarize, so that folks with similar problems in the future know
    what to do.

    Configuration:

    I2O RAID controller with two volumes. The first RAID volume is used for
    the system. The second RAID volume is used for data storage. Since the
    kernel assigns them different device names during installation than it
    does when the system boots from disk afterwards, I'll call them
    "system RAID volume" and "data RAID volume". When I mention a device
    name, it is only the name under which the system saw that volume in
    that particular step.

    During installation, the i2o device drivers report the volumes in the
    expected order: /dev/i2o/hda is the system RAID volume, /dev/i2o/hdb is
    the data RAID volume, exactly the order in which they are defined in the
    I2O BIOS. hdb was not touched by the installation process and contained
    a single partition, hdb1. /boot is installed on hda1 and the "/boot"
    file system label is written onto it. hda2 is configured as an LVM
    physical volume holding the rest of the system (including the root
    partition).

    After the installation is done and the system reboots, for whatever
    strange reason the data RAID volume is detected as /dev/i2o/hda, and the
    system RAID volume as /dev/i2o/hdb. This should theoretically work fine,
    since device names are never used as-is in the system's configuration.
    However, the disks in the data RAID volume had been used before (they
    were not clean), and since the system detected them first, this was the
    root of the problem. It seems that those disks had (once upon a time) a
    system on them, with a set of LVM volumes defined, so that stale
    information was used instead of the "real" information from the system
    RAID volume. I'm not sure if the disks were previously used connected to
    this I2O controller, or if they were used somewhere else and the old
    information just happened to fall into the "right" spot when the RAID
    volume was assembled.

    OK, so I wiped all partitions from the data RAID volume. This time the
    system actually boots (because the only partitions it can see are on the
    system RAID volume, detected as /dev/i2o/hdb, so it reads the correct
    LVM information). But the story does not end here.

    I created a single partition on the data RAID volume (/dev/i2o/hda),
    defined it as an LVM physical volume, and created a new volume group
    with a single logical volume on it. Created a file system, mounted it,
    updated fstab. So far so good. Reboot. Oops, the system doesn't boot,
    and complains about duplicate "/boot" labels. Back into rescue mode. And
    sure enough, there it was. e2label reports that the first partition on
    the data RAID volume (which is of type LVM and contains an LVM physical
    volume) and the first partition on the system RAID volume (which is of
    type Linux native and contains an ext3 file system) both have the label
    "/boot". Ooops.
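
    For anyone wanting to check for this kind of clash from rescue mode,
    here is a minimal sketch in Python. It assumes you have already
    collected a device-to-label mapping by hand (for example by running
    e2label on each partition); the function name and the sample data are
    hypothetical and only mirror the situation described above.

```python
from collections import defaultdict


def find_duplicate_labels(labels_by_device):
    """Return {label: [devices]} for every label carried by more than one device."""
    devices_by_label = defaultdict(list)
    for device, label in sorted(labels_by_device.items()):
        if label:  # an empty label cannot collide with anything
            devices_by_label[label].append(device)
    return {
        label: devices
        for label, devices in devices_by_label.items()
        if len(devices) > 1
    }


# Hypothetical data, typed in from e2label output seen in rescue mode.
observed = {
    "/dev/i2o/hda1": "/boot",  # data RAID volume: LVM partition with a stale label
    "/dev/i2o/hdb1": "/boot",  # system RAID volume: the real /boot
}

for label, devices in find_duplicate_labels(observed).items():
    print(f"label {label!r} appears on: {', '.join(devices)}")
```

    Any label reported here is a candidate for the boot-time confusion
    described above and is worth clearing or changing before rebooting.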

    Apparently, Anaconda was smart enough to ignore the label on something
    that was not a file system. Whatever goes on during the "real" boot
    wasn't that smart.

    Used e2label to wipe out the label from the data RAID volume. This time
    the system booted, no problems at all. For good measure I wiped out the
    logical volume, volume group and physical volume from the data RAID
    volume and recreated them (I didn't want to risk that e2label, used on
    something that is not a file system, had screwed up some LVM metadata).
    All is happy now.

    It would have saved me tons of time and grief if Anaconda had checked
    for (and detected) conflicting LVM information and conflicting file
    system labels during the install process. Or if file system labels were
    randomly generated (instead of using mount point names), like the
    labels used by the MD and LVM drivers.
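
    To illustrate the second idea, here is a rough sketch of what a
    randomized label could look like. This is not something Anaconda does;
    the function name and the prefix-plus-suffix scheme are assumptions made
    up for this example. The only hard constraint it relies on is that
    ext2/ext3 labels can be at most 16 characters long.

```python
import secrets

MAX_LABEL_LEN = 16  # ext2/ext3 volume labels hold at most 16 characters
SUFFIX_HEX_CHARS = 8  # secrets.token_hex(4) yields 8 hex characters


def random_label(prefix="boot"):
    """Build a label such as 'boot-3fa91c02' that is unlikely to collide.

    The prefix is truncated so that prefix + '-' + suffix always fits
    within the 16-character limit.
    """
    room_for_prefix = MAX_LABEL_LEN - SUFFIX_HEX_CHARS - 1  # 1 for the dash
    suffix = secrets.token_hex(SUFFIX_HEX_CHARS // 2)
    return f"{prefix[:room_for_prefix]}-{suffix}"


print(random_label("boot"))
```

    A label produced this way would still have to be written to the file
    system and referenced from fstab in place of the mount point name.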

    Hopefully this info will be useful to somebody in the future.

    -- 
    Aleksandar Milivojevic <amilivojevic@pbl.ca>    Pollard Banknote Limited
    Systems Administrator                           1499 Buffalo Place
    Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7