Re: DANGER!!! Problems with 10.04 installer (RAID devices *will* get corrupted)



On 04/20/2010 10:30 PM, Alvin Thompson wrote:
Long story short: the only way to be safe right now is to physically
remove drives with important data during the install.

I figured out the cause of my RAID problems, and it's a problem with
Ubuntu's installer. This will cost people their data if not fixed.
Sorry about the length of this post, but the problem takes a while to
explain.

The following scenario is not the only way your partitions can get
hosed. I simply use it because it's a common use case, it illustrates
what data is where on the hard drives, and it exposes the flaws in the
installer's logic. It also doesn't matter if you don't touch a
particular drive, partition, or file system during the install. The
data on it can still be corrupted.

Suppose you have a hard drive with some partitions on it. On one of
those partitions you have a linux file system which houses your data.
We'll say for the sake of this discussion that sda2 contains an EXT4
file system with your data. So far, so good.

Because this data is too important to rely on a single drive, you decide
to buy some more drives and make a RAID 5 device. You buy 3 more drives
and create similar partitions on them (say, sdb2, sdc2, and sdd2). You
copy the data currently on sda2 somewhere safe, then you use mdadm to
create a RAID 5 array with sda2, sdb2, sdc2, and sdd2. The new RAID
device is md0. You create an XFS file system on md0 and move your data
to it*. This is all perfectly fine, but the stage has been set for
disaster with the Ubuntu installer.

Later, you decide to do a clean install of Ubuntu on sda1 (sda1 is *not*
part of the RAID array), and you get to the partitioning stage and
select manual partitioning. This is where things get really ugly really
fast.

The bug is how the installer detects existing file systems. It simply
reads the raw data in a partition to see if the bits it finds correspond
to a known file system. In the above example, the installer detects the
remnants of the original (non-RAID) file system on sda2 and thinks it's
a current EXT4 file system. Even if you use fdisk to mark sda2's
partition type as 'RAID autodetect' instead of 'linux' (which is no
longer necessary), the installer still detects the partition as having
an EXT4 file system.

Once this 'ghost' file system is detected, the installer gets really
confused about what goes where and will try to write to sda2 during the
install, even if you told the installer to ignore sda2 and just install
to sda1. This corrupts the current XFS file system on md0, and you're
screwed.

The overall flaw here is in the file system detection; you can't just
assume that any sequence of bits you find sitting around on a hard drive
is still current.

A possible solution may be to check for a RAID superblock first; if one
is found, it trumps all file system detection. I imagine something
similar will have to be done with partitions that are part of an LVM
volume as well.
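
To make the idea concrete, here is a rough Python sketch of "RAID
superblock first, file system second". It is not the installer's code,
only an illustration. The function names are made up, it works on an
in-memory image of a partition rather than a real device, and the
superblock offsets are my understanding of the md metadata formats
(0.90, 1.0, 1.1, 1.2) on a little-endian machine, so treat them as
assumptions to check against the mdadm documentation.

```python
import struct

MD_MAGIC = 0xA92B4EFC          # md RAID superblock magic number
EXT_MAGIC = 0xEF53             # ext2/3/4 superblock magic number
EXT_MAGIC_OFFSET = 1024 + 56   # ext superblock starts at byte 1024, s_magic at +56


def md_superblock_offsets(size):
    """Candidate byte offsets of an md superblock on a device of `size` bytes.

    Assumed layouts: 0.90 and 1.0 live near the end of the device,
    1.1 at the very start, 1.2 at 4 KiB from the start.
    """
    sectors = size // 512
    return [
        (size & ~(64 * 1024 - 1)) - 64 * 1024,  # 0.90: last aligned 64 KiB block
        ((sectors - 16) & ~7) * 512,            # 1.0: near the end, 4 KiB aligned
        0,                                      # 1.1
        4096,                                   # 1.2
    ]


def has_md_superblock(image):
    """True if the md magic appears at any of the candidate offsets."""
    for off in md_superblock_offsets(len(image)):
        if 0 <= off <= len(image) - 4:
            (magic,) = struct.unpack_from("<I", image, off)
            if magic == MD_MAGIC:
                return True
    return False


def has_ext_signature(image):
    """Naive check: the ext magic bytes sit where an ext superblock would be."""
    if len(image) < EXT_MAGIC_OFFSET + 2:
        return False
    (magic,) = struct.unpack_from("<H", image, EXT_MAGIC_OFFSET)
    return magic == EXT_MAGIC


def classify(image):
    """Check for RAID membership before trusting any file system signature."""
    if has_md_superblock(image):
        return "raid-member"
    if has_ext_signature(image):
        return "ext"
    return "unknown"
```

With this ordering, a partition that still carries leftover ext magic
bytes but also has an md superblock is reported as a RAID member, and
the stale file system signature is never consulted.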

-Alvin

* In my case, I took a shortcut and created a degraded array (missing
sda2), copied the data from sda2 to the array, added sda2 to the array,
and resynched. I don't think it makes a difference.


I don't think this is a bug. You had just changed from a standard
single hard drive to a RAID system because your data is so important.
Let me suggest this:

1. Go back to one hard drive.
2. Back up your important data. I use rsync and it works fine.
3. Now load 10.04 and it should be fine.
4. Your raid5 problems are typical.

73 Karl


--

Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D



--
ubuntu-users mailing list
ubuntu-users@xxxxxxxxxxxxxxxx
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users


