Bug+fix: PDC20271 RAID detection fails

From: Frantisek Rysanek (Frantisek.Rysanek_at_post.cz)
Date: 09/08/03


Date: Mon, 8 Sep 2003 15:43:33 +0000 (UTC)

Dear everyone and Mr. Van de Ven in particular,

I hope I'm not mailbombing the wrong recipients.
If I am, I apologize.

I was having a problem with my Promise FastTrak TX2000
(AKA PDC20271): my array was not detected by the kernel.
This problem appears to have been plaguing many
other users of this "RAID" controller.
The kernel versions I tested range from 2.4.18-14
(RedHat) through 2.4.21 and 2.4.22 to 2.4.22-ac1,
and all of them suffer from the problem.

Owing largely to the simplicity and good coding style
of the pdcraid.c driver, I was able to find a remedy
myself - after a day of adding debugging instrumentation
to the detection/init functions. See the attached patch.
It really took just a single line of code to fix the bug.

It seems that Mr. Van de Ven already had to contend with
the fact that, with modern IDE drives,

 C*H*S != capacity

- the total number of sectors reported by the disk is
not equal to the product
Cyls * Heads * Sectors_per_track.
The CHS parameters are nowadays just a shim over LBA
for backwards compatibility.

This is a problem when you need to find
the PDC RAID superblock, which is located at the start
of the last track on the disk.

Mr. Van de Ven's approach was to multiply C*H*S and
consider that a valid capacity - and derive the position
of the superblock from that.

My controller (chip revision #2) calculates the position
of the superblock relative to the LBA capacity value
reported by the disk, which is larger than C*H*S.

I do not know whether this behavior is specific to some
new Promise controller/firmware models, or whether the bug
has simply been there from the start.

I am aware that my "algorithm" may have flaws too.
E.g., I'm not sure whether the LBA capacity reported by the disk
is a multiple of the track size, and if not, how the tracks
should be aligned.

I am also aware of some other deficiencies that might
prove troublesome:

1) the initialization routine in pdcraid.c searches drives
   on all IDE channels in the system, not just on the Promise
   controller. This might lead to misdetections.
   Perhaps the detection routine should start from PCI,
   or from the devices handled by pdc202xx*.c, looking for PDC
   adapters and their respective IDE channels and disk drives.

2) when loading the superblock, the init routine rounds down
   the LBA offset of the superblock to an integer multiple
   of eight - and goes on to load the superblock as part of
   a 4kB block, addressing the block by its ordinal number.
   If the sector-based LBA offset is not a multiple
   of eight, the offset is skewed once again and detection
   of the array will likely fail.
   Given that the head and sector counts are often odd numbers,
   the offset is quite likely to be odd as well - a far
   cry from a multiple of eight...
   ==> the superblock should be loaded using a block
   size of 1 sector (512 bytes). IMO this only means adding a for()
   loop to iterate over the eight sectors and do the memcpy()s.

3) There are finally some major improvements to Configure.help
   in 2.4.22-ac1. Still, it lacks some important points in the
   section on CONFIG_PDC202XX_FORCE.
   Judging by the contents of drivers/ide/pci/pdc202xx_*.h,
   this option is only effective with the following chips:
     PDC20246,62,63,65,67,70,76
   It has no effect on
     PDC20268,69,71,75,77
   In my case, it would have saved me quite some time
   had I known in advance that this option could not be
   the culprit of my problem.

4) fdisk complains that it cannot re-read the partition table
   (ioctl error 22, EINVAL) - you have to reboot before you can run mkfs.

So now I'm able to detect, mount and use the array.
I've tested that with two different pairs of IDE drives.

I have other rants beyond those above.
Please don't feel offended - the following notes
are meant to be friendly and motivating. Perhaps
I could try something myself, right?

To the point: if I impair a disk drive at runtime
(by powering it off in its hot-swap drawer),
the machine gets stuck for a moment, and then
collapses in a waterfall of I/O error messages
from the relevant IDE channel.

If I try to boot from the incomplete array,
the kernel dumps core while probing the array
- the message about a null pointer dereference comes
from the ataraid.c driver.

The only way to revive the array and continue using
it is to enter the BIOS utility (or Windows) and
run a rebuild, followed by fsck.
This renders this RAID pretty useless for production
deployment.
I haven't noticed any user-space RAID management
utilities, not even /proc entries to look at,
and no recognition of the array's status - am I looking
in the wrong places? Any ideas are welcome.

If you feel that perhaps I could do something about
the detection routine, documentation etc., please
encourage me.

Oh, before I forget: to all the Linux developers involved, thanks
for the great job you're doing. I mean this.

Yours faithfully

Frank Rysanek

########## BEGIN PDC20271.patch ############

--- pdcraid.bak 2003-09-08 07:37:03.000000000 +0200
+++ pdcraid.c 2003-09-08 13:21:10.000000000 +0200
@@ -399,15 +399,56 @@
                 return 0;
         
         
- /* first sector of the last cluster */
+ /* first sector of the last track */
         if (ideinfo->head==0)
                 return 0;
         if (ideinfo->sect==0)
                 return 0;
+ /*
+ * A comment by Frank Rysanek <Frantisek.Rysanek@post.cz>
+ *
+ * struct ide_drive_t contains a lot of information, namely:
+ * cyl = number of cylinders in this disk drive
+ * head = number of heads
+ * sect = sectors per "track" (per head per cylinder)
+ * capacity = total disk capacity in sectors.
+ *
+ * Common sense says that capacity = C*H*S.
+ * Based on my recent experience debugging this driver,
+ * this is not necessarily true!
+ *
+ * It seems that my drives have a few tracks in
+ * excess of the C*H*S integer multiplication product:
+ *
+ * capacity > C*H*S
+ *
+ * It's a well known fact that modern IDE drives translate/hide
+ * the real geometry and prefer to work with LBA-style linear
+ * addressing, and only present some fictitious geometry that
+ * fits the old BIOS data type limitations.
+ *
+ * When calculating the LBA offset of the PDC superblock,
+ * the original formula used by this driver derives the cyls
+ * from capacity and H*S, then produces an integer C*H*S,
+ * and subtracts one track.
+ *
+ * For my drives at least, I had to change this: now the
+ * driver accepts the miraculous reported capacity (greater
+ * than CHS) and subtracts one track.
+ *
+ * The algorithm could be different with different models of
+ * the Promise hardware, or even with different firmware revisions.
+ *
+ * This is the original formula:
+
         lba = (ideinfo->capacity / (ideinfo->head*ideinfo->sect));
         lba = lba * (ideinfo->head*ideinfo->sect);
         lba = lba - ideinfo->sect;
 
+ * And this is the repaired one:
+ */
+ lba = ideinfo->capacity - ideinfo->sect;
+
         return lba;
 }
 
@@ -425,7 +466,7 @@
         
         /*
          * Calculate the position of the superblock,
- * it's at first sector of the last cylinder
+ * it's at first sector of the last track
          */
         sb_offset = calc_pdcblock_offset(major,minor)/8;
         /* The /8 transforms sectors into 4Kb blocks */

########## END PDC20271.patch ############

  
########## BEGIN PDC20271.history ############

       #######################################
         Genealogy of kernel support for the
              Promise FastTrak TX2000
                   AKA PDC20271
       #######################################

This listing is admittedly far from complete.

The file locations are relative to $KERNEL_DIR/drivers/ide/

   2.4.18 (vanilla)         20271 added     2.4.18-14 (RedHat)
   ================         -------->       ==================
no support for PDC20271                  does support PDC20271
pdc202xx.c                               pdc202xx.c
pdcraid.c                                pdcraid.c

      |
      | some changes
      v
                              major
   2.4.20 (vanilla)          changes        2.4.20-20.9 (RedHat)
   ================         - - - - >       ====================
no support for PDC20271                  does support PDC20271
pdc202xx.c                               pci/pdc202xx_new.c
pdcraid.c                                raid/pdcraid.c
                                               ^
      : major          IDENTICAL              |
      : changes    +---------------------------+
      :            |
      v            v

                              minor
   2.4.21 (vanilla)          changes        2.4.21-ac4
   ================         ------->        ==========
does support PDC20271                    does support PDC20271
pci/pdc202xx_new.c                       pci/pdc202xx_new.c
raid/pdcraid.c                           raid/pdcraid.c
                                               ^
      | minor          IDENTICAL              |
      | changes    +---------------------------+
      |            |
      v            v

                             improved
   2.4.22 (vanilla)        documentation    2.4.22-ac1
   ================         ------->        ==========
does support PDC20271                    does support PDC20271
pci/pdc202xx_new.c                       pci/pdc202xx_new.c
raid/pdcraid.c                           raid/pdcraid.c

########## END PDC20271.history ############


