alim15x3, WDC and (U)DMA

From: cpd (cobber_cpd_at_yahoo.com)
Date: 08/08/03


Date: Fri, 08 Aug 2003 05:46:25 GMT

hi,

about 5 weeks ago i bought a WD2000JB
(http://www.wdc.com/en/products/products.asp?DriveID=38), installed it in my
RH 9 box and set up LVM (for the first time) on it. i've been wrestling with
it on-and-off ever since.

the problem is pretty simple: a hosed filesystem - details below. at the
moment the data i'm storing is expendable, which eases the pain somewhat and
it's allowed me to blow away the filesystem and recreate it when it's become
corrupt. i'm pretty sure i've nailed down the problem to the alim15x3 driver
for the IDE controller *possibly* in conjunction with a WDC drive using
(U)DMA. my knowledge of linux drivers is limited.

i've decided to do a condensed dump of all the notes i've made during the
process of trying to get this working reliably (please excuse the length and
potential irrelevance of some of the content) in the hope that someone has
some suggestions... and that future generations may learn something.

here's a quick overview of the hardware:

    Motherboard: Acer V58LA (http://www.uktsupport.co.uk/acer/mb/acv58la.htm)
    CPU: Pentium-MMX 233
    IDE Controller: Alladin IV (onboard)
    RAM: 384MB generic brand (256 + 128)
    Video: ATI Mach64 VT (onboard)
    LAN: DLink DFE-528TX+
    IDE1 Master: Seagate (oops, don't have the details with me)
    IDE1 Slave: Generic CD-ROM
    IDE2 Master: WDC2000JB
    IDE2 Slave: Seagate

if you would like to see the output of hdparm, the contents of /proc or
something else please email and specify the command.

i did have 3 Seagate's installed and working trouble free before i replaced
one with the WD.

    # mkfs -c (has never reported any bad blocks on the WD)

the maximum memory this motherboard supposedly supports is 256MB, however
memtest (http://www.memtest86.com) with all the extended tests activated
reported no errors and i've never had problems in the past.

because it's my first time using LVM, i initially thought that might have been
the cause of the problem. it's extremely doubtful that it is, but here's what
i've been doing to create my LV, just in case:

    # vi /etc/rc.sysinit (add vgscan, vgchange - once only)
    # vi /etc/rc.d/init.d/halt (add vgchange - once only)
    # fdisk -l /dev/hdc (set up the partition table)

    Disk /dev/hdc: 200.0 GB, 200049647616 bytes
    255 heads, 63 sectors/track, 24321 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot Start End Blocks Id System
    /dev/hdc1 1 3040 24418768+ 8e Linux LVM
    /dev/hdc2 3041 6080 24418800 8e Linux LVM
    /dev/hdc3 6081 9120 24418800 8e Linux LVM
    /dev/hdc4 9121 24321 122102032+ 5 Extended
    /dev/hdc5 9121 12160 24418768+ 8e Linux LVM
    /dev/hdc6 12161 15200 24418768+ 8e Linux LVM
    /dev/hdc7 15201 18240 24418768+ 8e Linux LVM
    /dev/hdc8 18241 21280 24418768+ 8e Linux LVM
    /dev/hdc9 21281 24321 24426801 8e Linux LVM

    # pvcreate -ff /dev/hdc1 /dev/hdc2 /dev/hdc3 /dev/hdc5 /dev/hdc6 /dev/hdc7 /dev/hdc8 /dev/hdc9
    # vgcreate -s 32M vg1 /dev/hdc1 /dev/hdc2 /dev/hdc3 /dev/hdc5 /dev/hdc6 /dev/hdc7 /dev/hdc8 /dev/hdc9
    # vgdisplay | grep VG
    VG Size 186 GB
    # lvcreate -L 186G -n lv1 vg1
    # mkfs -t ext3 /dev/vg1/lv1
    # vi /etc/fstab (once only)

    /dev/vg1/lv1 /mnt/data1 ext3 defaults 1 2

anyway, onto the good stuff - this is what the file system corruption actually
looks like:

    # ls -la
    total 6137963733
    drwxr-xr-x 17 chris chris 4096 Jul 31 01:54 ./
    drwxrwxrwt 8 root root 4096 Jul 31 01:50 ../
    ?rwxr-S-wt 32767 354986359 1980964661 527431223 Oct 27 directory1
    drwxr-xr-x 2 chris chris 4096 Jul 31 01:54 directory2
    <snip>
    drwxr-xr-x 2 chris chris 4096 Jul 31 00:45 directory8
    ?----ws--- 32767 698371637 1653296404 3470634407 Dec 3 1947 directory9
    ?rwsrwxrwt 32767 2802999440 472206431 3590521037 Sep 9 1989 directory10
    p-wSr-sr-- 11805 3712363958 840319880 0 Oct 20 2024 directory11
    <snip>
    drwxr-xr-x 2 chris chris 4096 Jul 30 21:44 directory16
    drwxr-xr-x 2 chris chris 4096 Jul 30 21:42 directory17

this was on the 2nd attempt at configuring LVM. i noted similar behaviour on
the first attempt, both times were after i'd put about 36GB on the disk.

on the third try, i got a trashed partition table, which has only happened
once:

    # fdisk -l /dev/hdc

    <snip>
    /dev/hdc9 21281 24321 24426801 8e OS/2 hidden C: drive

here's what happened when i tried to remove the directories, firstly through
windows explorer (samba share):

    Message from syslogd@connect4 at Thu Jul 31 01:54:44 2003 ...
    connect4 kernel: Kernel panic: EXT3-fs panic (device lvm(58,0)): load_block_bitmap: block_group >= groups_count - block_group = 131071, groups_count = 1488

which also locked up explorer, after killing it and restarting it i was still
able to connect to the box and browse/edit files (including those on the LV).
then i tried removing them from the command line:

    # rm -rf directory1/
    rm: cannot remove directory `directory17//directory17a': Read-only file system
    rm: cannot chdir from `directory16//directory16a/.' to `directory16b': Not a directory

the processes appeared to hang and could not be killed (not even with -KILL).
this was worrying as well:

    # pvdisplay /dev/hdc1 /dev/hdc2 /dev/hdc3 /dev/hdc5 /dev/hdc6 /dev/hdc7 /dev/hdc8 /dev/hdc9
    pvdisplay -- no physical volume identifier on "/dev/hdc2"

    --- Physical volume ---
    PV Name /dev/hdc1
    VG Name vg1
    PV Size 23.29 GB [48837537 secs] / NOT usable 32.19 MB [LVM: 130 KB]
    PV# 1
    PV Status NOT available
    Allocatable yes (but full)
    Cur LV 1
    PE Size (KByte) 32768
    Total PE 744
    Free PE 0
    Allocated PE 744
    PV UUID 8kx7qV-0c14-UZ6o-AhRI-npyC-oItk-4P0ES6

    etc... the other devices all appear to be ok

attempting to browse one of the corrupt directories (directory9) crashed
explorer (again restarting it and reconnecting posed no discernable problems).

googling (eventually) led me to the conclusion that it was probably to do with
the alim15x3 driver. i compiled a clean 2.4.21 kernel (i hope that doesn't
cause problems with vendor "lock outs" :) which booted fine and appears to
work nicely - with the exception of this one drive - similar results to those
i'd already experienced, including the apparent 36GB "corruption threshold".

next on the list was to disable DMA:

ftp://ftp.support.acer-euro.com/desktop/mainboard/V58LA/manual/ap5000dt_ug.zip

the wonderful thing about this particular BIOS is that there's no
obvious "Disable DMA for this IDE device" option. instead, it has this:

    Hard Disk Block Mode ............ [ Auto ]
    Advanced PIO Mode ............... [ Auto ]
    Hard Disk Size > 504MB .......... [ Auto ]
    Hard Disk 32 Bit Access ......... [Enabled ]
    CD-ROM Drive DMA Mode ........... [Enabled ]

so i tried disabling "Hard Disk Block Mode" and "CD-ROM Drive DMA Mode", which
actually showed signs of working. the file system didn't become corrupt until
i'd used around 90GB. talk about progress! :) i plan on testing "PIO Mode"
over the weekend with high hopes - the UDMA mini HOWTO
(http://www-scf.usc.edu/~vibber/linux/Ultra-DMA-9.html) seems to point in this
direction.

once again, any help will be much appreciated.

  - chris



Relevant Pages

  • [SLE] LVM Snapshots work [WAS: Suse 10.1 kernel update?]
    ... Indeed after pulling in the kernel of the day, ... This is the first time I've had any level of consistentcy with any of ... Monitoring the LVM list, the past problems were generic LVM ... On 7/10/06, Greg Freemyer wrote: ...
    (SuSE)
  • Re: Perspective from a Wasillian
    ... Of course all politicians are corrupt. ... did was blatant and stupid. ... It wouldn't be the first time he got an ...
    (rec.sport.golf)
  • RE: files going corrupt
    ... it's not the first time after the initial save. ... becoming corrupt. ... option---and it reopens fine. ... an XML drawing? ...
    (microsoft.public.visio.general)
  • Re: Perspective from a Wasillian
    ... Of course all politicians are corrupt. ... did was blatant and stupid. ... It wouldn't be the first time he got an ...
    (rec.sport.golf)
  • Re: Perspective from a Wasillian
    ... Of course all politicians are corrupt. ... did was blatant and stupid. ... It wouldn't be the first time he got an ...
    (rec.sport.golf)