Nforce2 sporadic hangs/freezes

From: Skaven (skaven_at_rentersresource.com)
Date: 09/22/03


Date: 21 Sep 2003 15:54:04 -0700

I'm about at my wit's end. I purchased an Asus A7N8X Deluxe
motherboard, a Barton 2800+, and 1 GB of Corsair DDR333 memory. Ever
since I built the system, I've been getting very frequent lockups that
I can't track down to any specific piece of hardware (or software, for
that matter).

Here's the way the crash happens:

1. Using the computer normally (listening to XMMS, using gAIM,
browsing with Mozilla, SSH'ed to a campus interactive system, etc)

2. Without any kind of warning, hiccup, cpu load spike, memory spike,
or hard drive activity, the system just stops. Goes catatonic. The
speakers emit a ~1 kHz tone. The system goes completely blank. Can't
ping it, no cursor movement, no keyboard activity; the numlock key
even quits working. The only way to recover the system is to do a
hard reboot.

In general, the system will do this every ~20 mins if listening to
shoutcast via XMMS. This gets extended to ~40-120 mins if listening
to regular MP3s, and extends to ~6 hours if XMMS isn't running.

However, that's just in general. I've had the system lock during
kernel boot (before init is even spawned), and several times during
fsck. It'll sometimes lock up while running the screensaver, without
XMMS, gAIM, or anything else running.
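
One way to put harder numbers on intervals like these is a small
heartbeat logger. This is only a sketch: the function names are made up
for illustration, and the log path is whatever writable local file is
passed in. It appends a timestamp and syncs it to disk, so after a hard
reboot the last line shows roughly when the machine froze.

```shell
#!/bin/sh
# Append one timestamped line to the heartbeat log and flush it to disk.
# The log path is an argument; any file on a local filesystem will do.
heartbeat_once() {
    log=$1
    date '+%Y-%m-%d %H:%M:%S' >> "$log"
    sync   # push the line to disk so it survives a hard reset
}

# Print the last heartbeat recorded in the log.
last_heartbeat() {
    tail -n 1 "$1"
}

# Loop forever, writing one heartbeat every INTERVAL seconds (default 10).
heartbeat_loop() {
    log=$1
    interval=${2:-10}
    while :; do
        heartbeat_once "$log"
        sleep "$interval"
    done
}
```

Something like `heartbeat_loop /var/tmp/heartbeat.log 10 &` would start
it in the background (the path is a placeholder), and
`last_heartbeat /var/tmp/heartbeat.log` after the reboot gives the
approximate freeze time. Note that the periodic sync adds disk activity
of its own, which could change the timing on a system where DMA is a
suspect.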

I've tried various remedies. Switching from a PCI NIC to the onboard
3Com didn't help. I've been consistently staying with the latest
prepatch kernel series, hoping that this issue is getting addressed.
It appears that 2.6.0-pre4 helps a lot, but the problem still exists.
2.4.22 is unusable (locks up every 5-10 minutes), but 2.4.23-pre4 is
decent (20-40 min between lockups).

I've tested all of my components individually. I ran memtest86, and all
of my RAM is okay; besides, the lockups are not related to memory
usage. I swapped the CPU for an older T-bred core; same problem. As I
said before, I tried out a new NIC; same problem. Swapped out the
motherboard; same problem. Disabled IO-APIC; same problem. Disabling
DMA extended the time between freezes to over 6 hours, but did not
solve the problem (plus it makes my computer run slllllloooooowwww).
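
For anyone wanting to repeat the IO-APIC and DMA experiments, here is a
minimal sketch. `noapic` is the standard kernel boot parameter for
disabling the IO-APIC, and `hdparm -d0` switches DMA off for one IDE
drive. The helper names add_boot_param and disable_dma are hypothetical,
and the device path passed to disable_dma is an assumption to adjust
per system.

```shell
#!/bin/sh
# Print the given kernel command line with PARAM appended, unless PARAM
# is already present as a whole word.
add_boot_param() {
    cmdline=$1
    param=$2
    case " $cmdline " in
        *" $param "*) printf '%s\n' "$cmdline" ;;
        *)            printf '%s\n' "$cmdline $param" ;;
    esac
}

# Turn DMA off for one IDE drive (needs root and hdparm installed).
# Expect disk throughput to drop sharply while this is in effect.
disable_dma() {
    hdparm -d0 "$1"
}
```

For example, `add_boot_param "ro root=305 hdc=ide-scsi" noapic` prints
`ro root=305 hdc=ide-scsi noapic`, which can then go into the boot
loader configuration. `disable_dma /dev/hda` would apply the DMA change
to the first IDE disk until the next reboot.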

Back when 2.4.22 had just been released, this issue was known and was
apparently being addressed by the kernel developers. AMD, for
example, apparently had a bunch of nForce2 systems that were hanging
in this manner. As far as I know they're still having the problem.
There were various suggestions thrown around, such as disabling
IO-APIC and disabling DMA. However, I haven't seen any chatter about
the issue since then.

Is anybody else having this problem? Any suggestions? It's
frustrating that Windows, of all OSes, runs more stably on my system
than Linux does.

Thanks for any help.

--Skaven

------ Hardware config ------
Asus A7N8X Deluxe
Barton 2800+
2x 512M Corsair XMS DDR333
Radeon 9000 128M
Creative Audigy2 Platinum
30G IBM DTLA-307030
Mitsumi 36x CDR
Antec Tru380 380W PSU

-- peripherals
USB Logitech QuickCam Express
USB Logitech optical mouse
PS/2 Logitech keyboard

------ LSPCI output ------
00:00.0 Host bridge: nVidia Corporation: Unknown device 01e0 (rev c1)
00:00.1 RAM memory: nVidia Corporation: Unknown device 01eb (rev c1)
00:00.2 RAM memory: nVidia Corporation: Unknown device 01ee (rev c1)
00:00.3 RAM memory: nVidia Corporation: Unknown device 01ed (rev c1)
00:00.4 RAM memory: nVidia Corporation: Unknown device 01ec (rev c1)
00:00.5 RAM memory: nVidia Corporation: Unknown device 01ef (rev c1)
00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a4)
00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:04.0 Ethernet controller: nVidia Corporation nForce2 Ethernet Controller (rev a1)
00:08.0 PCI bridge: nVidia Corporation: Unknown device 006c (rev a3)
00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
00:0c.0 PCI bridge: nVidia Corporation: Unknown device 006d (rev a3)
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1)
01:08.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
01:08.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
01:08.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
02:01.0 Ethernet controller: 3Com Corporation 3C920B-EMB Integrated Fast Ethernet Controller (rev 40)
03:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250 If [Radeon 9000] (rev 01)
03:00.1 Display controller: ATI Technologies Inc Radeon R250 [Radeon 9000] (Secondary) (rev 01)

------ LSMOD output ------
Module                  Size  Used by    Not tainted
w83781d                22320   0 (unused)
i2c-proc                6260   0 [w83781d]
i2c-nforce2             3176   0 (unused)
i2c-core               13924   0 [w83781d i2c-proc i2c-nforce2]
emu10k1                66568   2
ac97_codec             10440   0 [emu10k1]
3c59x                  26448   1
mod_quickcam           37200   0 (unused)

------- DMESG output -------
[skaven@balthasar:~]$ uname -a
Linux balthasar 2.4.23-pre4 #3 Sun Sep 14 19:07:30 CDT 2003 i686 unknown
[skaven@balthasar:~]$ dmesg
Linux version 2.4.23-pre4 (root@balthasar) (gcc version 3.2.2) #3 Sun Sep 14 19:07:30 CDT 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 262128
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32752 pages.
Kernel command line: auto BOOT_IMAGE=Linux-balth ro root=305 hdc=ide-scsi hdd=ide-scsi hdg=noprobe
ide_setup: hdc=ide-scsi
ide_setup: hdd=ide-scsi
ide_setup: hdg=noprobe
Found and enabled local APIC!
Initializing CPU#0
Detected 2079.544 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4141.87 BogoMIPS
Memory: 1033092k/1048512k available (1731k kernel code, 15032k reserved, 619k data, 100k init, 131008k highmem)
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: Common caps: 0383fbff c1c3fbff 00000000 00000000
CPU: AMD Athlon(tm) XP 2800+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2079.6324 MHz.
..... host bus clock speed is 332.7411 MHz.
cpu: 0, clocks: 3327411, slice: 1663705
CPU0<T0:3327408,T1:1663696,D:7,S:1663705,C:3327411>
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb490, last bus=3
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router default [10de/01e0] at 00:00.0
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE]
parport0: irq 7 detected
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
lp0: using parport0 (polling).
Floppy drive(s): fd0 is 1.44M
floppy0: no floppy controllers found
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 941M
agpgart: unsupported bridge
agpgart: no supported devices found.
[drm] Initialized r128 2.2.0 20010917 on minor 0
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: IDE controller at PCI slot 00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
AMD_IDE: Bios didn't set cable bits corectly. Enabling workaround.
AMD_IDE: nVidia Corporation nForce2 IDE (rev a2) UDMA100 controller on pci00:09.0
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: IBM-DTLA-307030, ATA DISK drive
blk: queue c0394340, I/O limit 4095Mb (mask 0xffffffff)
hdc: CR-48XATE, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 60036480 sectors (30739 MB) w/1916KiB Cache, CHS=3737/255/63, UDMA(100)
hdc: attached ide-scsi driver.
Partition check:
 hda: hda1 hda2 < hda5 > hda3
SCSI subsystem driver Revision: 1.00
scsi0 : SCSI host adapter emulation for IDE ATAPI devices
  Vendor: MITSUMI Model: CR-48XATE Rev: 1.0E
  Type: CD-ROM ANSI SCSI revision: 02
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
host/usb-uhci.c: $Revision: 1.275 $ time 19:08:35 Sep 14 2003
host/usb-uhci.c: High bandwidth mode enabled
host/usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
PCI: Setting latency timer of device 00:02.0 to 64
host/usb-ohci.c: USB OHCI at membase 0xf880d000, IRQ 10
host/usb-ohci.c: usb-00:02.0, nVidia Corporation nForce2 USB Controller
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 3 ports detected
PCI: Setting latency timer of device 00:02.1 to 64
host/usb-ohci.c: USB OHCI at membase 0xf880f000, IRQ 10
host/usb-ohci.c: usb-00:02.1, nVidia Corporation nForce2 USB Controller (#2)
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 3 ports detected
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
Linux video capture interface: v1.00
mice: PS/2 mouse device common for all mice
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
hub.c: new USB device 00:02.1-1, assigned address 2
input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb2:2.0
hub.c: new USB device 00:02.1-2, assigned address 3
usb.c: USB device 3 (vend/prod 0x46d/0x870) is not claimed by any active driver.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 100k freed
Adding Swap: 248968k swap-space (priority -1)
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,5), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
usb.c: registered new driver quickcam
USB Quickcam Class ff SubClass ff idVendor 46d idProduct 870
USB Quickcam camera found using: $Id: quickcam.c,v 1.111 2003/01/27 09:41:03 tuukkat Exp $
quickcam: probe of HDCS1000 sensor = 08 02 id: 08
quickcam: HDCS1000 sensor detected
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
02:01.0: 3Com PCI 3c920 Tornado at 0xa000. Vers LK1.1.18-ac
 00:26:54:0e:c5:2e, IRQ 10
  product code ffff rev 00.0 date 15-31-127
  Internal config register is 1600000, transceivers 0x40.
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/MII interface.
  MII transceiver found at address 2, status 786d.
  Enabling bus-master transmits and whole-frame receives.
02:01.0: scatter/gather enabled. h/w checksums enabled
Creative EMU10K1 PCI Audio Driver, version 0.20a, 19:02:41 Sep 14 2003
emu10k1: Audigy rev 4 model 0x1002 found, IO at 0x9000-0x903f, IRQ 3
ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
i2c-core.o: i2c core module version 2.8.0 (20030714)
i2c-nforce2.o version 2.8.0 (20030714)
i2c-nforce2.o: nForce2 SMBus adapter at 0x5000
i2c-nforce2.o: nForce2 SMBus adapter at 0x5500
i2c-proc.o version 2.8.0 (20030714)
w83781d.o version 2.8.0 (20030714)
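
To check what the booted kernel actually did with the APIC and DMA (the
log above, for instance, still shows the local APIC enabled and hda
running in UDMA(100)), the kernel log can be filtered as below. This is
a sketch only; the function name apic_dma_lines is made up for
illustration.

```shell
#!/bin/sh
# Filter a kernel log on stdin down to the APIC- and DMA-related lines.
# Matching is case-insensitive, so "APIC", "UDMA" and "BM-DMA" all hit.
apic_dma_lines() {
    grep -i -E 'apic|dma'
}
```

Typical use would be `dmesg | apic_dma_lines`, run once with the
workarounds in place and once without, to confirm that the settings
really took effect before timing the next lockup.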