RHEL 3 ES stability problems on DL380 G3 (custom built kernel 2.6.7)
From: Oyvind (oyvind_at_ws.no)
Date: 07/25/04
- Next message: Blue Ice: "Terminal server"
- Previous message: Martin Stone: "Re: whitboxlinux.org"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sun, 25 Jul 2004 22:27:17 +0200
Hi all
I have a stability problem with one of the servers at work, and I really
hope some of you have ideas about what may be wrong.
The symptom: within about one minute, the load goes from less than 1 to more
than 50, and the system becomes unavailable.
I noticed this because I had top running on the terminal, and it was still
running when I got to the server room, showing a load of 140 and 300
processes running (normal is about 120). It did not look like any single
process was looping; the processes at the top of the list were "normal"
processes with low %CPU.
The machine responds to ping requests, but does not display welcome messages
on other ports when I telnet in. On the terminal it responds to 'date'
but hangs forever on commands such as 'tail /var/log/messages' or
'shutdown -r now'.
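A high load with low %CPU, together with commands that hang as soon as they touch the disk, is the pattern you would expect if many processes are stuck in uninterruptible sleep (state D), since the Linux load average counts those tasks as well as runnable ones. The following is only a minimal sketch of how one could tally process states from /proc the next time it happens. It assumes a modern Python 3 interpreter, which is not what RHEL 3 ships; the function names are made up for illustration.

```python
import os
from collections import Counter


def state_from_stat(stat_text):
    """Return the one-letter state field from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def tally_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, Z, ...)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, IndexError):
            continue  # process exited while we were scanning
    return counts


def report():
    """Print the load average and the per-state process counts."""
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for state, n in sorted(tally_states().items()):
        print(state, n)
```

Calling report() on the affected machine while the load is climbing would show whether the extra processes are in state D (waiting on I/O) rather than R (running).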
The machine is an HP DL380 G3 with 2x3GHz CPUs, 2GB RAM, and 6x36GB disks on
a Smart Array 641 controller in a RAID5 configuration with a hot spare. We are
running qmail with qmail-scanner, spamassassin etc.
We are running RHEL 3 ES (Red Hat Enterprise Linux 3 ES). We had serious disk
performance problems with stock kernels as well as 2.4.21-9 and 2.4.21-15.
Therefore we compiled a new kernel, 2.6.7, from kernel.org. The kernel had
built-in support for Smart Array controllers and booted fine, with about 4x
the disk speed. We are using the tg3 network driver.
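For anyone who wants to compare disk speed between kernels in a repeatable way, here is a rough sketch of a sequential read timer. It is not how the 4x figure above was obtained; the function name and the default sizes are arbitrary choices, and it assumes Python 3. Note that the page cache will inflate the result if the file was read recently.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024, max_bytes=256 * 1024 * 1024):
    """Read up to max_bytes sequentially from path.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives raw, unbuffered reads at the Python level;
    # the kernel page cache is still involved.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(chunk_size, max_bytes - total))
            if not chunk:
                break  # end of file
            total += len(chunk)
    return total, time.perf_counter() - start
```

Dividing bytes_read by elapsed_seconds gives bytes per second; running it on the same large file under each kernel, after a fresh boot, gives a like-for-like comparison.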
Any clues and ideas will be much appreciated! :)
Regards
Oyvind
This is output from dmesg:
----------------------------------------
<6>found SMP MP-table at 000f4fd0
On node 0 totalpages: 524282
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 225280 pages, LIFO batch:16
HighMem zone: 294906 pages, LIFO batch:16
DMI 2.3 present.
ACPI: RSDP (v000 COMPAQ ) @ 0x000f4f70
ACPI: RSDT (v001 COMPAQ P29 0x00000002 Ò 0x0000162e) @ 0x7fffa000
ACPI: FADT (v001 COMPAQ P29 0x00000002 Ò 0x0000162e) @ 0x7fffa040
ACPI: MADT (v001 COMPAQ 00000083 0x00000002 0x00000000) @ 0x7fffa100
ACPI: SPCR (v001 COMPAQ SPCRRBSU 0x00000001 Ò 0x0000162e) @ 0x7fffa1c0
ACPI: DSDT (v001 COMPAQ DSDT 0x00000001 MSFT 0x0100000b) @ 0x00000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
Using ACPI for processor (LAPIC) configuration information
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: COMPAQ Product ID: PROLIANT APIC at: 0xFEE00000
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFEC01000.
I/O APIC #4 Version 17 at 0xFEC02000.
I/O APIC #5 Version 17 at 0xFEC03000.
Enabling APIC mode: Summit. Using 4 I/O APICs
Processors: 2
Built 1 zonelists
Kernel command line: root=/dev/cciss/c1d0p2
Initializing CPU#0
PID hash table entries: 4096 (order 12: 32768 bytes)
Detected 3056.779 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Memory: 2076340k/2097128k available (1583k kernel code, 19676k reserved,
805k data, 156k init, 1179624k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 6029.31 BogoMIPS
Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 131072 (order: 7, 524288 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 3.06GHz stepping 05
per-CPU timeslice cutoff: 1463.16 usecs.
task migration cache decay timeout: 2 msecs.
enabled ExtINT on CPU#0
Leaving ESR disabled.
Booting processor 1/6 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
Leaving ESR disabled.
Calibrating delay loop... 6111.23 BogoMIPS
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Intel(R) Xeon(TM) CPU 3.06GHz stepping 09
Total of 2 processors activated (12140.54 BogoMIPS).
WARNING: 1 siblings found for CPU0, should be 2
WARNING: 1 siblings found for CPU1, should be 2
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 3-0, 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9,
3-10, 3-11, 3-12, 3-13, 3-14, 3-15, 4-0, 4-1, 4-2, 4-3, 4-4, 4-5, 4-6, 4-7,
4-8, 4-9, 4-10, 4-11, 4-12, 4-13, 4-14, 4-15, 5-0, 5-1, 5-2, 5-3, 5-4, 5-5,
5-6, 5-7, 5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 16.
number of IO-APIC #3 registers: 16.
number of IO-APIC #4 registers: 16.
number of IO-APIC #5 registers: 16.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 02000000
....... : arbitration: 02
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 0FF 0F 0 0 0 0 0 1 0 39
02 0FF 0F 0 0 0 0 0 1 0 31
03 0FF 0F 1 1 0 1 0 1 0 41
04 0FF 0F 0 0 0 0 0 1 0 49
05 0FF 0F 1 1 0 1 0 1 0 51
06 0FF 0F 0 0 0 0 0 1 0 59
07 0FF 0F 1 1 0 1 0 1 0 61
08 0FF 0F 0 0 0 0 0 1 0 69
09 0FF 0F 1 1 0 1 0 1 0 71
0a 0FF 0F 1 1 0 1 0 1 0 79
0b 0FF 0F 1 1 0 1 0 1 0 81
0c 0FF 0F 0 0 0 0 0 1 0 89
0d 0FF 0F 0 0 0 0 0 1 0 91
0e 0FF 0F 0 0 0 0 0 1 0 99
0f 0FF 0F 1 1 0 1 0 1 0 A1
IO APIC #3......
.... register #00: 03000000
....... : physical APIC id: 03
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 03000000
....... : arbitration: 03
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 1 0 0 0 0 0 0 00
02 000 00 1 0 0 0 0 0 0 00
03 000 00 1 0 0 0 0 0 0 00
04 000 00 1 0 0 0 0 0 0 00
05 000 00 1 0 0 0 0 0 0 00
06 000 00 1 0 0 0 0 0 0 00
07 000 00 1 0 0 0 0 0 0 00
08 000 00 1 0 0 0 0 0 0 00
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 1 0 0 0 0 0 0 00
0f 000 00 1 0 0 0 0 0 0 00
IO APIC #4......
.... register #00: 04000000
....... : physical APIC id: 04
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 04000000
....... : arbitration: 04
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 1 0 0 0 0 0 0 00
02 000 00 1 0 0 0 0 0 0 00
03 000 00 1 0 0 0 0 0 0 00
04 000 00 1 0 0 0 0 0 0 00
05 000 00 1 0 0 0 0 0 0 00
06 000 00 1 0 0 0 0 0 0 00
07 000 00 1 0 0 0 0 0 0 00
08 000 00 1 0 0 0 0 0 0 00
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 1 0 0 0 0 0 0 00
0f 000 00 1 0 0 0 0 0 0 00
IO APIC #5......
.... register #00: 05000000
....... : physical APIC id: 05
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 05000000
....... : arbitration: 05
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 1 0 0 0 0 0 0 00
02 000 00 1 0 0 0 0 0 0 00
03 000 00 1 0 0 0 0 0 0 00
04 000 00 1 0 0 0 0 0 0 00
05 000 00 1 0 0 0 0 0 0 00
06 000 00 1 0 0 0 0 0 0 00
07 000 00 1 0 0 0 0 0 0 00
08 000 00 1 0 0 0 0 0 0 00
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 1 0 0 0 0 0 0 00
0f 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 3056.0135 MHz.
..... host bus clock speed is 132.0875 MHz.
checking TSC synchronization across 2 CPUs:
BIOS BUG: CPU#0 improperly initialized, has -22 usecs TSC skew! FIXED.
BIOS BUG: CPU#1 improperly initialized, has 22 usecs TSC skew! FIXED.
Brought up 2 CPUs
CPU0: online
domain 0: span 03
groups: 01 02
CPU1: online
domain 0: span 03
groups: 02 01
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf0094, last bus=9
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
SCSI subsystem initialized
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
PCI: Discovered peer bus 01
PCI: Discovered peer bus 02
PCI: Discovered peer bus 06
PCI: Device 00:00 not found by BIOS
PCI: Device 00:01 not found by BIOS
PCI: Device 00:02 not found by BIOS
PCI: Device 00:78 not found by BIOS
PCI: Device 00:7b not found by BIOS
PCI: Device 00:80 not found by BIOS
PCI: Device 00:82 not found by BIOS
PCI: Device 00:88 not found by BIOS
PCI: Device 00:8a not found by BIOS
vesafb: probe of vesafb0 failed with error -6
apm: BIOS not found.
Starting balanced_irq
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Compaq SMART2 Driver (v 2.6.0)
Compaq CISS Driver (v 2.6.2)
cciss: Device 0xb178 has been found at bus 1 dev 3 func 0
cciss: using DAC cycles
Using anticipatory io scheduler
cciss: Device 0x46 has been found at bus 6 dev 2 func 0
cciss: using DAC cycles
blocks= 284490240 block_size= 512
heads= 255, sectors= 32, cylinders= 34864
cciss/c1d0: p1 p2 p3
divert: not allocating divert_blk for non-ethernet device lo
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller at PCI slot 0000:00:0f.1
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
SvrWks CSB5: simplex device: DMA forced
ide0: BM-DMA at 0x2000-0x2007, BIOS settings: hda:pio, hdb:pio
SvrWks CSB5: simplex device: DMA forced
ide1: BM-DMA at 0x2008-0x200f, BIOS settings: hdc:pio, hdd:pio
hda: CD-224E, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide-floppy driver 0.99.newide
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
input: PS/2 Generic Mouse on isa0060/serio1
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 16384 buckets, 128Kbytes
TCP: Hash tables configured (established 524288 bind 65536)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 8
NET: Registered protocol family 20
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: cciss/c1d0p2: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 7503950
EXT3-fs: cciss/c1d0p2: 1 orphan inode deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 156k freed
EXT3 FS on cciss/c1d0p2, internal journal
Adding 2044072k swap on /dev/cciss/c1d0p3. Priority:-1 extents:1
kjournald starting. Commit interval 5 seconds
EXT3 FS on cciss/c1d0p1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
parport_pc: Unknown symbol parport_ieee1284_ecp_read_data
parport_pc: Unknown symbol parport_ieee1284_epp_read_data
parport_pc: Unknown symbol parport_ieee1284_read_nibble
parport_pc: Unknown symbol parport_ieee1284_read_byte
parport_pc: Unknown symbol parport_ieee1284_epp_read_addr
parport_pc: Unknown symbol parport_announce_port
parport_pc: Unknown symbol parport_ieee1284_write_compat
parport_pc: Unknown symbol parport_ieee1284_epp_write_data
parport_pc: Unknown symbol parport_ieee1284_interrupt
parport_pc: Unknown symbol parport_put_port
parport_pc: Unknown symbol parport_ieee1284_epp_write_addr
parport_pc: Unknown symbol parport_remove_port
parport_pc: Unknown symbol parport_register_port
parport_pc: Unknown symbol parport_ieee1284_ecp_write_addr
parport_pc: Unknown symbol parport_ieee1284_ecp_write_data
tg3.c:v3.6 (June 12, 2004)
divert: allocating divert_blk for eth0
eth0: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit)
10/100/1000BaseT Ethernet 00:0f:20:32:c6:dd
eth0: HostTXDS[1] RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0]
WireSpeed[1] TSOcap[1]
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit)
10/100/1000BaseT Ethernet 00:0f:20:32:c6:dc
eth1: HostTXDS[1] RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0]
WireSpeed[1] TSOcap[1]
ip_tables: (C) 2000-2002 Netfilter core team
ipt_state: Unknown symbol ip_conntrack_untracked
ipt_state: Unknown symbol ip_conntrack_get
ipt_state: Unknown symbol need_ip_conntrack
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
ipt_state: Unknown symbol ip_conntrack_untracked
ipt_state: Unknown symbol ip_conntrack_get
ipt_state: Unknown symbol need_ip_conntrack
----------------------------------------
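As a sanity check, several figures in the dmesg output above can be cross-checked against each other and against the hardware description. The sketch below only redoes that arithmetic with numbers copied from the log. The 4 KiB page size is an assumption (the usual value on 32-bit x86), and the RAID layout comment assumes the hot spare is one of the six disks.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, the usual size on 32-bit x86

# "On node 0 totalpages" and the three zone lines
zone_pages = {"DMA": 4096, "Normal": 225280, "HighMem": 294906}
total_pages = sum(zone_pages.values())
total_kib = total_pages * PAGE_KIB
highmem_kib = zone_pages["HighMem"] * PAGE_KIB

# cciss geometry lines for the logical drive
blocks, block_size = 284490240, 512
heads, sectors, cylinders = 255, 32, 34864
disk_bytes = blocks * block_size
disk_gb = disk_bytes / 1e9

# Per-CPU BogoMIPS from the two "Calibrating delay loop" lines
bogomips_sum = 6029.31 + 6111.23

print("total pages:", total_pages)          # log reports 524282
print("total memory (k):", total_kib)       # log reports 2097128k
print("highmem (k):", highmem_kib)          # log reports 1179624k
print("geometry matches:", heads * sectors * cylinders == blocks)
print("logical drive (GB):", round(disk_gb, 1))
print("per data disk (GB):", round(disk_gb / 4, 1))
print("BogoMIPS sum:", round(bogomips_sum, 2))  # log reports 12140.54
```

The logical drive comes out at roughly 145.7 GB. With one hot spare and five disks in the RAID5 set, four disks' worth of capacity holds data, which gives about 36.4 GB per disk and agrees with the 6x36GB description. The memory and BogoMIPS totals also agree with the values the kernel printed.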