Re: PROBLEM: total system crash



On Sun, Sep 10, 2006 at 11:18:30PM +0000, Dieter Ferdinand wrote:
hello,
i have some big problems with kernel 2.4.32.

i use asterisk as pbx with an avm b1-card and capi on a dual p3-500
system. this system crash always if i make a call from isdn to asterisk and
from asterisk to isdn dependently which end hangs up first.

i test same on an other system with an avm fritz-card without problem.
the test on a system with single cpu is in planning.

Could you please test 2.4.34-pre2 ?
There was a recent fix for avm-b1 c4 cards causing problems with Pentium3.
I'm not sure at all it is related, but it might be worth trying.

the next problem makes more problems and i need this driver.
i use adaptec scsi-controllers (aha 2940 and other models with the same
driver) in all my scsi-systems.

after some problems with the scsi-system, the system hangs total.
i have only the log at the end of the mail and here are some informations
missing.

this system crash thre time today with the same error. after booting with
kernel 2.6.16.27 i get a message soft-lookup detected and system hangs.
the system is very much swapping at this moment.

this system is a pentium 266 with an ide-harddisk for linux and two scsi-
harddisks for backup und data. i use this system for testing new kernels and
software.

it looks that i have this problem only on heavy load and many paging-
requests.

I would suggest that you carefully check your cables and terminations. From
experience, the 2940 is very sensible to incorrect termination and loosy
cables. Also, you might have some SCSI devices which do not support the
speed you have set.

if you need more informations, i think, i can reproduce this errors every time
if i start some programs which makes a high system load.

It might also be possible that you have an old buggy PCI chipset on this
machine, which was not uncommon at times of P2/266.

but i don't know, how i can catch informations, which you need, to solve this
problem.
i can setup a remote-login to the system for you without access to my other
system-resources.

Well, a remote login will not provide anything more, unfortunately. First,
you have to check that every piece of hardware is correct, and that your
PCI settings in the BIOS are OK.

the next problem is with kernel 2.4 and 2.6. i have some older pentium dual-
systems with intel hx-chipset. on this systems, both kernel crash after some
time, i think the most time on heavy system load. kernel 2.2 works fine with
this systems.

maybe 2.2 puts less stress on the hardware ?

i can accecpt, that a scsi-device is deaktivated, if there are some errors, but
i can't accecpt, that the system crash and hangs.

To be fair, Linux's SCSI stack has always been fragile for me. In many
situations, hardware errors have caused my system to hang. I believe that
SCSI in 2.6 is slightly more robust, but I've not played much with it.

goodby

Regards,
Willy

Sep 10 09:23:12 p-266 (scsi0:A:5:0): Device is disconnected, re-queuing
SCB
Sep 10 09:23:12 p-266 (scsi0:A:5:0): Abort Message Sent
Sep 10 09:23:12 p-266 (scsi0:A:5:0): SCB 3 - Abort Completed.
Sep 10 09:23:12 p-266 scsi0:0:5:0: Attempting to queue a TARGET RESET
message
Sep 10 09:23:12 p-266 scsi0:0:5:0: Command not found
Sep 10 09:23:22 p-266 scsi0:0:5:0: Attempting to queue an ABORT
message
Sep 10 09:23:22 p-266 scsi0: At time of recovery, card was not paused
Sep 10 09:23:22 p-266 scsi0: Dumping Card State while idle, at SEQADDR
0x8
Sep 10 09:23:22 p-266 SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x1]:(P_BUSFREE)
Sep 10 09:23:22 p-266 SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
SBLKCTL[0x2]:(SELWIDE)
Sep 10 09:23:22 p-266 SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE)
SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED)
Sep 10 09:23:22 p-266 SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x0]
SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
Sep 10 09:23:22 p-266 0
SCB_CONTROL[0xc8]:(ULTRAENB|DISCENB|TARGET_SCB)
SCB_SCSIID[0x57]
Sep 10 09:23:22 p-266 1
SCB_CONTROL[0xc8]:(ULTRAENB|DISCENB|TARGET_SCB)
SCB_SCSIID[0x57]
Sep 10 09:23:22 p-266 2 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 3 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 4 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 5 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 6 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 7 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 8 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 9 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 10 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 11 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 12 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 13 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 14 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 15 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:22 p-266 4 SCB_CONTROL[0x48]:(ULTRAENB|DISCENB)
SCB_SCSIID[0x57] SCB_LUN[0x0]
Sep 10 09:23:22 p-266 (scsi0:A:5:0): Device is disconnected, re-queuing
SCB
Sep 10 09:23:22 p-266 (scsi0:A:5:0): Abort Message Sent
Sep 10 09:23:22 p-266 (scsi0:A:5:0): SCB 4 - Abort Completed.
Sep 10 09:23:37 p-266 scsi0:0:5:0: Attempting to queue an ABORT
message
Sep 10 09:23:37 p-266 scsi0: At time of recovery, card was not paused
Sep 10 09:23:37 p-266 scsi0: Dumping Card State while idle, at SEQADDR
0x8
Sep 10 09:23:37 p-266 SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x1]:(P_BUSFREE)
Sep 10 09:23:37 p-266 SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
SBLKCTL[0x2]:(SELWIDE)
Sep 10 09:23:37 p-266 SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE)
SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED)
Sep 10 09:23:37 p-266 SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x0]
SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
Sep 10 09:23:37 p-266 0
SCB_CONTROL[0xc8]:(ULTRAENB|DISCENB|TARGET_SCB)
SCB_SCSIID[0x57]
Sep 10 09:23:37 p-266 1
SCB_CONTROL[0xc8]:(ULTRAENB|DISCENB|TARGET_SCB)
SCB_SCSIID[0x57]
Sep 10 09:23:37 p-266 2 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 3 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 4 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 5 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 6 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 7 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 8 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 9 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 10 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 11 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 12 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 13 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 14 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 15 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:37 p-266 3 SCB_CONTROL[0x48]:(ULTRAENB|DISCENB)
SCB_SCSIID[0x57] SCB_LUN[0x0]
Sep 10 09:23:37 p-266 (scsi0:A:5:0): Device is disconnected, re-queuing
SCB
Sep 10 09:23:37 p-266 (scsi0:A:5:0): Abort Message Sent
Sep 10 09:23:37 p-266 (scsi0:A:5:0): SCB 3 - Abort Completed.
Sep 10 09:23:37 p-266 scsi: device set offline - not ready or command retry
failed after bus reset: host 0 channel 0 id 5 lun 0
Sep 10 09:23:47 p-266 scsi0:0:5:0: Attempting to queue an ABORT
message
Sep 10 09:23:47 p-266 scsi0: At time of recovery, card was not paused
Sep 10 09:23:47 p-266 scsi0: Dumping Card State while idle, at SEQADDR
0x8
Sep 10 09:23:47 p-266 SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x1]:(P_BUSFREE)
Sep 10 09:23:47 p-266 SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
SBLKCTL[0x2]:(SELWIDE)
Sep 10 09:23:47 p-266 SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE)
SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED)
Sep 10 09:23:47 p-266 SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x0]
SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
Sep 10 09:23:47 p-266 0
SCB_CONTROL[0xc8]:(ULTRAENB|DISCENB|TARGET_SCB)
SCB_SCSIID[0x57]
Sep 10 09:23:47 p-266 1
SCB_CONTROL[0xc8]:(ULTRAENB|DISCENB|TARGET_SCB)
SCB_SCSIID[0x57]
Sep 10 09:23:47 p-266 2 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 3 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 4 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 5 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 6 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 7 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 8 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 9 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 10 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 11 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 12 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 13 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 14 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 15 SCB_CONTROL[0x0]
SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
Sep 10 09:23:47 p-266 4 SCB_CONTROL[0x48]:(ULTRAENB|DISCENB)
SCB_SCSIID[0x57] SCB_LUN[0x0]
Sep 10 09:23:47 p-266 (scsi0:A:5:0): Device is disconnected, re-queuing
SCB
Sep 10 09:23:47 p-266 (scsi0:A:5:0): Abort Message Sent
Sep 10 09:23:47 p-266 (scsi0:A:5:0): SCB 4 - Abort Completed.
Sep 10 09:23:47 p-266 scsi: device set offline - not ready or command retry
failed after bus reset: host 0 channel 0 id 5 lun 0
Sep 10 09:23:47 p-266 SCSI disk error : host 0 channel 0 id 5 lun 0 return
code = 30000
Sep 10 09:23:47 p-266 SCSI disk error : host 0 channel 0 id 5 lun 0 return
code = 30000

Schau auch einmal auf meine Homepage (http://go.to/dieter-ferdinand).
Dort findest du Information zu Linux, Novell, Win95, WinNT, ...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • PROBLEM: total system crash
    ... i use asterisk as pbx with an avm b1-card and capi on a dual p3-500 ... this system crash thre time today with the same error. ... kernel 2.6.16.27 i get a message soft-lookup detected and system hangs. ...
    (Linux-Kernel)
  • FC5 - recently introduced system hangs, multiple systems
    ... and ran yum update on the system. ... A couple of weeks ago I needed to re-build my work laptop. ... I think the kernel loaded by yum update was 2.6.18-1.2200. ... Right from the start the laptop has exhibited hangs typically about ...
    (Fedora)
  • Re: 2.4.27 suddenly hangs
    ... > I‘m not sure that the below problem is kernel related, however, I think, ... > Occasionally, the system just hangs, without giving any error messages ... when mgetty was the last. ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
    ... On vanilla 2.6.20 suspend to disk works perfectly and suspend to ram works _almost_ perfectly. ... It happens after second CPU is brought back on-line. ... The fact is that I found thanks to RTC tricks, similar to PM_TRACE that system hangs in exactly three places ... As I see that is first time wakeup low page is addressing kernel memory by jumping to it. ...
    (Linux-Kernel)
  • Re: IO-APIC on nforce2 [PATCH] + [PATCH] for nmi_debug=1 + [PATCH] for idle=C1halt, 2.6.5
    ... Shuttle board after a BIOS update does not suffer from these hangs. ... Allen Martin already commented on this once: ... "Likely the root of the problem has to do with the way the Linux kernel is ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)

Loading