Fwd: 2.6.26.x hangs on amd64/smp



From: BERTRAND Joel <joel.bertrand@xxxxxxxxxxx>
Newsgroups: linux.kernel
Subject: 2.6.26.x hangs on amd64/smp
Reply-To: mt1@xxxxxxxxxxx

Hello,

System : debian/testing, tested kernels 2.6.26, 2.6.26.3, 2.6.26.5.
Hardware : core2duo, 4 GB, raid1 software, CFQ scheduler.

I have written a program that works on cartographic data. This program
is started as a daemon and calls fork() (and pthread_create()). I
have seen that it requires 6 GB in total to run; each process takes
1.5 GB. The same program works fine under FreeBSD and Solaris (on the
same hardware, of course).

When it starts, I can see disk activity (swap), and after 2 or 3
minutes the kernel crashes without leaving any trace (no more disk
activity, sysrq does nothing...). I have reproduced this bug while
logged in on the console. There was no message.

If I introduce some nanosleep() syscalls in my code, the crash is
harder to reproduce.
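
For anyone who wants to try to trigger this, here is a minimal sketch of the kind of workload described above. It is not the original program: the function names, the thread count and the access pattern are all assumptions made for illustration. It forks several children, each child starts threads that dirty a private buffer page by page, and an optional nanosleep() between passes mimics the throttling mentioned above. The defaults are deliberately small; reaching the memory pressure from the report would need something like 4 processes at about 1536 MB each.

```c
/* Hypothetical reproducer sketch, NOT the original program. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define PAGE 4096UL

struct job {
    size_t bytes;    /* bytes each thread allocates and touches */
    long sleep_ns;   /* pause between passes; 0 disables nanosleep() */
    int passes;      /* number of times the buffer is walked */
};

static void *touch_memory(void *arg)
{
    const struct job *j = arg;
    unsigned char *buf = malloc(j->bytes);
    if (buf == NULL)
        return (void *)1;                    /* report allocation failure */
    for (int p = 0; p < j->passes; p++) {
        for (size_t off = 0; off < j->bytes; off += PAGE)
            buf[off] = (unsigned char)p;     /* dirty every page */
        if (j->sleep_ns > 0) {
            struct timespec ts;
            ts.tv_sec = j->sleep_ns / 1000000000L;
            ts.tv_nsec = j->sleep_ns % 1000000000L;
            nanosleep(&ts, NULL);
        }
    }
    free(buf);
    return NULL;
}

/* Forks nproc children, each running nthreads threads.
 * Returns the number of children that failed. */
int run_workload(int nproc, int nthreads, size_t bytes_per_thread,
                 int passes, long sleep_ns)
{
    struct job j = { bytes_per_thread, sleep_ns, passes };
    int failures = 0, started = 0;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failures++;
            continue;
        }
        if (pid == 0) {                      /* child process */
            pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
            int bad = (tids == NULL);
            int created = 0;
            for (; !bad && created < nthreads; created++)
                if (pthread_create(&tids[created], NULL, touch_memory, &j) != 0)
                    break;
            for (int t = 0; t < created; t++) {
                void *ret = NULL;
                pthread_join(tids[t], &ret);
                if (ret != NULL)
                    bad = 1;
            }
            if (created < nthreads)
                bad = 1;
            _exit(bad ? 1 : 0);
        }
        started++;
    }
    for (int i = 0; i < started; i++) {      /* parent: reap children */
        int status = 0;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}

#ifdef REPRO_MAIN
/* Usage: ./repro [nproc] [MB per process] [sleep_ns] */
int main(int argc, char **argv)
{
    int nproc = argc > 1 ? atoi(argv[1]) : 4;
    size_t mb = argc > 2 ? strtoul(argv[2], NULL, 10) : 16;
    long sleep_ns = argc > 3 ? atol(argv[3]) : 0;
    int failures = run_workload(nproc, 2, (mb << 20) / 2, 4, sleep_ns);
    printf("%d child(ren) failed\n", failures);
    return failures != 0;
}
#endif
```

Built with something like "gcc -pthread -DREPRO_MAIN repro.c -o repro", running "./repro 4 1536" would roughly match the sizes in the report, and a non-zero third argument adds the nanosleep() pause. Whether this simple pattern is enough to trigger the hang is not known.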

cauchy:[~] > cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      5855616 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      48829440 blocks [2/2] [UU]

md3 : active raid1 sdb4[1] sda4[0]
      101474496 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      128384 blocks [2/2] [UU]

unused devices: <none>

Swap is on /dev/md1.

cauchy:[~] > df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2         46G   28G   16G  64% /
tmpfs           2.0G     0  2.0G   0% /lib/init/rw
udev             10M  124K  9.9M   2% /dev
tmpfs           2.0G     0  2.0G   0% /dev/shm
/dev/md0        122M   60M   56M  52% /boot
/dev/md3         96G   56G   35G  62% /home
cauchy:[~] >

dmesg :
Linux version 2.6.26.5 (root@cauchy) (gcc version 4.3.1 (Debian 4.3.1-9)) #16 SMP PREEMPT Tue Sep 23 15:54:59 CEST 2008
...
ACPI: BIOS bug: multiple APIC/MADT found, using 0
ACPI: If "acpi_apic_instance=2" works better, notify linux-acpi@xxxxxxxxxxxxxxx
ACPI: DMI detected: Toshiba
...

.config: see http://www.systella.fr/~bertrand/config.2.6.26.5

Some bad news... I'm now able to reproduce this bug _without_ X.
Test configuration :

debian/testing up to date (minimal system with a 2.6.26.5 kernel from
ftp.kernel.org, built with gcc-4.3).

I started my test program over an ssh connection. The console goes
into DPMS mode (power off). When the system crashes, the screen can be
switched back on by pressing a key, but there is no information on the
console. The SysRq key no longer works. There is no disk activity. It is
impossible to log in and I have to reboot with the power button...

Regards,

JKB


