Re: Badness at kernel/rcutree.c:1228
- From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
- Date: Tue, 8 Dec 2009 07:35:58 -0800
On Tue, Dec 08, 2009 at 01:54:38PM +0200, kordex - wrote:
Hey,
I did add the printk and warn_once lines you requested and the output
from them is:
Hierarchical RCU implementation.
rcu_init() smp_processor_id() = 0
rcu_init_percpu_data(0) called
rcu_init_percpu_data(0) called
So both rcu_sched_state.beenonline and rcu_bh_state.beenonline have
been set to 1 by the line immediately preceding the one that printed
out "rcu_init_percpu_data(0) called". This is the only line that
assigns a value to the beenonline field.
My suggestion is to bisect the portion of the boot process immediately
preceding the WARN_ON_ONCE() from the prior boot to locate whatever
is clobbering that value.
Thanx, Paul
Full dmesg at http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.printk.txt--
Captured just right after boot so it does not yet have anything else
than just clean boot messages.
--Mikko Kortelainen
2009/12/8 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
On Mon, Dec 07, 2009 at 08:58:35PM +0200, kordex - wrote:--
Hello,
I encountered this and I am able to reproduce, unfortunately. I was
syncing files from two large HDD drives and I saw difference between
cksum so I first suspected faulty filesystem but it was not that. Then
I ran couple of Seagate Seatools tests on these disks of mine, still
nothing. I suspected there was something bad in rsync(1) code but
still negative as tar worked in the same way. Ofcourse I did the dmesg
checking but after reboot after checking the disks were fine. It
showed up clean. So I went there and tried to resync my non raid disk
with raid0 disk. Still same problem, data mismatch. I read something
from google when installing new sata controller about lib_sata an sil
3114 and seagates disks not working so this time i though of checking
dmesg. And there it was nice little surprise:
------------[ cut here ]------------
Badness at kernel/rcutree.c:1228
NIP: c004ecbc LR: c004f14c CTR: c007bd70
REGS: df34dde0 TRAP: 0700 Not tainted (2.6.32)
MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24024482 XER: 20000000
TASK = d12c2d10[2734] 'buffer' THREAD: df34c000
GPR00: 00000001 df34de90 d12c2d10 c0746498 c07465a0 00b1ed55 00000001 00138c4a
GPR08: 00001032 c07b0000 00009032 00000008 44024488 1001c6ac 1002efd8 1002f064
GPR16: 10060000 10070000 10070000 c073a590 c073a710 c0685204 00000000 c07a5520
GPR24: c07a5520 0000000a df34c000 c0746498 c0746420 c0746528 00000009 c07465a0
NIP [c004ecbc] __rcu_process_callbacks+0x34/0x48c
LR [c004f14c] rcu_process_callbacks+0x38/0x4c
Call Trace:
[df34de90] [c004efec] __rcu_process_callbacks+0x364/0x48c (unreliable)
[df34deb0] [c004f14c] rcu_process_callbacks+0x38/0x4c
[df34ded0] [c0028e8c] __do_softirq+0xa4/0x120
[df34df10] [c0006a0c] do_softirq+0x40/0x58
[df34df20] [c0028c8c] irq_exit+0x38/0x80
[df34df30] [c0010d28] timer_interrupt+0x11c/0x138
[df34df40] [c00143a0] ret_from_except+0x0/0x14
--- Exception: 901 at 0x10002ddc
LR = 0x10002d80
Instruction dump:
7c0802a6 bf410008 7c9f2378 7c7b1b78 90010024 8804000e 2f800000 40be002c
3d20c07b 8009a334 7c000034 5400d97e <0f000000> 2f800000 41be0010 38000001
munin-update: page allocation failure. order:1, mode:0x20
Call Trace:
[c59bfa30] [c0008a24] show_stack+0x4c/0x14c (unreliable)
[c59bfa70] [c0058154] __alloc_pages_nodemask+0x468/0x4b8
[c59bfb00] [c0077448] cache_alloc_refill+0x2d8/0x55c
[c59bfb50] [c0077830] kmem_cache_alloc+0x64/0xb0
[c59bfb70] [c04862fc] sk_prot_alloc+0x2c/0x78
[c59bfb90] [c04863e0] sk_clone+0x20/0x1b0
[c59bfbb0] [c04bca70] inet_csk_clone+0x1c/0x8c
[c59bfbc0] [c04d1358] tcp_create_openreq_child+0x20/0x2c4
[c59bfbe0] [c04d00a0] tcp_v4_syn_recv_sock+0x58/0x17c
[c59bfc00] [c04d11c4] tcp_check_req+0x268/0x3dc
[c59bfc40] [c04cf87c] tcp_v4_do_rcv+0xa0/0x198
[c59bfc70] [c04cfdb8] tcp_v4_rcv+0x444/0x6d4
[c59bfca0] [c04b450c] ip_local_deliver+0x104/0x1d8
[c59bfcc0] [c04b43d0] ip_rcv+0x508/0x540
[c59bfcf0] [c049227c] netif_receive_skb+0x390/0x3bc
[c59bfd20] [c0492344] process_backlog+0x9c/0x134
[c59bfd50] [c0492b44] net_rx_action+0x80/0x190
[c59bfd80] [c0028e8c] __do_softirq+0xa4/0x120
[c59bfdc0] [c0006a0c] do_softirq+0x40/0x58
[c59bfdd0] [c0028db4] local_bh_enable+0x7c/0x9c
[c59bfde0] [c04852c8] release_sock+0x94/0xa8
[c59bfe00] [c04dcd48] inet_stream_connect+0x224/0x29c
[c59bfe50] [c0482a54] sys_connect+0x78/0xa8
[c59bff00] [c0484088] sys_socketcall+0xf0/0x240
[c59bff40] [c0013cf4] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfd51434
LR = 0xff66550
Mem-Info:
DMA per-cpu:
CPU 0: hi: 186, btch: 31 usd: 87
active_anon:6308 inactive_anon:8347 isolated_anon:0
active_file:52970 inactive_file:53351 isolated_file:0
unevictable:0 dirty:12753 writeback:0 unstable:0
free:1728 slab_reclaimable:2017 slab_unreclaimable:1020
mapped:2045 shmem:410 pagetables:431 bounce:0
DMA free:6912kB min:2884kB low:3604kB high:4324kB active_anon:25232kB
inactive_anon:33388kB active_file:211880kB inactive_file:213404kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
mlocked:0kB dirty:51012kB writeback:0kB mapped:8180kB shmem:1640kB
slab_reclaimable:8068kB slab_unreclaimable:4080kB kernel_stack:992kB
pagetables:1724kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1728*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 6912kB
106743 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 779144kB
Total swap = 779144kB
131072 pages RAM
0 pages HighMem
3391 pages reserved
14583 pages shared
121040 pages non-shared
...
Full dmesg: http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.txt
Kernel config: http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.config.txt
Interesting! You appear to be running a UP build on a 32-bit PowerPC
platform, which presumably disables CPU hotplug. You are also running
CONFIG_TREE_RCU, which does require initialization. Your dmesg snippet
seems to start after RCU initialized itself, as it would normally print:
Hierarchical RCU implementation.
Are you able to capture console output that early? If so, could you
please try the attached patch?
The behavior is consistent with the boot CPU thinking that it is not
online, which would indeed be strange. The patch checks on this and
also prints out at the point that ->beenonline should be initialized.
uname -a: Linux navi 2.6.32 #2 Mon Dec 7 13:35:57 EET 2009 ppc GNU/Linux
And no I am not running any patches on this. Same occurred with
2.6.31.6 which I had patched with Con Kolivas' BFS so I upgraded to
latest stable vanilla without any patches and still there it is. You
know, it ain't too interesting to sync 1TB disks 5 times (yea I needed
to do some testing for rsync) with slow controllers like these.
You also have quite a few occurrences of the following splat in your
dmesg, which might well explain your sync-ing woes:
Mem-Info:
DMA per-cpu:
CPU 0: hi: 186, btch: 31 usd: 153
active_anon:8938 inactive_anon:9432 isolated_anon:0
active_file:11943 inactive_file:91997 isolated_file:0
unevictable:0 dirty:4317 writeback:3464 unstable:0
free:1044 slab_reclaimable:1160 slab_unreclaimable:1123
mapped:3207 shmem:411 pagetables:471 bounce:0
DMA free:4176kB min:2884kB low:3604kB high:4324kB active_anon:35752kB
inactive_anon:37728kB active_file:47772kB inactive_file:367988kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
mlocked:0kB dirty:17268kB writeback:13856kB mapped:12828kB shmem:1644kB
slab_reclaimable:4640kB slab_unreclaimable:4492kB kernel_stack:1040kB
pagetables:1884kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1034*4kB 5*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 4176kB
104353 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 779144kB
Total swap = 779144kB
131072 pages RAM
0 pages HighMem
3391 pages reserved
14353 pages shared
120982 pages non-shared
mconf: page allocation failure. order:1, mode:0x20
Call Trace:
[d1197cb0] [c0008a24] show_stack+0x4c/0x14c (unreliable)
[d1197cf0] [c0058154] __alloc_pages_nodemask+0x468/0x4b8
[d1197d80] [c0077448] cache_alloc_refill+0x2d8/0x55c
[d1197dd0] [c0077780] __kmalloc+0xb4/0x100
[d1197df0] [c02aa6bc] tty_buffer_request_room+0xcc/0x148
[d1197e10] [c02aa8f4] tty_insert_flip_string+0x2c/0xa0
[d1197e30] [c02ab678] pty_write+0x40/0x70
[d1197e50] [c02a6244] n_tty_write+0x260/0x3bc
[d1197eb0] [c02a3780] tty_write+0x1bc/0x260
[d1197ef0] [c007b4d4] vfs_write+0xb8/0x180
[d1197f10] [c007b67c] sys_write+0x4c/0x8c
[d1197f40] [c0013cf4] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfef4034
LR = 0xfe97624
Mem-Info:
DMA per-cpu:
CPU 0: hi: 186, btch: 31 usd: 153
active_anon:8938 inactive_anon:9432 isolated_anon:0
active_file:11943 inactive_file:91997 isolated_file:0
unevictable:0 dirty:4317 writeback:3336 unstable:0
free:1044 slab_reclaimable:1160 slab_unreclaimable:1123
mapped:3207 shmem:411 pagetables:471 bounce:0
DMA free:4176kB min:2884kB low:3604kB high:4324kB active_anon:35752kB
inactive_anon:37728kB active_file:47772kB inactive_file:367988kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
mlocked:0kB dirty:17268kB writeback:13344kB mapped:12828kB shmem:1644kB
slab_reclaimable:4640kB slab_unreclaimable:4492kB kernel_stack:1040kB
pagetables:1884kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1034*4kB 5*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 4176kB
104353 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 779144kB
Total swap = 779144kB
131072 pages RAM
0 pages HighMem
3391 pages reserved
14353 pages shared
120982 pages non-shared
mconf: page allocation failure. order:1, mode:0x20
Call Trace:
[d1197cb0] [c0008a24] show_stack+0x4c/0x14c (unreliable)
[d1197cf0] [c0058154] __alloc_pages_nodemask+0x468/0x4b8
[d1197d80] [c0077448] cache_alloc_refill+0x2d8/0x55c
[d1197dd0] [c0077780] __kmalloc+0xb4/0x100
[d1197df0] [c02aa6bc] tty_buffer_request_room+0xcc/0x148
[d1197e10] [c02aa8f4] tty_insert_flip_string+0x2c/0xa0
[d1197e30] [c02ab678] pty_write+0x40/0x70
[d1197e50] [c02a6244] n_tty_write+0x260/0x3bc
[d1197eb0] [c02a3780] tty_write+0x1bc/0x260
[d1197ef0] [c007b4d4] vfs_write+0xb8/0x180
[d1197f10] [c007b67c] sys_write+0x4c/0x8c
[d1197f40] [c0013cf4] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfef4034
Thanx, Paul
------------------------------------------------------------------------
rcu: diagnostic patch for kordex
Make sure that the boot CPU has marked itself as online, print out
the running CPU's ID, and print a message from within rcu_init_percpu_data()
that includes the currently running CPU.
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> not for inclusion
---
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 4001833..4f357c4 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -175,6 +175,8 @@ void __init rcu_init(void)
* this is called early in boot, before either interrupts
* or the scheduler are operational.
*/
+ WARN_ON_ONCE(!cpu_online(smp_processor_id()));
+ printk(KERN_ALERT "rcu_init() smp_processor_id() = %d\n", smp_processor_id());
for_each_online_cpu(i)
rcu_barrier_cpu_hotplug(NULL, CPU_UP_PREPARE, (void *)(long)i);
}
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f3077c0..207125b 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1557,6 +1557,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
rdp->passed_quiesc = 0; /* We could be racing with new GP, */
rdp->qs_pending = 1; /* so set up to respond to current GP. */
rdp->beenonline = 1; /* We have now been online. */
+ printk(KERN_ALERT "rcu_init_percpu_data(%d) called\n", cpu);
rdp->preemptable = preemptable;
rdp->passed_quiesc_completed = lastcomp - 1;
rdp->qlen_last_fqs_check = 0;
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- References:
- Badness at kernel/rcutree.c:1228
- From: kordex -
- Re: Badness at kernel/rcutree.c:1228
- From: Paul E. McKenney
- Re: Badness at kernel/rcutree.c:1228
- From: kordex -
- Badness at kernel/rcutree.c:1228
- Prev by Date: Re: [RFC,PATCH 14/14] utrace core
- Next by Date: Re: [RFC] What are the goals for the architecture of an in-kernel IR system?
- Previous by thread: Re: Badness at kernel/rcutree.c:1228
- Next by thread: Re: Badness at kernel/rcutree.c:1228
- Index(es):
Relevant Pages
|