Re: 2.6.18-mm2 - oops in cache_alloc_refill()



On Fri, 29 Sep 2006 12:45:58 PDT, Andrew Morton said:

(Adding a bunch of people to the cc: list now that I have a clue what is
going on....)

I'd expect it's the same bug - slab data structures have gone bad.

*bing*! We have a winner. A quick check showed the kernel wasn't built with
slab debugging enabled, so I turned on the more obvious options, and got
rewarded with a traceback..

Again: how come nobody else is hitting this? Something's different.

gkrellm and wireless (specifically, gkrellm-wifi-0.9.12-3.fc6 from Fedora
Core extras-development). Kernel is still a 2.6.18 with *only* the
origin.patch from -mm2 applied. Note that the gkrellm plugin hasn't had
a change in the code since 01/03/2004 - hopefully there's been no unintentional
API change on the kernel side since then...

Here's the traceback I got:

slab error in verify_redzone_free(): cache `size-32': memory outside object was overwritten
[<c0103ad2>] dump_trace+0x64/0x1cd
[<c0103c4d>] show_trace_log_lvl+0x12/0x25
[<c010415f>] show_trace+0xd/0x10
[<c01041fc>] dump_stack+0x19/0x1b
[<c014c796>] __slab_error+0x17/0x1c
[<c014cdac>] cache_free_debugcheck+0xaf/0x230
[<c014d43e>] kfree+0x59/0x8c
[<c02dc04a>] ioctl_standard_call+0x1da/0x218
[<c02dc275>] wireless_process_ioctl+0x55/0x312
[<c02d3750>] dev_ioctl+0x45f/0x49a
[<c02c92aa>] sock_ioctl+0x1b3/0x1c6
[<c0160322>] do_ioctl+0x22/0x67
[<c01605a5>] vfs_ioctl+0x23e/0x251
[<c01605ff>] sys_ioctl+0x47/0x64
[<c0102cd3>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb

Leftover inexact backtrace:

=======================
de57e16c: redzone 1:0x170fc2a5, redzone 2:0x170fc200.

Repeated, over and over, just about once a second.

A quick strace of gkrellm finds these likely ioctl's causing the problem:

% grep ioctl /tmp/foo2 | sort -u | more
ioctl(13, SIOCGIWESSID, 0xbfbcdb9c) = 0
ioctl(13, SIOCGIWRANGE, 0xbfbcdbdc) = 0
ioctl(13, SIOCGIWRATE, 0xbfbcdbbc) = 0

Since I'm using an orinoco-based card, these 2 look like the most likely
candidates. WE-21 was merged between -mm1 and -mm2, which is why -mm1 was
stable for me. I'll let somebody else argue over what path these took that
I never tripped over them in an earlier -mm before they hit Linus's tree...

commit baef186519c69b11cf7e48c26e75feb1e6173baa
Author: John W. Linville <linville@xxxxxxxxxxxxx>
Date: Fri Sep 8 16:04:05 2006 -0400

[PATCH] WE-21 support (core API)

This is version 21 of the Wireless Extensions. Changelog :
o finishes migrating the ESSID API (remove the +1)
o netdev->get_wireless_stats is no more
o long/short retry

This is a redacted version of a patch originally submitted by Jean
Tourrilhes. I removed most of the additions, in order to minimize
future support requirements for nl80211 (or other WE successor).

CC: Jean Tourrilhes <jt@xxxxxxxxxx>
Signed-off-by: John W. Linville <linville@xxxxxxxxxxxxx>

commit eeec9f1a931262d69811135092c8447d6dccc3e6
Author: Jean Tourrilhes <jt@xxxxxxxxxx>
Date: Tue Aug 29 18:02:31 2006 -0700

[PATCH] WE-21 for orinoco

Signed-off-by: Jean Tourrilhes <jt@xxxxxxxxxx>
Signed-off-by: John W. Linville <linville@xxxxxxxxxxxxx>



Attachment: pgpBmU8mBZUAY.pgp
Description: PGP signature



Relevant Pages

  • Re: RT patch acceptance
    ... judge the complexity of a design for that type of system. ... claim that you cannot judge the complexity of a kernel modification. ... Since the patch in question doesn't actually need that information to ... nanokernel's API up to date with additions to Linux's API that RT people ...
    (Linux-Kernel)
  • Re: log-buf-len dynamic
    ... I have the patch and can live with one ... > different than the one that you compiled the kernel. ... an obvious "fix" which breaks someone's setup. ... So you should prefer the dynamic API too. ...
    (Linux-Kernel)
  • Re: [RFC][PATCH] New message-logging API (kprint)
    ... posted to the LKML and described in Documentation/kprint.txt (see patch). ... The main purpose of this change is provide a unified logging API to the ... I started this thread by posting an idea I had for shrinking the kernel by ...
    (Linux-Kernel)
  • Re: This is [Re:] How to improve the quality of the kernel[?].
    ... The -mm kernel already implements what your proposed PTS would do. ... If patch have no TS ID, ... Thus i can apply for example lguest patches and implement and test new ... How many open source projects use Bugzilla and how many use the Debian BTS? ...
    (Linux-Kernel)
  • [RFC] Making percpu module variables have their own memory.
    ... Someone using the -rt patch found that one of the tracing options caused ... 64K for every CPU to cover all the per_cpu variables used in the kernel ... static void wakeup_softirqd_prio ...
    (Linux-Kernel)