Re: [PATCH 2.6.18-rc4 00/10] Kernel memory leak detector 0.9



On 13/08/06, Michal Piotrowski <michal.k.k.piotrowski@xxxxxxxxx> wrote:
Can you look at this?
=======================================================
[ INFO: possible circular locking dependency detected ]
-------------------------------------------------------
events/0/8 is trying to acquire lock:
(old_style_spin_init){++..}, at: [<c017674f>] memleak_free+0x95/0x157

but task is already holding lock:
(&parent->list_lock){++..}, at: [<c0174f29>] drain_array+0x49/0xad

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&parent->list_lock){++..}:
[<c0140cc7>] check_prevs_add+0x4d/0xaf
[<c01423c1>] __lock_acquire+0x7b1/0x814
[<c01429bc>] lock_acquire+0x5e/0x7e
[<c02f9f7a>] _spin_lock+0x23/0x2f
[<c0174058>] cache_alloc_refill+0x76/0x1d2
[<c0174559>] kmem_cache_alloc+0x73/0xce
[<c01f0c8a>] radix_tree_node_alloc+0x1a/0x51
[<c01f0e3f>] radix_tree_insert+0x51/0xfb
[<c01761f6>] insert_alias+0x85/0xe8
[<c01762a4>] memleak_insert_aliases+0x4b/0xa6
[<c01773f7>] memleak_init+0x44/0xf5
[<c0100ab0>] start_kernel+0x17e/0x1f9
[<c0100210>] 0xc0100210
-> #0 (old_style_spin_init){++..}:
[<c0140cc7>] check_prevs_add+0x4d/0xaf
[<c01423c1>] __lock_acquire+0x7b1/0x814
[<c01429bc>] lock_acquire+0x5e/0x7e
[<c02f9f7a>] _spin_lock+0x23/0x2f
[<c017674f>] memleak_free+0x95/0x157
[<c0174a74>] kmem_cache_free+0x62/0xbc
[<c0172fc8>] slab_destroy+0x48/0x4d
[<c01743b8>] free_block+0xc9/0x101
[<c0174f65>] drain_array+0x85/0xad
[<c017500d>] cache_reap+0x80/0xfe
[<c01394dd>] run_workqueue+0x88/0xc4
[<c0139617>] worker_thread+0xfe/0x131
[<c013c6e1>] kthread+0x82/0xaa
[<c01044c9>] kernel_thread_helper+0x5/0xb

I don't think I fully understand the slab locking, maybe some other
kernel guys could help with this, but lockdep is probably right (the
lock could happen on SMP systems).

It looks like lockdep complains that memleak_lock is acquired while
list_lock is held and there is a possibility that (on a different CPU)
memleak_lock was acquired first, followed by list_lock acquiring and
therefore a deadlock.

The kmemleak+slab locking is a bit complicated because memleak itself
needs to allocate memory and avoid recursive calls to it (the
pointer_cache and the radix_tree allocations). The kmemleak-related
allocations are not tracked by kmemleak.

I see the following solutions:

1. acquire the memleak_lock at the beginning of an alloc/free function
and release it when finished while allowing recursive/nested calls
(and only call the memleak_* functions during the outermost lock).
This would mean ignoring the off-slab management allocations as these
would lead to deadlock because of the recursive call into kmemleak.
This locking should be placed around cache_reap() as well (actually,
around all the entry points in the mm/slab.c file).

2. do an independent (simple) memory management in kmemleak and
probably replace radix_tree with prio_tree as the latter doesn't seem
to require allocations.

The first option is simple to implement but it has the disadvantage of
serializing the slab calls on SMP and also not tracking the mm/slab.c
allocations. The second one would provide full coverage of the kernel
slab allocations but it is probably more difficult to implement.

Any thoughts/suggestions on this?

Thanks.

--
Catalin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: D2006 Hyperthreading Win32
    ... > Critical sections are very often used in applications and the Win32 ... most Memory Managers (including FastMM - which is the ... locks for small, medium, and large allocations. ... to obtain a lock for the size it needs, it can try to obtain a lock for ...
    (borland.public.delphi.non-technical)
  • Re: D2006 Hyperthreading Win32
    ... locks for small, medium, and large allocations. ... If a request is unable to obtain a lock for the size it needs, it can try to obtain a lock for a larger size before waiting. ...
    (borland.public.delphi.non-technical)
  • Re: Multithreaded GC!
    ... will share a heap across multiple threads, ... with "synchronize their allocations" though. ... synchronized using a lock. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: using clustered index to optimize inserts ...
    ... I will try to explain locking in terms of Sybase docs... ... Allpages Locking: Allpages locking locks both data pages and index ... the data page is locked with an exclusive lock. ... Clustered Index: The datarows will be arranged as per the clustered ...
    (comp.databases.sybase)
  • Re: [PATCH 17/18] fs: icache remove inode_lock
    ... If you understand inode locking today, ... can understand the inode scaling series quite easily. ... filesystems to lock down the object without taking a global ... Per-zone is problematic. ...
    (Linux-Kernel)