Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- From: Catalin Marinas <catalin.marinas@xxxxxxx>
- Date: Thu, 04 Dec 2008 12:14:26 +0000
Hi Paul,
On Wed, 2008-12-03 at 10:12 -0800, Paul E. McKenney wrote:
On Thu, Nov 20, 2008 at 11:30:34AM +0000, Catalin Marinas wrote:
This patch adds the base support for the kernel memory leak
detector. It traces the memory allocation/freeing in a way similar to
the Boehm's conservative garbage collector, the difference being that
the unreferenced objects are not freed but only shown in
/sys/kernel/debug/memleak. Enabling this feature introduces an
overhead to memory allocations.
I have some concerns about your locking/RCU design, please see
interspersed questions and comments.
Thanks for reviewing the locking. This was probably the hardest part of
kmemleak. It had several incarnations and slowly got towards a finer
grained locking.
I'll comment below as well but I'll first show my logic which I added as
comment at the top of the memleak.c file:
* The following locks are used by kmemleak:
*
* - object_list_lock (spinlock): protects the object_list. This is the main
* list holding the metadata (struct memleak_object) for the allocated
* objects. The memleak_object structures are added to the list in the
* create_object() function called from the memleak_alloc() callback. They
* are removed from the list in put_object() if the object->use_count is 0
* - object_tree_lock (rwlock): protects the object_tree_root. When the
* metadata is created in create_object(), it is added to the object prio
* search tree and removed in delete_object() with this lock held
* (write_lock). This lock is also acquired (read_lock) in
* find_and_get_object() when an object needs to be looked up by a pointer
* value (either during scanning or when changing its properties like
* marking as false positive)
* - memleak_object.lock (spinlock): protects a memleak_object. Modifications
* of the metadata (e.g. count) are protected by this lock. Note that some
* members of this structure may be protected by other means (atomic or
* object_list lock). This lock is also held when scanning the corresponding
* object to avoid the kernel freeing it via the memleak_free() callback.
* This is less heavyweight than holding a global lock like object_list_lock
* during scanning
*
* The only mutex used is scan_mutex. This ensures that only one thread may
* scan the memory for unreferenced objects at a time. The gray_list contains
* the objects which are already referenced or marked as false positives and
* need to be scanned. This list is only modified during a scanning episode
* when the scan_mutex is held. At the end of a scan, the gray_list is always
* empty. Note that the memleak_object.use_count is incremented when an object
* is added to the gray_list and therefore cannot be freed.
*
* Freeing a memleak_object is done via an RCU callback invoked from
* put_object() when its use_count is 0 and after removing it from the
* object_list. One of the reasons for the RCU is to delay the freeing and
* avoid a recursive call into the allocator via kmem_cache_free(). Another
* reason is to allow lock-less object_list traversal during memleak_scan().
+static void free_object_rcu(struct rcu_head *rcu)
+{
+ struct hlist_node *elem, *tmp;
+ struct memleak_scan_area *area;
+ struct memleak_object *object =
+ container_of(rcu, struct memleak_object, rcu);
+
+ /* once use_count is 0, there is no code accessing the object */
OK, so we won't pass free_object_rcu() to call_rcu() until use_count
is equal to zero. And once use_count is zero, it is never incremented.
So the point of the RCU grace period is to ensure that all tasks
who didn't quite call get_object() soon enough get done failing
before we free up the object, correct?
Which means that get_object() needs to be under rcu_read_lock()...
My view here is that if use_count is 0, no other thread would be able to
use this object. It will also be removed from the object_list and hence
no other way to get this this object. I think it would have worked
pretty much OK without the RCU *but* the main reason for the grace
period is to avoid a recursive call into kmem_cache_free() since
put_object() is quite likely called via kmem_cache_free() ->
memleak_free() -> delete_object(). The alternative is to use a work
queue or something else but the RCU already had the infrastructure and I
also gained a lock-less object_list traversal in memleak_scan().
+static void put_object(struct memleak_object *object)
+{
+ unsigned long flags;
+
+ if (!atomic_dec_and_test(&object->use_count))
+ return;
+
+ /* should only get here after delete_object was called */
+ BUG_ON(object->flags & OBJECT_ALLOCATED);
+
+ spin_lock_irqsave(&object_list_lock, flags);
We also need to write-hold the object_tree_lock, not?
Not here, the memleak_object is removed from the object_tree in the
delete_object() function (via from the memleak_free callback). If it is
in the object_tree, it should have a use_count >= 1.
+ /* the last reference to this object */
+ list_del_rcu(&object->object_list);
+ call_rcu(&object->rcu, free_object_rcu);
+ spin_unlock_irqrestore(&object_list_lock, flags);
+}
+
+static struct memleak_object *find_and_get_object(unsigned long ptr, int alias)
+{
+ unsigned long flags;
+ struct memleak_object *object;
+
+ read_lock_irqsave(&object_tree_lock, flags);
Use of read_lock_irqsave() will work, but only if -all- deletions are
protected by write-acquiring object_tree_lock.
Yes, deletions (in delete_object) are protected by
write_lock(object_tree_lock).
Or unless there is an enclosing rcu_read_lock() around all calls to
find_and_get_object().
Not needed in my opinion since if it is in the object_tree, it has
use_count >= 1 anyway and the RCU callback for freeing won't be invoked.
[...]+static inline void create_object(unsigned long ptr, size_t size, int ref_count)
+ write_lock_irqsave(&object_tree_lock, flags);
+ node = prio_tree_insert(&object_tree_root, &object->tree_node);
+ if (node != &object->tree_node) {
+ unsigned long flags;
+
+ pr_warning("kmemleak: existing pointer\n");
+ dump_stack();
+
+ object = lookup_object(ptr, 1);
+ spin_lock_irqsave(&object->lock, flags);
+ dump_object_info(object);
+ spin_unlock_irqrestore(&object->lock, flags);
+
+ panic("kmemleak: cannot insert 0x%lx into the object search tree\n",
+ ptr);
+ }
+ write_unlock_irqrestore(&object_tree_lock, flags);
In theory, you are OK here, but only because the below is adding
an element (not removing it). But this code is looking pretty dodgy.
What exactly does object_tree_lock protect? What exactly does
object_list protect?
See my explanation at the beginning. I separated the locking for the
object_list and the object_tree_root (prio search tree). I think I could
have held the object_tree_lock only for the prio_tree_insert() call.
[...]+static void memleak_scan(void)
+ /* reset the reference count (whiten the object) */
+ object->count = 0;
+ if (color_gray(object) && get_object(object))
+ list_add_tail(&object->gray_list, &gray_list);
What prevents other tasks from concurrently adding other objects to
gray_list? Preventing such concurrent adds is absolutely required,
otherwise gray_list will be corrupted.
There is scan_mutex for this.
+static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ struct list_head *n;
+ struct memleak_object *next = NULL;
+ unsigned long flags;
+
+ ++(*pos);
+ if (reported_leaks >= REPORTS_NR)
+ goto out;
+
+ spin_lock_irqsave(&object_list_lock, flags);
Using a spinlock instead of rcu_read_lock() is OK, but only if all
updates are also protected by this same spinlock. Which means that,
given that find_and_get_object read-acquires object_tree_lock, deletions
must be projected both by object_list_lock and by write-acquiring
object_tree_lock. Or all calls to memleak_seq_next need to be covered
by rcu_read_lock().
The spin_lock here is only to retrieve the next object in the list but I
agree that even if the object_list modifications are protected by
object_list_lock, put_object could actually set the use_count to 0 and
get_object return in this function isn't checked. If get_object returns
successfully, I don't think an rcu_read_lock() is needed since
put_object() can no longer invoke free_object_rcu().
Thanks for your time.
--
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- From: Paul E. McKenney
- Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- References:
- Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- From: Paul E. McKenney
- Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- Prev by Date: Re: XFS breakage on m68k (was: Re: linux-next: Tree for December 3)
- Next by Date: Re: [RFC Patch 3/9] Modifying generic debug exception to use virtual debug registers
- Previous by thread: Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- Next by thread: Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
- Index(es):
Relevant Pages
|