Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
- Date: Wed, 28 May 2008 03:30:26 -0700
On Wed, May 28, 2008 at 12:07:21PM +0200, Jens Axboe wrote:
On Tue, May 27 2008, Paul E. McKenney wrote:
On Tue, May 27, 2008 at 03:35:10PM +0200, Jens Axboe wrote:
On Tue, May 27 2008, Alexey Dobriyan wrote:
On Sat, May 10, 2008 at 02:37:19PM +0400, Alexey Dobriyan wrote:
@@ -41,8 +41,8 @@ int put_io_context(struct io_context *ioc)
rcu_read_lock();
if (ioc->aic && ioc->aic->dtor)
ioc->aic->dtor(ioc->aic);
- rcu_read_unlock();
cfq_dtor(ioc);
+ rcu_read_unlock();
kmem_cache_free(iocontext_cachep, ioc);
return 1;
This helps in sense that 3 times bulk cross-compiles finish to the end.
You'll hear me if another such oops will resurface.
Still looking good?
Yup!
And this with patch in mainline, again with PREEMPT_RCU.
Ping, this happened again with 2.6.26-rc4 and PREEMPT_RCU.
Worrisome... Paul, would you mind taking a quick look at cfq
and see if you can detect why breaks with preempt rcu? It's
clearly a use-after-free symptom, but I don't see how it can
happen.
Some quick and probably off-the-mark questions...
Thanks!
Glad it actually was of help! ;-)
o What is the purpose of __call_for_each_cic()? When called
from call_for_each_cic(), it is under rcu_read_lock(), as
required, but it is also called from cfq_free_io_context(),
which is assigned to the ->dtor and ->exit members of the
cfq_io_context struct. What protects calls through these
members?
(This is for the ->cic_list field of the cfq_io_context structure.
One possibility is that the io_context's ->lock member is held,
but I don't see this. Not that I looked all that hard...)
My suggestion would be to simply change all invocations of
__call_for_each_cic() to instead invoke call_for_each_cic().
The rcu_read_lock()/rcu_read_unlock() pair is pretty
lightweight, even in CONFIG_PREEMPT_RCU.
__call_for_each_cic() is always called under rcu_read_lock(), it merely
exists to avoid a double rcu_read_lock(). Even if it is cheap. The
convention follows the usual __lock_is_already_held() double under
score, but I guess it could do with a comment! There are only two
callers of the function, call_for_each_cic() which does the
rcu_read_lock(), and cfq_free_io_context() which is called from ->dtor
(and holds the rcu_read_lock() and ->trim which actually does not. That
looks like it could be problematic, but it's only called when an io
scheduler module is removed so not really critical. I'll add it, though!
Actually, the task_lock() should be enough there. So no bug, but (again)
it could do with a comment.
Sounds good!
o When calling cfq_slab_kill(), for example from cfq_exit(),
what ensures that all previous RCU callbacks have completed?
I suspect that you need an rcu_barrier() at the beginning
of cfq_slab_kill(), but I could be missing something.
So we have two callers of that, one is from the error path at init time
and is obviously ok. The other does need rcu_barrier()! I'll add that.
OK, that does make my brain hurt less. ;-)
o Updates to the ->ioc_data field of the cfq_io_context
seem to be protected by the request_queue ->queue_lock
field. This seems very strange to me. It is OK if every
cfq_io_context is associated with only one request_queue
structure -- is this the case?
->ioc_data is part of the io_context, not cfq_io_context. And it can be
shared now, so the correct locking for that would be ioc->lock and not
the queue lock. __cfq_exit_single_io_context() is serialized in the
sense that only one process gets to call the exit path.
Makes sense to me!
o What protects the first rcu_dereference() in cfq_cic_lookup()?
There needs to be either an enclose rcu_read_lock() on the
one hand or the ->queue_lock needs to be held.
(My guess is the latter, given the later rcu_assign_pointer()
in this same function, but I don't see a lock acquisition
in the immediate vicinity -- might be further up the function
call stack, though.)
There's no locking going into that function when coming from
cfq_get_io_context(), the other paths (happen) to hold the queue lock
already though.
So the call from cfq_get_io_context() needs an rcu_read_lock()?
Not seeing this in the patch below, but maybe you have it up a
function-call level or two?
o Why is there no grace period associated with the ioc_data?
For example, what happens to the old value of ->ioc_data
after the rcu_assign_pointer() in cfq_cic_lookup()? Readers
might still be referencing the old version, right? If so,
how do we avoid messing them up?
Or are we somehow leveraging the call_rcu() in cfq_cic_free()?
The data belonging to ->ioc_data (the cic, or per-process per-queue
context) is only freed through call_rcu().
Ah, OK, got it.
Any of this at all helpful?
Very, perhaps with a few more rounds we can find some more bugs :-). I'm
attaching a patch below, how does that look?
Looks much improved! Very interested to hear how it does with the testing.
Thanx, Paul
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c--
index 4df3f05..75db529 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1142,6 +1142,9 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
kmem_cache_free(cfq_pool, cfqq);
}
+/*
+ * Must always be called with the rcu_read_lock() held
+ */
static void
__call_for_each_cic(struct io_context *ioc,
void (*func)(struct io_context *, struct cfq_io_context *))
@@ -1197,6 +1200,11 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
cfq_cic_free(cic);
}
+/*
+ * Must be called with rcu_read_lock() held or preemption otherwise disabled.
+ * Only two callers of this - ->dtor() which is called with the rcu_read_lock(),
+ * and ->trim() which is called with the task lock held
+ */
static void cfq_free_io_context(struct io_context *ioc)
{
/*
@@ -1502,20 +1510,24 @@ static struct cfq_io_context *
cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
{
struct cfq_io_context *cic;
+ unsigned long flags;
void *k;
if (unlikely(!ioc))
return NULL;
+ rcu_read_lock();
+
/*
* we maintain a last-hit cache, to avoid browsing over the tree
*/
cic = rcu_dereference(ioc->ioc_data);
- if (cic && cic->key == cfqd)
+ if (cic && cic->key == cfqd) {
+ rcu_read_unlock();
return cic;
+ }
do {
- rcu_read_lock();
cic = radix_tree_lookup(&ioc->radix_root, (unsigned long) cfqd);
rcu_read_unlock();
if (!cic)
@@ -1524,10 +1536,13 @@ cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
k = cic->key;
if (unlikely(!k)) {
cfq_drop_dead_cic(cfqd, ioc, cic);
+ rcu_read_lock();
continue;
}
+ spin_lock_irqsave(&ioc->lock, flags);
rcu_assign_pointer(ioc->ioc_data, cic);
+ spin_unlock_irqrestore(&ioc->lock, flags);
break;
} while (1);
@@ -2134,6 +2149,11 @@ static void *cfq_init_queue(struct request_queue *q)
static void cfq_slab_kill(void)
{
+ /*
+ * Make sure that all existing RCU callbacks have been processed
+ */
+ rcu_barrier();
+
if (cfq_pool)
kmem_cache_destroy(cfq_pool);
if (cfq_ioc_pool)
--
Jens Axboe
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- From: Jens Axboe
- Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- References:
- Re: 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50
- From: Jens Axboe
- Re: 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50
- From: Alexey Dobriyan
- 2.6.25-$sha1: RIP __call_for_each_cic+0x20/0x50
- From: Alexey Dobriyan
- 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- From: Alexey Dobriyan
- Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- From: Jens Axboe
- Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- From: Paul E. McKenney
- Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- From: Jens Axboe
- Re: 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50
- Prev by Date: Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- Next by Date: Re: Wired behaviour with IPv6 over PPP
- Previous by thread: Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- Next by thread: Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
- Index(es):
Relevant Pages
|