Re: [rfc] optimise unlock_page



On Wed, 9 May 2007, Nick Piggin wrote:
On Wed, May 09, 2007 at 12:41:24AM +0200, Nick Piggin wrote:
On Wed, May 09, 2007 at 07:30:27AM +1000, Benjamin Herrenschmidt wrote:

Waking them all would fix it but at the risk of causing other
problems... Maybe PG_waiters need to actually be a counter but if that
is the case, then it complicates things even more.

It will wake up 1 exclusive waiter, but no limit on non exclusive waiters.
Hmm, but it won't wake up waiters behind the exclusive guy... maybe the
wake up code can check whether the waitqueue is still active after the
wakeup, and set PG_waiters again in that case?

Hm, I don't know if we can do that without a race either...

OTOH, waking all non exclusive waiters may not be a really bad idea.

Not good enough, I'm afraid. It looks like Ben's right and you need
a count - and counts in the page struct are a lot harder to add than
page flags.

I've now played around with the hangs on my three 4CPU machines
(all of them in io_schedule below __lock_page, waiting on pages
which were neither PG_locked nor PG_waiters when I looked).

Seeing Ben's mail, I thought the answer would be just to remove
the "_exclusive" from your three prepare_to_wait_exclusive()s.
That helped, but it didn't eliminate the hangs.

After fiddling around with different ideas for some while, I came
to realize that the ClearPageWaiters (in very misleadingly named
__unlock_page) is hopeless. It's just so easy for it to clear the
PG_waiters that a third task relies upon for wakeup (and which
cannot loop around to set it again, because it simply won't be
woken by unlock_page/__unlock_page without it already being set).

Below is the patch I've applied to see some tests actually running
with your patches, but it's just a joke: absurdly racy and
presumptuous in itself (the "3" stands for us and the cache and one
waiter; I deleted the neighbouring mb and comment, not because I
disagree, but because it's ridiculous to pay so much attention to
such unlikely races when there's much worse nearby). Though I've
not checked: if I've got the counting wrong, then maybe all my
pages are left marked PG_waiters by now.

(I did imagine we could go back to prepare_to_wait_exclusive
once I'd put in the page_count test before ClearPageWaiters;
but apparently not, that still hung.)

My intention had been to apply the patches to what I tested before
with lmbench, to get comparative numbers; but I don't think this
is worth the time, it's too far from being a real solution.

I was puzzled as to how you came up with any performance numbers
yourself, when I could hardly boot. I see you mentioned 2CPU G5,
I guess you need a CPU or two more; or maybe it's that you didn't
watch what happened as it booted, often those hangs recover later.

Hugh

--- a/mm/filemap.c 2007-05-08 20:17:31.000000000 +0100
+++ b/mm/filemap.c 2007-05-09 19:14:03.000000000 +0100
@@ -517,13 +517,8 @@ EXPORT_SYMBOL(wait_on_page_bit);
*/
void fastcall __unlock_page(struct page *page)
{
- ClearPageWaiters(page);
- /*
- * The mb is necessary to enforce ordering between the clear_bit and
- * the read of the waitqueue (to avoid SMP races with a parallel
- * wait_on_page_locked()
- */
- smp_mb__after_clear_bit();
+ if (page_count(page) <= 3 + page_has_buffers(page)+page_mapcount(page))
+ ClearPageWaiters(page);
wake_up_page(page, PG_locked);
}
EXPORT_SYMBOL(__unlock_page);
@@ -558,7 +553,7 @@ void fastcall __lock_page(struct page *p
DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

do {
- prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
SetPageWaiters(page);
if (likely(PageLocked(page)))
sync_page(page);
@@ -577,7 +572,7 @@ void fastcall __lock_page_nosync(struct
DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

do {
- prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
SetPageWaiters(page);
if (likely(PageLocked(page)))
io_schedule();
@@ -591,7 +586,7 @@ void fastcall __wait_on_page_locked(stru
DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

do {
- prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
SetPageWaiters(page);
if (likely(PageLocked(page)))
sync_page(page);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/