Re: [RFC/PATCH] Optimize zone allocator synchronization



Don Porter wrote:
From: Donald E. Porter <porterde@xxxxxxxxxxxxx>

In the bulk page allocation/free routines in mm/page_alloc.c, the zone
lock is held across all iterations. For certain parallel workloads, I
have found that releasing and reacquiring the lock for each iteration
yields better performance, especially at higher CPU counts. For
instance, kernel compilation is sped up by 5% on an 8-CPU test
machine. In most cases, there is no significant effect on performance
(although the effect tends to be slightly positive). This seems quite
reasonable for the very small scope of the change.

My intuition is that this patch prevents smaller requests from waiting
on larger ones. While grabbing and releasing the lock within the loop
adds a few instructions, it can lower the latency for a particular
thread's allocation which is often on the thread's critical path.
Lowering the average latency for allocation can increase system throughput.
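
To make the two locking patterns concrete, here is a minimal user-space sketch using pthreads. It is not the patch itself and not the code in mm/page_alloc.c: struct toy_zone, alloc_bulk_coarse() and alloc_bulk_fine() are hypothetical names, and a simple counter stands in for the zone's free lists. The first function holds the lock across the whole batch; the second drops and retakes it on every iteration, so a waiter with a small request can get in between iterations.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: one lock guarding a free-page counter. */
struct toy_zone {
    pthread_mutex_t lock;
    size_t free_pages;
};

/* Pattern A: the lock is held across all iterations of the batch. */
size_t alloc_bulk_coarse(struct toy_zone *z, size_t count)
{
    size_t got = 0;

    assert(z != NULL);
    pthread_mutex_lock(&z->lock);
    while (got < count && z->free_pages > 0) {
        z->free_pages--;        /* "take one page" */
        got++;
    }
    pthread_mutex_unlock(&z->lock);
    return got;
}

/* Pattern B: the lock is released and reacquired for each iteration. */
size_t alloc_bulk_fine(struct toy_zone *z, size_t count)
{
    size_t got = 0;

    assert(z != NULL);
    while (got < count) {
        int ok = 0;

        pthread_mutex_lock(&z->lock);
        if (z->free_pages > 0) {
            z->free_pages--;    /* "take one page" */
            ok = 1;
        }
        pthread_mutex_unlock(&z->lock);  /* other threads may run here */

        if (!ok)
            break;              /* pool is empty, stop early */
        got++;
    }
    return got;
}
```

Both functions return the number of pages actually taken. The only difference is how long each lock hold lasts: pattern B pays for extra lock and unlock operations in exchange for shorter hold times.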

More detailed information, including data from the tests I ran to
validate this change, is available at
http://www.cs.utexas.edu/~porterde/kernel-patch.html .

Thanks in advance for your consideration and feedback.

That's an interesting insight. My intuition is that Nick Piggin's recently-posted ticket spinlocks patches[1] will reduce the need for this patch, though it may be useful to have both. Can you benchmark again with only ticket spinlocks, and with ticket spinlocks + this patch? You'll probably want to use 2.6.24-rc1 as your baseline, due to the x86 architecture merge.

-- Chris

[1] http://lkml.org/lkml/2007/11/1/123
