Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
- From: Nick Piggin <npiggin@xxxxxxx>
- Date: Tue, 20 Jan 2009 14:41:23 +0100
On Tue, Jan 20, 2009 at 01:45:00PM +0100, Ingo Molnar wrote:
* Nick Piggin <npiggin@xxxxxxx> wrote:
On Tue, Jan 20, 2009 at 12:26:34PM +0100, Ingo Molnar wrote:
* Nick Piggin <npiggin@xxxxxxx> wrote:
Hi,
I'm looking at regressions since 2.6.16, and one is that lat_mmap has
slowed down. On further investigation, a large part of this is not due to
a _regression_ as such, but to the introduction of CONFIG_PARAVIRT=y.
Now, it is true that lat_mmap is basically a microbenchmark; however, it
is exercising the memory mapping and page fault handler paths, so we're
talking about pretty important paths here. So I think it should be of
interest.
I've run the tests on a 2-socket, 8-core (2s8c) AMD Barcelona system, binding the test to
CPU0, and running 100 times (stddev is a bit hard to bring down, and my
scripts needed 100 runs in order to pick up much smaller changes in the
results -- for CONFIG_PARAVIRT, just a couple of runs should show up the
problem).
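The procedure described above (pin to one CPU, repeat 100 times, report average and standard deviation) could be scripted roughly as follows. This is only a minimal sketch, not the script actually used: `run_once` is a hypothetical callable that runs the benchmark once and returns its latency, and it is an assumption that the reported STD is the sample standard deviation.

```python
import os
import statistics


def pin_to_cpu(cpu=0):
    """Bind the current process (and children it spawns later) to one CPU.

    Linux-only: os.sched_setaffinity is not available on other platforms.
    Returns True if the affinity was set, False otherwise.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return False
    return True


def collect(run_once, runs=100):
    """Call run_once() `runs` times and return (mean, sample stddev)."""
    samples = [run_once() for _ in range(runs)]
    return statistics.mean(samples), statistics.stdev(samples)
```

Calling `pin_to_cpu(0)` once before `collect(...)` would mirror the CPU0 binding; how `run_once` invokes lat_mmap and parses its output is left open here.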
I believe lmbench reports these times in nanoseconds; either way, lower is better.
  non-pv:   AVG=464.22 STD=5.56
  paravirt: AVG=502.87 STD=7.36
Roughly an 8% performance drop here, which is quite a bit... hopefully
people are testing the speed of their PV implementations against non-PV
bare metal :)
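The slowdown implied by the two averages above, and how far apart they are relative to the run-to-run noise, can be checked with a few lines of arithmetic. The sketch below assumes the 100 runs per configuration are independent and that STD is the sample standard deviation; neither is stated in the thread.

```python
import math


def percent_slowdown(baseline, measured):
    """Relative increase of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic for the difference of two sample means."""
    std_err = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_b - mean_a) / std_err


# (AVG, STD, number of runs) as quoted above.
NON_PV = (464.22, 5.56, 100)
PARAVIRT = (502.87, 7.36, 100)

slowdown = percent_slowdown(NON_PV[0], PARAVIRT[0])
t_stat = welch_t(*NON_PV, *PARAVIRT)
print(f"slowdown: {slowdown:.1f}%  t: {t_stat:.1f}")
```

With these inputs the slowdown works out to about 8.3%, and the gap between the means is dozens of standard errors wide, so under the stated assumptions it is well outside the noise.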
Ouch, that looks unacceptably expensive. All the major distros turn
CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the express
promise to have no measurable runtime overhead.
( And I suspect the real-life mmap cost is probably even more expensive,
as on a Barcelona all of lmbench fits into the cache, hence we don't see
any real cache overhead. )
The PV kernel has over 100K larger text size, nearly 40K alone in mm/ and
kernel/. Definitely we don't see the worst of the icache or branch buffer
overhead on this microbenchmark. (wow, that's a nasty amount of bloat :( )
Jeremy, any ideas where this slowdown comes from and how it could be
fixed?
I had a bit of a poke around the profiles, but nothing stood out.
However oprofile counted 50% more cycles in the kernel with PV than with
non-PV. I'll have to take a look at the user/system times, because 50%
seems ludicrous.... hopefully it's just oprofile noise.
kbuild costs go up a bit (average of 30 builds):

elapsed
  non-pv: AVG=53.31s STD=0.99
  pv:     AVG=53.54s STD=0.94
user
  non-pv: AVG=318.63s STD=0.19
  pv:     AVG=319.33s STD=0.23
system
  non-pv: AVG=30.56s STD=0.15
  pv:     AVG=31.80s STD=0.15

The kernel side of the kbuild workload slows down by 4.1%. User time also
increases a bit (probably more cache and branch misses).
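For reference, the per-category changes follow directly from the kbuild averages quoted above. A small sketch of that arithmetic (the dictionary layout and function name are just for illustration):

```python
# (non-pv average, pv average) in seconds, 30 builds each, as quoted above.
KBUILD = {
    "elapsed": (53.31, 53.54),
    "user": (318.63, 319.33),
    "system": (30.56, 31.80),
}


def percent_change(non_pv, pv):
    """Relative change from the non-pv to the pv average, in percent."""
    return (pv - non_pv) / non_pv * 100.0


changes = {name: percent_change(*pair) for name, pair in KBUILD.items()}
for name, pct in changes.items():
    print(f"{name:8s} {pct:+.2f}%")
```

System time is where the cost concentrates (about 4.1%), while elapsed and user time each move by well under 1%.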
If you have a Core2 test-system could you please try tip/master, which
also has your do_page_fault-de-bloating patch applied?
Will try to get one to do some runs on.
Thanks,
Nick