Re: [RFC] LZO de/compression support - take 6
- From: Daniel Hazelton <dhazelton@xxxxxxxxx>
- Date: Mon, 28 May 2007 16:52:09 -0400
On Monday 28 May 2007 16:18:40 Daniel Hazelton wrote:
On Monday 28 May 2007 13:11:15 Adrian Bunk wrote:
On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
On 5/28/07, Adrian Bunk <bunk@xxxxxxxxx> wrote:
...
- then ensure that it works correctly on all architectures and
Already tested on x86, amd64, ppc (by Bret). I do not have machines
from other archs available. Bret tested 'take 3' version but no
changes were introduced in further revisions that could affect
correctness - but still it will be good to have this version tested
too. Only with inclusion in -mm and testing by much wider user base
can make it to mainline (I suppose nobody uses -mm for production use
anyway).
document why your version is that much faster than the original
version and why you know your optimizations have no side effects
With likely(), unlikely() and noinline *not* defined as NOP's performance
drops:
10000 run averages:
'Tiny LZO':
Combined: 84.9292 usec
Compression: 42.4646 usec
Decompression: 42.4646 usec
'miniLZO':
Combined: 61.3548 usec
Compression: 43.5648 usec
Decompression: 17.79 usec
However, I'm worried that my testbed code - likely the Perl script that
actually loops the test code and collects its output - is somehow faulting,
as the way that the Compression and Decompression code have the exact same
value.
I'm going to toss some debugging output in the script and see if I can spot
the problem.
Okay, checked my code and it was all a problem with cpu_to_le16 getting
defined fully instead of just being a NOP. With that problem corrected the
performance returns:
10000 run averages:
'Tiny LZO':
Combined: 60.1028 usec
Compression: 42.0652 usec
Decompression: 18.0376 usec
'miniLZO':
Combined: 61.0932 usec
Compression: 43.4382 usec
Decompression: 17.655 usec
Combined average shows 'Tiny' to be 1.6% faster
Compression in 'Tiny' is 3.2% faster
Decompression in 'Tiny' is 2.2% slower
All in all, the trade-off in this code, with the overall performance of
the 'Tiny' code being faster than the stock miniLZO code I'd like to say that
I'm certain that the decompressor could probably be sped up more, although I
don't know of a place in the kernel where less than half a microsecond would
make a massive impact. (Only place I can think of where this might have a
negative is in SLAB/SLOB/SLUB, and I don't think that a low-level memory
manager like those is a place for a compression algorithm anyway)
Later today or tommorrow I'll start putting together another part to
this "test-bed" for stress-testing the algorithm. Currently I only have a few
ideas for tests:
(for benchmarking)
1) Random input to compressor
2) large input data
3) real-world data (mmap()'d chunk of /dev/mem as input)
(for stability checking)
4) deliberate corruption of compressed data
5) early finish (realloc() compressed data buffer to less than the full size
of the data)
6) late start (change pointer to start of compressed data so the decompressor
doesn't start at the beginning)
When I have a better idea of how the LZO algorithm works I might try creating
an input data-set for the decompressor that would deliberately overflow the
output buffer. This is supposed to be caught by bounds-checking in
the '_safe' version of the decompressor (the version used in 'Tiny' and
used in the benchmarking of miniLZO), however it might be possible to trick
the decompressor. (If I do manage this and it crashes both 'Tiny'
and 'miniLZO' I *will* report it to Herr Oberhumer as well as see if I can
come up with a quick patch for the problem).
As I said before, any suggestions for how to improve the benchmark code are
welcome, as are suggestions for tests I can add to it for stressing the
new 'TinyLZO' implementation. Latest version of the test-bed attached.
Significant changes include:
likely(), unlikely() and noinline are no longer NOP's
when the marked line in helpers.h is commented out, cpu_to_le16 functions as
expected and is no longer a NOP. (Yes, that means this code is now fully
capable of working correctly on a BE system).
DRH
Attachment:
lzo1x-test-bed-5.tar.bz2
Description: application/tbz
- References:
- [RFC] LZO de/compression support - take 6
- From: Nitin Gupta
- Re: [RFC] LZO de/compression support - take 6
- From: Adrian Bunk
- Re: [RFC] LZO de/compression support - take 6
- From: Daniel Hazelton
- [RFC] LZO de/compression support - take 6
- Prev by Date: Re: [RFC][PATCH -mm 3/3] PM: Disable _request_firmware before hibernation/suspend
- Next by Date: Re: stuff ready to be deleted?
- Previous by thread: Re: [RFC] LZO de/compression support - take 6
- Next by thread: Re: [RFC] LZO de/compression support - take 6
- Index(es):
Relevant Pages
|
Loading