Re: io-scheduler tuning for better read/write ratio



Wu Fengguang wrote:
On Fri, Jun 26, 2009 at 06:44:06PM +0800, Jens Axboe wrote:
On Fri, Jun 26 2009, Wu Fengguang wrote:
On Tue, Jun 23, 2009 at 03:42:46AM +0800, Jeff Moyer wrote:
Ralf Gross <rg@xxxxxxxxxxxxxxxxxxxxxxx> writes:

Jeff Moyer wrote:
Jeff Moyer <jmoyer@xxxxxxxxxx> writes:

Ralf Gross <rg@xxxxxxxxxxxxxxxxxxxxxxx> writes:

Casey Dahlin wrote:
On 06/16/2009 02:40 PM, Ralf Gross wrote:
David Newall wrote:
Ralf Gross wrote:
write throughput is much higher than the read throughput (40 MB/s
read, 90 MB/s write).

Hm, but I get higher read throughput (160-200 MB/s) if I don't write
to the device at the same time.

Ralf

How specifically are you testing? It could depend a lot on the
particular access patterns you're using to test.

I did the basic tests with tiobench. The real test is a test backup
(bacula) with two jobs that each create a 30 GB spool file on that
device. The jobs partially write to the device in parallel. Depending
on which spool file reaches 30 GB first, one job starts reading from
its file and writing to tape while the other is still spooling.

We are missing a lot of details here. I guess the first thing I'd try
would be bumping up the read_ahead_kb parameter, since I'm guessing
that your backup application isn't driving very deep queue depths. If
that doesn't work, then please provide the exact invocations of tiobench
that reproduce the problem, or some blktrace output for your real test.

Any news, Ralf?

Sorry for the delay. At the moment large backups are running and using
the raid device for spooling, so I can't do any tests.

Re. read-ahead: I tested different settings from 8 KB to 65 KB; this
didn't help.

I'll do some more tests when the backups are done (3-4 more days).

The default is 128KB, I believe, so it's strange that you would test
smaller values. ;) I would try something along the lines of 1 or 2 MB.
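
A minimal sketch of the read-ahead change suggested above. It assumes the
knob in question is the per-device sysfs file queue/read_ahead_kb; the
function name, the SYSFS_ROOT override and the device name md1 in the
usage note are illustrative and not taken from the thread.

```shell
#!/bin/sh
# Sketch only: set the read-ahead window (in KB) for one block device.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a scratch directory; on a real system it defaults to /sys.
set_read_ahead_kb() {
    dev=$1
    kb=$2
    knob="${SYSFS_ROOT:-/sys}/block/$dev/queue/read_ahead_kb"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (wrong device name, or not root?)" >&2
        return 1
    fi
    # Show the old value so it can be restored by hand later.
    echo "previous read_ahead_kb: $(cat "$knob")"
    echo "$kb" > "$knob"
}
```

Run as root, `set_read_ahead_kb md1 2048` would request a 2 MB read-ahead
on a device named md1.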

I'm CCing Fengguang in case he has any suggestions.

Jeff, thank you for the forwarding (and sorry for the long delay)!

The read:write (or rather sync:async) ratio control is an IO scheduler
feature. CFQ has the parameters slice_sync and slice_async for that.
What's more, CFQ makes async IO wait while any sync IO is in flight.
This is good, but not quite enough. Normally sync IOs arrive one by
one, with a small idle window in between. If we only started
dispatching async IO once the last sync IO had been complete for, e.g.,
1 ms, we could hold back the background async writes whenever there is
an active foreground sync read stream.

This simple patch aims to address the writes-push-aside-reads problem.
Ralf, you can try applying this patch and run your workload with this
(huge) CFQ parameter:

echo 1000 > /sys/block/sda/queue/iosched/slice_sync

The patch is based on 2.6.30, but can be trivially backported if you
want to use an older kernel.

It may impact overall (sync+async) IO throughput when there are one or
more ongoing sync IO streams, so it requires considerable benchmarking
and tuning.

Thanks,
Fengguang
---

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a55a9bd..14011b7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1064,7 +1064,6 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 	if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
 		return;

-	WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
 	WARN_ON(cfq_cfqq_slice_new(cfqq));

 	/*
@@ -2175,8 +2174,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 	 * or if we want to idle in case it has no pending requests.
 	 */
 	if (cfqd->active_queue == cfqq) {
-		const bool cfqq_empty = RB_EMPTY_ROOT(&cfqq->sort_list);
-
 		if (cfq_cfqq_slice_new(cfqq)) {
 			cfq_set_prio_slice(cfqd, cfqq);
 			cfq_clear_cfqq_slice_new(cfqq);
@@ -2190,8 +2187,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 		 */
 		if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
 			cfq_slice_expired(cfqd, 1);
-		else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
-			 sync && !rq_noidle(rq))
+		else if (sync && !rq_noidle(rq) &&
+			 !cfq_close_cooperator(cfqd, cfqq, 1))
 			cfq_arm_slice_timer(cfqd);
 	}

What's the purpose of this patch? If you have requests pending you don't
want to arm the idle timer and wait, you want to dispatch those.

You are right, please ignore this mindless hacking patch.

Ralf, you can do the read/write ratio in the CFQ scheduler by tuning
the slice_sync/slice_async parameters.

For example,

echo 10 > /sys/block/sda/queue/iosched/slice_async
echo 100 > /sys/block/sda/queue/iosched/slice_sync

gives

-dsk/total-
read writ
66M 25M
65M 20M
49M 32M
84M 19M
46M 28M
61M 23M
55M 25M
67M 23M
76M 18M
46M 31M
56M 29M
54M 23M
76M 20M


writing:

--dsk/md1--
_read _writ
0 150M
0 142M
0 143M
0 112M
0 141M
0 152M
0 132M
0 123M
0 149M


reading:

--dsk/md1--
_read _writ
143M 0
145M 0
160M 0
128M 0
148M 0
140M 0
158M 0
130M 0
122M 0

reading + writing:

--dsk/md1--
_read _writ
55M 76M
41M 83M
64M 81M
64M 83M
63M 68M
56M 117M
41M 61M
64M 87M
64M 69M
61M 87M
67M 81M
64M 33M
63M 68M
56M 76M



while

echo 10 > /sys/block/sda/queue/iosched/slice_async
echo 300 > /sys/block/sda/queue/iosched/slice_sync

gives

-dsk/total-
read writ
102M 11M
82M 10M
100M 12M
86M 10M
95M 11M
102M 3168k
96M 11M
88M 10M
96M 12M

However, too large a slice_sync value may not be desirable.
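
A small sketch of applying such a slice_sync/slice_async pair with a
safety check. It assumes the active scheduler is the bracketed name in
the device's queue/scheduler file and that both slice values are in
milliseconds; the function name and the SYSFS_ROOT override are
illustrative only.

```shell
#!/bin/sh
# Sketch only: apply a slice_sync/slice_async pair (milliseconds) to one
# device, but only when CFQ is the active scheduler for that device.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
set_cfq_slices() {
    dev=$1
    sync_ms=$2
    async_ms=$3
    queue="${SYSFS_ROOT:-/sys}/block/$dev/queue"
    # The active scheduler is the name shown in square brackets.
    if ! grep -q '\[cfq\]' "$queue/scheduler" 2>/dev/null; then
        echo "$dev: cfq is not the active scheduler" >&2
        return 1
    fi
    echo "$sync_ms"  > "$queue/iosched/slice_sync" &&
    echo "$async_ms" > "$queue/iosched/slice_async"
}
```

Run as root, `set_cfq_slices sda 300 10` corresponds to the second pair
of echo commands above. The values are not persistent across reboots.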

writing:

--dsk/md1--
_read _writ
0 131M
0 136M
0 145M
0 136M
0 128M
0 150M
0 127M
0 149M
0 127M
0 156M
0 125M
0 142M

reading:

--dsk/md1--
_read _writ
128M 0
160M 0
128M 0
128M 0
160M 0
128M 0
109M 0
128M 0
128M 0
160M 0
128M 0


writing:

--dsk/md1--
_read _writ
0 183M
0 142M
0 137M
0 147M
0 135M
0 147M
0 117M
0 135M
0 156M
0 120M
0 147M
0 135M

reading + writing:

--dsk/md1--
_read _writ
96M 40M
64M 38M
96M 29M
96M 24M
96M 31M
95M 35M
97M 26M
96M 23M
96M 33M
95M 73M
91M 25M


Thanks, this seems to be what I was looking for. I'll change the
scheduler parameters for all spool devices and run a backup with two
concurrent jobs. This will show me whether bacula behaves the same as
the simple dd test does.


Ralf