Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- From: Wu Fengguang <fengguang.wu@xxxxxxxxx>
- Date: Wed, 9 Sep 2009 17:29:01 +0800
On Tue, Sep 08, 2009 at 12:29:36PM -0400, Chris Mason wrote:
On Tue, Sep 08, 2009 at 06:06:23PM +0200, Peter Zijlstra wrote:
On Tue, 2009-09-08 at 13:37 +0300, Artem Bityutskiy wrote:
Hi,
On 09/08/2009 12:23 PM, Jens Axboe wrote:
From: Theodore Ts'o<tytso@xxxxxxx>
Originally, MAX_WRITEBACK_PAGES was hard-coded to 1024 because of a
concern of not holding I_SYNC for too long. (At least, that was the
comment previously.) This doesn't make sense now because the only
time we wait for I_SYNC is if we are calling sync or fsync, and in
that case we need to write out all of the data anyway. Previously
there may have been other code paths that waited on I_SYNC, but not
any more.
According to Christoph, the current writeback size is way too small,
and XFS had a hack that bumped out nr_to_write to four times the value
sent by the VM to be able to saturate medium-sized RAID arrays. This
value was also problematic for ext4 as well, as it caused large files
to be come interleaved on disk by in 8 megabyte chunks (we bumped up
the nr_to_write by a factor of two).
So, in this patch, we make the MAX_WRITEBACK_PAGES a tunable,
max_writeback_mb, and set it to a default value of 128 megabytes.
http://bugzilla.kernel.org/show_bug.cgi?id=13930
Signed-off-by: "Theodore Ts'o"<tytso@xxxxxxx>
Signed-off-by: Jens Axboe<jens.axboe@xxxxxxxxxx>
Would be nice to update doc files like
Documentation/sysctl/vm.txt
Documentation/filesystems/proc.txt
I'm still not convinced this knob is worth the patch and I'm inclined to
flat out NAK it..
The whole point of MAX_WRITEBACK_PAGES seems to occasionally check the
dirty stats again and not write out too much.
The problem is that 'too much' is a very abstract thing. When a process
is stuck in balance_dirty_pages, we want them to do the minimal amount
of work (or waiting) required to get them safely back inside file_write().
It seems that balance_dirty_pages() is not coupled with MAX_WRITEBACK_PAGES.
Instead it uses the much smaller (ratelimit_pages + ratelimit_pages / 2).
So I feel that we could just increase MAX_WRITEBACK_PAGES. It won't
lead to bumpy throttled writes. It does affect fairness of background
writes, ie. small files will have to wait more time for large files.
But I'm fine with MAX_WRITEBACK_PAGES=64MB, which means for desktop
that a large file may only delay others for 1 second, which is small
enough comparing to the 30 second dirty expire time.
On the other hand, I find that the (ratelimit_pages + ratelimit_pages / 2)
used for balance_dirty_pages() may fall below the real number of
dirtied pages, which is not safe if some filesystem choose to dirty
2 * ratelimit_pages before calling balance_dirty_pages_ratelimited_nr().
So, how about this patch?
Thanks,
Fengguang
---
writeback: balance_dirty_pages() shall write more than dirtied pages
Some filesystem may choose to write much more than ratelimit_pages
before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
determine number to write based on real number of dirtied pages.
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
mm/page-writeback.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- linux.orig/mm/page-writeback.c 2009-09-09 16:53:15.000000000 +0800
+++ linux/mm/page-writeback.c 2009-09-09 17:05:14.000000000 +0800
@@ -44,12 +44,12 @@ static long ratelimit_pages = 32;
/*
* When balance_dirty_pages decides that the caller needs to perform some
* non-background writeback, this is how many pages it will attempt to write.
- * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably
+ * It should be somewhat larger than dirtied pages to ensure that reasonably
* large amounts of I/O are submitted.
*/
-static inline long sync_writeback_pages(void)
+static inline long sync_writeback_pages(unsigned long dirtied)
{
- return ratelimit_pages + ratelimit_pages / 2;
+ return dirtied + dirtied / 2;
}
/* The following parameters are exported via /proc/sys/vm */
@@ -481,7 +481,8 @@ get_dirty_limits(unsigned long *pbackgro
* If we're over `background_thresh' then pdflush is woken to perform some
* writeout.
*/
-static void balance_dirty_pages(struct address_space *mapping)
+static void balance_dirty_pages(struct address_space *mapping,
+ unsigned long write_chunk)
{
long nr_reclaimable, bdi_nr_reclaimable;
long nr_writeback, bdi_nr_writeback;
@@ -489,7 +490,6 @@ static void balance_dirty_pages(struct a
unsigned long dirty_thresh;
unsigned long bdi_thresh;
unsigned long pages_written = 0;
- unsigned long write_chunk = sync_writeback_pages();
struct backing_dev_info *bdi = mapping->backing_dev_info;
@@ -634,9 +634,10 @@ void balance_dirty_pages_ratelimited_nr(
p = &__get_cpu_var(ratelimits);
*p += nr_pages_dirtied;
if (unlikely(*p >= ratelimit)) {
+ ratelimit = sync_writeback_pages(*p);
*p = 0;
preempt_enable();
- balance_dirty_pages(mapping);
+ balance_dirty_pages(mapping, ratelimit);
return;
}
preempt_enable();
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- From: Christoph Hellwig
- Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- References:
- [PATCH 0/8] Per-bdi writeback flusher threads v19
- From: Jens Axboe
- [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- From: Jens Axboe
- Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- From: Artem Bityutskiy
- Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- From: Peter Zijlstra
- Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- From: Chris Mason
- [PATCH 0/8] Per-bdi writeback flusher threads v19
- Prev by Date: Re: Zaurus suspend saga
- Next by Date: Re: mmotm 2009-09-03-16-35 uploaded
- Previous by thread: Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- Next by thread: Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
- Index(es):
Relevant Pages
|