Re: [PATCH 1/9] io-throttle documentation



On Sat, Apr 18, 2009 at 01:12:45AM +0200, Andrea Righi wrote:
On Fri, Apr 17, 2009 at 01:39:55PM -0400, Vivek Goyal wrote:
On Tue, Apr 14, 2009 at 10:21:12PM +0200, Andrea Righi wrote:

[..]
+4.2. Buffered I/O (write-back) tracking
+
+For buffered writes the scenario is a bit more complex, because the writes in
+the page cache are processed asynchronously by kernel threads (pdflush), using
+a write-back policy. So the real writes to the underlying block devices occur
+in a different I/O context from that of the task that originally generated the
+dirty pages.
+
+The I/O bandwidth controller uses the following solution to resolve this
+problem.
+
+If the operation is a buffered write, we can charge the right cgroup by looking
+at the owner of the first page involved in the I/O operation, which gives the
+context that generated the I/O activity at the source. This information can be
+retrieved using the page_cgroup functionality originally provided by the cgroup
+memory controller [4], and now provided specifically by the bio-cgroup
+controller [5].
+
+In this way we can correctly account the I/O cost to the right cgroup, but we
+cannot throttle the current task at this stage because, in general, it is a
+different task (e.g., pdflush, which processes the dirty pages
+asynchronously).
+
+For this reason, all the write-back requests that are not directly submitted by
+the real owner and that need to be throttled are not dispatched immediately in
+submit_bio(). Instead, they are added into an rbtree and processed
+asynchronously by a dedicated kernel thread: kiothrottled.
+

Hi Andrea,

Hi Vivek,


I am trying to go through your patches now and also planning to test it

thanks for trying to test first of all.

out. While reading the documentation, the async write handling caught my
interest. IIUC, it looks like you are throttling writes once they are being
written to the disk (either by pdflush or in the context of the process
because vm_dirty_ratio was crossed, etc.).

Correct, more exactly in submit_bio().

The difference between synchronous IO and writeback IO is that in the
first case the task itself is throttled via schedule_timeout_killable();
in the second case pdflush is never throttled. Instead, the IO requests
are simply added to an rbtree and dispatched asynchronously by another
kernel thread (kiothrottled) using an EDF-like scheduling policy. More
exactly, a deadline is evaluated for each writeback IO request by looking
at the cgroup BW and iops limits, then kiothrottled periodically selects
and dispatches the requests whose deadline has elapsed.
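
As a rough illustration of the EDF-like scheme described above, here is a minimal userspace C sketch. It is not taken from the patchset: the iot_* names are hypothetical, a flat array stands in for the rbtree, and the rule used for the deadline (the earliest time at which dispatching keeps the group within its BW and iops limits) is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-cgroup limits; 0 means "no limit". */
struct iot_limits {
    uint64_t bytes_per_sec;
    uint64_t ios_per_sec;
};

/* Hypothetical per-cgroup state. */
struct iot_group {
    struct iot_limits lim;
    uint64_t next_free_ns;  /* time at which earlier charges are paid off */
};

/* Hypothetical deferred writeback request. */
struct iot_req {
    uint64_t bytes;
    uint64_t deadline_ns;
    int dispatched;
};

/* Time a request "costs" under the limits: the stricter of BW and iops.
 * The multiplication can overflow for very large requests; fine for a sketch. */
static uint64_t iot_cost_ns(const struct iot_limits *lim, uint64_t bytes)
{
    uint64_t bw_ns = 0, io_ns = 0;

    assert(lim != NULL);
    if (lim->bytes_per_sec)
        bw_ns = bytes * NSEC_PER_SEC / lim->bytes_per_sec;
    if (lim->ios_per_sec)
        io_ns = NSEC_PER_SEC / lim->ios_per_sec;
    return bw_ns > io_ns ? bw_ns : io_ns;
}

/* Give the request a deadline and push the group's next free slot forward. */
static void iot_assign_deadline(struct iot_group *g, struct iot_req *r,
                                uint64_t now_ns)
{
    uint64_t start = g->next_free_ns > now_ns ? g->next_free_ns : now_ns;

    r->deadline_ns = start;
    r->dispatched = 0;
    g->next_free_ns = start + iot_cost_ns(&g->lim, r->bytes);
}

/* One periodic pass of the dispatcher: send out every request whose
 * deadline has elapsed and return how many were dispatched. */
static size_t iot_dispatch_expired(struct iot_req *reqs, size_t n,
                                   uint64_t now_ns)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].dispatched && reqs[i].deadline_ns <= now_ns) {
            reqs[i].dispatched = 1;
            count++;
        }
    }
    return count;
}
```

For example, with a 1 MB/s limit and three 500000-byte requests queued at time 0, the deadlines come out as 0 s, 0.5 s and 1 s, so each periodic pass releases at most what the limit allows. A real implementation would keep the requests sorted by deadline (the rbtree) so that a pass stops at the first request that has not yet expired.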


If that's the case, will a process not see an increased rate of writes
until we hit dirty_background_ratio?

Correct. And this is a good behaviour IMHO. At the same time we have a
smooth BW usage (according to the cgroup limits I mean) even in the
presence of writeback IO only.


Secondly, if the above is giving acceptable performance results, then we
should be able to provide max bw control at the IO scheduler level (along
with proportional bw control)?

So instead of doing max bw and proportional bw implementation in two
places with the help of different controllers, I think we can do it
with the help of one controller at one place.

Please do have a look at my patches also to figure out if that's possible
or not. I think it should be possible.

Keeping both in a single place should simplify things.

Absolutely agree to do both proportional and max BW limiting in a single
place. I still need to figure out which is the best place: the IO
scheduler in the elevator, or the point where the IO requests are
submitted. A natural way IMHO is to control the submission of requests,
and Andrew also seemed to be convinced by this approach. Anyway, I've
already scheduled time to test your patchset and I'd like to see if it's
possible to merge our work, or select the best from our patchsets.


Hmm..., thinking more about it, I was reminded of one problem with doing
it at the IO scheduler level, though. It's very hard to provide max bw
control at intermediate logical devices (e.g., software raid configurations).

Thanks
Vivek


