Re: RFC: I/O bandwidth controller (was Re: Too many I/O controller patches)



Hi Andrea!

On Thu, 2008-08-07 at 09:46 +0200, Andrea Righi wrote:
Fernando Luis Vázquez Cao wrote:
This RFC ended up being a bit longer than I had originally intended, but
hopefully it will serve as the start of a fruitful discussion.

Thanks for posting this detailed RFC! A few comments below.

As you pointed out, it seems that there is not much consensus building
going on, but that does not mean there is a lack of interest. To get the
ball rolling it is probably a good idea to clarify the state of things
and try to establish what we are trying to accomplish.

*** State of things in the mainstream kernel
The kernel has had somewhat advanced I/O control capabilities for quite
some time now: CFQ. But the current CFQ has some problems:
- I/O priority can be set by PID, PGRP, or UID, but...
- ...all the processes that fall within the same class/priority are
scheduled together and arbitrary groupings are not possible.
- Buffered I/O is not handled properly.
- CFQ's IO priority is an attribute of a process that affects all
devices it sends I/O requests to. In other words, with the current
implementation it is not possible to assign per-device IO priorities to
a task.
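
To illustrate the last point, here is a minimal sketch of how an I/O
priority is packed for ioprio_set(2). The constants mirror the kernel's
ioprio definitions; the helper names ioprio_value() and
ioprio_set_args() are made up for this example, and nothing here
actually issues the system call.

```python
# Constants mirroring the kernel's ioprio definitions.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 1, 2, 3
IOPRIO_WHO_PROCESS, IOPRIO_WHO_PGRP, IOPRIO_WHO_USER = 1, 2, 3


def ioprio_value(io_class, level):
    """Pack class and level the way the IOPRIO_PRIO_VALUE() macro does."""
    if not 0 <= level <= 7:
        raise ValueError("priority level must be in 0..7")
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def ioprio_set_args(which, who, io_class, level):
    """Build the (which, who, ioprio) triple that ioprio_set(2) takes.

    Hypothetical helper for illustration. Note what is absent: there is
    no device argument, so the priority follows the process, process
    group, or user to every block device it issues I/O to.
    """
    return (which, who, ioprio_value(io_class, level))
```

The target can be selected by PID, PGRP, or UID, but the resulting
value carries only a class and a level, which is why per-device
priorities cannot be expressed through this interface.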

*** Goals
1. Cgroups-aware I/O scheduling (being able to define arbitrary
groupings of processes and treat each group as a single scheduling
entity).
2. Being able to perform I/O bandwidth control independently on each
device.
3. I/O bandwidth shaping.
4. Scheduler-independent I/O bandwidth control.
5. Usable with stacking devices (md, dm and other devices of that
ilk).
6. I/O tracking (handle buffered and asynchronous I/O properly).

The same as above also applies to IO operations/sec (bandwidth meant
not only in terms of bytes/sec), plus:

7. Optimal bandwidth usage: allow cgroups to exceed their IO limits to
take advantage of free/unused IO resources (i.e. allow "bursts" when the
whole physical bandwidth of a block device is not fully used, and then
"throttle" again when IO from unlimited cgroups comes into play)

8. "Fair throttling": avoid always throttling the same task within a
cgroup; instead, try to distribute the throttling among all the tasks
belonging to the throttled cgroup
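
One common way to picture limiting with bursts is a token bucket. The
sketch below is only an illustration under stated assumptions: the
class and method names are made up, time is passed in explicitly, and
bursts are bounded by saved-up credit rather than by detecting that the
device has spare physical bandwidth, which point 7 would additionally
require.

```python
class TokenBucket:
    """Byte-based token bucket: `rate` bytes/s sustained, `burst` bytes of credit."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def _refill(self, now):
        # Credit accumulates at `rate` but never beyond `burst`.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def delay_for(self, nbytes, now):
        """Charge `nbytes` and return how long the issuer should sleep."""
        self._refill(now)
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0  # within the credit: the burst goes through unthrottled
        return -self.tokens / self.rate  # time needed to pay back the debt
```

A group that has been idle can issue up to `burst` bytes at once; after
that it is held to `rate`. Spreading the returned delay across the
tasks of the group, instead of charging whichever task happens to hit
the limit, would be one way to approach point 8.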

Thank you for the ideas!

By the way, point "3." above (I/O bandwidth shaping) refers to IO
scheduling algorithms in general. When I wrote the RFC I thought that
once we have the IO tracking and accounting mechanisms in place choosing
and implementing an algorithm (fair throttling, proportional bandwidth
scheduling, etc) would be easy in comparison, and that is the reason a
list was not included.

Once I get more feedback from all of you I will resend a more detailed
RFC that will include your suggestions.

1. & 2.- Cgroups-aware I/O scheduling (being able to define arbitrary
groupings of processes and treat each group as a single scheduling
entity)

We obviously need this because our final goal is to be able to control
the IO generated by a Linux container. The good news is that we already
have the cgroups infrastructure so, regarding this problem, we would
just have to transform our I/O bandwidth controller into a cgroup
subsystem.

This seems to be the easiest part, but the current cgroups
infrastructure has some limitations when it comes to dealing with block
devices: impossibility of creating/removing certain control structures
dynamically and hardcoding of subsystems (i.e. resource controllers).
This makes it difficult to handle block devices that can be hotplugged
and go away at any time (this applies not only to usb storage but also
to some SATA and SCSI devices). To cope with this situation properly we
would need hotplug support in cgroups, but, as suggested before and
discussed in the past (see (0) below), there are some limitations.

Even in the non-hotplug case it would be nice if we could treat each
block I/O device as an independent resource, which means we could do
things like allocating I/O bandwidth on a per-device basis. As long as
performance is not compromised too much, adding some kind of basic
hotplug support to cgroups is probably worth it.

(0) http://lkml.org/lkml/2008/5/21/12

What about using major,minor numbers to identify each device and account
IO statistics? If a device is unplugged we could reset IO statistics
and/or remove IO limitations for that device from userspace (i.e. by a
daemon), but plugging/unplugging the device would not be blocked/affected
in any case. Or am I oversimplifying the problem?

If a resource we want to control (a block device in this case) is
hot-plugged/unplugged the corresponding cgroup-related structures inside
the kernel need to be allocated/freed dynamically, respectively. The
problem is that this is not always possible. For example, with the
current implementation of cgroups it is not possible to treat each block
device as a different cgroup subsystem/resource controller, because
subsystems are created at compile time.
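
For reference, the major,minor idea can be pictured in userspace terms
roughly as follows. This is a sketch only: DeviceIOStats and its
methods are hypothetical names, and it says nothing about the
kernel-side allocation problem described above. It merely shows
statistics and limits keyed by device number and dropped when a device
goes away.

```python
import os
from collections import defaultdict


class DeviceIOStats:
    """Per-cgroup byte counters and limits keyed by (major, minor)."""

    def __init__(self):
        self.bytes_done = defaultdict(int)  # {((major, minor), cgroup): bytes}
        self.limits = {}                    # {((major, minor), cgroup): bytes/s}

    @staticmethod
    def key(dev):
        # dev is a raw device number such as os.stat(path).st_rdev
        return (os.major(dev), os.minor(dev))

    def set_limit(self, dev, cgroup, bytes_per_sec):
        self.limits[(self.key(dev), cgroup)] = bytes_per_sec

    def account(self, dev, cgroup, nbytes):
        self.bytes_done[(self.key(dev), cgroup)] += nbytes

    def device_removed(self, dev):
        """Drop statistics and limits for an unplugged device."""
        k = self.key(dev)
        for table in (self.bytes_done, self.limits):
            for entry in [e for e in table if e[0] == k]:
                del table[entry]
```

A daemon reacting to an unplug event would call device_removed(); the
unplug itself is never blocked, and entries for other devices are left
untouched.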

3. & 4. & 5. - I/O bandwidth shaping & General design aspects

The implementation of an I/O scheduling algorithm is to a certain extent
influenced by what we are trying to achieve in terms of I/O bandwidth
shaping, but, as discussed below, the required accuracy can determine
the layer where the I/O controller has to reside. Off the top of my
head, there are three basic operations we may want to perform:
- I/O nice prioritization: ionice-like approach.
- Proportional bandwidth scheduling: each process/group of processes
has a weight that determines the share of bandwidth they receive.
- I/O limiting: set an upper limit to the bandwidth a group of tasks
can use.
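
As a rough illustration of the second operation, proportional
scheduling can be sketched with per-group virtual time: each dispatched
request advances its group's clock by size / weight, and the group with
the smallest clock goes next. The function name and data layout are
assumptions made for this example, and it only handles a static
backlog (no group becomes active later).

```python
import heapq


def dispatch_order(queues, weights):
    """Return the order in which queued requests would be dispatched.

    queues:  {group: [request_size_in_bytes, ...]} in FIFO order
    weights: {group: positive weight}
    """
    # Every backlogged group starts at virtual time 0; ties break by name.
    heap = [(0.0, name) for name in sorted(queues) if queues[name]]
    heapq.heapify(heap)
    position = {name: 0 for name in queues}
    order = []
    while heap:
        vtime, name = heapq.heappop(heap)
        size = queues[name][position[name]]
        position[name] += 1
        order.append((name, size))
        # Requeue the group only while it still has pending requests.
        if position[name] < len(queues[name]):
            heapq.heappush(heap, (vtime + size / weights[name], name))
    return order
```

With two groups issuing equally sized requests and weights 2 and 1,
the heavier group gets about two dispatches for every one of the
lighter group while both are backlogged.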

Using deadline-based IO scheduling could be an interesting path to
explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
requirements.

Please note that the only thing we can do is to guarantee a minimum
bandwidth when there is contention for an IO resource, which is
precisely what a proportional bandwidth scheduler does. Am I missing
something?
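
The arithmetic behind that remark, as a small sketch (function names
and the example weights are made up, and the device bandwidth is
treated as a fixed number, which real devices do not offer):

```python
def bandwidth_shares(weights, active, device_bw):
    """Split `device_bw` among the active groups in proportion to weight."""
    total = sum(weights[g] for g in active)
    if total == 0:
        return {}
    return {g: device_bw * weights[g] / total for g in active}


def guaranteed_minimum(weights, group, device_bw):
    """Worst case for `group`: every configured group is contending."""
    return device_bw * weights[group] / sum(weights.values())
```

With weights 50/30/20 on a device delivering 100 MB/s, the second group
is guaranteed 30 MB/s under full contention, and gets 37.5 MB/s when
only the first two groups are active. The guarantee is therefore a
floor that only matters when there is contention.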



