Re: [PATCH 1/2] lld busy status exporting interface



Hi Andrew,

On Fri, 19 Sep 2008 14:33:44 -0700, Andrew Morton wrote:
On Fri, 19 Sep 2008 10:48:54 -0400 (EDT)
Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx> wrote:

This patch adds an interface to check lld's busy status
from the block layer.
This resolves the performance problem on request stacking devices
described below.


Some drivers, like the scsi mid layer, stop dispatching requests when
they detect a busy state on their low-level device (host/bus/device).
This allows other requests to stay in the I/O scheduler's queue
for a chance of merging.

Request stacking drivers like request-based dm should follow
the same logic.
However, there is no generic interface for the stacked device
to check if the underlying device(s) are busy.
If the request stacking driver dispatches and submits requests to
the busy underlying device, the requests will stay in
the underlying device's queue without a chance of merging.
This causes a performance problem under bursty I/O load.

With this patch, the busy state of the underlying device is exported
via a state flag of the queue's backing_dev_info, so the request
stacking driver can check it and stop dispatching requests if busy.

The underlying device driver must set/clear the flag appropriately:
   ON:  when the device driver can't process requests immediately.
   OFF: when the device driver can process requests immediately,
        including abnormal situations where the device driver needs
        to kill all requests.


Signed-off-by: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx>
Signed-off-by: Jun'ichi Nomura <j-nomura@xxxxxxxxxxxxx>
Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
---
include/linux/backing-dev.h | 8 ++++++++
mm/backing-dev.c | 12 ++++++++++++
2 files changed, 20 insertions(+)

Index: scsi-misc-2.6/include/linux/backing-dev.h
===================================================================
--- scsi-misc-2.6.orig/include/linux/backing-dev.h
+++ scsi-misc-2.6/include/linux/backing-dev.h
@@ -26,6 +26,7 @@ enum bdi_state {
 	BDI_pdflush,		/* A pdflush thread is working this device */
 	BDI_write_congested,	/* The write queue is getting full */
 	BDI_read_congested,	/* The read queue is getting full */
+	BDI_lld_congested,	/* The device/host is busy */
 	BDI_unused,		/* Available bits start here */
 };

@@ -226,8 +227,15 @@ static inline int bdi_rw_congested(struc
 				  (1 << BDI_write_congested));
 }

+static inline int bdi_lld_congested(struct backing_dev_info *bdi)
+{
+	return bdi_congested(bdi, 1 << BDI_lld_congested);
+}
+
 void clear_bdi_congested(struct backing_dev_info *bdi, int rw);
 void set_bdi_congested(struct backing_dev_info *bdi, int rw);
+void clear_bdi_lld_congested(struct backing_dev_info *bdi);
+void set_bdi_lld_congested(struct backing_dev_info *bdi);
 long congestion_wait(int rw, long timeout);


Index: scsi-misc-2.6/mm/backing-dev.c
===================================================================
--- scsi-misc-2.6.orig/mm/backing-dev.c
+++ scsi-misc-2.6/mm/backing-dev.c
@@ -279,6 +279,18 @@ void set_bdi_congested(struct backing_de
 }
 EXPORT_SYMBOL(set_bdi_congested);

+void clear_bdi_lld_congested(struct backing_dev_info *bdi)
+{
+	clear_bit(BDI_lld_congested, &bdi->state);
+}
+EXPORT_SYMBOL_GPL(clear_bdi_lld_congested);
+
+void set_bdi_lld_congested(struct backing_dev_info *bdi)
+{
+	set_bit(BDI_lld_congested, &bdi->state);
+}
+EXPORT_SYMBOL_GPL(set_bdi_lld_congested);
+
 /**
  * congestion_wait - wait for a backing_dev to become uncongested
  * @rw: READ or WRITE

Is this really the right way to do it?

I think so, but I may not have understood your question correctly.
Please elaborate on your concern if my explanation below doesn't
answer it.


Back in the days when we first did the backing_dev_info.congested_fn()
logic it was decided that there basically was no single place in which
the congested state could be stored.

So we ended up deciding that whenever a caller wants to know a
backing_dev's congested status, it has to call in to the
->congested_fn() and that congested_fn would then call down into all
the constituent low-level drivers/queues/etc asking each one if it is
congested.

bdi_lld_congested() also does that using bdi_congested(), which calls
->congested_fn().
And only real device drivers (e.g. scsi, ide) set/clear the flag.
Stacking drivers like request-based dm don't.
So stacking drivers always check the BDI_lld_congested flag of
the bottom device of the device stack.
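
To make the lookup path concrete, here is a rough userspace model of it.
This is only a sketch, not the kernel code: every model_* name is
hypothetical, the set/clear helpers use plain bit operations instead of
the atomic set_bit()/clear_bit() in the patch, and the single-bottom-device
congested_fn is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: names prefixed model_ are hypothetical stand-ins. */
enum model_bdi_state {
	MODEL_BDI_pdflush,
	MODEL_BDI_write_congested,
	MODEL_BDI_read_congested,
	MODEL_BDI_lld_congested,	/* the bit this patch adds */
};

struct model_bdi {
	unsigned long state;
	int (*congested_fn)(void *data, int bits);	/* set by stacking drivers */
	void *congested_data;
};

/* Ask congested_fn if there is one, otherwise test the bits in ->state. */
static int model_bdi_congested(struct model_bdi *bdi, int bits)
{
	assert(bdi != NULL);
	if (bdi->congested_fn)
		return bdi->congested_fn(bdi->congested_data, bits);
	return (bdi->state & (unsigned long)bits) != 0;
}

static int model_bdi_lld_congested(struct model_bdi *bdi)
{
	return model_bdi_congested(bdi, 1 << MODEL_BDI_lld_congested);
}

/* Only the bottom (real) driver flips the bit. The kernel uses atomic
 * set_bit()/clear_bit(); plain bit operations are enough for this model. */
static void model_set_bdi_lld_congested(struct model_bdi *bdi)
{
	bdi->state |= 1UL << MODEL_BDI_lld_congested;
}

static void model_clear_bdi_lld_congested(struct model_bdi *bdi)
{
	bdi->state &= ~(1UL << MODEL_BDI_lld_congested);
}

/* A possible congested_fn for a stacked device with one bottom device:
 * it never touches its own flag, it just forwards the query downwards. */
static int model_stacked_congested_fn(void *data, int bits)
{
	return model_bdi_congested((struct model_bdi *)data, bits);
}
```

In this model the stacked device's own state word stays zero the whole
time; its answer changes only because the bottom device's flag changes.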

The BDI_[write|read]_congested flags have been used for the queue's own
congestion, so that I/O queueing/merging can proceed even if
the lld is congested. So I added a new flag.


I mean, as a simple example which is probably wrong - what happens if a
single backing_dev is implemented via two different disks and
controllers, and they both become congested and then one of them comes
uncongested. Is there no way in which the above implementation can
incorrectly flag the backing_dev as being uncongested?

Do you mean that "a single backing_dev via two disks/controllers" is
a dm device (e.g. a dm-multipath device having 2 paths, a dm-mirror
device having 2 disks)?

If so, dm doesn't set/clear the flag; whether the dm device itself
is congested or not is decided by dm's target driver.

For example, in dm-multipath:
  o call bdi_lld_congested() for each path.
  o if any one of the paths is uncongested, dm-multipath will decide
    the dm device is uncongested and dispatch incoming I/Os to
    the uncongested path.
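
As a rough sketch of that multipath decision (a userspace model, not
dm-multipath code): struct model_path and both function names are
hypothetical, and the lld_congested field stands in for what
bdi_lld_congested() would return for each path's queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one path. In the kernel this state would come
 * from bdi_lld_congested() on the path's queue. */
struct model_path {
	bool lld_congested;
};

/* Return the index of the first uncongested path, or -1 if all are busy. */
static int model_multipath_pick_path(const struct model_path *paths, size_t n)
{
	assert(paths != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!paths[i].lld_congested)
			return (int)i;
	return -1;
}

/* The dm device counts as congested only when no path can take I/O. */
static bool model_multipath_congested(const struct model_path *paths, size_t n)
{
	return model_multipath_pick_path(paths, n) < 0;
}
```

So in the two-controller case, one path becoming uncongested makes the dm
device uncongested, and I/O is dispatched to that path only.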

For example, in dm-mirror:
  o call bdi_lld_congested() for each disk.
  o if the incoming I/O is a READ, the same logic as dm-multipath above
    applies. If the incoming I/O is a WRITE, dm-mirror will decide the
    dm device is uncongested only when all disks are uncongested.
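
The mirror rule could be modelled like this. Again this is only a
userspace sketch with hypothetical names (struct model_leg, enum model_dir,
model_mirror_congested), not dm-mirror code; the lld_congested field stands
in for the result of bdi_lld_congested() on each disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mirror disk. */
struct model_leg {
	bool lld_congested;
};

enum model_dir { MODEL_READ, MODEL_WRITE };

/* READ:  any uncongested disk can serve it, so congested means all busy.
 * WRITE: every disk must take it, so congested means any disk busy. */
static bool model_mirror_congested(const struct model_leg *legs, size_t n,
				   enum model_dir dir)
{
	bool any_busy = false;
	bool all_busy = true;

	assert(legs != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		if (legs[i].lld_congested)
			any_busy = true;
		else
			all_busy = false;
	}
	return dir == MODEL_WRITE ? any_busy : all_busy;
}
```

With one busy disk and one idle disk, this model reports the device as
uncongested for a READ but congested for a WRITE.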

Thanks,
Kiyoshi Ueda