Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR



James Bottomley wrote:
> On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote:
..
>> One thing that could be even better than the patch below,
>> would be to have it perhaps skip the entire bio that includes
>> the failed sector, rather than only the bad sector itself.
>
> Er ... define "skip over the bio". A bio is simply a block
> representation for a bunch of sg elements coming in to the elevator.

Exactly. Or rather, a block of sg_elements from a single point
of request, is it not?

> Mostly what we see in SCSI is a single bio per request, so skipping the
> bio is really the current behaviour (to fail the rest of the request).

Very good. That's what it's supposed to do.

But if each request contained only a single bio, then all of Jens'
work on IO scheduling would be for nothing, n'est-ce pas?

In the case where a request consists of multiple bio's
which have been merged under a single request struct,
we really should give at least one attempt to each bio.

This way, in most cases, only the process that requested the
failed sector(s) will see an error, not the innocent victims
that happened to get merged onto the end. Which could be very
critical stuff (or not -- it could be quite random).

So the time factor works out to one disk I/O timeout per failed bio.
That's what would have happened with the NOP scheduler anyway.
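
To make the difference concrete, here is a rough userspace sketch, not kernel code. The types `sim_bio` and `sim_request` and both completion functions are hypothetical stand-ins invented for illustration (they are not the kernel's `struct bio` or `struct request`), and the "device" is assumed to have exactly one bad sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's structures. */
struct sim_bio {
    unsigned long start;  /* first sector covered by this bio */
    unsigned long nsect;  /* number of sectors in this bio */
    int error;            /* 0 on success, -1 on I/O error */
};

struct sim_request {
    struct sim_bio *bios; /* bios merged into one request, in sector order */
    size_t nbios;
};

/* True if the single simulated bad sector lies inside this bio. */
static bool bio_hits_bad_sector(const struct sim_bio *b, unsigned long bad)
{
    return bad >= b->start && bad - b->start < b->nsect;
}

/*
 * Fail-the-rest behaviour: the bio holding the bad sector fails, and so
 * does every bio merged after it. Returns the number of bios failed.
 */
static size_t complete_fail_rest(struct sim_request *rq, unsigned long bad)
{
    assert(rq != NULL);
    size_t nfailed = 0;
    bool failing = false;

    for (size_t i = 0; i < rq->nbios; i++) {
        if (bio_hits_bad_sector(&rq->bios[i], bad))
            failing = true;
        rq->bios[i].error = failing ? -1 : 0;
        if (failing)
            nfailed++;
    }
    return nfailed;
}

/*
 * One attempt per bio: only a bio that actually contains the bad sector
 * fails; bios merged after it still complete. Returns the number failed.
 */
static size_t complete_per_bio(struct sim_request *rq, unsigned long bad)
{
    assert(rq != NULL);
    size_t nfailed = 0;

    for (size_t i = 0; i < rq->nbios; i++) {
        bool bad_here = bio_hits_bad_sector(&rq->bios[i], bad);
        rq->bios[i].error = bad_here ? -1 : 0;
        if (bad_here)
            nfailed++;
    }
    return nfailed;
}
```

With three 8-sector bios starting at sectors 0, 8 and 16 and a bad sector at 10, the first function fails two bios (the second and the third), while the second fails only the one that asked for the bad sector.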

On the systems I'm working with, I don't see huge numbers of bad sectors.
What they tend to show is just one or two bad sectors, widely scattered.

So:
I think doing that might address most concerns expressed here.
Have you got an alternate suggestion, James?

Cheers


