Re: File Fragmentation
- From: Jean-David Beyer <jdbeyer@xxxxxxxxxxx>
- Date: Wed, 07 Dec 2005 07:19:13 -0500
Aragorn wrote:
> On Tuesday 06 December 2005 14:08, Peter T. Breuer stood up and spoke
> the following words to the masses in /comp.os.linux.misc...:/
>
>
>>Jean-David Beyer <jdbeyer@xxxxxxxxxxx> wrote:
>>
>>>>>partitions. I don't study them actively, but I understand them.
>>>>>What you describe is done at the hardware level,
>>>>
>>>>
>>>>No it isn't.
>>>>
>>>
>>>Well some buffering, including read-ahead and write-behind (and
>>>therefore, reordering of actual hardware transfers to and from the
>>>disk platters) _is_ done in some hardware out there.
>>
>>Sure - but classical IDE only understands one outstanding request at a
>>time, so it's not going to get much elevator reordering out of that.
>>
>>Scsi is better - I think 32 queues requests at a time is quite common.
>
>
> As far as I know, the /aic7xxx/ driver can even go as high as 253
> queues. The /aic79xx/ driver can also do this, but the kernel
> developers have it default to 32.
>
On my system, in aic7xxx.h I see:
/*
* The maximum amount of SCB storage in hardware on a controller.
* This value represents an upper bound. Due to software design,
* we may not be able to use this number.
*/
#define AHD_SCB_MAX 512
/*
* The maximum number of concurrent transactions supported per driver instance.
* Sequencer Control Blocks (SCBs) store per-transaction information.
*/
#define AHD_MAX_QUEUE AHD_SCB_MAX
/*
* Define the size of our QIN and QOUT FIFOs. They must be a power of 2
* in size and accommodate as many transactions as can be queued concurrently.
*/
#define AHD_QIN_SIZE AHD_MAX_QUEUE
#define AHD_QOUT_SIZE AHD_MAX_QUEUE
#define AHD_QIN_WRAP(x) ((x) & (AHD_QIN_SIZE-1))
/*
* The maximum amount of SCB storage we allocate in host memory.
*/
#define AHD_SCB_MAX_ALLOC AHD_MAX_QUEUE
/*
* Ring Buffer of incoming target commands.
* We allocate 256 to simplify the logic in the sequencer
* by using the natural wrap point of an 8bit counter.
*/
#define AHD_TMODE_CMDS 256
If you are right about the limit being set to 32 elsewhere, I did not find
it (but I do not really know where to look).
>
>>However, the kernel queues up to 1000 requests at a time, which quite
>>dwarfs the hardware facilities.
But if it could send a bunch of them to the controller at the same time,
they could be done in the order the controller thinks best. I imagine the
controller does first come first serve (except when running multiple
devices, and then first-come-first serve for each device in parallel), and
the disk drives could reorder that at the individual hardware block level
(to reduce rotational latency time delays), and perhaps also at the cylinder
level (to reduce seek delays).
>>
>>Ditto buffering.
>>
>>
>>>This is certainly true of my Maxtor Atlas III Ultra/320 SCSI hard
>>>drives. But I do not believe Linux file systems can rely on them,
>>>since the manufacturers do not disclose their algorithms for doing
>>>this.
>
>
> This is irrelevant, as the elevator is already optimized by the kernel's
> own algorithm. Whatever optimizations the hardware performs after the
> data has hit the SCSI bus for a write operation will be minimal and are
> unimportant to the OS or the filesystem layer.
>
I agree that the kernel's elevator algorithm optimizes the seeking, but it
cannot really optimize rotational latency times (an order of magnitude
slower, I suppose). Let's see.
For my Maxtor Atlas III drives, that have 512-byte hardware sectors, we have:
Average seek time: 4.5 milliseconds
Track to Track: 0.3 milliseconds
Full stroke: 11. milliseconds
Average rotational latency time: 3.0 milliseconds. (Maximum 6.0
milliseconds, obviously).
So if there are 63 hardware sectors per track (I have no idea these days,
but this is what fdisk is assuming), and the controller decides to read the
entire track, then instead of waiting an average of 3 milliseconds, 6
milliseconds worst case, the drive itself can start reading after about 0.1
milliseconds to read the track, starting at the next physical sector it sees
and then reading the rest as it comes by the read head. This should effect a
30x improvement average, 60x improvement worst case, for the actual reading.
So if the averaage IO cost at the hardware level is 4.5 milliseconds seek +
average rotational latency time of 3 milliseconds + time-to-read track of 6
milliseconds, we get a cost of 13.5 milliseconds.
By hardware buffering in the hard drive, we reduce that to 4.5 + 0.1 + 6 for
a total cost of 10.6 milliseconds or about a 22% improvement.
Now the kernel driver's elevator algorithm (or whatever it uses) probably
reduces the effective average seek times. At the most optimistic, it reduces
the seek time to 0.3 milliseconds; i.e., always arranges that the seeks are
just a single track. (I am not sure how it can do that if there are multiple
partitions on a hard drive, but if the elevator algorithm is low enough in
the architecture, partitions may be invisible.) In that case, the numbers
come to
0.3 + 3.0 + 6.0 = 9.3 case where the hard drive does no buffering and
0.3 + 0.1 + 6.0 = 6.4 for the case where the hard drive does buffer. This is
an improvement of about 31%. So I infer that the better the kernel driver
optimizes seeking, the more important the buffering inside the hard drive
really is.
Things get better if the kernel asked for only a single sector (I am not
sure if it ever does that) and the drive reads the entire track. In that
case if subsequent reads wanted other sectors of the track, they would be in
the buffer already and all the latency times would be zero (except transfer
time between buffer and the kernel memory of course).
>
>>>But since there is an 8 Megabyte RAM in each hard drive, they
>>>certainly have the opportunity to notice if sequential blocks are
>>>being read and to read some extra ones just in case. Furthermore,
>>>they know the read size and the blocks requested, and if they are all
>>>on the same cylinder, they can read starting at any convenient
>>>boundary and get them all in one rotation without waiting for the
>>>start of the bunch.
>
>
> Read buffering is done both by the hardware - at least, in SCSI systems,
> that is - and is again done by the kernel. So it will have the best
> chances of being available when needed, i.e. have the lowest cache
> misses. ;-)
>
>
>>[...]
--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 06:40:00 up 10 days, 17:10, 5 users, load average: 4.08, 4.15, 4.16
.
- References:
- File Fragmentation
- From: Cliff Hewitt
- Re: File Fragmentation
- From: stan
- Re: File Fragmentation
- From: Cliff Hewitt
- Re: File Fragmentation
- From: Peter T. Breuer
- Re: File Fragmentation
- From: Cliff Hewitt
- Re: File Fragmentation
- From: Peter T. Breuer
- Re: File Fragmentation
- From: Cliff Hewitt
- Re: File Fragmentation
- From: Peter T. Breuer
- Re: File Fragmentation
- From: Jean-David Beyer
- Re: File Fragmentation
- From: Peter T. Breuer
- Re: File Fragmentation
- From: Aragorn
- File Fragmentation
- Prev by Date: Re: File Fragmentation
- Next by Date: Re: what is umask?
- Previous by thread: Re: File Fragmentation
- Next by thread: Re: File Fragmentation
- Index(es):
Relevant Pages
|