Re: Newbie question about defragmenting
From: Lew Pitcher (lpitcher_at_sympatico.ca)
Date: 02/20/05
- Next message: Guido Draheim: "Re: 9.2 - galeon"
- Previous message: BenGman: "major error with trying to dual boot"
- In reply to: canalegrnade_at_myway.com: "Newbie question about defragmenting"
- Next in thread: Kevin Nathan: "Re: Newbie question about defragmenting"
- Reply: Kevin Nathan: "Re: Newbie question about defragmenting"
- Reply: Brian: "Re: Newbie question about defragmenting"
- Reply: Timo Pirinen: "Re: Newbie question about defragmenting"
- Reply: Arthur Hagen: "Re: Newbie question about defragmenting"
Date: Sun, 20 Feb 2005 11:05:33 -0500
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
canalegrnade@myway.com wrote:
> Hi,
> I grew up with windows file systems FAT16 / 32, and NTFS versions.
[snip]
> So I am wondering if reiser, ext2 and ext3 needs something similar. Or
> are these fs more advanced?
Well, it looks like it's time that I reposted my stock 'defragmentation'
reply. Forgive me if this is too detailed; the basic answer is "Fragmentation
isn't an issue with ext2, ext3 or reiserfs filesystems. It happens (which is
to be expected with /any/ filesystem), but has little or no impact on
performance."
Now, on to the stock answer....
In a single-user, single-tasking OS, it's best to keep all the data blocks for
a given file together, because most of the disk accesses over a given period
of time will be against a single file. In this scenario, the read-write heads
of your HD advance sequentially through the hard disk. In the same sort of
system, if your file is fragmented, the read-write heads jump all over the
place, adding seek time to the hard disk access time.
In a multi-user, multi-tasking, multi-threaded OS, many files are being
accessed at any time, and, if left unregulated, the disk read-write heads
would jump all over the place all the time. Even with 'defragmented' files,
there would be as much seek-time delay as there would be with a single-user
single-tasking OS and fragmented files.
Fortunately, multi-user, multi-tasking, multi-threaded OSs are usually built
smarter than that. Since file access is multiplexed from the point of view of
the device (multiple file accesses from multiple, unrelated processes, with no
order imposed on the sequence of blocks requested), the device driver
incorporates logic to accommodate the performance hits, like reordering the
requests into something sensible for the device (e.g. an "elevator" algorithm
or the like).
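For anyone who wants the elevator idea in concrete terms, here is a minimal Python sketch. It is not how any real driver is written; the function names, the block numbers and the assumption that seek cost is proportional to block distance are all mine, for illustration only.

```python
def elevator_order(requests, head):
    """Order pending block requests as one simple elevator pass would:
    sweep upward from the current head position, then come back down."""
    ahead = sorted(b for b in requests if b >= head)
    behind = sorted((b for b in requests if b < head), reverse=True)
    return ahead + behind


def head_travel(order, head):
    """Total distance the head moves when serving blocks in the given order
    (assumes cost is simply the difference in block numbers)."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


# Hypothetical queue of requests, in the order the processes issued them.
pending = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53

print(head_travel(pending, start))                         # 640, served as received
print(head_travel(elevator_order(pending, start), start))  # 299, after reordering
```

The same eight requests cost less than half the head movement once they are reordered, and nothing on the disk had to be rearranged to get there.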
In other words, fragmentation is a concern when one (and only one) process
accesses data from one (and only one) file. When more than one file is involved,
the disk addresses being requested are 'fragmented' with respect to the
sequence that the driver has to service them, and thus it doesn't matter to
the device driver whether or not a file was fragmented.
To illustrate:
I have two programs executing simultaneously, each reading two different files.
The files are organized sequentially (unfragmented) on disk...
1.1 1.2 1.3 2.1 2.2 2.3 3.1 3.2 3.3 4.1 4.2 4.3 4.4
Program 1 reads file 1, block 1
                file 1, block 2
                file 2, block 1
                file 2, block 2
                file 2, block 3
                file 1, block 3
Program 2 reads file 3, block 1
                file 4, block 1
                file 3, block 2
                file 4, block 2
                file 3, block 3
                file 4, block 4
The OS scheduler causes the programs to be scheduled and executed such that
the device driver receives requests
file 3, block 1
file 1, block 1
file 4, block 1
file 1, block 2
file 3, block 2
file 2, block 1
file 4, block 2
file 2, block 2
file 3, block 3
file 2, block 3
file 4, block 4
file 1, block 3
Graphically, this looks like...
   1.1  1.2  1.3  2.1  2.2  2.3  3.1  3.2  3.3  4.1  4.2  4.3  4.4
$------------------------------>:3.1:
  :1.1:<--------------------------'
    `----------------------------------------->:4.1:
       :1.2:<------------------------------------'
         `-------------------------->:3.2:
                 :2.1:<----------------'
                   `------------------------------->:4.2:
                      :2.2:<--------------------------'
                        `---------------->:3.3:
                           :2.3:<-----------'
                             `------------------------------->:4.4:
            :1.3:<----------------------------------------------'
As you can see, the accesses are already 'fragmented' and we haven't even
reached the disk yet (up to this point, the accesses have been against 'logical'
addresses). I have to stress this, the above situation is no different from an
MSDOS single file physical access against a fragmented file.
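To put a rough number on the illustration, the following Python sketch adds up the head movement for the request stream listed above. The assumptions are mine: blocks sit at positions 0 to 12 in the order shown in the layout, the head starts at block 1.1, and the cost of a seek is just the distance between positions.

```python
# Disk layout from the illustration: four unfragmented files, back to back.
layout = ["1.1", "1.2", "1.3", "2.1", "2.2", "2.3",
          "3.1", "3.2", "3.3", "4.1", "4.2", "4.3", "4.4"]
position = {label: i for i, label in enumerate(layout)}

# Requests in the order the device driver receives them.
arrival = ["3.1", "1.1", "4.1", "1.2", "3.2", "2.1",
           "4.2", "2.2", "3.3", "2.3", "4.4", "1.3"]


def head_travel(labels, start=0):
    """Sum of head movements, in block positions, for the given order."""
    head, total = start, 0
    for label in labels:
        total += abs(position[label] - head)
        head = position[label]
    return total


as_received = head_travel(arrival)
one_sweep = head_travel(sorted(arrival, key=position.get))

print(as_received)  # 76
print(one_sweep)    # 12
```

Under those assumptions, the interleaved stream moves the head more than six times as far as a single sweep over the same blocks would, even though every file is perfectly contiguous on disk.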
So, how do we minimize the effect seen above? If you are MSDOS, you reorder
the blocks on disk to match the (presumed) order in which they will be
requested. On the other hand, if you are Linux, you reorder the requests into
a regular sequence that minimizes disk access using something like an elevator
algorithm. You also read ahead on the drive (optimizing disk access), buffer
most of the file data in memory, and you only write dirty blocks. In other
words, you minimize the effect of 'file fragmentation' as part of the other
optimizations you perform on the access requests before you execute them.
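The read-ahead, buffering and dirty-block points can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the kernel's buffer cache: `BlockCache`, its `readahead` parameter and the dict standing in for the drive are all hypothetical, and reading a block that is not on the toy disk simply raises `KeyError`.

```python
class BlockCache:
    """Toy write-back cache: reads ahead on a miss, writes only dirty blocks."""

    def __init__(self, disk, readahead=2):
        self.disk = disk              # dict: block number -> data (stand-in for the drive)
        self.readahead = readahead    # extra blocks fetched on each miss
        self.cache = {}
        self.dirty = set()
        self.disk_reads = 0           # number of trips to the drive for reads
        self.disk_writes = 0          # number of blocks written back

    def read(self, block):
        if block not in self.cache:
            # Miss: fetch the requested block plus the next few in one trip.
            for b in range(block, block + 1 + self.readahead):
                if b in self.disk and b not in self.cache:
                    self.cache[b] = self.disk[b]
            self.disk_reads += 1
        return self.cache[block]

    def write(self, block, data):
        # Writes land in memory first; the block is only marked dirty.
        self.cache[block] = data
        self.dirty.add(block)

    def flush(self):
        # Write back only what changed, in ascending block order.
        for b in sorted(self.dirty):
            self.disk[b] = self.cache[b]
            self.disk_writes += 1
        self.dirty.clear()


disk = {n: b"old" for n in range(8)}
cache = BlockCache(disk)
for n in range(4):
    cache.read(n)            # blocks 0-2 come from one trip, block 3 needs a second
cache.write(1, b"new")
cache.flush()
print(cache.disk_reads, cache.disk_writes)  # 2 1
```

Four reads cost two trips to the drive, and the flush touches one block rather than everything that was cached.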
Now, this is not to say that 'file fragmentation' is a good thing. It's just
that 'file fragmentation' doesn't have the impact here that it would have in
MSDOS-based systems. The performance difference between a 'file fragmented'
Linux file system and a 'file unfragmented' Linux file system is minimal to
none, where the same performance difference under MSDOS would be huge.
Under the right circumstances, fragmentation is a neutral thing, neither bad
nor good. As to defragging a Linux filesystem (ext2fs), there are tools
available, but (because of the design of the system) these tools are rarely
(if ever) needed or used. That's the impact of designing up front the
multi-processing/multi-tasking multi-user capacity of the OS into its
facilities, rather than tacking multi-processing/multi-tasking multi-user
support on to an inherently single-processing/single-tasking single-user system.
Peter T Breuer's comments
And, I'll add Peter T Breuer's <ptb@lab.it.uc3m.es> comments from
Message-ID: <lo73t9.bdt.ln@news.it.uc3m.es>, posted on
Wed, 05 Dec 2001 23:52:52 GMT ...
All "fragmented" drives are better than "unfragmented" ones on a multiuser
multitasking o/s. The point is that the machine is doing many things
simultaneously, so it has to jump around even if one task is interested in
only one file. There will be up to a hundred tasks doing i/o simultaneously.
Yes, all disk drivers use elevator algorithms, in any o/s.
But to answer your question, ext2 spreads blocks out evenly through the disk,
using various strategies (well, a single mixed strategy). This reduces the
average seek time on a single elevator pass.
Peter
Eric P. McCoy's comments
And I'll conclude with Eric P. McCoy's <ctr2sprt@yahoo.com> comments from
Message-ID: <87wv019qqt.fsf@providence.local>, posted on
Wed, 05 Dec 2001 23:52:52 GMT ...
"Linux filesystems" is a little misleading. e2fs doesn't generally have
fragmentation issues, for certain definitions of "fragmentation."
The short answer is this: e2fs splits the disks up into block groups, which
are contiguous regions of blocks. The group will contain a certain number of
inodes and (data) blocks. When you create an inode, Linux probably chooses the
group with the largest number of free (data) blocks. When you write to an
inode, Linux will preferentially allocate (data) blocks in the same group as
the inode. When it has to, it will move on to another (later) group, but will
still try to keep the blocks together.
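The placement policy described above can be sketched roughly as follows. This is a simplified Python model based only on that description, not the actual ext2 allocator; `BlockGroup`, `pick_group_for_inode` and `allocate_blocks` are hypothetical names, and the sketch only ever spills into later groups.

```python
class BlockGroup:
    """A contiguous region of the disk with a count of free data blocks."""

    def __init__(self, index, free):
        self.index = index
        self.free = free


def pick_group_for_inode(groups):
    """New inode: choose the group with the most free data blocks
    (on a tie, the earliest group wins)."""
    return max(groups, key=lambda g: g.free)


def allocate_blocks(groups, home, count):
    """Place `count` data blocks, preferring the inode's home group and
    spilling into later groups only when needed.
    Returns a list of (group index, blocks placed there)."""
    if sum(g.free for g in groups[home:]) < count:
        raise OSError("not enough free blocks in the home or later groups")
    placed = []
    for group in groups[home:]:
        if count == 0:
            break
        n = min(group.free, count)
        if n:
            group.free -= n
            placed.append((group.index, n))
            count -= n
    return placed


groups = [BlockGroup(0, 2), BlockGroup(1, 8), BlockGroup(2, 8)]
home = pick_group_for_inode(groups).index
print(home)                               # 1
print(allocate_blocks(groups, home, 10))  # [(1, 8), (2, 2)]
```

A file that outgrows its home group ends up in two pieces, but the pieces are in neighbouring groups and in ascending order, which is the "few blocks, same direction" pattern the next paragraph describes.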
The end result of this is that data is generally fragmented by only a few
blocks, and almost always travels in the same direction. That's as opposed to
the front-to-back fragmentation which could, and frequently did, occur in FAT
and its derivatives.
The above works great until the file system is nearly full, at which point
free blocks are scattered all across the disk in discontiguous locations. This
is why, on a nearly-full file system (above 95% or so), e2fs performance will
degrade substantially.
Other file systems (HPFS in particular) are similar, but call groups "bands"
or "stripes" instead. HPFS is actually worse than e2fs when nearly full,
because it uses pseudo B-trees for the directory structure which periodically
need to be rebalanced. The problem there is that, when the file system is
nearly full, directories may need to be rebalanced into many different groups,
which will obviously cause enormous slowdowns. e2fs uses a crummy, paleolithic
array for its directories, which results in far worse performance overall, but
wins out in this one narrow case (or can, depending on what's done to the
directory).
Sorry, but most people on this group know better than to mention "file
systems" and "explain" in the same sentence when I am around.
Eric McCoy <ctr2sprt@yahoo.com>
- --
Lew Pitcher
Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
iD8DBQFCGLVNagVFX4UWr64RAkziAJ9TSnbQG6L/LII+S+G02nJUlPLsvwCgv4FY
mIGpAhqokuCu+cDT0UcBOTw=
=zpTQ
-----END PGP SIGNATURE-----