Re: Caching control
- From: phil-news-nospam@xxxxxxxx
- Date: 6 Mar 2009 19:32:01 GMT
On Wed, 4 Mar 2009 12:01:33 -0800 (PST) David Schwartz <davids@xxxxxxxxxxxxx> wrote:
| On Mar 2, 7:37 pm, phil-news-nos...@xxxxxxxx wrote:
|
|> A feature that would be useful is one where the level of caching can be
|> controlled for a descriptor.  An ioctl() call should be fine.  The value
|> given would specify the maximum number of page-size units of caching to
|> keep for the descriptor.  This would be a _maximum_ and the kernel would
|> be allowed to cache less than this amount (which would happen if it does
|> the actual physical I/O faster than the process calls write). ?But if the
|> process does call write() more often, the kernel would prevent more from
|> being written by blocking that write() call.
|
| There's kind of an unwritten rule that when you propose a new feature,
| you have to propose at least one use case. The more reasonable the use
| case, and the more awful the best solution that doesn't require a new
| feature, the better your proposal.
Don't forget that other rule: you also have to implement it yourself.
It's a silly rule, but it is asserted for just about every feature suggestion.
It's silly because not everyone has the background knowledge of all the other
code involved to accomplish the implementation anywhere near as quickly as
someone who does have that background. Or maybe Linux is simply unable to
do this.
| Is there some problem that this solves best? Or is there a whole class
| of problems that this solves better than everything else? If not, the
| feature wouldn't be useful.
The problem it solves is to avoid flooding RAM with a large number of pages
of data that are merely going to be written to disk. When a program is going
to write a large amount of data (at least twice the amount of RAM, and
maybe a lot more), there is no point in caching that data beyond just
enough to keep the I/O rate going at full speed. What actually happens is
that this useless cache floods RAM and causes other processes to be swapped out.
That act of swapping out, and back in again, slows everything down, and the
total amount of work that can get done is reduced. If the swap space is on
the same I/O channel, or even on the same disk drive, as where the bulk data
is being written, it slows down that data writing, too.
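Until something like the proposed limit exists, a rough user-space approximation on Linux is to start writeback of each chunk with sync_file_range(), wait for the previous chunk to finish, and then tell the kernel its pages are no longer needed with posix_fadvise(POSIX_FADV_DONTNEED). The sketch below is not the proposed ioctl(); it assumes Linux 2.6.17 or later (sync_file_range() is Linux-specific), a regular file opened without O_APPEND, and a hypothetical helper name, bounded_write(), with a caller-chosen chunk size.

```c
#define _GNU_SOURCE          /* for sync_file_range() and its flags */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes of buf to fd, retrying short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t r = write(fd, buf, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Wait until [off, off+len) has been written back, then ask the kernel
   to drop those now-clean pages from the page cache. */
static int retire_range(int fd, off_t off, off_t len)
{
    if (sync_file_range(fd, off, len,
                        SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) != 0)
        return -1;
    /* Advisory only; a failure here does not affect the data written. */
    (void)posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
    return 0;
}

/* Hypothetical helper: write len bytes in chunk-sized pieces, keeping
   roughly two chunks of dirty or in-flight page cache at a time. */
static int bounded_write(int fd, const char *buf, size_t len, size_t chunk)
{
    off_t prev_off = -1, prev_len = 0;
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* assumes no O_APPEND */

    assert(chunk > 0);
    if (pos < 0)
        return -1;
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        if (write_all(fd, buf, n) != 0)
            return -1;
        /* Start asynchronous writeback of the chunk just written. */
        if (sync_file_range(fd, pos, (off_t)n, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
        /* Block until the previous chunk is written back, then drop it. */
        if (prev_off >= 0 && retire_range(fd, prev_off, prev_len) != 0)
            return -1;
        prev_off = pos;
        prev_len = (off_t)n;
        pos += (off_t)n;
        buf += n;
        len -= n;
    }
    /* Retire the final chunk as well. */
    return prev_off >= 0 ? retire_range(fd, prev_off, prev_len) : 0;
}
```

The caller blocks inside bounded_write() whenever the previous chunk is still being written back, which is roughly the behavior the proposal asks write() itself to have. Note that sync_file_range() does not flush file metadata or the drive's write cache, so this limits cache use without making the data durable; fsync() is still needed for that.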
One use case is populating a disk with an initial system install, using a
formatted and mounted filesystem, and a stream of files coming from somewhere.
More pages will need to be cached for this use case to gain the advantages of the
elevator logic for ordering disk writes. But it doesn't require a massive
amount of cache. Somewhere around 16MB to 128MB would be plenty.
Another use case is similar to the above, but the raw disk or partition image is
what is being written. In this case, no elevator action is needed at all,
unless the disk is in use for something else, too. Images are written in a
sequential manner. Caching of just 2 to 4 times the largest I/O write unit
is the maximum needed.
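To relate the figures in these two use cases to the page-size units the proposal asks for, the byte budget just has to be divided by the page size, rounding up. A small sketch, assuming "MB" means MiB, a 4096-byte page in the examples (on a real system the value would come from sysconf(_SC_PAGESIZE)), and a 1 MiB write unit for the image case; the function names are hypothetical, since no such ioctl() exists.

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte budget up to whole pages, the unit the proposed
   (hypothetical) per-descriptor cache limit would take. */
static size_t cache_limit_pages(size_t budget_bytes, size_t page_size)
{
    assert(page_size > 0);
    return (budget_bytes + page_size - 1) / page_size;
}

/* Sequential image case: a small multiple of the largest write unit. */
static size_t image_limit_pages(size_t write_unit, unsigned factor,
                                size_t page_size)
{
    return cache_limit_pages(write_unit * factor, page_size);
}
```

With 4096-byte pages, the 16MB to 128MB range for the install case comes to 4096 to 32768 pages, and 2 to 4 times a 1 MiB write unit comes to 512 to 1024 pages.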
A more common use case is making backups to external hard drives, which
people do more and more often. The backup could be written as a raw partition
image of an unmounted filesystem using a program like "dd". Otherwise it would
be done as a file tree with a program like "rsync".
BTW, one way I have worked around the RAM flooding problem is to turn off
swap altogether. I sized my system the usual ways and figured I needed around
2GB to 3GB of RAM for what I do, not counting the bulk writing. I rounded
that up to 4GB, then doubled it to 8GB. If I had used swap space I probably
would have had 4GB of RAM and 2GB to 4GB of swap. This way I have just as much
memory. Now when I do bulk writes, they still flood RAM, but the impact is
limited. The kernel can still "dismiss" unmodified pages belonging to existing
processes, which means they have to be read back in from their original place
(the executable file or library) again. But fewer pages are affected, and only
half the I/O is needed for the ones that are affected, since they never have
to be written out to swap first. It definitely works better.
Another thing I have done to avoid the RAM flooding is to run my own program
that uses the O_DIRECT flag on the open() call for the device. This is only
usable for copying raw images, and it does slow down the I/O somewhat. It was
for this program that I started wondering about a synchronized two-process
writing strategy, one possible approach to which was asked about in another
thread.
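A program along those lines could look like the sketch below. It is an illustration, not the author's program: open_direct_for_write(), copy_aligned(), and the 4096-byte DIRECT_ALIGN are assumptions (the real alignment requirement depends on the device and kernel), and the input is assumed to return full buffers except at end of input, because O_DIRECT writes usually must be aligned in buffer address, file offset, and length.

```c
#define _GNU_SOURCE          /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed alignment; the actual O_DIRECT requirement depends on the
   device (often its logical block size) and the kernel version. */
#define DIRECT_ALIGN 4096

/* Open the target device or image file for writes that bypass the page cache. */
int open_direct_for_write(const char *path)
{
    return open(path, O_WRONLY | O_DIRECT);
}

/* Copy in_fd to out_fd through one aligned buffer of bufsize bytes.
   Returns the number of bytes copied, or -1 on error. */
long long copy_aligned(int in_fd, int out_fd, size_t bufsize)
{
    void *buf;
    long long total = 0;

    assert(bufsize > 0 && bufsize % DIRECT_ALIGN == 0);
    if (posix_memalign(&buf, DIRECT_ALIGN, bufsize) != 0)
        return -1;
    for (;;) {
        ssize_t n = read(in_fd, buf, bufsize);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {            /* 0 = end of input, -1 = error */
            if (n < 0)
                total = -1;
            break;
        }
        /* With O_DIRECT, a short unaligned final chunk may be rejected (EINVAL). */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, (char *)buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0) {
                free(buf);
                return -1;
            }
            off += w;
        }
        total += n;
    }
    free(buf);
    return total;
}
```

Typical use would be copy_aligned(image_fd, out_fd, 1 << 20), where out_fd comes from open_direct_for_write() on a placeholder device path after checking that the open succeeded. In this single-process loop, reading and writing never overlap because nothing is left in the cache to be written behind, which is one plausible reason for the slowdown mentioned above and for the interest in a two-process scheme.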
--
|WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
| by the abuse department, bellsouth.net is blocked. If you post to |
| Usenet from these places, find another Usenet provider ASAP. |
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |