Re: Caching control



On Mar 6, 11:32 am, phil-news-nos...@xxxxxxxx wrote:

> Don't forget that other rule:  you also have to implement it yourself.

No, there's no such rule.

> It's a silly rule, but it is asserted for just about every feature suggestion.

If it's only useful to you, you have to implement it yourself.

> It's silly because not everyone has the background knowledge of all the other
> code involved to accomplish the implementation anywhere near as quickly as
> someone who does have that background.  Or, maybe it is the case that Linux
> is simply unable to do this.

Nobody's going to investigate a suggestion without a use case.

> | Is there some problem that this solves best? Or is there a whole class
> | of problems that this solves better than everything else? If not, the
> | feature wouldn't be useful.

> The problem it solves is to avoid flooding RAM with a large number of pages
> of data that are merely going to be written to disk.  When a program is going
> to write a large amount of data (at least twice as much as there is RAM,
> and maybe a lot more), there is no point in caching that data beyond just
> enough to keep the I/O rate going at full speed.  What happens is that this
> flooding of RAM with useless cache causes other processes to be swapped out.
> That act of swapping out, and back in again, slows everything down, and the
> total amount of work that can get done is reduced.  If the swap space is on
> the same I/O channel, or even on the same disk drive, as the bulk data
> being written, it slows down that data writing, too.

If this happens, it's a bug in the operating system's caching logic
(or it's badly tuned, or it's a case the logic just handles badly). It
should not allow disk cache to grow large enough to push the working
set into swap.

In any event, for this use case, there is a much better solution,
posix_fadvise(POSIX_FADV_NOREUSE). This is better for three reasons:

1) It's standardized.

2) It tells the operating system the *reason* you don't want the data
kept in cache.

3) It allows the operating system to decide what to do to best handle
that situation rather than you forcing a particular solution that may
or may not be right.

> One use case is populating a disk with an initial system install, using a
> formatted and mounted filesystem, and a stream of files coming from somewhere.
> More pages will need to be cached in this case to gain the advantages of the
> elevator logic for ordering disk writes.  But it doesn't require a massive
> amount of cache.  Somewhere around 16MB to 128MB would be plenty.

Again, this is what posix_fadvise is for.

> Another use case is similar to the above, but a raw disk or partition image
> is what is being written.  In this case, no elevator action is needed at all,
> unless the disk is in use for something else, too.  Images are written
> sequentially, so caching at most 2 to 4 times the largest I/O write unit is
> all that is needed.

Same answer.
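If a hint alone turns out not to be enough, a more explicit technique (a swapped-in alternative, not the one DS proposes) is to bound how much of the file sits in the page cache yourself. The sketch below, assuming Linux and glibc, writes a buffer sequentially in chunks, starts writeback of each chunk with sync_file_range(), waits for the previous chunk, and then drops it with POSIX_FADV_DONTNEED, so roughly two chunks of this file are cached at a time, in line with the 2 to 4 write units suggested above. The name `write_windowed` and its parameters are hypothetical, and it assumes the data is written starting at file offset 0.

```c
#define _GNU_SOURCE            /* sync_file_range() is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf to fd, starting at file
 * offset 0, in chunk-sized pieces while keeping only about two chunks
 * of the file in the page cache. Returns 0 on success, -1 on error. */
int write_windowed(int fd, const char *buf, size_t len, size_t chunk)
{
    off_t prev = -1;           /* offset of the previous chunk, if any */
    size_t prev_len = 0;

    if (chunk == 0)
        return -1;
    for (size_t pos = 0; pos < len; ) {
        size_t n = len - pos < chunk ? len - pos : chunk;

        /* Write this chunk completely, retrying short writes. */
        for (size_t done = 0; done < n; ) {
            ssize_t w = pwrite(fd, buf + pos + done, n - done,
                               (off_t)(pos + done));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                return -1;
            done += (size_t)w;
        }

        /* Start asynchronous writeback of this chunk without waiting. */
        if (sync_file_range(fd, (off_t)pos, (off_t)n,
                            SYNC_FILE_RANGE_WRITE) != 0)
            return -1;

        if (prev >= 0) {
            /* Wait for the previous chunk to reach the device, then
             * ask the kernel to drop its now-clean pages. The advice is
             * only a hint, so its return value is ignored. */
            if (sync_file_range(fd, prev, (off_t)prev_len,
                                SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER) != 0)
                return -1;
            posix_fadvise(fd, prev, (off_t)prev_len, POSIX_FADV_DONTNEED);
        }
        prev = (off_t)pos;
        prev_len = n;
        pos += n;
    }
    return 0;
}
```

The caller should still call fdatasync() when done: the last chunk is only submitted, not waited on, and the sync_file_range(2) man page warns that it writes no file metadata and gives no crash-durability guarantee.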

> BTW, one way I have worked around the RAM flooding problem is to turn off
> swap altogether.  I sized my system the usual ways and figured I needed around
> 2GB to 3GB of RAM for what I do, not counting the bulk writing.  I rounded
> that up to 4GB, then doubled it to 8GB.  If I had used swap space, I probably
> would have 4GB of RAM and 2GB to 4GB of swap.  This way I have just as much
> memory.  Now when I do bulk writes, it still floods RAM, but the impact is
> limited.  The kernel can "dismiss" unmodified pages from existing processes,
> which means they have to be paged back in from their original place (the
> executable file or library) again.  But fewer pages are affected, and only
> half the I/O is needed for the ones that are affected, since nothing has to
> be written out to swap first.  It definitely works better.

This sounds like some kind of tuning problem. Is this a recent Linux
kernel? Does it have default vm tuning? Unlimited writing should *not*
cause recently-active pages to swap out.
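A sketch for checking the resulting configuration, assuming Linux and glibc: sysinfo(2) reports total RAM and total swap, so with swap turned off the swap figure should be 0. By the arithmetic in the quoted paragraph, 4GB of RAM plus 2GB to 4GB of swap is 6GB to 8GB in total, which the swap-free 8GB setup matches or exceeds. The helper `memory_totals_mib` is hypothetical.

```c
#include <sys/sysinfo.h>

/* Hypothetical helper: store total RAM and total swap in MiB.
 * Returns 0 on success, -1 if sysinfo() fails. */
int memory_totals_mib(unsigned long long *ram_mib,
                      unsigned long long *swap_mib)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return -1;
    /* sysinfo() reports sizes in units of si.mem_unit bytes. */
    unsigned long long unit = si.mem_unit;
    *ram_mib  = (unsigned long long)si.totalram  * unit / (1024ULL * 1024);
    *swap_mib = (unsigned long long)si.totalswap * unit / (1024ULL * 1024);
    return 0;
}
```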

> Another thing I have done to avoid the RAM flooding is to run my own program
> that uses the O_DIRECT option on the open() call to the device.  This is only
> usable for copying raw images.  It does slow down the I/O somewhat.  It was
> for this program that I started wondering about a synchronized two-process
> writing strategy, one possible approach to which I asked about in another
> thread.
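A minimal sketch of that kind of O_DIRECT copy program, assuming Linux and glibc; it is not the poster's actual program. The names `copy_direct`, `align_up`, and `DIRECT_ALIGN` are hypothetical, and the 4096-byte alignment is an assumption (the real requirement depends on the device's logical block size and the filesystem). O_DIRECT wants the buffer, offset, and length aligned, so the buffer comes from posix_memalign(), the chunk size is rounded up, and O_DIRECT is cleared with fcntl() before a short final write. The destination must already exist, such as a device node.

```c
#define _GNU_SOURCE            /* O_DIRECT is a Linux extension */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed alignment for O_DIRECT transfers. */
#define DIRECT_ALIGN 4096

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: copy everything from in_fd to the existing
 * device or file at out_path with O_DIRECT, bypassing the page cache
 * for the written data. Returns 0 on success, -1 on error. */
int copy_direct(int in_fd, const char *out_path, size_t chunk)
{
    int out = open(out_path, O_WRONLY | O_DIRECT);
    if (out < 0)
        return -1;

    chunk = align_up(chunk, DIRECT_ALIGN);
    void *buf = NULL;
    if (chunk == 0 || posix_memalign(&buf, DIRECT_ALIGN, chunk) != 0) {
        close(out);
        return -1;
    }

    int rc = 0;
    for (;;) {
        /* Fill the whole buffer unless EOF comes first, because
         * O_DIRECT writes must be whole aligned blocks. */
        size_t n = 0;
        while (n < chunk) {
            ssize_t r = read(in_fd, (char *)buf + n, chunk - n);
            if (r < 0 && errno == EINTR)
                continue;
            if (r < 0) {
                rc = -1;
                goto done;
            }
            if (r == 0)
                break;
            n += (size_t)r;
        }
        if (n == 0)
            break;

        /* A short, unaligned tail cannot go through O_DIRECT, so turn
         * the flag off for this final write. */
        if (n % DIRECT_ALIGN != 0) {
            int fl = fcntl(out, F_GETFL);
            if (fl < 0 || fcntl(out, F_SETFL, fl & ~O_DIRECT) < 0) {
                rc = -1;
                goto done;
            }
        }

        size_t off = 0;
        while (off < n) {
            ssize_t w = write(out, (char *)buf + off, n - off);
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0) {
                rc = -1;
                goto done;
            }
            off += (size_t)w;
        }
        if (n < chunk)
            break;             /* that was the end of the input */
    }

done:
    free(buf);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```

Each write here waits for the device before the next read starts, which is one reason this approach is slower than buffered writing; overlapping reading and writing is the natural way to recover some of that speed.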

You should not be having this problem. You should invest some time in
figuring out why you do. Have you tinkered with settings like
overcommit_memory, overcommit_ratio, swappiness, min_free_kbytes,
vfs_cache_pressure, dirty_ratio, and so on?
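A quick way to see where those settings currently stand is to read them from /proc/sys/vm, where each one appears as a file on Linux (the same values `sysctl vm.swappiness` and friends report). A sketch, with the hypothetical helper names `read_tunable` and `print_vm_tunables`:

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf, without the trailing
 * newline. Returns 0 on success, -1 if it cannot be read. */
int read_tunable(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the VM settings listed above; missing ones are reported. */
void print_vm_tunables(void)
{
    static const char *const names[] = {
        "overcommit_memory", "overcommit_ratio", "swappiness",
        "min_free_kbytes", "vfs_cache_pressure", "dirty_ratio",
    };
    char path[128], value[64];
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        snprintf(path, sizeof path, "/proc/sys/vm/%s", names[i]);
        if (read_tunable(path, value, sizeof value) == 0)
            printf("%-20s %s\n", names[i], value);
        else
            printf("%-20s (unavailable)\n", names[i]);
    }
}
```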

DS