Re: page cache - process memory - disk read

From: Tommy Reynolds (TommyReynolds_at_yahoo.com)
Date: 09/23/05


Date: Fri, 23 Sep 2005 11:09:11 -0500

On Thu, 22 Sep 2005 22:47:31 -0700, hopehope_123 wrote:

> Where does the pages which are read from the disk is mapped inside a
> process?

They are not mapped if you are simply doing read{v}/write{v} type
I/O. Disk file content is placed into the kernel page cache and
*copied* to your program buffer. If you are performing direct I/O
(a file opened with O_DIRECT, often paired with asynchronous I/O),
the page cache is bypassed, this copy does not take place, and the
data is transferred directly into the application buffer.
 
> Imagine a simple c program:
> char *buf;
> buf = (char *)malloc(1000);
> This buf is allocated inside the heap , or in detail it is first reserved
> (in swap file) ,and on demand , it is allocated (inside the phy.ram)

Well, neither one actually. All malloc(3) has to do is ensure that
there is a 1000-byte hole in your process virtual address space,
growing the heap only if the allocator has no free space left. You
get back the virtual address of that hole, but nothing is reserved on
the swap device. In particular, the physical memory is *not* given to
the process yet.
 
> Then imagine that the code has the following line:
> *buf = 65 ;
> So i modifed the contents the buffer.

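Yes. It is at this point in the program that the store triggers a
page fault and, assuming the page has not been touched before, the
kernel gives the process exactly one physical page. That page comes
from the kernel's pool of free memory, not from the page cache.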

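You can watch this happen if you single-step through the program in
one window and, in another, repeatedly view /proc/<pid>/maps (for the
address ranges) and the VmRSS line of /proc/<pid>/status (for the
amount of resident memory).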
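
If you would rather see it from inside the program, here is a minimal
sketch, assuming Linux with /proc mounted. The names vm_rss_kib() and
touch_and_measure() are made up for this example. It samples VmRSS
before and after writing one byte into every page of a malloc(3)'ed
buffer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of this process in KiB, parsed from the "VmRSS:"
 * line of /proc/self/status. Returns -1 on failure. Linux only. */
long vm_rss_kib(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long kib = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "VmRSS: %ld", &kib) == 1)
            break;
    }
    fclose(fp);
    return kib;
}

/* Hypothetical helper: allocate `size` bytes, then write one byte into
 * every page. Stores the RSS seen before and after the writes.
 * Returns 0 on success, -1 on failure. */
int touch_and_measure(size_t size, long *before, long *after)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile char *buf;          /* volatile: keep the stores */
    size_t off;

    assert(page > 0);
    buf = malloc(size);          /* address space only, no RAM yet */
    if (buf == NULL)
        return -1;
    *before = vm_rss_kib();
    for (off = 0; off < size; off += (size_t)page)
        buf[off] = 65;           /* first write to a page faults it in */
    *after = vm_rss_kib();
    free((void *)buf);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Called with a large size such as 16 MiB, the "after" figure should
exceed the "before" figure by roughly the size of the buffer, although
the exact numbers depend on the allocator and the kernel.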

> As far as i know , if the pages which are modified like this , needs to be
> written to the swap disk in order to be reused again .

Sorry, no. Just because a page gets marked as dirty doesn't mean
that it ever sees the swap device.

> So if these pages pageout by the os , they are written to the swap file.

It depends. If the dirty page is holding some modified data file
content, we just write that page back to the disk file. Only if the
dirty page is anonymous memory, such as your malloc(3)'ed example,
is the page a candidate for the swap device.
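
The file-backed case can be sketched as follows, assuming a POSIX
system with a writable /tmp. The function name and the temp file
template are invented for the example. A page of a MAP_SHARED mapping
is dirtied, msync(2) asks the kernel to write it back to the file, and
a pread(2) confirms that the file content changed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a temp file holding `text`, change its
 * first byte through a shared mapping, flush, and read the byte back
 * through the file descriptor. Returns that byte, or -1 on error. */
int dirty_file_page_roundtrip(const char *text, char new_first)
{
    char path[] = "/tmp/pagecache-demo-XXXXXX";
    size_t len = strlen(text);
    char *map;
    char back;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                /* file goes away when fd is closed */
    assert(len > 0);             /* mmap(2) rejects a zero length */
    if (write(fd, text, len) != (ssize_t)len)
        goto out;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    map[0] = new_first;          /* dirties a file-backed page */

    /* Write-back target is the file itself, never the swap device. */
    if (msync(map, len, MS_SYNC) == 0 && pread(fd, &back, 1, 0) == 1)
        result = (unsigned char)back;
    munmap(map, len);
out:
    close(fd);
    return result;
}
```

On Linux the pread(2) would see the new byte even without the msync(2),
because the mapping and the read share the same page cache page; the
msync(2) only forces the write-back to happen now instead of later.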

> Now consider that ,i start to read data files by using read ,or
> readv or pread system calls. ( readv and pread are the io calls which
> oracle or other database systems uses on unix)
> fdes=open("/data/spss/x1.dat",O_RDONLY);
> while (fdes)
> {
> printf("%d\n",read(fdes, buf,sz));
> }
> How the data blocks which are cached inside the page cache mapped to the
> process?

The read(2) call checks if the desired data is already held in the
page cache. If not, a disk read is started to fill the missing page
cache pages. Once the data are in the page cache, they are
essentially memcpy(3)'ed to the application buffer. No VM mapping at
all.
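
From the application side, that copy looks like the loop below. This
is only a sketch under my own assumptions: read_to_eof() and
demo_read() are names invented here, and the loop stops when read(2)
returns 0 at end of file rather than testing the descriptor.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read from fd until end of file, at most sz bytes per call.
 * Returns the total number of bytes read, or -1 on error. */
long read_to_eof(int fd, size_t sz)
{
    char *buf;
    long total = 0;
    ssize_t n;

    assert(sz > 0);
    buf = malloc(sz);            /* ordinary user-space buffer */
    if (buf == NULL)
        return -1;
    for (;;) {
        n = read(fd, buf, sz);   /* kernel copies cached bytes into buf */
        if (n > 0) {
            total += n;
        } else if (n == 0) {
            break;               /* end of file */
        } else if (errno != EINTR) {
            total = -1;          /* real error, give up */
            break;
        }
    }
    free(buf);
    return total;
}

/* Hypothetical helper: put `text` into a temp file, rewind, and read
 * it back in chunks of sz bytes. Returns the byte count or -1. */
long demo_read(const char *text, size_t sz)
{
    char path[] = "/tmp/read-demo-XXXXXX";
    size_t len = strlen(text);
    long total = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                /* file goes away when fd is closed */
    if (write(fd, text, len) == (ssize_t)len && lseek(fd, 0, SEEK_SET) == 0)
        total = read_to_eof(fd, sz);
    close(fd);
    return total;
}
```

The buffer is plain process memory throughout; nothing in the loop
changes the process memory map.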

> If the file is read first time by this process , the file must be read
> from the disk. Then the blocks are cached inside the page cache. The page
> cache has no backing storage inside the swap file ,
> but instead it is directly mapped from the data file itself.
> (Is this correct?)

OK, as long as you leave the swap device out of this. Evicting pages
from memory and recovering them from the swap device is a different
(although related) functionality from file I/O.

> When my process reads data from the disk , then does the page that is
> inside the page cache copied into the process map ? Or is it shared and no
> copy takes place?

Ordinary file I/O takes two copy operations when the data is not
already in the page cache:

1) Disk content is copied from the media into kernel memory, the page
   cache; often by DMA hardware on the disk controller, so this costs
   the CPU very little.

2) Bytes are copied between the kernel page cache and the application
   buffer by the kernel equivalent of memcpy(3). The CPU does this
   copy itself, so it costs far more processor time than copy #1
   above.

If the data is already cached, only copy #2 is needed.

Various schemes exist to speed up or eliminate copy #2. Check into
direct I/O (O_DIRECT), which is often used together with asynchronous
I/O, for an example.

The mmap(2) system call, with all of its many parameters, is a way to
let the application get access to selected kernel memory. The VM map
for the process is altered so that loads and stores using the pointer
that mmap(2) returns directly address the associated kernel memory.
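
A minimal sketch of that, assuming a POSIX system with a writable
/tmp; sum_bytes_mmap() and demo_mmap_sum() are names I made up. The
file content is consumed with plain loads through the mapped pointer,
and no read(2) call is involved.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sum the first len bytes of the file behind fd by mapping it and
 * loading through the pointer. Returns the sum, or -1 on error. */
long sum_bytes_mmap(int fd, size_t len)
{
    const unsigned char *p;
    long sum = 0;
    size_t i;

    assert(len > 0);             /* mmap(2) rejects a zero length */
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    for (i = 0; i < len; i++)
        sum += p[i];             /* plain load; a fault maps the page */
    munmap((void *)p, len);
    return sum;
}

/* Hypothetical helper: put `text` into a temp file and sum its bytes
 * through a mapping. Returns the sum, or -1 on error. */
long demo_mmap_sum(const char *text)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    size_t len = strlen(text);
    long sum = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                /* file goes away when fd is closed */
    if (len > 0 && write(fd, text, len) == (ssize_t)len)
        sum = sum_bytes_mmap(fd, len);
    close(fd);
    return sum;
}
```

I used MAP_PRIVATE with PROT_READ here because the example only reads;
a program that wants its stores to reach the file would map with
MAP_SHARED and PROT_WRITE instead.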

Cheers


