Re: Inter-process communication - does data hit disk?

From: Jamie Vicary (Xjamie.Xvicary_at_Xmansf.Xox.Xac.Xuk)
Date: 08/12/04


Date: Thu, 12 Aug 2004 12:37:54 +0100

Joe,

        Thanks for the response.

Joe Knapka wrote:
> Jamie Vicary <jamie.vicary@mansf.ox.ac.uk> writes:
>
> > Dear all,
> >
> > I have two processes running on the same workstation, and I want them
> > to communicate with each other. I don't want them to have to
> > synchronise themselves with each other, as by definition that would
> > require one of them to wait for a bit, and robustness is very
> > important - so sockets are out. I want to use files.
>
> You should probably investigate the socket API further. There is no
> necessary synchronization when using sockets. Checking whether data is
> available on a socket, or whether a socket is writable, without
> blocking is easy (man select). You might also consider FIFOs (man
> mkfifo). In any case, whether you use sockets or FIFOs or files,
> you'll probably need to use select() to decide if and when to read
> data.

How big is a socket buffer anyway? On the order of tens of kilobytes?
What determines the size? Is the size modifiable?
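
[A minimal sketch for the buffer-size question above, assuming a Unix-like system with Python available. The size is a per-socket option (SO_SNDBUF / SO_RCVBUF) that can be read with getsockopt() and requested with setsockopt(). The kernel is free to round, double, or cap the requested value, so the sketch reads back what was actually granted rather than assuming the request was honoured. The helper names buffer_sizes and request_send_buffer are made up for illustration.]

```python
import socket


def buffer_sizes(sock):
    """Return (send_bytes, recv_bytes) as currently reported by the kernel."""
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


def request_send_buffer(sock, nbytes):
    """Ask for a send buffer of nbytes and return what the kernel granted.

    The granted size may differ from the request (system-wide limits apply).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    # A connected pair of local sockets, standing in for the two processes.
    a, b = socket.socketpair()
    print("default (send, recv) sizes:", buffer_sizes(a))
    print("granted after asking for 256 KiB:", request_send_buffer(a, 256 * 1024))
    a.close()
    b.close()
```

[On Linux the defaults and upper limits are system-wide settings under /proc/sys/net/core/ (wmem_default, wmem_max, rmem_default, rmem_max); other systems have their own knobs.]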

> > How much would this slow me down?
>
> Anything actually getting written out to disk and then read back
> in will slow you down *a lot*. Orders of magnitude, compared with
> keeping everything in RAM.
>
> > I might have to transfer gigabytes
> > of data between my processes. Now, presumably, when you write
> > something to a file it doesn't actually 'hit disk'
>
> It certainly will hit disk if you write enough data to require the
> kernel to free some buffers -- which you probably will if you're going
> to be writing gigabytes of data. *Any* IPC mechanism is going to
> eventually hit some form of throughput constraint. With sockets, for
> example, when the kernel can't give you more buffer space it will
> block your send() operations. Of course, if that happens it means that
> your consumer cannot handle the data as fast as your producer is
> generating it, so you *want* this behavior.

Okay - perhaps my emphasis was a bit wrong. I will rarely, if ever, want
to transfer gigabytes of data; I just want the protocol I use to be
*potentially scalable* to such large datasets.
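
[A rough sketch of the behaviour Joe describes, assuming a Unix-like system: a producer on a non-blocking socket can keep sending until the kernel buffer is full, at which point the send is refused instead of blocking, and select() with a zero timeout reports readiness without waiting. The function names are hypothetical, and the number of bytes accepted before the buffer fills depends on the system.]

```python
import select
import socket


def is_readable(sock):
    """Poll (zero timeout) whether data is waiting to be read."""
    readable, _, _ = select.select([sock], [], [], 0)
    return bool(readable)


def is_writable(sock):
    """Poll (zero timeout) whether the kernel will accept more data."""
    _, writable, _ = select.select([], [sock], [], 0)
    return bool(writable)


def fill_until_full(sock, chunk=b"x" * 4096):
    """Send on a non-blocking socket until the kernel refuses more data.

    Returns the number of bytes accepted before the buffer filled up.
    """
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # Buffer full: a blocking socket would have waited here instead.
            return total


if __name__ == "__main__":
    producer, consumer = socket.socketpair()
    sent = fill_until_full(producer)
    print("bytes accepted before the buffer filled:", sent)
    print("consumer has data to read:", is_readable(consumer))
    producer.close()
    consumer.close()
```

[Nothing is lost when the buffer fills. The producer either waits (blocking mode) or is told to come back later (non-blocking mode), and once the consumer drains some data the socket becomes writable again. The total amount transferred is therefore not limited by the buffer size.]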

> > - it stays in RAM
> > a) because the computer might be too busy at the moment to coordinate
> > the complicated business of sending commands to the hard disk, and b)
> > because the file data might well be needed in the near future. Is this
> > true?
>
> That depends on a lot of factors. Things that need to get written out
> do. If you're writing to a file on disk, from the kernel's point of
> view that definitely counts as "stuff that needs to get written out",
> 'cause hey, why else would you bother writing to a disk file, eh? It
> won't happen synchronously, but it will happen eventually (like when
> some random user types "sync").
>
>
> > If this is true, does that mean that my process which is receiving the
> > data will be able to read it from disk as fast as if it was accessing
> > data stored in a very large array of its own? If not, why not?
>
> No, because sometimes (and you can't predict exactly when), the kernel
> is going to decide to write some of your data out in order to free RAM
> resources for other purposes. When that happens, your data is evicted
> from RAM in favor of something else, and your consumer process is
> going to need to read that data back in. So you've just made a
> needless trip to and from the disk, taking potentially a great many
> microseconds.

Ah - but that's just it. What sort of time are we talking about before
the data hits disk and is deleted from RAM? Milliseconds? Minutes?
Somewhere in between? What if the RAM isn't being used much? The kernel
might just keep the file information in RAM until the system is shut
down (or someone types "sync"), because a file that has just been
created is more likely than any other random file on your hard disk to
need to be used in the near future.

> Performance-wise, it's far better to let the producer
> block and let the consumer catch up, leaving your code CPU-bound (all
> other things being equal). (But if your program uses much more memory
> than the size of physical RAM, it's likely some of your data will be
> swapped out at some point anyway, whether you use disk files
> explicitly or not.)
>
> > A ramdisk isn't the answer to my needs, because I just can't count on
> > there being enough RAM for the data on any machine I have to run my
> > programs on.
>
> Then you *are* going to block waiting for your data channel sometimes,
> either on the read or the write end, or both. Sorry. You could think
> of a FIFO as a specialized ramdisk that permits only reading from one
> end and writing to the other, and blocks the reader if the disk is
> empty and the writer if it's full. You can't do much better than that,
> and it doesn't make sense to try to build that functionality yourself
> on top of the filesystem, since it's already been done for you. And
> further, you can think of a Unix socket as a full-duplex version of
> the same idea. (To a first approximation anyway.)

Well - I suppose my point was that if you try and put too much stuff on
a ramdisk, the operation will fail, and I don't want my process that's
sending the data to have to worry about that; it's got too many other
things to do. The same goes for sockets. Hard disks, on the other hand,
have the property that if you try and send more data than the RAM can
hold, then you'll have to wait for some data to actually hit the disk
and your write() command will take longer, but at least it will complete
without failing (making the reasonable assumption that I have an
infinite amount of hard disk space.) In that sense, using the file
system is very robust.

        - and! -

using the file system has the advantage that, at least /sometimes/ when
the file is read within a certain time T of it being initially written,
the data will be read /very fast/ because it will still reside in RAM,
and no actual HD spinning will actually take place until after the data
has been read and we don't care about its file-system representation any
more. Some simple experimentation will answer this, I'm sure.
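
[One way the experiment might look, as a sketch only: write a file, read it straight back, and time both halves. The function name write_then_read and its parameters are made up for illustration. It assumes the temporary directory lives on a real disk (on some systems it is RAM-backed, which would make the test meaningless; pass dirname to choose another location). The timings say nothing about when, or whether, the kernel flushed the data in the meantime.]

```python
import os
import tempfile
import time


def write_then_read(nbytes, chunk=1 << 20, dirname=None):
    """Write nbytes to a temp file, read it back, and time both phases.

    Returns (write_seconds, read_seconds, bytes_read). No fsync is issued,
    so the kernel decides for itself when the data goes to disk.
    """
    block = b"\0" * chunk
    fd, path = tempfile.mkstemp(dir=dirname)
    try:
        t0 = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            remaining = nbytes
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(block[:n])
                remaining -= n
        t1 = time.perf_counter()

        bytes_read = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(chunk)
                if not data:
                    break
                bytes_read += len(data)
        t2 = time.perf_counter()
    finally:
        os.unlink(path)
    return t1 - t0, t2 - t1, bytes_read


if __name__ == "__main__":
    size = 32 * 1024 * 1024
    w, r, got = write_then_read(size)
    print("wrote %d bytes in %.4f s, read %d bytes back in %.4f s" % (size, w, got, r))
```

[Repeating the run with sizes well below and well above the machine's physical RAM, and with a delay between the write and the read, would show how the read time changes once the data no longer fits in memory.]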

> > I am currently developing in Perl and C++ but I will want to be moving
> > to Fortran90 and C
>
> Gracious. You seem to enjoy giant leaps backward... :-)

Indeed! I'm adapting some software we have, and I'm starting with the
most recent stuff but will eventually have to move on to the old stuff.

> > - but am I right in thinking the issues I am
> > thinking about in this post are OS specific, not language specific?
>
> Language is probably not too important, provided the places that need
> to be fast are executed in native code (and not, say, interpreted or
> byte-compiled).

        Regards,

                Jamie Vicary


