Re: Inter-process communication - does data hit disk?

From: Joe Knapka (jknapka_at_kneuro.net)
Date: 08/12/04


Date: Thu, 12 Aug 2004 16:37:30 GMT

Jamie Vicary <Xjamie.Xvicary@Xmansf.Xox.Xac.Xuk> writes:

> Joe,
>
> Thanks for the response.

You're welcome. Massive unmarked snippage below; I think we'll
be able to keep things straight ;-)
 
> How big is a socket buffer anyway? On the order of tens of kilobytes?
> What determines the size? Is the size modifiable?

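Yes to all of those. The size is a per-socket option (SO_SNDBUF and
SO_RCVBUF) with system-wide defaults and limits; see
man setsockopt ; man 7 socket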
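
For the curious, here is a minimal sketch of querying and adjusting those sizes. It is in Python rather than C only for brevity; the helper names buffer_sizes and request_send_buffer are made up for this illustration, while SO_SNDBUF and SO_RCVBUF are the real options described in socket(7). The kernel is free to round, double, or cap whatever you ask for, so always read the value back.

```python
import socket

def buffer_sizes(sock):
    """Return (send, receive) buffer sizes in bytes as reported by the kernel."""
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv

def request_send_buffer(sock, nbytes):
    """Ask for a send buffer of nbytes and return what the kernel actually set.

    The request is only a hint: the kernel may round, double, or cap it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

if __name__ == "__main__":
    a, b = socket.socketpair()
    print("defaults (send, recv):", buffer_sizes(a))
    print("after requesting 256 KiB:", request_send_buffer(a, 256 * 1024))
    a.close()
    b.close()
```

The numbers printed vary by system and by socket type, which is why no expected output is shown.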

>
> Okay - perhaps my emphasis was a bit wrong. I will rarely, if ever,
> want to transfer gigabytes of data; I just want the protocol I use to
> be *potentially scalable* to such large datasets.

If your dataset gets substantially larger than physical RAM, you are
going to have to wait for the disk sometimes, no matter what IPC
mechanism you use. Sockets are clearly the correct choice for
streaming large amounts of data between processes (or machines). The
kernel's socket implementation is going to be tuned for that, since
that's the application the socket subsystem is designed to address.

> Ah - but that's just it. What sort of time are we talking about before
> the data hits disk and is deleted from RAM? Milliseconds? Minutes?
> Somewhere in between?

That depends entirely on what other activity is going on, and on
how much data you're pushing around.

> What if the RAM isn't being used much?

That assumption contradicts your other statements. If you're going to
be handling datasets pushing or exceeding physical RAM capacity, then
the RAM is going to be in use; and if you aren't, then yes, your data
will stay in RAM for the most part. But if you use disk files the
kernel will still write the data out, which is needless work.

> The
> kernel might <---

Don't guess about performance consequences of your design decisions.
Either know, or do realistic experiments to find out.

> just keep the file information in RAM until the system is
> shut down (or someone types "sync"), because a file that has just
> been created is more likely than any other random file on your hard
> disk to need to be used in the near future.

Yes, but the kernel will still write the data out whenever it gets the chance. It must,
because by opening a file on the disk, you're telling the kernel that
the data you're going to write there is so mind-bogglingly important
that you want to have a permanent, on-disk record of it.

> > Performance-wise, it's far better to let the
> > producer block and let the consumer catch up, leaving your code
> > CPU-bound (all other things being equal).

I'll pause here to emphasize, with bold print and several underlines,
the preceding sentence.

> Well - I suppose my point was that if you try and put too much stuff
> on a ramdisk, the operation will fail, and I don't want my process
> that's sending the data to have to worry about that, it's got too many
> other things to do. The same goes for sockets.

No. A socket won't fail if the send buffer fills; it will simply
block the writer until there's space available. You don't have to
write any special code to make that happen; it's the natural behavior
of sockets. If you fill a RAM disk, OTOH, writes *will* just fail,
until space is made available by explicit deletion of material on the
RAM disk; and the abortively-written data will go to bit heaven. Your
app might be able to tolerate such lossage, but most likely it would
just be a bug.
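
A minimal sketch of that blocking behavior, assuming a local socketpair and a deliberately slow consumer thread (the function name stream_through_socket and the sleep-based "slowness" are invented for this illustration). The producer contains no flow-control code at all; sendall() simply blocks whenever the buffer is full, and every byte still arrives.

```python
import socket
import threading
import time

def stream_through_socket(total_bytes, chunk=4096, consumer_delay=0.001):
    """Send total_bytes through a socketpair to a deliberately slow consumer.

    Returns the number of bytes the consumer received.
    """
    producer, consumer = socket.socketpair()
    received = 0

    def consume():
        nonlocal received
        while True:
            data = consumer.recv(chunk)
            if not data:                 # producer closed its end: end of stream
                break
            received += len(data)
            time.sleep(consumer_delay)   # simulate a reader slower than the writer

    reader = threading.Thread(target=consume)
    reader.start()

    payload = b"x" * chunk
    sent = 0
    while sent < total_bytes:
        n = min(chunk, total_bytes - sent)
        producer.sendall(payload[:n])    # blocks while the socket buffer is full
        sent += n

    producer.close()                     # signals end of stream to the consumer
    reader.join()
    consumer.close()
    return received

if __name__ == "__main__":
    total = 1024 * 1024
    print("sent", total, "received", stream_through_socket(total))
```

On a system whose default socket buffer is smaller than the payload, the producer spends most of its time asleep inside sendall(), which is exactly the point: the kernel does the throttling for you.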

> Hard disks, on the
> other hand, have the property that if you try and send more data than
> the RAM can hold, then you'll have to wait for some data to actually
> hit the disk and your write() command will take longer, but at least
> it will complete without failing (making the reasonable assumption
> that I have an infinite amount of hard disk space.)

What if you fill the disk? Same scenario as above: the write fails
and the data is gone.
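
If you want to see what that failure looks like without actually filling a disk, here is a small sketch that assumes a Linux-style /dev/full pseudo-device, which reports "no space left on device" for every write. The function name write_to_full_device is made up for this example, and /dev/full is only a stand-in for a full disk or RAM disk.

```python
import errno
import os

def write_to_full_device(data=b"important bytes"):
    """Try to write to /dev/full and return the errno raised, or None.

    Returns None if the device does not exist on this system or if the
    write unexpectedly succeeds.
    """
    if not os.path.exists("/dev/full"):
        return None
    fd = os.open("/dev/full", os.O_WRONLY)
    try:
        os.write(fd, data)
    except OSError as exc:
        return exc.errno                 # ENOSPC where /dev/full behaves as on Linux
    finally:
        os.close(fd)
    return None

if __name__ == "__main__":
    code = write_to_full_device()
    if code is None:
        print("no /dev/full here, or the write succeeded")
    else:
        print("write failed:", errno.errorcode.get(code, code))
```

The writer gets an error back instead of being put to sleep, so the application has to notice the error and decide what to do with the data it could not write.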

> In that sense,
> using the file system is very robust.
>
> - and! -
>
> using the file system has the advantage that, at least /sometimes/
> when the file is read within a certain time T of it being initially
> written, the data will be read /very fast/ because it will still
> reside in RAM, and no actual HD spinning will actually take place until
> after the data has been read and we don't care about its file-system
> representation any more. Some simple experimentation will answer this,
> I'm sure.

Sockets have the further advantage that your data will never make a
needless trip to disk and back, so /all/ access to data written
through a socket will be /very fast/, because it will /always/ reside
in RAM, unless your machine is so heavily into swap-land that it has
to evict active pages to keep up with some other activity. In that
case, you are doomed performance-wise anyway.

And really, the issue shouldn't be "how much data can I shove into
this IPC channel before performance penalties ensue". The issue should
be, "how can I make my consumer fast enough that I never have to have
more than a tiny amount of data in the pipe at a time". If you're
decoding a video stream in one process and piping it to another
process for display, for example, you don't care that your IPC pipe
can't accept the entire 500MB stream in a single chunk, because if the
pipe ever gets that full you're doomed anyway. You care only about
making the consumer (the display process) fast enough that there never
has to be more than a few dozen K of data in that pipe at a time.
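
To get a feel for how little the channel itself will hold, here is a rough sketch that stuffs a socketpair with non-blocking sends while nobody reads, and counts the bytes accepted before the kernel says "would block". The function name pipe_capacity and the 64 MiB safety limit are arbitrary choices for this illustration; the figure you get depends on the system and socket type.

```python
import socket

def pipe_capacity(chunk=1024, limit=64 * 1024 * 1024):
    """Count the bytes a socketpair accepts before a non-blocking send would block.

    Nothing reads from the other end, so the count reflects only what the
    kernel is willing to buffer. The limit is a safety stop for this sketch.
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    payload = b"\0" * chunk
    queued = 0
    try:
        while queued < limit:
            queued += writer.send(payload)   # may accept fewer than chunk bytes
    except BlockingIOError:
        pass                                 # buffers are full and no one is reading
    finally:
        writer.close()
        reader.close()
    return queued

if __name__ == "__main__":
    print("bytes buffered before the writer would block:", pipe_capacity())
```

Whatever number comes out, it is the most data that can ever be "in flight" with default settings, so the design effort belongs in the consumer, not in making the channel bigger.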

Cheers,

-- Joe

-- 
"We sat and watched as this whole       <-- (Died Pretty -- "Springenfall")
 blue sky turned to black..."
... Re-defeat Bush in '04.
--
pub  1024D/BA496D2B 2004-05-14 Joseph A Knapka
     Key fingerprint = 3BA2 FE72 3CBA D4C2 21E4  C9B4 3230 94D7 BA49 6D2B
If you really want to get my attention, send mail to
jknapka .at. kneuro .dot. net.

