parallel vs. serial disk access

From: Jens Kramel (no_at_spam.org)
Date: 07/12/05


Date: Tue, 12 Jul 2005 11:47:05 +0200

Hi NG,

Last week I received some very useful answers to my question about parallel
flows when sending a file over a network, so I think it's a good idea to ask
you again this time. (Just wanted to say thanks again :)
The background is still that I'm building a solution for network file
transfers. Of course, it's been done a thousand times before - but that
doesn't mean it can't be done any better, does it?

The problem is: when a single client accesses the system, the whole system
is "dedicated" to that client. Disk i/o, read-ahead and network transfer
work perfectly together. But as the number of clients increases, the system
is brought to its knees a little too quickly, in my opinion.
To give an example: the disk in my server can read a single uncached file at
about 50MB/s. Reading two uncached files (multithreaded) runs at 8MB/s each,
and reading three files simultaneously runs at 4MB/s per thread - so we get
an aggregate speed of only 12MB/s for three clients. (I tried reading with
different chunk sizes, but that didn't change the result.)
I guess the explanation for this heavy performance loss is the i/o
scheduler, which tries to strike a good balance between transfer rates and
response times, and so ends up continuously jumping across the disk to read
chunks of every file, adding a lot of seek time to the i/o operations.

My suggestion now would be simply not to read multithreaded. I mean, what
good is it when it obviously tends to slow things down so badly? I think
that in situations where files are read sequentially, the best solution
would be to read the files in chunks of one or two MB (a tradeoff between
RAM usage, continuous disk access and responsiveness) and switch to the next
file after each chunk has been read. This should reduce seek times
dramatically.
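
A minimal sketch of that serial, round-robin reading, assuming each client's
file is already open in binary mode and that `send` stands in for whatever
pushes bytes to the network (`serve_round_robin` and `send` are hypothetical
names, not an existing API). It also assumes each file is mostly contiguous
on disk; otherwise even a 2 MB chunk involves some seeking.

```python
import io
from collections import deque
from typing import BinaryIO, Callable, Iterable, Tuple

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB per turn: the RAM/seek/latency tradeoff


def serve_round_robin(files: Iterable[Tuple[str, BinaryIO]],
                      send: Callable[[str, bytes], None],
                      chunk_size: int = CHUNK_SIZE) -> None:
    """Serve several clients from ONE thread, one chunk per file per turn.

    Only one read is in flight at a time, so the disk sees runs of up to
    chunk_size bytes per file instead of many interleaved small reads.
    """
    pending = deque(files)
    while pending:
        client_id, f = pending.popleft()
        data = f.read(chunk_size)
        if not data:                      # EOF: this client is done
            continue
        send(client_id, data)             # hypothetical network send hook
        pending.append((client_id, f))    # back of the line for its next chunk


# Tiny demo: in-memory "files" and a 2-byte chunk so the rotation is visible.
sent = []
serve_round_robin([("a", io.BytesIO(b"x" * 5)), ("b", io.BytesIO(b"y" * 3))],
                  lambda cid, data: sent.append((cid, len(data))),
                  chunk_size=2)
print(sent)  # [('a', 2), ('b', 2), ('a', 2), ('b', 1), ('a', 1)]
```

Whether this actually beats the kernel's i/o scheduler depends on things the
sketch ignores, such as fragmentation, OS read-ahead and other processes
using the same disk, so it would need to be benchmarked against the
multithreaded version.
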
But on the other hand, this would mean that a client requesting a file of
only 1KB has to wait until each of the other clients has received its 2MB.
That would be unfair (at least from a user's point of view ;). So my idea
here was to use multiple queues for the read requests, assigned according
to the size of each request. The server application would then switch
between the queues in rotation, so when there are several big requests in
one queue and a small request in the small-request queue, the small request
would be handled soon and wouldn't have to wait for the big requests to
finish. This wouldn't be completely fair, but I think it would be a really
good tradeoff.
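
The multi-queue idea could look roughly like the sketch below. Assumptions,
all hypothetical: there are just two queues (small and big), a request's
total size is known up front (e.g. from the file size), the 64 KB cutoff is
arbitrary, and the requests arrive as one static batch rather than
continuously as in a real server. `Request` and `serve_multi_queue` are
illustrative names only.

```python
import io
from collections import deque
from dataclasses import dataclass
from typing import BinaryIO, Callable, Iterable

SMALL_LIMIT = 64 * 1024       # hypothetical cutoff between "small" and "big"
CHUNK_SIZE = 2 * 1024 * 1024  # bytes read per turn, as in the single-queue case


@dataclass
class Request:
    client_id: str
    size: int       # bytes requested; decides which queue the request joins
    f: BinaryIO
    sent: int = 0   # bytes delivered so far


def serve_multi_queue(requests: Iterable[Request],
                      send: Callable[[str, bytes], None],
                      small_limit: int = SMALL_LIMIT,
                      chunk_size: int = CHUNK_SIZE) -> None:
    """Rotate between a small-request queue and a big-request queue.

    Each turn reads one chunk for the request at the head of the current
    queue, so between two turns of the small queue at most one big chunk
    is read.
    """
    queues = (deque(), deque())  # (small, big)
    for r in requests:
        queues[0 if r.size <= small_limit else 1].append(r)
    turn = 0
    while any(queues):
        q = queues[turn]
        turn = (turn + 1) % len(queues)    # rotate to the other queue next time
        if not q:
            continue
        r = q.popleft()
        data = r.f.read(min(chunk_size, r.size - r.sent))
        if not data:                       # file shorter than announced: drop it
            continue
        send(r.client_id, data)
        r.sent += len(data)
        if r.sent < r.size:
            q.append(r)                    # back of its own queue for the next chunk


# Demo: two "big" 10-byte requests queued before a 1-byte "small" one.
log = []
serve_multi_queue([Request("A", 10, io.BytesIO(b"a" * 10)),
                   Request("B", 10, io.BytesIO(b"b" * 10)),
                   Request("s", 1, io.BytesIO(b"s"))],
                  lambda cid, data: log.append((cid, len(data))),
                  small_limit=2, chunk_size=4)
print(log)  # [('s', 1), ('A', 4), ('B', 4), ('A', 4), ('B', 4), ('A', 2), ('B', 2)]
```

Here the small request goes first only because the rotation happens to
start with the small queue; in general it gets its turn after at most one
big chunk, which is the "not completely fair, but good enough" tradeoff.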

So, let's finally come to my question, which is rather simple after all
these preliminary thoughts: it all looks too easy. Did I forget something? I
mean, would there really be no advantage to parallel i/o, except that the
i/o scheduler would take care of the tradeoff and I would save the little
time it takes to implement the queued reading?

Thanks in advance
Jens


