Re: epoll,threading



On Wed, May 30, 2007 at 07:12:01PM +0900, Tejun Heo wrote:
Hello,

Willy Tarreau wrote:
In my experience, it's not so much the context switch itself which
causes performance degradation, but the fact that with threads, you
have to put mutexes everywhere. And frankly, walking a list with
locks everywhere is considerably slower than doing it in one run at a
rate of 3 or 4 cycles per entry. Also, returning a pointer to static
local storage from a function is not possible anymore, and some
functions even need to malloc() instead of returning statically
allocated data. I believe this is the reason for OpenSSL being twice
as slow when compiled thread-safe than in native mode.
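
The cost described above can be illustrated with a small sketch. It is
not taken from any of the programs discussed in this thread; the names
Node, build, sum_locked and sum_plain are hypothetical, and Python is
used only for brevity. The locked walk acquires and releases one lock
per entry, while the plain walk does the same traversal in one run.

```python
import threading


class Node:
    """List entry carrying its own lock (assumed fine-grained locking scheme)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node
        self.lock = threading.Lock()


def build(values):
    """Build a singly linked list that preserves the order of `values`."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def sum_locked(head):
    """Walk the list taking each entry's lock, as a thread-safe design would."""
    total = 0
    node = head
    while node is not None:
        with node.lock:          # one acquire/release pair per entry
            total += node.value
            nxt = node.next
        node = nxt
    return total


def sum_plain(head):
    """Walk the same list in one run, with no synchronization at all."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total
```

Both functions return the same result; timing them (for instance with
timeit) on a long list should show the per-entry locking overhead,
though the exact ratio depends on the machine and the interpreter.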

So in fact, converting a threaded program to a pure async model
should not improve it much because of the initial architectural
design. But a program written from scratch to be purely async should
perform better simply because it has fewer operations to perform. And
there's no magic here: fewer cycles spent synchronizing and locking
= more cycles available for the real job.

The thing is that the synchronization overhead is something you'll have
to pay anyway to support multiple processors.

But you don't need to sync *everything*. It is doable to have 1 thread
per processor, each with their own data, and sync the minimum information
(eg: statistics).
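
A minimal sketch of that layout, under stated assumptions: the names
worker and run are hypothetical, the "work" is a stand-in computation,
and each worker publishes its statistics exactly once into a slot that
only it writes. Note that in standard CPython the global interpreter
lock limits parallel speedup, so this only illustrates the ownership
pattern, not the performance.

```python
import threading


def worker(items, stats, slot):
    """Process a private shard; touch no shared state while running."""
    handled = 0
    total = 0
    for n in items:
        total += n * n           # stand-in for real per-request work
        handled += 1
    # Single publish at the end. Each worker owns exactly one slot,
    # so no lock is needed for this write.
    stats[slot] = (handled, total)


def run(data, nworkers):
    """Split `data` into per-worker shards and merge only the statistics."""
    shards = [data[i::nworkers] for i in range(nworkers)]
    stats = [None] * nworkers
    threads = [
        threading.Thread(target=worker, args=(shards[i], stats, i))
        for i in range(nworkers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # the only synchronization point
    handled = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    return handled, total
```

Calling run(list(range(100)), 4) returns the number of items handled
and the merged total; the workers never contend on a shared counter
while they are running.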

Actually, supporting
multiple processors on an async program is beyond painful. Either you
have to restrict all locking to busy locks, or you have to introduce
new state for each possibly blocking synchronization point. And what
happens if they have to nest? You kind of end up with a stackable
state thingie - an extremely restricted stack.

I did not say it is simple; I said that when it is justified, it is doable.

If you're really serious about performance and scalability, you just
have to support multiple processors and if you do it right the
performance overhead shouldn't be too high. Common servers will soon
have 8 cores on two physical processors - paying some overhead for
synchronization is a pretty good deal for scalability.

In my experience with web caches, epoll or similar for idle clients
and thread per active client scaled and performed pretty well - it
needed more memory but the performance wasn't worse than an
asynchronous design, and doing a complex server in an async model is
a lot of pain.

It's true that an async model is a lot of pain. But it's always where
I got the best performance. For instance, with epoll(), I can achieve
20000 HTTP reqs/s with 40000 concurrent sessions. The best
performance I have observed from threaded competitors was an order of
magnitude below on either value (sometimes both).

Well, it all depends on how you do it, but an order of magnitude
performance difference sounds like too much to me. Memory-wise,
scalability can be worse by orders of magnitude.

It is very often a problem because system limits have not evolved as fast
as requirements.

You need to restrict per-thread
stack size and use epoll for idle threads, if you wanna scale. Workers
+ async monitoring of idle clients scale pretty well.
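
The "workers + async monitoring of idle clients" arrangement can be
sketched roughly as follows. This is an illustration only, not the web
cache mentioned above: the names handle_active and serve, the `rounds`
stop condition and the upper-casing "request handler" are all
hypothetical. It uses selectors.DefaultSelector, which is backed by
epoll on Linux. Idle connections sit in the selector; a readable one is
unregistered and handed to a worker thread that does blocking I/O, then
returned to the idle set through a queue.

```python
import queue
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_active(conn, rearm):
    """Worker: blocking I/O for one active client, then hand it back."""
    data = conn.recv(4096)
    if not data:
        conn.close()                 # peer closed the connection
        return
    conn.sendall(data.upper())       # stand-in for real request processing
    rearm.put(conn)                  # the client is idle again


def serve(conns, rounds, workers=4):
    """Watch idle connections and dispatch readable ones to the pool.

    Stops after `rounds` dispatches so the sketch terminates.
    """
    sel = selectors.DefaultSelector()
    rearm = queue.Queue()
    for c in conns:
        sel.register(c, selectors.EVENT_READ)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dispatched = 0
        while dispatched < rounds:
            for key, _ in sel.select(timeout=0.05):
                sel.unregister(key.fileobj)      # a worker owns it now
                pool.submit(handle_active, key.fileobj, rearm)
                dispatched += 1
            # Only this thread touches the selector, so re-arming is
            # done here rather than from the workers.
            while not rearm.empty():
                sel.register(rearm.get(), selectors.EVENT_READ)
    sel.close()
```

For a quick experiment, socket.socketpair() gives connected sockets
without a listening server: write a request on one end, pass the other
end to serve(), and read the reply back. A real server would also bound
the per-thread stack size, as noted above, which this sketch does not
attempt.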

I agree with a small pool of workers. But they must be dedicated to CPU
only, and perform no I/O. Then you can have 1 thread/CPU.

However, I agree that few uses really justify spending time writing
and debugging async programs.

Yeap, also there are several things which are just too painful in an
async server - e.g. adding coordination with another server (virus
scan, sharing cached data), implementing a pluggable extension
framework for third parties (and what happens if they should be able
to stack!), and maintaining the damn thing while trying to add a few
features. :-)

IMHO, a complex pure async server doesn't really make sense anymore.

That's clearly not my opinion, but I don't want to enter a flamewar on
the subject, it's not interesting. As long as people like us push
the system to its limits using either model, at least there will be
references for comparisons :-)

Cheers
Willy

