Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Evgeniy Polyakov <johnpol@xxxxxxxxxxx>
- Date: Sun, 25 Feb 2007 22:42:51 +0300
On Sun, Feb 25, 2007 at 08:04:15PM +0100, Ingo Molnar (mingo@xxxxxxx) wrote:
* Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
Kevent is a _very_ small entity and there is _no_ cost of requeueing
(well, there is list_add guarded by lock) - after it is done, process
can start real work. With rescheduling there are _too_ many things to
be done before we can start new work. [...]
actually, no. For example a wakeup too is fundamentally a list_add
guarded by a lock. Take a look at try_to_wake_up(). The rest you see
there is just extra frills that relate to things like 'load-balancing
the requests over multiple CPUs [which i'm sure kevent users would
request in the future too]'.
wake_up() as a call is pretty simple and fast, but its result - it is
slow. I did not run reschedulingtests with kernel thread, but posix
threads (they do look like a kernel thread) have significant overhead
there. In early developemnt days of M:N threading library I tested
rescheduling performance of the POSIX threads - I created pool of
threads and 'sent' a message using futex wait/wake - such performance of
the userspace threading library (I tested erlang) was 10 times slower.
[...] We have to change registers, change address space, various tlb
bits and so on - we have to do it, since task describes very heavy
entity - the whole process. [...]
but ... 'threadlets' are called thread-lets because they are not full
processes, they are threads. There's no TLB state in that case. There's
indeed register state associated with them, and currently there can
certainly be quite a bit of overhead in a context switch - but not in
register saving. We do user-space register saving not in the scheduler
but upon /every system call/. Fundamentally a kernel thread is just its
EIP/ESP [on x86, similar on other architectures] - which can be
saved/restored in near zero time. All the rest is something we added for
good /work queueing/ reasons - and those same extras should either be
eliminated if they turn out to be not so good reasons after all, or they
will be wanted for kevents too eventually, once it matures as a work
queueing solution.
If things decreases performance noticebly, it is a bad things, but it is
matter of taste. Anyway, kevents are very small, threads are very big,
and both are the way they are exactly on purpose - threads serve for
processing of any generic code, kevents are used for event waiting - IO
is such an event, it does not require a lot of infrastructure to handle,
it only nees some simple bits, so it can be optimized to be extremely
fast, with huge infrastructure behind each IO (like in case when it is a
separated thread) it can not be done effectively.
I think it is _too_ heavy to have such a monster structure like
task(thread/process) and related overhead just to do an IO.
i think you are really, really mistaken if you believe that the fact
that whole tasks/threads or processes can be 'monster structures',
somehow has any relevance to scheduling/task-queueing performance and
scalability. It does not matter how large a task's address space is -
scheduling only relates to the minimal context that is in the CPU. And
most of that context we save upon /every system call entry/, and restore
it upon every system call return. If it's so expensive to manipulate,
why can the Linux kernel do a full system call in ~150 cycles? That's
cheaper than the access latency to a single DRAM page.
I meant not its size, but the whole infrastructure, which surrounds
task. If it is that lightweight, why don't we have posix thread per IO?
One question is that mmap/allocation of the stack is too slow (and it is
very slow indeed, that is why glibc and M:N threading lib caches
allocated stacks), another one is kernel/userspace boundary crossing,
next one are tlb flushes, then copies.
Why userspace rescheduling is in order of tens times faster than
kernel/user?
for the same reason has it no relevance that the full kevent-based
webserver is a 'monster structure' - still a single request's basic
queueing operation is cheap. The same is true to tasks/threads.
To move that tasks there must be done too may steps, and although each
one can be quite fast, the whole process of rescheduling in the case of
thousands running threads creates too big overhead per task to drop
performance.
Really, you dont even have to know or assume anything about the
scheduler, just lets do some elementary math here:
the reqs/sec your sendfile+kevent based webserver can do is 7900 per
sec. Lets assume you will write further great kevent code which will
optimize it further and it goes up to 10,100 reqs per sec (100 usecs per
request), ok? Then also try how many reschedules/sec can your Athon64
3500 box do. My guess is: about a million per second (1 usec per
reschedule), perhaps a bit more.
Let's calculate: disk bandwidth is about gigabytes per second (cached
case), so to transfer 10k file we need about 10 usec - 10% of it will be
spent in rescheduling (if there will be only one if any).
Network is an order of 10 slower (1gbit for example), but there are much
more blockings, so to transfer 10k we will have let's say 5 blocks, i.e.
5 reschedulings - another 5%, so we wasted 15% of our time in
rescheduling.
Event in turn is a 30 bytes copy (plus of course own overhead, but it is
still faster - it is faster just because ukevet size is smaller than
pt_regs :).
Interesting discussion, that will be very fun if kevent will lose badly
:)
Now lets assume that a threadlet based server would have to
context-switch for /every single/ request served. That's totally
over-estimating it, even with lots of slow clients, but lets assume it,
to judge the worst-case impact.
So if you had to schedule once per every request served, you'd have to
add 1 usec to your 100 usecs cost, making it 101 usecs. That would bring
your 10,100 requests per sec to 10,000 requests/sec, under a threadlet
model of operation. Put differently: it will cost you only 1% in
performance to schedule once for every request. Or lets assume the task
is totally cache-cold and you'd have to add 4 usecs for its scheduling -
that'd still only be 4%. So where is the fat?
I need to move home or I will sleep on the street, otherwise I would already
ran a test and started to eat a hat (present me a red one like Lan Cox
had), or watch how you to do it :)
Give me several hours.
Ingo
--
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Davide Libenzi
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- References:
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ulrich Drepper
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Evgeniy Polyakov
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Evgeniy Polyakov
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Evgeniy Polyakov
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Evgeniy Polyakov
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- From: Ingo Molnar
- Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- Prev by Date: Re: GPL vs non-GPL device drivers
- Next by Date: Re: [PATCH 12/14] x86_64 irq: Add constants for the reserved IRQ vectors.
- Previous by thread: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- Next by thread: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
- Index(es):
Relevant Pages
|
|