Re: [PATCH 2 of 4] Introduce i386 fibril scheduling




* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 3 Feb 2007, Ingo Molnar wrote:
> >
> > Well, in my picture, 'only if you block' is a pure thread
> > utilization decision: bounce a piece of work to another thread if
> > this thread cannot complete it. (if the kernel is lucky enough that
> > the user context told it "it's fine to do that".)
>
> Sure, you can do it that way too. But at that point, your argument
> that we shouldn't do it with fibrils is wrong: you'd still need
> basically the exact same setup that Zach does in his fibril stuff, and
> the exact same hook in the scheduler, testing the exact same value
> ("do we have a pending queue of work").

did i ever utter a single word of complaint about those bits? Those are
not an issue to me. They can be applied to kernel threads just as much.

As i babbled in the very first email about this topic:

| 1) improve our basic #1 design gradually. If something is a
| bottleneck, if the scheduler has grown too fat, cut some slack. If
| micro-threads or fibrils offer anything nice for our basic thread
| model: integrate it into the kernel.

i should have said explicitly that to flip user-space from one kernel
thread to another one (upon blocking or per request) is a nice thing and
we should integrate that into the kernel's thread model.

But really, being a scheduler guy i was much more concerned about the
duplication and problems caused by the fibril concept itself: that
duplication and complexity makes up 80% of Zach's submitted patchset.
For example this bit:

[PATCH 3 of 4] Teach paths to wake a specific void * target

would totally go away if we used kernel threads for this. In the fibril
approach this is where the mess starts. Either a 'normal' wakeup has to
wake up all fibrils, or we have to make damn sure that a wakeup that in
reality goes to a fibril is never woken via wake_up/wake_up_process.

( Furthermore, i tried to include user-space micro-threads in the
argument as well, which Evgeniy Polyakov raised not so long ago in
relation to the kevent patchset. All these micro-thread things are of a
similar genre. )

i totally agree that the API /should/ be the main focus, but i didn't
pick the topic, and most of the patchset's current size is due to the
(IMO avoidable) fibril concept.

regarding the API, i don't really agree with the current form and design
of Zach's interface.

fundamentally, the basic entity of this thing should be a /system call/,
not the artificial fibril thing:

+struct asys_call {
+ struct asys_result *result;
+ struct fibril fibril;
+};

i.e. the basic entity should be something that represents a system call,
with its up to 6 arguments, its eventual return code, state, flags and
two list entries:

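struct async_syscall {
	unsigned long nr;		/* system call number */
	unsigned long args[6];		/* up to 6 arguments */
	long err;			/* return code, set on completion */
	unsigned long state;		/* e.g. submitted / pending / completed */
	unsigned long flags;		/* e.g. 'autocomplete' */
	struct list_head list;		/* linkage on one of the list heads */
	struct list_head wait_list;	/* for waiting on a specific set of calls */
	unsigned long __pad[2];		/* pads the entry to 16 words */
};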

(64 bytes on 32-bit, 128 bytes on 64-bit)
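The size figures follow from simple word counting: 10 scalar words, two list_heads of two pointers each, and 2 words of padding give 16 words. A minimal user-space sketch that checks this at compile time, assuming a hand-rolled stand-in for the kernel's struct list_head and assuming sizeof(void *) == sizeof(unsigned long) (true on the usual ILP32 and LP64 Linux ABIs, not on LLP64); async_syscall_size() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

struct async_syscall {
	unsigned long nr;		/* 1 word */
	unsigned long args[6];		/* 6 words */
	long err;			/* 1 word */
	unsigned long state;		/* 1 word */
	unsigned long flags;		/* 1 word */
	struct list_head list;		/* 2 pointers */
	struct list_head wait_list;	/* 2 pointers */
	unsigned long __pad[2];		/* 2 words of padding */
};

/* 10 + 2 + 2 + 2 = 16 words, assuming pointers and longs have the same size. */
_Static_assert(sizeof(struct async_syscall) == 16 * sizeof(unsigned long),
	       "struct async_syscall should be exactly 16 words");

/* Hypothetical helper so the size can also be checked at run time. */
size_t async_syscall_size(void)
{
	return sizeof(struct async_syscall);
}
```

On a 32-bit build this comes out at 64 bytes and on x86-64 at 128 bytes, i.e. a power of two either way, which keeps entries of a ringbuffer nicely aligned.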

furthermore, i think this API should be fundamentally vectored and
fundamentally async, and hence could solve another issue as well:
submitting many little pieces of work of different IO domains in one go.

[ detail: there should be no traditional signals used at all (Zach's
stuff doesn't use them, and correctly so), except when the async
syscall being performed itself generates a signal. ]

The normal, and optimal, workflow should be a user-space ring-buffer
of these constant-size struct async_syscall entries:

struct async_syscall ringbuffer[1024];

LIST_HEAD(submitted);
LIST_HEAD(pending);
LIST_HEAD(completed);

the 3 list heads are known to both the kernel and user-space, and are
actively managed by both. The kernel drives the execution of the async
system calls based on the 'submitted' list head (until it has emptied
it) and moves them over to the 'pending' list. User-space can complete
async syscalls based on the 'completed' list. (but a syscall can
optionally be marked as 'autocomplete' as well via the 'flags' field;
in that case it's not moved to the 'completed' list but simply removed
from the 'pending' list. This can be useful for system calls that have
some implicit notification effect.)
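The list workflow can be modelled in plain user-space C as a rough sketch. Assumptions: list_head and its helpers are minimal stand-ins for the kernel's <linux/list.h>; ASYNC_AUTOCOMPLETE, fake_execute(), kernel_drain() and to_async_syscall() are hypothetical names; and the "kernel side" runs synchronously in the caller instead of in per-syscall kernel threads:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's intrusive list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static inline int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static inline void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;	/* leave the entry self-linked */
}

static inline void list_move_tail(struct list_head *e, struct list_head *h)
{
	list_del(e);
	list_add_tail(e, h);
}

#define ASYNC_AUTOCOMPLETE 0x1UL	/* hypothetical 'flags' bit */

struct async_syscall {
	unsigned long nr;
	unsigned long args[6];
	long err;
	unsigned long state;
	unsigned long flags;
	struct list_head list;
	struct list_head wait_list;
	unsigned long __pad[2];
};

/* container_of() equivalent for the 'list' member. */
#define to_async_syscall(p) \
	((struct async_syscall *)((char *)(p) - offsetof(struct async_syscall, list)))

/* Hypothetical stand-in for running syscall 'nr': returns nr + args[0]. */
static long fake_execute(const struct async_syscall *c)
{
	return (long)(c->nr + c->args[0]);
}

/*
 * Kernel side, done synchronously here: empty 'submitted', moving each entry
 * to 'pending' while it runs, then to 'completed'; autocomplete entries are
 * simply unlinked from 'pending' instead.
 */
void kernel_drain(struct list_head *submitted, struct list_head *pending,
		  struct list_head *completed)
{
	while (!list_empty(submitted)) {
		struct async_syscall *c = to_async_syscall(submitted->next);

		list_move_tail(&c->list, pending);
		c->err = fake_execute(c);
		if (c->flags & ASYNC_AUTOCOMPLETE)
			list_del(&c->list);
		else
			list_move_tail(&c->list, completed);
	}
}

/* Count the entries on a list, e.g. to see how much completed work is queued. */
size_t list_count(const struct list_head *h)
{
	size_t n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

Because the entries live in the ringbuffer array and only their embedded list_heads move between the three heads, no memory is allocated or copied at any stage of an entry's life.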

( Note: optionally, a helper kernel-thread, when it finishes processing
a syscall, could also asynchronously check the 'submitted' list and
pick up new work. That would allow the submission of new syscalls
without any entry into the kernel. So, for example, on an SMT system
this could in essence result in one CPU running purely in user-space,
submitting async syscalls via the ringbuffer, while another CPU would
in essence be running purely in kernel-space, executing those entries. )
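The "no kernel entry" submission path can be approximated in user-space with two pthreads sharing memory: one only writes entries and bumps a head index, the other (playing the helper kernel-thread) re-checks for new entries each time it finishes one. This is a hypothetical sketch under stated assumptions, not the proposed interface: it swaps the three list heads for a single-producer/single-consumer index ring with C11 atomics, and sc_ring, ring_submit(), helper_thread() and run_demo() are made-up names:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 1024	/* same size as the ringbuffer[1024] example */

/* Hypothetical, trimmed-down entry: just enough state for the sketch. */
struct ring_entry {
	unsigned long nr;
	unsigned long arg;
	long result;
};

struct sc_ring {
	struct ring_entry slots[RING_SIZE];
	atomic_size_t head;	/* next slot user-space will fill */
	atomic_size_t tail;	/* next slot the helper will execute */
	atomic_int stop;	/* set once user-space has no more work */
};

/* User-space submission: plain stores plus a release; no kernel entry. */
int ring_submit(struct sc_ring *r, unsigned long nr, unsigned long arg)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return -1;	/* ring full, caller retries */
	r->slots[head % RING_SIZE].nr = nr;
	r->slots[head % RING_SIZE].arg = arg;
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

/* Helper thread: after finishing one entry, re-check for new submissions. */
void *helper_thread(void *p)
{
	struct sc_ring *r = p;

	for (;;) {
		size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
		/* Read 'stop' before 'head' so no entry submitted before stop is missed. */
		int stopping = atomic_load_explicit(&r->stop, memory_order_acquire);
		size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

		if (tail == head) {
			if (stopping)
				break;
			sched_yield();	/* nothing queued yet: poll again */
			continue;
		}
		struct ring_entry *e = &r->slots[tail % RING_SIZE];
		e->result = (long)(e->nr + e->arg);	/* stand-in for executing it */
		atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	}
	return NULL;
}

/* Submit n entries while the helper runs concurrently; return entries executed. */
size_t run_demo(struct sc_ring *r, size_t n)
{
	pthread_t helper;

	if (pthread_create(&helper, NULL, helper_thread, r) != 0)
		return 0;
	for (size_t i = 0; i < n; i++)
		while (ring_submit(r, 0, i) != 0)
			sched_yield();
	atomic_store_explicit(&r->stop, 1, memory_order_release);
	pthread_join(helper, NULL);
	return atomic_load(&r->tail);
}
```

The submitter never blocks or traps; it only stalls when the ring is full. The price is a polling consumer, which is why the note above frames it as an SMT-style split where one CPU is dedicated to executing entries.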

another crucial bit is waiting on pending work. But because every
pending syscall entity is either already completed or has a real kernel
thread associated with it, that bit is mostly trivial: user-space can
wait on 'any' pending syscall to complete, or it could wait for a
specific list of syscalls to complete (using the ->wait_list). It could
also wait on 'a minimum number of N syscalls to complete', to create
batching of execution. And of course it can periodically check the
'completed' list head if it has a constant and highly parallel flow of
workload; that way the 'waiting' does not actually have to happen most
of the time.
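Waiting for "a minimum number of N syscalls to complete" maps naturally onto a completion counter guarded by a mutex and a condition variable. A minimal sketch, assuming one thread per pending syscall (as in the kernel-thread model) emulated here with pthreads; completion_set, cs_wait_min(), cs_peek() and demo_batch_wait() are hypothetical names, and waiting on a specific set of entries via ->wait_list is left out:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Hypothetical completion tracker shared by executing threads and a waiter. */
struct completion_set {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	size_t completed;
};

void cs_init(struct completion_set *cs)
{
	pthread_mutex_init(&cs->lock, NULL);
	pthread_cond_init(&cs->cond, NULL);
	cs->completed = 0;
}

void cs_destroy(struct completion_set *cs)
{
	pthread_cond_destroy(&cs->cond);
	pthread_mutex_destroy(&cs->lock);
}

/* Called by the thread that finished executing one async syscall. */
void cs_complete_one(struct completion_set *cs)
{
	pthread_mutex_lock(&cs->lock);
	cs->completed++;
	pthread_cond_broadcast(&cs->cond);
	pthread_mutex_unlock(&cs->lock);
}

/* Block until at least min_nr syscalls have completed; return how many had. */
size_t cs_wait_min(struct completion_set *cs, size_t min_nr)
{
	size_t n;

	pthread_mutex_lock(&cs->lock);
	while (cs->completed < min_nr)
		pthread_cond_wait(&cs->cond, &cs->lock);
	n = cs->completed;
	pthread_mutex_unlock(&cs->lock);
	return n;
}

/* Non-blocking check, for a steady flow of work where waiting is rarely needed. */
size_t cs_peek(struct completion_set *cs)
{
	size_t n;

	pthread_mutex_lock(&cs->lock);
	n = cs->completed;
	pthread_mutex_unlock(&cs->lock);
	return n;
}

/* Each worker stands in for the thread executing one pending syscall. */
static void *worker(void *p)
{
	cs_complete_one(p);
	return NULL;
}

/* Start nr_workers "syscalls", wait for at least min_nr of them, then reap all. */
size_t demo_batch_wait(size_t nr_workers, size_t min_nr)
{
	struct completion_set cs;
	pthread_t tids[MAX_WORKERS];
	size_t started = 0, seen;

	if (nr_workers > MAX_WORKERS)
		nr_workers = MAX_WORKERS;
	cs_init(&cs);
	for (; started < nr_workers; started++)
		if (pthread_create(&tids[started], NULL, worker, &cs) != 0)
			break;
	if (min_nr > started)
		min_nr = started;	/* never wait for more than can complete */
	seen = cs_wait_min(&cs, min_nr);
	for (size_t i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	cs_destroy(&cs);
	return seen;
}
```

cs_peek() corresponds to periodically checking the 'completed' list without sleeping, while cs_wait_min() only sleeps when fewer than N entries are done, which is what gives the batching effect.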

Looks like we can hit many birds with this single stone: AIO, vectored
syscalls, fine-grained system-call parallelism. Hm?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


