Re: [PATCH 08/12] add trace events for each syscall entry/exit
- From: Ingo Molnar <mingo@xxxxxxx>
- Date: Wed, 26 Aug 2009 09:28:08 +0200
* Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote:
* Frederic Weisbecker (fweisbec@xxxxxxxxx) wrote:
On Tue, Aug 25, 2009 at 03:51:11PM -0400, Mathieu Desnoyers wrote:
* Frederic Weisbecker (fweisbec@xxxxxxxxx) wrote:
On Tue, Aug 25, 2009 at 02:31:19PM -0400, Mathieu Desnoyers wrote:
(Well, I do not have time currently to look into the gory details
(sorry), but let's try to take a step back from the problem.)
The design proposal for this kthread behavior wrt syscalls is based on a
very specific and current kernel behavior, that may happen to change and
that I have actually seen proven incorrect. For instance, some
proprietary Linux driver does very odd things with system calls within
kernel threads, like invoking them with int 0x80.
Yes, this is odd, but do we really want to tie the tracer that much to
the specifics of the actual OS implementation?
I really can't see the point in doing this. I don't expect the kernel's
behaviour to change any time soon so that explicit syscall interrupts are
issued from within it. It's not about a current kernel implementation fashion,
it's about kernel design sanity that is not likely to go backward.
Is it worth it to trace kernel threads and maintain their tracing
specifics (such as the ret_from_fork workarounds that implies)
just because we want to support tracing of some silly proprietary drivers?
That sounds like a recipe for endless breakages and missing bits of
instrumentation.
So my advice would be: if we want to trace the syscall entry/exit paths,
let's trace them for the _whole_ system, and find ways to make it work
for corner-cases rather than finding clever ways to diminish
instrumentation coverage.
If developers of out-of-tree drivers want to implement buggy things
that would never be accepted after a minimal review here, and then instrument
their bugs, then I would suggest they implement their own ad hoc instrumentation,
really :-/
What's the point in supporting out of tree bugs?
Well, the only advantage of doing this would be to support reverse engineering
in tiny and rare corner cases. Not really worth the effort.
Given the ret from fork example happens to be the first event fired
after the thread is created, we should be able to deal with this problem
by initializing the thread structure used by syscall exit tracing to an
initial "ret from fork" value.
Mathieu
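To make the "initial ret from fork value" idea concrete, here is a minimal userspace sketch. It is not kernel code: struct trace_thread_state, the TRACE_NR_* sentinels and the three helper functions are hypothetical names invented for illustration, standing in for a field in the arch thread structure and for the entry/exit tracing hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinels, not real kernel constants. */
#define TRACE_NR_NONE          (-1L)  /* no syscall in flight */
#define TRACE_NR_RET_FROM_FORK (-2L)  /* first exit event after creation */

/* Hypothetical per-thread state used by the syscall exit tracer. */
struct trace_thread_state {
	long cur_syscall_nr;
};

/* Run once at thread creation: the first exit event the thread
 * produces will be reported as "ret from fork". */
static void trace_thread_init(struct trace_thread_state *ts)
{
	ts->cur_syscall_nr = TRACE_NR_RET_FROM_FORK;
}

/* Run on syscall entry: remember which syscall is in flight. */
static void trace_syscall_enter(struct trace_thread_state *ts, long nr)
{
	ts->cur_syscall_nr = nr;
}

/* Run on syscall exit: return the number to report, then reset the
 * state so that a stray exit is visible as TRACE_NR_NONE. */
static long trace_syscall_exit(struct trace_thread_state *ts)
{
	long nr = ts->cur_syscall_nr;

	ts->cur_syscall_nr = TRACE_NR_NONE;
	return nr;
}
```

With this model, an exit seen before any entry is reported as the fork return rather than as garbage, which is the whole point of the proposed initialization. Whether every arch can cheaply carry such a field is exactly what is being debated in the thread.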
It means we have to support and check this corner case in every arch
that supports syscall tracing, deal with crashes because we omitted it, etc...
For all the things I've explained above I don't think it's worth the effort.
But it's just my opinion...
Then we might want to explicitly require that calls to sys_*() system
calls made from within the kernel pass through another instrumentation
mechanism. IMHO, that would make sense. It would cover both system calls
made from kernel threads and system calls made from within a system call
or trap.
Mathieu
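One way to picture "another instrumentation mechanism" for in-kernel callers is to tag each recorded event with its origin. The sketch below is a userspace model under that assumption; enum call_origin, record_event(), do_open_model(), syscall_entry_open() and kernel_internal_open() are all hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical origin tag attached to every recorded event. */
enum call_origin { ORIGIN_USER_ENTRY, ORIGIN_KERNEL_INTERNAL };

struct call_event {
	enum call_origin origin;
	const char *name;
};

#define LOG_CAP 16
static struct call_event event_log[LOG_CAP];
static size_t event_count;

/* Append one event; events beyond LOG_CAP are silently dropped. */
static void record_event(enum call_origin origin, const char *name)
{
	if (event_count < LOG_CAP) {
		event_log[event_count].origin = origin;
		event_log[event_count].name = name;
		event_count++;
	}
}

/* Stand-in for the real work shared by both paths. */
static int do_open_model(const char *path)
{
	return path != NULL ? 3 : -1;
}

/* Models the normal syscall entry path taken from userspace. */
static int syscall_entry_open(const char *path)
{
	record_event(ORIGIN_USER_ENTRY, "open");
	return do_open_model(path);
}

/* Models the wrapper that in-kernel callers would be required to use. */
static int kernel_internal_open(const char *path)
{
	record_event(ORIGIN_KERNEL_INTERNAL, "open");
	return do_open_model(path);
}
```

Both paths reach the same worker, but a consumer of the log can tell which calls came in through the syscall entry path and which were made from inside the kernel.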
Well, we can't really set a tracepoint per sys_*() function. Or more
precisely we already have them, automagically generated and relying on
the sysenter ptrace path.
But if we want to check which syscalls are called from kernel threads, we have:
- kthread() -> do_exit()
The entry point of every kernel thread (except "kthreadd") is
kthread(). It calls do_exit() in the end.
If we want to trace the exit of a kernel thread, we can put
a tracepoint there instead of in do_exit(), whose results would
be intermixed with sys_exit() tracing.
- kthreadd :: create_kthread() -> kernel_thread() -> do_fork()
The creation of a thread is the result of a fork() by the kthreadd thread.
If we want to trace the creation of kernel threads, we can again do that
in the upper level: kernel_thread().
But does that inform us about who created the thread? All we would see
is kthreadd that forks. This is very poor information compared
to a userspace fork() that tells us who really created the new process.
Instead what we want is probably to trace kthread_create() which inserts the
job of a thread creation in the kthreadd thread, so that we know
_who_ asked for this thread creation (process that requested it and callsite).
And that's much richer in information.
Well, you can even climb to an upper layer and see whether this is a workqueue,
a kernel/async.c thread, a slow work, etc...
- kernel_execve() -> sys_execve()
We can execute user apps from kernel through call_usermodehelper().
And we can trace kernel_execve() or again in an upper layer
like call_usermodehelper()
- ... I guess there are other examples
The kernel calls syscalls through wrappers, and tracing these
wrappers, depending on the desired level of information
(choose your layer), is much more verbose / rich in
information.
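The kthread_create() point above can be modelled in userspace: record who asked for the thread at the moment the request is queued, rather than when kthreadd later forks. Everything in this sketch is hypothetical (struct create_request, queue_thread_creation(), the MODEL_KTHREAD_CREATE macro, start_flush_worker()); it only illustrates capturing the requester and the callsite at the wrapper layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one thread creation request. */
struct create_request {
	int requester_pid;        /* who asked for the thread */
	const char *callsite;     /* where the request came from */
	const char *thread_name;  /* what is to be created */
};

#define QUEUE_CAP 8
static struct create_request create_queue[QUEUE_CAP];
static size_t queue_len;

/* Models queueing a creation job for the kthreadd-like worker.
 * The requester and callsite are captured here, at the wrapper. */
static int queue_thread_creation(int requester_pid, const char *callsite,
				 const char *thread_name)
{
	if (queue_len >= QUEUE_CAP)
		return -1;
	create_queue[queue_len] = (struct create_request){
		requester_pid, callsite, thread_name
	};
	queue_len++;
	return 0;
}

/* Hypothetical wrapper macro: records the calling function's name. */
#define MODEL_KTHREAD_CREATE(pid, name) \
	queue_thread_creation((pid), __func__, (name))

/* Example caller, standing in for some subsystem that needs a thread. */
static int start_flush_worker(int pid)
{
	return MODEL_KTHREAD_CREATE(pid, "flush-worker");
}
```

A tracer reading create_queue sees the real requester and callsite, whereas a hook placed at the fork itself would only ever see the kthreadd-like worker.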
What you describe looks a lot like the approach I use in the LTTng
tree. Actually, the main point I am trying to make here is: if we
rely only on tracing at the syscall entry/exit level for, say,
monitoring all uses of e.g. sys_open(), we might be caught
off guard by internal sys_open() uses within the kernel.
There's a lot of 'internal' file opening going on within the kernel
that ptrace does not notice - see all the filp_open() calls.
Let's worry about this only if it's a true issue.
Ingo