Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Steven Rostedt <rostedt@xxxxxxxxxxx>
- Date: Wed, 26 Nov 2008 11:36:42 -0500 (EST)
On Tue, 25 Nov 2008, Eric W. Biederman wrote:
Steven Rostedt <rostedt@xxxxxxxxxxx> writes:
On Tue, 25 Nov 2008, Dave Hansen wrote:
I think the end result was, if this file can only be changed by root, then
we do not need to worry about namespaces. This file is a privileged file
that can only be modified by root.
If someday we decide to let non admin users touch this file, then we would
need to care about this. This file may actually be modified in the future
by users, so this may become an issue.
This really has very little to do with root vs non-root users. In fact,
we're working towards having cases where we have many "root" users, even
those inside namespaces. It is also quite possible for a normal root
user to fork into a new pid namespace. In that case, root simply won't
be able to use this feature because something like:
echo $$ /debugfs/tracing/set_ftrace_pid
just won't work. Let's look at a bit of the code.
+static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
+{
+ if (current->pid != ftrace_pid_trace)
+ return;
+
+ ftrace_pid_function(ip, parent_ip);
+}
One thing this doesn't deal with is pid wraparound. Does that matter?
Should not. This is just a way to trace a particular process. Currently
it traces all processes. If we wrap, then we trace the process with the
new pid. This should not be an issue.
So. Using raw pid numbers in the kernel is bad form. The internal
representation should be struct pid pointers as much as we can make
them.
I would 100% prefer it if ftrace_pid_func was written in terms of struct
pid. That does guarantee you don't have pid wrap around issues.
It almost makes it clear
+static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
+{
+ if (task_pid(current) == ftrace_pid_trace)
+ return;
+
+ ftrace_pid_function(ip, parent_ip);
+}
We don't need locks to access the pid of current.
That version does not bother me. I'm not worried about locks as much as
I am about recursion. If that "task_pid()" ever became a function that is
traced by mcount, then it will end up in a recursive loop, and will crash
the system.
If you want to fix this a bit, instead of saving off the pid_t in
ftrace_pid_trace, you should save a 'struct pid'. You can get the
'struct pid' for a particular task by doing a find_get_pid(pid_t). You
can then compare that pid_t to current by doing a
pid_task(struct_pid_that_i_saved, PIDTYPE_PID). That will also protect
against pid wraparound.
The find_get_pid() is handy because it will do the pid_t lookup in the
context of the current task's pid namespace, which is what you want, I
think.
Nope, we can not call that in this context. ftrace_pid_func is called
directly from mcount, without any protection.
Of course you can't. But at the same time find_get_pid() is always
supposed to be called on the user space pid ingress path.
struct pid *find_get_pid(pid_t nr)
{
struct pid *pid;
rcu_read_lock();
pid = get_pid(find_vpid(nr));
rcu_read_unlock();
return pid;
}
EXPORT_SYMBOL_GPL(find_get_pid);
This means find_get_pid will call mcount which will call ftrace_pid_func,
and back again. This can also happen with rcu_read_{un}lock() and
get_pid() and find_vpid().
We can not do anything special here.
I don't see the whole path. But here is the deal.
1) Using struct pid and the proper find_get_pid() means that a user with
the proper capabilities/permissions who happens to be running in a pid
namespace can call this and it will just work.
2) The current best practices in the current are to:
- call find_get_pid() when you capture a user space pid.
- call put_pid() when you are done with it.
Perhaps that is just:
put_pid(ftrace_pid_trace);
ftrace_pid_trace = find_get_pid(user_provided_pid_value);
This may be fine.
3) If you ultimately want to support the full gamut:
thread/process/process group/session. You will need
to use struct pid pointer comparisons.
4) When I looked at the place you were concerned about races
a) you were concerned about the wrong race.
b) I don't see a race.
c) You were filtering for the tid of a linux task not
the tgid of a process. So the code didn't seem to
be doing what you thought it was doing.
5) I keep thinking current->pid should be removed some day.
So let's do this properly if we can. This is a privileged operation
so we don't need to support people without the proper capabilities
doing this. Or multiple comparisons or anything silly like that. But
doing this with struct pid comparisons seems cleaner and more maintainable. And that
really should matter.
Just so you understand what I'm concerned about:
$ objdump -dr kernel/pid.o
[...]
0000025f <find_get_pid>:
25f: 55 push %ebp
260: 89 e5 mov %esp,%ebp
262: 53 push %ebx
263: e8 fc ff ff ff call 264 <find_get_pid+0x5>
264: R_386_PC32 mcount
268: 89 c3 mov %eax,%ebx
26a: b8 01 00 00 00 mov $0x1,%eax
looking in arch/x86/kernel/entry_32.S:
ENTRY(mcount)
cmpl $0, function_trace_stop
jne ftrace_stub
cmpl $ftrace_stub, ftrace_trace_function
jnz trace
[...]
trace:
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %eax
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax
call *ftrace_trace_function
looking in kerne/trace/ftrace.c:
if (ftrace_pid_trace >= 0) {
set_ftrace_pid_function(func);
func = ftrace_pid_func;
}
ftrace_trace_function = func;
And we then have
static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
{
if (current->pid != ftrace_pid_trace)
return;
ftrace_pid_function(ip, parent_ip);
}
Now by having ftrace_pid_func call find_get_pid, we have the function flow
of...
schedule() /* any traced function */
--> mcount
--> *ftrace_trace_function == ftrace_pid_func
ftrace_pid_func
--> find_get_pid
--> mcount
--> *ftrace_trace_function == ftrace_pid_func
ftrace_pid_func
--> find_get_pid
[ ad infinitum ]
The comparison must be very careful not to call anything that will also
trace. I can add code to catch this recursion, but this is overhead I do
not want to add. Remember, this is called on every function call.
If we do the work at the time we set ftrace_pid_trace and we can do simple
pointer comparisons in the ftrace_pid_func, I will be happy with that. I'm
still learning about this pid namespace, so I'll probably screw it up a
few more times ;-)
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- References:
- [PATCH 0/3] ftrace: new features
- From: Steven Rostedt
- [PATCH 1/3] ftrace: add function tracing to single thread
- From: Steven Rostedt
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Andrew Morton
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Steven Rostedt
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Andrew Morton
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Steven Rostedt
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Dave Hansen
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Steven Rostedt
- Re: [PATCH 1/3] ftrace: add function tracing to single thread
- From: Eric W. Biederman
- [PATCH 0/3] ftrace: new features
- Prev by Date: Re: [PATCH 1/1] HID: Apple alu wireless keyboards are bluetooth devices
- Next by Date: [PATCH for 2.6.28] NAND: fsl_upm: fix build problem with 2.6.28-rc2
- Previous by thread: Re: [PATCH 1/3] ftrace: add function tracing to single thread
- Next by thread: Re: [PATCH 1/3] ftrace: add function tracing to single thread
- Index(es):
Relevant Pages
|