Re: Creating tasks on restart: userspace vs kernel
- From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
- Date: Wed, 15 Apr 2009 01:01:50 +0400
On Tue, Apr 14, 2009 at 04:10:53PM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
In the end correctness of chopping will be equal to how well the user
understands that two task_structs are independent of each other.
But it will still be a useful tool for many use cases, like batch CPU jobs,
some servers, VNC sessions (if you want graphics), etc. Imagine you run
'octave' for a week and must reboot now - 'octave' wouldn't care if
you checkpointed it and then restarted it with a different pid !
<3> Clone with pid:
To restart processes from userspace, there needs to be a way to
request a specific pid--in the current pid_ns--for the child process
(clearly, if it isn't in use).
Why is it a disadvantage ? To Linus, a syscall clone_with_pid()
"sounds like a _wonderful_ attack vector against badly written
user-land software...". Actually, getting a specific pid is possible
without this syscall. But the point is that it's undesirable to have
this functionality unrestricted.
So one option is to require root privileges. Another option is to
restrict such action to a pid_ns created by the same user. Even more so,
restrict it to only containers that are being restarted.
You want to do a small part in userspace and consequently end up with hacks,
both userspace-visible and in-kernel.
I want to extend the existing kernel interface to leverage fork/clone
from user space, AND to allow the flexibility mentioned above (which
you conveniently ignored).
All hacks are in-kernel, aren't they ?
mktree.c can be viewed as a hack, why not?
Lol .. I meant "all kernel hacks are in-kernel" :)
The whole existence of these requirements. You want a new syscall, or SET_NEXT_PID,
or a /proc file, or something.
Or embed it into a restart(2) call with special argument.
As for asking for a specific pid from user space, it can be done by:
* a new syscall (restricted to user-owned-namespace or CAP_SYS_ADMIN)
* a sys_restart(... SET_NEXT_PID) interface specific for restart (ugh)
* setting a special /proc/PID/next_id file which is consulted by fork
/proc/*/next_id was discussed and had hopefully died, but apparently not.
and in all cases, limit this so that it is only allowed in a restarting
container, under the proper security model (again, e.g., Serge's
suggestion).
Pids aren't special; they are struct pid, dynamically allocated and
refcounted just like any other structures.
They are special. And I allow them not to be restored, as well, if
the use case so wishes.
They _become_ special for your intended method of restart.
The use case is to restore as much as possible, to a state as close to
the original as possible. Not going with fork_with_pid() in any form helps
the kernel ensure correctness of restore and helps avoid surprising
failure modes from the user's POV.
You also have flags in nsproxy image (or where?) like "do clone with
CLONE_NEWUTS".
Nope. Read the code.
Which code?
static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
{
	...
	new_uts = cr_obj_add_ptr(ctx, nsproxy->uts_ns,
				 &hh->uts_ref, CR_OBJ_UTSNS, 0);
	if (new_uts < 0) {
		ret = new_uts;
		goto out;
	}
	hh->flags = 0;
	if (new_uts)
===>		hh->flags |= CLONE_NEWUTS;
	ret = cr_write_obj(ctx, &h, hh);
	...
This is unneeded!
The nsproxy (or task_struct) image has a reference (objref/position) to the uts_ns image.
On restart, one looks up the object by reference, or restores it if needed,
takes a refcount and glues them together. Just like with any other pair of structures.
That's exactly how it's done.
Not for uts_ns and future namespaces.
ret = cr_restore_utsns(ctx, hh->uts_ref, hh->flags);
                                         ^^^^^^^^^
                                         comes from disk
Where else would it come from ? that's part of the state saved during
checkpoint.
This is a bogus part of the state saved during checkpoint.
To restore nsproxy you only need references to uts_ns, ipc_ns, mnt_ns,
pid_ns and net_ns; no flags.
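The references-only restore argued for here can be sketched as a small userspace model: an objref table maps each saved reference to a restored object, creating it on first use and taking a refcount on every lookup. All names (`struct ns_obj`, `struct objref_table`, `objref_get()`, `restore_nsproxy()`) are hypothetical and do not correspond to the kernel's structures or to the C/R patch code; "restoring" an object is reduced to allocating it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model: a refcounted namespace object restored from
 * the checkpoint image. Not the kernel's structures. */
struct ns_obj {
	int refcount;
	int objref;            /* position of this object's image */
};

#define MAX_OBJREF 64

/* Hypothetical objref table filled in during restart. */
struct objref_table {
	struct ns_obj *slot[MAX_OBJREF];
};

/* Look the object up by reference, restore it on first use, take a ref. */
static struct ns_obj *objref_get(struct objref_table *t, int objref)
{
	if (objref < 0 || objref >= MAX_OBJREF)
		return NULL;                   /* corrupt image */
	if (t->slot[objref] == NULL) {
		/* "restore it if needed": a real restart would read the image */
		struct ns_obj *o = calloc(1, sizeof(*o));
		if (o == NULL)
			return NULL;
		o->objref = objref;
		o->refcount = 1;               /* reference owned by the table */
		t->slot[objref] = o;
	}
	t->slot[objref]->refcount++;           /* reference owned by the caller */
	return t->slot[objref];
}

enum { NS_UTS, NS_IPC, NS_MNT, NS_PID, NS_NET, NR_NS };

/* nsproxy image: five references, no CLONE_NEW* flags. */
struct nsproxy_image {
	int ref[NR_NS];
};

struct nsproxy_model {
	struct ns_obj *ns[NR_NS];
};

/* Restore an nsproxy purely from references. Returns 0 or -1. */
static int restore_nsproxy(struct objref_table *t,
			   const struct nsproxy_image *img,
			   struct nsproxy_model *nsp)
{
	assert(t != NULL && img != NULL && nsp != NULL);

	for (int i = 0; i < NR_NS; i++) {
		nsp->ns[i] = objref_get(t, img->ref[i]);
		if (nsp->ns[i] == NULL) {
			while (i-- > 0)
				nsp->ns[i]->refcount--; /* drop refs taken so far */
			return -1;
		}
	}
	return 0;
}
```

Two task images that name the same uts_ref end up sharing one object, while a task that called unshare() simply carries a different uts_ref; nothing else is needed to tell the two cases apart.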
You can try to be smart and, consequently, end up with checks where
a) flags tell to unshare uts_ns, but reference is the same, and
b) flags don't tell to unshare, but reference is different.
This is unneeded code coming from the way you restore nsproxy
incorrectly.
That's for nested UTS namespaces, where a task in a container called unshare().
Just to clear up terminology, UTS namespaces aren't nested; only
PID namespaces are.
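What "nested" means for PID namespaces can be shown with a small userspace model, loosely following the idea behind the kernel's struct pid / struct upid: each PID namespace knows its parent and level, and a task's pid carries one number per level, so the same task has a different number in each ancestor namespace. The types and `pid_nr_in()` below are simplified, hypothetical stand-ins, not kernel code; UTS namespaces need no such per-level bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model: each pid namespace knows its parent and its
 * nesting level. Hypothetical names, not the kernel's definitions. */
struct pid_ns_model {
	unsigned int level;                  /* 0 for the initial namespace */
	const struct pid_ns_model *parent;   /* NULL for the initial namespace */
};

#define MAX_LEVEL 4

/* One pid number as seen in one namespace. */
struct upid_model {
	int nr;
	const struct pid_ns_model *ns;
};

/* A task's pid carries one number per level, from the root down to the
 * namespace the task was created in. */
struct pid_model {
	unsigned int level;
	struct upid_model numbers[MAX_LEVEL];
};

/* Number of `p` as seen from `ns`, or 0 if `ns` cannot see it. */
static int pid_nr_in(const struct pid_model *p, const struct pid_ns_model *ns)
{
	assert(p != NULL && ns != NULL);
	if (ns->level > p->level || ns->level >= MAX_LEVEL)
		return 0;
	const struct upid_model *u = &p->numbers[ns->level];
	return u->ns == ns ? u->nr : 0;
}
```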