Re: [RFC][PATCH 04/20] pspace: Allow multiple instaces of the process id namespace



On Mon, Feb 20, 2006 at 05:11:40PM +0300, Kirill Korotaev wrote:
I would disagree with you. These discussions IMHO led us to the wrong
direction.

Can I ask a bunch of questions which are related to other
virtualization issues, but which are not addressed by Eric anyhow?

- How are you planning to make hierarchical namespaces for such
resources as IPC? Sockets? Unix sockets?

in the same way as for resources or filesystems -
management is within the parent, usage within the
child

So taking example with IPC, you propose the following:
- parent is able to setup limits on segments, sizes, messages etc.
- parent doesn't see child objects itself, i.e. it is unable to share
segments with a child, send messages to child etc.
Am I correct?

that is limits, not resources, but for them, you are
right ...

Provided I got it correctly, how does this differ from the situation,
when one container is granted rights to manage another container?
So where is hierarchy?

this is permissions, yet another *space we have not
really covered that well ...

Moreover, granting/revoking rights is more fine grained I suppose. And
it is more secure, since uses the model - allow only things which are
safe, while heirarchy uses model "allow everything" to do with a child
and leads to possible DoS.

DoS of what? of course a hierarhical model would allow
to control 'everything' for/within a child space. this
doesn't mean that it would escape the refinements it
has received from the parent ...

in this way it is _very_ different from one container
being able to administrate a completely different
container (as there is no inheritance), which would
easily cause the beforementioned DoS situation ...

Process tree is hierarchical by it's nature. But what is
heirarchical IPC and other resources?

for resources it's simple, you have to 'give away'
a certain share to your children, which in turn will
enable them to utilize them up to the given amount,
or (depending on the implementation) up to the total
amount of the parent.

Again, how does this differ from the situation when one container is
granted to manage another one? In this case it grant some portion of
it's resources to anyone he wishes.

well, if one container can give away resources to
another one, and can maintain that context, as well
as inspect tasks and resources inside that context,
then the structure is hierarchical in my book :)

Take a look at this from another angle:
You have a child container, which was granted 1/2 of your resources.
But since parent consumed 3/4 of it, child will never be able to get
his 1/2 portion. And child will be unable to find out the reason for
resources allocation denies.

well, this is called overbooking, and if you want
to allow that, you ahve to accept the consequences
(doesn't mean I'm against it :)

(check out ckrm as a proof of concept, and example
how it should not be done :)
let's be more friendly to each other :)

I'm not being 'unfriendly' here, we all agree that
the basic idea of ckrm was really great, but the
implementation and complexity makes it somewhat
unusable ...

And no one ever told me why we need heierarchy at all. No any _real_
use cases. But it's ok.

there are many use cases here, first, the VPS within
a VPS (of course, not the most useful one, but the
most obvious one), then there are all kind of test,
build and security scenarios which can benefit from
hierarchical structures and 'delegated' administrative
power (just think web space management)

If you are talking about management, then see my prev paragraph.
Rights can be granted. Can you provide some other example, what do you
want from hierarchy?

see my reply to your paragraph :)

- Eric wants to introduce name spaces, but totally forgots how much
they are tied with each other. IPC refers to netlinks, network
refers to proc and sysctl, etc. It is some rare cases when you will
be able to use namespaces without full OpenVZ/vserver containers.

well, we already visited the following:

- filesystem namespaces (works quite fine completely
independant of all other)

it is tightly interconnected with unix sockets, proc, sysfs, ipc, and
I'm sure something else :)

well, for procfs and sysfs that does not come
surprisingly, as they _are_ filesystems after
all, named sockets do belong there too, for ipc
I do not see the direct conenction as long as
the elements are not represented by files ...

Herber, Eric, I'm not against namespaces. Actualy OpenVZ doesn't care
whether we have single container or namespaces, I'm just trying to
show you, that all of them are not that separate namespaces as you are
trying to think of them.

I guess all of the involved parties _know_ that
they are somewhat interconencted, and as I said
several times now, I think we should try to split
them and separate 'em wherever possible, sometimes
even by changing existing semantics and structures

- pid spaces (again they are quite fine without any
other namespace)
only if we remove all these pid uses from fown, netlinks etc.

again, some cleanup is probably required ...

- resource spaces (another one which makes perfect
sense to have without the others)
which one? give me an example please.

- cpu resources (scheduler)

the fact that some things like proc or netlink is tied
to networking and ipc IMHO is just a sign that those
interfaces need revisiting and proper isolation or
virtualization ...
it needs virtualization, really. But being virtualized they are still
tied to the subsystems they were.

- How long these namespaces live? And which management interface
each of them will have?
well, the lifetime of such a space is very likely to
be at least the time of the users, including all
created sockets, file-handles, ipc resources, etc ...
So if you have a socket in TCP_FIN_WAIT1 state, which can live long
time, what do you do with it?

IMHO you have two options there:

- zap the sockets when you destroy the context
(a machine which is rebooted will do the same :)

- keep them hanging around for some days, until the
timeout is reached and they go away quite naturally

note: this is something which does not even have to
be a design decision right now, as the zapping can
be done later quite easily ...

Full example: the process dies, but network space is still alive due
to such a socket. You won't be able to reuse the address:port until it
died. I'm curios about how do you propose to handle similar issues in
separate namespaces?

see above, btw, the network space would only be
partially alive, i.e. no new sockets could be created
within it, or it might even be persistant and you can
'handle' the sockets after a restart as if they were
on a 'normal' machine ...

(btw, this issue is quite common on a real machine, no?)

Also as a continuation of this example, if all the processes exited,
how can you manage namespaces which leaked? where should you go to see
these sockets if /proc is tightly related to pspace on the task, but
there are no tasks?

that is where an administrative interfaces makes sense
you can review and investigate those spaces and take
proper actions if you like ...

So I really hate that we concentrated on discussion of VPIDs,
while there are more general and global questions on the whole
virtualization itself.

well, I was not the one rolling out the 'perfect'
vpid solution ...

ha ha :) won't start flaming with you.

First of all, I don't think syscalls like
"do_something and exec" should be introduced.
Linux-VServer does not have any of those syscalls
and it works quite well, so why should we need such
syscalls?
My question is the same! Why?

because we don't need them?!

I have no problem at all to discuss a general plan
(hey I though we were already doing so :) or move
to some other area (like networking :)
Yup. Would be nice to switch to networking, IPC or something like this.

best,
Herbert

Kirill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [RFC][PATCH 04/20] pspace: Allow multiple instaces of the process id namespace
    ... virtualization issues, but which are not addressed by Eric anyhow? ... How are you planning to make hierarchical namespaces for such resources as IPC? ... Unix sockets? ... parent doesn't see child objects itself, i.e. it is unable to share segments with a child, send messages to child etc. ...
    (Linux-Kernel)
  • Re: Child Domain Structure
    ... child subdomian at both locations if there is a dc available or have i ... users a the remote location and possible child domain ... only really need to access a few resources, file shares etc at the central ... parent domain very rarely need to connect to the remote location. ...
    (microsoft.public.windows.server.active_directory)
  • Re: [Xen-devel] Re: [ANNOUNCE] Xen high-performance x86 virtualization
    ... > oversubscribe the resources they've got, ... We use a proportional fair-share scheduler (Borrowed Virtual ... Memory: Each guest is allocated a share of memory when it is started ... other virtualization of x86 that is capable of running full OSes. ...
    (Linux-Kernel)
  • Re: Specifying a resource for mf.sys
    ... Yes PNP does enumrate the child devices, the problem is that I need to ... >> PCMCIA card that is compliant with the PCMCIA Multifunction standard. ... >> child function inf files. ... > enumerated by PnP as individual devices with their own resources? ...
    (microsoft.public.development.device.drivers)
  • Re: History of copy on write
    ... inherited by the child that you do not want to pass to the exec. ... Including resources, file descriptors, that the original programmer did ... should use appropriate security mechanisms at various kinds ...
    (comp.arch)