Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30

From: Daniel Phillips (phillips_at_redhat.com)
Date: 07/10/04

    To: sdake@mvista.com
    Date:	Sat, 10 Jul 2004 16:57:06 -0400
    
    

    On Saturday 10 July 2004 13:59, Steven Dake wrote:
    > > I'm not saying you're wrong, but I can think of an advantage you
    > > didn't mention: a service living in kernel will inherit the
    > > PF_MEMALLOC state of the process that called it, that is, a VM
    > > cache flushing task. A userspace service will not. A cluster
    > > block device in kernel may need to invoke some service in userspace
    > > at an inconvenient time.
    > >
    > > For example, suppose somebody spills coffee into a network node
    > > while another network node is in PF_MEMALLOC state, busily trying
    > > to write out dirty file data to it. The kernel block device now
    > > needs to yell to the user space service to go get it a new network
    > > connection. But the userspace service may need to allocate some
    > > memory to do that, and, whoops, the kernel won't give it any
    > > because it is in PF_MEMALLOC state. Now what?
    >
    > Overload conditions that have caused the kernel to run low on memory
    > are a difficult problem, even for kernel components. Currently
    > openais includes "memory pools" which preallocate data structures.
    > While that work is not yet complete, the intent is to ensure every
    > data area is preallocated so the openais executive (the thing that
    > does all of the work) doesn't ever request extra memory once it
    > becomes operational.
    >
    > This, of course, leads to problems in the following system calls,
    > which openais uses extensively:
    > sys_poll
    > sys_recvmsg
    > sys_sendmsg
    >
    > These calls require memory allocations with GFP_KERNEL, which can
    > fail and return ENOMEM to userland. The openais protocol can
    > currently handle low-memory failures in recvmsg and sendmsg
    > because it uses a protocol designed to operate on lossy networks.
    >
    > The poll system call problem will be rectified by utilizing
    > sys_epoll_wait which does not allocate any memory (the poll data is
    > preallocated).
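
The "memory pools" idea in the quoted text can be illustrated with a
minimal sketch. This is not the openais implementation; the names
(struct pool, pool_init, pool_get, pool_put) and the sizes are
hypothetical. All storage is reserved up front, and an exhausted pool
returns NULL instead of falling back to malloc().

```c
#include <assert.h>
#include <stddef.h>

#define POOL_OBJECTS 4   /* hypothetical capacity */
#define OBJECT_SIZE  64  /* hypothetical object size in bytes */

/* Fixed-size pool: every slot exists before the service starts work. */
struct pool {
    union slot {
        union slot *next;                /* valid only while the slot is free */
        unsigned char data[OBJECT_SIZE]; /* valid only while the slot is in use */
    } slots[POOL_OBJECTS];
    union slot *free_list;
};

/* Thread every slot onto the free list. No allocation happens here. */
static void pool_init(struct pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_OBJECTS; i++) {
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take one object, or NULL if the pool is exhausted. */
static void *pool_get(struct pool *p)
{
    union slot *s = p->free_list;
    if (s == NULL)
        return NULL;   /* caller must cope; there is no malloc() fallback */
    p->free_list = s->next;
    return s;
}

/* Return an object previously obtained from pool_get(). */
static void pool_put(struct pool *p, void *obj)
{
    assert(obj != NULL);
    union slot *s = obj;
    s->next = p->free_list;
    p->free_list = s;
}
```

Reserving the storage does not by itself keep the pages resident. A
daemon that wants that guarantee might also call
mlockall(MCL_CURRENT | MCL_FUTURE) during startup, which this sketch
leaves out.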

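The epoll approach described in the quoted text can be sketched as
follows. The helper names (watch_fd, wait_ready) and MAX_EVENTS are
hypothetical. The point is that the event array is owned by the caller
and set up once, so the steady-state wait passes the same buffer on
every call. Registration with epoll_ctl() can itself fail with ENOMEM,
so the sketch assumes it is done during setup.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16   /* hypothetical upper bound on ready descriptors */

/* Reserved once; epoll_wait() fills this caller-owned array. */
static struct epoll_event ready[MAX_EVENTS];

/* Setup phase: register fd for readability on the epoll instance. */
static int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Steady state: wait for events, retrying if a signal interrupts the
 * call. Returns the number of ready descriptors, 0 on timeout, or -1
 * on error with errno set.
 */
static int wait_ready(int epfd, int timeout_ms)
{
    int n;
    do {
        n = epoll_wait(epfd, ready, MAX_EVENTS, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A caller would create the instance with epoll_create1(0), register its
sockets with watch_fd() before becoming operational, and then loop on
wait_ready(), reading the results from ready[0] through ready[n-1].
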
    But if the user space service is sitting in the kernel's dirty memory
    writeout path, you have a real problem: the low memory condition may
    never get resolved, leaving your userspace service unable to respond.
    Meanwhile, whoever is generating the dirty memory just keeps spinning
    and spinning, generating more of it, ensuring that if the system does
    survive the first incident, there's another, worse traffic jam coming
    down the pipe. To trigger this deadlock, a kernel filesystem or block
    device module just has to lose its cluster connection(s) at the wrong
    time.

    > I hope that at least helps show that some R&D is underway to solve
    > this particular overload problem in userspace.

    I'm certain there's a solution, but until it is demonstrated and proved,
    any userspace cluster services must be regarded with narrow squinty
    eyes.

    > > Though I admit I haven't read through the whole code tree, there
    > > doesn't seem to be a distributed lock manager there. Maybe that is
    > > because it's so tightly coded I missed it?
    >
    > There is as yet no implementation of the SAF AIS dlock API in
    > openais. The work requires about 4 weeks of development by a
    > skilled developer. I'd expect a contribution for this API in the
    > timeframes that make GFS interesting.

    I suspect you have underestimated the amount of development time
    required.

    > I'd invite you, or others interested in these sorts of services, to
    > contribute that code.

    Humble suggestion: try grabbing the Red Hat (Sistina) DLM code and see
    if you can hack it to do what you want. Just write a kernel module
    that exports the DLM interface to userspace in the desired form.

       http://sources.redhat.com/cluster/dlm/

    Regards,

    Daniel

