Re: [PATCH 0/7] dlm: overview

From: Lars Marowsky-Bree (lmb_at_suse.de)
Date: 04/28/05

    Date:	Thu, 28 Apr 2005 16:57:15 +0200
    To: Daniel Phillips <phillips@istop.com>
    
    

    On 2005-04-27T18:38:18, Daniel Phillips <phillips@istop.com> wrote:

    > UUIDs at this level are inherently bogus, unless of course you have more
    > than 2**32 cluster nodes. I don't know about you, but I do not have even
    > half that many nodes over here.

    This is not quite the argument. With that argument, 16 bit would be
    fine. And even then, I'd call you guilty of causing my lights to flicker
    ;-)

    The argument about UUIDs goes a bit beyond that: No admin needed to
    assign them; they can stay the same even if clusters/clusters merge (in
    theory); they can be used for inter-cluster addressing too, because they
    aren't just unique within a single cluster (think clusters of clusters,
    grids etc, whatever the topology), and finally, a UUID is a big enough
    blob to hold any other identifier, be it a two-bit node ID, a
    node name, a 32-bit IPv4 address, or a 128-bit IPv6 address.
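
    To illustrate the "big enough blob" point, here is a minimal sketch
    that stores each kind of identifier in a 128-bit value. The helper
    names and the encoding (low bits for integers, zero padding for
    names) are assumptions made up for this example, not anything the
    DLM, CMAN, or OCF define.

```python
import ipaddress
import uuid


def node_id_to_uuid(node_id: int) -> uuid.UUID:
    """Store a small integer node ID in the low bits of a 128-bit value."""
    if not 0 <= node_id < 2**128:
        raise ValueError("node ID does not fit in 128 bits")
    return uuid.UUID(int=node_id)


def ipv4_to_uuid(addr: str) -> uuid.UUID:
    """Store a 32-bit IPv4 address in the low 32 bits."""
    return uuid.UUID(int=int(ipaddress.IPv4Address(addr)))


def ipv6_to_uuid(addr: str) -> uuid.UUID:
    """A 128-bit IPv6 address fills the blob exactly."""
    return uuid.UUID(bytes=ipaddress.IPv6Address(addr).packed)


def nodename_to_uuid(name: str) -> uuid.UUID:
    """Store a short ASCII node name, zero-padded to 16 bytes (hypothetical scheme)."""
    raw = name.encode("ascii")
    if len(raw) > 16:
        raise ValueError("node name longer than 16 bytes")
    return uuid.UUID(bytes=raw.ljust(16, b"\0"))
```

    Note that these values only reuse the 128-bit container; they do not
    set the version/variant bits of a standards-conforming UUID, and
    different identifier kinds can collide (node ID 1 and the IPv6
    address ::1 map to the same value), so a real scheme would need a
    type tag or separate namespaces.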

    This piece is important. It defines one of the fundamental objects in
    the API.

    I recommend you read up on the discussions on the OCF list on this; this
    has probably been one of the hottest arguments.

    > > > > - How are the communication links configured? How to tell it which
    > > > > interfaces to use for IP, for example?
    > > >
    > > > CMAN provides a PF_CLUSTER. This facility seems cool, but I haven't got
    > > > much experience with it, and certainly not enough to know if PF_CLUSTER
    > > > is really necessary, or should be put forth as a required component of
    > > > the common infrastructure. It is not clear to me that SCTP can't be used
    > > > directly, perhaps with some library support.
    > >
    > > You've missed the point of my question. I did not mean "How does an
    > > application use the cluster comm links", but "How is the kernel
    > > component told which paths/IPs it should use".
    >
    > I believe cman gives you an address in AF_CLUSTER at the same time it hands
    > you your event socket. Last time I did this, the actual mechanism was buried
    > under a wrapper (magma) so I could have got that slightly wrong.
    > Anybody want to clarify?

    This still doesn't answer the question. You're telling me how I get my
    address in AF_CLUSTER. I was asking, again: "How is the kernel component
    told which paths/IPs to use?", i.e. the equivalent of ifconfig/route
    for the cluster stack, if you so please.

    Doing this in a wrapper is one answer - in which case we'd have a
    consistent user-space API provided by shared libraries wrapping a
    specific kernel component. This places the boundary in user-space.

    This seems to be a main point of contention, also applicable to the
    first question about node identifiers: What does the kernel/user-space
    boundary look like, and is this the one we are aiming to clarify?

    Or do we place the boundary in user-space, with a specific wrapper around
    a given kernel solution?

    I can see both or even a mix, but it's an important question.

    > Since cman has now moved to user space, userspace does not tell the kernel
    > about membership,

    That partial sentence already makes no sense. So how does the kernel
    (DLM in this case) learn about whether a node is assumed to be up or
    down if the membership is in user-space? Right! User-space must tell
    it.

    Again, this is sort of the question of where the API boundary between
    published/internal is.

    For example, with OCFS2 (with user-space membership, which it doesn't yet
    have, but which they keep telling me is trivial to add, but we're
    keeping them busy with other work right now ;-) it is supposed to work
    like this: When a membership event occurs, user-space transfers this
    event to the kernel by writing to a configfs mount.

    Likewise, the node ids and comm links the kernel DLM uses with OCFS2
    are configured via that interface.
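
    As a rough sketch of what "writing to a configfs mount" could look
    like from user-space: one directory per node, one small file per
    attribute. The directory layout, attribute names, and function names
    below are assumptions for illustration only, not a statement of the
    actual OCFS2 interface. The sketch takes the mount point as a
    parameter so it can be tried against an ordinary directory.

```python
import os
import shutil


def add_node(config_root: str, cluster: str, name: str,
             node_num: int, ipv4_address: str) -> str:
    """Announce a node by creating its directory and writing its attributes.

    On a real configfs mount the kernel would create the attribute files
    when the directory is made; on a plain directory, open() creates them.
    """
    node_dir = os.path.join(config_root, "cluster", cluster, "node", name)
    os.makedirs(node_dir, exist_ok=True)
    for attr, value in (("num", node_num), ("ipv4_address", ipv4_address)):
        with open(os.path.join(node_dir, attr), "w") as f:
            f.write(f"{value}\n")
    return node_dir


def remove_node(config_root: str, cluster: str, name: str) -> None:
    """Report that a node left the membership by removing its directory.

    shutil.rmtree is used so the sketch works on a plain directory; on a
    real configfs mount a single rmdir of the node directory would be the
    equivalent operation.
    """
    node_dir = os.path.join(config_root, "cluster", cluster, "node", name)
    shutil.rmtree(node_dir)
```

    The point of such an interface is that the same calls work no matter
    which user-space membership service sits behind them; the contract
    is the file layout, not a wrapper library.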

    If we could standardize at the kernel/user-space boundary for clustering,
    like we do for syscalls, this would IMHO be cleaner than having
    user-space wrappers.

    > Can we have a list of all the reasons that you cannot wrap your heartbeat
    > interface around cman, please?

    Any API can be mapped to any other API. That wasn't the question. I was
    aiming at the kernel/user-space boundary again.

    > > ... which is why I asked the above questions: User-space needs to
    > > interface with the kernel to tell it the membership (if the membership
    > > is user-space driven), or retrieve it (if it is kernel driven).
    > Passing things around via sockets is a powerful model.

    Passing a socket in to use for communication makes sense. "I want you to
    use this transport when talking to the cluster". However, that raises the
    question of whether you're passing in a unicast peer-to-peer socket or a
    multicast one which reaches all of the nodes, and what kind of
    security, ordering, and reliability guarantees that transport needs to
    provide.
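
    A small sketch of the "pass the socket in" model, under the
    assumption that the consumer never opens its own connection and only
    uses what it is handed. ClusterTransport and its reaches_all_nodes
    flag are hypothetical names; the flag stands in for the
    unicast-versus-multicast property that the caller would have to
    declare, since the consumer cannot infer it from the socket alone.

```python
import socket


class ClusterTransport:
    """Wraps a socket that the caller has already created and configured."""

    def __init__(self, sock: socket.socket, reaches_all_nodes: bool):
        self.sock = sock
        # Declared by the caller: True for a multicast-style transport,
        # False for a unicast peer-to-peer one.
        self.reaches_all_nodes = reaches_all_nodes

    def send(self, payload: bytes) -> None:
        """Send a message over whatever transport was handed in."""
        self.sock.sendall(payload)

    def receive(self, max_bytes: int = 4096) -> bytes:
        """Receive up to max_bytes from the transport."""
        return self.sock.recv(max_bytes)
```

    Ordering, reliability, and security would need the same treatment:
    either the interface fixes them, or the caller states what the
    socket it passes in actually guarantees.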

    Again, this is (from my side) about understanding the user-space/kernel
    boundary, and the APIs used within the kernel.

    > > This implies we need to understand the expected semantics of the kernel,
    > > and either standardize them, or have a way for user-space to figure out
    > > which are wanted when interfacing with a particular kernel.
    > Of course, we could always read the patches...

    Reading patches is fine for understanding syntax, and spotting some
    nits. I find actual discussion with the developers to be invaluable to
    figure out the semantics and the _intention_ of the code, which takes
    much longer to deduce from the code alone; and you know, just sometimes
    the code doesn't actually reflect the intentions of the programmers who
    wrote it ;-)

    Sincerely,
        Lars Marowsky-Brée <lmb@suse.de>

    -- 
    High Availability & Clustering
    SUSE Labs, Research and Development
    SUSE LINUX Products GmbH - A Novell Business
    
