Re: [ckrm-tech] Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Paul Jackson (pj_at_sgi.com)
Date: 10/03/04

  • Next message: Jesse: "Re: 2.6.9-rc3 software suspend (pmdisk) stopped working"
    Date:	Sun, 3 Oct 2004 07:11:45 -0700
    To: Paul Jackson <pj@sgi.com>
    
    

    Paul wrote:
    > It's a requirement, I say. It's a requirement. Let the slapping begin ;).

    Granted, to give Andrew his due (begrudgingly ;), the requirement
    to pin processes on CPUs is a requirement of the _implementation_,
    which follows, for someone familiar with the art, from the two
    items:
      1) The requirement of the _user_ that runtimes be repeatable
         within perhaps 1% to 5% for a certain class of job, plus
      2) The cantankerous properties of big honkin NUMA boxes.
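    To make "pinning" concrete, here is a minimal user-level sketch. It
    is not the cpuset patch itself. It assumes a Linux host and Python's
    os.sched_getaffinity/os.sched_setaffinity wrappers around the
    affinity system calls. The helper name pin_to_cpus is hypothetical,
    and the sketch covers CPU placement only, not memory placement.

```python
import os

def pin_to_cpus(cpus, pid=0):
    """Restrict `pid` (0 = the calling process) to the given CPUs.

    Hypothetical helper for illustration; returns the resulting CPU set.
    Only CPUs the task is already allowed to run on are considered.
    """
    allowed = os.sched_getaffinity(pid)      # CPUs currently permitted
    wanted = set(cpus) & allowed             # never ask for a forbidden CPU
    if not wanted:
        raise ValueError("none of the requested CPUs are available to this task")
    os.sched_setaffinity(pid, wanted)        # the actual pinning step
    return os.sched_getaffinity(pid)
```

    Calling pin_to_cpus([0]) from a job's launcher would confine that
    process (and children it later forks) to CPU 0, provided CPU 0 is
    in its allowed set.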

    Clearly, Andrew was looking for _user_ requirements, and I managed,
    somewhat unwittingly, to back up to one in my use case scenario.

    I suspect that there is a second use case scenario, with which the Bull
    or NEC folks might be more familiar than I, that can seemingly lead
    to the same implementation requirement to pin jobs. This scenario would
    involve a customer who has paid good money for some compute capacity
    (CPU cycles and Memory pages) with a certain guaranteed Quality of
    Service, and who would prefer to see this capacity go to waste when
    underutilized rather than risk it being unavailable in times of need.

    However, in this case, as Andrew is likely already chomping at the bit to
    tell me, CKRM could provide such guaranteed compute capacities without
    pinning.

    Whether or not a CKRM class would sell to the customers of Bull and
    NEC in lieu of a set of pinned nodes, I have no clue.

      Erich, Simon - Can you introduce a note of reality into my
                     speculations above?

    The third use case scenario that commonly leads us to pinning is
    support of the batch or workload managers, PBS and LSF, which are fond
    of dividing the compute resources up into identifiable subsets of CPUs
    and Memory Nodes that are near to each other (in terms of the NUMA
    topology) and that have the size (compute capacity as measured in free
    cycles and freely available RAM) requested by a job, then attaching that
    job to that subset and running it.
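    The placement step such a workload manager performs might be
    sketched roughly as follows. This is an illustration under stated
    assumptions, not how PBS or LSF actually do it: the function name
    choose_nodes, the greedy nearest-first strategy, and the sample
    distance table are all hypothetical.

```python
def choose_nodes(distance, free_cpus, free_mem_gb, want_cpus, want_mem_gb):
    """Pick a set of mutually near nodes that covers a job's request.

    distance[i][j] is a NUMA distance between nodes i and j (smaller is
    nearer). For each candidate seed node, nodes are added nearest-first
    until the request is met; the seed with the smallest summed distance
    wins. Returns a sorted list of node numbers, or None if no fit exists.
    """
    best = None
    nodes = range(len(distance))
    for seed in nodes:
        order = sorted(nodes, key=lambda j: (distance[seed][j], j))
        chosen, cpus, mem = [], 0, 0
        for node in order:
            if free_cpus[node] == 0 and free_mem_gb[node] == 0:
                continue                      # nothing to offer, skip it
            chosen.append(node)
            cpus += free_cpus[node]
            mem += free_mem_gb[node]
            if cpus >= want_cpus and mem >= want_mem_gb:
                break                         # request satisfied
        else:
            continue                          # this seed can never satisfy it
        spread = sum(distance[seed][j] for j in chosen)
        if best is None or spread < best[0]:
            best = (spread, sorted(chosen))
    return None if best is None else best[1]


# Hypothetical 4-node box: nodes 0-1 are close to each other, as are 2-3.
distance = [[10, 20, 40, 40],
            [20, 10, 40, 40],
            [40, 40, 10, 20],
            [40, 40, 20, 10]]
free_cpus = [2, 0, 4, 4]
free_mem_gb = [8, 0, 16, 16]
placement = choose_nodes(distance, free_cpus, free_mem_gb, 6, 24)
```

    With the sample table, a request for 6 CPUs and 24 GB lands on
    nodes 2 and 3, the adjacent pair, rather than on the more distant
    combination of nodes 0 and 2. The job's tasks would then be pinned
    to that subset.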

    In this third case, batch or workload managers have a long history with
    big honkin SMP and NUMA boxes, and this remains an important market for
    them. Consistent runtimes are valued by their customers and are a key
    selling point of these products in the HPC market. So this third case
    reduces to the first, with its implementation requirement for pinning
    the tasks of an active job to specific CPUs and Memory Nodes.

    For example, from Platform's web site (the vendor of LSF) at:
        http://www.platform.com/products/HPC
    the benefits for their LSF HPC product include:
      * Guaranteed consistent and reliable parallel workload processing with
        high performance interconnect support
      * Maximized application performance with topology-aware scheduling
      * Ensures application runtime consistency by automatically allocating
        similar processors

    -- 
                              I won't rest till it's the best ...
                              Programmer, Linux Scalability
                              Paul Jackson <pj@sgi.com> 1.650.933.1373
    
