Re: NUMA API - wish list

From: Andi Kleen (ak_at_muc.de)
Date: 05/03/04

    To: Zoltan.Menyhart@bull.net
    Date:	Mon, 03 May 2004 15:17:52 +0200
    
    

    Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org> writes:

    > The work load manager / load balancer can negotiate other resource
    > assignment at any time with the application.
    > The work load manager / load balancer is free to move a collection of
    > resources from some NUMA domains to others, provided the application's
    > requirements are still met. (No hard binding.)

    IMHO these are hard research topics that will need considerably
    more work to be automated, if they will ever work automated at all.
    The main problem is that you have several conflicting goals: you
    want to use all available CPU power, all available memory,
    all available memory bandwidth and the best average memory latency.
    They all conflict.

    First: basically any more advanced automatic scheme will
    require going all the way to a full workload manager
    that can move memory around later, because it is nearly impossible
    to get even two of these goals right in advance.

    I first tried to develop a NUMA scheduler "homenode scheduler" that
    attempted to do a lot of this automatically. I then realized that it
    is just too hard to do and it never worked very well. That is why I
    changed gears and just started with a simple API to let the user tell
    the kernel what he wants.

    The advantage of this is that a lot of complexity is avoided;
    e.g. the NUMA API avoids any need to move memory around.
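
    As a rough illustration of explicit, user-directed placement, here is a
    minimal sketch in Python. It is not the NUMA API itself (that means
    memory policy calls such as set_mempolicy(2) and mbind(2)); it only pins
    the calling process to the CPUs of one node using the sysfs topology
    files and os.sched_setaffinity(). The helper names parse_cpulist,
    cpus_of_node and run_on_node are hypothetical, and the sketch assumes a
    Linux kernel that exposes /sys/devices/system/node.

```python
import os
from pathlib import Path

# Assumption: the kernel exposes NUMA topology under this sysfs directory.
NODE_DIR = Path("/sys/devices/system/node")


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means the node has no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_node(node):
    """Return the CPUs of one NUMA node, or an empty set if it is not exposed."""
    path = NODE_DIR / f"node{node}" / "cpulist"
    try:
        return parse_cpulist(path.read_text())
    except OSError:
        return set()


def run_on_node(node):
    """Restrict the calling process to the CPUs of `node`.

    Returns True on success, False if the node is unknown or none of its
    CPUs are usable by this process. Nothing is migrated: memory already
    allocated stays where it is.
    """
    cpus = cpus_of_node(node)
    if cpus:
        # Only keep CPUs this process is currently allowed to run on.
        cpus &= os.sched_getaffinity(0)
    if not cpus:
        return False
    os.sched_setaffinity(0, cpus)
    return True
```

    Calling run_on_node(0) early in a program keeps it on node 0's CPUs.
    With the kernel's default local allocation policy, pages touched after
    that point will normally come from the same node when it has free
    memory, so no memory has to be moved afterwards.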

    Now if somebody comes up with a good design for a workload manager and
    does all the experiments needed to validate it, then it could be added
    later. But deferring NUMA optimization efforts until this considerable
    task is solved (if it even can be solved) would be a big mistake IMHO.

    > Billing is done accordingly :-)
    >
    > As you do not need to know anything about SCSI LUNs, sector IDs,
    > physical memory maps or the other applications when you compile your kernel,
    > why should an application care for HW NUMA details ?

    There is a big difference between these and NUMA.

    LUNs, sectors and physical memory are all hidden for correctness. For
    that, virtualization is fine, because performance is secondary
    to correctness.

    But NUMA knowledge is purely for optimization. And for optimization
    purposes you want to avoid virtualization layers, because they get
    in the way of your optimization efforts.

    When a human does NUMA optimization they usually want to work near the
    bare hardware. And if your dream of an automatic workload manager ever
    worked, it would also work on the bare hardware.

    -Andi


