Re: Scaling noise

From: William Lee Irwin III (wli_at_holomorphy.com)
Date: 09/03/03

    Date:	Wed, 3 Sep 2003 14:21:19 -0700
    To: "Martin J. Bligh" <mbligh@aracnet.com>
    
    

    At some point in the past, I wrote:
    >> The sharing matters; e.g. libc and other massively shared bits are
    >> replicated in memory once per instance, which increases memory and
    >> cache footprint(s). A number of other consequences of the sharing loss:

    On Wed, Sep 03, 2003 at 01:48:59PM -0700, Martin J. Bligh wrote:
    > Explain the cache footprint argument - if you're only using a single
    > copy from any given cpu, it shouldn't affect the cpu cache. More
    > importantly, it'll massively reduce the footprint on the NUMA
    > interconnect cache, which is the whole point of doing text replication.

    The assumption of a single copy in use from any given cpu was not
    explicitly made. Some of this depends on how the administrator (or
    whoever) wants to arrange OS instances, e.g. so that when one becomes
    blocked on io or otherwise idled, others can make progress, or other
    forms of overcommitment.

    At some point in the past, I wrote:
    >> The number of systems to manage proliferates.

    On Wed, Sep 03, 2003 at 01:48:59PM -0700, Martin J. Bligh wrote:
    > Not if you have an SSI cluster, that's the point.

    The scenario described above wasn't SSI but independent instances with
    a shared distributed fs. SSI clusters have most of the same problems,
    really. Managing the systems just becomes "managing the nodes" because
    they're not called systems, and you have to go through some (possibly
    automated, though not likely) hassle to figure out the right way to
    spread things across nodes, which virtualized pieces to hand to which
    nodes running which loads, etc.

    At some point in the past, I wrote:
    >> Pagecache access suddenly involves cross-instance communication instead
    >> of swift memory access and function calls, with potentially enormous
    >> invalidation latencies.

    On Wed, Sep 03, 2003 at 01:48:59PM -0700, Martin J. Bligh wrote:
    > No, each node in an SSI cluster has its own pagecache, that's mostly
    > independent.

    But not totally. truncate() etc. need handling, i.e. cross-instance
    pagecache invalidations. And write() too. =)
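
As a rough illustration of the truncate() point above, here is a toy model in which each instance keeps a private pagecache and a truncate on one instance has to reach every other instance. The `Node` and `Cluster` classes, the path, and the message counter are all hypothetical; this is a sketch of the cost being described, not how any real SSI implementation handles invalidation.

```python
class Node:
    """One OS instance with its own private pagecache (toy model)."""

    def __init__(self, name):
        self.name = name
        self.pagecache = {}  # (path, page_index) -> cached bytes

    def cache_page(self, path, index, data):
        self.pagecache[(path, index)] = data


class Cluster:
    """Hypothetical cluster that counts cross-instance invalidation messages."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.messages_sent = 0

    def truncate(self, origin, path, keep_pages):
        """Drop cached pages at or beyond keep_pages on every instance."""
        for node in self.nodes.values():
            if node.name != origin:
                # Stands in for a round trip over the interconnect.
                self.messages_sent += 1
            stale = [key for key in node.pagecache
                     if key[0] == path and key[1] >= keep_pages]
            for key in stale:
                del node.pagecache[key]


cluster = Cluster(["node0", "node1", "node2"])
for node in cluster.nodes.values():
    for index in range(4):
        node.cache_page("/shared/log", index, b"x")

cluster.truncate("node0", "/shared/log", keep_pages=1)
print(cluster.messages_sent)
```

On a single instance the same truncate is a local function call; in this model every additional instance adds one message, and the caller has to wait for all of them.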

    At some point in the past, I wrote:
    >> The limited size of a single instance bounds the size of individual
    >> applications, which at various times would like to have larger memory
    >> footprints or consume more cpu time than fits in a single instance.
    >> i.e. something resembling external fragmentation of system resources.

    On Wed, Sep 03, 2003 at 01:48:59PM -0700, Martin J. Bligh wrote:
    > True. depends on how the processes / threads in that app communicate
    > as to how big the impact would be. There's nothing saying that two
    > processes of the same app in an SSI cluster can't run on different
    > nodes ... we present a single system image to userspace, across nodes.
    > Some of the glue layer (eg for ps, to give a simple example), like
    > for_each_task, is where the hard work in doing this is.

    Well, let's try the word "process" then. e.g. 4GB nodes and a process
    that suddenly wants to inflate to 8GB due to some ephemeral load
    imbalance.
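
The 4GB/8GB case can be put in numbers with a minimal sketch. The four-node layout and the helper names below are assumptions made for illustration; only the 4GB node size and the 8GB request come from the example above.

```python
GB = 1 << 30


def fits_in_aggregate(node_free, request):
    """True if the cluster as a whole has enough free memory."""
    return sum(node_free) >= request


def fits_on_single_node(node_free, request):
    """True if some single instance can satisfy the request alone."""
    return any(free >= request for free in node_free)


node_free = [4 * GB] * 4  # hypothetical: four instances with 4GB each
request = 8 * GB          # the process that wants to inflate to 8GB

print(fits_in_aggregate(node_free, request))
print(fits_on_single_node(node_free, request))
```

The aggregate check passes while the per-node check fails, which is the external-fragmentation-like effect: the memory exists, but not in any one place the process can use.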

    -- wli

