Re: [PATCH 0/3] bootmem2 III



On Tue, May 13, 2008 at 02:40:44PM +0200, Johannes Weiner wrote:
Hi,

Andi Kleen <andi@xxxxxxxxxxxxxx> writes:

Johannes Weiner wrote:

On Fri, May 09, 2008 at 05:17:13PM +0200, Johannes Weiner wrote:
here is bootmem2, a memory block-oriented boot time allocator.

Recent NUMA topologies broke the current bootmem's assumption that
memory nodes provide non-overlapping and contiguous ranges of pages.
I'm still not sure that's a really good rationale for bootmem2.
E.g. the non-contiguous nodes are really special cases, and there tends
to be enough memory at the beginning of each node for boot time use,
so for those systems it would be quite reasonable to only put the
contiguous starts of the nodes into bootmem.
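
To make the suggestion above concrete, here is a rough userspace sketch
of picking the contiguous start of a node from a sorted range list. It
is not kernel code: struct node_range, example_layout and
node_contiguous_start() are made-up names for illustration, and the
sketch assumes the ranges are sorted by start_pfn and do not overlap
each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one physical memory range. */
struct node_range {
	unsigned long start_pfn;	/* first page frame of the range */
	unsigned long end_pfn;		/* one past the last page frame */
	int nid;			/* node owning the range */
};

/* Example layout (assumed): node 1 sits inside the span of node 0. */
static const struct node_range example_layout[] = {
	{ 0,   256,  0 },
	{ 256, 512,  1 },
	{ 512, 1024, 0 },
};

/*
 * Find the first contiguous run of pages belonging to nid.
 * Ranges must be sorted by start_pfn. Returns false if the node has
 * no memory at all; otherwise stores the run in [*start, *end).
 */
static bool node_contiguous_start(const struct node_range *r, size_t n,
				  int nid, unsigned long *start,
				  unsigned long *end)
{
	bool found = false;

	assert(start && end);

	for (size_t i = 0; i < n; i++) {
		if (r[i].nid != nid)
			continue;
		if (!found) {
			/* Lowest range of this node opens the run. */
			*start = r[i].start_pfn;
			*end = r[i].end_pfn;
			found = true;
		} else if (r[i].start_pfn == *end) {
			/* Directly adjacent range: the run continues. */
			*end = r[i].end_pfn;
		} else {
			/* First gap or foreign range in between: stop. */
			break;
		}
	}
	return found;
}
```

With the example layout, only pages 0..255 of node 0 would be handed to
the boot allocator; the part of node 0 behind node 1 would be left for
later initialization.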

Hm, that would put the logic into arch-code. I have no strong opinion
about it.

In fact I suspect the current code will already work like that
implicitly. The aliasing is only a problem for the new "arbitrary node
free_bootmem", right?

And that alloc_bootmem_node() cannot guarantee node-locality, which is
the much worse part, I think.

That said the bootmem code has gotten a little crufty and a clean
rewrite might be a good idea.

I agree completely.

The trouble is just that bootmem is used in early boot and early boot is
very subtle and getting it working over all architectures could be a
challenge. Not wanting to discourage you, but it's not exactly the
easiest part of the kernel to hack on.

Bootmem seemed pretty self-contained to me, at least in the beginning.
The bad thing is that I can test only the most simple configuration with
it.

I was wondering yesterday if it would be feasible to enforce
contiguousness for nodes. So that arch-code does not create one pgdat
for each node but one for each contiguous block. I have not yet looked

That re-introduces the concept that a node is not a unit of numa locality,
but one of memory contiguity. The kernel pretty much assumes that a node
exhibits memory locality.

deeper into it, but I suspect that other mm code has similar problems
with nodes spanning other nodes.

One thing we do know is that we already have systems in the wild with
overlapping nodes. PowerPC systems sometimes exhibit this behaviour, the
ones I have seen have node 1 embedded within node 0. x86_64 also enables
this support. This necessitated checks when initially freeing memory
into the allocator to make sure it ended up freed into the right node.
For non-sparsemem configurations these systems have some wasted mem_map,
but otherwise it does work.

Check out NODES_SPAN_OTHER_NODES for the code to avoid misplacing
memory.
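
The kind of check described above can be sketched in plain userspace C
as follows. This only illustrates the idea and is not the kernel
implementation: struct mem_range, example_ranges, pfn_to_owner_nid()
and free_span_to_node() are hypothetical names, and the layout is an
assumed example with node 1 embedded within node 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one physical memory range. */
struct mem_range {
	unsigned long start_pfn;	/* first page frame of the range */
	unsigned long end_pfn;		/* one past the last page frame */
	int nid;			/* node owning the range */
};

/* Example layout (assumed): node 1 is embedded within node 0. */
static const struct mem_range example_ranges[] = {
	{ 0,   256,  0 },
	{ 256, 512,  1 },
	{ 512, 1024, 0 },
};

/* Return the node owning pfn, or -1 if pfn lies in no known range. */
static int pfn_to_owner_nid(const struct mem_range *r, size_t n,
			    unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
			return r[i].nid;
	return -1;
}

/*
 * Walk the span [start_pfn, end_pfn) of node nid and count the pages
 * that really belong to it. Pages owned by another node are skipped
 * instead of being freed into the wrong node.
 */
static unsigned long free_span_to_node(const struct mem_range *r, size_t n,
				       unsigned long start_pfn,
				       unsigned long end_pfn, int nid)
{
	unsigned long freed = 0;

	assert(start_pfn <= end_pfn);

	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++)
		if (pfn_to_owner_nid(r, n, pfn) == nid)
			freed++;	/* a real allocator would free the page here */
	return freed;
}
```

Walking the whole span of node 0 (pages 0..1023) this way skips the 256
pages that belong to node 1, so they are not credited to node 0.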

-apw


