Re: boot time node and memory limit options
From: Dave Hansen (haveblue_at_us.ibm.com)
Date: 03/17/04
- In reply to: Martin J. Bligh: "Re: boot time node and memory limit options"
- Next in thread: Jesse Barnes: "Re: boot time node and memory limit options"
- Reply: Jesse Barnes: "Re: boot time node and memory limit options"
To: "Martin J. Bligh" <mbligh@aracnet.com>
Date: Wed, 17 Mar 2004 09:09:45 -0800
On Wed, 2004-03-17 at 08:36, Martin J. Bligh wrote:
> > The patch I posted was arrived at after some people suggested an
> > architecture independent patch. My patch basically allocates memory
> > from the bootmem allocator before mem_init calls free_all_bootmem_core.
> > It's architecture independent. If the real goal is to limit physical
> > memory before the bootmem allocator is initialized, then my current
> > patch doesn't accomplish this.
>
> Don't we have the same arch dependent issue with the current mem= anyway?
> Can we come up with something where the arch code calls back into a generic
> function to derive limitations, and thereby at least get the parsing done
> in a common routine for consistency? There aren't *that* many NUMA arches
> to change anyway ...
The problem with doing it in generic code is that it has to happen
_after_ the memory layout is discovered. It's a mess to reconstruct all
of the necessary information about where holes stop and start, at least
from the current information that we store. Then, you have to go track
down any information that might have "leaked" into the arch code before
you parsed the mem=, which includes all of the {min,max}_{high,low}_pfn
variables. I prefer to just take care of it at its source where NUMA
information is read out of the hardware.
Every arch has its own way of describing its layout. Some use "chunks"
and others like ppc64 use LMB (logical memory blocks). If each arch were
willing to store its memory layout information in a generic way, then
we might have a shot at doing a generic mem= or a NUMA version.
I coded this up a few days ago to see if I could replace the x440 SRAT
chunks with it. I never got around to actually doing that part, but
something like this is what we need to do *layout* manipulation in an
architecture-agnostic way.
I started coding this before I thought *too* much about it. What I want
is a way to get rid of all of the crap that each architecture (and
subarch) uses to store its physical memory layout. On normal x86 we
have the e820 and the EFI tables and on Summit/x440, we have yet another
way to do it.
What I'd like to do is present a standard way for all of these
architectures to store the information that they need to record at boot
time, plus make something flexible enough that we can use it for stuff
at runtime when hotplug memory is involved.
The code I'd like to see go away from boot-time is anything that deals
with arch-specific structures like the e820, functions like
lmb_end_of_DRAM(), or any code that deals with zholes. I'd like to get
it to a point where we can do a mostly arch-independent mem=.
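To make the arch-independent mem= idea concrete, here is a minimal sketch of how a limit could be applied once every arch hands over its layout in one common form. It is not taken from the attached layout.c: struct mem_range and apply_mem_limit() are hypothetical names, the array is assumed to be sorted by address with no overlaps, and the limit is assumed to count bytes of RAM (holes excluded) rather than a highest physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic layout entry: one contiguous range of usable RAM. */
struct mem_range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* one past the last byte */
};

/*
 * Trim an address-sorted layout so that at most `limit` bytes of RAM
 * remain. Holes between ranges do not count against the limit.
 * Returns the number of ranges still in use.
 */
size_t apply_mem_limit(struct mem_range *r, size_t n, uint64_t limit)
{
	uint64_t seen = 0;	/* bytes of RAM accepted so far */
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t len;

		assert(r[i].start < r[i].end);
		len = r[i].end - r[i].start;
		if (seen + len >= limit) {
			/* This range reaches the limit: cut it short, drop the rest. */
			r[i].end = r[i].start + (limit - seen);
			return (r[i].end > r[i].start) ? i + 1 : i;
		}
		seen += len;
	}
	return n;	/* the limit is larger than all of RAM */
}
```

With ranges [0,100), [200,300) and [400,500) and a limit of 150, this keeps [0,100) and [200,250) and drops the third range. The arch code would only have to fill in the array; the parsing and the trimming could then live in one common place.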
So, here's a little bit of (now userspace) code that implements a very
simple way to track physical memory areas.
stuff that sucks:
- long type names/indiscriminate use of u64
- "section" is on my brain from CONFIG_NONLINEAR, probably don't want
  to use that name again
- Doesn't coalesce adjacent sections with identical attributes, only
  extends existing ones.
- could sort arrays instead of using lists for speed/space
- can leave "UNDEF" holes
- can't add new sections spanning 2 old ones
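For readers without the attachment, the rough shape of such a tracker, including the "extends but doesn't coalesce" behaviour from the list above, can be sketched as follows. This is an illustration only, not the code in layout.c: struct mem_section, enum sect_attr, layout_add() and layout_count() are all made-up names, a plain singly linked list stands in for list.h, and overlapping input is assumed not to occur.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical attribute values; the real attachment's names are not shown. */
enum sect_attr { SECT_RAM, SECT_RESERVED };

struct mem_section {
	uint64_t start, end;		/* covers [start, end) */
	enum sect_attr attr;
	struct mem_section *next;	/* list kept sorted by start */
};

/*
 * Record [start, end) with the given attribute. If an existing section
 * with the same attribute ends exactly at `start`, extend it; otherwise
 * insert a new node in address order. Neighbours are not merged after
 * an extension. Returns 0 on success, -1 if allocation fails.
 */
int layout_add(struct mem_section **head, uint64_t start, uint64_t end,
	       enum sect_attr attr)
{
	struct mem_section **pp, *s;

	assert(start < end);
	for (pp = head; (s = *pp) != NULL; pp = &s->next) {
		if (s->attr == attr && s->end == start) {
			s->end = end;	/* extend only, no coalescing */
			return 0;
		}
		if (s->start >= start)
			break;		/* new node goes before this one */
	}

	s = malloc(sizeof(*s));
	if (!s)
		return -1;
	s->start = start;
	s->end = end;
	s->attr = attr;
	s->next = *pp;
	*pp = s;
	return 0;
}

/* Count the sections currently on the list. */
size_t layout_count(const struct mem_section *s)
{
	size_t n = 0;

	for (; s; s = s->next)
		n++;
	return n;
}
```

Adding [0,100), then [200,300), then [100,200), all as RAM, leaves two sections, [0,200) and [200,300), even though they are now adjacent and identical in attribute. A coalescing pass would have to compare each extended section against its successor as well.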
-- dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- text/x-c attachment: layout.c__charset_ANSI_X3.4-1968
- text/x-c-header attachment: list.h__charset_ANSI_X3.4-1968