Re: Allocating kernel memory

From: Kasper Dupont (kasperd_at_daimi.au.dk)
Date: 05/11/04


Date: Tue, 11 May 2004 18:03:36 +0200

George Nelson wrote:
>
> Wrong. That may be what you intended but not what you said and I
> quote:
>
> "Your case sounds like you just wrote your code without
> understanding
> the design"

OK, so the choice of words was ambiguous.

>
> > And most of such applications belongs in user mode, not
> > in the kernel.
>
> True, but not necessarily all of them.

The evidence you have provided so far indicates a
user mode implementation would be the best choice.
Until you provide some more information, I will
have to assume that is the case.

>
> Should I take this to mean that I am required to know the detailed
> implementation of a piece of code before I can use it?

No, but then you will need to learn it along the
way, and be prepared for surprises.

> If so pardon my
> gross presumption in assuming that an understanding of the programming
> interfaces and the limitations of those interfaces is not sufficient
> to develop code. My code works perfectly well as it is but I could
> extract better performance with more memory since I maintain a cache
> of data to reduce the amount of real I/O I perform.

The kernel has a disk cache, which is not restricted
by the address space limitation. On a machine with 8GB
of physical RAM more than 7GB can be used by the disk
cache.

Your cache should work the same way: keep its data in
pages that need not be permanently mapped, so it can use
as much physical RAM as you want.

> The fact that I
> was unaware that there is an (artificial) limit on the amount of
> memory available within the kernel does not imply I do not understand
> the system, rather it shows that I did not see the need to know the
> design details of a kernel system in order to develop my code.

It clearly shows you did not understand all the kernel
design details relevant to your code.

>
> >
> > > I could have adopted an alternative
> > > approach of simply reserving a large amount of memory to my driver at
> > > boot time and handle the management of it myself completely.
> >
> > That wouldn't change anything. You could allocate some
> > physical memory that way, but it would still not be
> > mapped in your address space. Essentially this means
> > allocating high memory pages would work at least as
> > well as allocating the memory at boot. It is possible
> > to allocate high memory pages and map them into kernel
> > address space. But there is a limit to the number of
> > such pages that can be mapped at a time.
>
> So you are saying that the kernel does not map all of physical memory
> into its address space!

Yes. The kernel did map all physical memory back in the
days when physical memory was small enough to fit in
the address space. But memory has become larger, and
the address space remained unchanged.

> Without looking at the code details, I find
> this hard to believe.

With PAE you can have 64GB of physical memory, and
only 4GB of address space, which is to be shared between
kernel and user space. I find it hard to believe any
kernel for that architecture will map all physical
memory into the address space.
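
The mismatch is easy to see from the address widths. A small sketch of the arithmetic, assuming 36 physical address bits for PAE (which is what gives 64GB) and 32 linear address bits; the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits. */
static uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* How many times larger the physical space is than the linear space. */
static uint64_t phys_to_linear_ratio(unsigned phys_bits, unsigned linear_bits)
{
    assert(phys_bits >= linear_bits);
    return addressable_bytes(phys_bits) / addressable_bytes(linear_bits);
}
```

With 36 and 32 bits the ratio is 16, so even if the kernel had the whole 4GB to itself, it could map only one sixteenth of a fully populated machine at a time.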

>
> >
> > > I
> > > considered this was not a good approach and would have made the driver
> > > larger and more complex than needed so instead opted to use the memory
> > > management facilities available and hit upon this problem.
> >
> > The memory management present in the kernel has some
> > features that would allow you to access more memory.
> > Why don't you use them?
> >
>
> Such as? I am aware of kmalloc, get_free_pages and vmalloc. Since my
> main requirement is buffers for I/O I require contiguous physical
> address space which vmalloc does not provide. kmalloc does not provide
> large enough areas and that leaves get_free_pages. If there is
> something I have missed then I would be grateful to know what it is.

I'm not 100% sure which function you need to
use here, but I think it is get_free_pages. Clearly
kmalloc is not an option, as it allocates only from
the memory permanently mapped in kernel address
space.

get_free_pages takes a flags argument; some of the
flags select which zone to allocate from. On x86
Linux the normal sizes of the zones are:

zone 1 (ZONE_DMA)     - 16MB
zone 2 (ZONE_NORMAL)  - 880MB
zone 3 (ZONE_HIGHMEM) - the rest (at most 64640MB)

And kmalloc will use zone 2 unless explicitly told
to use zone 1.
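
The arithmetic behind those sizes can be sketched as follows, assuming zone 1 ends at 16MB and zone 2 ends at 896MB (16 + 880), and that PAE caps physical memory at 64GB. The struct and function names are hypothetical, chosen only for this illustration:

```c
#include <assert.h>

#define ZONE1_END_MB 16UL     /* end of zone 1 */
#define ZONE2_END_MB 896UL    /* end of zone 2: 16MB + 880MB (assumed default) */
#define PAE_MAX_MB   65536UL  /* 64GB, the PAE limit on physical memory */

struct zone_split {
    unsigned long zone1_mb;
    unsigned long zone2_mb;
    unsigned long zone3_mb;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Split a given amount of physical RAM (in MB) into the three zones. */
static struct zone_split split_zones(unsigned long ram_mb)
{
    struct zone_split z;

    assert(ram_mb <= PAE_MAX_MB);
    z.zone1_mb = min_ul(ram_mb, ZONE1_END_MB);
    z.zone2_mb = min_ul(ram_mb, ZONE2_END_MB) - z.zone1_mb;
    z.zone3_mb = ram_mb - z.zone1_mb - z.zone2_mb;
    return z;
}
```

For a full 64GB, split_zones(65536) gives 16, 880 and 64640, matching the list above. For a 512MB machine zone 3 comes out empty, so none of this matters there.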

If you allocate pages from zone 3, you may need to
map them into kernel address space. So it might be
easier to use alloc_pages directly. Then you can
use kmap and kunmap to map the page when you need it.

(I haven't tried it, I don't have root permissions
on any computer with more than 512MB of RAM)
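
A minimal sketch of that alloc_pages/kmap/kunmap pattern, equally untested on a real high memory machine. The cache_page_* helpers are hypothetical names, and GFP_HIGHUSER is just one plausible choice of flags. The block under #else contains user-space stand-ins, which are NOT the kernel's implementation and exist only so the helpers can be compiled and tried outside the kernel:

```c
#ifdef __KERNEL__
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>
#else
/* User-space stand-ins (hypothetical, NOT the kernel's implementation). */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096UL
#endif
#define GFP_HIGHUSER 0
struct page { void *mem; };
static struct page *alloc_page(int gfp)
{
    struct page *pg = malloc(sizeof(*pg));

    (void)gfp;
    if (pg && !(pg->mem = calloc(1, PAGE_SIZE))) {
        free(pg);
        pg = NULL;
    }
    return pg;
}
static void *kmap(struct page *pg) { return pg->mem; }
static void kunmap(struct page *pg) { (void)pg; }
static void __free_page(struct page *pg) { free(pg->mem); free(pg); }
#endif

/* Allocate one page that is allowed to come from high memory. */
static struct page *cache_page_alloc(void)
{
    return alloc_page(GFP_HIGHUSER);
}

/* Copy len bytes into the page at offset off.
 * Returns 0 on success, -1 if the range does not fit in one page. */
static int cache_page_write(struct page *pg, size_t off,
                            const void *src, size_t len)
{
    char *va;

    if (off > PAGE_SIZE || len > PAGE_SIZE - off)
        return -1;
    va = kmap(pg);              /* temporary mapping; may sleep in the kernel */
    memcpy(va + off, src, len);
    kunmap(pg);                 /* give the mapping slot back */
    return 0;
}

/* Copy len bytes out of the page at offset off; same return convention. */
static int cache_page_read(struct page *pg, size_t off,
                           void *dst, size_t len)
{
    char *va;

    if (off > PAGE_SIZE || len > PAGE_SIZE - off)
        return -1;
    va = kmap(pg);
    memcpy(dst, va + off, len);
    kunmap(pg);
    return 0;
}

/* Release a page obtained from cache_page_alloc(). */
static void cache_page_free(struct page *pg)
{
    __free_page(pg);
}
```

The point of the pattern is that the page is only mapped for the duration of each copy, so the number of cached pages is not bounded by the number of pages that can be mapped at a time. Since kmap may sleep, this would not be usable from interrupt context.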

>
> I may not understand the architecture in detail but I do know that
> each process can have a 4GB virtual address space (I checked this in
> the Intel architecture manual).

Yes, but those 4GB are split into user and kernel
space. (The kernel space is shared between all
processes).

> According to the book The Linux Kernel
> (admittedly covering 2.2 kernel and the only 2.4 changes mentioned is
> support for the PAE to handle up to 64GB of physical memory), 3GB of
> the user address space is accessible by user processes or the kernel
> while the remaining 1GB is accessible to the kernel only.

The 3GB mentioned there is the user space.
The 1GB is the kernel space. AFAIR high memory
support first appeared in mainstream kernels
with 2.4. In 2.2 you could not use more physical
memory than you could fit into kernel address
space. (But there were bigmem patches for 2.2)
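
The 3GB/1GB split can be written down as a simple check on a linear address. This is only a sketch, assuming the kernel part starts at the 3GB mark (0xC0000000); the macro and function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE 0xC0000000u  /* 3GB mark; assumed default x86 split */

/* Non-zero if the 32 bit linear address lies in the kernel part. */
static int is_kernel_address(uint32_t linear)
{
    return linear >= KERNEL_BASE;
}

/* Size of the user part of the 4GB linear address space. */
static uint32_t user_space_bytes(void)
{
    return KERNEL_BASE;
}

/* Size of the kernel part of the 4GB linear address space. */
static uint32_t kernel_space_bytes(void)
{
    return (uint32_t)(0x100000000ULL - KERNEL_BASE);
}
```

Everything the kernel wants to address directly, including any physical memory it keeps permanently mapped, has to fit in the range that kernel_space_bytes() describes.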

> There may be
> some performance benefits from the scheme that is used in Linux
> although these escape me for the moment.

If you want to use the same linear addresses for
both user and kernel space, you will need to
replace the page tables when switching between
user mode and kernel mode. Switching page tables
is expensive (Linux does all it can to avoid
those switches whenever they are not strictly
necessary). And kernel code not having access to
user space will make copying of data between
user and kernel space more expensive.

>
> I am well aware of the real world. I work in it all the time. Having
> just replaced the motherboard on my system (the old one was less than
> two years old) and had to ditch my processor, memory and soundcard
> with the graphics adaptor just escaping the bin, there appears to be
> little attempt in the H/W arena to address backward compatibility
> issues in the PC arena.

Tell me about it. I recently had to replace a motherboard.
But since it was impossible to find a new motherboard that
would fit with my case, power supply, CPU, RAM, GFX board,
all of those had to be replaced as well.

> As for software, I agree backward compatibility
> is a prime design criterion but the memory management features needed to
> support paging and a 4GB virtual address space in the x86 architecture
> have been there at least since the 386. Therefore the initial design
> was not hampered by earlier hardware constraints. The only changes
> have been the addition of a further 2 H/W segment registers but the
> basic memory management H/W is as it was when introduced.

The FS and GS registers were already introduced with the
386. The only new things since then have been support
for 4MB pages and the addition of PAE. And now AMD64 has
eliminated all the restrictions we have been discussing.

I haven't used any AMD64 system yet, but AFAIK it should
give old 32 bit programs a full 4GB of user space while
the kernel can get even more (assuming the kernel code is
64 bit). And it doesn't have the performance problems of
PAE.

Since it must have been necessary to redesign the page
tables and segment descriptors, I guess the design has
even been cleaned up a bit. But I don't know about that.

-- 
Kasper Dupont -- who spends too much time on usenet.
For sending spam use abuse@mk.lir.dk and kasperd@mk.lir.dk
I'd rather be a hammer than a nail.

