Re: Linux Memory / Process Management on x86

Stephan Berger wrote:

--- clip clip ---

An OS running on the x86 architecture in protected mode has to use
segmentation of memory and can use paging.
Linux (2.6.XX) bypasses segmentation by setting up 4 segments with the same
base and limit (0 - 4GB), so basically a logical address can be
translated to a linear address by "ignoring" everything above bit 31. The
Linux kernel "knows" for every "existing" process the value to put in the
CR3 register (Page Directory Base Register). The linear address is
translated to a physical address by the MMU by
1) looking for the correct page directory pointer (CR3 + Bit 31 and 30)
2) looking for the correct page directory entry (Pointer + Bit 29 to 21)
3) using the PDE to look up the correct page table entry (Bit 31 to 12
   of PDE entry + Bit 20 to 12 of linear address)
4) using the PTE to generate the physical address (basically the same way
   as in 3, with Bit 11 to 0 of linear address)
PDE and PTE leave some space (the LSBs) for management information, like
r/w, super user page, dirty bit and so on.
(Am I right so far? I hope so... ;)
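
The four steps above can be sketched in C as plain bit arithmetic. This is only an illustration, assuming PAE with 4 KiB pages (the 2/9/9/12 bit split described in the post); the names pae_indices, pae_split, pae_physical and pae_print are made up for this sketch, and the upper half of the 64-bit PAE entries is ignored, as in the description above.

```c
#include <stdint.h>
#include <stdio.h>

/* Field layout of a 32-bit linear address, assuming PAE and 4 KiB pages. */
struct pae_indices {
    unsigned pdpt;    /* bits 31..30: selects 1 of 4 page directory pointers */
    unsigned pd;      /* bits 29..21: selects 1 of 512 page directory entries */
    unsigned pt;      /* bits 20..12: selects 1 of 512 page table entries */
    unsigned offset;  /* bits 11..0:  byte offset inside the page */
};

/* Split a linear address into the indices used in steps 1) to 4). */
struct pae_indices pae_split(uint32_t linear)
{
    struct pae_indices ix;
    ix.pdpt   = (linear >> 30) & 0x3u;
    ix.pd     = (linear >> 21) & 0x1FFu;
    ix.pt     = (linear >> 12) & 0x1FFu;
    ix.offset = linear & 0xFFFu;
    return ix;
}

/* Step 4: combine the frame address held in a PTE with the page offset.
   Only bits 31..12 of the entry are used; the low bits hold the flags
   (r/w, user/supervisor, dirty and so on). */
uint32_t pae_physical(uint32_t pte, uint32_t linear)
{
    return (pte & 0xFFFFF000u) | (linear & 0xFFFu);
}

/* Print the split for one address. */
void pae_print(uint32_t linear)
{
    struct pae_indices ix = pae_split(linear);
    printf("0x%08lx -> pdpt=%u pd=%u pt=%u offset=0x%03x\n",
           (unsigned long)linear, ix.pdpt, ix.pd, ix.pt, ix.offset);
}
```

Calling pae_print() on an address of interest shows which directory and table slots the MMU would index for it.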


So that's where, as we say in Germany, "the dog is buried" (which might
not make any sense in English ;):

The Kernel/MMU can protect __pages__ by only allowing read access or super
user access or by marking them as not available. The MMU generates an
exception and the Kernel can "react" to that (swap in the page or send a
SIGSEGV signal to the process in case of a "wrong write" and so on). Other
processes are protected because each process has a unique CR3 value which
points to its page directory, thus it's impossible to access a physical
address that does not belong to the process (the values to calculate those
are simply not in the PDEs/PTEs). Sharing memory is easy, too: Put the
correct values in the PDEs/PTEs and different processes can access the same
library at the same location in physical memory.

'Des Pudels Kern - ja'.
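
The sharing part can be watched from user space. The sketch below is not how a shared library gets mapped (that is a file-backed mapping set up by the loader); it assumes an anonymous MAP_SHARED mapping instead, which is the shortest way to get two processes whose page tables point at the same physical page. The function name shared_page_roundtrip is invented for this example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one shared page, let a child process write `value` into it and
   return what the parent reads back afterwards (-1 on any error). */
int shared_page_roundtrip(unsigned char value)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int status;
    int result = -1;
    pid_t pid;

    if (page == MAP_FAILED)
        return -1;
    page[0] = 0;

    pid = fork();
    if (pid < 0) {
        munmap(page, page_size);
        return -1;
    }
    if (pid == 0) {
        /* Child: separate page directory, same physical page behind it. */
        page[0] = value;
        _exit(0);
    }

    /* Parent: wait for the child, then read through its own mapping. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = page[0];
    munmap(page, page_size);
    return result;
}
```

With MAP_PRIVATE instead of MAP_SHARED the child's write would go to a private copy (copy-on-write) and the parent would still read 0.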

OK, so here are my conclusions, based on what I have written above:

It is not possible to protect a _single_ byte in phys. memory by means of
the MMU, this can only be done by software (on x86!).
Linux - the kernel - doesn't protect a "single byte in memory" from being
written to if that byte is located in a page that is writeable.

The hardware cannot protect in smaller units than a page, as
the other limiting mechanism in segments is effectively wide open.

The Copy-On-Write mechanism actually could be misused to protect
smaller units than a page, but there are currently no data structures
in the kernel for it. I also doubt that there will be, due to the
high overhead associated with such mechanisms.

If I malloc(), say, one byte, I get an address "inside" the heap. Neither
the MMU, nor the Linux kernel, nor "malloc()" can/do stop me from
writing to (address + X) unless, by doing that, I generate a linear
address which "leads" to a "different" page which is marked read only. The
Linux kernel and the C Library (C++?) do not provide that strong protection
because (I guess) it's too expensive (speed, space). On the other hand,
"managed code" (Java, C#) can provide such strong protection because the
code runs "inside" a VM which implements the possibility to protect a
single byte (by creating extra management information for each process).

The C code will happily let you shoot yourself in the
foot if you properly request it. You're right about the heavy
overhead associated with it. It could be implemented with lighter
(but not zero) overhead using the segment hardware in the processor.
However, it would require heavy modifications to the binary utilities
(assembler, linker & co) and program loader.
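
The page granularity is easy to demonstrate. The sketch below assumes Linux with mmap() and mprotect(): it maps two pages, makes the second one inaccessible, and lets a child process write at a given offset so that the fault does not kill the caller. The names page_size_bytes and write_faults are invented for this example.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Page size of the running system. */
size_t page_size_bytes(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Map a writable page followed by an inaccessible one, then write one byte
   at `offset` from the start in a child process.
   Returns 1 if the child was killed by SIGSEGV, 0 if the write went
   through, -1 on any other outcome. */
int write_faults(size_t offset)
{
    size_t ps = page_size_bytes();
    unsigned char *base = mmap(NULL, 2 * ps, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int status;
    int result = -1;
    pid_t pid;

    if (base == MAP_FAILED)
        return -1;
    /* The second page becomes a guard page: any access raises a fault. */
    if (mprotect(base + ps, ps, PROT_NONE) != 0) {
        munmap(base, 2 * ps);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        volatile unsigned char *p = base;
        p[offset] = 0xAA;          /* faults only if it lands in page 2 */
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid) {
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            result = 1;
        else if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 0;
    }
    munmap(base, 2 * ps);
    return result;
}
```

If the first byte of the mapping is treated as a one-byte object, every write up to offset page_size_bytes() - 1 overruns it silently; only the write that crosses into the guard page is caught. Debugging allocators use the same trick by placing each object right at the end of a page, at the cost of at least one page per allocation.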

Sharing libraries: "call printf" is basically a "call <address>", only
that the compiler does not put "printf" somewhere in the executable, so
"printf" is not a symbolic name for an address inside the code segment of
the executable (unless the executable is linked statically). Instead the
"dynamic loader" "knows" where the code for printf is located, sets up a
PDE (in the Page Directory of the process) which points to a PTE (probably
not in the Page Table of the process) which points to the location of the
printf code and replaces "printf" in each "call printf" with the correct

Not so complicated - the linkage vector is in the data segment
of the process. The pages will be set up by the virtual memory
system at first access. Remember that the Linux memory management
uses demand paging. What is set up is the permission to access
the pages in the mapped shared library addresses, so the page
fault is a page request and not an illegal access.
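
Demand paging can be observed with mincore(), which reports the pages of a mapping that are currently resident. The sketch below assumes Linux and uses an anonymous mapping rather than a shared library, since that keeps the example self-contained; count_resident and demand_paging_demo are names invented here.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the resident pages in [addr, addr + npages * page_size). */
int count_resident(void *addr, size_t npages, size_t page_size)
{
    unsigned char *vec = malloc(npages);
    int count = 0;
    size_t i;

    if (vec == NULL)
        return -1;
    if (mincore(addr, npages * page_size, vec) != 0) {
        free(vec);
        return -1;
    }
    for (i = 0; i < npages; i++)
        count += vec[i] & 1;       /* bit 0 set: page is in memory */
    free(vec);
    return count;
}

/* Map `npages` pages, touch the first `ntouch` of them and report the
   resident page count before and after. Returns 0 on success. */
int demand_paging_demo(size_t npages, size_t ntouch, int *before, int *after)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *region;
    size_t i;

    if (ntouch > npages)
        return -1;
    region = mmap(NULL, npages * ps, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    *before = count_resident((void *)region, npages, ps);
    for (i = 0; i < ntouch; i++)
        region[i * ps] = 1;        /* first access: page fault, page set up */
    *after = count_resident((void *)region, npages, ps);

    munmap((void *)region, npages * ps);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical kernel the mapping starts with no resident pages and gains one per touched page, but the exact counts depend on the kernel configuration (transparent huge pages, for instance), so treat the numbers as indicative only.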

Process memory layout: Is it (generally) correct to assume that the
address space for a process is set up like this, when you are looking at
the Page Directory Table:

PDT:                                 linear address:
--------------------------------- <- XX1...1X...X (Bit 29 to 21 set)
PDEs to stack pages, r/w
PDEs marked as
"not valid", access
leads to exception
PDEs for heap, r/w
--------------------------------- <- XX0..01?1X..X
PDE for Code, Constants
--------------------------------- <- XX0..0X...X (Bit 29 to 21 not set)

Would it be possible, though, to change this? For example, put data and
stack on bottom and code in the middle, although this might be rather
stupid? I'd only need a compiler and libraries which would generate the
linear address according to the scheme I've mentioned, right? The
kernel/MMU doesn't care which pages I mark as r/w, read only or
executable, right?
Or wrong, the kernel enforces (somehow) a consistent layout for all

The kernel is prepared to run executables in the standard
layout. You'll need modifications to the process handling
code and the linker scripts to make any changes - but why?

You can check the process address layout by reading some of the
/proc/pid/maps files (replace pid by the process number).
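
A process can also inspect its own layout through /proc/self/maps. The sketch below assumes the usual Linux line format (start-end perms offset dev inode pathname); the helper name find_mapping is invented for this example, and lines longer than its buffer are simply skipped.

```c
#include <stdint.h>
#include <stdio.h>

/* Look up the mapping of the current process that contains `addr`.
   On success returns 0 and fills `perms` (e.g. "r-xp") and `path`
   (empty string for anonymous mappings). Returns -1 if nothing matches
   or /proc/self/maps cannot be read. */
int find_mapping(const void *addr, char perms[5], char *path, size_t path_len)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[1024];
    int found = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char p[5] = "";
        char name[512] = "";

        /* offset, device and inode are parsed but discarded */
        if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %511[^\n]",
                   &start, &end, p, name) < 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            snprintf(perms, 5, "%s", p);
            snprintf(path, path_len, "%s", name);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Passing the address of a local variable normally lands in an "rw" mapping (the stack), while the address of a function lands in an executable one, which matches the layout discussed above.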

Maybe you should read the book

Daniel P. Bovet, Marco Cesati, Understanding the Linux Kernel,
3rd ed, O'Reilly, ISBN 0-596-00565-2.



Tauno Voipio
tauno voipio (at) iki fi