Re: [00/17] Large Blocksize Support V3



On (25/04/07 23:37), Christoph Lameter didst pronounce:
On Wed, 25 Apr 2007, Eric W. Biederman wrote:

You are trying to couple something that has no business being coupled
as it reduces the system usability when you couple them.

What am I coupling? The approach solves a series of issues as far as I can
tell.

But that is due to the VM (at least Linus tree) having no defrag methods.
mm has Mel's antifrag methods and can do it.

This is fundamental. Fragmentation when you have multiple chunk sizes
cannot be solved without the ability to move things in memory,
whereas it doesn't exist when you only have a single chunk size.
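
To make the claim above concrete, here is a minimal sketch of a toy page
bitmap. It is not kernel code: the can_allocate() helper and the 16-page
layout are assumptions made up for illustration. It assumes a buddy-style
rule where a request of a given order needs an aligned run of 2**order
free pages.

```python
def can_allocate(free, order):
    """Return True if `free` (one bool per page) contains an aligned run
    of 2**order free pages, as a buddy-style allocator would require."""
    size = 1 << order
    return any(
        all(free[i:i + size])
        for i in range(0, len(free) - size + 1, size)
    )

# Hypothetical layout: 16 pages, every other page in use.
# Half of memory is free, but no two free pages are adjacent.
free = [i % 2 == 1 for i in range(16)]

free_pages = sum(free)             # number of free pages
single_ok = can_allocate(free, 0)  # single chunk size: one page
multi_ok = can_allocate(free, 1)   # larger chunk size: two pages
```

With only one chunk size any free page satisfies the request, so single_ok
is true. The two-page request fails even though half of memory is free,
unless something can move the in-use pages out of the way.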

We have that ability (although in a limited form) right now.


And grouping pages by mobility works best when the majority of memory is
used as page cache and other movable/reclaimable allocations, which it will be
for the majority of workloads that care about larger blocksizes. If a
failure case is found, the memory partitioning is there to give hard
guarantees until I figure out what went wrong.

Yes you get lots of small requests *because* we do not support defrag and
cannot do large contiguous allocations.

Lots of small requests are fundamental. If lots of small requests were
not fundamental we would get large scatter-gather requests.

That is a statement of faith in small requests? Small requests are
fundamental so we want them?

Ummm the other arches read 16k blocks of contiguous memory. That is not
supported on 4k platforms right now. I guess you move those to vmalloc
areas? Want to hack the filesystems for this?

Freak no. You teach the code how to have a block in multiple physical
pages.

This aint gonna work without something that stores the information about
how the pieces come together. Teach the code.... More hacks.

There are multiple scaling issues in the kernel. What you propose is to
add hack over hack into the VM to avoid having to deal with
defragmentation. That in turn will cause churn with hardware etc
etc.

No. I propose to avoid all designs that have the concept of
fragmentation.

There are such designs? You can limit fragmentation but not avoid it.


Indeed, it can't be eliminated unless all memory is movable, which it isn't.
That's why grouping pages by mobility keeps migratable+reclaimable memory
in one set of blocks and reclaimable (mainly slab) in a second set on the
knowledge that truly unmovable allocations are rare.
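
As a rough illustration of why grouping helps, the sketch below places
page allocations into fixed-size blocks either in arrival order or
grouped by a mobility label. It is a toy model, not the patchset's
implementation: BLOCK_PAGES, the "movable"/"unmovable" labels and both
helper functions are assumptions made up for this example.

```python
BLOCK_PAGES = 4  # hypothetical block size, in pages

def place(allocs, grouped):
    """Place one-page allocations into fixed-size blocks, in order.

    allocs:  list of mobility labels, one per allocated page.
    grouped: if True, each label fills its own blocks; if False,
             every page goes into the next free slot regardless of label.
    Returns a list of blocks, each a list of labels.
    """
    blocks = []
    open_block = {}  # key -> index of the block currently being filled
    for kind in allocs:
        key = kind if grouped else "any"
        idx = open_block.get(key)
        if idx is None or len(blocks[idx]) == BLOCK_PAGES:
            blocks.append([])
            idx = len(blocks) - 1
            open_block[key] = idx
        blocks[idx].append(kind)
    return blocks

def pinned_blocks(blocks):
    """Count blocks holding at least one unmovable page."""
    return sum("unmovable" in block for block in blocks)

# Hypothetical workload: one unmovable page per three movable pages.
allocs = ["movable", "movable", "movable", "unmovable"] * 4

mixed = pinned_blocks(place(allocs, grouped=False))
grouped = pinned_blocks(place(allocs, grouped=True))
```

In this toy workload the ungrouped placement leaves an unmovable page in
every one of the four blocks, while the grouped placement confines them
to a single block, so the other three can be emptied by moving or
reclaiming pages.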

Heuristic it might be, but I expect it'll work well in practice. This sort
of patchset will put the fragmentation avoidance under more pressure than I
was expecting so problems will be found sooner rather than later. It's also
worth bearing in mind that the high-order allocations looked for here are
in the order 3 or 4 level instead of the order-9 and order-10 allocations
that I normally test with and get reasonably high success rates for.
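
For scale, an order-n allocation is 2**n contiguous pages. A small
sketch of the arithmetic, assuming 4K base pages (the order_bytes()
helper is made up for this example):

```python
PAGE_SIZE = 4096  # assumption: 4K base pages

def order_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of an order-`order` allocation (2**order pages)."""
    return page_size << order

# Map each order mentioned above to its size in KiB.
sizes_kib = {order: order_bytes(order) // 1024 for order in (3, 4, 9, 10)}
```

Under that assumption order 3 and 4 come to 32K and 64K, while order 9
and order 10 come to 2M and 4M.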

Besides, we've seen that with the normal kernel that order-3 allocations
(e1000 jumbo frames) work longer than one would expect without fragmentation
avoidance and they are atomic allocations as well as everything else. With
fragmentation avoidance, we should be able to handle it although I'll admit
that jumbo frame allocations are nowhere near as long lived. If I'm wrong,
the allocation failure bug reports will roll in in a very obvious manner.

There is an argument for having struct page control more than 4K
of memory even when the hardware page size is 4K. But that is a
separate problem. And by only having one size we still don't
get fragmentation.

We have fragmentation because we cannot limit our allocation sizes to 4k.
The stack is already 8k and there are subsystems that need more (f.e.
for jumbo frames). Then there is the huge page subsystem that is used to
avoid the TLB pressure that comes with small pages.

I think we are doing our typical community thing of running away from the
problem and developing ways to explain why our traditional approaches are
right.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


