Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier
- From: Christoph Raisch <RAISCH@xxxxxxxxxx>
- Date: Fri, 15 Feb 2008 14:22:55 +0100
Dave Hansen <haveblue@xxxxxxxxxx> wrote on 14.02.2008 18:12:43:
On Thu, 2008-02-14 at 09:46 +0100, Christoph Raisch wrote:
Dave Hansen <haveblue@xxxxxxxxxx> wrote on 13.02.2008 18:05:00:
On Wed, 2008-02-13 at 16:17 +0100, Jan-Bernd Themann wrote:
Constraints imposed by HW / FW:
- eHEA has own MMU
- eHEA Memory Regions (MRs) are used by the eHEA MMU to translate virtual
addresses to absolute addresses (like DMA mapped memory on a PCI bus)
- The number of MRs is limited (not enough to have one MR per packet)
Are there enough to have one per 16MB section?
Unfortunately this won't work. This was one of our first ideas we tossed
out, but the number of MRs will not be sufficient.
Can you give a ballpark of how many there are to work with? 10? 100?
1000?
It depends on HMC configuration, but in worst case the upper limit is in
the 2 digits range.
But, I'm really not convinced that you can actually keep this map
yourselves. It's not as simple as you think. What happens if you get
on an LPAR with two sections, one 256MB@0x0 and another
16MB@0x1000000000000000. That's quite possible. I think your vmalloc'd
array will eat all of memory.
I'm glad you mention this part. There are many algorithms out there to
handle this problem,
hashes/trees/... all of these trade speed for smaller memory footprint.
We based the table decision on the existing implementations of the
architecture.
Do you see such a case coming along for the next generation POWER
systems?
Dude. It exists *TODAY*. Go take a machine, add tens of gigabytes of
memory to it. Then, remove all of the sections of memory in the middle.
You'll be left with a very sparse memory configuration that we *DO*
handle today in the core VM. We handle it quite well, actually.
The hypervisor does not shrink memory from the top down. It pulls
things out of the middle and shuffles things around. In fact, a NUMA
node's memory isn't even contiguous.
Your code will OOM the machine in this case. I consider the ehea driver
buggy in this regard.
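To put rough numbers on the sparse layout described in this thread (256MB at 0x0 plus 16MB at 0x1000000000000000), here is a minimal sketch in plain userspace C. It assumes 16MB sections, as mentioned earlier in the thread, and a flat table indexed by section number. The helper names and the 8-byte entry size used in the note below are assumptions for illustration only; they are not taken from the ehea driver.

```c
#include <stdint.h>

/* Assumption: 16MB sections, i.e. a shift of 24 bits. */
#define SECTION_SHIFT 24u

/* Hypothetical helper: number of the section that contains addr. */
uint64_t section_nr(uint64_t addr)
{
    return addr >> SECTION_SHIFT;
}

/*
 * Entries a flat table needs so that it can be indexed directly by
 * the section number of the highest populated section.
 */
uint64_t flat_table_entries(uint64_t top_section_start)
{
    return section_nr(top_section_start) + 1;
}

/* Size in bytes of such a flat table for a given entry size. */
uint64_t flat_table_bytes(uint64_t top_section_start, uint64_t entry_size)
{
    return flat_table_entries(top_section_start) * entry_size;
}
```

With the top section starting at 0x1000000000000000 and an assumed 8 bytes per entry, flat_table_bytes() comes to roughly 512 GiB, while only 17 sections (16 for the 256MB block plus 1) are actually populated.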
Your comment indicates that the upper limit for memory to be set on HMC
does not influence
the upper limit of the partition physical address space.
So our base assumption we discussed internally is wrong here.
(conclusion see below)
I would guess these drastic changes would also require changes in base
kernel.
No, we actually solved those a couple years ago.
Will you provide a generic mapping system with a contiguous virtual address
space like the ehea_bmap we can query? This would need to be a "stable" part
of the implementation, including translation functions from kernel to
nextgen_ehea_generic_bmap like virt_to_abs.
Yes, that's a real possibility, especially if some other users for it
come forward. We could definitely add something like that to the
generic code. But, you'll have to be convincing that what we have now
is insufficient.
Does this requirement:
"- MRs cover a contiguous virtual memory block (no holes)"
come from the hardware?
yes
Is that *EACH* MR? OR all MRs?
each
Where does EHEA_BUSMAP_START come from? Is that defined in the
hardware? Have you checked to ensure that no other users might want a
chunk of memory in that area?
EHEA_BUSMAP_START is a value which has to match between the wqe
virtual addresses and the MR used in them.
Fortunately there's a simple answer on that one. Each MR has its own address
space, so there's no need to check.
A HEA MR actually has exactly the same attributes as an Infiniband MR with
this hardware.
Send/receive processing is pretty much comparable to an Infiniband UD queue.
Can you query the existing MRs?
no
Not change them in place, but can you query their contents?
no
That's why we have SPARSEMEM_EXTREME and SPARSEMEM_VMEMMAP implemented
in the core, so that we can deal with these kinds of problems, once and
*NOT* in every single little driver out there.
Functions to use while building ehea_bmap + MRs:
- Use either the functions that are used by the memory hotplug system as
well, that means using the section defines + functions (section_nr_to_pfn,
pfn_valid)
Basically, you can't use anything related to sections outside of the
core code. You can use things like pfn_valid(), or you can create new
interfaces that are properly abstracted.
We picked sections instead of PFNs because this keeps the ehea_bmap in a
reasonable range on the existing systems.
But if you provide an abstract method handling exactly the problem we
mention, we'll be happy to use that and dump our private implementation.
One thing you can guarantee today is that things are contiguous up to
MAX_ORDER_NR_PAGES. That's a symbol that is unlikely to change and is
much more appropriate than using sparsemem. We could also give you a
nice new #define like MINIMUM_CONTIGUOUS_PAGES or something. I think
that's what you really want.
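As a rough illustration of working at MAX_ORDER_NR_PAGES granularity instead of section granularity, the following userspace sketch maps page frame numbers to chunks of guaranteed-contiguous pages. The EX_ constants and both function names are made up for this example; the real MAX_ORDER_NR_PAGES depends on the kernel configuration, so the value 1024 here is only an assumption.

```c
#include <stdint.h>

/* Illustrative values only; the kernel's values are config dependent. */
#define EX_MAX_ORDER           11u
#define EX_MAX_ORDER_NR_PAGES  (1ull << (EX_MAX_ORDER - 1))

/* Index of the contiguous chunk that contains the given pfn. */
uint64_t chunk_of_pfn(uint64_t pfn)
{
    return pfn / EX_MAX_ORDER_NR_PAGES;
}

/* Number of chunks touched by the pfn range [start_pfn, start_pfn + nr_pages). */
uint64_t chunks_in_range(uint64_t start_pfn, uint64_t nr_pages)
{
    if (nr_pages == 0)
        return 0;
    return chunk_of_pfn(start_pfn + nr_pages - 1) - chunk_of_pfn(start_pfn) + 1;
}
```

A driver-side map keyed on such chunks would only rely on contiguity inside one chunk and would make no assumption about what lies between chunks.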
That's definitely the right direction.
From this mail thread I would conclude:
Memory space can have holes, and drivers shouldn't make any assumption about
when, where and how.
A translation from kernel to ehea_bmap space should be fast and predictable
(ruling out hashes).
If a driver doesn't know anything else about the mapping structure,
the normal solution in kernel for this type of problem is a multi level
look up table
like pgd->pud->pmd->pte
This doesn't sound right to be implemented in a device driver.
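For readers unfamiliar with the multi level look up idea, here is a minimal two-level sketch in userspace C. It is not the ehea code and not a kernel interface: all names, the 10-bit split per level, and the use of malloc() are assumptions for illustration. It maps a sparse section number to the next free index in a contiguous space, allocating second-level tables only for populated regions, so lookup cost stays fixed (two array accesses).

```c
#include <stdint.h>
#include <stdlib.h>

#define L1_BITS 10u
#define L2_BITS 10u
#define L1_SIZE (1u << L1_BITS)
#define L2_SIZE (1u << L2_BITS)
#define BMAP_INVALID UINT64_MAX

struct bmap_l2 {
    uint64_t slot[L2_SIZE];          /* contiguous index or BMAP_INVALID */
};

struct bmap {
    struct bmap_l2 *dir[L1_SIZE];    /* NULL where nothing is populated */
    uint64_t next;                   /* next free contiguous index */
};

/* Register section nr; returns 0 on success, -1 if out of range or no memory. */
int bmap_add(struct bmap *m, uint64_t nr)
{
    uint64_t hi = nr >> L2_BITS;
    uint64_t lo = nr & (L2_SIZE - 1);

    if (hi >= L1_SIZE)
        return -1;
    if (!m->dir[hi]) {
        struct bmap_l2 *l2 = malloc(sizeof(*l2));
        if (!l2)
            return -1;
        for (unsigned int i = 0; i < L2_SIZE; i++)
            l2->slot[i] = BMAP_INVALID;
        m->dir[hi] = l2;
    }
    if (m->dir[hi]->slot[lo] == BMAP_INVALID)
        m->dir[hi]->slot[lo] = m->next++;
    return 0;
}

/* Translate section nr to its contiguous index, or BMAP_INVALID for a hole. */
uint64_t bmap_lookup(const struct bmap *m, uint64_t nr)
{
    uint64_t hi = nr >> L2_BITS;
    uint64_t lo = nr & (L2_SIZE - 1);

    if (hi >= L1_SIZE || !m->dir[hi])
        return BMAP_INVALID;
    return m->dir[hi]->slot[lo];
}
```

Two levels of 10 bits cover only 2^20 sections; an address space as sparse as the one discussed in this thread would need more levels, which is exactly why this looks like page table machinery rather than driver code.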
We didn't see from the existing code that such a mapping to a contiguous
space already exists.
Maybe we've missed it.
If the mapping is less random, the translation gets much simpler.
MAX_ORDER_NR_PAGES helps here, is there more like that?
Gruss / Regards
Christoph Raisch + Jan-Bernd Themann