Re: [PATCH 0/3]HTLB mapping for drivers (take 2)



On Wed, Aug 19, 2009 at 11:05 AM, Mel Gorman<mel@xxxxxxxxx> wrote:
On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
Hi,

It sounds like this patch set is working towards the same goal as my
MAP_HUGETLB set.  The only difference I see is that you allocate a huge
page at a time and (if I am understanding the patch) fault the page in
immediately, whereas MAP_HUGETLB only faults pages as needed.  Does the
MAP_HUGETLB patch set provide the functionality that you need, and if
not, what can be done to provide what you need?


Thanks a lot for being willing to help. I would much appreciate any
ideas on how HTLB mapping for drivers can be done.

It is best to describe the use case first, to make clear what needs
to be done.
The driver provides a mapping of device DMA buffers to user-level
applications.

Ok, so the buffer is in normal memory. When mmap() is called, the buffer
is already populated by data DMA'd from the device. That scenario rules out
calling mmap(MAP_ANONYMOUS|MAP_HUGETLB) because userspace has access to the
buffer before it is populated by data from the device.

However, it does not rule out mmap(MAP_ANONYMOUS|MAP_HUGETLB) when userspace
is responsible for populating a buffer for sending to a device. i.e. whether it
is suitable or not depends on when the buffer is populated and who is doing it.
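
For the case where userspace populates the buffer itself, the call could
look roughly like the sketch below. It assumes a kernel and libc that
expose MAP_HUGETLB, and it falls back to small pages when no huge pages
can be reserved. The function name map_tx_buffer() is made up for
illustration and is not part of any patch in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* ask glibc for MAP_ANONYMOUS and MAP_HUGETLB */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, preferring
 * huge pages. Returns the mapping, or NULL if both attempts fail.
 * *used_huge is set to 1 if the huge page mapping succeeded, else 0.
 */
void *map_tx_buffer(size_t len, int *used_huge)
{
    void *p = MAP_FAILED;

#ifdef MAP_HUGETLB
    /* First try: anonymous mapping backed by huge pages. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }

    /* Huge pages unavailable or flag unsupported: use small pages. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *used_huge = 0;
    return p == MAP_FAILED ? NULL : p;
}
```

The caller should pass a length that is a multiple of the huge page size
(2MB is a common value on x86-64, but that is an assumption here) and
release the buffer with munmap() using the same length.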

User-level applications process the data.
The device uses master DMA to send data to the user buffer. The buffer
size can be >1GB and performance is very important. (So huge page
mapping really makes sense.)


Ok, so the DMA may be faster because you have to do less scatter/gather
and can DMA in larger chunks, and reading from userspace may be faster
because there is less translation overhead. Right?
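
To put rough numbers on that, here is a small sketch that counts how
many pages cover a buffer. The 4KB and 2MB page sizes are assumptions
(typical for x86-64), and pages_needed() is a made-up name. In the worst
case each small page is its own scatter/gather entry and its own
translation, so the page count is an upper bound on both.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages of size `page_size` needed to cover `buf_size` bytes. */
uint64_t pages_needed(uint64_t buf_size, uint64_t page_size)
{
    assert(page_size > 0);
    return (buf_size + page_size - 1) / page_size;  /* round up */
}
```

For a 1GB buffer this gives 262144 pages at 4KB versus 512 pages at 2MB.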

In addition, we have to mention that:
1. It is hard for the user to tell how many huge pages need to be
   reserved by the driver.

I think you have this problem either way. If the buffer is allocated and
populated before mmap(), then the driver is going to have to guess how many
pages it needs. If the DMA occurs as a result of mmap(), it's easier because
you know the number of huge pages to be reserved at that point and you have
the option of falling back to small pages if necessary.

2. Devices add constraints on memory regions. For example, the memory
   may need to be contiguous within the physical address space. It is
   necessary to be able to specify special gfp flags.

The contiguity constraints are the same for huge pages. Do you mean there
are zone restrictions? If so, the hugetlbfs_file_setup() function could be
extended to specify a GFP mask that is used for the allocation of hugepages
and associated with the hugetlbfs inode. Right now, there is an
htlb_alloc_mask that is combined with some additional flags, so
htlb_alloc_mask would be the default mask unless otherwise specified.

3. The HW needs to access physical memory before the user-level
   software can access it. (Hugetlbfs picks up pages from the pool on
   page fault.)
   This means memory allocation needs to be driven by the device driver.


How about:

       o Extend Eric's helper slightly to take a GFP mask that is
         associated with the inode and used for allocations from
         outside the hugepage pool
       o A helper that returns the page at a given offset within
         a hugetlbfs file for population before the page has been
         faulted.

I know this is a bit hand-wavy, but it would allow significant sharing
of the existing code and remove much of the hugetlbfs-awareness from
your current driver.
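
The lookup behind the second helper above comes down to turning a file
offset into a huge page index plus an offset inside that page. A
userspace sketch of just that arithmetic follows. The names hpage_pos
and locate_hpage() are invented for illustration and are not existing
kernel interfaces. The huge page size is passed in rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a byte within a file that is backed by huge pages. */
struct hpage_pos {
    uint64_t index;   /* which huge page of the file holds the byte */
    uint64_t offset;  /* byte offset inside that huge page */
};

/* Split `file_offset` into a huge page index and an in-page offset. */
struct hpage_pos locate_hpage(uint64_t file_offset, uint64_t hpage_size)
{
    struct hpage_pos pos;

    assert(hpage_size > 0);
    pos.index = file_offset / hpage_size;
    pos.offset = file_offset % hpage_size;
    return pos;
}
```

A real helper would then use the index to find or allocate the page for
the file, so the driver can populate it before any fault occurs.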

The original idea was: create a hugetlbfs file which has a common
mapping with the device file. Allocate memory. Populate the page cache
of the hugetlbfs file with the allocated pages.
When a fault occurs, the page will be taken from the page cache and then
remapped to user space by hugetlbfs.

Another possible approach is described here:
http://marc.info/?l=linux-mm&m=125065257431410&w=2
But I am currently not sure whether it will work.


Thanks,
Alexey


--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


Alexey,

I'd be willing to take a stab at a prototype of Mel's suggestion based
on my patch set if you think it would be useful to you.

Eric


