Re: [ofa-general] Re: [GIT PULL] please pull ummunotify



On Mon, Sep 28, 2009 at 10:49:23PM +0200, Pavel Machek wrote:

> > > I don't remember seeing discussion of this on lkml. Yes it is in
> > > -next...
> >
> > eg http://lkml.org/lkml/2009/7/31/197 and followups, or search for v2
> > and earlier patches.
>
> Well... it seems little overspecialized. Just modifying libc to
> provide hooks you want looks like better solution.

That is what MPI people are doing today and their feedback is that it
doesn't work - there are a lot of ways to mess with memory and no good
choices to hook the raw syscalls and keep sensible performance.

The main focus of this is high performance MPI apps, so lower overhead
on critical paths like memory allocation is part of the point. It is
meant to go hand-in-hand with the specialized RDMA memory pinning
interfaces.

> > > Basically it allows app to 'trace itself'? ...with interesting mmap()
> > > interface, exporting int to userspace, hoping it behaves atomically...?
> >
> > Yes, it allows app to trace what the kernel does to memory mappings. I
> > don't believe there's any real issue to atomicity of mmap'ed memory,
> > since userspace really just tests whether read value is == to old read
> > value or not.
>
> That still needs memory barriers etc.. to ensure reliable operation,
> no?

No, I don't think so..

The application is expected to provide sequencing of some sort between
the memory call (mmap/munmap/brk/etc) and the int check - usually just
by running in the same thread, or through some kind of locking scheme.

As long as the mmu notifiers run immediately in the same context as
the mmap/etc then it should be fine.

For example, the most common problem to solve looks like this:

x = mmap(...)
do RDMA with x
[..]
munmap(x);

[..]
y = mmap(..);
do RDMA with y
if by chance x == y things explode.

So this API puts the int test directly before 'do RDMA with'.
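
To make that concrete, here is a rough sketch of what the int test
could look like in a userspace registration cache. It is only an
illustration: mapping_events is an ordinary variable standing in for
the counter the kernel would export through mmap(), and
fake_unmap_event(), struct reg_cache_entry, reg_cache_fill() and
reg_cache_usable() are hypothetical names invented for this example,
not part of ummunotify or any RDMA library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel-exported counter (assumption: in this sketch it
 * is a plain variable, bumped by fake_unmap_event() to simulate a change
 * to the process's memory mappings). */
static volatile uint64_t mapping_events;

/* Hypothetical cached RDMA registration for one address range. */
struct reg_cache_entry {
    void    *addr;
    size_t   len;
    uint64_t seen_events;  /* counter value when the entry was filled */
    int      valid;
};

/* Simulate the kernel reporting that some mapping went away. */
static void fake_unmap_event(void)
{
    mapping_events++;
}

/* Record a registration together with the counter value seen at that time. */
static void reg_cache_fill(struct reg_cache_entry *e, void *addr, size_t len)
{
    assert(e != NULL);
    e->addr = addr;
    e->len = len;
    e->seen_events = mapping_events;
    e->valid = 1;
}

/* The "int test", run directly before doing RDMA with addr.
 * Returns 1 if the cached registration may be reused, 0 if the caller
 * has to register the range again. */
static int reg_cache_usable(struct reg_cache_entry *e, void *addr, size_t len)
{
    if (!e->valid || e->addr != addr || e->len != len)
        return 0;
    if (e->seen_events != mapping_events) {
        /* Mappings changed since the entry was filled. The same address
         * may now refer to different memory, so drop the entry. */
        e->valid = 0;
        return 0;
    }
    return 1;
}
```

In the x == y case above, the munmap(x) would bump the counter, so the
check made before 'do RDMA with y' fails and y gets registered afresh
instead of reusing the stale entry for x. This sketch drops the entry
on any change at all; a real user would presumably find out which
ranges were affected and invalidate only those.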

Due to the above kind of argument the net requirement is either to
completely synchronously (and with low overhead) hook every
mmap/munmap/brk/etc call into the kernel and do the accounting work,
or have a very low overhead check every time the memory region is
about to be used.

Jason


