Re: nonlinear swapping w/o pte_chains [Re: VMA_MERGING_FIXUP and patch]

From: Andrea Arcangeli (andrea_at_suse.de)
Date: 03/24/04

    Date:	Wed, 24 Mar 2004 15:37:29 +0100
    To: Hugh Dickins <hugh@veritas.com>
    
    

    On Wed, Mar 24, 2004 at 10:12:58AM +0000, Hugh Dickins wrote:
    > On Tue, 23 Mar 2004, Andrea Arcangeli wrote:
    > >
    > > I don't think I can use the tlb gather because I have to set the pte back
    > > immediately, or can I? The IPI flood and huge pagetable walk with total
    > > destruction of the address space with huge mappings will be very bad in
    > > terms of usability during swapping of huge nonlinear vmas, but hey, if
    > > you want to swap smoothly, you should use the vmas.
    >
    > Thanks a lot for the preview (or would have been a preview if I'd been
    > awake - and now I've found it easiest to look at 2.6.5-rc1 patched with
    > the 2.6.5-rc1-aa2 objrmap and anon_vma you pointed Martin to in other
    > mail, which includes your latest fixes).
    >
    > I think you're being too harsh on the nonlinear vmas! I know you're
    > not keen on them, but punishing them this hard! If I read it right,
    > page_referenced will never (unless PageReferenced, or mapped into
    > a nonlinear also) report a page from a nonlinear vma as referenced
    > (I do agree with that part). So they'll soon reach try_to_unmap,
    > and each one which gets there will cause every page in every nonlinear
    > vma of that inode to be unmapped from the nonlinears right then?
    > Yes, that'll teach 'em to use sys_remap_file_pages without VM_LOCKED.

    Yep ;)

    > For mine I'll try to carry on with the less draconian approach I
    > started yesterday, scanning just a range each time (rather 2.4 style).

    That will DoS real life; that's why I had to be draconian. After you've
    finished I'll send a testcase to try, a real-life testcase, not an
    exploit. The only way to keep the complexity of a pagetable scan under
    control is to do what 2.4 does: drop all the ptes we find in our way,
    so the vm stops calling try_to_unmap. We must avoid walking the vma
    more than once to swap it out. This will cause a flood of minor faults,
    but that's ok: it doesn't need to be fast at swapping.
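    A minimal user-space sketch of the single-walk idea, assuming a
    nonlinear vma can be modeled as an array of slots, each holding the file
    page offset mapped there (or nothing). Since remap_file_pages can put
    any offset in any slot, finding the page reclaim asked for needs a full
    walk, so the sketch drops every pte it meets during that walk.
    `struct sim_nonlinear_vma`, `SIM_NO_PAGE` and `sim_unmap_nonlinear_all()`
    are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for an empty pte in this model. */
#define SIM_NO_PAGE (-1L)

/* Hypothetical model of a nonlinear vma (not the kernel's struct):
 * slot_pgoff[i] is the file page offset mapped at virtual page i,
 * or SIM_NO_PAGE if that pte is empty. Offsets need not be ordered. */
struct sim_nonlinear_vma {
    long *slot_pgoff;
    size_t nr_slots;
};

/* Draconian 2.4-style unmap: walk the whole vma once and drop every
 * present pte, not just the ones mapping target_pgoff. Afterwards the
 * vma is empty, so reclaim has no reason to walk it again; the cost is
 * a minor fault per page when it is next touched.
 * Sets *found_target if target_pgoff was mapped; returns ptes dropped. */
static size_t sim_unmap_nonlinear_all(struct sim_nonlinear_vma *vma,
                                      long target_pgoff, bool *found_target)
{
    size_t dropped = 0;

    assert(vma != NULL && found_target != NULL);
    *found_target = false;
    for (size_t i = 0; i < vma->nr_slots; i++) {
        if (vma->slot_pgoff[i] == SIM_NO_PAGE)
            continue;                       /* pte already empty */
        if (vma->slot_pgoff[i] == target_pgoff)
            *found_target = true;           /* the page reclaim wanted */
        vma->slot_pgoff[i] = SIM_NO_PAGE;   /* drop it anyway: no second walk */
        dropped++;
    }
    return dropped;
}
```

    A second call on the same vma finds nothing to drop, which is the point:
    the walk cost is paid once per swap-out, not once per page.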

    > At the very least, I think your unmap (and mine) needs to
    > ptep_test_and_clear_young just before unmap_pte_page, and back out if
    > the page is young (referenced). I was going to recommend that anyway:
    > at last got around to considering that issue of whether the failed
    > trylocks should report referenced or not (return 1 or 0). Looking at
    > how shrink_list goes, even before 2.6.5-rc1, I'd expect it to behave
    > better your way (proceed to try_to_unmap, which will rightly say
    > SWAP_AGAIN if it fails the same trylock) than how it was before in
    > objrmap; but that will behave better with a ptep_test_and_clear_young
    > check first too.
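    The check suggested above (test-and-clear the young bit just before
    unmapping, and back out if it was set) can be modeled in user space as
    below. `struct sim_pte`, `sim_test_and_clear_young()` and
    `sim_try_unmap_pte()` are hypothetical stand-ins; the kernel's
    ptep_test_and_clear_young() works on real page tables and is not
    reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pte (not the kernel's pte_t). */
struct sim_pte {
    bool present;   /* page is mapped at this slot */
    bool young;     /* accessed bit, set when the page is touched */
};

/* Test-and-clear semantics: return the old young bit, then clear it. */
static bool sim_test_and_clear_young(struct sim_pte *pte)
{
    bool was_young = pte->young;

    pte->young = false;
    return was_young;
}

/* Try to unmap one pte. Returns true if it was unmapped, false if it
 * was not mapped or we backed out because the page was referenced
 * since the last scan (its young bit is cleared for next time). */
static bool sim_try_unmap_pte(struct sim_pte *pte)
{
    assert(pte != NULL);
    if (!pte->present)
        return false;
    if (sim_test_and_clear_young(pte))
        return false;           /* referenced: leave it mapped this round */
    pte->present = false;       /* old: drop the mapping */
    return true;
}
```

    A recently used page thus survives one pass and is only unmapped if it
    is still unreferenced on the next one.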

    cute, I agree we should recheck the young bit inside.

    > Sorry to see the #if VMA_MERGING_FIXUPs are still there. I've a
    > growing feeling that it won't make enough difference when they're
    > gone. But maybe you have a cunning plan to merge all the anon_vmas
    > which would result from an mmap next page, write data in, mprotect ro,
    > mmap next page, write data in, mprotect ro, ..... workload.

    The problem is that mprotect (and mremap) merging is low priority
    compared to nonlinear==mlock and the i_mmap{shared} complexity, so I'll
    address it only after I have scalable swapping for huge i_mmap{shared}
    lists too, which is a prerequisite for merging; mprotect merging doesn't
    sound like a prerequisite, though I certainly agree we should fix it
    soon (and once we fix it, it'll work for files too, something that has
    never worked to date; I feel it'll be as important for files as it has
    been so far for anon ram, and nobody has complained yet that it's not
    enabled for files ;).

