Re: swapping and the value of /proc/sys/vm/swappiness

From: Marcelo Tosatti (marcelo.tosatti_at_cyclades.com)
Date: 09/16/04

  • Next message: Pavel Machek: "swsusp: solving build issue leads to crash on x86-64"
    Date:	Thu, 16 Sep 2004 15:50:19 -0300
    To: Con Kolivas <kernel@kolivas.org>
    
    

    Con!

    Spent some time reading your patch...

    On Wed, Sep 15, 2004 at 10:22:46AM +1000, Con Kolivas wrote:

    > >>I already answered this. That hard swappiness patch does not really
    > >>rewrite swapping policy. It identifies exactly what has changed because
    > >>it does not count "distress in the swap tendency". Therefore if the
    > >>swappiness value is the same, the mapped ratio is the same (in the
    > >>workload) yet the vm is swappinig more, it is getting into more
    > >>"distress". The mapped ratio is the same but the "distress" is for some
    > >>reason much higher in later kernels, meaning the priority of our
    > >>scanning is getting more and more intense. This should help direct your
    > >>searches.

    Well if "distress" is getting higher (with similar workload/pressure)
    thats because VM is having a harder time freeing pages (priority increases,
    distress increases).

    You say "distress is getting higher in later kernels". Can you expand
    more on that? How did you find this out, and can you be more especific
    wrt "later kernels".

    > >>These are the relevant lines of code _from mainline_:
    > >>
    > >>distress = 100 >> zone->prev_priority
    > >>mapped_ratio = (sc->nr_mapped * 100) / total_memory;
    > >>swap_tendency = mapped_ratio / 2 + distress + vm_swappiness
    > >>if (swap_tendency >= 100)
    > >>- reclaim_mapped = 1;
    > >>
    > >>
    > >>That hard swappiness patch effectively made "distress == 0" always.
    > >
    > >So isnt it true that decreasing vm_swappiness should compensate
    > >distress and have the same effect of your patch?
    >
    > Nope. We swap large amounts with the wrong workload at swappiness==0
    > where we wouldn't before at swappiness==60. ie there is no workaround
    > possible without changing the code in some way.

    "we wouldn't before" refering to older kernel versions?

    I see you add a "z->nr_unmapped" watermark a bit above "z->pages_high",
    and use that to set "pgdat->mapped_nrpages" to what needs to be freed
    so z->free_pages reaches "z->nr_unmapped".

    And then you use that per-pgdat "mapped_nrpages" count to avoid:

    - moving mapped pages to inactive list (wasting the swappiness algorithm)
    - swapping out pages at shrink_list

    Those two only happen when pgdat->mapped_nrpages is zero, which
    becomes true when we go below pages_low.

    To resume, deactivation/swapout of mapped pages only happens when we
    go any zone pages_low.

    Correct?

    Now with v2.6 stock kernel, kswapd will deactivate (using vm_swappiness algorithm)
    and swapout pages between the low and high zone watermarks.

    That avoids swapping out as hard as possible until we go below pages_low.

    IMHO this might be OK for common desktop workloads where people complain
    about swap, but might be harmful for other workloads where swapping out on
    advance unused anonymous process memory is a _gain_.

    I dont understand this check on balance_pgdat (kswapd worker function):

    + /*
    + * kswapd does a light balance_pgdat() when there is less than 1/3
    + * ram free provided there is less than vm_mapped % of that ram
    + * mapped.
    + */
    + if (maplimit && sc.nr_mapped * 100 / total_memory > vm_mapped)
    + return 0;
    +

    So "if not any zone is under pages_low, and more than vm_mapped % of ram
    is mapped, bail out."

    I dont get what you're trying to achieve with this.

    >
    > >To be fair I'm just arguing, haven't really looked at the code.
    >
    > Thats cool ;)

    I still think swapout behaviour can be correctly tuned with vm_swappiness,
    and agree with Andrew on that we should not change anything in the algorithm
    if this can be tuned.

    Andrew, maybe decrease vm_swappiness to 50 on the next -mm for a test?

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Pavel Machek: "swsusp: solving build issue leads to crash on x86-64"
    Loading