Re: [discuss] Re: 32-bit dma allocations on 64-bit platforms

From: Andrea Arcangeli (andrea_at_suse.de)
Date: 06/24/04

    Date:	Thu, 24 Jun 2004 19:39:27 +0200
    To: Nick Piggin <nickpiggin@yahoo.com.au>
    
    

    On Fri, Jun 25, 2004 at 01:48:47AM +1000, Nick Piggin wrote:
    > 2.6 has the "incremental min" thing. What is wrong with that?
    > Though I think it is turned off by default.

    I looked more into it and you can leave it turned off since it's not
    going to work.

    it's all a function of z->pages_*, and those are _global_ for all the
    zones, so in turn they're absolutely meaningless.

    the algorithm has nothing in common with lowmem_reserve_ratio. The
    effect has a tiny bit of similarity, but the incremental min thing is so
    weak and so bad that it will either not help or waste tons of
    memory. Furthermore, you cannot set a sysctl value that works for all
    machines. The whole thing should be dropped and replaced with the fine
    production-quality lowmem_reserve_ratio in 2.4.26+.

    (the only broken thing about lowmem_reserve_ratio is that it cannot be
    tuned, not even at boot time; a recompile is needed. That's fixable:
    it could be tuned at boot time, and in theory at runtime too, but the
    point is that no dynamic tuning is required with it)

    Please focus on this code of 2.4:

            /*
             * We don't know if the memory that we're going to allocate will
             * be freeable or/and it will be released eventually, so to
             * avoid totally wasting several GB of ram we must reserve some
             * of the lower zone memory (otherwise we risk to run OOM on the
             * lower zones despite there's tons of freeable ram on the
             * higher zones).
             */
            zone_watermarks_t watermarks[MAX_NR_ZONES];

    typedef struct zone_watermarks_s {
            unsigned long min, low, high;
    } zone_watermarks_t;

            class_idx = zone_idx(classzone);

            for (;;) {
                    zone_t *z = *(zone++);
                    if (!z)
                            break;

                    if (zone_free_pages(z, order) > z->watermarks[class_idx].low) {
                            page = rmqueue(z, order);
                            if (page)
                                    return page;
                    }
            }

                    zone->watermarks[j].min = mask;
                    zone->watermarks[j].low = mask*2;
                    zone->watermarks[j].high = mask*3;
                    /* now set the watermarks of the lower zones in the "j" classzone */
                    for (idx = j-1; idx >= 0; idx--) {
                            zone_t * lower_zone = pgdat->node_zones + idx;
                            unsigned long lower_zone_reserve;
                            if (!lower_zone->size)
                                    continue;

                            mask = lower_zone->watermarks[idx].min;
                            lower_zone->watermarks[j].min = mask;
                            lower_zone->watermarks[j].low = mask*2;
                            lower_zone->watermarks[j].high = mask*3;

                            /* now the brainer part */
                            lower_zone_reserve = realsize / lower_zone_reserve_ratio[idx];
                            lower_zone->watermarks[j].min += lower_zone_reserve;
                            lower_zone->watermarks[j].low += lower_zone_reserve;
                            lower_zone->watermarks[j].high += lower_zone_reserve;

                            realsize += lower_zone->realsize;
                    }

    The 2.6 algorithm controlled by the sysctl does nothing similar to the
    above.
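
    To make the arithmetic of the 2.4 code concrete, here is a minimal
    userspace sketch of the same per-classzone watermark setup and the
    allocation check. It is not the kernel code: struct zone, base_min,
    setup_watermarks(), zone_allows() and print_watermarks() are
    hypothetical names for illustration, the ratios { 256, 32 } are assumed
    values, and each zone's own "min" is passed in directly instead of being
    derived the way the kernel derives it.

```c
#include <assert.h>
#include <stdio.h>

#define NR_ZONES 3 /* 0 = DMA, 1 = NORMAL, 2 = HIGHMEM */

struct watermarks {
    unsigned long min, low, high;
};

/* Simplified stand-in for zone_t; only the fields this sketch needs. */
struct zone {
    const char *name;
    unsigned long realsize;         /* usable pages in the zone */
    unsigned long base_min;         /* hypothetical: the zone's own min */
    struct watermarks wm[NR_ZONES]; /* indexed by the allocation's classzone */
};

/* Assumed ratios for the DMA and NORMAL zones; placeholders, not gospel. */
static const unsigned long reserve_ratio[NR_ZONES - 1] = { 256, 32 };

/* Fill wm[j] of zone j and of every zone below it, for each classzone j. */
void setup_watermarks(struct zone *zones)
{
    for (int j = 0; j < NR_ZONES; j++) {
        unsigned long mask = zones[j].base_min;
        unsigned long realsize = zones[j].realsize;

        zones[j].wm[j].min = mask;
        zones[j].wm[j].low = mask * 2;
        zones[j].wm[j].high = mask * 3;

        /* Lower zones keep a reserve proportional to the memory above them. */
        for (int idx = j - 1; idx >= 0; idx--) {
            struct zone *lower = &zones[idx];
            unsigned long reserve;

            if (!lower->realsize)
                continue;

            assert(reserve_ratio[idx] != 0);
            mask = lower->wm[idx].min;
            reserve = realsize / reserve_ratio[idx];
            lower->wm[j].min = mask + reserve;
            lower->wm[j].low = mask * 2 + reserve;
            lower->wm[j].high = mask * 3 + reserve;

            realsize += lower->realsize;
        }
    }
}

/* Same test as the allocation loop: free pages must exceed the "low"
 * watermark that this zone keeps for the allocation's classzone. */
int zone_allows(const struct zone *z, unsigned long free_pages, int class_idx)
{
    return free_pages > z->wm[class_idx].low;
}

void print_watermarks(const struct zone *zones)
{
    for (int i = 0; i < NR_ZONES; i++)
        for (int j = i; j < NR_ZONES; j++)
            printf("%s for class %d: min=%lu low=%lu high=%lu\n",
                   zones[i].name, j, zones[i].wm[j].min,
                   zones[i].wm[j].low, zones[i].wm[j].high);
}
```

    With made-up sizes of 4096, 225280 and 786432 pages and own mins of 32,
    255 and 255, the DMA zone's min for a HIGHMEM-class allocation comes out
    as 32 + (786432 + 225280) / 256 = 3984 pages, while for a DMA-class
    allocation it stays at 32. The reserve scales with the amount of memory
    above the zone, which is why no per-machine tuning is needed.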

