Re: hash table sizes

From: Martin J. Bligh (mbligh_at_aracnet.com)
Date: 11/26/03

  • Next message: Martin J. Bligh: "Re: 2.6.0-test10-mm1"
    Date:	Wed, 26 Nov 2003 08:17:41 -0800
    To: William Lee Irwin III <wli@holomorphy.com>
    
    

    >> However, I'm curious as to why this crashes X, as I don't see how this
    >> code change makes a difference in practice. I didn't think we had any i386
    >> NUMA with memory holes between nodes at the moment, though perhaps the x440
    >> does.
    >> M.
    >> PS. No, I haven't tested my rephrasing of your patch either.
    >
    > mmap() of framebuffers. It takes the box out, not just X. There are
    > holes just below 4GB regardless. This has actually been reported by
    > rml and some others.
    >
    > False positives on pfn_valid() result in manipulations of purported page
    > structures beyond the bounds of actual allocated pgdat->node_mem_map[]'s,
    > potentially either corrupting memory or accessing areas outside memory's
    > limits (the case causing oopsen).

    OK. But the hole from 3.75 - 4GB you're referring to doesn't seem to
    fall under that definition.

    1) It still has a valid pfn, though the backing memory itself isn't there.
    2) It's covered by node_start_pfn ... node_start_pfn + node_spanned_pages,
            which is what your patch tests for. Maybe not if you have = 4GB
            per node? Will be if you have more than that.

    I agree that pfn_valid absolutely has to be correct. The current definition
    was, I thought, correct unless we have holes *between* nodes. I was under
    the impression that no box we had uses that setup, but I guess we might
    do - were you seeing this on x440 or NUMA-Q?

    NUMA-Q looks like this:

    BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
    BIOS-e820: 0000000000100000 - 00000000e0000000 (usable)
    BIOS-e820: 00000000fec00000 - 00000000fec09000 (reserved)
    BIOS-e820: 00000000ffe80000 - 0000000100000000 (reserved)
    BIOS-e820: 0000000100000000 - 0000000400000000 (usable)

    which should create mem_map from 0 - 4GB contigious for node 0, AFAICS
    (I believe struct pages are created for reserved areas still). Maybe
    the srat stuff for x440 does something else.

    Your patch is correct, I just don't see that it'll fix the X problem.

    > diff -prauN linux-2.6.0-test10/include/asm-i386/mmzone.h pfn_valid-2.6.0-test10/include/asm-i386/mmzone.h
    > --- linux-2.6.0-test10/include/asm-i386/mmzone.h 2003-11-23 17:31:56.000000000 -0800
    > +++ pfn_valid-2.6.0-test10/include/asm-i386/mmzone.h 2003-11-26 01:40:36.000000000 -0800
    > @@ -84,14 +84,30 @@ extern struct pglist_data *node_data[];
    > + __zone->zone_start_pfn; \
    > })
    > #define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
    > +
    > +static inline int pfn_to_nid(unsigned long);
    > /*
    > - * pfn_valid should be made as fast as possible, and the current definition
    > - * is valid for machines that are NUMA, but still contiguous, which is what
    > - * is currently supported. A more generalised, but slower definition would
    > - * be something like this - mbligh:
    > - * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
    > + * pfn_valid must absolutely be correct, regardless of speed concerns.
    > */
    > -#define pfn_valid(pfn) ((pfn) < num_physpages)
    > +static inline int pfn_valid(unsigned long pfn)
    > +{
    > + u8 nid = pfn_to_nid(pfn);
    > + pg_data_t *pgdat;
    > +
    > + if (nid < MAX_NUMNODES)
    > + pgdat = NODE_DATA(nid);
    > + else
    > + return 0;
    > +
    > + if (!pgdat)
    > + return 0;
    > + else if (pfn < pgdat->node_start_pfn)
    > + return 0;
    > + else if (pfn - pgdat->node_start_pfn >= pgdat->node_spanned_pages)
    > + return 0;
    > + else
    > + return 1;
    > +}
    >
    > /*
    > * generic node memory support, the following assumptions apply:

    Cool, thanks. I'll try runtesting it.

    M.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Martin J. Bligh: "Re: 2.6.0-test10-mm1"

    Relevant Pages

    • Re: [RFC] sys_punchhole()
      ... unused memory back to the host OS (initially via a hotplug memory interface ... madvisehas special behavior to work like punch anyway. ... On the other hand, if you're going to support holes at all, having to recreate ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: JCL
      ... The refrigerator sized cabinet not only housed this core memory, ... Speaking of paper tape, anyone ever have to bootstrap a machine by entering ... Another new fangled invention by IBM: 96 column cards! ... Not the standard rectangular 80 column ones with rectangular holes ...
      (comp.sys.hp.mpe)
    • Re: [PATCH 2/2] cciss: disable dma prefetch for P600
      ... falling off into one the holes on IPF and AMD. ... It doesn't happen on Proliant because the last 4kB of memory is ... prefetching was walking off the end of real mmeory and into the AGP region ... There is a bug in the DMA engine that that may result in prefetching ...
      (Linux-Kernel)
    • Re: 2.4.23aa2 (bugfixes and important VM improvements for the high end)
      ... > triggering the zone-normal shortage in 32G, ... Also bear in mind that as memory gets tight, ... Without shared pagetables, ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: log-buf-len dynamic
      ... Wake up, dude. ... disk and memory. ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)