Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups
- From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 12 Apr 2007 16:32:35 -0700
On Thu, 12 Apr 2007 16:10:50 -0700
William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
> On Tue, Apr 03, 2007 at 09:43:30PM -0500, Matt Mackall wrote:
> > This patch series introduces /proc/pid/pagemap and /proc/kpagemap,
> > which allow detailed run-time examination of process memory usage at a
> > page granularity.
> > The first several patches whip the page-walking code introduced for
> > /proc/pid/smaps and clear_refs into a more generic form, the next
> > couple make those interfaces optional, and the last two introduce the
> > new interfaces, also optional.
>
> This solves a real-life problem for Oracle system monitoring software
> (specifically EM). Among the tasks it must carry out is determining
> per-process memory footprint of a set of cooperating tasks (i.e. Oracle
> processes). RSS is inadequate for this due to page sharing; this work
> provides sufficient information to determine what EM needs.
I'm still dying to see what the human-readable output from this
thing looks like.
<looks>
+ * Each entry is a pair of unsigned longs representing the
+ * corresponding physical page, the first containing the page flags
+ * and the second containing the page use count.
+ *
+ * The first 4 bytes of this file form a simple header:
+ *
+ * first byte: 0 for big endian, 1 for little
+ * second byte: page shift (eg 12 for 4096 byte pages)
+ * third byte: entry size in bytes (currently either 4 or 8)
+ * fourth byte: header size
...
+	while (count > 0) {
+		chunk = min_t(size_t, count, PAGE_SIZE);
+		i = 0;
+
+		if (pfn == -1) {
+			page[0] = 0;
+			page[1] = 0;
+			((char *)page)[0] = (ntohl(1) != 1);

OK.

+			((char *)page)[1] = PAGE_SHIFT;

OK.

+			((char *)page)[2] = sizeof(unsigned long);

OK.

+			((char *)page)[3] = KPMSIZE;

OK.

+			i = 2;
+			pfn++;
+		}
+
+		for (; i < 2 * chunk / KPMSIZE; i += 2, pfn++) {
+			ppage = pfn_to_page(pfn);
+			if (!ppage) {
+				page[i] = 0;
+				page[i + 1] = 0;
+			} else {
+				page[i] = ppage->flags;
+				page[i + 1] = atomic_read(&ppage->_count);
+			}
+		}
Not a good idea to expose raw flags in this manner - it changes at the drop
of a hat. We'd need to also expose the kernel's PG_foo-to-bitnumber
mapping to make this viable.
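
To make the concern concrete, here is a minimal userspace sketch of a
decoder for the raw flags word. The two tables (layout_a, layout_b), the
flag names and the bit numbers are all hypothetical; they stand in for
two kernel versions that number their PG_ bits differently and do not
reflect any real kernel. The same raw value decodes to different flag
sets depending on which table the tool was built against, which is why
the mapping would have to be exported alongside the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name-to-bit mapping; bit numbers are illustrative only. */
struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Two made-up layouts standing in for two different kernel versions. */
static const struct flag_name layout_a[] = {
	{ 0, "locked" }, { 1, "error" }, { 2, "referenced" }, { 3, "uptodate" },
};
static const struct flag_name layout_b[] = {
	{ 0, "locked" }, { 1, "referenced" }, { 2, "uptodate" }, { 3, "dirty" },
};

/* Write the comma-separated names of the set bits in raw into out. */
static void decode_flags(unsigned long raw, const struct flag_name *tbl,
			 size_t n, char *out, size_t outlen)
{
	size_t i;

	assert(outlen > 0);
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (!(raw & (1UL << tbl[i].bit)))
			continue;
		if (out[0])
			strncat(out, ",", outlen - strlen(out) - 1);
		strncat(out, tbl[i].name, outlen - strlen(out) - 1);
	}
}
```

Feeding the value 0x6 through both tables gives two different answers
for the same page, with nothing in the file itself to say which one is
right.
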
Not a good idea to use page->_count: page_count() will be more stable.
Otherwise OK, I guess: the interpretation of the page refcount is unlikely
to change much over time.
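
For reference, the 4-byte header described in the quoted comment could
be consumed from userspace roughly as follows. This is a sketch only:
the struct and function names are hypothetical, and the entry offset
calculation assumes that the header occupies the first header-size bytes
and that every entry after it is a pair of words (flags, then count), as
the quoted comment and loop suggest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace view of the 4-byte kpagemap header. */
struct kpm_header {
	int little_endian;	/* byte 0: 0 = big endian, 1 = little */
	unsigned int page_shift;	/* byte 1: e.g. 12 for 4096-byte pages */
	unsigned int word_size;	/* byte 2: sizeof(unsigned long), 4 or 8 */
	unsigned int header_size;	/* byte 3: header size in bytes */
};

/* Returns 0 on success, -1 if the buffer does not look like a header. */
static int kpm_parse_header(const unsigned char *buf, size_t len,
			    struct kpm_header *h)
{
	if (len < 4)
		return -1;
	h->little_endian = buf[0] != 0;
	h->page_shift = buf[1];
	h->word_size = buf[2];
	h->header_size = buf[3];
	if (h->word_size != 4 && h->word_size != 8)
		return -1;
	if (h->header_size < 4)
		return -1;
	return 0;
}

/* Assumed layout: header first, then one (flags, count) pair per pfn. */
static uint64_t kpm_entry_offset(const struct kpm_header *h, uint64_t pfn)
{
	return h->header_size + pfn * 2u * h->word_size;
}

/* Decode one word of word_size bytes in the byte order the header names. */
static uint64_t kpm_read_word(const unsigned char *p,
			      const struct kpm_header *h)
{
	uint64_t v = 0;
	unsigned int i;

	for (i = 0; i < h->word_size; i++) {
		unsigned int pos = h->little_endian ? i : h->word_size - 1 - i;

		v |= (uint64_t)p[i] << (8 * pos);
	}
	return v;
}
```

A reader would parse the first four bytes, seek to kpm_entry_offset()
for the pfn of interest, and decode two words with kpm_read_word(). The
first word is still the raw flags value, so the caveat above applies to
whatever the tool does with it.
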