Re: tbench regression in 2.6.25-rc1



On Mon, 2008-02-18 at 12:33 -0500, Valdis.Kletnieks@xxxxxx wrote:
> On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:
>
> > I also think __refcnt is the key. I ran a new test that adds 2 unsigned long
> > padding fields before lastuse, so the 3 members move to the next cache line.
> > That recovers the performance.
> >
> > How about the patch below? It recovers almost all of the performance.
> >
> > Signed-off-by: Zhang Yanmin <yanmin.zhang@xxxxxxxxx>
>
> Could you add a comment someplace that says "refcnt wants to be on a different
> cache line from input/output/ops or performance tanks badly", to warn some
> future kernel hacker who starts adding new fields to the structure?

Ok. Below is the new patch.

1) Move tclassid below ops when CONFIG_NET_CLS_ROUTE=y, so that sizeof(dst_entry)=200
whether CONFIG_NET_CLS_ROUTE=y or n. I tested many patches on my 16-core Tigerton,
moving tclassid to different places, and it looks like the position of tclassid also
affects performance. If tclassid is placed before metrics, or is not moved at all,
performance is poor, so I move it after metrics.

2) Add a comment before __refcnt.

With CONFIG_NET_CLS_ROUTE=y, the tbench result with the patch below is about 18% better
than without it.

With CONFIG_NET_CLS_ROUTE=n, the tbench result with the patch below is about 30% better
than without it.

Signed-off-by: Zhang Yanmin <yanmin.zhang@xxxxxxxxx>

---

--- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.000000000 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.000000000 +0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short header_len; /* more space at head required */
unsigned short trailer_len; /* space to reserve at tail */

- u32 metrics[RTAX_MAX];
- struct dst_entry *path;
-
- unsigned long rate_last; /* rate limiting for ICMP */
unsigned int rate_tokens;
+ unsigned long rate_last; /* rate limiting for ICMP */

-#ifdef CONFIG_NET_CLS_ROUTE
- __u32 tclassid;
-#endif
+ struct dst_entry *path;

struct neighbour *neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);

struct dst_ops *ops;
-
- unsigned long lastuse;
+
+ u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+ __u32 tclassid;
+#endif
+
+ /*
+ * __refcnt wants to be on a different cache line from
+ * input/output/ops or performance tanks badly
+ */
atomic_t __refcnt; /* client references */
int __use;
+ unsigned long lastuse;
union {
struct dst_entry *next;
struct rtable *rt_next;

