Re: tbench regression in 2.6.25-rc1



On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:
Zhang, Yanmin a écrit :
On Mon, 2008-02-18 at 12:33 -0500, Valdis.Kletnieks@xxxxxx wrote:
On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:

I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
pading before lastuse, so the 3 members are moved to next cache line. The performance is
recovered.

How about below patch? Almost all performance is recovered with the new patch.

Signed-off-by: Zhang Yanmin <yanmin.zhang@xxxxxxxxx>
Could you add a comment someplace that says "refcnt wants to be on a different
cache line from input/output/ops or performance tanks badly", to warn some
future kernel hacker who starts adding new fields to the structure?
Ok. Below is the new patch.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200
no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by
moving tclassid to different place. It looks like tclassid could also have impact on
performance.
If moving tclassid before metrics, or just don't move tclassid, the performance isn't
good. So I move it behind metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.

Signed-off-by: Zhang Yanmin <yanmin.zhang@xxxxxxxxx>

---

--- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.000000000 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.000000000 +0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short header_len; /* more space at head required */
unsigned short trailer_len; /* space to reserve at tail */

- u32 metrics[RTAX_MAX];
- struct dst_entry *path;
-
- unsigned long rate_last; /* rate limiting for ICMP */
unsigned int rate_tokens;
+ unsigned long rate_last; /* rate limiting for ICMP */

-#ifdef CONFIG_NET_CLS_ROUTE
- __u32 tclassid;
-#endif
+ struct dst_entry *path;

struct neighbour *neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);

struct dst_ops *ops;
-
- unsigned long lastuse;
+
+ u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+ __u32 tclassid;
+#endif
+
+ /*
+ * __refcnt wants to be on a different cache line from
+ * input/output/ops or performance tanks badly
+ */
atomic_t __refcnt; /* client references */
int __use;
+ unsigned long lastuse;
union {
struct dst_entry *next;
struct rtable *rt_next;




I prefer this patch, but unfortunatly your perf numbers are for 64 bits kernels.

Could you please test now with 32 bits one ?
I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no difference
between pure kernel and patched kernel.

New update: On 8-core stoakley, the regression becomes 2~3% with kernel 2.6.25-rc2. On
tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 cores+hyperthreading),
the regression is still 4% with 2.6.25-rc2.

With my patch, on tigerton, almost all regression disappears. On tulsa, only about 2%
regression disappears.

So this issue is triggerred with multiple-cpu. Perhaps process scheduler is another
factor causing the issue to happen, but it's very hard to change scheduler.


Eric,

I tested your new patch in function loopback_xmit. It has no improvement, while it doesn't
introduce new issues. As you tested it on dual-core machine and got improvement, how about
merging your patch with mine?

-yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: RT patch acceptance
    ... judge the complexity of a design for that type of system. ... claim that you cannot judge the complexity of a kernel modification. ... Since the patch in question doesn't actually need that information to ... nanokernel's API up to date with additions to Linux's API that RT people ...
    (Linux-Kernel)
  • Re: [PATCH] Fix congestion_wait() sync/async vs read/write confusion
    ... the kernel. ... Jan Kara was able to reproduce the tiobench 2.6.30 regression, ... cc'd him and kept the patch below. ... So with the fix there's no statistically significant difference and we ...
    (Linux-Kernel)
  • Re: inline asm semantics: output constraint width smaller than input
    ... Now in this case the patch you suggest might end up hurting the end result ... The below patch is to build the kernel for x86_64, ... # Device Drivers ... # PCI IDE chipsets support ...
    (Linux-Kernel)
  • [RFC] Making percpu module variables have their own memory.
    ... Someone using the -rt patch found that one of the tracing options caused ... 64K for every CPU to cover all the per_cpu variables used in the kernel ... static void wakeup_softirqd_prio ...
    (Linux-Kernel)
  • Re: This is [Re:] How to improve the quality of the kernel[?].
    ... The -mm kernel already implements what your proposed PTS would do. ... If patch have no TS ID, ... Thus i can apply for example lguest patches and implement and test new ... How many open source projects use Bugzilla and how many use the Debian BTS? ...
    (Linux-Kernel)