Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49



On Sun, Apr 12, 2009 at 09:06:03AM -0700, Stephen Hemminger wrote:
On Sat, 11 Apr 2009 17:34:45 -0700
"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:

On Sat, Apr 11, 2009 at 11:57:16AM -0700, Linus Torvalds wrote:


On Fri, 10 Apr 2009, Paul E. McKenney wrote:

1. Assuming that the synchronize_net() is intended to guarantee
that the new rules will be in effect before returning to
user space:

Btw, I think that's a bad assumption.

It does indeed appear to be!

The thing is, nobody can really care if the new rules are in effect or
not, because the thing you race with is not the "return to user space"
part, but the incoming packets.

And those incoming packets might have been incoming before the rules were
set up too.

So I seriously doubt you need to synchronize with any returning to user
space. What you want to synchronize with is then later actions that do
things like turning on the interface that the rules are attached to etc!

So I would suggest:

- remove the synchronize_net() entirely. Replace it with just freeing the
old rules using RCU.

- new packets will always end up seeing the new rules. That includes the
case of somebody doing "ifconfig eth0 up" that enables a new source of
packets, so there are no real security issues.

- if you enabled your network interfaces before you updated your packet
filtering rules, you already had a window where packets would come in
with the old rules, so doing a "synchronize_net()" in no way protects
against any race conditions anyway.

Am I missing something?

The issue at this point seems to be the need to get accurate snapshots
of various counters -- there are a number of Linux networking users who
need to account for every byte flowing through their systems. However,
it is also necessary to update these counters very efficiently, given
that they are updated on a per-packet basis. The current approach is
as follows:

1. Install a new set of counters.

2. Wait for a grace period to elapse.

3. At this point, we know that all subsequent counting will happen
on the new set of counters.

4. Add the value of the old set of counters to the new set of
counters.

5. Copy the old set of counters up to user space.

So we get a good snapshot in #5, while #4 ensures that we don't lose
any counts when taking future snapshots. Unfortunately, #2 hits us
with grace-period latencies on the critical path.

We are going through the following possibilities:

o Stick with the current approach, and ask people to move to
new batch-oriented interfaces. However, a 30x decrease in
performance is pretty grim, even for an old-style interface.

o Use various atomic tricks to get an immediate snapshot of the
old counters after step 1. Make step 3 use call_rcu() instead
of synchronize_rcu(), and then step 4 happens off the
critical path.

This approach moves the RCU grace period off of the critical
path, but the atomic tricks are extremely ugly on 32-bit SMP
machines. 32-bit UP machines and 64-bit machines are not
too bad, though the 32-bit UP case does add preemption-disable
overhead on the counter-update fastpath.

o Provide some sort of expedited synchronize_rcu(). This might
be able to decrease the hit from 30x down to maybe 5x.
But I might need to do this for the fast-boot folks anyway,
though I am first trying to get away with just speeding
up synchronized_rcu(). Though I was not thinking in terms
of 6x, let alone 30x.

Please note that this would not be a drop-in replacement for
synchronize_rcu(). One would use synchronize_rcu_expedited()
(or whatever) only when the system really could not get any
useful work done while the grace period was in progress.
The general approach would be to keep the whole machine busy
trying to get the grace period done as soon as possible.

Thanx, Paul

We could also try:
* per-cpu spinlock on counters (instead of synchronize_net).
When doing update, just acquire
lock on that cpu and futz with counters then. Overhead should
still be less than 2.6.29 and earlier global rwlock

This one makes a lot of sense to me. The overhead of an uncontended
lock is pretty small on most systems. This would also mean that you
don't have to actually swap the counters, correct?

* synchonize_rcu/synchronize_net is more guarantee than needed?

If you really do need to swap the counters themselves, you -might- also
need call_rcu() to dispose of them. But it should possible to do that
under the per-CPU lock instead.

* use on_each_cpu() somehow to do grace periood?

You could certainly use something like smp_call_function() to collect
the other CPUs' counter values -- just disable interrupts across the
increments for architectures that cannot atomically increment a 64-bit
value. (And it only needs to be atomic with respect to an interrupt,
not necessarily to some other CPU.)

* Add a cond_resched() into net_rx_action which might cause rx processing
to get out of rcu sooner? also in transmit packet scheduler.

This might help some, but would probably only give a few tens of percent
improvement.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49
    ... because the thing you race with is not the "return to user space" ... And those incoming packets might have been incoming before the rules were ... of various counters -- there are a number of Linux networking users who ... 32-bit UP machines and 64-bit machines are not ...
    (Linux-Kernel)
  • Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49
    ... because the thing you race with is not the "return to user space" ... And those incoming packets might have been incoming before the rules were ... of various counters -- there are a number of Linux networking users who ... 32-bit UP machines and 64-bit machines are not ...
    (Linux-Kernel)
  • Re: Question regarding using AES in CTR mode to encrypt UDP
    ... but in CTR mode I have a problem synchronising the counter of the CTR ... Reject old packets with a soft error ... synchronization of the CTR counters (assuming I still want to use CTR, ... and I need the exact same counters in order to reverse the encryption ...
    (sci.crypt)
  • Re: Cat 4000 Crazy Port Statistics.
    ... The thing here is that you have another 'opinion' on the stats. ... 2047853408 packets output, 1214833509 bytes, 0 underruns ... In summary as you state these counters are impossible. ... The switch has a bug or bugs, ...
    (comp.dcom.sys.cisco)
  • Re: Electronic voting
    ... inspection is the equivalent of counting the votes in secret, ... inspected, even can't inspect, trust it. ... A far better solution seems to be: let the machines print out a paper ... electronic counters might think twice about falsifications. ...
    (talk.politics.misc)