RE: [PATCH 1/2] NET: Multiple queue network device support



* Waskiewicz Jr, Peter P <peter.p.waskiewicz.jr@xxxxxxxxx>
2007-03-09 11:25
+ }
+ } else {
+ /* We're not a multi-queue device. */
+ spin_lock(&dev->queue_lock);
+ q = dev->qdisc;
+ if (q->enqueue) {
+ rc = q->enqueue(skb, q);
+ qdisc_run(dev);
+ spin_unlock(&dev->queue_lock);
+ rc = rc == NET_XMIT_BYPASS
+ ?
NET_XMIT_SUCCESS : rc;
+ goto out;
+ }
+ spin_unlock(&dev->queue_lock);

Please don't duplicate already existing code.

I don't want to mix multiqueue and non-multiqueue code in
the transmit
path. This was an attempt to allow MQ and non-MQ devices
to coexist
in a machine having separate code paths. Are you suggesting to
combine them? That would get very messy trying to
determine what type
of lock to grab (subqueue lock or dev->queue_lock), not to mention
grabbing the
dev->queue_lock would block multiqueue devices in that same
codepath.

The piece of code I quoted above is the branch executed if
multi queue is not enabled. The code you added is 100%
identical to the already existing enqueue logic. Just execute
the existing branch if multi queue is not enabled.


I totally agree this is identical code if multiqueue isn't enabled.
However, when multiqueue is enabled, I don't want to make all network
drivers implement the subqueue API just to be able to lock the queues.
So the first thing I check is netif_is_multiqueue(dev), and if it
*isn't* multiqueue, it will run the existing code. This way both
non-multiqueue devices don't have to have any knowledge of the MQ API.

This is another attempt to keep the two codepaths separate.
The only
way I see of combining them is to check netif_is_multiqueue()
everytime I need to grab a lock, which I think would be excessive.

The code added is 100% identical to the existing code, just
be a little smarter on how you do the ifdefs.

An example could be an on-board adapter isn't multiqueue, and an
expansion card you have in your system is. If I handle multiqueue being
on with just ifdef's, then I could use the expansion card, but not the
on-board adapter as-is. The on-board driver would need to be updated to
have 1 queue in the multiqueue context, which is what I tried to avoid.

Your modified qdisc_restart() expects the queue_lock to
be locked,
how can this work?

No, it doesn't expect the lock to be held. Because of the multiple
queues, enqueueing and dequeueing are now asynchronous, since I can
enqueue to queue 0 while dequeuing from queue 1. dev->queue_lock
isn't held, so this can happen. Therefore the
spin_trylock() is used
in this dequeue because I don't want to wait for someone to finish
with that queue in question (e.g. enqueue working), since that will
block all other bands/queues after the band in question. So if the
lock isn't available to grab, we move to the next band. If
I were to
wait for the lock, I'd serialize the enqueue/dequeue
completely, and
block other traffic flows in other queues waiting for the lock.

The first thing you do in qdisc_restart() after dequeue()ing
is unlock the sub queue lock. You explicitely unlock it
before calling qdisc_run() so I assume dequeue() is expected
to keep it locked. Something doesn't add up.

That's the entire point of this extra locking. enqueue() is going to
put an skb into a band somewhere that maps to some queue, and there is
no way to guarantee the skb I retrieve from dequeue() is headed for the
same queue. Therefore, I need to unlock the queue after I finish
enqueuing, since having that lock makes little sense to dequeue().
dequeue() will then grab *a* lock on a queue; it may be the same one we
had during enqueue(), but it may not be. And the placement of the
unlock of that queue is exactly where it happens in non-multiqueue,
which is right before the hard_start_xmit().

I concede that the locking model is complex and seems really nasty, but
to truly separate queue functionality from one another, I see no other
feasible option than to run locking like this. I am very open to
suggestions.


BTW, which lock serializes your write access to
qdisc->q.qlen? It used to be dev->queue_lock but that is
apparently not true for multi queue.


This is a very good catch, because it isn't being protected on the
entire qdisc right now for PRIO. However, Chris Leech pointed out the
LINK_STATE_QDISC_RUNNING bit is serializing things at the qdisc_run()
level, so that's probably protecting this counter right now. I'm
looking at pushing that down into the qdisc, but at the same time I can
think of something to serialize these stats containers for the qdisc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages


Loading