Re: [RFC PATCH 0/6] Convert all tasklets to workqueues



Hello!

> the context-switch argument i'll believe if i see numbers. You'll
> probably need in excess of tens of thousands of irqs/sec to even be able
> to measure its overhead. (workqueues are driven by nice kernel threads
> so there's no TLB overhead, etc.)

It was the authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)

According to my measurements (maybe wrong), on a 2.5GHz P4 a tasklet
schedule and execution eats ~300ns, while a workqueue eats ~4usec.
On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.

Although it formally looks awful, this result is positive: tasklets are
almost never used in hot paths. I am sure of only one such place: the
acenic driver uses a tasklet to refill the rx queue. This generates no
more than 3000 tasklet schedules per second. Even on the P4, pure
workqueue scheduling will eat ~1% of bare cpu ticks.
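
As a back-of-envelope check of the ~1% figure, here is a minimal sketch
in C. The name cpu_fraction() is made up for illustration; the inputs are
the P4 numbers quoted above (3000 schedules/sec, ~300ns per tasklet,
~4usec per workqueue item), and it assumes the per-schedule cost stays
constant under load.

```c
/* Hypothetical helper, not kernel code: fraction of one CPU consumed
 * by deferred-work dispatch = (events per second) * (seconds per event). */
static double cpu_fraction(double events_per_sec, double cost_ns)
{
    return events_per_sec * cost_ns * 1e-9;   /* ns -> seconds */
}
```

With the acenic rate, cpu_fraction(3000, 4000) comes to about 0.012
(roughly 1.2% of the CPU for workqueues), against about 0.0009 for
cpu_fraction(3000, 300) with tasklets.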

Anyway, all the uses of tasklets should be verified:

The most dubious place is the popular Neterion 10Gbit driver, which uses
a tasklet like acenic does. But at 10Gbit, multiply the acenic numbers
and panic. :-)
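
To put a rough number on "multiply and panic", here is a small sketch in
C. It rests on an assumption that is not measured anywhere in this mail:
that the refill rate grows linearly with link speed (acenic is a 1Gbit
part, so a factor of 10). The helper name scaled_cpu_fraction() is
hypothetical, and the cost is the ~4usec P4 workqueue figure from above.

```c
/* Hypothetical helper, not kernel code: CPU fraction if the schedule
 * rate grows by `speedup` (ASSUMED linear in link speed) and each
 * schedule costs cost_ns nanoseconds. */
static double scaled_cpu_fraction(double base_rate, double speedup,
                                  double cost_ns)
{
    double rate = base_rate * speedup;   /* schedules per second */
    return rate * cost_ns * 1e-9;        /* ns -> seconds */
}
```

Under that assumption, scaled_cpu_fraction(3000, 10, 4000) is about 0.12,
i.e. around 12% of a CPU spent only on scheduling work items, while the
same rate at ~300ns per tasklet stays under 1%. A real driver may batch
refills differently, so this is an upper-bound style estimate only.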

Also, there is some hardware which uses tasklets even harder,
but I have no idea what the real frequencies are: e.g. sundance.

The case of acenic/s2io is quite special: normally network drivers
refill queues in irq handlers. It was Jes Sorensen's observation
that offloading refilling from the irq handler improves performance; I
do not remember the numbers. Probably switching to workqueues will not
affect performance at all, or probably it will just collapse; no idea.


> ... workqueues are also possibly much more scalable

I cannot figure out - scale in what direction? :-)


> (percpu workqueues
> are easy without changing anything in your code but the call where you
> create the workqueue).

I do not see how it is related to scalability. And the statement
does not even make sense. The patch already uses a per-cpu workqueue
for tasklets; otherwise it would be a disaster: guaranteed cpu non-locality.

A tasklet is single-threaded by definition and purpose. Those few places
where people used tasklets to do per-cpu jobs (e.g. RCU) exist just because
they had trouble allocating a new softirq. Workqueues do not make
any difference: a tasklet is not a workqueue, it is a work_struct, and you
will still have to allocate an array of per-cpu work structs; everything
remains the same.


> the only remaining argument is latency:

You could set realtime priority by default, not a poor nice -5.
If some network adapters were killed just because I ran some task
with nice --22, it would be just ridiculous.

Alexey


