Re: regression introduced by - timers: fix itimer/many thread hang



On Friday 07 November 2008 11:29:04 Peter Zijlstra wrote:
(fwiw your email doesn't come across properly, evo refuses to display
them, there's some mangling of headers which makes it think there's an
attachment)

On Thu, 2008-11-06 at 15:52 -0800, Frank Mayhar wrote:
On Thu, 2008-11-06 at 16:08 +0100, Peter Zijlstra wrote:
On Thu, 2008-11-06 at 09:03 -0600, Christoph Lameter wrote:
On Thu, 6 Nov 2008, Peter Zijlstra wrote:
Also, you just introduced per-cpu allocations for each
thread-group, while Christoph is reworking the per-cpu allocator,
with one unfortunate side-effect - it's going to have a limited size
pool. Therefore this will limit the number of thread-groups we can
have.

Patches exist that implement a dynamically growable percpu pool
(using virtual mappings though). If the cost of the additional
complexity / overhead is justifiable then we can make the percpu pool
dynamically extendable.

Right, but I don't think the patch under consideration will fly anyway,
doing a for_each_possible_cpu() loop on every tick on all cpus isn't
really healthy, even for moderate sized machines.

I personally think that you're overstating this. First, the current
implementation walks all threads for each tick, which is simply not
scalable and results in soft lockups with large numbers of threads.
This patch fixes a real bug. Second, this only happens "on every tick"
for processes that have more than one thread _and_ that use posix
interval timers. Roland and I went to some effort to keep loops like
the one you're referring to out of the common paths.
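
(Editorial illustration, not part of the original message.) The trade-off being argued here can be sketched roughly as follows. This is a hypothetical user-space model, not the patch under discussion: the names `thread_group`, `percpu_cputime`, `sum_by_threads`, `sum_by_cpus` and `account_tick`, and the fixed `NR_CPUS`, are all assumptions made for the example. It only shows that summing per-thread counters costs one step per thread, while summing per-CPU slots costs one step per possible CPU regardless of the thread count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a small fixed number of possible CPUs. */
#define NR_CPUS 4

struct thread {
    uint64_t cputime;       /* CPU time charged to this thread */
    struct thread *next;    /* next thread in the same group */
};

struct thread_group {
    struct thread *threads;            /* list walked by the per-thread scheme */
    uint64_t percpu_cputime[NR_CPUS];  /* one accumulator slot per possible CPU */
};

/* Charge delta to thread t, which is assumed to be running on cpu. */
static void account_tick(struct thread_group *tg, struct thread *t,
                         size_t cpu, uint64_t delta)
{
    assert(cpu < NR_CPUS);
    t->cputime += delta;
    tg->percpu_cputime[cpu] += delta;
}

/* Per-thread scheme: cost grows with the number of threads. */
static uint64_t sum_by_threads(const struct thread_group *tg)
{
    uint64_t total = 0;
    for (const struct thread *t = tg->threads; t != NULL; t = t->next)
        total += t->cputime;
    return total;
}

/* Per-CPU scheme: cost grows with the number of possible CPUs. */
static uint64_t sum_by_cpus(const struct thread_group *tg)
{
    uint64_t total = 0;
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        total += tg->percpu_cputime[cpu];
    return total;
}
```

Both functions return the same total; which loop is cheaper depends on whether the process has more threads than the machine has possible CPUs, which is the point the two sides of this thread disagree on.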

In any event, while this particular implementation may not be optimal,
at least it's _right_. Whatever happened to "make it right, then make
it fast?"

Well, I'm not thinking you did it right ;-)

While I agree that the linear loop is sub-optimal, it only really
becomes a problem when you have hundreds or thousands of threads in your
application, which I'll argue to be insane anyway.

This is just not true. I've seen a very real example of a lockup with a very
sane number of threads (one per CPU), but on a very large machine (1024 CPUs
IIRC). The application set per-process CPU profiling with an interval of 1
tick, which translates to 1024 timers firing off with each tick...

Well, yes, that was broken, too, but that's the way one quite popular FORTRAN
compiler works...

Petr