Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
- From: Frank Mayhar <fmayhar@xxxxxxxxxx>
- Date: Thu, 06 Mar 2008 11:04:03 -0800
On Tue, 2008-03-04 at 20:08 -0800, Roland McGrath wrote:
check_process_timers can look just like check_thread_timers, but
consulting the shared fields instead of the per-thread ones for both the
clock accumulators and the timers' expiry times. Likewise, arm_timer
only has to set signal->it_*_expires; process_timer_rebalance goes away.
Okay, my understanding of this is still evolving, so please (please!)
correct me when I get it wrong. I take what you're saying to mean that,
first, run_posix_cpu_timers() only needs to be run once per thread
group. It _sounds_ like it should be checking the shared fields rather
than the per-task fields for timer expiration (in fact, the more I think
about it the more sure I am that that's the case).
The old process_timer_rebalance() routine was intended to distribute the
remaining ticks across all the threads, so that the per-task fields
would cause run_posix_cpu_timers() to run at the appropriate time. With
it checking the shared fields this becomes no longer necessary.
Since the shared fields are getting all the ticks, this will work for
per-thread timers as well.
The arm_timers() routine, instead of calling posix_timer_rebalance(),
should just directly set signal->it_*_expires to the expiration time,
e.g.:
switch (CPUCLOCK_WHICH(timer->it_clock)) {
default:
BUG();
case CPUCLOCK_VIRT:
if (!cputime_eq(p->signal->it_virt_expires,
cputime_zero) &&
cputime_lt(p->signal->it_virt_expires,
timer->it.cpu.expires.cpu))
break;
p->signal->it_virt_expires = timer->it.cpu.expires.cpu;
goto rebalance;
case CPUCLOCK_PROF:
if (!cputime_eq(p->signal->it_prof_expires,
cputime_zero) &&
cputime_lt(p->signal->it_prof_expires,
timer->it.cpu.expires.cpu))
break;
i = p->signal->rlim[RLIMIT_CPU].rlim_cur;
if (i != RLIM_INFINITY &&
i <= cputime_to_secs(timer->it.cpu.expires.cpu))
break;
p->signal->it_prof_expires = timer->it.cpu.expires.cpu;
goto rebalance;
case CPUCLOCK_SCHED:
p->signal->it_sched_expires = timer->it.cpu.expires.sched;
break;
}
If you do all that then the time spent in run_posix_cpu_timers should
not be affected at all by the number of threads. The only "walking the
timer lists" that happens is popping the expired timers off the head of
the lists that are kept in ascending order of expiry time. For each
flavor of timer, there are n+1 steps in the "walk" for n timers that
have expired. So already no costs here should scale with the number of
timers, just the with the number of timers that all expire at the same time.
It's still probably worthwhile to defer processing to a workqueue
thread, though, just because it's still a lot to do at interrupt. I'll
probably end up trying it both ways.
One thing that's still unclear to me is, if there were only one run of
run_posix_cpu_timers() per thread group per tick, how would per-thread
timers be serviced?
The simplifications I described above will obviously greatly improve
your test case (many threads and with some process timers expiring
pretty frequently). We need to consider and analyze the other kinds of
cases too. That is, cases with a few threads (not many more than the
number of CPUs); cases where no timer is close to expiring very often.
The most common cases, from one-thread cases to one-million thread
cases, are when no timers are going off before next Tuesday (if any are
set at all). Then run_posix_cpu_timers always bails out early, and none
of the costs you've seen become relevant at all. Any change to what the
timer interrupt handler does on every tick affects those cases too.
These are all on the roadmap, and in fact the null case should already
be covered. :-)
As I mentioned in my last message, my concern about this originally was
with the SMP cache/lock effects of multiple CPUs touching the same
memory in signal_struct on every tick (which presumably all tend to
happen simultaneously on all the CPUs). I'd insist that we have
measurements and analysis as thorough as possible of the effects of
introducing that frequent/synchronized sharing, before endorsing such
changes. I have a couple of guesses as to what might be reasonable ways
to mitigate that. But it needs a lot of measurement and wise opinion on
the low-level performance effects of each proposal.
I've given this some thought. It seems clear that there's going to be
some performance penalty when multiple CPUs are competing trying to
update the same field at the tick. It would be much better if there
were cacheline-aligned per-cpu fields associated with either the task or
the signal structure; that way each CPU could update its own field
without competing with the others. Processing in run_posix_cpu_timers
would still be constant, although slightly higher for having to consult
multiple fields instead of just one. Not one per thread, though, just
one per CPU, a much smaller and fixed number.
--
Frank Mayhar <fmayhar@xxxxxxxxxx>
Google, Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- posix-cpu-timers revamp
- From: Roland McGrath
- posix-cpu-timers revamp
- References:
- Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
- From: Roland McGrath
- Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
- From: Frank Mayhar
- Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
- From: Roland McGrath
- Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
- Prev by Date: Re: [RFC] JBD ordered mode rewrite
- Next by Date: build #407 issue for v2.6.25-rc4-56-gd7fe321 : "sas_bios_param" [drivers/scsi/mvsas.ko] undefined!
- Previous by thread: Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
- Next by thread: posix-cpu-timers revamp
- Index(es):
Relevant Pages
|