Re: posix-cpu-timers revamp



Sorry for the delay.

Please take a look and let me know what you think. In the meantime I'll
be working on a similar patch to 2.6-head that has optimizations for
uniprocessor and two-CPU operation, to avoid the overhead of the percpu
functions when they are unneeded.

My mention of a 2-CPU special case was just an off-hand idea. I don't
really have any idea if that would be optimal given the tradeoff of
increasing signal_struct size. The performance needs to be analyzed.

disappeared entirely and the arm_timer() routine merely fills
p->signal->it_*_expires from timer->it.cpu.expires.*. The
cpu_clock_sample_group_locked() loses its summing loops, using the
shared structure instead. Finally, set_process_cpu_timer() sets
tsk->signal->it_*_expires directly rather than calling the deleted
rebalance routine.

I think I misled you about the use of the it_*_expires fields, sorry.
The task_struct.it_*_expires fields are used solely as a cache of the
head of cpu_timers[]. Despite the poor choice of the same name, the
signal_struct.it_*_expires fields serve a different purpose. For an
analogous cache of the soonest timer to expire, you need to add new
fields. The signal_struct.it_{prof,virt}_{expires,incr} fields hold
the setitimer settings for ITIMER_{PROF,VTALRM}. You can't change
those in arm_timer. For a quick cache you need a new field that is
the sooner of it_foo_expires or the head cpu_timers[foo] expiry time.
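
To make the distinction concrete, here is a rough userspace sketch of such a
cache for the PROF clock. Everything in it is illustrative rather than actual
kernel code: the struct, the prof_expires_cache field name, and both helper
functions are hypothetical, cputime_t is modeled as a plain 64-bit integer,
and it assumes that an expiry of 0 means "nothing armed".

```c
#include <stdint.h>

/* Stand-in for the kernel's cputime_t (assumption: a plain integer). */
typedef uint64_t cputime_t;

/* Return the sooner of two expiry times; 0 is assumed to mean "unset". */
cputime_t sooner_expiry(cputime_t a, cputime_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Hypothetical model of the per-process state for one clock. */
struct prof_timer_state {
	cputime_t it_prof_expires;	/* setitimer(ITIMER_PROF) setting */
	cputime_t timer_head_expires;	/* expiry of the head of cpu_timers[] */
	cputime_t prof_expires_cache;	/* hypothetical new field: soonest of both */
};

/* Refresh the cache; the setitimer setting itself is only read. */
void update_expiry_cache(struct prof_timer_state *s)
{
	s->prof_expires_cache = sooner_expiry(s->it_prof_expires,
					      s->timer_head_expires);
}
```

The point of the separate field is that arming a POSIX timer only ever
touches the cache, so the ITIMER_PROF setting survives intact.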

The shared_utime_sum et al names are somewhat oblique to anyone who
hasn't just been hacking on exactly this thing like you and I have.
Things like thread_group_*time make more sense.

There are now several places where you call both shared_utime_sum and
shared_stime_sum. It looks simple because they're nicely encapsulated.
But now you have two loops through all CPUs, and three loops in
check_process_timers.

I think what we want instead is this:

	struct task_cputime
	{
		cputime_t utime;
		cputime_t stime;
		unsigned long long schedtime;
	};

Use one in task_struct to replace the utime, stime, and sum_sched_runtime
fields, and another to replace it_*_expires. Use a single inline function
thread_group_cputime() that fills a sum struct task_cputime using a single
loop. In the places where only one or two of the sums are actually used, the
compiler should optimize away the extra summing from the loop.
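
As a rough userspace sketch of that single-loop shape: the signature below is
my assumption, cputime_t is modeled as a 64-bit integer, and a flat array
stands in for the percpu storage (a kernel version would presumably walk the
CPUs with the percpu accessors instead).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's cputime_t (assumption: a plain integer). */
typedef uint64_t cputime_t;

struct task_cputime {
	cputime_t utime;
	cputime_t stime;
	unsigned long long schedtime;
};

/*
 * Fill *sum with the totals of all per-CPU entries in one pass.
 * Because it is static inline, a caller that reads only sum->utime
 * gives the compiler the chance to drop the other two additions.
 */
static inline void thread_group_cputime(const struct task_cputime *per_cpu,
					size_t ncpus,
					struct task_cputime *sum)
{
	size_t i;

	sum->utime = 0;
	sum->stime = 0;
	sum->schedtime = 0;

	for (i = 0; i < ncpus; i++) {
		sum->utime += per_cpu[i].utime;
		sum->stime += per_cpu[i].stime;
		sum->schedtime += per_cpu[i].schedtime;
	}
}
```

A caller that wants utime and stime together then makes one call and one
pass over the CPUs, instead of one pass per field.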

Don't use __cacheline_aligned on this struct type itself, because most of
the uses don't need that. When using alloc_percpu, you can rely on it to
take care of those needs--that's what it's for. If you implement a
variant that uses a flat array, you can use a wrapper struct with
__cacheline_aligned for that.
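
For the flat-array variant, the wrapper could look roughly like the C11
userspace sketch below. The names task_cputime_padded and CACHELINE_BYTES are
hypothetical, the 64-byte line size is an assumption, and alignas() stands in
for the kernel's __cacheline_aligned annotation.

```c
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size; the real value is architecture-specific. */
#define CACHELINE_BYTES 64

/* Stand-in for the kernel's cputime_t (assumption: a plain integer). */
typedef uint64_t cputime_t;

/* The plain struct stays unaligned so ordinary uses pay no padding. */
struct task_cputime {
	cputime_t utime;
	cputime_t stime;
	unsigned long long schedtime;
};

/*
 * Hypothetical wrapper used only for the flat per-CPU array.
 * Aligning the member raises the alignment of the whole wrapper, and
 * sizeof is rounded up to match, so each array element starts on its
 * own cache line.
 */
struct task_cputime_padded {
	alignas(CACHELINE_BYTES) struct task_cputime cputime;
};
```

Only the array elements carry the padding; a struct task_cputime embedded in
task_struct or used as a local sum stays at its natural size.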


Thanks,
Roland
