Re: regression introduced by - timers: fix itimer/many thread hang



Peter Zijlstra wrote on Mon, 24 Nov 2008 at 10:33 +0100:
On Mon, 2008-11-24 at 09:46 +0100, Petr Tesarik wrote:
Peter Zijlstra wrote on Sun, 23 Nov 2008 at 15:24 +0100:
[...]
The current (per-cpu) code is utterly broken on large machines too, I've
asked SGI to run some tests on real numa machines (something multi-brick
altix) and even moderately small machines with 256 cpus in them grind to
a halt (or make progress at a snail's pace) when the itimer stuff is
enabled.

Furthermore, I really dislike the per-process-per-cpu memory cost, it
bloats applications and makes the new per-cpu alloc work rather more
difficult than it already is.

I basically think the whole process wide itimer stuff is broken by
design, there is no way to make it work on reasonably large machines,
the whole problem space just doesn't scale. You simply cannot maintain a
global count without bouncing cachelines like mad, so you might as well
accept it and do the process wide counter and bounce only a single line,
instead of bouncing a line per-cpu.

Very true. Unfortunately per-process itimers are prescribed by the
Single Unix Specification, so we have to cope with them in some way,
while not permitting a non-privileged process a DoS attack. This is
going to be hard, and we'll probably have to twist the specification a
bit to still conform to its wording. :((

Feel like reading the actual spec and trying to come up with a creative
interpretation? :-)

Yes, I've just spent a few hours doing that... And I feel very
depressed, as expected.

I really don't think it's a good idea to set a per-process ITIMER_PROF
to one timer tick on a large machine, but the kernel does allow any
process to do it, and then it can even cause a hard freeze on some
hardware. This is _not_ acceptable.

What is worse, we can't just limit the granularity of itimers, because
threads can come into being _after_ the itimer was set.
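
For reference, a minimal userspace sketch of the kind of request in
question. The helper name set_prof_interval_us() is hypothetical, and the
interval that corresponds to one timer tick depends on the kernel's HZ, so
it is left to the caller rather than guessed here.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Arm the per-process profiling timer (ITIMER_PROF) so that SIGPROF is
 * delivered every 'usec' microseconds of CPU time consumed by the whole
 * process. Passing 0 disarms the timer.
 * Returns 0 on success, -1 on error (errno is set by setitimer()).
 */
static int set_prof_interval_us(long usec)
{
	struct itimerval it;

	it.it_interval.tv_sec = usec / 1000000;
	it.it_interval.tv_usec = usec % 1000000;
	/* First expiry after one interval; a zero it_value disarms. */
	it.it_value = it.it_interval;

	return setitimer(ITIMER_PROF, &it, NULL);
}
```

Nothing in this call depends on how many threads exist at the time, which
is why a limit checked only when the timer is set cannot be sufficient.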

Currently it has jiffy granularity, right? And jiffies are different
depending on some compile time constant (HZ), so can't we, for the sake
of per-process itimers, pretend to have a 1 minute jiffy?

That should be as compliant as we are now, and utterly useless for
everybody, thereby discouraging its use, hmm? :-)

I've got a copy of IEEE Std 1003.1-2004 here, and it suggests that this
should be generally possible. In particular, the description for
setitimer() says:

Implementations may place limitations on the granularity of timer values. For
each interval timer, if the requested timer value requires a finer granularity
than the implementation supports, the actual timer value shall be rounded up
to the next supported value.
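
The rounding rule quoted above can be sketched as follows. This is only an
illustration of the arithmetic, not kernel code; the function name and the
choice of nanoseconds as the unit are assumptions.

```c
/*
 * Round a requested timer value up to the next multiple of the supported
 * granularity. Both arguments are in nanoseconds and are expected to be
 * non-negative. A value of 0 (timer disarmed) stays 0.
 */
static long long round_up_to_granularity(long long value_ns,
					 long long gran_ns)
{
	long long rem;

	if (gran_ns <= 0)
		return value_ns;	/* no granularity limit given */

	rem = value_ns % gran_ns;
	if (rem == 0)
		return value_ns;	/* already a supported value */

	return value_ns + (gran_ns - rem);
}
```

With a 10 ms granularity, a request of 1 ns becomes 10 ms and a request of
10 ms + 1 ns becomes 20 ms, while an exact multiple is left unchanged.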

However, it seems to be vaguely linked to CLOCK_PROCESS_CPUTIME_ID,
which is defined as:

The identifier of the CPU-time clock associated with the process making a
clock() or timer*() function call.

POSIX does not specify whether this clock is identical to the one used
for setitimer et al., or not, but it seems logical that it should. Then,
the kernel should probably return the coarse granularity in
clock_getres(), too.

I tried to find out how this is currently implemented in Linux, and it's
broken. How else. :-/

1. clock_getres() always returns a resolution of 1ns

This is actually good news, because it means that nobody really cares
whether the actual granularity is greater, so I guess we can safely
return any bogus number in clock_getres().

What about using an actual granularity of NR_CPUS*HZ, which should be
safe for any (at least remotely) sane usage?
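
Anyone who wants to check what their own kernel reports can use a small
wrapper like the one below. The helper name is hypothetical; clock_getres()
and CLOCK_PROCESS_CPUTIME_ID are the standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Query the advertised resolution of the process CPU-time clock.
 * Returns 0 on success and fills *res, or -1 on error with errno set.
 */
static int process_cputime_resolution(struct timespec *res)
{
	return clock_getres(CLOCK_PROCESS_CPUTIME_ID, res);
}
```

The caller can then print res.tv_sec and res.tv_nsec and compare them with
the granularity actually observed for itimers.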

2. clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) returns -EINVAL

Should not happen. Looking further into it, I think this line in
cpu_clock_sample_group():

	switch (which_clock) {

should look like the similar line in cpu_clock_sample(), i.e.:

	switch (CPUCLOCK_WHICH(which_clock)) {
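
To illustrate why an unmasked switch would end up in the -EINVAL branch,
here is a userspace sketch. The constants are local copies modeled on the
kernel's CPU-clock encoding (clock type in the low bits, inverted pid in
the upper bits) and are assumptions for illustration, as are all function
names; this is not the kernel code itself.

```c
#include <errno.h>

/* Local, illustrative copies; not included from any kernel header. */
#define CPUCLOCK_CLOCK_MASK	3u
#define CPUCLOCK_PROF		0u
#define CPUCLOCK_VIRT		1u
#define CPUCLOCK_SCHED		2u
#define CPUCLOCK_WHICH(clock)	((clock) & CPUCLOCK_CLOCK_MASK)

/* Hypothetical encoder: upper bits carry the inverted pid. */
static unsigned int make_process_cpuclock(unsigned int pid,
					  unsigned int which)
{
	return (~pid << 3) | which;
}

/* Switches on the raw id: the pid bits make every case miss. */
static int sample_unmasked(unsigned int which_clock)
{
	switch (which_clock) {
	case CPUCLOCK_PROF:
	case CPUCLOCK_VIRT:
	case CPUCLOCK_SCHED:
		return 0;
	default:
		return -EINVAL;
	}
}

/* Switches on the masked clock type, as cpu_clock_sample() does. */
static int sample_masked(unsigned int which_clock)
{
	switch (CPUCLOCK_WHICH(which_clock)) {
	case CPUCLOCK_PROF:
	case CPUCLOCK_VIRT:
	case CPUCLOCK_SCHED:
		return 0;
	default:
		return -EINVAL;
	}
}
```

For an id built with make_process_cpuclock(0, CPUCLOCK_SCHED), the unmasked
variant returns -EINVAL while the masked one returns 0.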

Shall I send a patch?

Petr Tesarik
