Re: [PATCH]cpuset: add new API to change cpuset top group's cpus



... the point is, we
don't need a new interface to force a cpu idle. Hotplug does that.
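
For reference, the hotplug path mentioned here can be driven from user space through sysfs. The following is a minimal sketch, assuming a kernel built with CONFIG_HOTPLUG_CPU and a caller with root privileges; the helper names cpu_online_path() and set_cpu_online() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the sysfs path of the hotplug control file for one cpu.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int cpu_online_path(char *buf, size_t len, int cpu)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "0" (take offline) or "1" (bring online) to the control file.
 * Returns 0 on success, -1 on any failure (no such cpu, no permission,
 * hotplug not configured, or the kernel refusing the request).
 */
int set_cpu_online(int cpu, int online)
{
    char path[64];
    FILE *f;
    int ok;

    assert(cpu >= 0);
    if (cpu_online_path(path, sizeof(path), cpu) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fputs(online ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)   /* the write is only reported at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_cpu_online(3, 0) would ask the kernel to unplug cpu 3, and set_cpu_online(3, 1) would bring it back. On many systems the boot cpu has no such control file, so the call simply fails for it.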

Furthermore, we should not want anything outside of that: either the cpu
is there, available for work, or it's not -- halfway measures don't make
sense.

Furthermore, we already have power aware scheduling which tries to
aggregate idle time on cpu/core/packages so as to maximize the idle time
power savings. Use it there.

Some context...

In the past, server room power and thermal issues were handled
either by spending too much money to provision power and
thermals for theoretical worst case, or by abruptly shutting off
servers when hard limits were reached.

Going forward, platforms are getting smarter, measuring how
much power is drawn from the power supply, measuring the room
thermals etc. so that real dollars can be saved by deploying
systems that exceed the theoretical worst case if the power
and thermal limits are enforced.

So if a server approaches its budget, the platform
will notify the OS to limit its P-states, and limit its T-states
in order to draw less power.

If that is not sufficient, the platform will ask us to take
processors off-line. These are not processors that are otherwise idle
-- those are already saving as much power as they can --
these are processors that are fully utilized.

So power-aware scheduling is moot here: this isn't the
partially idle case, this is the fully utilized case.

If power draw continues to be too high, the platform
will simply ask us to take more processors off line.

If this dance doesn't reduce power below that required,
the platform will be shut off.
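
The escalation described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum, the struct, and next_action() are hypothetical names, the budget is modelled as a simple watt comparison, and treating P-state and T-state limiting as two separate steps is an assumption of the sketch, not something the platform interface is known to require.

```c
#include <assert.h>

/* Hypothetical escalation steps, in the order the text describes them. */
enum power_action {
    ACT_NONE,           /* within budget: nothing to do */
    ACT_LIMIT_PSTATES,  /* cap the performance states */
    ACT_LIMIT_TSTATES,  /* add clock throttling */
    ACT_IDLE_ONE_CPU,   /* stop using one more fully utilized cpu */
    ACT_SHUTDOWN        /* nothing left to shed: platform powers off */
};

/* Hypothetical snapshot of what has already been done. */
struct platform_state {
    int pstates_limited;   /* nonzero once P-states are capped */
    int tstates_limited;   /* nonzero once T-states are capped */
    int cpus_in_service;   /* cpus still accepting work */
};

/* Pick the next step for the current power draw and budget. */
enum power_action next_action(const struct platform_state *s,
                              int draw_watts, int budget_watts)
{
    assert(s != 0 && s->cpus_in_service >= 1);

    if (draw_watts <= budget_watts)
        return ACT_NONE;
    if (!s->pstates_limited)
        return ACT_LIMIT_PSTATES;
    if (!s->tstates_limited)
        return ACT_LIMIT_TSTATES;
    if (s->cpus_in_service > 1)
        return ACT_IDLE_ONE_CPU;   /* repeated on each call while over budget */
    return ACT_SHUTDOWN;
}
```

The caller would apply the returned step, update the state, and ask again for as long as the draw stays above the budget, which is the "dance" referred to above.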

So it is sufficient to simply not schedule cpu burners
on the 'idled' processor. Interrupts should generally
not matter -- and if they do, we'll end up simply idling
an additional processor.
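
As a toy model of "simply not schedule on the idled processor": the set of cpus that may run tasks is a bitmask, idling a cpu clears its bit, and task placement only ever looks at the remaining bits. The type cpu_mask_t and both functions are hypothetical and are not the kernel's cpumask API or the interface added by the patch.

```c
#include <assert.h>

/* Hypothetical mask: bit n set means cpu n may be given tasks. */
typedef unsigned long cpu_mask_t;

/* Remove one cpu from the set the scheduler may place tasks on. */
cpu_mask_t mask_idle_cpu(cpu_mask_t allowed, unsigned int cpu)
{
    assert(cpu < sizeof(cpu_mask_t) * 8);
    return allowed & ~(1UL << cpu);
}

/* Return the lowest-numbered cpu still allowed, or -1 if none is left. */
int pick_cpu(cpu_mask_t allowed)
{
    unsigned int cpu;

    for (cpu = 0; cpu < sizeof(cpu_mask_t) * 8; cpu++)
        if (allowed & (1UL << cpu))
            return (int)cpu;
    return -1;
}
```

Nothing in this model touches interrupt routing, which matches the point above: the idled cpu may still take interrupts, and if that costs too much power the platform just asks for one more bit to be cleared.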

Besides, a hot-removed cpu sits in a dead-loop halt, which isn't power
efficient. Making a hot-removed cpu enter a deep C-state has been on the wish
list for a long time, but is still not available. The acpi_processor_idle is a
module, and the cpuidle governor potentially can't handle an offline cpu.

Then fix that hot-unplug idle loop. I agree that the hlt thing is silly,
and I've no idea why it's still there; it seems like a much better candidate
for your efforts than this.

CONFIG_HOTPLUG_CPU has been problematic in the past.
It does more than what we need here, so we thought
a lighter-weight and lower-latency method that simply
didn't schedule to the idled cpu would suffice.

Personally, I don't think that CONFIG_HOTPLUG_CPU should exist,
taking processors on and off-line should be part of CONFIG_SMP.

A while back when I selected CONFIG_HOTPLUG_CPU from ACPI && SMP,
there was a torrent of outrage that it infringed on users' rights
to save the additional 18KB of memory that CONFIG_HOTPLUG_CPU
includes and SMP does not...

We are fixing the hot-unplug idle loop, but there
turn out to be some issues with it related to idle
processors with interrupts disabled that don't actually
get down into the deep C-states we request :-(

So this is why you see a patch for a "halfway measure",
it does what is necessary, and does nothing more.

-Len



