Re: [PATCH] AMD Thermal Interrupt Support



On Dec 30, 2007 10:39 AM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
"Russell Leidich" <rml@xxxxxxxxxx> writes:

Not sure you have addressed any of my feedback; don't see many changes.

I emailed you on 12/7/2007 and never heard back, so I assumed that you
had no further issues. Anyway, thanks for getting back to me.


When you repost stuff can you please add a changelog or if you decide
to not address some review comment say why at least.

Also the patch changelog description is missing anyways?

Sorry. I'll add a robust description with the next revision of the patch.

In general you seem to have a lot (too many?) of low level comments,
but no high level "strategy" comment anywhere what this code actually
tries to do. I find the high level comments usually more useful.


I tried to add high-level comments at the top of each function. But I
can add more.


Biggest issue I raised is still not addressed:

+ /*
+ * If any of the northbridges has PCI ID 0x1103, then its thermal
+ * hardware suffers from an erratum which prevents this code from
+ * working, so abort.
+ */
+ for (nb_num = 0; nb_num < num_k8_northbridges; nb_num++) {
+ if ((k8_northbridges[nb_num]->device) == 0x1103)
+ goto out;
+ }

AFAIK that's all K8s so the code will never work on them.

Well, yes, this is Barcelona-only at the moment (and in all
likelihood, will extend to some future CPUs). Indeed, as far as my
testing has determined, there is no stepping of K8s which properly
implements thermal throttling. Even Rev F has a crippling erratum.


amd_cpu_callback:

- if (cpu >= NR_CPUS)
- goto out;
-

that change seems to be unrelated cleanup. Such patches should be always separate.

My buddy at Google suggested that I might as well fix this while I'm
tearing up the code. But OK, I'll remove it. I may or may not submit
a further patch.

Same with the rename.

I disagree here. The original code was exclusively focussed on
setting some MCE-related "threshold". With my changes, it's more
generic. "amd_" might not be the most appropriate prefix, but
"threshold_" certainly is not.

+ printk(KERN_CRIT "CPU 0x%x: Thermal monitoring not "
+ "functional.\n", cpu);

Why is that KERN_CRIT? Does not seem that critical to me.

So what this message really says is: "The OS cannot hook the thermal
interrupt because it has been hijacked or misconfigured by the BIOS.
Therefore, the throttling MCEs which you might naturally expect to see
on other Barcelona systems will not occur on this one. Therefore, if
your cooling policy relies on these MCEs (bad idea, but legal), it
will fail, potentially resulting in fire or the loss of user data."
For example, if you're expecting to be warned at 50C that your CPU has
tripped the throttling threshold, you may never receive this warning,
and therefore may never turn on the fan. Just because the CPU itself
may automatically shut down at 100C does not mean that the system
itself can withstand CPU temperatures this high, so fire is a remote
possibility. If that's not CRIT, then tell me what level is
appropriate, and I'll change it. To Valdis' point: the code is only
checking here for whether or not the BIOS has preempted the OS'
ability to hook the interrupt; there is no way for the code to
determine whether or not the sensor is actually functional, however
any client code which relies on its production of MCEs would fail, as
I just explained.

I find it is quite common to do smp_call_function_single in CPU up/down callbacks.
It would be better to fix that generically in the high level code (provide
a new callback that is guaranteed to run on the target CPU) than to always reimplement
it.

I agree, but it sounds like that should be the subject of another
patch which touches lots of other components.

The erratum number/part number should be documented and the kernel ought to print
why it didn't initialize thermal on that hardware.

I don't think there's a need for this, because 0x1103 came before
Barcelona; therefore, we can just consider this as a
Barcelona-and-later feature.

You're technically racy against parallel cpu hotplug happening from initrd
(which already runs during initcall -- yes that is a deathtrap)
although that is typically hard to fix.

Can you elaborate? I'm assuming that I still need to fix this in
order to get the patch accepted, no?

thermal_apic_init_allowed seems like a hack. A separate notifier would
be cleaner.

A variable is always lighter than a notifier. I'm just trying to make
sure that if a CPU comes online before I'm ready to hook the thermal
interrupt, that it does not attempt to do so prematurely. I'm not
sure what a notifier would do, other than set
thermal_apic_init_allowed anyway.

PCI is already initialized before normal initcall, otherwise pretty much
all drivers would break.

I'll change the comment. I still want the convenience of a process
context, so I plan to keep the late_initcall().

mce_thermal.c seems to be just unnecessary to me. It would be cleaner
to just keep the separate own handlers, especially since they do quite
different things anyways. Don't mesh together what is quite different.

As I mentioned to Andrew Morton, this is not easily avoidable without
some gross runtime CPUID hack. Specifically, how do you handle a
kernel build which supports both AMD and Intel thermal throttling,
wherein you don't know which CPU -- if either -- is present until
runtime? The root of the problem is that both architectures share the
same APIC vector, but implement throttling in incompatible ways.

Also "cpu_specific_smp_thermal_interrupt_callback_t" is definitely too long.

I'll delete some characters to make it more obscure and Linux-like.

--
Russell Leidich
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: Setup code crashes my old 486 box
    ... CPU detect code we already have). ... Would be cool to have this patch one merged soon :-) ... Attached is your patch with my little build fix. ... +extern struct cpu_features cpu; ...
    (Linux-Kernel)
  • Re: linux-next: Tree for June 13: IO APIC breakage on HP nx6325
    ... On Wednesday, 18 of June 2008, Thomas Gleixner wrote: ... Fix below. ... The main symptom is that CPU loads are computed incorrectly (I got X using 126% ... Reverting the above-mentioned patch fixes those problems. ...
    (Linux-Kernel)
  • Re: [PATCH] fix rcu vs hotplug race
    ... I believe Gautham's fix at http://lkml.org/lkml/2008/6/27/9 is ... this is the patch i picked up: ... Dhaval Giani reported this warning during cpu hotplug stress-tests: ...
    (Linux-Kernel)
  • Re: [PATCH]: PCI: GART iommu alignment fixes [v2]
    ... The cpu return address and the DMA bus master address are both ... AMD IOMMU currently also violates this ... I will send a patch to fix that there too. ...
    (Linux-Kernel)
  • Re: RT patch acceptance (scheduler)
    ... > lock up the CPU in IRQ mode for human-perceptible time, ... For non-DMA IDE access data copies are CPU driven ... which can create tons of latency problems for that case. ... I suggest that you read the patch for the answer to softirq ...
    (Linux-Kernel)