Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree



On Tue, Jul 04, 2006 at 11:09:47AM +0100, Alan Cox wrote:
On Tue, 2006-07-04 at 11:23 +0200, Andi Kleen wrote:
Regarding your buzzwords: I don't think mcelog is in any way
less "manageable" or "consistent" than EDAC.

It's chip-specific rather than generalised, so you need awareness of it.

You mean the final output?

I guess it would be possible to add a generic output format
for memory errors in mcelog, but it's not clear you can always get
the same information from different chipsets.


Hmm, I haven't checked, but my understanding was that the newer
Intel chipsets all forward memory errors as machine
checks anyway.

Quite a few still in use do not. We also have no idea where the future

New ones? Would surprise me.

All the world is not x86.

The rest of the world either doesn't do significant error handling
(embedded, low-end) or has its own machine check error handling
similar to mcelog (POWER, IA64).

OK, SPARC, PA-RISC and old SGI MIPS are left out currently, but I'm sure the
maintainers will attack that eventually if there is need.

We don't have a generic interface for logging some of the other errors
(like PCI-E errors), but I don't see EDAC solving that. In some ways
it's understandable because there is no generic PCI-E error handling
code at all yet.

EDAC solves that for the PCI bus side. It's only solving the logging
side not the "ok it exploded, now what" question - although there are
some unrelated IBM patches in that area.

Yes, some of that might still be useful for legacy systems.

In the future it should be more standardized with the standard x86
machine check architecture and standardized PCI Express advanced
error handling. So generic drivers should do the heavy lifting.

I'm not disputing it is still useful for some old systems; it just
doesn't seem to be the right path forward for new ones.

Is there work going on to hook up the old EDAC drivers for PCI errors to
the new error handling?


The ECC code predates the MCE bits by years. The re-doing occurred
rather earlier. Rather more useful would be to get the common interface

Earlier than the x86-64 machine check code?

Linux 1.2 I believe, certainly by 2.0

Doubtful you wrote a K8 error handler in that time frame ;-)


Giving a consistent sysfs interface is a bit harder, but I suppose one
could change the code to provide pseudo banks for enable/disable too.
However that would be system specific again, so a default "all on/all off"
policy might be quite ok.

I think we need the basic consistent sysfs case. Whether that is

What should I do?

-Andi


