Re: GART error 11 (fwd)

From: Andi Kleen (ak_at_muc.de)
Date: 05/27/04

  • Next message: Arjan van de Ven: "Re: 4k stacks in 2.6"
    To: Arthur Perry <kernel@linuxfarms.com>
    Date:	Thu, 27 May 2004 17:26:05 +0200
    
    

    Arthur Perry <kernel@linuxfarms.com> writes:

    > Here is a posting that I dropped off in RedHat's amd64-list.
    > It is a kernel related issue, so if anybody has any insight or opinion of
    > proper implementation here, please jump in!

    Machine Check Exceptions are first and foremost hardware issues, not
    kernel issues. It is your CPU trying to tell you that something is wrong
    in the hardware.

    The 2.4 MCE code tends to label unrelated MCEs as "GART error" because
    of bugs in the MCE decoding functions. There is a full fix for that
    in the works.

    Some early 2.4 kernels also managed to trigger a CPU bug by writing
    directly to the nb registers. This should be fixed in later 2.4 kernels
    and also in SuSE SLES8-SP3.

    The best alternative is to use 2.6, which has much improved MCE handling.

    -Andi
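
To make the point about MCE decoding concrete, here is a minimal sketch of reading the architectural flag bits of an MCi_STATUS value. It is not the 2.4 or 2.6 kernel code. The helper names mce_error_code and mce_describe and the output format are hypothetical, chosen only for illustration. The model-specific error code fields, where a label such as "GART error" would come from, are deliberately left undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits of an MCi_STATUS machine check register. */
#define MCI_STATUS_VAL   (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_OVER  (1ULL << 62) /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61) /* error was not corrected by hardware */
#define MCI_STATUS_PCC   (1ULL << 57) /* processor context may be corrupt */

/* Hypothetical helper: the low 16 bits carry the MCA error code. */
static uint16_t mce_error_code(uint64_t status)
{
    return (uint16_t)(status & 0xffff);
}

/*
 * Hypothetical helper: write a one-line summary of a bank's status into buf.
 * Returns 0 and an empty string if the VAL bit is clear (nothing logged),
 * otherwise the snprintf-style length of the summary.
 */
static int mce_describe(unsigned bank, uint64_t status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (!(status & MCI_STATUS_VAL)) {
        buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "bank %u: %s%s%serror code 0x%04x", bank,
                    (status & MCI_STATUS_UC) ? "uncorrected " : "corrected ",
                    (status & MCI_STATUS_OVER) ? "overflow " : "",
                    (status & MCI_STATUS_PCC) ? "context-corrupt " : "",
                    (unsigned)mce_error_code(status));
}
```

A decoder built this way reports only what the flag bits say and prints the raw error code, which avoids attaching a specific label to an error it cannot actually identify.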


