RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

From: Nguyen, Tom L (tom.l.nguyen_at_intel.com)
Date: 03/17/05

    Date:	Thu, 17 Mar 2005 10:53:46 -0800
    To: "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
    
    

    On Wednesday, March 16, 2005 7:20 PM Benjamin Herrenschmidt wrote:
    >> What mechanism (message??) is used to perform the bus and/or link
    >> level reset? For PCI Express the reset is performed by the upstream
    >> port driver. My API takes this into account. Are you assuming the PCI
    >> device on the bus does the reset or will there be a PCI bus driver that
    >> will do the reset? How will the PCI error handling code initiate a
    >> reset?
    >
    >The "caller", that is the error management framework. I'm defining the
    >API at the driver level, not the implementation at the core level.
    >
    >For example, on IBM pSeries with PCI-Express, we will probably not have
    >an AER driver. This will all be dealt with by the firmware, which will
    >mimic that to the existing EEH error management. We'll have the same
    >API to do the reset that we have today for resetting a slot.

    We decided to implement PCI Express error handling based on the PCI
    Express specification in a platform-independent manner. This allows any
    platform that implements PCI Express AER per the PCI SIG specification
    to take advantage of the advanced features, much like the SHPC hot-plug
    or PCI Express hot-plug implementations.

    >You may have noticed in general that I didn't define who is calling
    >those callbacks either. It's all implicit that this is done by platform
    >error management code. For example, on ppc64, even the recovery step
    >requires action from the platform since the slot has been physically
    >isolated. After we have notified all drivers with the "error detected"
    >callback, if we decide we can try the "recover" step (all drivers
    >returned they could try it and we decided the error wasn't too fatal)
    >we will call the firmware to re-enable IOs on the slot and call the
    >"recover" step.
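
    The sequence described in the quoted text can be sketched roughly as
    follows. This is only an illustration under stated assumptions, not
    the proposed API: err_vote, err_driver, slot_try_recover and the
    demo_* helpers are hypothetical names, and reenable_io stands in for
    the firmware call that re-enables I/O on the isolated slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical answers a driver can give to the "error detected" callback. */
enum err_vote { VOTE_CAN_RECOVER, VOTE_CANNOT_RECOVER };

/* Hypothetical per-driver callbacks; the names are illustrative only. */
struct err_driver {
    enum err_vote (*error_detected)(void);
    int (*recover)(void);            /* returns 0 on success */
};

/*
 * Notify every driver on the slot, then attempt the "recover" step only
 * if the error is not fatal and every driver said it could try.
 * reenable_io stands in for the firmware call that re-enables I/O on the
 * isolated slot. Returns 0 if all drivers recovered, -1 otherwise.
 */
int slot_try_recover(const struct err_driver *drv, size_t n, int fatal,
                     int (*reenable_io)(void))
{
    int all_can_recover = 1;
    size_t i;

    assert(drv != NULL && reenable_io != NULL);

    /* Every driver is notified, even after one has already declined. */
    for (i = 0; i < n; i++)
        if (drv[i].error_detected() != VOTE_CAN_RECOVER)
            all_can_recover = 0;

    if (fatal || !all_can_recover)
        return -1;
    if (reenable_io() != 0)
        return -1;

    for (i = 0; i < n; i++)
        if (drv[i].recover() != 0)
            return -1;
    return 0;
}

/* Trivial stand-ins so the sketch can be exercised. */
enum err_vote demo_vote_yes(void) { return VOTE_CAN_RECOVER; }
enum err_vote demo_vote_no(void)  { return VOTE_CANNOT_RECOVER; }
int demo_ok(void)   { return 0; }
int demo_fail(void) { return -1; }
```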

    For PCI Express, the endpoint device driver can take recovery action on
    its own, depending on the nature of the error, so long as it does not
    affect the upstream device. This can include endpoint device resets.
    We expect the driver to do this upon error notification, if possible.
    In PCI Express, since the driver has the most knowledge of the error,
    it is best placed to do device-dependent recovery and I/O retry. If its
    recovery fails, the AER driver will ask the upstream device driver to
    perform the link reset. Since this is more of a side effect, an explicit
    call to recover is not necessary. However, we understand and agree that
    it is needed to support the general error recovery cases for PCI.
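
    As a rough sketch of the escalation described above (the endpoint
    driver tries first, and the AER driver falls back to a link reset by
    the upstream driver), assuming hypothetical names throughout:
    endpoint_driver, port_driver, aer_handle_error and the demo_*
    helpers are not part of any existing interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes for one error event. */
enum recovery_result {
    RECOVERED_BY_DRIVER,
    RECOVERED_BY_LINK_RESET,
    RECOVERY_FAILED
};

/* Hypothetical endpoint driver hook: device-specific recovery, 0 on success. */
struct endpoint_driver {
    int (*try_recover)(void);
};

/* Hypothetical upstream port driver hook: link reset, 0 on success. */
struct port_driver {
    int (*reset_link)(void);
};

/*
 * Let the endpoint driver attempt its own recovery when it is notified
 * of the error. Only if that fails (or the driver has no handler) is the
 * upstream port driver asked to reset the link.
 */
enum recovery_result aer_handle_error(const struct endpoint_driver *ep,
                                      const struct port_driver *upstream)
{
    assert(ep != NULL && upstream != NULL);

    if (ep->try_recover != NULL && ep->try_recover() == 0)
        return RECOVERED_BY_DRIVER;

    if (upstream->reset_link != NULL && upstream->reset_link() == 0)
        return RECOVERED_BY_LINK_RESET;

    return RECOVERY_FAILED;
}

/* Trivial stand-ins so the sketch can be exercised. */
int demo_succeed(void) { return 0; }
int demo_give_up(void) { return -1; }
```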

    To support the AER driver calling an upstream device to initiate a reset
    of the link, we need a specific callback, since the driver doing the
    reset is not the driver that received the error. In the case of general
    PCI, this could be useful if a PCI bus driver were available to support
    the callback for a bridge device. This would also support specific error
    recovery calls to reset an endpoint adapter. In short, we need a call
    that requests a driver to perform a reset on a link or device.
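
    One possible shape for such a call, as a minimal sketch only: the
    reset request is routed to the nearest upstream device whose driver
    provides a reset callback, or to the endpoint's own driver for a
    device reset. The names reset_scope, reset_ops, dev_node,
    request_link_reset, request_device_reset and demo_count_reset are
    all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scope of a requested reset. */
enum reset_scope { RESET_LINK, RESET_DEVICE };

struct dev_node;

/* Hypothetical callback table a driver may provide; 0 means success. */
struct reset_ops {
    int (*reset)(struct dev_node *target, enum reset_scope scope);
};

/* Minimal stand-in for a device in the bus hierarchy. */
struct dev_node {
    const char *name;
    struct dev_node *parent;         /* upstream device, NULL at the root */
    const struct reset_ops *ops;     /* NULL if the driver has no callback */
    int reset_count;                 /* bookkeeping for the demo only */
};

/*
 * Ask the nearest upstream device that implements the callback to reset
 * the link. The driver doing the reset is not the driver of the device
 * that reported the error. Returns -1 if no upstream driver can do it.
 */
int request_link_reset(struct dev_node *failed)
{
    struct dev_node *up;

    assert(failed != NULL);
    for (up = failed->parent; up != NULL; up = up->parent)
        if (up->ops != NULL && up->ops->reset != NULL)
            return up->ops->reset(up, RESET_LINK);
    return -1;
}

/* Ask a device's own driver to reset that device (e.g. an endpoint). */
int request_device_reset(struct dev_node *dev)
{
    assert(dev != NULL);
    if (dev->ops != NULL && dev->ops->reset != NULL)
        return dev->ops->reset(dev, RESET_DEVICE);
    return -1;
}

/* Demo callback: records that a reset was requested on the target. */
int demo_count_reset(struct dev_node *target, enum reset_scope scope)
{
    (void)scope;
    target->reset_count++;
    return 0;
}

const struct reset_ops demo_port_ops = { demo_count_reset };
```

    Routing the request through a callback table keeps the error
    handling code independent of whether a port driver, a bridge driver
    or platform firmware ends up performing the reset.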

    Thanks,
    Long

