Re: Solving suspend-level confusion

From: Benjamin Herrenschmidt (benh_at_kernel.crashing.org)
Date: 07/31/04

  • Next message: Willy Tarreau: "PATCH-2.4: MTU fix for tulip driver"
    To: David Brownell <david-b@pacbell.net>
    Date:	Sun, 01 Aug 2004 07:09:28 +1000
    
    

    On Sun, 2004-08-01 at 00:23, David Brownell wrote:

    > So suspend-to-RAM more or less matches PCI D3hot, and
    > suspend-to-DISK matches PCI D3cold. If those power states
    > were passed to the device suspend(), the disk driver could act
    > appropriately. In my observation, D3cold was never passed
    > down, it was always D3hot.

    Not really, they really have nothing to do with the PCI state
    at all. Besides, the difference between PCI D3 hot and D3 cold
    is dependent on what happens on the mobo after the chip gets
    set to D3...

    > These look to me like "wrong device-level suspend state" cases.

    No. The driver interface provide a semantic that indicates in what
    state the system is going to, it's driver policy to translate that into
    appropriate actions, eventually involving some bus power state (or not,
    for some archs, we'll haev no choice but have some driver level arch
    hacks in there anyway, like some macs wanting video chips to be in D2
    state etc...)

    > > USB is another example. Typically, suspend-to-RAM wants to do a bus
    > > suspend, eventually enabling remote wakeup on some devices, and expects
    > > to recover the bus on wakeup, while suspend-to-disk is roughtly
    > > equivalent to a full shutdown & reconnect on wakeup.
    >
    > Same thing: an HCD could do the right thing if it was told to go
    > into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).

    And your whole PM code would suddently become PCI specific ...

    > Though the PM core doesn't cooperate at all there. Neither the
    > suspend nor the resume codepaths cope well with disconnect
    > (and hence device removal), the PM core self-deadlocks since
    > suspend/resume outcalls are done while holding the semaphore
    > that device_pm_remove() needs, ugh.

    That's an old problem, I don't know how to fix this one best in USB,
    I've always had problem with the whole PM semaphore for that reasons
    among others but I'll let Greg & Patrick find a solution there.

    > And FWIW Greg's now merged the CONFIG_USB_SUSPEND code
    > into his tree, as experimental ... so now all the relevant integration
    > issues can start to get sorted out. Eventually, that PM core deadlock
    > can get fixed. And remote wakeup needs work ... x86 machines need
    > a way to tell ACPI to pay attention to PME# wakeups (which Len says
    > is in some recent ACPI patch), and maybe the PPC/Mac platforms
    > will need something similar.
    >
    >
    > > > The problem I'm clear on is with PCI suspend, which some
    > > > earlier driver model PM changes goofed up. It's trying to
    > > > pass a system state to driver suspend() methods that are
    > > > expecting a value appropriate for pci_set_power_state().
    > > > You're proposing to fix that by changing the call semantics,
    > > > while I'd rather just see the call site fixed.
    > >
    > > No, I don't agree. It's a driver policy to decide what PCI state
    > > to use based on the system suspend level.
    >
    > You've not persuaded me on that point at all ...
    >
    > Consider: device PM calls aren't only made as part of changing
    > a "system suspend level". Minimally, there's the ability to suspend
    > a single device through sysfs. And in general, we need to be able
    > to do that directly ... one device suspending **must** be able to
    > trigger another one suspending, without assuming that every
    > device on the system is suspending to the same state.
    >
    > As for example, suspending a USB flash storage device must
    > be able to suspend its subsidiary storage and block devices,
    > without necessarily suspending the network link on an adjacent
    > hub port. Or a USB port on a camera (or cell phone) that must
    > act as either host or peripheral (USB OTG) ... at most one of those
    > controllers should be active at a time.

    And who resumes it ? Or you just lockup as soon as your VM try to
    swap in from the flash ? If it self-wakes up then it's driver internal
    policy. Again, the PM interface exposed to the kernel and userland has,
    imho, nothing to do with the actualy low level bus states.

    > Consider a system PM policy including "suspend all idle devices".
    > With N devices supporting only "on" and "suspend" states, that's
    > something like 2^N system suspend levels. Or more; most
    > devices support more than two suspend states (add at least
    > "off", plus often light weight suspend states). And N is system-specific,
    > so there's no way all those system states can be given small
    > integer values ... much less fit into a "u32 state".

    And ? that's not the point. "standby" is defined as a kernel PM state
    that could be used to implement that. It's per-driver policy to decide
    how to react to "standby" of the HW can only do on/off (which usually
    turns into go to "off" if the transition back to "on" is fast, or do
    nothing if not).

    There's also one thing we haven't dealt with is since queues are blocked
    etc..., there need to be some explicit action waking things up. We don't
    yet have provisions for devices themselves to trigger wakeup, either of
    the whole tree, or themselves individually, based on activity. things like
    USB remote wakeup usually hook through the firmware/motherboard to trigger
    the system wakeup, but all of this is useless for such a standby state
    where we actually want a pending request to the disk, for example, to
    wakeup the whole subsystem.

    Ben.

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Willy Tarreau: "PATCH-2.4: MTU fix for tulip driver"

    Relevant Pages

    • Re: [linux-usb-devel] USB deadlock after resume
      ... usb subsystem in the uvc driver, ... If I load the driver, and do not access the device after suspending ... (I'm looking for races in the uvc driver at the moment). ... The current state I revealed is that after suspend if the video node ...
      (Linux-Kernel)
    • Re: USB suspend issues
      ... your driver has to send down the wait-wake power IRP ... docs and USB samples driver sources for details on these mechanisms). ... If you don't want USB selective suspend, ...
      (microsoft.public.development.device.drivers)
    • Re: Totally broken PCI PM calls
      ... And we need partial tree suspend for that. ... cold most of the time as the system will remove power). ... driver must refuse suspend if it can't do it. ... So the driver decision of what state to enter into and how to wakeup ...
      (Linux-Kernel)
    • Re: [PATCH] Remove process freezer from suspend to RAM pathway
      ... activity is what driver suspend() is supposed to do. ... Things like USB has to use the ... controller to sleep and go to sleep ourselves. ...
      (Linux-Kernel)
    • [RFC][PATCH 3/3] PM: Suspend/hibernation debug documentation update
      ... it is necessary to suspend and resume a fully ... functional system with this driver loaded. ... +several times, preferably several times in a row, and separately for hibernation ... -c) Compile the driver directly into the kernel and try the STD in the test mode. ...
      (Linux-Kernel)