Re: Power Management with rootfs on SDMMC.
- From: Andreas Mohr <andi@xxxxxxxx>
- Date: Fri, 2 Jan 2009 13:21:22 +0100
On Fri, Jan 02, 2009 at 12:21:48PM +0100, Pierre Ossman wrote:
On Fri, 2 Jan 2009 11:21:52 +0100
Andreas Mohr <andi@xxxxxxxx> wrote:
There have been long threads on mobile phone and netbook related forums about issues
with seemingly "any slightly advanced use whatsoever" of partitions on SD cards.
As you may notice, you only get egg on your face when you suspend, so
it's really just the single problem. Granted, it's still a big one.
The problem being that I (just like many other users) am trying to suspend
"all the time" (my God-Given Right ;), with issues popping up "all the time"
(Intel VC switching, microcode module, ath5k, and SD slots, just to name
all resume issues - now mostly working - on one single machine recently).
And I'm just fed up with it, sorry to have to put it that bluntly.
IMHO in this increasingly netbook- and mobile-phone-centric world it's
a bloody shame that:
- we have a hanging suspend/resume on an SD rootfs (often the only way of
achieving serious Linux use on a mobile phone!)
I take it this is without CONFIG_MMC_UNSAFE_RESUME.
Indeed (and I admittedly haven't even run any .28 tests yet to recheck the
previously observed suspend hangs, resume hangs
and partition corruption).
But one of my items was that CONFIG_MMC_UNSAFE_RESUME itself seems a
pretty inflexible and _hard-wired-selectable_ workaround measure anyway.
The fundamental problem is that we have no way of detecting if a card
was removed during suspend, meaning we cannot guarantee that we'll
return the hardware to the upper layers in the same state it was
before the suspend.
There are two improvements that can be made here:
- Don't power down the card during suspend. This eats more power and
might not be supported on all systems, but it allows us to detect any
removal. This has been on my todo list for ages, but I haven't found
any time to implement it (or even test if I have any systems that
might support it).
While this would improve things, it seems to be the second-best solution only,
especially since this probably requires properly working removal
notification for _every_ controller type.
- Have upper layers handle removal detection. E.g. in the common case
of rootfs, the filesystem driver verifies that the storage is in the
same state when it resumes as it was when it suspended. This requires
a lot of work though as AFAIK there is no suspend functionality in
either the block layer or the VFS.
To me this seems to be the clearly preferred method.
(CC'd VFS; I had pondered doing so before but had held off until now)
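A minimal sketch of what such a filesystem-level check could look like, written as plain userspace C rather than against any real VFS or block-layer interface. The names `fs_suspend_snapshot`, `fs_snapshot_on_suspend` and `fs_verify_on_resume` are hypothetical, and hashing a superblock-sized buffer with FNV-1a is an assumption made purely for illustration; a real filesystem might instead compare fields it already tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state a filesystem could remember across suspend. */
struct fs_suspend_snapshot {
    uint64_t digest;  /* hash of the on-disk metadata at suspend time */
    size_t len;       /* number of bytes that were hashed */
};

/* 64-bit FNV-1a hash; chosen here only because it is short. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;           /* FNV prime */
    }
    return h;
}

/* Called just before suspend with the metadata bytes as last written. */
struct fs_suspend_snapshot fs_snapshot_on_suspend(const uint8_t *sb, size_t len)
{
    assert(sb != NULL);
    struct fs_suspend_snapshot s = { fnv1a64(sb, len), len };
    return s;
}

/* Called after resume with the bytes freshly re-read from the media.
 * Returns true only if they match what was recorded at suspend. */
bool fs_verify_on_resume(const struct fs_suspend_snapshot *snap,
                         const uint8_t *sb, size_t len)
{
    return sb != NULL && len == snap->len && fnv1a64(sb, len) == snap->digest;
}
```

The idea would be to take the snapshot as the last step before suspend and to refuse further use of the mount if the check fails after resume, which catches both a swapped card and a card that was modified elsewhere in the meantime.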
- we lose partition mounts due to full device re-probing instead of re-using the
same minor device ID after resume
This is a block layer issue, and I don't know if it's fixable.
Basically the problem is that someone is keeping the resources
associated with the pre-suspend block device pinned in memory. When the
post-suspend block device is created, it cannot reuse the device IDs
since they are still in use.
I thought so, but someone would need to get to the bottom of this
and figure out a way to get proper suspend routine support (or a workaround).
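To make the pinning problem concrete, here is a toy model in plain C. It is not the block layer's actual allocator; `minor_table`, `minor_alloc`, `minor_get` and `minor_put` are hypothetical names, and the "lowest free number" policy is an assumption for illustration only.

```c
#include <assert.h>

#define MAX_MINORS 8

/* Toy model only: one reference count per minor number. */
struct minor_table {
    int refcount[MAX_MINORS];
};

/* Hand out the lowest minor with no references, or -1 if none is free. */
int minor_alloc(struct minor_table *t)
{
    for (int i = 0; i < MAX_MINORS; i++) {
        if (t->refcount[i] == 0) {
            t->refcount[i] = 1;  /* reference held by the driver */
            return i;
        }
    }
    return -1;
}

/* Take an extra reference, e.g. on behalf of a mounted filesystem. */
void minor_get(struct minor_table *t, int minor)
{
    assert(minor >= 0 && minor < MAX_MINORS && t->refcount[minor] > 0);
    t->refcount[minor]++;
}

/* Drop one reference; the minor becomes reusable only at zero. */
void minor_put(struct minor_table *t, int minor)
{
    assert(minor >= 0 && minor < MAX_MINORS && t->refcount[minor] > 0);
    t->refcount[minor]--;
}
```

In this model, if a mount still holds its reference when the driver drops its own at suspend, the next `minor_alloc()` after resume has to return a different number, which is the "lost partition mounts" symptom described above.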
- installing a swap partition on an SD card and then resuming can easily
go as far as __completely corrupting__ the entire SD card partitioning
plus first partition (corrupts first 1kB of the card: both table and partition)
People then immediately resort to a non-helpful "Don't Do This, Ever" reply
(using swap partition on SD and suspend, see http://dev.laptop.org/ticket/6532#comment:10),
Hmm... I was under the impression that they got this fixed nice and
proper. Perhaps comment 34 should be sent to lkml and/or added to the
kernel bugzilla.
Right, #34 seems to describe pretty much what I think should be done
(keep things powered-down, then resume and compare with existing remembered
media id and revive old device handle in case it's actually same card).
("media id" above preferably being a generic kernel concept of a media id
mechanism supported for all sorts of different media that a controller
may allow the kernel to support).
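The compare-and-revive step could be sketched roughly as follows. This is userspace C for illustration, not existing kernel code: `struct media_id`, `resume_action` and `resume_decide` are hypothetical names, and the 16-byte limit is an assumption sized to fit a 128-bit identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEDIA_ID_MAX 16  /* assumption: enough for a 128-bit identifier */

/* Hypothetical generic media identifier; not an existing kernel type. */
struct media_id {
    uint8_t bytes[MEDIA_ID_MAX];
    size_t len;  /* number of valid bytes; 0 means "no media present" */
};

enum resume_action {
    RESUME_REVIVE_HANDLE,   /* same media: keep the old device handle */
    RESUME_DISCARD_HANDLE,  /* media gone: drop all references and warn */
    RESUME_REPROBE_NEW      /* different media: discard old, probe new */
};

static bool media_id_equal(const struct media_id *a, const struct media_id *b)
{
    assert(a->len <= MEDIA_ID_MAX && b->len <= MEDIA_ID_MAX);
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}

/* Decide what to do after resume, given the id remembered at suspend
 * and the id read back from the slot (NULL or len 0 if it is empty). */
enum resume_action resume_decide(const struct media_id *saved,
                                 const struct media_id *probed)
{
    if (probed == NULL || probed->len == 0)
        return RESUME_DISCARD_HANDLE;
    if (media_id_equal(saved, probed))
        return RESUME_REVIVE_HANDLE;
    return RESUME_REPROBE_NEW;
}
```

Note that a matching id only says it is the same card, not that its contents are unchanged; a card pulled, written to in another machine and reinserted would still match.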
As a side note, I'm voicing a "me too" of not being too happy
to see people hard-coding timeouts there to try to "fix" this issue
instead of directly trying to come up with a synchronized signalling method
to fix this race there.
Am I right in thinking that if this is fixed properly, it would be the
CONFIG_MMC_UNSAFE_RESUME way of handling things, just in a sufficiently safe
manner? (notwithstanding user stupidity, i.e. hard removal of cards)
(i.e. CONFIG_MMC_UNSAFE_RESUME would then just be made default?)
Or... hmm... perhaps CONFIG_MMC_UNSAFE_RESUME actually would already
work for me entirely with my PCIE hotplug controller
in case its driver already provides reliably timed controller reinit
/ media re-detection after resume...
Anyway, the general thinking here _has_ to be:
if a mounted card remains in the slot during suspend, then it _should_ get
re-assigned properly, and if it has been removed despite not being
unmounted, then after resume the kernel should actively discard all references
(and throw a warning or some such).
And having a special CONFIG_MMC_UNSAFE_RESUME isn't really helpful here AFAICS,
VFS (and all related layers) should be able to handle this on its own
in its entirety, and if it's not able to do this
then it is to be considered very buggy and ought to be fixed.
But this is all common wisdom anyway, I'd think; someone would have to actually
implement things to correctly work this way.
I actually thought of digging into this myself some time, but as opposed
to libata UDMA issues or WLAN LED support it's way too problematic
to tackle for me since it's said to be deep in VFS lands and debugging on
this measly machine would additionally take ages (2 hours in case one
needs an entire kernel build), plus limited time.
Thanks for your comments!
Andreas Mohr