Re: [E1000-devel] BUG: bad unlock balance detected! e1000e



On Tue, Dec 9, 2008 at 3:56 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Wed, 10 Dec 2008 00:43:46 +0100
Frederik Deweerdt <frederik.deweerdt@xxxxxxxx> wrote:

On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
On Tue, 9 Dec 2008 12:03:37 +0100
Frederik Deweerdt <frederik.deweerdt@xxxxxxxx> wrote:

It some error checking is missing in e1000e: debug contention on NVM
SWFLAG
On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
Hi

During occasional scan of message log - I've found out this BUG which
happened on Dec3 with the -rc7 from that day.
(So if it's now fixed in current git feel free to ignore :))

My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
shutdown and was actually not noticed by me...


NetworkManager: <WARN> nm_signal_handler(): Caught signal 15,
shutting down normally.
NetworkManager: <info> (eth0): now unmanaged
NetworkManager: <info> (eth0): device state change: 3 -> 1
NetworkManager: <info> (eth0): cleaning up...
NetworkManager: <info> (eth0): taking down device.

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------

(top-posting repaired. Please don't do that!!!).
Yep, sorry.

Hello Zdenek,

This could be due to 717d438d1fde94decef874b9808379d1f4523453
"e1000e: debug contention on NVM SWFLAG"
Error handling is missing from e1000_reset_hw_ich8lan so it may happen
that we don't acquire the nvm_mutex if the card times out.

Adding Thomas to CC.

yup. 2.6.27 needs fixing also.

Like this?
I don't think so, e1000_acquire_swflag_ich8lan() locks and
e1000_release_swflag_ich8lan() unlocks.

urgh, OK, I made the mistake of reading the comments.

I think it is more along the
lines of:


diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 523b971..f971b83 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
*/
ctrl |= E1000_CTRL_PHY_RST;
}
+
ret_val = e1000_acquire_swflag_ich8lan(hw);
+ if (ret_val) {
+ hw_dbg(hw, "Failed to acquire NVM swflag");
+ return ret_val;
+ }
+
hw_dbg(hw, "Issuing a global reset to ich8lan");
ew32(CTRL, (ctrl | E1000_CTRL_RST));
msleep(20);


But I'm not sure we should cancel the ongoing reset if the card times
out...


Yes, something like that. Or something like

--- a/drivers/net/e1000e/ich8lan.c~a
+++ a/drivers/net/e1000e/ich8lan.c
@@ -1940,12 +1940,14 @@ static s32 e1000_reset_hw_ich8lan(struct
ctrl |= E1000_CTRL_PHY_RST;
}
ret_val = e1000_acquire_swflag_ich8lan(hw);
- hw_dbg(hw, "Issuing a global reset to ich8lan\n");
- ew32(CTRL, (ctrl | E1000_CTRL_RST));
- msleep(20);
+ if (!ret_val) {
+ hw_dbg(hw, "Issuing a global reset to ich8lan\n");
+ ew32(CTRL, (ctrl | E1000_CTRL_RST));
+ msleep(20);

- /* release the swflag because it is not reset by hardware reset */
- e1000_release_swflag_ich8lan(hw);
+ /* release the swflag because it is not reset by hardware reset */
+ e1000_release_swflag_ich8lan(hw);
+ }

ret_val = e1000e_get_auto_rd_done(hw);
if (ret_val) {
_


Dunno. It's e1000-developer-summoning-dance time.


Actually, if we time out trying to acquire the swflag, we still want
to reset the part because we are most likely in an unrecoverable
state.

So I would suggest the following
--- a/drivers/net/e1000e/ich8lan.c~a
+++ a/drivers/net/e1000e/ich8lan.c
@@ -1940,9 +1940,10 @@ static s32 e1000_reset_hw_ich8lan(struct
ctrl |= E1000_CTRL_PHY_RST;
}
ret_val = e1000_acquire_swflag_ich8lan(hw);
hw_dbg(hw, "Issuing a global reset to ich8lan\n");
ew32(CTRL, (ctrl | E1000_CTRL_RST));
msleep(20);
+ if (!ret_val) {
-
- /* release the swflag because it is not reset by hardware reset */
- e1000_release_swflag_ich8lan(hw);
+ /* release the swflag because it is not reset by
hardware reset */
+ e1000_release_swflag_ich8lan(hw);
+ }

Of course, we will want to add a comment to the fact that we still
want to reset the part, even if we have not acquired the lock because
we are in an unrecoverable state.

I can provide a patch in a few minutes.

--
Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: BUG: bad unlock balance detected! e1000e
    ... Frederik Deweerdt wrote: ... During occasional scan of message log - I've found out this BUG which ... [BUG: bad unlock balance detected! ... But I'm not sure we should cancel the ongoing reset if the card times ...
    (Linux-Kernel)
  • Re: BUG: bad unlock balance detected! e1000e
    ... During occasional scan of message log - I've found out this BUG which ... [BUG: bad unlock balance detected! ... debug contention on NVM SWFLAG" ... But I'm not sure we should cancel the ongoing reset if the card times ...
    (Linux-Kernel)
  • Re: Warning on migrating to ATMega8515 - eeprom problem
    ... >> Yes, probably a bug, but it is questionable if it is serious. ... >> It is generally a good practice to avoid relying on reset values. ... Conditional compilation is one way. ... You can thus run the compiled result of the same source code on both CPUs ...
    (comp.arch.embedded)
  • Re: [PATCH] sata_nv.c, dont have libata perform phy reset
    ... > bug database. ... we keep libata from performing the phy reset. ... > phy reset sometimes results in the busy bit failing to deassert. ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: Login Attempts are cumulative on su-only accounts
    ... As far as I know this is not a bug; it's just the way it's always worked. ... so the unsuccessful login count is not reset. ... One way is to beat up the people who are trying to log in, and failing: ...
    (AIX-L)