[PATCH] e1000e: prevent concurrent access to NVRAM
- From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
- Date: Fri, 3 Oct 2008 01:42:14 +0200 (CEST)
On Thu, 2 Oct 2008, Olaf Kirch wrote:
On Thursday 02 October 2008 16:28:42 Jiri Kosina wrote:
15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
15:50:52 linux-pr0e kernel: Call Trace:
15:50:52 linux-pr0e kernel: [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
15:50:52 linux-pr0e kernel: [<ffffffff80493716>] dump_stack+0x69/0x6f
15:50:52 linux-pr0e kernel: [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
15:50:52 linux-pr0e kernel: [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
15:50:52 linux-pr0e kernel: [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
15:50:52 linux-pr0e kernel: [<ffffffff80251fe5>] kthread+0x47/0x73
15:50:52 linux-pr0e kernel: [<ffffffff8020d7a9>] child_rip+0xa/0x11
Looks like the e1000 watchdog racing with some dhclient activity (upping the interface).
I just noticed that the driver actually uses register pages. So it looks like it's
possible to have something like this without the mutex:
process A selects page A
process B selects page B
process A writes to register at offset A'
So we may end up writing to the wrong register. I think I heard Vojtech mention
that the e1000e also has a register based interface to erase/rewrite the NVM
programmatically. Do we know at which offsets these registers live?
Linus,
can you please apply the patch below which prevents the concurrent
access to the NVRAM. It triggered the trace which Olaf reported above.
I worked out that patch on Sept 24th and it triggered a couple of
problems in the e1000e code right away. These have been addressed by
Jesse and are part of the patch series he posted in the last days.
Still, all we have in mainline is some "hopefully bug preventing"
patch which does not address the root cause at all.
The confirmed bugs where the nvram acquire code was called
concurrently are still in your tree and the prevention patch along
with the resulting bugfixes are stuck in some obscure intel QA
process.
Please apply at least the bug prevention patch below.
Thanks,
tglx
---
Subject: e1000e prevent concurrent access and debug contention on NVM SWFALG
Date: Wed, 24 Sep 2008 00:45:08 -0700
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: jesse.brandeburg@xxxxxxxxx
---
drivers/net/e1000e/ich8lan.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
Index: linux-2.6/drivers/net/e1000e/ich8lan.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000e/ich8lan.c
+++ linux-2.6/drivers/net/e1000e/ich8lan.c
@@ -382,6 +382,9 @@ static s32 e1000_get_variants_ich8lan(st
return 0;
}
+static DEFINE_MUTEX(nvm_mutex);
+static pid_t nvm_owner = -1;
+
/**
* e1000_acquire_swflag_ich8lan - Acquire software control flag
* @hw: pointer to the HW structure
@@ -395,6 +398,15 @@ static s32 e1000_acquire_swflag_ich8lan(
u32 extcnf_ctrl;
u32 timeout = PHY_CFG_TIMEOUT;
+ WARN_ON(preempt_count());
+
+ if (!mutex_trylock(&nvm_mutex)) {
+ WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
+ nvm_owner);
+ mutex_lock(&nvm_mutex);
+ }
+ nvm_owner = current->pid;
+
while (timeout) {
extcnf_ctrl = er32(EXTCNF_CTRL);
extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
@@ -409,6 +421,8 @@ static s32 e1000_acquire_swflag_ich8lan(
if (!timeout) {
hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
+ nvm_owner = -1;
+ mutex_unlock(&nvm_mutex);
return -E1000_ERR_CONFIG;
}
@@ -430,6 +444,9 @@ static void e1000_release_swflag_ich8lan
extcnf_ctrl = er32(EXTCNF_CTRL);
extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
ew32(EXTCNF_CTRL, extcnf_ctrl);
+
+ nvm_owner = -1;
+ mutex_unlock(&nvm_mutex);
}
/**
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: [PATCH] e1000e: prevent concurrent access to NVRAM
- From: Jesse Brandeburg
- Re: [PATCH] e1000e: prevent concurrent access to NVRAM
- References:
- Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG
- From: Jiri Kosina
- Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG
- From: Olaf Kirch
- Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG
- Prev by Date: Re: [PATCH] Improve buffered streaming write ordering
- Next by Date: Re: [PATCH] printk.caller
- Previous by thread: RE: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG
- Next by thread: Re: [PATCH] e1000e: prevent concurrent access to NVRAM
- Index(es):
Relevant Pages
|