[PATCH] sysfs: fix deadlock




[ Greg, please see the sysfs fix further below. ]

* Nick Piggin <npiggin@xxxxxxx> wrote:

- moved the might_sleep() check outside the in_atomic() check,

Hmm... but then it has the same failure case again in the is_preempt()
code, does it not?

I guess we should just convert that guy to either use get_user_atomic,
(which would mean implementing that for x86), or use
copy_from_user_inatomic.

i've done the v3 patch below - that seems to have passed all my testing
without any new bugs found. I've reinstated your the clear_user()
might_fault() check, plus i removed it from __[get|put]_user_size, which
the _inatomic() API variants use. That enabled me to utilize the
_inatomic() API in probe_kernel_address().

we still have the checks in put_user()/get_user() and in all the
copy_*_user() APIs, which should be strong enough. [ I havent fully
checked whether __get_user_size() might be used by some less frequent
API - if it is then that API should grow a might_fault() check. ]

i've attached the config.

at first sight it looks like a genuine bug in fs/sysfs/bin.c?

Yes, it is a real bug by the looks. bin.c takes bb->mutex under
mmap_sem when it is mmapped, and then does its copy_*_user under
bb->mutex too.

ok - second patch attached below, Greg, could you please apply? This is
for v2.6.27 too i think.

i.e. your patches are working as expected and the extended
validation mechanism is finding real bugs :-)

Yeah it's nice. I'm just hoping we don't come across one that is as
difficult to fix as prepare_write/commit_write were ;)

Here is a basic fix for the sysfs lor.

and that did the trick here - the patch with a tidied up changelog is
attached further below. [ the second patch is standalone and does not
need the first patch which is relative to tip/master ]

thanks Nick, i think this is a great addition to lockdep! It already
found two real locking bugs within a day. If you can think of any other
proactive methods to widen our lock hierarchy knowledge that would be
great to add. I think what we want is to insert knowledge about other
unlikely lock acquire events, for locks that have a historic pattern of
producing regular locking bugs.

Ingo

----------------->

From 1d18ef489509314506328b9e464dd47c24c1d68f Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@xxxxxxx>
Date: Thu, 11 Sep 2008 20:53:21 +0200
Subject: [PATCH] x86: some lock annotations for user copy paths, v3

- add annotation back to clear_user()
- change probe_kernel_address() to _inatomic*() method

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
arch/x86/lib/usercopy_32.c | 1 +
include/asm-x86/uaccess.h | 2 --
include/linux/uaccess.h | 2 +-
3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 7393152..fab5fab 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -148,6 +148,7 @@ do { \
unsigned long
clear_user(void __user *to, unsigned long n)
{
+ might_fault();
if (access_ok(VERIFY_WRITE, to, n))
__do_clear_user(to, n);
return n;
diff --git a/include/asm-x86/uaccess.h b/include/asm-x86/uaccess.h
index 39f8420..dc8edb5 100644
--- a/include/asm-x86/uaccess.h
+++ b/include/asm-x86/uaccess.h
@@ -267,7 +267,6 @@ extern void __put_user_8(void);
#define __put_user_size(x, ptr, size, retval, errret) \
do { \
retval = 0; \
- might_fault(); \
__chk_user_ptr(ptr); \
switch (size) { \
case 1: \
@@ -320,7 +319,6 @@ do { \
#define __get_user_size(x, ptr, size, retval, errret) \
do { \
retval = 0; \
- might_fault(); \
__chk_user_ptr(ptr); \
switch (size) { \
case 1: \
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index fec6dec..2062293 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -78,7 +78,7 @@ static inline unsigned long __copy_from_user_nocache(void *to,
\
set_fs(KERNEL_DS); \
pagefault_disable(); \
- ret = __get_user(retval, (__force typeof(retval) __user *)(addr)); \
+ ret = __copy_from_user_inatomic((__force typeof(retval) __user *)(addr), &(retval), sizeof(retval)); \
pagefault_enable(); \
set_fs(old_fs); \
ret; \

------------------->

From 13f37e1febb1990ae5cabc53eeee1e78198376ff Mon Sep 17 00:00:00 2001
From: Nick Piggin <npiggin@xxxxxxx>
Date: Thu, 11 Sep 2008 12:43:23 +0200
Subject: [PATCH] sysfs: fix deadlock

On Thu, Sep 11, 2008 at 10:27:10AM +0200, Ingo Molnar wrote:

and it's working fine on most boxes. One testbox found this new locking
scenario:

PM: Adding info for No Bus:vcsa7
EDAC DEBUG: MC0: i82860_check()

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc6-tip #1
-------------------------------------------------------
X/4873 is trying to acquire lock:
(&bb->mutex){--..}, at: [<c020ba20>] mmap+0x40/0xa0

but task is already holding lock:
(&mm->mmap_sem){----}, at: [<c0125a1e>] sys_mmap2+0x8e/0xc0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){----}:
[<c017dc96>] validate_chain+0xa96/0xf50
[<c017ef2b>] __lock_acquire+0x2cb/0x5b0
[<c017f299>] lock_acquire+0x89/0xc0
[<c01aa8fb>] might_fault+0x6b/0x90
[<c040b618>] copy_to_user+0x38/0x60
[<c020bcfb>] read+0xfb/0x170
[<c01c09a5>] vfs_read+0x95/0x110
[<c01c1443>] sys_pread64+0x63/0x80
[<c012146f>] sysenter_do_call+0x12/0x43
[<ffffffff>] 0xffffffff

-> #0 (&bb->mutex){--..}:
[<c017d8b7>] validate_chain+0x6b7/0xf50
[<c017ef2b>] __lock_acquire+0x2cb/0x5b0
[<c017f299>] lock_acquire+0x89/0xc0
[<c0d6f2ab>] __mutex_lock_common+0xab/0x3c0
[<c0d6f698>] mutex_lock_nested+0x38/0x50
[<c020ba20>] mmap+0x40/0xa0
[<c01b111e>] mmap_region+0x14e/0x450
[<c01b170f>] do_mmap_pgoff+0x2ef/0x310
[<c0125a3d>] sys_mmap2+0xad/0xc0
[<c012146f>] sysenter_do_call+0x12/0x43
[<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by X/4873:
#0: (&mm->mmap_sem){----}, at: [<c0125a1e>] sys_mmap2+0x8e/0xc0

stack backtrace:
Pid: 4873, comm: X Not tainted 2.6.27-rc6-tip #1
[<c017cd09>] print_circular_bug_tail+0x79/0xc0
[<c017d8b7>] validate_chain+0x6b7/0xf50
[<c017a5b5>] ? trace_hardirqs_off_caller+0x15/0xb0
[<c017ef2b>] __lock_acquire+0x2cb/0x5b0
[<c017f299>] lock_acquire+0x89/0xc0
[<c020ba20>] ? mmap+0x40/0xa0
[<c0d6f2ab>] __mutex_lock_common+0xab/0x3c0
[<c020ba20>] ? mmap+0x40/0xa0
[<c0d6f698>] mutex_lock_nested+0x38/0x50
[<c020ba20>] ? mmap+0x40/0xa0
[<c020ba20>] mmap+0x40/0xa0
[<c01b111e>] mmap_region+0x14e/0x450
[<c01afb88>] ? arch_get_unmapped_area_topdown+0xf8/0x160
[<c01b170f>] do_mmap_pgoff+0x2ef/0x310
[<c0125a3d>] sys_mmap2+0xad/0xc0
[<c012146f>] sysenter_do_call+0x12/0x43
[<c0120000>] ? __switch_to+0x130/0x220
=======================
evbug.c: Event. Dev: input3, Type: 20, Code: 0, Value: 500
warning: `sudo' uses deprecated v2 capabilities in a way that may be insecure.

i've attached the config.

at first sight it looks like a genuine bug in fs/sysfs/bin.c?

Yes, it is a real bug by the looks. bin.c takes bb->mutex under mmap_sem
when it is mmapped, and then does its copy_*_user under bb->mutex too.

Here is a basic fix for the sysfs lor.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
fs/sysfs/bin.c | 42 +++++++++++++++++++++++++++++++-----------
1 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 006fc64..66f6e58 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -61,6 +61,7 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
int size = dentry->d_inode->i_size;
loff_t offs = *off;
int count = min_t(size_t, bytes, PAGE_SIZE);
+ char *temp;

if (size) {
if (offs > size)
@@ -69,23 +70,33 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
count = size - offs;
}

+ temp = kmalloc(count, GFP_KERNEL);
+ if (!temp)
+ return -ENOMEM;
+
mutex_lock(&bb->mutex);

count = fill_read(dentry, bb->buffer, offs, count);
- if (count < 0)
- goto out_unlock;
+ if (count < 0) {
+ mutex_unlock(&bb->mutex);
+ goto out_free;
+ }

- if (copy_to_user(userbuf, bb->buffer, count)) {
+ memcpy(temp, bb->buffer, count);
+
+ mutex_unlock(&bb->mutex);
+
+ if (copy_to_user(userbuf, temp, count)) {
count = -EFAULT;
- goto out_unlock;
+ goto out_free;
}

pr_debug("offs = %lld, *off = %lld, count = %d\n", offs, *off, count);

*off = offs + count;

- out_unlock:
- mutex_unlock(&bb->mutex);
+ out_free:
+ kfree(temp);
return count;
}

@@ -118,6 +129,7 @@ static ssize_t write(struct file *file, const char __user *userbuf,
int size = dentry->d_inode->i_size;
loff_t offs = *off;
int count = min_t(size_t, bytes, PAGE_SIZE);
+ char *temp;

if (size) {
if (offs > size)
@@ -126,19 +138,27 @@ static ssize_t write(struct file *file, const char __user *userbuf,
count = size - offs;
}

- mutex_lock(&bb->mutex);
+ temp = kmalloc(count, GFP_KERNEL);
+ if (!temp)
+ return -ENOMEM;

- if (copy_from_user(bb->buffer, userbuf, count)) {
+ if (copy_from_user(temp, userbuf, count)) {
count = -EFAULT;
- goto out_unlock;
+ goto out_free;
}

+ mutex_lock(&bb->mutex);
+
+ memcpy(bb->buffer, temp, count);
+
count = flush_write(dentry, bb->buffer, offs, count);
+ mutex_unlock(&bb->mutex);
+
if (count > 0)
*off = offs + count;

- out_unlock:
- mutex_unlock(&bb->mutex);
+out_free:
+ kfree(temp);
return count;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [GIT PULL] block/splice bits for 2.6.29
    ... Fix small typo in bio.h's documentation ... This patch is wrong, in fact Vergard has noted this in patch log. ... prefer just unifying the lock ordering. ...
    (Linux-Kernel)
  • Re: [2.6.30-rc1] RCU detected CPU 1 stall
    ... Patch below attempts to fix that, but I wouldn't be a bit surprised ... mm of a separate kernel helper thread). ... Please let us know if this patch does indeed fix the bugs you've ...
    (Linux-Kernel)
  • Re: Linux 2.6.16.16
    ... qualified person doing risk management will anyways lookup the CVE. ... It is insane to be giving lease_initthe task of freeing the lock it is ... Also fix a slab leak in __setleasedue to an uninitialised return ... If you look for where this patch was discussed on lkml, ...
    (Linux-Kernel)
  • Re: [RFC][RFT] memcg fix cgroup_mutex deadlock when cpuset reclaims memory
    ... Is the reported deadlock reproducible after this patch? ... I'll test this patch and report the result tomorrow. ... This patch can fix dead lock caused by "circular lock of cgroup_mutex", ...
    (Linux-Kernel)
  • Re: Pascal family of languages is getting crowded :-)
    ... Then he must do the actual fix, and do local, basic testing to check to ... Then a collection of bugs are put together into a build. ... Perhaps the existing regression test suite ... separating out each SKU for it's own patch. ...
    (borland.public.delphi.non-technical)