Re: Bug: fio traps into kernel without exiting because futex has a deadloop



On Thu, 2009-06-11 at 16:33 +0800, Zhang, Yanmin wrote:
On Thu, 2009-06-11 at 08:18 +0200, Peter Zijlstra wrote:
On Thu, 2009-06-11 at 07:55 +0200, Peter Zijlstra wrote:
On Thu, 2009-06-11 at 11:08 +0800, Zhang, Yanmin wrote:
I investigated a fio hang issue. When I run fio multi-process
testing on many disks, fio traps into the kernel and doesn't exit
(mostly hit once after running the sub test cases hundreds of times).

Oprofile data shows the kernel spends its time in some futex functions.
The kill command couldn't kill the process, and a machine reboot also hangs.

Eventually, I located the root cause as a bug in futex. The kernel enters
a deadloop between 'retry' and 'goto retry' in function futex_wake_op.
For an unknown reason (might be an issue in fio or glibc), parameter uaddr2
points to an area which is READONLY. So futex_atomic_op_inuser returns
-EFAULT when trying to change the data at uaddr2, but the later get_user
still succeeds because the area is READONLY. Then it goes back to retry.
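
The loop described above can be modelled in user space. The sketch below is
not kernel code: struct model_page, model_atomic_op, model_get_user,
model_wake_op and the max_tries bound are all hypothetical names invented
for illustration, and -ELOOP is just the value this sketch returns when the
bound is hit. The real loop has no such bound, which is why it never exits.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a user page: only its permissions matter here. */
struct model_page {
    bool readable;
    bool writable;
    unsigned int value;
};

/* Models futex_atomic_op_inuser(): needs write access, else -EFAULT. */
static int model_atomic_op(struct model_page *p)
{
    if (!p->writable)
        return -EFAULT;
    p->value += 1;
    return 0;
}

/* Models get_user(): needs only read access. */
static int model_get_user(unsigned int *dst, const struct model_page *p)
{
    if (!p->readable)
        return -EFAULT;
    *dst = p->value;
    return 0;
}

/*
 * Models the retry structure: try the atomic op, and on failure probe the
 * address with a read before retrying. max_tries exists only so that this
 * sketch terminates. Returns 0 on success, -EFAULT if the page cannot be
 * read, and -ELOOP when the bound is reached. *tries reports the number
 * of attempts made.
 */
static int model_wake_op(struct model_page *uaddr2, long max_tries, long *tries)
{
    unsigned int dummy;

    *tries = 0;
retry:
    if (*tries >= max_tries)
        return -ELOOP;
    (*tries)++;
    if (model_atomic_op(uaddr2) == 0)
        return 0;
    if (model_get_user(&dummy, uaddr2) != 0)
        return -EFAULT;
    goto retry;   /* read probe succeeded, so nothing stops the loop */
}
```

With a page that is readable but not writable, model_wake_op uses up every
attempt it is given; with a writable page it returns 0 on the first try.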

I created a simple test case to trigger it, which just uses shmat to attach
a READONLY area as address uaddr2.

It could be used as a DoS attack.

/me has morning juice and notices he sent the wrong commit...

commit 64d1304a64477629cb16b75491a77bafe6f86963
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Mon May 18 21:20:10 2009 +0200

2.6.30 includes the new commit. I did a quick test with my simple
test case and it still traps into the kernel without exiting.

The reason is that I use the flag FUTEX_PRIVATE_FLAG. So the fshared part in
function get_futex_key should be deleted. That might hurt performance.

FWIW, using a private futex on a shm section is wrong in and of itself.

tglx: should we create CONFIG_DEBUG_FUTEX and do a vma lookup to
validate that private futexes are indeed in private anonymous memory?

But you would be able to trigger the same using a PROT_READ anonymous
mmap().
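
The read/write asymmetry of a PROT_READ anonymous mapping can be observed
from user space without touching futex at all. The sketch below is only an
illustration: the helper names kernel_read_from, kernel_write_to and
map_page are made up here, and a pipe is used simply as a convenient way to
make the kernel copy from and to a user address. It does not reproduce the
hang itself.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the kernel to copy len bytes FROM addr (write(2) into a pipe).
 * Returns 0 on success or -errno. */
static int kernel_read_from(const void *addr, size_t len)
{
    int fds[2], err = 0;

    if (pipe(fds) != 0)
        return -errno;
    if (write(fds[1], addr, len) < 0)
        err = -errno;
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Ask the kernel to copy len bytes TO addr (read(2) from a pipe).
 * Returns 0 on success or -errno. */
static int kernel_write_to(void *addr, size_t len)
{
    static const char src[64];
    int fds[2], err = 0;

    if (len > sizeof(src))
        return -EINVAL;
    if (pipe(fds) != 0)
        return -errno;
    if (write(fds[1], src, len) < 0 || read(fds[0], addr, len) < 0)
        err = -errno;
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Map one anonymous private page with the given protection; NULL on failure. */
static void *map_page(int prot)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), prot,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

On a page from map_page(PROT_READ), kernel_read_from succeeds while
kernel_write_to fails with -EFAULT, which is the same combination of a
readable but non-writable uaddr2 that this thread is about.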

It appears access_ok() isn't as strict as we'd like it to be:

/*
...
* Note that, depending on architecture, this function probably just
* checks that the pointer is in the user space range - after calling
* this function, memory access functions may still return -EFAULT.
*/
#define access_ok(type, addr, size) (likely(__range_not_ok(addr, size) == 0))

Thomas is working on a fix for this.
