Re: Bug: fio traps into kernel without exiting because futex has a deadloop



On Thu, 2009-06-11 at 08:18 +0200, Peter Zijlstra wrote:
On Thu, 2009-06-11 at 07:55 +0200, Peter Zijlstra wrote:
On Thu, 2009-06-11 at 11:08 +0800, Zhang, Yanmin wrote:
I investigated a fio hang. When I run fio multi-process tests on
many disks, fio traps into the kernel and never exits (it typically
hits once after running the sub test cases hundreds of times).

Oprofile data shows the kernel spending its time in a few futex functions.
The kill command cannot kill the process, and a machine reboot also hangs.

Eventually, I located the root cause: a futex bug. The kernel enters an
endless loop between 'retry' and 'goto retry' in function futex_wake_op.
For an unknown reason (it might be an issue in fio or glibc), parameter uaddr2
points to an area which is READONLY. So futex_atomic_op_inuser returns
-EFAULT when trying to change the data at uaddr2, but the later get_user
still succeeds because the area is READONLY and therefore readable. Then
the code goes back to retry.
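
The retry pattern described above can be sketched in user space. This is
only a toy model under stated assumptions, not the kernel code: struct
toy_page, toy_atomic_op, toy_get_user and toy_wake_op are hypothetical
names, and the max_retries cap exists only so the sketch terminates (the
loop described above has no such cap).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a user page that is always readable
 * but may not be writable. */
struct toy_page {
    int value;
    bool writable;
};

/* Plays the role of futex_atomic_op_inuser(): a read-modify-write,
 * so it needs write access and fails with -EFAULT without it. */
int toy_atomic_op(struct toy_page *p, int delta)
{
    if (!p->writable)
        return -EFAULT;
    p->value += delta;
    return 0;
}

/* Plays the role of get_user(): a plain read, which succeeds
 * on a read-only page. */
int toy_get_user(int *dst, const struct toy_page *p)
{
    *dst = p->value;
    return 0;
}

/* Models the retry loop. Returns 0 on success, -EFAULT if the read
 * fails, and -ELOOP once max_retries is reached (assumption: the cap
 * is added here only to make the spin observable). */
int toy_wake_op(struct toy_page *p, int max_retries, int *retries)
{
    int dummy;

    *retries = 0;
retry:
    if (toy_atomic_op(p, 1) == 0)
        return 0;                 /* write worked, nothing to retry */
    if (toy_get_user(&dummy, p) != 0)
        return -EFAULT;           /* page really is inaccessible */
    if (++*retries >= max_retries)
        return -ELOOP;            /* a loop without the cap spins here forever */
    goto retry;                   /* read worked, so try the write again */
}
```

With writable set to false the read always succeeds and the write always
fails, so the only exit is the artificial cap. With writable set to true
the function returns 0 on the first pass.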

I created a simple test case to trigger it, which just attaches a READONLY
area with shmat and passes it as address uaddr2.

It could be used as a DoS attack.

/me has morning juice and notices he sent the wrong commit...

commit 64d1304a64477629cb16b75491a77bafe6f86963
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Mon May 18 21:20:10 2009 +0200

2.6.30 includes the new commit. I did a quick test with my simple
test case, and it still traps into the kernel without exiting.

The reason is that I use the flag FUTEX_PRIVATE_FLAG. So the fshared part in
function get_futex_key should be deleted. That might hurt performance.
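
The remark about the fshared part can be pictured with a toy model. This is
a sketch under assumptions, not the kernel source: struct toy_mapping and
toy_get_key are hypothetical names, and the model assumes, as the paragraph
above implies, that the writability check sits only on the shared path while
a private futex returns before reaching it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of the mapping behind a futex address. */
struct toy_mapping {
    bool writable;
};

/* Toy key lookup. fshared is false when FUTEX_PRIVATE_FLAG is used;
 * need_write is true for operations that modify user memory. */
int toy_get_key(const struct toy_mapping *m, bool fshared, bool need_write)
{
    if (!fshared)
        return 0;          /* private shortcut: writability never checked */
    if (need_write && !m->writable)
        return -EFAULT;    /* shared path: read-only mapping is rejected */
    return 0;
}
```

In this model a read-only mapping is rejected with -EFAULT only when
fshared is true. With the private flag the lookup succeeds, which matches
the test case still hanging on 2.6.30.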

Yanmin

#define _GNU_SOURCE /* must come before any #include to take effect */
#include <stdio.h>
#include <stdlib.h>
#include <linux/futex.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/syscall.h> /* For SYS_xxx definitions */
#include <sys/types.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <errno.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <sys/utsname.h>


#define PAGE_SIZE (4096)
int addr1 = 1;


/* Create a SysV shared memory segment and attach it read-only. */
int my_shmget(key_t key, int page_count, int *shmid, void **shmaddr)
{
    void *start_addr = NULL;

    if ((*shmid = shmget(key, PAGE_SIZE * page_count, IPC_CREAT | 0666)) < 0) {
        perror("shmget: Failure");
        return -1;
    }

    /* SHM_RDONLY makes the mapping readable but not writable. */
    *shmaddr = shmat(*shmid, start_addr, SHM_RDONLY);
    if (*shmaddr == (void *) -1) {
        perror("shmat: Shared Memory Attach Failure");
        shmctl(*shmid, IPC_RMID, NULL);
        return -1;
    }

    return 0;
}

/* Detach the segment and remove it. */
int my_shmput(int shmid, void *shmaddr)
{
    if (shmdt((const void *)shmaddr) != 0) {
        perror("shmdt: Detach Failure");
        return -1;
    }
    if (shmctl(shmid, IPC_RMID, NULL) != 0) {
        perror("shmctl: Remove shm id failure");
        return -1;
    }

    return 0;
}

int main(void)
{
    int *uaddr = &addr1, *uaddr2;
    int ret;
    int shmid;
    void *shmaddr;

    if (my_shmget(10673861, 10, &shmid, &shmaddr))
        exit(1);

    /* uaddr2 points into the read-only segment. */
    uaddr2 = shmaddr;

    //uaddr2 = 0;

    /* On an affected kernel this call traps into the kernel and never returns. */
    ret = syscall(__NR_futex, uaddr, FUTEX_WAKE_OP|FUTEX_PRIVATE_FLAG, 1, NULL, uaddr2, 1);

    printf("ret=%d\n", ret);

    my_shmput(shmid, shmaddr);

    return 0;
}


