Re: [PATCH] proc: readdir race fix (take 3)



On Thursday 7 September 2006 00:43, Eric W. Biederman wrote:
> Have you tested 2.6.18-rc6 without my patch?

Yes, I did; it didn't crash after a couple of hours. Of course that doesn't
prove anything, as the crash appears to be the result of a race.

I'll now apply Oleg's fix and see if things get better.

> I guess the practical question is what was your test methodology to
> reproduce this problem? A couple of more people running the same
> test on a few more machines might at least give us confidence in what
> is going on.

"My" test program forks 1000 children who sleep for 1 second then look for
themselves in /proc, warn if they can't find themselves, and exit. So
basically the idea is that the process list will shrink very rapidly at
the same moment every child does readdir(/proc).

I attached the test program. I take no credit (nor shame) for it; it was
provided to me by IBM (possibly on behalf of one of their own customers)
as a way to demonstrate and reproduce the original readdir(/proc) race
bug.

--
Jean Delvare
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

#define NUM_CHILDREN 1000

/* Scan /proc for our own pid. Returns 1 if found, 0 otherwise. */
static int findme(int i)
{
    DIR *dir;
    struct dirent *d;
    pid_t pid;
    pid_t mypid = getpid();
    int found = 0;

    if ((dir = opendir("/proc")) == NULL) {
        perror("failed to open /proc");
        exit(1);
    }

    while ((d = readdir(dir)) != NULL) {
        /* Non-numeric entries (".", "self", ...) convert to 0; skip them. */
        if ((pid = (pid_t)atoi(d->d_name)) == 0)
            continue;
        if (pid == mypid) {
            found = 1;
            break;
        }
    }
    closedir(dir);

    if (!found)
        printf("\nfailed to find myself: pid %d, iteration %d\n",
               (int)mypid, i);
    return found;
}

/* Fork one child that sleeps for a second, then looks for itself in /proc. */
static void fork_child(int i)
{
    switch (fork()) {
    case 0:     /* child */
        sleep(1);
        findme(i);
        exit(0);
    case -1:    /* error */
        perror("failed to fork");
        exit(1);
    default:    /* parent */
        break;
    }
}

int main(void)
{
    int i;

    /* Ignore SIGCHLD so exited children are reaped automatically. */
    (void)signal(SIGCHLD, SIG_IGN);

    for (i = 0; i < NUM_CHILDREN; i++)
        fork_child(i);

    return 0;
}

