Re: Obscure mutex problem
- From: John <nyrinwi@xxxxxxxxx>
- Date: Tue, 04 Sep 2007 21:57:44 -0500
David Given wrote:
I'm the developer of an SMTP greylisting daemon, spey. It has a number of
happy users and generally works quite well.
[snip]
I downloaded 0.4.1 from SourceForge but could not build it (no access to a Linux box), so what follows comes from reading the code only. Some observations.
Interesting things that I've observed include:
- at least one user reported that if the daemon was allowed to go into the
background, it would fail; but if it was told not to daemonise itself, it
would run fine.
- in studying the strace output, I notice that there doesn't seem to be a call
to futex() corresponding to the initial pthread_mutex_lock() from the main thread.
- when initially writing the code, I discovered that I had to set
PTHREAD_MUTEX_RECURSIVE to make the mutexes work at all... but then later
found this was no longer necessary. No doubt this is due to another change I
made, but I still don't understand it.
- rewriting the mutex initialisation code to use a statically initialised
mutex with PTHREAD_MUTEX_INITIALIZER, rather than initialising the mutex with
code, didn't help.
- my test machines include an ARM-based NSLU2 with linuxthreads (one kernel
process per thread) and an i386-based PC with NPTL (real kernel threads). It
works fine on both of these.
If anyone's interested in browsing the code, it's here:
http://spey.cvs.sourceforge.net/spey/spey/src/Threadlet.cc?view=markup
The main program calls Threadlet::initialise() to create and initialise the
mutex, which is then locked. The main program creates a thread that attempts
to lock the mutex, and so blocks. It then calls Threadlet::halt(), which
releases the mutex, allowing the new child thread to run, and then just does
for(;;) pause(). The child thread then acts as the socket master and creates
new (different) child threads for each incoming connection... except it
doesn't get that far. When the failure occurs, the child thread never wakes up.
Does this issue look familiar to anyone? Any suggestions as to what I can try?
Any possible lines of attack? I'm completely stuck on this one...
The fact you had to make the mutex recursive means that the same thread is taking the lock more than once. Since you didn't expect this, it suggests to me that the code is behaving in a way you did not design it to.
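One way to check for this, sketched here under my own assumptions rather than taken from the spey sources, is to switch the mutex type temporarily to PTHREAD_MUTEX_ERRORCHECK. With that type, a second pthread_mutex_lock() from the thread that already holds the lock returns EDEADLK instead of hanging, so the offending call can be logged. The helper names init_errorcheck_mutex and second_lock_result are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: initialise *m as an error-checking mutex.
// Returns 0 on success, otherwise a pthread error number.
int init_errorcheck_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Hypothetical demonstration: lock the same mutex twice from one thread
// and return the result of the second attempt.
int second_lock_result() {
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);      // first lock succeeds
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);      // relock by the owner is reported, not blocked
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

If second_lock_result() style checks in the real code ever report EDEADLK, the call site that produced it is where the unexpected second lock comes from.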
Your classes have virtual destructors defined, but no copy constructors. Inadvertent copying can wreak havoc with mutexes. It's a good idea to hide the copy constructor and assignment operator if your object tracks the state of a thread or contains mutexes. The rule of three has been good to me:
http://en.wikipedia.org/wiki/Rule_of_three_%28C%2B%2B_programming%29
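As a rough illustration of hiding the copy operations, here is a minimal sketch. GuardedCounter is a made-up class, not one from spey; the point is only the private, undefined copy constructor and assignment operator.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical example class that owns a mutex.
class GuardedCounter {
public:
    GuardedCounter() : count_(0) {
        int rc = pthread_mutex_init(&mutex_, NULL);
        assert(rc == 0);
        (void)rc;  // avoid an unused-variable warning when NDEBUG is set
    }
    virtual ~GuardedCounter() { pthread_mutex_destroy(&mutex_); }

    void increment() {
        pthread_mutex_lock(&mutex_);
        ++count_;
        pthread_mutex_unlock(&mutex_);
    }

    int value() {
        pthread_mutex_lock(&mutex_);
        int v = count_;
        pthread_mutex_unlock(&mutex_);
        return v;
    }

private:
    // Declared but never defined: copying from outside the class fails to
    // compile, and accidental copying from inside fails to link.
    GuardedCounter(const GuardedCounter&);
    GuardedCounter& operator=(const GuardedCounter&);

    pthread_mutex_t mutex_;
    int count_;
};
```

With this in place, passing a GuardedCounter by value or assigning one to another is rejected at build time, so a copied pthread_mutex_t can never end up being locked or destroyed behind your back.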
BTW, if the mutex is unlocked when pthread_mutex_lock() is called, the lock is taken entirely in user space: there is no futex() system call, so you won't see one in strace. (That's why it's called a futex, a "fast userspace mutex".)
Good luck.
John