Re: main thread pthread_exit/sys_exit bug!
- From: Roland McGrath <roland@xxxxxxxxxx>
- Date: Sun, 8 Feb 2009 19:33:35 -0800 (PST)
I don't know about other problems with the zombie leaders.
Ok, then we can just concentrate on the test case I posted.
Except, I am worried whether the fix I have in mind is correct ;)
It is simple, wait_task_stopped() should do
I think the first problem is we'll never even get into wait_task_stopped().
We'll be in wait_consider_task() on the group leader, which is EXIT_ZOMBIE.
First we need to adjust this:
if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
return wait_task_zombie(p, options, infop, stat_addr, ru);
to maybe:
if (p->exit_state == EXIT_ZOMBIE) {
if (delay_group_leader(p))
return wait_task_zombie_leader(p, options,
infop, stat_addr, ru);
return wait_task_zombie(p, options, infop, stat_addr, ru);
}
In wait_task_zombie_leader(), it will have to take the siglock and try to
figure out if there is a completed group stop to report.
if we tracer:
check ->state, eat ->exit_code
Being the ptracer does not affect the delay_group_leader logic.
It just affects individual vs group stop reports. So the existing
code path is right for ptrace.
else:
check SIGNAL_STOP_STOPPED, use ->group_exit_code
We don't want wait to change group_exit_code. But we need the "reported as
stopped" tracking that wait_task_stopped() gets by clearing exit_code. So
I think what we need is to get the zombie group_leader->exit_code to be set
to ->group_exit_code as it would have been if the leader were alive and had
participated in the group stop.
This looks logical, and should fix the problem. But this is
the user-visible change. For example,
$ sleep 100
^Z
[1]+ Stopped sleep 100
$ strace -p `pidof sleep`
Process 11442 attached - interrupt to quit
strace hangs in do_wait(). But after the fix strace will happily
proceed. I can't know whether this behaviour change is bad or not.
I think this would only happen if the "reported as stopped" bookkeeping I
mentioned above were broken. The "Stopped" line means that the shell just
did do_wait(WUNTRACED), so wait_task_stopped() cleared ->exit_code when
reporting it as stopped. Now strace does PTRACE_ATTACH and then a wait;
it can't see a fresh wait result here because ->exit_code is still zero.
100% untested concept patch follows.
Thanks,
Roland
==========
diff --git a/kernel/exit.c b/kernel/exit.c
index f80dec3..0000000 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1437,7 +1437,8 @@ static int wait_task_stopped(int ptrace,
exit_code = 0;
spin_lock_irq(&p->sighand->siglock);
- if (unlikely(!task_is_stopped_or_traced(p)))
+ if (unlikely(!task_is_stopped_or_traced(p)) &&
+ (ptrace || p->exit_state != EXIT_ZOMBIE || !delay_group_leader(p)))
goto unlock_sig;
if (!ptrace && p->signal->group_stop_count > 0)
@@ -1598,9 +1599,20 @@ static int wait_consider_task(struct tas
/*
* We don't reap group leaders with subthreads.
+ * When the group leader is dead, it still serves as
+ * a moniker for the whole group for stop and continue.
+ * But for ptrace, stop and continue are reported per-thread.
*/
- if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
- return wait_task_zombie(p, options, infop, stat_addr, ru);
+ if (p->exit_state == EXIT_ZOMBIE) {
+ if (!delay_group_leader(p))
+ return wait_task_zombie(p, options,
+ infop, stat_addr, ru);
+ *notask_error = 0;
+ if (unlikely(ptrace))
+ return 0;
+ return wait_task_stopped(p, options, infop, stat_addr, ru) ?:
+ wait_task_continued(p, options, infop, stat_addr, ru);
+ }
/*
* It's stopped or running now, so it might
diff --git a/kernel/signal.c b/kernel/signal.c
index b6b3676..0000000 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1653,6 +1653,27 @@ finish_stop(int stop_count)
}
/*
+ * Complete group stop bookkeeping after decrementing sig->group_stop_count,
+ * to new value stop_count. When it reaches zero, mark the process stopped.
+ *
+ * If the group leader is already dead, then it did not participate
+ * normally in the group stop. But its ->exit_code stands for the whole
+ * group in do_wait() bookkeeping, so we need it to reflect the stop.
+ */
+static inline void complete_group_stop(struct task_struct *tsk,
+ struct signal_struct *sig,
+ int stop_count)
+{
+ if (stop_count)
+ return;
+
+ sig->flags = SIGNAL_STOP_STOPPED;
+
+ if (tsk->group_leader->exit_state)
+ tsk->group_leader->exit_code = sig->group_exit_code;
+}
+
+/*
* This performs the stopping for SIGSTOP and other stop signals.
* We have to stop all threads in the thread group.
* Returns nonzero if we've actually stopped and released the siglock.
@@ -1696,8 +1717,7 @@ static int do_signal_stop(int signr)
sig->group_stop_count = stop_count;
}
- if (stop_count == 0)
- sig->flags = SIGNAL_STOP_STOPPED;
+ complete_group_stop(current, sig, stop_count);
current->exit_code = sig->group_exit_code;
__set_current_state(TASK_STOPPED);
@@ -1933,9 +1953,8 @@ void exit_signals(struct task_struct *ts
if (!signal_pending(t) && !(t->flags & PF_EXITING))
recalc_sigpending_and_wake(t);
- if (unlikely(tsk->signal->group_stop_count) &&
- !--tsk->signal->group_stop_count) {
- tsk->signal->flags = SIGNAL_STOP_STOPPED;
+ if (unlikely(tsk->signal->group_stop_count)) {
+ complete_group_stop(tsk, sig, --tsk->signal->group_stop_count);
group_stop = 1;
}
out:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: main thread pthread_exit/sys_exit bug!
- From: Oleg Nesterov
- Re: main thread pthread_exit/sys_exit bug!
- References:
- main thread pthread_exit/sys_exit bug!
- From: Kaz Kylheku
- Re: main thread pthread_exit/sys_exit bug!
- From: Oleg Nesterov
- Re: main thread pthread_exit/sys_exit bug!
- From: Kaz Kylheku
- Re: main thread pthread_exit/sys_exit bug!
- From: Oleg Nesterov
- Re: main thread pthread_exit/sys_exit bug!
- From: Roland McGrath
- Re: main thread pthread_exit/sys_exit bug!
- From: Kaz Kylheku
- Re: main thread pthread_exit/sys_exit bug!
- From: Oleg Nesterov
- Re: main thread pthread_exit/sys_exit bug!
- From: Roland McGrath
- Re: main thread pthread_exit/sys_exit bug!
- From: Oleg Nesterov
- main thread pthread_exit/sys_exit bug!
- Prev by Date: Re: [PATCH] PCI/MSI: fix msi_mask() (rev. 2)
- Next by Date: Re: Crypto Fixes for 2.6.29
- Previous by thread: Re: main thread pthread_exit/sys_exit bug!
- Next by thread: Re: main thread pthread_exit/sys_exit bug!
- Index(es):
Relevant Pages
|