Re: main thread pthread_exit/sys_exit bug!



I don't know about other problems with the zombie leaders.

Ok, then we can just concentrate on the test case I posted.

Except, I am worried whether the fix I have in mind is correct ;)
It is simple, wait_task_stopped() should do

I think the first problem is we'll never even get into wait_task_stopped().
We'll be in wait_consider_task() on the group leader, which is EXIT_ZOMBIE.
First we need to adjust this:

if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
return wait_task_zombie(p, options, infop, stat_addr, ru);

to maybe:

if (p->exit_state == EXIT_ZOMBIE) {
if (delay_group_leader(p))
return wait_task_zombie_leader(p, options,
infop, stat_addr, ru);
return wait_task_zombie(p, options, infop, stat_addr, ru);
}

In wait_task_zombie_leader(), it will have to take the siglock and try to
figure out if there is a completed group stop to report.

if we tracer:

check ->state, eat ->exit_code

Being the ptracer does not affect the delay_group_leader logic.
It just affects individual vs group stop reports. So the existing
code path is right for ptrace.

else:
check SIGNAL_STOP_STOPPED, use ->group_exit_code

We don't want wait to change group_exit_code. But we need the "reported as
stopped" tracking that wait_task_stopped() gets by clearing exit_code. So
I think what we need is to get the zombie group_leader->exit_code to be set
to ->group_exit_code as it would have been if the leader were alive and had
participated in the group stop.

This looks logical, and should fix the problem. But this is
the user-visible change. For example,

$ sleep 100
^Z
[1]+ Stopped sleep 100
$ strace -p `pidof sleep`
Process 11442 attached - interrupt to quit

strace hangs in do_wait(). But after the fix strace will happily
proceed. I can't know whether this behaviour change is bad or not.

I think this would only happen if the "reported as stopped" bookkeeping I
mentioned above were broken. The "Stopped" line means that the shell just
did do_wait(WUNTRACED), so wait_task_stopped() cleared ->exit_code when
reporting it as stopped. Now strace does PTRACE_ATTACH and then a wait;
it can't see a fresh wait result here because ->exit_code is still zero.

100% untested concept patch follows.


Thanks,
Roland

==========
diff --git a/kernel/exit.c b/kernel/exit.c
index f80dec3..0000000 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1437,7 +1437,8 @@ static int wait_task_stopped(int ptrace,
exit_code = 0;
spin_lock_irq(&p->sighand->siglock);

- if (unlikely(!task_is_stopped_or_traced(p)))
+ if (unlikely(!task_is_stopped_or_traced(p)) &&
+ (ptrace || p->exit_state != EXIT_ZOMBIE || !delay_group_leader(p)))
goto unlock_sig;

if (!ptrace && p->signal->group_stop_count > 0)
@@ -1598,9 +1599,20 @@ static int wait_consider_task(struct tas

/*
* We don't reap group leaders with subthreads.
+ * When the group leader is dead, it still serves as
+ * a moniker for the whole group for stop and continue.
+ * But for ptrace, stop and continue are reported per-thread.
*/
- if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
- return wait_task_zombie(p, options, infop, stat_addr, ru);
+ if (p->exit_state == EXIT_ZOMBIE) {
+ if (!delay_group_leader(p))
+ return wait_task_zombie(p, options,
+ infop, stat_addr, ru);
+ *notask_error = 0;
+ if (unlikely(ptrace))
+ return 0;
+ return wait_task_stopped(p, options, infop, stat_addr, ru) ?:
+ wait_task_continued(p, options, infop, stat_addr, ru);
+ }

/*
* It's stopped or running now, so it might
diff --git a/kernel/signal.c b/kernel/signal.c
index b6b3676..0000000 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1653,6 +1653,27 @@ finish_stop(int stop_count)
}

/*
+ * Complete group stop bookkeeping after decrementing sig->group_stop_count,
+ * to new value stop_count. When it reaches zero, mark the process stopped.
+ *
+ * If the group leader is already dead, then it did not participate
+ * normally in the group stop. But its ->exit_code stands for the whole
+ * group in do_wait() bookkeeping, so we need it to reflect the stop.
+ */
+static inline void complete_group_stop(struct task_struct *tsk,
+ struct signal_struct *sig,
+ int stop_count)
+{
+ if (stop_count)
+ return;
+
+ sig->flags = SIGNAL_STOP_STOPPED;
+
+ if (tsk->group_leader->exit_state)
+ tsk->group_leader->exit_code = sig->group_exit_code;
+}
+
+/*
* This performs the stopping for SIGSTOP and other stop signals.
* We have to stop all threads in the thread group.
* Returns nonzero if we've actually stopped and released the siglock.
@@ -1696,8 +1717,7 @@ static int do_signal_stop(int signr)
sig->group_stop_count = stop_count;
}

- if (stop_count == 0)
- sig->flags = SIGNAL_STOP_STOPPED;
+ complete_group_stop(current, sig, stop_count);
current->exit_code = sig->group_exit_code;
__set_current_state(TASK_STOPPED);

@@ -1933,9 +1953,8 @@ void exit_signals(struct task_struct *ts
if (!signal_pending(t) && !(t->flags & PF_EXITING))
recalc_sigpending_and_wake(t);

- if (unlikely(tsk->signal->group_stop_count) &&
- !--tsk->signal->group_stop_count) {
- tsk->signal->flags = SIGNAL_STOP_STOPPED;
+ if (unlikely(tsk->signal->group_stop_count)) {
+ complete_group_stop(tsk, sig, --tsk->signal->group_stop_count);
group_stop = 1;
}
out:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Profiling ftw?
    ... The stuff here is a brain dump; there's stuff about ptrace and strace ... Trace and track processes separately, ... Report on library calls ... It also won't trace calls to dlopened libraries. ...
    (Ubuntu)
  • [opensuse] kpowersave krashing on 10.2 and comments
    ... First problem #1: ... the crash handler reports that kpowersave ... moving kpowersave, linking it as /bin/true, but all for naught. ... for another user and having to see this crash report upon every ...
    (SuSE)
  • A wee bit of good news...very wee bit....
    ... Given that I am always considered to be "negative" on my evaluation of the world situation but actually do report some positive developments, I'd like to report another "wee bit" of good news: ... Second key question is about grants. ... There is going to be 24 groups, each with its group leader, and up to six members per group. ... creating some 144 jobs in a country where job creation for biomedical people is almost totally dependent on NIH funding and that funding spins out more job seekers than created jobs, we're still talking about a lot of fishermen around fishing ponds with not enough fish. ...
    (sci.research.careers)
  • Re: 2.6.28 headers break kbd and net-tools userspace builds
    ... packages and others against every new release of linux and linux-stable, (hence my initial report). ...
    (Linux-Kernel)
  • Include pdf-files in reports
    ... I have build a report which generates a packing slip for each order. ... I would like to print out invoices included in the report if possible. ... Each orders invoice is found as a pdf-file located in C:\orders\ and named like the order, ... The first problem is how do I qualify this so that access will get C:\orders\1234.pdf, and next, can this print be included in the report as a separate page following the packing list? ...
    (microsoft.public.access.reports)