Re: 2.6.26-rc5-mm1
- From: Mariusz Kozlowski <m.kozlowski@xxxxxxxxxx>
- Date: Wed, 18 Jun 2008 00:26:27 +0200
Hello,
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote on 10.06.2008 07:01:49:
On Tue, 10 Jun 2008 06:57:02 +0200 Mariusz Kozlowski <m.kozlowski@xxxxxxxxxx> wrote:
Hello,
On Mon, 9 Jun 2008 21:14:54 +0200
Mariusz Kozlowski <m.kozlowski@xxxxxxxxxx> wrote:
Hello Balbir,
Andrew Morton wrote:
Temporarily at
http://userweb.kernel.org/~akpm/2.6.26-rc5-mm1/
I've hit a segfault, the last few lines on my console are:
Testing -fstack-protector-all feature
registered taskstats version 1
debug: unmapping init memory ffffffff80c03000..ffffffff80dd8000
init[1]: segfault at 7fff701fe880 ip 7fff701fee5e sp 7fff7006e6d0 error 7
With absolutely no stack trace. I'll dig deeper.
Hey, I see something similar and I actually have a stack trace. Here it goes:
bash[498] segfault at ffffffff80868b58 ip ffffffffff600412 sp 7fff9e97f640 error 7
init[1] segfault at ffffffff80868b58 ip ffffffffff600412 sp 7fff9e97eed0 error 7
init[1] segfault at ffffffff80868b58 ip ffffffffff600412 sp 7fffa3d010f0 error 7
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.26-rc5-mm1 #1
Call Trace:
[<ffffffff80254632>] panic+0xe2/0x260
[<ffffffff802fa8ba>] ? __slab_free+0x10a/0x630
[<ffffffff80265a8e>] ? __sigqueue_free+0x5e/0x70
[<ffffffff802851eb>] ? trace_hardirqs_off+0x1b/0x30
[<ffffffff802851eb>] ? trace_hardirqs_off+0x1b/0x30
[<ffffffff80259b54>] do_exit+0xb84/0xc30
[<ffffffff80259c5a>] do_group_exit+0x5a/0x110
[<ffffffff8026a3b5>] get_signal_to_deliver+0x2c5/0x620
[<ffffffff8020bb3b>] do_notify_resume+0x11b/0xd10
[<ffffffff8028da5b>] ? trace_hardirqs_on+0x1b/0x30
[<ffffffff805cd0f3>] ? _spin_unlock_irqrestore+0x93/0x130
[<ffffffff8026865c>] ? force_sig_info+0x10c/0x130
[<ffffffff8022fb9c>] ? force_sig_info_fault+0x2c/0x40
[<ffffffff802dd7dd>] ? print_vma_addr+0x10d/0x1d0
[<ffffffff805cbb67>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8028d8da>] ? trace_hardirqs_on_caller+0x15a/0x2c0
[<ffffffff8020d4c9>] retint_signal+0x46/0x8d
This was copied manually so typos are possible.
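The number after `error` in the segfault lines above is the x86 page-fault error code. As a reading aid, here is a minimal Python sketch (helper names are hypothetical) that decodes its low three bits and checks whether an `ip` lies in the legacy x86_64 vsyscall area. The area bounds are assumed from the 2.6.26-era `VSYSCALL_START`/`VSYSCALL_END` definitions and may differ on other kernels.

```python
from typing import List

# Assumed 2.6.26-era x86_64 layout: VSYSCALL_START = -10UL << 20,
# VSYSCALL_END = -2UL << 20, interpreted as unsigned 64-bit values.
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
VSYSCALL_START = (-10 << 20) & MASK64   # 0xffffffffff600000
VSYSCALL_END = (-2 << 20) & MASK64      # 0xffffffffffe00000


def decode_pf_error(code: int) -> List[str]:
    """Decode the low three bits of an x86 page-fault error code."""
    return [
        "protection violation" if code & 0x1 else "page not present",  # bit 0
        "write" if code & 0x2 else "read",                             # bit 1
        "user mode" if code & 0x4 else "kernel mode",                  # bit 2
    ]


def in_vsyscall_area(ip: int) -> bool:
    """True if the instruction pointer lies in the (assumed) vsyscall area."""
    return VSYSCALL_START <= ip < VSYSCALL_END
```

For `error 7` this gives a protection violation on a write from user mode, and `ip ffffffffff600412` falls inside the vsyscall area, so user-mode code executing there tried to write to the kernel address `ffffffff80868b58`. This only interprets the log; it does not by itself explain the bug.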
Thanks. Could someone send a config please? Or a bisection result?
In my case it turns out to be the gcov patches, which I'm interested in
because I want to see (and play with) the test coverage.
#
# gcov
#
kernel-call-constructors.patch
kernel-introduce-gcc_version_lower-macro.patch
seq_file-add-function-to-write-binary-data.patch
GOOD
gcov-add-gcov-profiling-infrastructure.patch
GOOD
gcov-create-links-to-gcda-files-in-build-directory.patch
gcov-architecture-specific-compile-flag-adjustments.patch
BAD
I cannot bisect between the last two because of a build error. Config
is attached (cc Peter).
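The GOOD/BAD markers above are the result of bisecting the quilt series. The search can be sketched in Python as follows, assuming the unpatched tree is good and the full series is bad. `test(i)` is a hypothetical callback that applies the first `i + 1` patches, builds and boots, and returns `True` (boots), `False` (fails) or `None` (does not build, comparable to `git bisect skip`).

```python
from typing import Callable, List, Optional, Set


def bisect_series(patches: List[str],
                  test: Callable[[int], Optional[bool]]) -> List[str]:
    """Return the candidate first-bad patches of a series."""
    good = -1                 # longest prefix known to boot (-1 = no patches)
    bad = len(patches) - 1    # shortest prefix known to fail (full series)
    skipped: Set[int] = set() # prefixes that could not be built
    while True:
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            # Nothing testable is left between good and bad: all remaining
            # patches in that gap (plus the known-bad one) are suspects.
            return patches[good + 1:bad + 1]
        i = untested[len(untested) // 2]  # probe the middle of what is left
        result = test(i)
        if result is None:
            skipped.add(i)
        elif result:
            good = i
        else:
            bad = i


if __name__ == "__main__":
    series = [
        "kernel-call-constructors.patch",
        "kernel-introduce-gcc_version_lower-macro.patch",
        "seq_file-add-function-to-write-binary-data.patch",
        "gcov-add-gcov-profiling-infrastructure.patch",
        "gcov-create-links-to-gcda-files-in-build-directory.patch",
        "gcov-architecture-specific-compile-flag-adjustments.patch",
    ]

    # Hypothetical outcomes: the prefix ending at index 4 does not build,
    # the full series (index 5) fails to boot, shorter prefixes boot.
    def example_test(i: int) -> Optional[bool]:
        if i == 4:
            return None
        return i < 5

    print(bisect_series(series, example_test))  # prints the last two patches
```

With an unbuildable prefix right before the first failing one, the search can only narrow the culprit down to a range, which matches the two-patch ambiguity reported above.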
Sorry for the delay. Unfortunately I don't have as much time for this
as I'd like to.
Thanks for the report. These look like the "known architecture problems"
that I've hinted at in the gcov announcement post (I'm assuming this is
x86_64 as I've seen similar reports in the past).
Possible reasons:
1) initrd overwrites kernel: When kernel and initrd are loaded to static
addresses, the oversized gcov kernel may overlap with the initrd load
address. Solution: move initrd loading address.
Not using initrd.
2) out-of-memory: Kernel plus profiling code may just not fit into a
minimal memory configuration any more. Solution: add memory.
2G is OK, I assume.
3) write-protection of kernel code: gcc keeps profiling code and data
close together in the .text section. Solution: any mechanism that
protects .text against writes should be disabled when running a
profiled kernel.
Not using it.
4) as of yet undiscovered incompatibilities between arch-dependent code
and gcc's -fprofile-arcs option. Examples would be:
* code which is run before memory access preparations were made
* hard coded section sizes
* relative address displacements which are out of range
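Reasons 1) and 4) both come down to address arithmetic. A minimal Python sketch with hypothetical helper names: `overlaps()` tests whether two half-open address ranges collide (for example a kernel image and an initrd load area), and `fits_rel32()` tests whether a target is reachable with an x86_64 signed 32-bit relative displacement, which is measured from the address of the next instruction.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def fits_rel32(next_ip: int, target: int) -> bool:
    """True if target is reachable as a signed 32-bit displacement from next_ip.

    Addresses are unsigned 64-bit values; the subtraction wraps modulo 2**64
    and is then read as a signed 64-bit number, like the CPU's arithmetic.
    """
    disp = ((target - next_ip + 2 ** 63) % 2 ** 64) - 2 ** 63
    return INT32_MIN <= disp <= INT32_MAX
```

The example addresses one would feed in (image bounds, initrd address, instruction and target addresses) have to come from the actual kernel build and boot loader; none are given in this thread.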
Unfortunately I have neither access to a machine nor the skill to debug
4) myself, so if 1)-3) can be ruled out,
Yes, they can be ruled out.
I'd like to ask for more help
on this one:
First off, someone needs to track down the offending file(s). This is
done by putting a line containing "GCOV := n" in all Makefiles below
arch/x86_64 (or go one step further back and set
CONFIG_GCOV_PROFILE_ALL=n). If my assumption is correct, then the
kernel should boot fine afterwards. In that case, remove the lines
again one-by-one, while compiling and booting after each change. If the
problem can be narrowed down to a single Makefile, replace the single
"GCOV := n" line with multiple "GCOV_file.o := n" lines, one for each
generated object file. Then again, same approach as before: remove
those lines, compile and boot until it breaks. Finally post your
results.
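The narrowing procedure above is mechanical enough to sketch in Python. Here `boots(excluded)` is a hypothetical stand-in for one compile-and-boot cycle in which every object in `excluded` is built without profiling; in practice each call means a rebuild and a reboot. The sketch follows the linear order described in the mail: first drop whole-Makefile opt-outs one at a time, then, inside the Makefile that breaks, drop per-object opt-outs one at a time.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Obj = Tuple[str, str]  # (Makefile path, object file name)


def opt_out_lines(objects: List[str]) -> List[str]:
    """Per-object opt-out lines in the "GCOV_file.o := n" syntax quoted above."""
    return [f"GCOV_{obj} := n" for obj in objects]


def narrow_down(makefiles: Dict[str, List[str]],
                boots: Callable[[Set[Obj]], bool]) -> Optional[Obj]:
    """Return the first object whose profiling breaks boot, or None."""
    # Start with every object excluded (a "GCOV := n" line in every Makefile).
    excluded: Set[Obj] = {(mk, o) for mk, objs in makefiles.items() for o in objs}
    if not boots(excluded):
        return None  # the assumption that profiled code is at fault does not hold
    for mk, objs in makefiles.items():
        # Step 1: remove this Makefile's "GCOV := n" line.
        trial = excluded - {(mk, o) for o in objs}
        if boots(trial):
            excluded = trial  # innocent Makefile: leave it profiled
            continue
        # Step 2: the break is here; re-enable its objects one at a time.
        for o in objs:
            trial = excluded - {(mk, o)}
            if not boots(trial):
                return (mk, o)
            excluded = trial
    return None
```

For example, with hypothetical Makefile contents and `boots = lambda ex: ("arch/x86/kernel/Makefile", "tsc_64.o") in ex`, `narrow_down` returns that pair, mirroring the result reported below. Bisecting the exclusion set instead of walking it linearly would need fewer reboots when many objects are involved.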
After a few hours and tons of reboots I narrowed it down to
arch/x86/kernel/Makefile:
a) works
obj-y += tsc_$(BITS).o io_delay.o rtc.o
GCOV_tsc_$(BITS).o := n
#GCOV_io_delay.o := n
#GCOV_rtc.o := n
b) doesn't work
obj-y += tsc_$(BITS).o io_delay.o rtc.o
#GCOV_tsc_$(BITS).o := n
#GCOV_io_delay.o := n
#GCOV_rtc.o := n
and that points to arch/x86/kernel/tsc_64.c
At this point we would need someone with x86_64 arch skills to look at
the file and find out why this code is broken with "-fprofile-arcs"
enabled (on s390 we discovered at least one actual code bug this way,
so the analysis might just be of general use). Alternatively we can
just keep these files from being profiled.
Mariusz