Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
- From: ebiederm@xxxxxxxxxxxx (Eric W. Biederman)
- Date: Tue, 01 Jul 2008 14:39:27 -0700
Jeremy Fitzhardinge <jeremy@xxxxxxxx> writes:
I just looked and gcc does not use this technique for thread local data.
Which technique?
A section located at 0.
It does assume you put the thread-local data near %gs (%fs in
userspace), and it uses a small offset (positive or negative) to
reach it.
Nope. It achieves that effect with a magic set of relocations instead
of linker magic.
At present, x86-64 only uses %gs-relative addressing to reach the pda, always
at small positive offsets. It always accesses per-cpu data in a
two-step process of getting the base of per-cpu data, then offsetting to find
the particular variable.
x86-32 has no pda, and arranges %fs so that %fs:variable gets the percpu variant
of variable. The offsets are always quite large.
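The two-step x86-64 access described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: every name here (pda_sim, percpu_area_sim, percpu_irq_count_sim, and so on) is hypothetical, an array index stands in for the %gs base, and nothing below is the kernel's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SIM 4   /* hypothetical CPU count for this sketch */

/* Hypothetical per-cpu variables, laid out as one struct per CPU. */
struct percpu_area_sim {
    long irq_count;
    long softirq_count;
};

/* Stand-in for the pda: it only records where this CPU's area starts. */
struct pda_sim {
    char *data_offset;
};

static struct percpu_area_sim areas[NR_CPUS_SIM];
static struct pda_sim pdas[NR_CPUS_SIM];

/* Point each simulated pda at its own CPU's copy of the data. */
void percpu_setup_sim(void)
{
    for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
        pdas[cpu].data_offset = (char *)&areas[cpu];
}

/*
 * Two-step access: first fetch the per-cpu base from the pda (in the
 * kernel this would be a small positive %gs-relative load), then add
 * the offset of the particular variable.
 */
long *percpu_irq_count_sim(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SIM);
    char *base = pdas[cpu].data_offset;                                   /* step 1 */
    return (long *)(base + offsetof(struct percpu_area_sim, irq_count)); /* step 2 */
}
```

The x86-32 scheme would collapse both steps into a single segment-relative access, which is why its offsets are large: the offset is the variable's address within the per-cpu area rather than a small index into the pda.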
As a practical matter I like that approach (except for extra code size
of the offsets).
My initial concern about all of this was not making symbols section relative
is relieved as this all appears to be a 64bit arch thing where that doesn't
matter.
Why's that? I thought you cared particularly about making the x86-64 kernel
relocatable for kdump, and that using non-absolute symbols was part of that?
That is all true but unconnected.
For x86_64 the kernel lives at a fixed virtual address. So absolute or
non absolute symbols don't matter. Only __pa and a little bit of code
in head64.S that sets up the initial page tables have to be aware of it.
So relocation on x86_64 is practically free.
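A rough sketch of why only __pa has to care, written as userspace C. The constant mirrors the kernel's fixed x86_64 mapping base; load_offset_sim and pa_symbol_sim are hypothetical names for this illustration, and the real kernel macro may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed virtual base of the x86_64 kernel image mapping. */
#define KERNEL_MAP_BASE_SIM 0xffffffff80000000ULL

/*
 * Hypothetical stand-in for the load offset recorded at boot: how far
 * the image was physically moved from where it was linked to run.
 */
static uint64_t load_offset_sim;

/*
 * Virtual addresses never change when the image is loaded elsewhere,
 * so symbols need no fixups. Only the virtual-to-physical translation
 * has to add the load offset.
 */
uint64_t pa_symbol_sim(uint64_t vaddr)
{
    assert(vaddr >= KERNEL_MAP_BASE_SIM);
    return vaddr - KERNEL_MAP_BASE_SIM + load_offset_sim;
}
```

On i386 there is no such fixed mapping to hide behind, which is why the symbols themselves get relocated at load time.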
For i386 since virtual address space is precious and because there were
concerns about putting code in __pa we actually relocate the kernel symbols
during load right after decompression. When we do relocations absolute
symbols are a killer.
Has anyone investigated using the technique gcc uses for thread local storage?
http://people.redhat.com/drepper/tls.pdf
The powerpc guys tried using gcc-level thread-local storage, but it doesn't work
well. per-cpu data and per-thread data have different constraints, and it's hard
to tell gcc about them. For example, if you have a section of preemptable code
in your function, it's hard to tell gcc not to cache a "thread-local" variable
across it, even though we could have switched CPUs in the meantime.
Yes, I completely agree with that. It doesn't mean however that we
can't keep gcc ignorant and generate the same code manually.
In particular using the local exec model so we can say something like:
movq %fs:x@tpoff,%rax
To load the contents of a per cpu variable x into %rax ?
If we can use that model it should make it easier to interface with things
like the stack protector code. Although we would still need to be very careful
about thread switches.
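For reference, this is roughly what the local exec model looks like from userspace C with gcc. It is a sketch of the compiler-level form only, not kernel code: x, x_write and x_read are hypothetical names, and the instruction in the comment is what gcc typically emits on x86-64 Linux, not a guarantee.

```c
/*
 * Hypothetical per-thread variable standing in for a per-cpu variable x.
 * The tls_model attribute asks for the local exec model, where the
 * offset from the thread pointer is a link-time constant.
 */
static __thread long x __attribute__((tls_model("local-exec")));

/* Store to this thread's copy of x. */
void x_write(long v)
{
    x = v;
}

/*
 * Load this thread's copy of x. On x86-64 Linux this is typically
 * compiled to a single segment-relative load of the form
 *     movq %fs:x@tpoff,%rax
 */
long x_read(void)
{
    return x;
}
```

Note that letting the compiler see x as thread-local is exactly what allows it to cache the value or its address across a preemptable region, so a kernel version would have to emit the access by hand and keep gcc ignorant of it.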
You mean cpu switches? We don't really have a notion of thread-local data in
the kernel, other than things hanging off the kernel stack.
Well I was thinking threads switching on a cpu having the kinds of problems you
described when it was tried on ppc.
Eric