Re: Hyperthreading vs. SMP
From: Anne & Lynn Wheeler (lynn_at_garlic.com)
Date: Sat, 13 Dec 2003 02:15:55 GMT
wb <firstname.lastname@example.org> writes:
> How is memory contention (cache coherence) maintained
> with these hyperthreading machines? Does it require an
> external memory agent? In an SMP or NUMA, a memory
> controller (MIPS called it an agent) ensured that memory
> integrity was kept. Can each virtual process instance be
> making memory updates?
hyperthreading just uses more than one instruction stream, typically
in an already superscalar processor ... sharing the same cache.
the superscalar processor already has multiple instructions in flight
... one of the purposes of superscalar is to compensate for cache
misses ... other instructions can proceed in parallel when one
instruction is stalled because of a cache miss. The superscalar
processor may also have speculative execution when conditional
branches are encountered ... i.e. assume that the branch will go
one way ... and if it turns out not to ... back out
all the instructions executed on the wrong path.
one of the first such efforts was a dual i-stream design for the
370/195 (some 30 years ago). the 195 had a 64-instruction pipeline ... but
w/o support for speculative execution ... so branches in the
instruction stream drained the pipeline. except for some specialized
codes, the 195s tended to run at half (or less) of theoretical throughput
because of the large number of conditional branches commonly found in
standard codes. the dual i-stream project defined two instruction
streams, a duplicate set of registers and a red/black bit flag tagging
each operation in the pipeline (indicating which instruction stream
the operation was associated with).
hyperthreading, in principle, supports more than one instruction stream
concurrently within an already complex superscalar processor
... using a common processor cache.
in such configurations ... two or more physical/logical processors
(instruction streams) sharing the same cache won't have a cache
consistency problem (although they may have some serialization
issues). it is when there are multiple caches that the issue of
memory/cache consistency arises.
A possible configuration is four physical processors with two physical
caches (where each physical cache supports two physical processors).
There are cache consistency issues involved in coordination between
the two caches (it is not between the four processors but between the
caches). If you add hyperthreading to each of the four physical
processors (say it now appears as eight logical instruction streams),
that change is possibly totally transparent to the cache operation and
the coherency operation between the two caches.
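
The four-processor, two-cache configuration can be sketched with a toy model. The write-through, write-invalidate scheme and the names `Cache`, `System`, and `replay` are assumptions chosen for illustration, not a description of any particular machine. The point is only that invalidations are exchanged between caches, so relabeling the processors as hyperthreaded logical streams leaves the coherency traffic unchanged.

```python
class Cache:
    """One physical cache: address -> value for the lines it holds valid."""

    def __init__(self, name):
        self.name = name
        self.lines = {}


class System:
    """Toy write-through, write-invalidate coherency between caches (assumed)."""

    def __init__(self, cache_names, stream_to_cache):
        self.caches = {n: Cache(n) for n in cache_names}
        self.stream_to_cache = stream_to_cache  # instruction stream -> cache name
        self.memory = {}
        self.invalidations = 0  # coherency messages sent between caches

    def write(self, stream, addr, value):
        mine = self.caches[self.stream_to_cache[stream]]
        # Coherency is cache-to-cache: invalidate copies held by other caches.
        for other in self.caches.values():
            if other is not mine and addr in other.lines:
                del other.lines[addr]
                self.invalidations += 1
        mine.lines[addr] = value
        self.memory[addr] = value

    def read(self, stream, addr):
        mine = self.caches[self.stream_to_cache[stream]]
        if addr not in mine.lines:  # miss: refill from memory
            mine.lines[addr] = self.memory.get(addr, 0)
        return mine.lines[addr]


def replay(system, trace):
    """Run (stream, "w", addr, value) / (stream, "r", addr) tuples; return read results."""
    results = []
    for stream, kind, addr, *value in trace:
        if kind == "w":
            system.write(stream, addr, value[0])
        else:
            results.append(system.read(stream, addr))
    return results


# Four physical processors, two per cache.
physical = System(["A", "B"], {"p0": "A", "p1": "A", "p2": "B", "p3": "B"})
# Same hardware with two logical instruction streams per processor.
logical = System(["A", "B"], {f"p{i}.{t}": "AB"[i // 2]
                              for i in range(4) for t in range(2)})

trace_physical = [("p0", "w", 0x10, 1), ("p2", "r", 0x10),
                  ("p1", "w", 0x10, 2), ("p3", "r", 0x10)]
trace_logical = [("p0.0", "w", 0x10, 1), ("p2.1", "r", 0x10),
                 ("p1.1", "w", 0x10, 2), ("p3.0", "r", 0x10)]
```

Replaying the two traces should give the same read results and the same invalidation count, because the model never looks at which stream behind a cache made the access.
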
serialization of processors is typically done with atomic
operations like compare&swap ... but that is somewhat orthogonal to
the coherency implementation between caches (which can be totally
independent of the number/kind of instruction streams supported).
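
A minimal sketch of the compare&swap retry pattern follows. Python has no user-level compare&swap instruction, so the lock inside the hypothetical `Word` class only stands in for the atomicity the hardware instruction would provide. `Word`, `atomic_add`, and `hammer` are names invented for this example.

```python
import threading


class Word:
    """A memory word with a compare-and-swap operation.

    The internal lock models hardware atomicity (an assumption of this
    sketch); real code would use the machine's compare&swap instruction.
    """

    def __init__(self, value=0):
        self._value = value
        self._hw = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`; report success."""
        with self._hw:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_add(word, delta):
    """Classic retry loop: fetch, compute, compare&swap, retry on interference."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + delta):
            return old + delta


def hammer(word, n_threads=4, n_incr=1000):
    """Have several instruction streams (threads here) update one word."""
    def worker():
        for _ in range(n_incr):
            atomic_add(word, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.load()
```

The update loop never holds a lock across the fetch and the store. If another instruction stream changed the word in between, the compare fails and the loop simply retries with the fresh value. That holds whether the streams share one cache or sit behind different ones.
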
recent posting from somewhat related thread in comp.arch
http://www.garlic.com/~lynn/2003p.html#1 An entirely new proprietrary hardware strategy
misc. past threads mentioning 370/195 dual i-stream work from 30 years ago:
http://www.garlic.com/~lynn/94.html#38 IBM 370/195
http://www.garlic.com/~lynn/99.html#73 The Chronology
http://www.garlic.com/~lynn/99.html#97 Power4 = 2 cpu's on die?
http://www.garlic.com/~lynn/2000g.html#15 360/370 instruction cycle time
http://www.garlic.com/~lynn/2001j.html#27 Pentium 4 SMT "Hyperthreading"
http://www.garlic.com/~lynn/2001n.html#63 Hyper-Threading Technology - Intel information.
http://www.garlic.com/~lynn/2002g.html#70 Pipelining in the past
http://www.garlic.com/~lynn/2002g.html#76 Pipelining in the past
http://www.garlic.com/~lynn/2003l.html#48 IBM Manuals from the 1940's and 1950's
http://www.garlic.com/~lynn/2003m.html#60 S/360 undocumented instructions?
lots of past mentions of compare and swap (again from 30 some years ago):
http://www.garlic.com/~lynn/93.html#0 360/67, was Re: IBM's Project F/S ?
http://www.garlic.com/~lynn/93.html#14 S/360 addressing
http://www.garlic.com/~lynn/94.html#28 370 ECPS VM microcode assist
http://www.garlic.com/~lynn/2000g.html#16 360/370 instruction cycle time
http://www.garlic.com/~lynn/2001d.html#42 IBM was/is: Imitation...
http://www.garlic.com/~lynn/2001e.html#73 CS instruction, when introducted ?
http://www.garlic.com/~lynn/2001f.html#41 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#61 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#69 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#70 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#73 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#74 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#75 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001f.html#76 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001g.html#4 Extended memory error recovery
http://www.garlic.com/~lynn/2001g.html#8 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001g.html#9 Test and Set (TS) vs Compare and Swap (CS)
http://www.garlic.com/~lynn/2001i.html#2 Most complex instructions (was Re: IBM 9020 FAA/ATC Systems from 1960's)
http://www.garlic.com/~lynn/2001i.html#34 IBM OS Timeline?
http://www.garlic.com/~lynn/2001k.html#8 Minimalist design (was Re: Parity - why even or odd)
http://www.garlic.com/~lynn/2001k.html#65 SMP idea for the future
http://www.garlic.com/~lynn/2001k.html#67 SMP idea for the future
http://www.garlic.com/~lynn/2001n.html#42 Cache coherence [was Re: IBM POWER4 ...]
http://www.garlic.com/~lynn/2001n.html#43 IBM 1800
http://www.garlic.com/~lynn/2002c.html#9 IBM Doesn't Make Small MP's Anymore
http://www.garlic.com/~lynn/2002f.html#13 Hardware glitches, designed in and otherwise
http://www.garlic.com/~lynn/2002h.html#45 Future architecture [was Re: Future micro-architecture: ]
http://www.garlic.com/~lynn/2002l.html#58 Spin Loop?
http://www.garlic.com/~lynn/2002l.html#59 Spin Loop?
http://www.garlic.com/~lynn/2002l.html#69 The problem with installable operating systems
http://www.garlic.com/~lynn/2003.html#12 cost of crossing kernel/user boundary
http://www.garlic.com/~lynn/2003.html#18 cost of crossing kernel/user boundary
http://www.garlic.com/~lynn/2003b.html#20 Card Columns
http://www.garlic.com/~lynn/2003c.html#75 The relational model and relational algebra - why did SQL become the industry standard?
http://www.garlic.com/~lynn/2003c.html#78 The relational model and relational algebra - why did SQL become the industry standard?
http://www.garlic.com/~lynn/2003e.html#67 The Pentium 4 - RIP?
http://www.garlic.com/~lynn/2003g.html#12 Page Table - per OS/Process
http://www.garlic.com/~lynn/2003g.html#15 Disk capacity and backup solutions
http://www.garlic.com/~lynn/2003g.html#30 One Processor is bad?
http://www.garlic.com/~lynn/2003h.html#5 IBM says AMD dead in 5yrs ... -- Microsoft Monopoly vs. IBM
http://www.garlic.com/~lynn/2003h.html#19 Why did TCP become popular ?
http://www.garlic.com/~lynn/2003h.html#20 UT200 (CDC RJE) Software for TOPS-10?
http://www.garlic.com/~lynn/2003j.html#58 atomic memory-operation question
http://www.garlic.com/~lynn/2003m.html#29 SR 15,15
http://www.garlic.com/~lynn/2003o.html#32 who invented the "popup" ?
-- Anne & Lynn Wheeler | http://www.garlic.com/~lynn/ Internet trivia 20th anv http://www.garlic.com/~lynn/rfcietff.htm