Re: Hyperthreading vs. SMP

From: Anne & Lynn Wheeler
Date: 12/13/03

Date: Sat, 13 Dec 2003 02:15:55 GMT

wb <> writes:
> How is memory contention (cache coherence) maintained
> with these hyperthreading machines ? Does it require an
> external memory agent ? In a SMP or NUMA, a memory
> controller (MIP's called it an agent ) ensured that memory
> integrity was kept. Can each virtual process instance be
> making memory updates ?

hyperthreading just uses more than one instruction stream, typically
in an already superscalar processor ... sharing the same cache.

the superscalar processor has multiple instructions in flight already
... one of the purposes of superscalar is to compensate for cache
misses ... other instructions can proceed in parallel when one
instruction is stalled because of a cache miss. The superscalar
processor may also have speculative execution when conditional
branches are encountered ... i.e. assume that the branch goes one
way ... and if it turns out not to ... back out all the instructions
executed on the wrong path.

one of the first such efforts was a dual i-stream design for the
370/195 (some 30 years ago). the 195 had a 64-instruction pipeline ...
but w/o support for speculative execution ... so branches in the
instruction stream drained the pipeline. except for some specialized
codes, the 195s tended to run at half (or less) of theoretical thruput
because of the large number of conditional branches commonly found in
standard codes. the dual i-stream project defined two instruction
streams, a duplicate set of registers and a red/black bit flag tagging
each operation in the pipeline (indicating which instruction stream
the operation was associated with).
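
a toy sketch of the red/black tagging idea, just to make the argument concrete. it is not a model of the actual 195 pipeline: the single issue slot per cycle, the branch_penalty of 4 cycles and the names run, "op" and "br" are all assumptions made up for illustration. each issued operation carries the tag of its instruction stream, and a branch only holds up further issue from its own stream.

```python
from collections import deque

def run(streams, branch_penalty=4):
    """Issue at most one operation per cycle, taken round-robin from any
    stream that is not waiting on an unresolved branch.
    Returns (cycles_used, issue_log); log entries are (tag, op) or None."""
    queues = {tag: deque(ops) for tag, ops in streams.items()}
    stalled_until = {tag: 0 for tag in streams}  # first cycle the stream may issue again
    tags = list(streams)
    nxt = 0          # round-robin pointer
    cycle = 0
    log = []
    while any(queues.values()):
        issued = None
        for i in range(len(tags)):
            tag = tags[(nxt + i) % len(tags)]
            if queues[tag] and stalled_until[tag] <= cycle:
                op = queues[tag].popleft()
                if op == "br":
                    # no speculation: this stream waits until the branch resolves
                    stalled_until[tag] = cycle + 1 + branch_penalty
                issued = (tag, op)   # the tag plays the role of the red/black bit
                nxt = (nxt + i + 1) % len(tags)
                break
        log.append(issued)           # None means an empty issue slot
        cycle += 1
    return cycle, log

stream = ["op", "br", "op", "br", "op"]
single_cycles, single_log = run({"red": stream})
dual_cycles, dual_log = run({"red": stream, "black": stream})
print("one stream :", single_cycles, "cycles for", len(stream), "ops")
print("two streams:", dual_cycles, "cycles for", 2 * len(stream), "ops")
```

in this toy the second stream fills some of the issue slots that the first stream leaves empty while it waits on a branch, so twice the work takes well under twice the cycles.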

hyperthreading, in principle, supports more than one instruction stream
concurrently within an already complex superscalar processor
... using a common processor cache.

in such configurations ... two or more physical/logical processors
(instruction streams) sharing the same cache won't have a cache
consistency problem (although they may have some serialization
issues). it is when there are multiple caches that the issue of
memory/cache consistency arises.

A possible configuration is four physical processors with two physical
caches (where each physical cache supports two physical processors).
There are cache consistency issues involved in coordination between
the two caches (it is not between the four processors but between the
caches). If you add hyperthreading to each of the four physical
processors (say it now appears as eight logical instruction streams),
that change is possibly totally transparent to the cache operation and
the coherency operation between the two caches.
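
a minimal sketch of that point, assuming a simple write-through, write-invalidate scheme (an assumption for illustration, not a description of any particular machine). the names Cache, build and scenario are hypothetical. instruction streams are just handles onto a cache; the only coherence traffic is cache-to-cache, so changing the number of streams per cache leaves it untouched.

```python
class Cache:
    """Toy write-through, write-invalidate cache."""
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory          # backing store shared by all caches
        self.lines = {}               # addr -> value currently held here
        self.peers = []               # the other caches in the system
        self.invalidations_sent = 0   # cache-to-cache coherence messages

    def read(self, addr):
        if addr not in self.lines:    # miss: fetch from memory
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        for peer in self.peers:       # coherence is between caches only
            if addr in peer.lines:
                del peer.lines[addr]
                self.invalidations_sent += 1
        self.lines[addr] = value
        self.memory[addr] = value     # write-through keeps the toy simple

def build(streams_per_cache, n_caches=2):
    memory = {}
    caches = [Cache(f"cache{i}", memory) for i in range(n_caches)]
    for c in caches:
        c.peers = [p for p in caches if p is not c]
    # every instruction stream simply points at the cache it shares
    streams = {f"c{i}s{j}": c
               for i, c in enumerate(caches)
               for j in range(streams_per_cache)}
    return caches, streams

def scenario(streams_per_cache):
    caches, streams = build(streams_per_cache)
    a0, a1, b0 = streams["c0s0"], streams["c0s1"], streams["c1s0"]
    a0.write(0x100, 1)
    same_cache = a1.read(0x100)    # same cache: nothing to keep coherent
    other_before = b0.read(0x100)  # the other cache now holds a copy
    a0.write(0x100, 2)             # that copy has to be invalidated
    other_after = b0.read(0x100)
    messages = sum(c.invalidations_sent for c in caches)
    return same_cache, other_before, other_after, messages

print(scenario(2))   # four streams on two caches
print(scenario(4))   # eight streams on the same two caches
```

both calls go through exactly the same cache-to-cache steps; the extra streams in the second call never show up in the coherence path.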

serialization of processors is typically done with atomic
operations like compare&swap ... but that is somewhat orthogonal to
the coherency implementation between caches (which can be totally
independent of the number/kind of instruction streams supported).
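
a rough sketch of the usual compare&swap retry loop. python has no hardware compare&swap, so the atomicity of the instruction is modeled here with a lock inside a hypothetical Word class; Word, atomic_add and hammer are names invented for this example. the part that matters is the loop: fetch the old value, compute the new one, and retry if some other instruction stream changed the word in between.

```python
import threading

class Word:
    """One storage word offering a compare-and-swap operation.
    The lock only stands in for the atomicity the hardware would provide."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # store `new` only if the word still holds `expected`
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def atomic_add(word, delta):
    """Classic retry loop: repeat until no other stream got in between."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + delta):
            return old + delta

def hammer(n_threads=4, n_adds=1000):
    word = Word(0)

    def worker():
        for _ in range(n_adds):
            atomic_add(word, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.load()

print(hammer())
```

no update is lost regardless of how the threads interleave, and nothing in the loop depends on whether the competing instruction streams share a cache or sit behind different ones.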

recent posting from somewhat related thread in comp.arch: An entirely new proprietrary hardware strategy

Anne & Lynn Wheeler
Internet trivia 20th anv