Re: Hyperthreading vs. SMP

From: Anne & Lynn Wheeler
Date: 12/13/03

Date: Sat, 13 Dec 2003 02:15:55 GMT

wb <> writes:
> How is memory contention (cache coherence) maintained
> with these hyperthreading machines? Does it require an
> external memory agent? In an SMP or NUMA system, a memory
> controller (MIPS called it an agent) ensured that memory
> integrity was kept. Can each virtual process instance be
> making memory updates?

hyperthreading just uses more than one instruction stream, typically
in an already superscalar processor ... sharing the same cache.

the superscalar processor already has multiple instructions in flight
... one of the purposes of superscalar design is to compensate for
cache misses ... other instructions can proceed in parallel when one
instruction is stalled on a cache miss. The superscalar processor may
also do speculative execution when conditional branches are
encountered ... i.e. assume that the branch will go one way ... and if
it turns out not to ... back out all the instructions executed on the
wrong path.
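The speculate-and-back-out idea can be sketched as a toy model. This is a hypothetical illustration, not how any particular processor does it: instructions on the guessed path update a shadow copy of the registers, which is committed if the branch resolves the way it was predicted and thrown away otherwise. The name `run_speculative` and the `(dest, fn)` instruction encoding are made up for the example, and branch resolution is simply passed in as `actual_taken` rather than computed in parallel with the speculative work.

```python
def run_speculative(regs, predicted_taken, actual_taken, taken_path, fallthrough_path):
    """Toy branch speculation with back-out (hypothetical model).

    Each path is a list of (dest, fn) pairs, where fn(regs) computes the
    new value for register dest. The predicted path runs on a shadow copy
    of the registers; the caller's regs are never modified.
    Returns (resulting registers, number of instructions backed out).
    """
    guess = taken_path if predicted_taken else fallthrough_path
    shadow = dict(regs)                 # speculative state, not yet committed
    for dest, fn in guess:
        shadow[dest] = fn(shadow)

    if predicted_taken == actual_taken:
        return shadow, 0                # prediction right: commit, nothing wasted

    # misprediction: discard the shadow state and run the correct path
    correct = taken_path if actual_taken else fallthrough_path
    state = dict(regs)
    for dest, fn in correct:
        state[dest] = fn(state)
    return state, len(guess)            # work thrown away on the wrong path


regs = {"r1": 5, "r2": 0}
taken = [("r2", lambda r: r["r1"] + 1)]
fall = [("r2", lambda r: r["r1"] - 1)]
print(run_speculative(regs, True, True, taken, fall))   # ({'r1': 5, 'r2': 6}, 0)
print(run_speculative(regs, True, False, taken, fall))  # ({'r1': 5, 'r2': 4}, 1)
```

A real pipeline tracks this per instruction rather than per path, but the shape is the same: speculative results stay invisible until the branch is known.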

one of the first such efforts was a dual i-stream design for the
370/195 (some 30 years ago). the 195 had a 64-instruction pipeline
... but without support for speculative execution ... so branches in
the instruction stream drained the pipeline. except for some
specialized codes, the 195s tended to run at half (or less) of theoretical thruput
because of the large number of conditional branches commonly found in
standard codes. the dual i-stream project defined two instruction
streams, a duplicate set of registers and a red/black bit flag tagging
each operation in the pipeline (indicating which instruction stream
the operation was associated with).
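A toy model of the red/black tagging may help; it is a sketch under assumed, hypothetical names and encoding, not the actual 195 design. Ops from the two streams are interleaved into one FIFO pipeline, each entry carries a one-bit tag, and at retirement the tag selects which of the two duplicate register sets receives the result. Each op is encoded as `(dest, src, addend)`, meaning `dest = src + addend`; the pipeline `depth` parameter is also an assumption for illustration.

```python
from collections import deque

RED, BLACK = 0, 1   # the one-bit stream tag


def run_dual_istream(streams, depth=4):
    """Interleave two instruction streams through one shared pipeline.

    streams[RED] and streams[BLACK] are lists of (dest, src, addend) ops.
    Every pipeline entry is tagged with its stream, and the tag selects
    the register set at retirement. Returns the two register sets and
    the order in which tags retired.
    """
    regs = [{}, {}]                  # duplicate register sets, one per stream
    pcs = [0, 0]                     # next op to issue in each stream
    pipeline = deque()               # entries are (tag, op)
    retired = []
    turn = RED

    def issued_all():
        return all(pcs[t] >= len(streams[t]) for t in (RED, BLACK))

    while pipeline or not issued_all():
        # issue one op per cycle, alternating streams when both have work
        for t in (turn, 1 - turn):
            if pcs[t] < len(streams[t]):
                pipeline.append((t, streams[t][pcs[t]]))
                pcs[t] += 1
                turn = 1 - t
                break
        # retire the oldest op once the pipeline is full or issue has finished
        if len(pipeline) >= depth or issued_all():
            tag, (dest, src, addend) = pipeline.popleft()
            r = regs[tag]            # the tag picks the register set
            r[dest] = r.get(src, 0) + addend
            retired.append(tag)
    return regs, retired


regs, order = run_dual_istream([[("r1", "r1", 1), ("r1", "r1", 1)],
                                [("r1", "r1", 10)]])
print(regs)    # [{'r1': 2}, {'r1': 10}]
print(order)   # [0, 1, 0]
```

Both streams write a register named `r1`, but because the tag routes each result to its own register set, the two streams never see each other's values even though they share one pipeline.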

hyperthreading, in principle, supports more than one instruction stream
concurrently within an already complex superscalar context ... using a
common processor cache.

in such configurations ... two or more physical/logical processors
(instruction streams) sharing the same cache won't have a cache
consistency problem (although they may have some serialization
issues). it is when there are multiple caches that the issue of
memory/cache consistency arises.

A possible configuration is four physical processors with two physical
caches (where each physical cache supports two physical processors).
There are cache consistency issues involved in coordination between
the two caches (it is not between the four processors but between the
caches). If you add hyperthreading to each of the four physical
processors (say it now appears as eight logical instruction streams),
that change is possibly totally transparent to the cache operation and
the coherency operation between the two caches.
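The point that coherence is a property of the caches, not of the instruction streams, can be shown with a toy write-invalidate model. This is an assumption for illustration (real protocols such as MESI track more line states), and the class name, the helper `same_pattern`, and the write-through policy are hypothetical simplifications. Each stream is mapped to one of two caches; a write only generates coherence traffic when the other cache holds a copy of the line.

```python
class TwoCacheSystem:
    """Toy write-invalidate coherence between two caches (hypothetical model).

    stream_to_cache maps an instruction-stream id (physical processor or
    hyperthread) to cache 0 or 1. The coherence bookkeeping only looks at
    the caches, so the number of streams sharing a cache never enters into it.
    Writes go through to memory to keep the model small.
    """

    def __init__(self, stream_to_cache):
        self.stream_to_cache = stream_to_cache
        self.caches = [{}, {}]       # address -> value, valid lines only
        self.memory = {}
        self.invalidations = 0       # cross-cache coherence messages

    def read(self, stream, addr):
        cache = self.caches[self.stream_to_cache[stream]]
        if addr not in cache:        # miss: fill from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, stream, addr, value):
        mine = self.stream_to_cache[stream]
        other = self.caches[1 - mine]
        if addr in other:            # the only case that needs coherence traffic
            del other[addr]
            self.invalidations += 1
        self.caches[mine][addr] = value
        self.memory[addr] = value


def same_pattern(system, a, b, c, d):
    """Streams a and b share one cache; c and d share the other."""
    system.write(a, "x", 1)
    system.read(b, "x")
    system.write(b, "x", 2)          # same cache as a: no invalidation
    system.read(c, "x")
    system.write(a, "x", 3)          # c's cache holds x: one invalidation
    return system.read(d, "x"), system.invalidations


# four physical processors, two per cache
smp = TwoCacheSystem({p: p // 2 for p in range(4)})
# the same four processors, each hyperthreaded into two logical streams
ht = TwoCacheSystem({s: s // 4 for s in range(8)})

print(same_pattern(smp, 0, 1, 2, 3))   # (3, 1)
print(same_pattern(ht, 0, 1, 4, 5))    # (3, 1)
```

Going from four to eight streams only changes the `stream_to_cache` mapping; the invalidation count for the same access pattern is identical, which is the transparency described above.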

serialization of processors is typically done with atomic
operations like compare&swap ... but that is somewhat orthogonal to
the coherency implementation between caches (which can be totally
independent of the number/kind of instruction streams supported).
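The usual compare&swap pattern is a retry loop: fetch the old value, compute the new one, and store it only if the word still holds the old value; otherwise re-fetch and try again. A minimal sketch, assuming Python as the illustration language: Python exposes no hardware compare-and-swap, so atomicity is emulated here with an internal lock (on real hardware this would be a single instruction, e.g. the 370 CS instruction). The names `AtomicWord`, `atomic_add`, and `hammer` are hypothetical.

```python
import threading


class AtomicWord:
    """A memory word with compare-and-swap (hypothetical helper).

    The internal lock only emulates the atomicity a hardware
    compare-and-swap instruction would provide.
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()   # emulation only

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_add(word, delta):
    """CAS retry loop: if another stream changed the word between the
    fetch and the swap, the swap fails and we try again."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + delta):
            return old + delta


def hammer(word, n):
    for _ in range(n):
        atomic_add(word, 1)


counter = AtomicWord()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())   # 40000
```

Nothing in the loop depends on whether the competing streams are hyperthreads sharing a cache or processors behind different caches; it serializes the updates either way, which is why it is orthogonal to the cache coherence mechanism.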

