Re: Hyperthreading vs. SMP

From: Robert M. Riches Jr (spamtrap42_at_verizon.net)
Date: Sat, 13 Dec 2003 05:07:54 GMT

In article <uwu91bgzc.fsf@earthlink.net>, Anne & Lynn Wheeler wrote:
> wb <dead_email@nospam.com> writes:
>> How is memory contention (cache coherence) maintained
>> with these hyperthreading machines ? Does it require an
>> external memory agent ? In a SMP or NUMA, a memory
>> controller (MIP's called it an agent ) ensured that memory
>> integrity was kept. Can each virtual process instance be
>> making memory updates ?
>
> hyperthreading just uses more than one instruction stream, typically
> in an already superscalar processor ... sharing the same cache.
>
> the superscalar processor has multiple instructions in flight already
> ... one of the purposes of superscalar is to compensate for cache
> misses ... other instructions can proceed in parallel when one
> instruction is stalled because of a cache miss. The superscalar
> processor may also have speculative execution when conditional
> branches are encountered .... i.e. assume that the direction of the
> branch is to go one way ... and if it turns out not to ... back-out
> all the instructions executed on the wrong path.
>
> ... (much more excellent explanation and references snipped) ...

Well said, except for one minor point of terminology. At
least in my 17 years' experience in microprocessor design,
superscalar basically meant having multiple parallel
execution units, so that multiple instructions could be
issued for execution in the same clock cycle--lowering the
best-case CPI below unity. It is out-of-order execution (or
"dynamic execution" per Pentium Pro marketing literature)
that allows instructions to proceed when an earlier
instruction is stalled by a cache miss.
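
To make the distinction concrete, here's a quick C sketch.  It is
purely illustrative--the structure, names, and list length are my
own inventions, and a compiler is free to rearrange the code; the
point is the dependence structure, not the exact instructions.

#include <stdio.h>
#include <stdlib.h>

/* Each node is roughly one cache line, so the pointer chase below
   is likely to miss the cache on a long enough list.              */
struct node { struct node *next; long pad[7]; };

static long walk_and_count(struct node *p, long a, long b)
{
    long sum = 0;
    while (p) {
        /* These two adds are independent of each other, so a
           superscalar core can issue both in the same clock
           cycle--that alone is what pushes best-case CPI below
           unity.                                                  */
        a += 1;
        b += 2;
        sum += a + b;

        /* This load starts the pointer chase and may miss the
           cache.  An in-order superscalar core stalls at the next
           use of p (the loop test); an out-of-order core, with
           branch prediction, can keep running the add chain from
           later iterations while the load is still outstanding.   */
        p = p->next;
    }
    return sum;
}

int main(void)
{
    /* Build a small linked list just so the function can be run. */
    struct node *head = NULL;
    for (int i = 0; i < 1000; i++) {
        struct node *n = malloc(sizeof *n);
        if (!n) return 1;
        n->next = head;
        head = n;
    }
    printf("%ld\n", walk_and_count(head, 0, 0));
    return 0;
}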

It is possible (but not very common) to have either
superscalar or out-of-order without the other. The Intel
960KA/KB/MC/XA had a _very_ limited form of out-of-order to
allow execution past slow loads from memory. (Expired?) US
patent #4,891,753 (with my name on it) is for that
mechanism. The Intel 960CA and 960MM/MX were superscalar
without (any additional) significant out-of-order features.
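
For what it's worth, the payoff of "execution past slow loads"
looks roughly like this in C terms.  This is a generic sketch of
register-scoreboard behavior, not a description of the actual
i960 mechanism or of what that patent claims:

/* Generic illustration only: the function and names are made up. */
long use_after_load(const long *table, long i, long x, long y)
{
    long v = table[i];   /* load issues; a miss can take many cycles */

    /* Independent of the load: with a scoreboard tracking the
       destination register, these can execute while the load is
       still outstanding...                                          */
    x = x * 3 + 1;
    y ^= x;

    /* ...and the pipeline stalls only here, at the first
       instruction that actually needs the loaded value.             */
    return v + x + y;
}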

Robert Riches
spamtrap42@verizon.net
(Yes, that is one of my email addresses.)


