Nicolas Tjioe, CSE 520, Wednesday 11/12/2008: Hyper-Threading in NetBurst Microarchitecture, David Koufaty

1
Nicolas Tjioe
CSE 520
Wednesday 11/12/2008
Hyper-Threading in NetBurst Microarchitecture
David Koufaty, Deborah T. Marr (Intel)
Published by the IEEE Computer Society
Volume 23, Issue 2, March-April 2003, Page(s) 56-65
2
Traditional Processor Design
  • Higher Clock Speed
      • Pipeline the microarchitecture to finer
        granularities (super-pipelining)
  • Instruction Level Parallelism (ILP)
      • In-Order vs Out-of-Order
  • Cache Hierarchy
      • Data in the cache reduces the frequency of
        accesses to the slower main memory

3
Design Cont.
  • Existing techniques add die-size and power costs.
  • Chip multiprocessing (CMP): each processor has a full
    set of execution and architectural resources.
  • Time-slice multithreading.
  • Simultaneous multithreading (SMT).

4
Hyper-Threading (HT)
  • HT introduces the SMT approach to the Intel
    architecture.
  • A single physical processor appears as multiple
    logical processors: one copy of the architecture
    state for each logical processor, sharing a single
    set of physical execution resources.
  • Hardware sees more instructions to execute at once;
    software can schedule more threads.
  • HT added less than 5% to the relative chip size
    and maximum power requirements.

5
Microarchitecture choice tradeoffs
  • Partition
      • Dedicates equal resources to each logical
        processor.
      • Simplicity and low complexity.
      • Good for structures with high and unpredictable
        utilization.
      • E.g. pipeline queues
  • Threshold
      • Flexible resource sharing with a limit on the
        maximum resource usage.
      • Ideal for small structures where the resource
        utilization is bursty and predictable.
      • E.g. processor schedulers
  • Full Sharing
      • Flexible resource sharing with no limit on the
        maximum resource usage.
      • Good for large structures in which the working
        set sizes are variable.
      • E.g. processor caches
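
The three policies above can be compared with a small toy model. This is only a sketch under stated assumptions: the class `SharedStructure`, the helper `fill`, and the capacity and threshold values are hypothetical and chosen for illustration; they are not taken from the paper or from real hardware.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStructure:
    """Toy model of one hardware structure used by two logical processors."""
    capacity: int
    policy: str            # "partition", "threshold", or "full"
    threshold: int = 0     # per-thread cap, used only by the "threshold" policy
    used: dict = field(default_factory=lambda: {0: 0, 1: 0})

    def limit(self) -> int:
        """Maximum number of entries one logical processor may hold."""
        if self.policy == "partition":
            return self.capacity // 2   # fixed, equal split
        if self.policy == "threshold":
            return self.threshold       # flexible sharing, but capped
        return self.capacity            # full sharing: no per-thread cap

    def allocate(self, lp: int) -> bool:
        """Give one entry to logical processor lp; False means it must stall."""
        total = self.used[0] + self.used[1]
        if total >= self.capacity or self.used[lp] >= self.limit():
            return False
        self.used[lp] += 1
        return True


def fill(structure: SharedStructure, lp: int) -> int:
    """Greedy thread: allocate until refused and return the entries obtained."""
    count = 0
    while structure.allocate(lp):
        count += 1
    return count
```

With a capacity of 8 (and a threshold of 6), a greedy logical processor 0 ends up with 4, 6, and 8 entries under partition, threshold, and full sharing respectively, leaving 4, 2, and 0 for the other logical processor.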

6
Shared vs Partitioned Queue
[Figure: dark color = slower thread, light color = faster thread]
7
HT Resources
  • Duplicated
      • Register Renaming Logic
      • Instruction Pointer
      • ITLB
      • Return Stack Predictor
  • Partitioned
      • Reorder Buffer (ROB)
      • Load/Store Buffers
      • Scheduling queues, uop queues
  • Shared
      • Caches: Trace Cache, L2
      • Execution units
      • Microarchitectural registers

8
Front-End Pipeline
  • Execution Trace Cache: the Trace Cache (TC) stores
    decoded instructions, called micro-operations (uops).
  • Microcode ROM: for complex instructions, the TC sends
    a microcode instruction pointer to the Microcode ROM.
  • Instruction Translation Lookaside Buffer (ITLB): on a
    Trace Cache miss, the ITLB receives the request from
    the TC to deliver new instructions and translates the
    next instruction pointer address to a physical
    address. Each streaming buffer is 64 bytes.
  • IA-32 Instruction Decode: decoding is only needed for
    instructions that miss the TC. The decoder alternates
    between threads, so it keeps two copies of the decode
    state, one per logical processor.
  • Uop Queue: each logical processor has only half the
    entries (partitioned). It sends uops from the
    front-end pipeline to the out-of-order execution
    engine.
9
Out-of-Order Execution Engine
Each logical processor can use at most 63 ROB entries,
24 load buffers and 12 store buffers.
  • Allocator: alternates between the logical processors
    every clock cycle when selecting uops. It signals a
    stall for a logical processor that has reached its
    limit.
  • Register Rename: renames the 8 IA-32 registers onto
    the 128 physical registers of the machine. This lets
    an instruction run at the same time as another
    instruction that uses the same IA-32 registers. A
    Register Alias Table (RAT) keeps track of the
    mapping.
  • Instruction Scheduling: four uop schedulers are used
    to schedule different types of uops for the different
    execution units. Each scheduler has its own queue of
    8-12 entries.
  • Retirement: the retirement logic alternates between
    the two logical processors to track which uops are
    ready to be retired. Store data is written to the L1
    data cache.
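
The allocator behaviour described above (alternate every cycle, stall a logical processor at its limit) can be sketched as follows. The limits of 63, 24 and 12 come from the slide; everything else is an assumption made for illustration: the `Allocator` class, the uop kinds "alu", "load" and "store", and the choice to fall back to the other logical processor when the preferred one is stalled or has nothing pending. Entries are never released here because retirement is not modeled.

```python
# Per-logical-processor caps quoted on the slide.
LIMITS = {"rob": 63, "load": 24, "store": 12}


class Allocator:
    """Toy allocator for two logical processors (hypothetical model)."""

    def __init__(self):
        self.in_use = {lp: dict.fromkeys(LIMITS, 0) for lp in (0, 1)}
        self.next_lp = 0  # logical processor preferred in the next cycle

    @staticmethod
    def needs(kind):
        # Every uop takes a ROB entry; loads and stores also take a buffer.
        return ["rob", kind] if kind in ("load", "store") else ["rob"]

    def stalled(self, lp, kind):
        """True if lp has reached the limit of any resource this uop needs."""
        return any(self.in_use[lp][r] >= LIMITS[r] for r in self.needs(kind))

    def cycle(self, pending):
        """Allocate at most one uop. pending maps lp -> list of uop kinds.

        Returns (lp, kind) for the allocated uop, or None if nothing could
        be allocated this cycle.
        """
        first = self.next_lp
        self.next_lp ^= 1  # alternate the preferred thread every cycle
        for lp in (first, first ^ 1):  # assumption: fall back to the other LP
            queue = pending[lp]
            if queue and not self.stalled(lp, queue[0]):
                kind = queue.pop(0)
                for resource in self.needs(kind):
                    self.in_use[lp][resource] += 1
                return lp, kind
        return None
```

When both logical processors have uops pending, successive calls to `cycle` serve them in the order 0, 1, 0, 1. A logical processor that already holds 12 store buffers is refused its next store while the other one can still allocate.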
10
Dispatch Execution Units
  • At most 6 microinstructions can be dispatched per
    clock cycle:
      • Two microinstructions on Port 0.
      • Two microinstructions on Port 1.
      • One microinstruction on Port 2.
      • One microinstruction on Port 3.
  • A port can combine a fast unit with a slow unit.
  • Ports 2 and 3 are used for memory operations (load
    and store).
  • After execution, uops are placed in the ROB.
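
The per-port limits above add up to the maximum of 6 (2 + 2 + 1 + 1). A minimal sketch of that arithmetic, assuming a simplified oldest-first selection that the slides do not describe; `PORT_WIDTH` and `dispatch` are hypothetical names:

```python
# Microinstructions each port can accept per clock cycle (from the slide).
PORT_WIDTH = {0: 2, 1: 2, 2: 1, 3: 1}


def dispatch(ready):
    """Pick the uops dispatched in one cycle.

    ready: list of port numbers, one per ready uop, oldest first.
    Returns the indices of the uops that get dispatched.
    """
    free = dict(PORT_WIDTH)  # remaining slots per port for this cycle
    issued = []
    for index, port in enumerate(ready):
        if free.get(port, 0) > 0:
            free[port] -= 1
            issued.append(index)
    return issued
```

For example, with three ready uops bound for Port 0, only the two oldest are dispatched and the third waits for the next cycle.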

11
Single Task (ST), Multi Task (MT) Mode
  • There are two types of ST mode: ST0 and ST1.
  • Only one logical processor is active; the other is
    in a low-power mode.
  • Resources that were partitioned in MT mode are
    recombined to give the single logical processor
    the entire resources.
  • The HALT instruction is used to transition from MT
    to ST mode.
  • It is a privileged instruction: only ring-0 code
    (the OS) can execute it.
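
The mode change and the recombining of partitioned resources can be sketched as below. This assumes that ST0 means logical processor 0 is the active one and ST1 means logical processor 1 is, which the slide does not spell out. The 64-entry queue size and the function names are hypothetical, and the sketch covers only the MT-to-ST transition.

```python
UOP_QUEUE_ENTRIES = 64  # hypothetical size, chosen only for illustration


def after_halt(mode, lp):
    """Mode reached when logical processor lp executes HALT in MT mode."""
    if mode != "MT":
        raise ValueError("this sketch only models the MT-to-ST transition")
    # The halted logical processor goes idle; the other one stays active.
    return "ST1" if lp == 0 else "ST0"


def entries_for(mode, lp, total=UOP_QUEUE_ENTRIES):
    """Entries of a partitioned structure available to lp in a given mode."""
    if mode == "MT":
        return total // 2          # partitioned: half for each
    active = int(mode[2])          # "ST0" -> 0, "ST1" -> 1
    return total if lp == active else 0
```

In MT mode each logical processor gets 32 of the 64 entries. After logical processor 0 halts, the mode is ST1 and logical processor 1 gets all 64.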

12
Experiment Setup
13
Result
Cache hit rate and overall performance impact for
a fully shared cache normalized against values
for a partitioned cache
14
Multithreading Multitasking Performance
HT Performance on Multithreaded Software Package
HT Performance on Multitasking workloads
15
Conclusions
  • HT improves multithreaded applications by having
    each logical processor run software threads from
    the same application.
  • HT speeds up multitasking workloads by having each
    logical processor run threads from different
    applications.
  • Nehalem (Intel Core i7) is planned for release in Q4
    2008. It scales up to 8 physical cores (16
    logical processors).

16
Additional References
  • Hyper-Threading Technology Architecture and
    Microarchitecture
  • ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.pdf
  • http://www.hardwaresecrets.com/article/235/6