Intel Multimedia Extensions and Hyper-Threading

1
Intel Multimedia Extensions and Hyper-Threading
  • Michele Co
  • CS451

2
Outline
  • Evolution of Intel multimedia extensions
    • x87 (386)
    • MMX (Pentium MMX, Pentium II)
    • SSE (Pentium III)
    • SSE2 (Pentium 4 Willamette)
    • SSE3 (Pentium 4 Prescott)
  • Hyper-Threading

3
(No Transcript)
4
x87 FPU
  • 8 80-bit data registers (double extended
    precision floating point)
  • Data registers treated as a stack
  • Control register: FP precision, rounding, etc.
  • Status register: FPU busy, TOS, CC, error,
    exception, etc.
  • Tag register: 2 bits per data register (valid,
    zero, special, empty)
  • Last instruction pointer register
  • Last data (operand) pointer register
  • Opcode register
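
The status and tag registers listed above are bit fields, so a short decoder can make the layout concrete. This is a minimal sketch in Python, not part of the original slides; the function names are hypothetical, and the bit positions (TOP in status-word bits 11-13, C0-C2 in bits 8-10, C3 in bit 14, tag encodings 00/01/10/11) are the ones documented for the x87 FPU.

```python
# Tag encodings: two bits per physical data register R0..R7.
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word):
    """Return the tag of each physical data register R0..R7."""
    return [TAG_NAMES[(tag_word >> (2 * i)) & 0b11] for i in range(8)]


def top_of_stack(status_word):
    """Extract the 3-bit TOP field (bits 11-13) of the status word."""
    return (status_word >> 11) & 0b111


def condition_codes(status_word):
    """Return (C3, C2, C1, C0): C3 is bit 14, C2..C0 are bits 10..8."""
    return (
        (status_word >> 14) & 1,
        (status_word >> 10) & 1,
        (status_word >> 9) & 1,
        (status_word >> 8) & 1,
    )
```

For example, `decode_tag_word(0xFFFF)` reports all eight registers as empty, which is the state after the FPU is initialized. Tags are indexed by physical register, not by stack position.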

5
x87 FPU State
6
x87 Data Types
7
x87 Instructions
  • Data transfer (load, store, move)
  • Basic arithmetic
  • Comparison
  • Transcendental (trigonometric, log, exp)
  • Load constant
  • x87 FPU control
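
Because x87 data registers are addressed as a stack, arithmetic is written in a push/operate/pop style. The toy model below is a sketch under simplifying assumptions: the class name `X87Stack` is hypothetical, values are plain Python floats (no 80-bit rounding), and stack overflow raises a Python exception instead of setting the FPU's exception flags. The method names mirror the FLD, FADDP, FMULP, and FSTP instructions.

```python
class X87Stack:
    """Toy model of the x87 register stack (values only)."""

    def __init__(self):
        self.regs = [None] * 8  # physical registers R0..R7; None = empty
        self.top = 0            # TOP field; a push decrements it modulo 8

    def st(self, i):
        """Read ST(i), the i-th register below the top of stack."""
        return self.regs[(self.top + i) % 8]

    def fld(self, value):
        """Push a value (models FLD)."""
        self.top = (self.top - 1) % 8
        if self.regs[self.top] is not None:
            raise OverflowError("x87 stack overflow")
        self.regs[self.top] = value

    def _pop(self):
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8

    def faddp(self):
        """ST(1) = ST(1) + ST(0), then pop (models FADDP)."""
        self.regs[(self.top + 1) % 8] = self.st(1) + self.st(0)
        self._pop()

    def fmulp(self):
        """ST(1) = ST(1) * ST(0), then pop (models FMULP)."""
        self.regs[(self.top + 1) % 8] = self.st(1) * self.st(0)
        self._pop()

    def fstp(self):
        """Return ST(0) and pop it (models FSTP)."""
        value = self.st(0)
        self._pop()
        return value
```

Computing (2 + 3) * 4 then reads as `fld(2.0)`, `fld(3.0)`, `faddp()`, `fld(4.0)`, `fmulp()`, `fstp()`, with no register names involved.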

8
MMX
  • SIMD execution
  • 8 64-bit data registers (MMX)
  • Aliased to x87 FPU registers
  • Randomly accessible
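
To illustrate SIMD execution on one 64-bit MMX register, the sketch below models a register as eight unsigned byte lanes and applies the same operation to every lane. It is a Python model of the semantics rather than real MMX code; the function names echo the PADDB (wraparound) and PADDUSB (unsigned saturating) instructions, and `to_register` is a hypothetical helper showing that the eight lanes fit in one 64-bit value.

```python
def paddb(a, b):
    """Wraparound add of eight byte lanes (models PADDB)."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def paddusb(a, b):
    """Unsigned saturating add of eight byte lanes (models PADDUSB)."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


def to_register(lanes):
    """Pack eight byte lanes into one 64-bit integer, lane 0 lowest."""
    return int.from_bytes(bytes(lanes), "little")


pixels = [250, 10, 128, 255, 0, 1, 2, 3]
boost = [10, 10, 10, 10, 10, 10, 10, 10]
wrapped = paddb(pixels, boost)      # lane 0 wraps around to 4
clamped = paddusb(pixels, boost)    # lane 0 clamps at 255
```

Saturating arithmetic is the reason MMX suits pixel and audio data: a brightened pixel clamps at the maximum instead of wrapping to a dark value.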

9
SIMD Execution
10
MMX State
11
MMX Registers
12
MMX Data Types
13
MMX Instructions
  • Data transfer
  • Arithmetic
  • Comparison
  • Conversion
  • Unpacking
  • Logical
  • Shift
  • Empty MMX state

14
SSE
  • Pentium III
  • 8 128-bit data registers (XMM)
  • Independent of x87 FPU and MMX registers
  • SSE instructions can be executed in parallel with
    MMX/x87
  • MXCSR register: control and status for XMM
    registers (similar to x87 status register)
  • EFLAGS register: results of compare ops
  • 128-bit packed single-precision fp data type
  • Prefetching, cacheability, store ordering control
    instructions
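
As a sketch of what "control and status" means for MXCSR, the decoder below splits a 32-bit MXCSR value into its documented fields: exception flags in bits 0-5, exception masks in bits 7-12, rounding control in bits 13-14, and flush-to-zero in bit 15. The function and dictionary key names are hypothetical; bit 6 is left out of this sketch.

```python
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "toward zero"}
EXCEPTIONS = ["invalid", "denormal", "divide-by-zero",
              "overflow", "underflow", "precision"]


def decode_mxcsr(value):
    """Split an MXCSR value into flags, masks, rounding mode, and FTZ."""
    return {
        "flags": [n for i, n in enumerate(EXCEPTIONS)
                  if (value >> i) & 1],
        "masked": [n for i, n in enumerate(EXCEPTIONS)
                   if (value >> (7 + i)) & 1],
        "rounding": ROUNDING[(value >> 13) & 0b11],
        "flush_to_zero": bool((value >> 15) & 1),
    }


default_state = decode_mxcsr(0x1F80)  # power-on default value
```

With the default value 0x1F80, all six exceptions are masked, no flags are set, and rounding is to nearest.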

15
SSE State
16
XMM Registers
17
SSE Data Type
18
SSE Instructions
  • Packed and scalar single-precision floating point
  • Logical
  • Conversion
  • 64-bit SIMD integer
  • MXCSR management
  • State management
  • Cacheability control, prefetch, memory ordering
  • SFENCE (store fence)
  • FXSAVE, FXRSTOR
  • extend the x87 fast save and restore of x87 and
    MMX registers to also save and restore the XMM
    and MXCSR registers

19
Packed Single-Precision FP Operation
20
Scalar Single-Precision FP Operation
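
The difference between the packed and scalar forms can be sketched with a Python model of an XMM register as a list of four single-precision values, element 0 being the lowest. The function names follow the ADDPS and ADDSS instructions; `f32` is a hypothetical helper that rounds a Python float to single precision, and inputs are assumed to be representable in single precision already.

```python
import struct


def f32(x):
    """Round a Python float to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def addps(dest, src):
    """Packed add: all four elements are added (models ADDPS)."""
    return [f32(d + s) for d, s in zip(dest, src)]


def addss(dest, src):
    """Scalar add: only element 0 is added; elements 1-3 of the
    destination pass through unchanged (models ADDSS)."""
    return [f32(dest[0] + src[0])] + list(dest[1:])


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
packed = addps(a, b)
scalar = addss(a, b)
```

The packed form produces four sums per instruction, while the scalar form behaves like an ordinary floating-point add that happens to use the low element of an XMM register.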
21
Shuffle
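
A sketch of the shuffle operation, modeled on SHUFPS: an 8-bit immediate holds four 2-bit selectors. The two low result elements are picked from the destination operand and the two high result elements from the source operand. Registers are modeled as Python lists with element 0 lowest; this is an illustration of the semantics, not real SSE code.

```python
def shufps(dest, src, imm8):
    """Select result elements by 2-bit fields of imm8 (models SHUFPS)."""
    return [
        dest[imm8 & 0b11],          # result element 0 comes from dest
        dest[(imm8 >> 2) & 0b11],   # result element 1 comes from dest
        src[(imm8 >> 4) & 0b11],    # result element 2 comes from src
        src[(imm8 >> 6) & 0b11],    # result element 3 comes from src
    ]


v = [1.0, 2.0, 3.0, 4.0]
w = [5.0, 6.0, 7.0, 8.0]
reversed_v = shufps(v, v, 0x1B)   # selectors 3, 2, 1, 0
broadcast = shufps(v, v, 0x00)    # element 0 copied to every position
low_halves = shufps(v, w, 0x44)   # low two of v, then low two of w
```

Passing the same register as both operands gives an arbitrary permutation or broadcast of its four elements.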
22
Unpack and Interleave
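
Unpack-and-interleave can be sketched the same way. The functions below model UNPCKLPS and UNPCKHPS on four-element lists (element 0 lowest): each takes one half of both operands and interleaves them, destination element first.

```python
def unpcklps(dest, src):
    """Interleave the low halves of dest and src (models UNPCKLPS)."""
    return [dest[0], src[0], dest[1], src[1]]


def unpckhps(dest, src):
    """Interleave the high halves of dest and src (models UNPCKHPS)."""
    return [dest[2], src[2], dest[3], src[3]]


xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]
# Turn separate x and y arrays into (x, y) pairs laid out in memory order.
pairs = unpcklps(xs, ys) + unpckhps(xs, ys)
```

A typical use is converting between "structure of arrays" and "array of structures" layouts, as in the `pairs` example.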
23
SSE2
  • Pentium 4
  • More data types
  • More instructions to support new data types

24
SSE2 State
25
SSE2 Data Types
26
SSE2 Instructions
  • Support for additional types
  • CLFLUSH (cache line flush)
  • LFENCE (load fence)
  • MFENCE (memory fence, orders both loads and
    stores)

27
Packed Double-Precision FP Operations
28
Scalar Double-Precision FP Operations
29
SSE3
  • Pentium 4 (Prescott)
  • Support for Hyper-Threading
  • 13 new instructions
    • 10 SIMD support instructions
    • 1 x87 accelerating instruction (fp to int
      conversion)
    • Synchronization of threads
      • MONITOR (monitor write-back stores)
      • MWAIT (wait for write-back store)
  • No new state

30
Asymmetric Processing
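
"Asymmetric" here means that different elements of one instruction undergo different operations. A sketch modeled on the SSE3 ADDSUBPS instruction: even-numbered elements are subtracted and odd-numbered elements are added. Registers are four-element Python lists with element 0 lowest, and single-precision rounding is not modeled.

```python
def addsubps(dest, src):
    """Subtract in elements 0 and 2, add in elements 1 and 3
    (models ADDSUBPS)."""
    return [
        dest[0] - src[0],
        dest[1] + src[1],
        dest[2] - src[2],
        dest[3] + src[3],
    ]


d = [10.0, 10.0, 10.0, 10.0]
s = [1.0, 2.0, 3.0, 4.0]
mixed = addsubps(d, s)
```

This pattern matches complex multiplication, where the real part needs a subtraction and the imaginary part an addition.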
31
Horizontal Data Movement
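
Most SIMD instructions work "vertically", combining element i of one register with element i of another. Horizontal instructions combine neighboring elements within a register. The sketch below models the SSE3 HADDPS instruction on four-element lists (element 0 lowest); single-precision rounding is not modeled.

```python
def haddps(dest, src):
    """Add adjacent pairs: low half of the result from dest,
    high half from src (models HADDPS)."""
    return [
        dest[0] + dest[1],
        dest[2] + dest[3],
        src[0] + src[1],
        src[2] + src[3],
    ]


v = [1.0, 2.0, 3.0, 4.0]
step1 = haddps(v, v)          # pair sums, repeated in both halves
total = haddps(step1, step1)  # every element now holds the full sum
```

Two horizontal adds reduce a four-element vector to its sum, which is the last step of a dot product.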
32
Hyper-Threading
33
Terminology
  • Process
    • Program associated with a context (state
      registers, program counter, flags, etc.)
    • Consists of one or more threads
  • Thread
    • lightweight process (less state)

34
Hyper-threading
  • Single physical processor appears as 2 logical
    processors
  • Thread Level Parallelism (TLP)
    • Many applications have software threads that can
      be executed simultaneously
      • Online transaction processing
      • Web services
  • Latency can leave execution units idle
    • Cache misses
    • Branch mispredictions
    • Waiting for loads/stores
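
The point about idle execution units can be made with a toy utilization model. This is a deliberately simplified sketch, not a simulation of any real pipeline: each thread is a hypothetical string with one character per cycle ('X' means the thread has work ready, '-' means it is stalled), stall patterns are fixed in advance, and the core is assumed to have enough execution units that threads never block each other.

```python
def utilization(*threads):
    """Fraction of cycles in which at least one thread has work ready."""
    cycles = len(threads[0])
    if any(len(t) != cycles for t in threads):
        raise ValueError("all threads must cover the same number of cycles")
    busy = sum(any(t[c] == "X" for t in threads) for c in range(cycles))
    return busy / cycles


thread_a = "XX----XX"   # stalled for four cycles, e.g. on a cache miss
thread_b = "X--XX---"

alone = utilization(thread_a)
together = utilization(thread_a, thread_b)
```

In this example the execution units are busy in half of the cycles with one thread and in three quarters with two, because the second thread fills some of the first thread's stall cycles.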

35
Techniques for Minimizing Effect of Long Latency
  • Chip multiprocessing (CMP)
    • 2 processors on single die
    • Larger than single core chip, more expensive to
      manufacture
  • Time-slice or switch-on-event multithreading
    • Switch threads after fixed time period or on long
      latency events like cache misses
    • Doesn't take advantage of other sources of
      inefficient resource usage (branch
      mispredictions, instruction dependencies, etc.)
  • Simultaneous multithreading (SMT)
    • Multiple threads execute on single processor
      without switching
    • Hyper-Threading is Intel's implementation

36
Intel Hyper-Threading Demo
37
Resource Requirements for HT
  • Need to maintain 2 contexts
  • Replicated
    • Register renaming logic (RAT)
    • Instruction Pointer
    • ITLB
    • Return stack predictor
    • Various other architectural registers (GP,
      control, APIC, machine state)
  • Partitioned
    • Re-order buffers (ROBs)
    • Load/Store buffers
    • Various queues, like the scheduling queues, uop
      queue, etc.
  • Shared
    • Caches: trace cache, L1, L2, L3, microcode ROM
    • Microarchitectural registers
    • Execution Units
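
To show what "partitioned" means in practice, here is a minimal sketch of a buffer split between two logical processors. The class name, the even split, and the method names are assumptions made for illustration; the slide does not state how the real structures divide their entries.

```python
class PartitionedQueue:
    """Toy buffer statically split between two logical processors."""

    def __init__(self, entries):
        self.limit = entries // 2   # assumed: each logical processor gets half
        self.used = {0: 0, 1: 0}

    def allocate(self, lp):
        """Take one entry for logical processor lp.

        Returns False when lp has used up its own partition, in which
        case that logical processor must stall."""
        if self.used[lp] >= self.limit:
            return False
        self.used[lp] += 1
        return True

    def release(self, lp):
        """Free one entry previously allocated by logical processor lp."""
        if self.used[lp] == 0:
            raise ValueError("nothing to release")
        self.used[lp] -= 1


rob = PartitionedQueue(8)
grants = [rob.allocate(0) for _ in range(5)]  # fifth request is refused
other = rob.allocate(1)                       # unaffected by LP0 filling up
```

The design choice this illustrates: a stalled thread can fill only its own partition, so it cannot starve the other logical processor of entries.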

38
Hyper-Threading Goals
  • Minimize die area cost for implementing
  • Ensure forward progress by at least one logical
    processor
  • Maintain single-threaded performance

39
Frontend Changes
  • 2 PCs
  • Arbitration for shared resource access
    • Trace cache, microcode ROM, caches
    • One logical processor at a time per structure
  • Thread tags per trace cache entry
  • Microcode ROM: 2 microcode instruction pointers
  • Wider pipeline latches to hold state for 2
    contexts
  • Branch prediction
    • RAS and branch history buffer duplicated
    • Global history shared, but tagged with logical
      processor ID
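
"One logical processor at a time per structure" implies an arbiter in front of each shared front-end structure. The sketch below assumes a simple policy: when both logical processors request access in the same cycle, grants alternate between them, and when only one requests, it gets every cycle. The function name and the choice of which processor wins the first contested cycle are hypothetical.

```python
def arbitrate(requests):
    """Grant a shared structure to one logical processor per cycle.

    requests is a list of (lp0_wants, lp1_wants) pairs, one per cycle.
    Returns, per cycle, the granted logical processor (0 or 1) or None."""
    grants = []
    last = 1  # assumed start state, so LP0 wins the first contested cycle
    for want0, want1 in requests:
        if want0 and want1:
            winner = 1 - last       # alternate under contention
        elif want0:
            winner = 0
        elif want1:
            winner = 1
        else:
            winner = None
        if winner is not None:
            last = winner
        grants.append(winner)
    return grants


contended = arbitrate([(True, True)] * 4)
solo = arbitrate([(True, False)] * 3)
```

Under contention each logical processor gets half of the access slots; a processor running alone keeps the full bandwidth, which supports the goal of maintaining single-threaded performance.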

40
Trace Cache Hit
41
Trace Cache Miss
42
Hyper-threaded Execution
43
Execution Modes
  • Single-task (ST), Multi-task (MT)
    • ST0, ST1
  • HALT transitions MT to ST0 or ST1, depending on
    which logical processor executes it
  • Interrupt sent to halted processor transitions to
    MT
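
The mode transitions above form a small state machine, sketched here in Python. The sketch assumes that ST0 means logical processor 0 is the only active one (and ST1 likewise for logical processor 1); the class name, method names, and the extra "halted" state for both processors being halted are hypothetical additions for illustration.

```python
class HTProcessor:
    """Toy state machine for the ST0 / ST1 / MT execution modes."""

    def __init__(self):
        self.active = {0, 1}    # both logical processors running

    @property
    def mode(self):
        if self.active == {0, 1}:
            return "MT"
        if self.active == {0}:
            return "ST0"
        if self.active == {1}:
            return "ST1"
        return "halted"         # assumed name for "no processor active"

    def halt(self, lp):
        """Logical processor lp executes HALT."""
        self.active.discard(lp)

    def interrupt(self, lp):
        """An interrupt wakes halted logical processor lp."""
        self.active.add(lp)


cpu = HTProcessor()
trace = [cpu.mode]
cpu.halt(0)
trace.append(cpu.mode)
cpu.interrupt(0)
trace.append(cpu.mode)
cpu.halt(1)
trace.append(cpu.mode)
```

In a single-task mode the remaining logical processor can use resources that would otherwise be split between the two, which is how a halted sibling avoids slowing down single-threaded work.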

44
HT Performance - OLTP
45
HT Performance - Web Server