The Intel Itanium 2 Architecture - PowerPoint PPT Presentation

1
The Intel Itanium 2 Architecture
  • By
  • Alexandra Martinez

2
Contents
  • Overview
  • EPIC Model
  • The Itanium 2 Processor and Pipeline
  • Instruction Processing
  • Instruction Execution
  • Control
  • Memory Subsystem
  • Conclusions
  • Bibliographic Sources

3
1. Overview
  • Itanium 2 is the second implementation of the
    Itanium ISA.
  • Uses the EPIC model to scale performance by
    increasing instruction-level parallelism (ILP).
  • 6-wide, 8-stage deep pipeline running at 1.5 GHz.
  • Resources:
  • 6 integer and 6 multimedia ALUs,
  • 2 load and 2 store units,
  • 3 branch units,
  • 2 extended-precision and 2 single-precision FP
    units
  • 128 integer, 128 FP, 8 branch, and 64 predicate
    registers.
  • Can fetch, issue, execute, and retire 6
    instructions (2 bundles) per clock.
  • Three levels of on-die cache minimize memory
    latency.
  • Uses dynamic prefetch, branch prediction, and a
    register scoreboard.
  • System bus supports multiprocessing (up to 4
    processors per bus) as a system building block.

4
1.1 EPIC: Explicitly Parallel Instruction Computing Model
  • Designed to achieve high degree of parallelism.
  • Provides tighter coupling between hardware and
    software.
  • Compiler exploits compile-time information to
    optimize code.
  • Processor executes instructions as rapidly as
    possible.
  • Current Itanium processors can execute up to 6
    instructions in parallel.
  • Increased parallelism will be enabled by EPIC on
    future processor generations.
  • Maximum number of simultaneous instructions is
    not the only measure of performance.
  • The processor must sustain high levels of
    parallelism to optimize total throughput, which
    it does through extensive computational
    resources, predication, and speculation.

5
1.2 Predication
  • Conditional branches would normally stall the
    processor until the conditional statement is
    processed.
  • In branch-intensive code, this can become a major
    limitation on overall throughput.
  • With the EPIC model: predication
  • Predication allows the compiler to explicitly
    identify instruction streams that can be
    processed in parallel.
  • Also allows the processor to pre-load
    instructions and data and begin processing for
    both branches simultaneously.
  • Once the conditional statement is processed, the
    information gathered for the incorrect path is
    simply discarded.
  • Improves performance by eliminating branches and
    associated branch misprediction penalties.
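
The idea on this slide can be illustrated with a small software model. This is only a sketch of if-conversion, not Itanium code: the function names and the predicate names p1 and p2 are hypothetical, and Python evaluates the two guarded operations one after the other rather than in parallel.

```python
def abs_diff_branching(a: int, b: int) -> int:
    # Conventional form: a conditional branch selects one path.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    # One compare writes two complementary predicates
    # (p1 and p2 are hypothetical names for this sketch).
    p1 = a > b
    p2 = not p1

    # Both guarded operations are computed; no branch is involved.
    r_if_p1 = a - b  # guarded by p1
    r_if_p2 = b - a  # guarded by p2

    # Only the operation whose predicate is true commits its result;
    # the other result is discarded.
    result = 0
    if p1:
        result = r_if_p1
    if p2:
        result = r_if_p2
    return result
```

Both functions return the same value for any pair of integers; the predicated form trades a possible misprediction for the cost of computing both paths.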

6
1.3 Speculation
  • Speculation allows the compiler to identify
    future data needs, so data can be loaded ahead of
    branches and stores.
  • Helps hide memory latency.
  • Intel Itanium 2 Processors include a
    fully-associative Advanced Load Address Table
    (ALAT), which manages speculative data more
    efficiently.
  • Reduces penalty (recovery time) for a wrong
    speculation.
  • The Itanium 2 cache subsystem also accelerates
    speculative data loading.
  • Data can be stored and read faster from this
    integrated cache than from an out-of-processor
    cache.

7
2. The Itanium 2 Processor and Pipeline

8
2.1 The Pipeline
  • Figure: the 8-stage pipeline, in order
  • Instruction Fetch
  • Instruction Rotation
  • Expand (disperse)
  • Register Rename
  • Register Read
  • Execute
  • Exception detection
  • Write Back

9
2.2 The Processor
  • Figure: processor block diagram, with blocks
  • Instruction Processing
  • IA-32 execution engine
  • Control Block
  • Memory Subsystem
  • Execution Block

10
2.3 Instruction Processing (1)
  • Instruction Prefetch and Fetch
  • Instruction prefetch: moving instruction cache
    lines from higher levels of cache or memory into
    the L1i cache.
  • Speculative prefetches are based on the branch
    prediction strategy and compiler hints.
  • The processor reads 2 instruction bundles from
    L1i cache and places them in the 8-bundle
    instruction buffer (IB).
  • The IB holds bundles of instructions until they
    are consumed by the functional units.
  • The bundles are then sent to the instruction
    issue and rename logic, according to available
    execution resources.
  • The instruction address generator unit selects
    the next IP.

11
2.3 Instruction Processing (2)
  • Branch Prediction (BP)
  • Itanium 2 employs both static and dynamic methods
    for branch prediction.
  • For static BP, hint completers are used.
  • Branch hints encode information about branch
    behavior that is used by the processor to improve
    branch prediction.
  • Branch hints in Itanium 2 are sptk, spnt, dptk,
    and dpnt.
  • sp for static prediction, dp for dynamic
    prediction, tk for taken, nt for not taken.
  • For dynamic prediction, various hardware
    structures are used.
  • BP in Itanium 2 is closely tied to the L1i cache,
    which allows for the zero-bubble BP algorithm.
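
The naming scheme of the four hint completers listed above can be sketched as a small parser. This is an illustrative sketch only: the BranchHint type and parse_branch_hint function are hypothetical names, and the code models just the naming convention, not how the hardware uses the hint.

```python
from typing import NamedTuple


class BranchHint(NamedTuple):
    strategy: str  # "static" or "dynamic"
    taken: bool    # True for "tk", False for "nt"


def parse_branch_hint(completer: str) -> BranchHint:
    """Split a hint completer such as 'sptk' into its two parts."""
    strategies = {"sp": "static", "dp": "dynamic"}
    directions = {"tk": True, "nt": False}
    prefix, suffix = completer[:2], completer[2:]
    if prefix not in strategies or suffix not in directions:
        raise ValueError(f"unknown branch hint completer: {completer!r}")
    return BranchHint(strategies[prefix], directions[suffix])
```

For example, parse_branch_hint("dpnt") yields a dynamic-prediction hint with an initial guess of not taken.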

12
2.3 Instruction Processing (3)
  • Dispersal Logic (DL)
  • The process of mapping instructions within
    bundles to functional units is called dispersal.
  • The dispersal logic sends each instruction to one
    functional unit through its issue ports.
  • An instruction's type and its position within the
    bundle determine its issue port.
  • Instruction Buffer (IB) can hold up to 8
    instruction bundles, and delivers 2 bundles per
    cycle to the dispersal logic.
  • An instruction bundle has a template indicator,
    assigned by the compiler.
  • There are 12 basic templates for Itanium 2
    instructions.
  • A template specifies the types of the
    instructions held in a bundle.
  • Instructions issued as a group will proceed as a
    group through the pipeline (if one stalls, whole
    group stalls).
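
To make the role of the template concrete, here is a minimal sketch of splitting a bundle into its template and instruction slots. The bit layout (a 128-bit bundle with a 5-bit template in the low bits followed by three 41-bit slots) and the template encodings come from the Itanium architecture definition rather than from these slides; the names TEMPLATES and decode_bundle are hypothetical.

```python
from typing import List, Optional, Tuple

# The 12 basic templates, keyed by the even template encoding.
# M = memory, I = integer, F = floating point, B = branch, L+X = long
# immediate; "_" marks a stop in the middle of the bundle.
TEMPLATES = {
    0x00: "MII", 0x02: "MI_I", 0x04: "MLX", 0x08: "MMI",
    0x0A: "M_MI", 0x0C: "MFI", 0x0E: "MMF", 0x10: "MIB",
    0x12: "MBB", 0x16: "BBB", 0x18: "MMB", 0x1C: "MFB",
}

SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def decode_bundle(bundle: int) -> Tuple[Optional[str], bool, List[int]]:
    """Return (template name, stop-at-end flag, three raw slot values)."""
    template = bundle & 0x1F             # bits 0..4
    name = TEMPLATES.get(template & ~1)  # None for reserved encodings
    stop_at_end = bool(template & 1)     # odd encodings end with a stop
    slots = [(bundle >> (5 + SLOT_BITS * i)) & SLOT_MASK for i in range(3)]
    return name, stop_at_end, slots
```

A dispersal stage would then use the slot types named by the template, together with each slot's position, to choose an issue port; that mapping is not modeled here.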

13
2.4 Instruction Execution (1)
  • Processor execution logic contains:
  • 6 multimedia units, 6 integer units,
  • 4 load/store units, 3 branch units, 2 FP units,
  • 128 integer registers, 128 FP registers,
  • 8 branch registers, 64 predicate registers.
  • Integer loads are processed by the L1 cache.
  • Integer stores are processed by the L2 cache.
  • FP loads and stores are processed by the L2 cache.
  • Multimedia engines treat 64-bit data as
    2 x 32-bit, 4 x 16-bit, or 8 x 8-bit packed data
    types.
  • Ops on packed (SIMD) data are handled by the
    multimedia engines: arithmetic, shift, and data
    arrangement.
  • Ops on non-packed data are handled by the integer
    engines: up to 6 integer arithmetic and logical
    ops per cycle.
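
The packed data types above can be illustrated with a lane-wise add on a 64-bit value. This is a sketch of wrap-around (modular) packed addition only; the function name packed_add is hypothetical and the code does not model saturating variants or any specific instruction encoding.

```python
def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 64-bit values lane by lane, wrapping within each lane."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # Masking drops the carry so it cannot spill into the next lane.
        result |= (lane_sum & lane_mask) << shift
    return result
```

With 16-bit lanes, a lane holding 0xFFFF plus 1 wraps to 0 without disturbing its neighbors, which is the difference from an ordinary 64-bit integer add.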

14
2.4 Instruction Execution (2)
  • Floating-Point Unit (FPU)
  • The FPU has 4 pipeline stages.
  • Bypassing logic allows data forwarding from
    various FP stages to the FP write back stage.
  • FPU includes
  • 2 FP Multiply-Accumulate (FMAC) units
  • 2 FMISC units
  • Support for SIMD formats
  • Operands are checked for exceptions before the
    instruction enters the FP pipeline.
  • A maximum of 4 FP ops can be executed per cycle.
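
As a rough worked example, the per-cycle FP limit above can be combined with the 1.5 GHz clock from the overview to estimate a peak rate. This is back-of-the-envelope arithmetic under the assumption that all 4 FP operations issue every cycle; the variable names are illustrative.

```python
fp_ops_per_cycle = 4   # from the slide: maximum FP ops per cycle
clock_hz = 1.5e9       # from the overview: 1.5 GHz

# Theoretical peak, assuming the FP units are never idle.
peak_flops = fp_ops_per_cycle * clock_hz
peak_gflops = peak_flops / 1e9
```

Sustained rates on real code would be lower, since this ignores stalls and memory latency.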

15
2.4 Instruction Execution (3)
  • Predicate Registers
  • Set of 64 1-bit predicate registers (PR0 to PR63).
  • Used to hold results of compare instructions for
    conditional execution of instructions.
  • Partitioned into 2 subsets:
  • 0-15 static predicate registers
  • 16-63 rotating predicate registers
  • Branch Registers
  • Set of 8 64-bit registers (BR0 to BR7).
  • Used to hold branching information (branch target
    address) for indirect branches.

16
2.5 Control
  • The control block includes the exception handler
    and pipeline control.
  • The exception handler implements exception
    prioritizing.
  • The pipeline control
  • Has a scoreboard that detects dependencies,
    supports data speculation, and tracks multi-cycle
    operations.
  • Supports predication via predicate registers.
  • Contains a Performance Monitoring unit.
  • The processor stalls only when source operands
    are not yet available.
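
The stall rule in the last bullet can be sketched with a toy scoreboard. This is a simplified model under stated assumptions: the Scoreboard class, its method names, and the fixed latencies are hypothetical, and it tracks only when each register's value becomes available.

```python
from typing import Dict, Iterable


class Scoreboard:
    """Track the cycle at which each pending register becomes available."""

    def __init__(self) -> None:
        self._ready_at: Dict[str, int] = {}

    def issue(self, dest: str, latency: int, cycle: int) -> None:
        # Record that `dest` is produced `latency` cycles after issue.
        self._ready_at[dest] = cycle + latency

    def must_stall(self, sources: Iterable[str], cycle: int) -> bool:
        # Stall only if some source operand is not yet available.
        return any(self._ready_at.get(src, 0) > cycle for src in sources)
```

For instance, after issuing a 3-cycle operation into r4 at cycle 0, an instruction reading r4 stalls at cycles 1 and 2 and can proceed at cycle 3, while an instruction that does not read r4 never stalls on it.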

17
2.6 Memory Subsystem (1)
  • Three-level non-blocking cache structure.
  • Split L1 cache for data and instructions.
  • Unified L2 cache.
  • Unified L3 cache.

18
2.6 Memory Subsystem (2)
  • Advanced Load Address Table (ALAT)
  • Cache structure that enables data speculation by
    holding the state for advanced load and check
    operations.
  • It keeps information of speculative data loads
    issued and stores that are aliased with these
    loads.
  • 32 entries, fully associative array
  • It can handle 2 stores and 2 loads per cycle.
  • Translation Lookaside Buffers (TLBs)
  • 2 types: Data TLB (DTLB) and Instruction TLB
    (ITLB).
  • DTLB and ITLB each have 2 levels (L1 and L2).
  • L1 TLB: 32 entries, fully associative, supports
    4 KB pages.
  • L2 TLB: 128 entries, fully associative, supports
    page sizes from 4 KB to 4 GB.
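
The ALAT behavior described on this slide can be sketched as a small table: an advanced load allocates an entry, a later store to an overlapping address removes it, and a check succeeds only if the entry survived. This is a simplified model, not the hardware design: the Alat class and method names are hypothetical, full addresses are compared instead of partial tags, and the oldest-first eviction policy is an assumption.

```python
from collections import OrderedDict
from typing import Tuple


class Alat:
    """Toy model of a fully associative advanced-load table."""

    def __init__(self, capacity: int = 32) -> None:
        self.capacity = capacity
        # target register -> (address, size in bytes)
        self._entries: "OrderedDict[str, Tuple[int, int]]" = OrderedDict()

    def advanced_load(self, reg: str, addr: int, size: int) -> None:
        self._entries.pop(reg, None)           # one entry per target register
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # evict oldest (assumed policy)
        self._entries[reg] = (addr, size)

    def store(self, addr: int, size: int) -> None:
        # Remove every entry whose byte range overlaps the store.
        for reg, (a, s) in list(self._entries.items()):
            if addr < a + s and a < addr + size:
                del self._entries[reg]

    def check(self, reg: str) -> bool:
        # True means the speculative load is still valid; False means
        # the load must be redone.
        return reg in self._entries
```

A store to an unrelated address leaves the entry in place, so the check succeeds and the early load pays off; an aliasing store removes it and forces recovery.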

19
3. Conclusions
  • Performance of the Latest Itanium 2 Release
  • Provides performance increases of up to 30 to 50
    percent over the original Itanium 2 processor.
  • Itanium 2 processors are ideal for...
  • Enterprise requirements, including database
    consolidation, enterprise resource planning,
    business intelligence, and high performance
    computing.
  • Pricing
  • The Itanium 2 with 6 MB of L3 cache at 1.50 GHz
    costs $4,226.
  • Itanium Family Roadmap
  • Mid 2003: Itanium 2 processor with on-chip 6 MB
    L3 cache and 1.5 GHz speed (formerly code-named
    Madison).
  • End 2003: a low-voltage Intel Itanium 2 processor
    will be released.
  • Mid 2004: Madison 9M processor with an on-chip
    9 MB L3 cache and a further clock increase.
  • 2005: Montecito will add a new feature called
    dual-core, which means two separate logical
    processors on one physical chip.

20
Sources
  • Intel Itanium 2 Processor Reference Manual for
    Software Development and Optimization, April 2003.
    ftp://download.intel.com/design/Itanium2/manuals/25111002.pdf
  • Intel Itanium 2 Processor Hardware Developer's
    Manual, July 2002.
    ftp://download.intel.com/design/Itanium2/manuals/25110901.pdf
  • Intel Itanium 2 Processor.
    http://www.intel.com/ebusiness/products/itanium/index.htm
  • Intel Itanium 2 Processor Product Overview.
    http://developer.intel.com/design/itanium2/prodbref/index.htm
  • Intel Itanium 2 Processor Product Features.
    http://www.intel.com/design/itanium2/index.htm?iidipp_srvr_proc_itanium2prod_procspecfeatures
  • EPIC Technology Moves Forward.
    http://developer.intel.ru/download/eBusiness/pdf/wp022404.pdf
  • Intel Press Release: New Intel Server Processors
    Strengthen High-End Line-Up.
    http://www.intel.com/pressroom/archive/releases/20030630comp.htm
  • Intel E8870 Chipset.
    http://www.intel.com/design/chipsets/e8870/index.htm
  • The Intel Itanium Architecture Comes of Age.
    http://www.intel.com/ebusiness/pdf/prod/itanium/ecosystem.pdf
  • Roadmap Report: How Industry Leading Software
    Maps to Intel's Hardware Plans.
    http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKeyGenericEditorial3a3axeon_developercntTypeIDS_EDITORIALcatCodeDASpath3