Scaling to the End of Silicon with EDGE Architectures (presentation transcript)

1
Scaling to the End of Silicon with EDGE
Architectures
  • D. Burger, S.W. Keckler, K.S. McKinley, M.
    Dahlin, L.K. John, C. Lin,
  • C.R. Moore, J. Burrill,
  • R.G. McDonald, W. Yoder and the TRIPS Team
  • (presented by Khalid El-Arini)

2
Overview
  • Motivation
  • High-level architecture description
  • Compiling for TRIPS
  • Discussion

3
Why do we need a new ISA?
  • For the last 20 years, we have witnessed dramatic
    improvements in processor performance
  • Acceleration of clock rates (x86)
  • 1990: 33 MHz
  • 2004: 3.4 GHz
  • Aggressive pipelining is responsible for
    approximately half of this performance gain
  • However, all good things come to an end
    (Hrishikesh et al., ISCA '02)

4
Explicit Data Graph Execution
  • Direct instruction communication
    • Producer and consumer instructions interact
      directly
    • An instruction fires when its inputs are
      available
  • Dataflow explicitly represented in hardware
    • No rediscovery of data dependencies
    • Higher exposed concurrency
    • More power-efficient execution
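The firing rule above can be sketched as a toy dataflow interpreter (illustrative only: the instruction encoding and opcodes here are invented for the sketch, not the TRIPS ISA):

```python
# Toy dataflow interpreter: an instruction fires as soon as all of its
# operands have arrived, and forwards its result directly to the operand
# slots of its consumer instructions -- no shared register file involved.
from collections import deque

def run_block(instrs, inputs):
    """instrs: id -> (op, n_operands, [(target id, slot), ...])
    inputs: list of (id, slot, value) operand deliveries."""
    operands = {i: {} for i in instrs}
    ready = deque()
    results = {}

    def deliver(i, slot, value):
        operands[i][slot] = value
        if len(operands[i]) == instrs[i][1]:   # all inputs present -> fire
            ready.append(i)

    for i, slot, v in inputs:                   # inject block inputs
        deliver(i, slot, v)

    while ready:                                # execute in dataflow order
        i = ready.popleft()
        op, n, targets = instrs[i]
        vals = [operands[i][s] for s in range(n)]
        r = {"add": sum, "mul": lambda v: v[0] * v[1]}[op](vals)
        results[i] = r
        for t, slot in targets:                 # direct producer->consumer send
            deliver(t, slot, r)
    return results

# (a + b) * (a + c): instructions 1 and 2 feed instruction 3 directly
block = {
    1: ("add", 2, [(3, 0)]),
    2: ("add", 2, [(3, 1)]),
    3: ("mul", 2, []),
}
print(run_block(block, [(1, 0, 2), (1, 1, 3), (2, 0, 2), (2, 1, 4)]))
```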

5
TRIPS: An EDGE Architecture
  • Four goals
  • Increase in concurrency
  • Power-efficient high performance
  • Mitigation of communication delays
  • Increased flexibility

6
Block Atomic Execution
  • The compiler groups instructions into blocks
  • These blocks, called hyperblocks, contain up to
    128 instructions
  • Each block is fetched, executed, and committed
    atomically
  • (similar to the conventional notion of transactions)
  • Sequential execution semantics at the block level:
    each block behaves like one megainstruction
  • Dataflow execution semantics within each block
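The all-or-nothing commit can be sketched as a small transaction (an assumed model for illustration, not the actual TRIPS microarchitecture):

```python
# Sketch of block-atomic commit: a block's register writes are buffered
# and applied all-or-nothing, like a small transaction. Instruction and
# register names are invented for the sketch.

def read(state, writes, reg):
    # reads see the block's own buffered writes first, then committed state
    return writes.get(reg, state.get(reg, 0))

def make_add(dest, a, b):
    def instr(state, writes):
        writes[dest] = read(state, writes, a) + read(state, writes, b)
    return instr

def execute_block(state, block):
    writes = {}
    try:
        for instr in block:
            instr(state, writes)
    except Exception:
        return state        # block aborted: no partial effects are visible
    state.update(writes)    # commit the whole block atomically
    return state

state = {"R2": 2, "R3": 3}
execute_block(state, [make_add("R1", "R2", "R3"),   # R1 = R2 + R3
                      make_add("R4", "R1", "R1")])  # R4 = R1 + R1
print(state)
```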

7
Hyperblocks and Predication
  • 128 instructions?!
  • Predication allows branches to be hidden within
    the dataflow graph
  • Loop unrolling and function inlining also help
    fill blocks

8
TRIPS Instructions
  • RISC add
  • ADD R1, R2, R3
  • TRIPS add
  • T5: ADD T13, T17
  • (instruction 5 sends its result directly to the
    operand slots of instructions 13 and 17; no
    destination register is named)
  • The compiler statically determines the locations
    of instructions on the ALU array
  • The block mapping/execution model eliminates the
    need to go through shared data structures (e.g.,
    the register file) while executing within a
    hyperblock
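The difference between the two forms can be made concrete with a toy translator from register form to target form: each instruction's explicit targets are the later instructions that consume its destination register (a sketch; the textual formats are invented, not real TRIPS encodings):

```python
# Sketch: convert register-form instructions to target form, where a
# producer names its consumer instructions instead of a destination
# register. Format "Tj.slot" = operand slot of instruction j (invented).

def to_target_form(instrs):
    """instrs: list of (op, dest_reg, src1, src2) in program order."""
    out = []
    for i, (op, dest, *_srcs) in enumerate(instrs):
        targets = []
        for j in range(i + 1, len(instrs)):
            _op, d, s1, s2 = instrs[j]
            for slot, src in ((0, s1), (1, s2)):
                if src == dest:
                    targets.append(f"T{j}.{slot}")
            if d == dest:   # register redefined: later readers see new value
                break
        out.append(f"T{i}: {op.upper()} -> {', '.join(targets) or '(block output)'}")
    return out

prog = [
    ("add", "R1", "R2", "R3"),   # R1 = R2 + R3
    ("add", "R4", "R1", "R5"),   # consumes R1
    ("mul", "R6", "R1", "R4"),   # consumes R1 and R4
]
for line in to_target_form(prog):
    print(line)
```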

9
TRIPS Processor Core
  (figure-only slide)
10
Compiling for TRIPS
  (figure-only slides 10-12)
13
Compiling for TRIPS
  • Two new responsibilities
  • Generating hyperblocks
  • Spatial scheduling of blocks

14
Predicated Execution
  • Naïve implementation
    • Route the predicate to every instruction in a
      predicated basic block
    • Wide fan-out problem
  • Better implementations
    • Predicate only the first instruction in a chain
      (saves power if the predicate is false)
    • Predicate only the last instruction in a chain
      (hides the latency of the predicate computation)
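The trade-off can be made concrete with a toy cost model for a dependent chain of n predicated instructions (numbers are illustrative, not measured TRIPS data):

```python
# Toy cost model comparing predication strategies for a dependent chain
# of n instructions, on two costs: predicate fan-out (copies of the
# predicate routed) and work wasted when the predicate is false.

def predication_cost(strategy, n):
    if strategy == "every":
        # naive: the predicate is routed to all n instructions
        return {"fanout": n, "wasted_ops_if_false": 0}
    if strategy == "head":
        # gate only the first instruction: if the predicate is false,
        # nothing downstream receives operands, so nothing fires (saves power)
        return {"fanout": 1, "wasted_ops_if_false": 0}
    if strategy == "tail":
        # gate only the last instruction: the chain executes speculatively,
        # overlapping with the predicate computation (hides its latency),
        # at the cost of n-1 wasted operations when the predicate is false
        return {"fanout": 1, "wasted_ops_if_false": n - 1}
    raise ValueError(strategy)

for s in ("every", "head", "tail"):
    print(s, predication_cost(s, 8))
```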

15
Spatial Scheduling
  • Two competing goals
  • Place independent instructions on different ALUs
    to increase concurrency
  • Place instructions near one another to minimize
    routing distances and communication delays
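A minimal greedy placer illustrates the tension between the two goals (the grid size, cost function, and tie-breaking are invented for the sketch, not TRIPS parameters):

```python
# Greedy spatial scheduler sketch for a 4x4 ALU grid: place each
# instruction to minimize routing distance to its already-placed
# producers, with ALU load as a penalty that spreads independent
# instructions across ALUs for concurrency.
from itertools import product

GRID = list(product(range(4), range(4)))       # (row, col) ALU slots

def place(instrs):
    """instrs: list of (name, [producer names]) in dependence order.
    Returns name -> (row, col) placement."""
    placement, load = {}, {s: 0 for s in GRID}
    for name, producers in instrs:
        def cost(slot):
            dist = sum(abs(slot[0] - placement[p][0]) +
                       abs(slot[1] - placement[p][1]) for p in producers)
            return dist + load[slot]           # routing distance vs. contention
        slot = min(GRID, key=cost)
        placement[name] = slot
        load[slot] += 1
    return placement

# a and b are independent; c consumes both; d consumes c
chain = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"])]
p = place(chain)
print(p)
```

With this cost function, the independent instructions a and b land on different ALUs, while d is placed adjacent to its producer c.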

16
Discussion
  • Now that intermediate results within a
    hyperblock's dataflow graph are passed directly
    between instructions, how will register allocation
    be affected?
  • Compare EDGE compiler/hardware responsibilities
    with those of RISC and VLIW