Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors - PowerPoint PPT Presentation

1
Runahead Execution: An Alternative to Very Large
Instruction Windows for Out-of-order Processors
  • Onur Mutlu, The University of Texas at Austin
  • Jared Stark, Microprocessor Research, Intel Labs
  • Chris Wilkerson, Desktop Platforms Group, Intel
    Corp.
  • Yale N. Patt, The University of Texas at Austin
  • Presented by Mark Teper

2
Outline
  • The Problem
  • Related Work
  • The Idea: Runahead Execution
  • Details
  • Results
  • Issues

3
Brief Overview
  • Instruction Window
  • The set of in-order instructions that have not yet
    been committed
  • Scheduling Window
  • The set of unexecuted instructions waiting to be
    selected for execution
  • What can go wrong?

[Diagram: program flow through the instruction window, the scheduling windows, and the execution units]
4
The Problem
[Diagram: instruction window along the program flow, showing unexecuted, executing, long-running, and committed instructions]
5
Filling the Instruction Window
[Chart: IPC vs. instruction window size]
6
Related Work
  • Caches
  • Alter size and structure of caches
  • Attempt to reduce unnecessary memory reads
  • Prefetching
  • Attempt to fetch data into nearby cache before
    needed
  • Hardware and software techniques
  • Other techniques
  • Waiting instruction buffer (WIB)
  • Long-latency block retirements

7
Runahead Execution
  • Continue executing instructions during long
    stalls
  • Disregard the speculative results once the data
    returns

[Diagram: instruction window during runahead, with a checkpoint taken at the long-running instruction while later instructions continue to execute]
8
Benefits
  • Acts as a highly accurate prefetcher
  • Software prefetchers have less runtime information
  • Hardware prefetchers can't analyze code as well
  • Pre-trains (biases) branch predictors
  • Makes use of cycles that are otherwise wasted

9
Entering Runahead
  • The processor can enter runahead mode at any point
  • The paper uses L2 cache misses as the trigger
  • The architecture must be able to checkpoint and
    restore register state
  • Including the branch-history register and return
    address stack

10
Handling the Avoided Read
  • The load that triggers runahead returns immediately
  • Its result value is marked INV (invalid)
  • The processor continues fetching and executing
    instructions

[Diagram: register file R1–R3 with INV bits while executing:]
  • ld r1, r2
  • add r3, r2, r2
  • add r3, r1, r2
  • move r1, 0
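A minimal sketch of the INV-bit bookkeeping this sequence implies (class and method names are illustrative, not from the paper): a missing load returns immediately with its destination marked INV, any instruction that reads an INV source produces an INV result, and writing a known value clears the bit.

```python
class RunaheadRegs:
    """Register file with per-register INV bits (illustrative sketch)."""

    def __init__(self):
        self.value = {}   # register name -> value (meaningless when INV)
        self.inv = set()  # registers currently marked INV

    def load_miss(self, dst):
        # The runahead-triggering load "returns" at once with a bogus value.
        self.value[dst] = 0
        self.inv.add(dst)

    def add(self, dst, src1, src2):
        # A result is INV iff any source is INV.
        if src1 in self.inv or src2 in self.inv:
            self.inv.add(dst)
        else:
            self.value[dst] = self.value.get(src1, 0) + self.value.get(src2, 0)
            self.inv.discard(dst)

    def move_imm(self, dst, imm):
        # Writing a known value clears the INV bit.
        self.value[dst] = imm
        self.inv.discard(dst)


# The slide's sequence, assuming r2 already holds a valid value:
r = RunaheadRegs()
r.move_imm("r2", 5)
r.load_miss("r1")        # ld r1, r2      -> r1 is INV
r.add("r3", "r2", "r2")  # add r3, r2, r2 -> valid
r.add("r3", "r1", "r2")  # add r3, r1, r2 -> INV (depends on r1)
r.move_imm("r1", 0)      # move r1, 0     -> r1 valid again
```

Only r3 ends up INV: the dependence on the missing load is tracked, while the final move makes r1 usable again.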
11
Executing Instructions in Runahead
  • Instructions are fetched and executed as normal
  • Instructions are committed (pseudo-retired) out of
    the instruction window in program order
  • If an instruction's source registers are INV, it
    can be retired without executing
  • No results are ever observable outside the CPU

12
Branches during Runahead
  • Divergence points: branches mispredicted because
    they depend on INV values

13
Exiting Runahead
  • Occurs when the stalling memory access finally
    returns
  • The checkpointed architectural state is restored
  • All instructions in the machine are flushed
  • The processor resumes fetching at the instruction
    that caused runahead execution
  • The paper presents an optimization in which
    fetching restarts slightly before the stalled
    access returns
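The enter/exit protocol described on this and the "Entering Runahead" slide can be sketched as a tiny model (names illustrative; the real checkpoint also covers the branch-history register and return address stack):

```python
class Core:
    """Toy model of runahead entry/exit (illustrative, not the paper's design)."""

    def __init__(self):
        self.mode = "normal"
        self.pc = 0
        self.regs = {}
        self.checkpoint = None

    def enter_runahead(self, miss_pc):
        # On the L2-miss trigger: checkpoint architectural state, keep going.
        self.checkpoint = (miss_pc, dict(self.regs))
        self.mode = "runahead"

    def exit_runahead(self):
        # The miss returned: flush everything, restore the checkpoint, and
        # refetch starting at the instruction that caused runahead.
        miss_pc, saved_regs = self.checkpoint
        self.regs = saved_regs
        self.pc = miss_pc
        self.checkpoint = None
        self.mode = "normal"
```

Everything computed in runahead mode is discarded; its lasting effects are only the prefetches and predictor training it performed.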

14
Biasing Branch Predictors
  • Runahead can cause the branch predictor to be
    trained twice on the same branch
  • Several alternatives:
  • Always train the branch predictor
  • Never train the branch predictor during runahead
  • Keep a list of branches predicted during runahead
  • Use a separate branch predictor for runahead
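To see the double-training problem concretely, here is a sketch using a 2-bit saturating-counter predictor with a flag implementing the "never train during runahead" alternative (all names illustrative):

```python
class TwoBitPredictor:
    """Per-PC 2-bit saturating counters (0-1 predict not-taken, 2-3 taken)."""

    def __init__(self, train_in_runahead):
        self.counters = {}
        self.train_in_runahead = train_in_runahead

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken, runahead):
        if runahead and not self.train_in_runahead:
            return  # the "never train during runahead" policy
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


# The same dynamic branch is seen once in runahead mode and again after
# the restart; with training enabled in runahead it counts twice.
biased = TwoBitPredictor(train_in_runahead=True)
biased.update(0x40, taken=True, runahead=True)
biased.update(0x40, taken=True, runahead=False)

clean = TwoBitPredictor(train_in_runahead=False)
clean.update(0x40, taken=True, runahead=True)
clean.update(0x40, taken=True, runahead=False)
```

One dynamic branch moves the biased counter two steps but the clean one only one step, which is the over-training the slide's alternatives try to avoid.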

15
Runahead Cache
  • Runahead execution disregards stores
  • They can't produce externally observable results
  • However, store data is still needed for
    store-to-load communication within runahead
  • Solution: the runahead cache
  • Loop:
  • store r1, r2
  • add r1, r3, r1
  • store r1, r4
  • load r1, r2
  • bne r1, r5, Loop

16
Stores and Loads in Runahead
  • Loads
  • If the address is INV, the data is automatically INV
  • Otherwise, look in order in:
  • The store buffer
  • The runahead cache
  • Finally, go to memory (the caches)
  • If it hits in the cache, treat the data as valid
  • If it misses, treat it as INV; don't stall
  • Stores
  • Use the store buffer as usual
  • On commit:
  • If the address is INV, drop the store
  • Otherwise, write the data to the runahead cache
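The lookup order above can be sketched directly (a toy model assuming a memory hit returns data and a miss returns INV without stalling; names illustrative):

```python
INV = object()  # sentinel for invalid runahead values/addresses

def runahead_load(addr, store_buffer, runahead_cache, memory):
    # Load path: INV address -> INV data; otherwise check the store buffer
    # first, then the runahead cache, and finally memory (a miss yields
    # INV, never a stall).
    if addr is INV:
        return INV
    if addr in store_buffer:
        return store_buffer[addr]
    if addr in runahead_cache:
        return runahead_cache[addr]
    return memory[addr] if addr in memory else INV

def runahead_store_commit(addr, data, runahead_cache):
    # Store path at commit: INV address -> drop the store; otherwise write
    # only to the runahead cache, never to real memory.
    if addr is not INV:
        runahead_cache[addr] = data

# Example state: one line in the cache hierarchy, one runahead store.
store_buffer, runahead_cache, memory = {}, {}, {0x10: 42}
runahead_store_commit(0x20, 7, runahead_cache)
```

A later runahead load to 0x20 is forwarded from the runahead cache, a load to 0x10 hits memory, and anything else comes back INV without blocking.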

17
Run-Ahead Cache Results
  • The authors found that not forwarding data from
    stores to loads resulted in poor performance
  • A significant number of results became INV

18
Details: Architecture
19
Results
20
Results (2)
21
Issues
  • Some wrong assumptions about future machines
  • The future baseline corresponds poorly to modern
    architectures
  • Few details on the architectural requirements of
    the technique
  • Increased hardware size
  • Increased power requirements