TurboROB: A Low Cost Checkpoint/Restore Accelerator

1
TurboROB: A Low Cost Checkpoint/Restore Accelerator
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
{pakl, moshovos}@eecg.toronto.edu
2
Recovering From Control Flow Mispredictions
  • Execution Timeline

[Figure: timeline. Predict a branch outcome; execute along the predicted path; misprediction discovered; recover processor state and redirect fetch; resume execution on the correct path]
  • Accelerate Recovery → Improve Performance

3
State-of-the-Art Recovery
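[Figure: changes made between predicting a branch outcome and discovering the misprediction can be undone from a log of changes (the ROB, recording what changed and its old value) or from a state snapshot]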
  • Scalability and/or Performance Issues

4
Turbo-ROB
[Figure: between predicting a branch outcome and discovering the misprediction, the ROB keeps a full log of changes; the Turbo-ROB keeps a partial log of changes]
  • Make common case fast
  • Recover only at branches
  • Store only as much as needed
  • Partial Log

5
Outline
  • Control Flow Misspeculation Recovery
  • TurboROB
  • Methodology and Results
  • Summary

6
State Recovery Example: Register Alias Table
Original Code
A: add r1, r2, 100
B: breq r1, E
C: sub r1, r2, r2
Renamed Code
A: add p4, p2, 100
B: breq p4, E
C: sub p5, p2, p2
RAT: maps each architectural register to a physical register
[Figure: RAT contents as the example is renamed; physical registers shown: p1, p2, p3, p4, p5; dimension labels: Lg( arch. regs), arch. regs]
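
A minimal Python sketch of the renaming step in the example on this slide. The tuple layout for instructions, the `rename` helper, the starting map (r1 to p1, r2 to p2), and the free list holding p4 and p5 are assumptions made for illustration; the slide only gives the original and renamed code.

```python
def rename(code, rat, free_list):
    """Rename instructions in program order.

    code: list of (label, opcode, dest, sources); dest is None for branches.
    rat: dict mapping architectural register -> physical register.
    free_list: physical registers available for new destinations.
    Returns (renamed code, RAT after renaming). Inputs are not modified.
    """
    rat = dict(rat)
    free = list(free_list)
    renamed = []
    for label, op, dest, srcs in code:
        # Sources read the current mapping; immediates and labels pass through.
        new_srcs = [rat.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = free.pop(0)   # allocate a fresh physical register
            rat[dest] = new_dest     # later readers of dest see the new name
        renamed.append((label, op, new_dest, new_srcs))
    return renamed, rat


code = [
    ("A", "add", "r1", ["r2", 100]),
    ("B", "breq", None, ["r1", "E"]),
    ("C", "sub", "r1", ["r2", "r2"]),
]
initial_rat = {"r1": "p1", "r2": "p2"}   # assumed starting map
renamed, final_rat = rename(code, initial_rat, ["p4", "p5"])
for label, op, dest, srcs in renamed:
    print(label, op, dest, srcs)
```

Under these assumptions r1 maps to p4 after A and to p5 after C, so recovering from a misprediction at B means restoring the r1 entry to p4.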
7
ROB: Slow, Fine-Grain Recovery
  • Each entry contains
  • Architectural destination register
  • Its previous RAT map

[Figure: reorder buffer in program order; RAT marked INVALID]
  • 1. Misprediction discovered
  • 2. Locate newest instruction
  • 3. Undo RAT updates in reverse order
  • Too slow: recovery latency proportional to number
    of instructions to squash
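
The three recovery steps on this slide can be sketched in Python as follows. This is an illustrative model only: the `rob_recover` name, the list-based ROB, and the one-entry-per-step cost model are assumptions, not details from the slides.

```python
def rob_recover(rat, rob, branch_index):
    """Undo the RAT updates of every entry younger than rob[branch_index].

    rob: list of (arch_dest, previous_mapping) in program order, oldest first;
         arch_dest is None for instructions that write no register.
    Returns the number of ROB entries walked (a stand-in for recovery latency).
    """
    steps = 0
    # Start at the newest instruction and walk back toward the branch.
    for dest, prev in reversed(rob[branch_index + 1:]):
        steps += 1
        if dest is not None:
            rat[dest] = prev          # restore the mapping this entry replaced
    del rob[branch_index + 1:]        # squash the wrong-path entries
    return steps


# State after renaming A (add r1), B (breq), C (sub r1); physical names assumed.
rat = {"r1": "p5", "r2": "p2"}
rob = [("r1", "p1"), (None, None), ("r1", "p4")]
steps = rob_recover(rat, rob, branch_index=1)   # B was mispredicted
print(rat, steps)
```

The walk touches every squashed entry, which is why the latency grows with the number of instructions to squash.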

8
Global Checkpoints: Fast, Coarse-Grain Recovery
[Figure: reorder buffer in program order with four checkpoints; RAT marked INVALID]
  • Misprediction discovered

  • Branch w/ GC: Recovery is Instantaneous
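
A rough Python sketch of checkpoint-based recovery, for contrast with the ROB walk. The `CheckpointedRAT` class, its method names, and the convention that branch ids increase in program order are assumptions for illustration.

```python
class CheckpointedRAT:
    """RAT with a small pool of global checkpoints (full copies of the map)."""

    def __init__(self, rat, max_checkpoints=4):
        self.rat = dict(rat)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}   # branch id -> snapshot of the RAT

    def take(self, branch_id):
        """Snapshot the RAT at a branch; return False if no checkpoint is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = dict(self.rat)
        return True

    def restore(self, branch_id):
        """Recover in one step by copying the snapshot back into the RAT."""
        self.rat = dict(self.checkpoints[branch_id])
        # Free this checkpoint and any taken on the squashed path
        # (assumes branch ids increase in program order).
        for other in [b for b in self.checkpoints if b >= branch_id]:
            del self.checkpoints[other]


crat = CheckpointedRAT({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
took_first = crat.take(0)      # branch 0 gets the only checkpoint
crat.rat["r1"] = "p5"          # a wrong-path instruction renames r1
took_second = crat.take(1)     # branch 1 finds no free checkpoint
crat.restore(0)                # misprediction at branch 0
print(crat.rat, took_first, took_second)
```

Recovery cost does not depend on how many instructions are squashed, but a branch that finds no free checkpoint cannot use this fast path.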

9
Impact of More Checkpoints
Concept
Actual Implementation
architectural register
physical register
  • More checkpoints →
  • Power hungry structure
  • Increased delay
  • Only a few checkpoints can practically be
    implemented
  • Cannot always cover all branches

10
Intelligent Checkpointing BranchTap
[Figure: four checkpoints]
  • Use Few Checkpoints Effectively
  • BranchTap
  • Throttle Speculation

11
Conventional Mechanisms: Recovery Scenarios
[Figure: three scenarios, each showing a sequence of branches (B) and a checkpoint; one scenario is labeled Re-Execution]
12
Outline
  • Background
  • Turbo-ROB
  • Methodology and Results
  • Summary

13
Turbo-ROB
Recovery Cost
[Figure: ROB recovery after branch B over entries labeled R2, R1, R1, R1, R2; each undo step is marked useful or redundant]
We only need to reverse the first subsequent change for every RAT entry
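
One way to picture the partial log is the Python sketch below. It is a simplified model under stated assumptions, not the authors' hardware design: the `TurboROB` class, the branch markers in the log, and the `logged` set that tracks which RAT entries have already been saved since the last branch are all hypothetical.

```python
class TurboROB:
    """Partial log: save only the first change to each RAT entry after a branch."""

    def __init__(self):
        self.log = []         # ("B", branch_id, None) or ("R", arch_reg, old_map)
        self.logged = set()   # RAT entries already saved since the last branch

    def on_branch(self, branch_id):
        self.log.append(("B", branch_id, None))
        self.logged.clear()   # the next change to any entry must be saved again

    def on_rename(self, rat, dest, new_phys):
        if dest not in self.logged:              # useful: first change after branch
            self.log.append(("R", dest, rat[dest]))
            self.logged.add(dest)
        rat[dest] = new_phys                     # redundant changes are not logged

    def recover(self, rat, branch_id):
        """Undo logged changes, newest first, back to the given branch."""
        steps = 0
        while self.log:
            kind, key, old = self.log.pop()
            if kind == "B":
                if key == branch_id:
                    break
                continue                         # marker of a younger branch
            rat[key] = old
            steps += 1
        self.logged.clear()
        return steps


rat = {"R1": "p1", "R2": "p2"}
trob = TurboROB()
trob.on_branch(0)
for dest, phys in [("R2", "p3"), ("R1", "p4"), ("R1", "p5"),
                   ("R1", "p6"), ("R2", "p7")]:
    trob.on_rename(rat, dest, phys)
saved_entries = len(trob.log) - 1    # exclude the branch marker
steps = trob.recover(rat, 0)
print(saved_entries, steps, rat)
```

For the R2, R1, R1, R1, R2 sequence from the slide, this model saves two entries where a full ROB walk would visit five. The physical register names are made up for the example.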
14
Turbo-ROB: Replacing the ROB
[Figure: two sequences of branches (B) recovered with the TROB; one is labeled Re-Execution]
15
Selective Turbo-ROB w/ ROB
[Figure: sequence of branches (B) with the TROB]
Selective Turbo-ROB w/ GCs
[Figure: sequence of branches (B) with the TROB and a checkpoint]
16
Outline
  • Background
  • TurboROB
  • Methodology and Results
  • Summary

17
Results Overview
  • TROB as an ROB replacement
  • BranchTap offers better performance than ROB
  • Fewer resources
  • Even for smaller windows
  • Selective TROB as a GC reduction mechanism
  • TROB reduces pressure for GCs
  • Offload a critical structure: RAT
  • In the paper
  • Selective TROB as an ROB accelerator
  • Even the smallest TROB accelerates recovery

18
Methodology
  • Simulator based on Simplescalar
  • Alpha/OSF
  • 24 SPEC CPU 2000 benchmarks
  • Reference Inputs
  • Processor configurations
  • 4-way OoO core
  • 128/256/512 in-flight instructions
  • 1K-entry confidence table for low confidence
    branch identification / similar results with
    Anyweak
  • 1B committed instructions after skipping 2B
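
For readers unfamiliar with confidence tables, here is a small Python sketch of one common organization: a table of resetting counters indexed by branch address. Only the 1K-entry size comes from the slide; the counter range, the threshold, the indexing function, and the `ConfidenceTable` name are assumptions, and the estimator actually used may differ.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (illustrative, not from the slides)."""

    def __init__(self, entries=1024, max_count=15, threshold=15):
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold
        self.counters = [0] * entries

    def _index(self, pc):
        # Alpha instructions are 4 bytes, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = 0   # one misprediction resets the confidence


table = ConfidenceTable(max_count=3, threshold=3)
pc = 0x120001000                   # made-up branch address
before = table.is_low_confidence(pc)
for _ in range(3):
    table.update(pc, prediction_correct=True)
after_correct = table.is_low_confidence(pc)
table.update(pc, prediction_correct=False)
after_mispredict = table.is_low_confidence(pc)
print(before, after_correct, after_mispredict)
```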

19
Perfect Checkpointing Configuration
  • A checkpoint is auto-magically taken at all
    mispredicted branches
  • All recoveries are fast
  • We report the deterioration relative to perfect
    checkpointing
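
A short Python sketch of how such a relative deterioration could be computed. The slides do not say which performance metric is used; IPC is assumed here, and the numbers in the example are invented purely to show the arithmetic.

```python
def deterioration_pct(ipc_perfect, ipc_mechanism):
    """Performance loss of a recovery mechanism relative to perfect checkpointing.

    Both arguments are IPC values (an assumed metric); the result is a percentage.
    """
    return 100.0 * (ipc_perfect - ipc_mechanism) / ipc_perfect


# Hypothetical numbers, not measurements from the presentation.
loss = deterioration_pct(ipc_perfect=2.0, ipc_mechanism=1.9)
print(round(loss, 2))
```

A value of zero would mean the mechanism recovers as fast as perfect checkpointing; larger values mean more performance is lost to slow recoveries.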

20
TROB Replacing the ROB/512-Entry Window
  • 64-entry TROB ROB on the Average
  • Pathological cases exist → 256-entry needed
  • 512-Entry TROB better than ROB

21
TROB Replacing the ROB/128-Entry Window
  • 64-Entry 50% better than ROB
  • Fewer pathological cases
  • 128-Entry TROB better than ROB

22
sTROB and Global Checkpoints/128-Entry Window
  • TROB + 1 GC better than 4 GCs

23
Summary
  • TROB vs. ROB
  • Replacement
  • Same resources → better performance
  • Fewer resources → often better performance
  • Except when accuracy is high
  • Acceleration
  • ¼ resources → 35% improvement
  • TROB vs. GCs
  • Reduce pressure from the critical path
  • With just 1 GC match the performance of four GCs
  • One more alternative for designers
  • Allows different area/performance/power tradeoffs
