Timing Anomalies in Dynamically Scheduled Microprocessors, Thomas Lundqvist, Per Stenstrom (RTSS 99)

1
Timing Anomalies in Dynamically Scheduled
Microprocessors
Thomas Lundqvist, Per Stenstrom
(RTSS 99)
  • Presented by Kaustubh S. Patil

2
Assumptions in current methods to find Worst Case
Execution Time (WCET)
  • Execution time of an instruction is not fixed
  • - Due to pipeline stalls or cache misses
  • - Due to input data dependency,
    e.g. mulhw, mulhwu, mullw in the PowerPC
    architecture
  • In such cases, current methods assume the longest
    latency for every instruction
  • - e.g. if the outcome of a cache access is
    unknown, a cache miss is assumed.
  • - This assumption is intuition-based

3
Claim: Making such assumptions for dynamically
scheduled processors is wrong!
  • Dynamically scheduled processors
  • - out-of-program-order instruction execution
  • For such processors, a counter-intuitive increase
    or decrease in execution time is possible
  • - e.g. a cache miss can actually reduce the
    overall execution time.
  • - These are termed timing anomalies.

4
Organization of the presentation
  • Description of architectural features that may
    cause anomalies
  • Examples of timing anomalies
  • Handling of such anomalies in previous methods
  • Proposed methods to eliminate such anomalies
  • Case study of a previous method in the context of
    proposed solutions

5
Terms and definitions
  • Formal definition of timing anomaly
  • - Instruction latency is taken to be the same as
    instruction execution time
  • - Case 1: latency of the first instruction is
    increased by i cycles
  • - Case 2: it is decreased by d cycles
  • - Let C be the resulting future change in
    execution time
  • Definition
  • A timing anomaly is a situation where, in the
    first case, C > i or C < 0, or, in the second
    case, C < -d or C > 0.
  • In-order and out-of-order resources
  • If a processor only contains in-order resources,
    no timing anomalies can occur
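
The definition above can be written as a small predicate. This is a minimal sketch in Python; the function name `is_timing_anomaly` and its parameters are illustrative and do not come from the presentation. A positive `latency_change` stands for case 1 (an increase of i cycles) and a negative one for case 2 (a decrease of d cycles).

```python
def is_timing_anomaly(latency_change: int, c: int) -> bool:
    """Return True if the resulting change c in execution time is anomalous.

    latency_change > 0 models case 1 (latency increased by i cycles),
    latency_change < 0 models case 2 (latency decreased by d cycles).
    """
    if latency_change > 0:
        # Case 1: anomalous if the slowdown exceeds i, or there is a speedup.
        return c > latency_change or c < 0
    if latency_change < 0:
        # Case 2: anomalous if the speedup exceeds d, or there is a slowdown.
        return c < latency_change or c > 0
    raise ValueError("latency_change must be non-zero")
```

For example, a latency increase of 3 cycles that shortens the overall execution by 1 cycle (`is_timing_anomaly(3, -1)`) is classified as an anomaly, while an increase of 3 cycles that costs 2 cycles overall is not.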

6
Architecture used for illustration

7
Timing anomaly examples
  • A cache hit results in the WCET
  • B is dependent on A
  • In the cache-hit case, B gets priority over C
  • In the cache-miss case, D and E execute 1 cycle
    earlier
  • The reason for this anomaly
  • - IU is an out-of-order resource
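
The effect on this slide can be reproduced with a toy scheduler. The sketch below is not the processor model from the paper: the latencies, the dispatch cycles, the dependencies of D on C and of E on D, and the names `Instr`, `simulate` and `program` are all assumptions chosen only so that the hit case comes out slower than the miss case. Each functional unit runs one instruction at a time and always picks the oldest ready instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str            # functional unit: "LSU", "IU" or "MCIU"
    latency: int         # cycles the unit stays occupied
    deps: tuple = ()     # names of instructions whose results are needed
    dispatch: int = 0    # first cycle at which the instruction may issue


def simulate(program):
    """Return {name: finish cycle}; oldest ready instruction wins a unit."""
    finish = {}      # name -> cycle at which the result is available
    unit_free = {}   # unit -> cycle at which the unit becomes idle
    cycle = 0
    while len(finish) < len(program):
        for ins in program:  # program order doubles as age order
            if ins.name in finish:
                continue
            deps_done = all(d in finish and finish[d] <= cycle for d in ins.deps)
            unit_idle = unit_free.get(ins.unit, 0) <= cycle
            if ins.dispatch <= cycle and deps_done and unit_idle:
                finish[ins.name] = cycle + ins.latency
                unit_free[ins.unit] = finish[ins.name]
        cycle += 1
    return finish


def program(load_latency):
    # All numbers here are made-up values for illustration only.
    return [
        Instr("A", "LSU", load_latency),      # load: hit or miss
        Instr("B", "IU", 1, deps=("A",)),     # depends on A
        Instr("C", "IU", 1, dispatch=2),      # competes with B for the IU
        Instr("D", "MCIU", 4, deps=("C",)),
        Instr("E", "MCIU", 4, deps=("D",)),
    ]


hit = simulate(program(load_latency=2))    # assumed cache-hit latency
miss = simulate(program(load_latency=5))   # assumed cache-miss latency
total_hit = max(hit.values())
total_miss = max(miss.values())
```

Under these assumed numbers, the hit case lets B take the IU ahead of C, which pushes C, D and E back by one cycle; in the miss case B is still waiting for A, so C issues immediately and the sequence finishes one cycle sooner.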

8
Timing anomaly examples (contd)
  • The overall miss penalty can be higher than a
    single cache miss penalty
  • A, B and C have dependencies
  • C always results in a miss
  • C finishes 11 cycles later, more than the single
    miss penalty of 8 cycles
  • MCIU allows B and D to execute out-of-order

9
Timing anomaly examples (contd)
  • Unbounded impact on WCET
  • A and B form a loop body
  • Fast case
  • - A executes as soon as it is dispatched
  • Slow case
  • - A is delayed by one cycle
  • - The old B gets priority over the new A
  • - A gets delayed in each iteration
  • - Total penalty is k cycles for k iterations

10
Limitations of previous methods
  • Such methods make locally safe decisions, at
    basic block or instruction level.
  • Timing anomalies due to variable latency
    instructions and different pipeline states do not
    allow this.
  • Consider an instruction sequence with n variable
    latency instructions.
  • Each such instruction can have k different
    latencies.
  • Need to examine k^n possibly different schedules
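
The growth in the number of cases can be made concrete with a short enumeration. This is a minimal sketch; the function name `enumerate_latency_assignments` and the example latencies are hypothetical, and each generated tuple only fixes one latency per instruction (the resulting schedule would still have to be simulated).

```python
from itertools import product


def enumerate_latency_assignments(latency_options):
    """Yield every combination of latencies, one choice per instruction."""
    return product(*latency_options)


# Hypothetical example: n = 4 variable-latency instructions,
# each with k = 3 possible latencies (in cycles).
options = [(1, 2, 8)] * 4
assignments = list(enumerate_latency_assignments(options))
num_cases = len(assignments)  # k ** n combinations
```

With k = 3 and n = 4 this already yields 81 combinations, which is why checking every case quickly becomes impractical for longer sequences.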

11
Methods for eliminating anomalies
  • The pessimistic serial-execution method
  • - All instructions are executed in-order.
  • - All memory references are considered misses.
  • - Which instruction sequence is considered ?
  • - Very pessimistic approach
  • The program modification method
  • - All unknown events and variable latency
    instructions must result in a predictable
    pipeline state
  • - If a path is selected as the WCET path among a
    set of paths, then the cache and pipeline state
    at the end of every path in the set must be the
    same.

12
The program modification method (contd)
  • Making pipeline-state predictable
  • Forced in-order resource use is one solution
  • - little processor support
  • Use of sync instruction in PowerPC architecture
  • - to take care of variable latency instructions
  • - also when cache hits are unpredictable
  • sync works for both the previous conditions

13
The program modification method (contd)
  • Making cache state predictable
  • After each path invalidate all cache blocks
  • - poor performance
  • Invalidate only differing cache blocks
  • - poor performance again
  • Preload cache blocks
  • - special instruction support, e.g. icbt, dcbt in
    PowerPC

14
Case study: symbolic execution method
  • Instruction level simulation
  • Extended instruction semantics to take care of
    unknown operands
  • e.g. Add A,B,C
  • A ← B + C,    if both B and C are known
  • A ← unknown,  if either B or C is unknown
  • Elimination of infeasible paths
  • Merging of paths to avoid exponential number of
    paths
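
The extended semantics of the add instruction can be sketched in a few lines. This is an illustration only, not the simulator's actual code; the sentinel `UNKNOWN` and the function `sym_add` are hypothetical names.

```python
UNKNOWN = object()  # hypothetical sentinel for a statically unknown value


def sym_add(b, c):
    """Extended add: the result is known only if both operands are known."""
    if b is UNKNOWN or c is UNKNOWN:
        return UNKNOWN
    return b + c


known_result = sym_add(2, 3)
unknown_result = sym_add(2, UNKNOWN)
```

An unknown result propagates through later instructions that use it, so a branch on such a value cannot be resolved and both paths would need to be considered.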

15
Changes to this existing method
  • First pass identifies all places where local
    decisions need to be made
  • - eg. merging of paths and variable latency
    instructions
  • Addition of sync and preload instructions at such
    sites
  • Tserial = sum of all latencies and misses
  • T = Tserial / 2 in the ideal case
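
The serial bound is simple arithmetic. The sketch below reads the two lines above as Tserial being the sum of all instruction latencies and miss penalties, and T being Tserial / 2 in the ideal case; that reading, the function names and the numbers are assumptions for illustration, not values from the presentation.

```python
def serial_wcet(latencies, miss_penalties):
    """Sum all instruction latencies and all cache miss penalties (cycles)."""
    return sum(latencies) + sum(miss_penalties)


def ideal_time(t_serial):
    """Ideal-case time under the assumed reading T = Tserial / 2."""
    return t_serial / 2


# Hypothetical instruction sequence: four latencies and two misses of 8 cycles.
t_serial = serial_wcet(latencies=[1, 1, 4, 2], miss_penalties=[8, 8])
t_ideal = ideal_time(t_serial)
```

This factor of two matches the Ratio column for the serial WCET in the evaluation results, where the serial estimate is about twice the unsafe estimate for every benchmark.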

16
Benchmarks used
  • PSIM, an existing instruction-level simulator, was
    extended for symbolic execution and the program
    modification approach
  • The benchmarks used were
  • - matmult   Multiplies two 50x50 matrices
  • - bsort     Bubblesort of 100 integers
  • - isort     Insertsort of 10 integers
  • - fib       Calculates the nth element of the
    Fibonacci sequence for n < 30
  • - DES       Encrypts 64-bit data
  • - jfdctint  Discrete cosine transform of
    an 8x8 pixel image
  • - compress  Compresses 50 bytes of data

17
Evaluation results
Program   Actual WCET  Unsafe WCET  Ratio  Serial WCET  Ratio  Modified WCET  Ratio  Modified slowdown
matmult   5283287      5283287      1      10566574     2      6323287        1.20   1.20
bsort     230490       230490       1      460981       2      256854         1.11   1.11
isort     2085         2085         1      4170         2      2325           1.12   1.12
fib       797          797          1      1594         2      797            1      1
DES       186166       186358       1.001  372716       2.002  186358         1.001  1
jfdctint  9409         9409         1      18819        2      9921           1.05   1.05
compress  16846        54583        3.31   109167       6.62   69291          4.20   1.27

18
Summary
  • Timing anomalies in dynamically scheduled
    processors may cause wrong WCET estimation using
    previous methods.
  • Using architecture support to control state of
    the cache and pipeline, it is possible to
    eliminate anomalies and the previous methods can
    be used on such modified programs.

19
Thank you !!!
  • Questions ?