The Potential of Trace-Level Parallelism in Java Programs

1
The Potential of Trace-Level Parallelism in Java
Programs
  • Borys J. Bradel
  • Tarek S. Abdelrahman
  • University of Toronto
  • Principles and Practice of Programming in Java
  • September 7th, 2007

2
Motivation
  • A gap exists between hardware and software
  • Hardware
      • The majority of computer chips contain multiple cores
      • Athlon X2, Core 2 Duo, Power5, Cell, Niagara
  • Software
      • Writing parallel software is difficult
  • Bridging the gap may lead to better utilization
    of hardware and therefore improved performance

3
Automatic Parallelization
  • Traditional compile time
      • Perform analysis at compile time
      • Divide the program based on the analysis
      • Limited success
  • Runtime
      • A new approach to automatic parallelization is
        needed
      • Combine analysis with runtime information
      • What information to use?
  • Trace-Based
      • Our solution is to use traces

4
How successful can using traces be?
  • We answer this question by simulating trace
    execution
      • Monitor a program's execution
      • Simulate the execution of traces in parallel
  • Measure a practical upper bound on parallelism
      • Not an accurate measurement of performance

5
Outline
  • Traces
  • Execution Model
  • Simulation Platform
  • Experimental Evaluation
  • Conclusion

6
Trace Definition
  • A trace is a frequently executed sequence of
    unique basic blocks or instructions
  • Identified by a trace collection system at runtime

public static int foo() {
    int a = 0;
    for (int i = 0; i < n; i++)
        a += i;
    return a;
}
7
Benefits
  • Source code is not required
  • Granularity of parallelism can vary
  • Traces simplify control flow and analysis
  • Traces are simple to identify

8
Execution Model
[Figure labels: Method, CFG, sequential, parallel]
9
Dependence Communication
Dependences limit parallelism
[Figure labels: Method, a += i]
10
Dependence Communication
Different types of communication
  • Instruction-Instruction
  • Trace-Trace
  • Trace-Instruction
[Figure: example dependences for each type, annotated with the communication delay]
11
Requirements
  • Java Virtual Machine
      • Executes bytecode
      • Interpreted or compiled
  • Trace Collection System (TCS)
      • Monitors control flow
      • Creates traces

[Figure: JVM (code execution) -> control flow -> TCS]
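
The two TCS duties listed above (monitor control flow, create traces) can be illustrated with a small sketch. This is not RedSpot's design: the hot-count threshold, the rule that a repeated block ends a recording, and all class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical trace collector: starts recording once a basic block becomes hot. */
class TraceCollectorSketch {
    private final int hotThreshold;                                 // assumed tuning knob
    private final Map<Integer, Integer> counts = new HashMap<>();   // block id -> times seen
    private final Map<Integer, List<Integer>> traces = new HashMap<>(); // head block -> trace
    private final Set<Integer> covered = new HashSet<>();           // blocks already in a trace
    private List<Integer> recording = null;                         // trace being recorded, if any
    private int recordingHead = -1;

    TraceCollectorSketch(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    /** Called by the (hypothetical) JVM each time control enters a basic block. */
    void onBlock(int blockId) {
        if (recording != null) {
            if (!recording.contains(blockId)) {
                recording.add(blockId);          // extend the trace with a new unique block
                return;
            }
            // A trace holds unique blocks, so a repeated block closes the recording.
            traces.put(recordingHead, recording);
            covered.addAll(recording);
            recording = null;
        }
        if (covered.contains(blockId)) {
            return;                              // block already belongs to a trace
        }
        int seen = counts.merge(blockId, 1, Integer::sum);
        if (seen >= hotThreshold) {              // frequently executed: start a new trace here
            recording = new ArrayList<>();
            recording.add(blockId);
            recordingHead = blockId;
        }
    }

    Map<Integer, List<Integer>> traces() {
        return traces;
    }
}
```

Feeding it the block sequence of a simple loop (an entry block followed by alternating test and body blocks) yields one trace that starts at the loop test and contains the test and body blocks.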
12
Parallel Identification Engine
  • Records memory information
  • Keeps track of dependences
      • Ignores instructions that read and write to the
        same variable, e.g. the dependence between i and
        itself is ignored
  • Schedules instructions, subject to
      • Instruction Window
      • Communication
      • Processor Count

[Figure: JVM (code execution) passes control flow, instruction info, and traces to the engine]
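
As a rough illustration of the dependence bookkeeping described above, the sketch below remembers which trace instance last wrote each memory location and records a dependence when a later trace reads it. It tracks only read-after-write dependences and skips any location that the same instruction both reads and writes; the class name, method names, and the use of strings as locations are assumptions, not the engine's actual interface.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical dependence tracker keyed by memory location. */
class DependenceTrackerSketch {
    // location -> id of the trace instance that last wrote it
    private final Map<String, Integer> lastWriter = new HashMap<>();
    // trace instance -> earlier trace instances it depends on
    private final Map<Integer, Set<Integer>> deps = new HashMap<>();

    /** Called once per executed instruction with the locations it reads and writes. */
    void onInstruction(int traceId, Set<String> reads, Set<String> writes) {
        for (String loc : reads) {
            if (writes.contains(loc)) {
                continue;   // instruction reads and writes the same variable: ignored
            }
            Integer producer = lastWriter.get(loc);
            if (producer != null && producer != traceId) {
                deps.computeIfAbsent(traceId, k -> new LinkedHashSet<>()).add(producer);
            }
        }
        for (String loc : writes) {
            lastWriter.put(loc, traceId);
        }
    }

    /** Earlier trace instances that the given trace instance must wait for. */
    Set<Integer> dependencesOf(int traceId) {
        return deps.getOrDefault(traceId, new LinkedHashSet<>());
    }
}
```

With this rule, an increment of i executed in two consecutive trace instances creates no dependence between them, while a later trace that reads a value written by an earlier trace does.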
13
Scheduling
Record trace information when traces execute sequentially
Schedule when the instruction window is full
14
Schedule around Dependences
4 processors, 12 traces per window
  • Dependent traces are scheduled far enough apart
    to have correct execution
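
To make "scheduled far enough apart" concrete, here is a minimal greedy list scheduler. It is a deliberately simpler stand-in for the Modified Critical-Path scheduler named in the evaluation, not a reimplementation of it. Traces are placed in program order on whichever processor lets them start earliest, and a dependent trace on a different processor waits an extra communication delay. All names and parameters are assumptions.

```java
/** Hypothetical greedy scheduler for one window of traces. */
class TraceSchedulerSketch {
    /**
     * @param cycles     cycles[t] is the length of trace t in cycles
     * @param deps       deps[t] lists earlier traces (indices less than t) that t depends on
     * @param processors number of processors available
     * @param commDelay  extra cycles when producer and consumer run on different processors
     * @return the cycle at which the last trace in the window finishes
     */
    static int schedule(int[] cycles, int[][] deps, int processors, int commDelay) {
        int n = cycles.length;
        int[] procFree = new int[processors];   // cycle at which each processor becomes idle
        int[] finish = new int[n];              // finish cycle of each scheduled trace
        int[] placedOn = new int[n];            // processor chosen for each trace
        int makespan = 0;
        for (int t = 0; t < n; t++) {
            int bestProc = 0;
            int bestStart = Integer.MAX_VALUE;
            for (int p = 0; p < processors; p++) {
                int start = procFree[p];
                for (int d : deps[t]) {
                    // Wait for the producer, plus a delay if it ran on another processor.
                    int ready = finish[d] + (placedOn[d] == p ? 0 : commDelay);
                    start = Math.max(start, ready);
                }
                if (start < bestStart) {
                    bestStart = start;
                    bestProc = p;
                }
            }
            placedOn[t] = bestProc;
            finish[t] = bestStart + cycles[t];
            procFree[bestProc] = finish[t];
            makespan = Math.max(makespan, finish[t]);
        }
        return makespan;
    }
}
```

Four independent 2-cycle traces on four processors finish in 2 cycles, whereas a 2-cycle trace that depends on another 2-cycle trace finishes at cycle 4 no matter how many processors are available.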

15
Speedup
  • Speedup is the ratio of
      • Cycles over all scheduled traces on a
        one-processor system, to
      • Cycles over all scheduled traces on the
        parallel system
  • Each trace executes sequentially on one processor
  • A cycle represents the write of one memory
    location

B1: a += i; i++            (2 cycles)
B2: if (i < n) goto B1
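
The speedup measure can be written out as a short calculation. The sketch below assumes the one-processor cycle count is simply the sum of the per-trace cycle counts (one cycle per memory write, so the block B1 above costs 2), and that the parallel cycle count comes from some schedule. The class and method names are illustrative only.

```java
/** Hypothetical helper that computes speedup from cycle counts. */
class SpeedupSketch {
    /**
     * @param traceCycles    cycles of each scheduled trace (one cycle per memory write)
     * @param parallelCycles cycles the same traces take on the parallel system
     */
    static double speedup(int[] traceCycles, int parallelCycles) {
        int sequentialCycles = 0;
        for (int c : traceCycles) {
            sequentialCycles += c;   // on one processor the traces run back to back
        }
        return sequentialCycles / (double) parallelCycles;
    }
}
```

For example, four 2-cycle traces take 8 cycles on one processor; if a parallel schedule completes them in 2 cycles, the speedup is 4.0.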
16
Experimental Evaluation
  • Jupiter (Patrick Doyle)
  • RedSpot (Borys Bradel)
  • Modified Critical-Path scheduler (Min-You Wu)
  • Benchmarks
      • Java Grande Section 3
      • SPECjvm98

17
Effect of Window Size
18
Effect of Communication Cost
19
Effect of Communication Type
20
Effect of Processor Count
21
Conclusion
  • How successful can using traces be?
  • Built simulator to measure parallel execution of
    traces
  • Traces have the potential to be used to
    parallelize programs
  • Some benchmarks do not scale well
  • Some benchmarks scale very well
  • Most benchmarks have at least 2x speedup on four
    processors
  • Future work: create a system that performs
    trace-based parallelization

22
Jupiter and RedSpot
Interpreter
    emulate a = 0
    emulate i = 0
    emulate goto B2
    call RedSpot
    emulate if (i < n) goto B1
    call RedSpot
    emulate a += i
    emulate i++
    emulate if (i < n) goto B1
    call RedSpot
Trace 1
    emulate a += i
    emulate i++
    emulate if (i < n) goto B1
    call RedSpot
23
Parallel Identification Engine
Interpreter
    emulate if (i < n) goto B1
    call RedSpot
    call PIE
    emulate a += i
    call PIE
    emulate i++
    call PIE
    emulate if (i < n) goto B1
    call RedSpot
    call PIE
    emulate a += i
    call PIE
    emulate i++
    call PIE
    emulate if (i < n) goto B1
    call RedSpot
    call PIE
Call PIE for each instruction and each memory access
24
Processor Count
Maximum number of processors limits performance
2 processors
25
Scheduling Window
Can only schedule a limited number of traces at a
time
4 traces per window