Micro benchmarks - PowerPoint PPT Presentation

Provided by: shr66
Transcript and Presenter's Notes

Title: Micro benchmarks


1
Micro benchmarks
  • By,
  • Shruti Sundaresh

2
Introduction
  • A standard by which something can be measured or
    judged
  • Difficult to compare performance by looking at
    specifications
  • Assessing relative performance
  • Mimic a particular type of workload
  • Used for
    • Engineering
    • Marketing

3
Benchmarks in Computing
  • Comparing the performance of subsystems
  • Measures
    • Performance (speed)
    • System quality
    • Power consumption (energy)
  • May be used as a monitoring/diagnostic tool
  • Synthetic and Application benchmarks

4
Popular Benchmarks
  • Prior to 2000 SPEC.
  • Transaction Processing Performance Council (TPC)
  • BAPCo Consortium
  • Khornerstone
  • OO7

5
Contd...
  • AIM
  • Dhrystone
  • LINPACK
  • Nhfsstone
  • Whetstone

6
Interpretation of results
  • Unrealistically high performance during test
  • Several iterations
  • Single figure of Merit
  • Important considerations
    • Which benchmark to use?
    • What configuration was the benchmark run on?
    • How does the performance of the benchmark relate
      to my workload?
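
The "several iterations" point can be illustrated with a minimal sketch in C that reduces repeated timings to a few summary figures. The `timing_summary` type and `summarize` function are hypothetical names introduced here, not part of any benchmark suite named in the slides. Reporting the minimum and maximum next to the mean is one way to keep a single figure of merit from hiding run-to-run variation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of repeated timing samples (any consistent unit). */
typedef struct {
    double min;
    double max;
    double mean;
} timing_summary;

/* Reduce n timing samples to min, max and arithmetic mean. */
timing_summary summarize(const double *samples, size_t n) {
    assert(samples != NULL && n > 0);
    timing_summary s = { samples[0], samples[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        sum += samples[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

A large gap between `min` and `max` suggests that the first iterations (cold caches) or outside interference are affecting the result.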

7
Some Pitfalls
  • Metrics of Performance Chosen
  • Attention to Scaling
  • Document what is being run
  • Record information about System Configuration
  • Performance Vs Functionality
  • More on this later.

8
Types of Workloads
  • Multiprogrammed workloads → Applications → Kernels → Micro benchmarks
  • Multiprogrammed workloads and applications
    • Realistic
    • Complex
    • Higher-level interactions
  • Kernels and micro benchmarks
    • Easier to understand
    • Controlled
    • Repeatable

9
Micro benchmarks
  • Measure a particular aspect under controlled
    conditions
  • Measure performance for specific tasks
    • Rendering polygons
    • Reading or writing files
    • Operations on matrices

10
Overheads
  • Cache misses
  • Synchronization
  • Loop Scheduling
  • Handling of thread private data

11
Memory Hierarchy Performance of Cache Coherent
Multiprocessors
  • Measurements on SGI Origin 2000 and Sun Ultra
    Enterprise 10000
  • Gap between processor and memory system speeds
  • Memory Contention Issues
  • Measuring memory performance is critical

12
Contd
  • Cache miss detection penalties
  • In-Isolation misses
  • Back-to-back misses
  • Pipelined misses
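
One common way to observe the cost of dependent misses is pointer chasing, where each load address comes from the previous load so that misses cannot overlap. The sketch below is an illustration under that assumption, not the method used in the measurements the slides describe. The function name `chase_latency_ns` and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Average time per dependent load, in nanoseconds, over `steps` loads
   through a random single-cycle permutation of n_nodes indices.
   Returns -1.0 if the allocation fails. */
double chase_latency_ns(size_t n_nodes, size_t steps) {
    assert(n_nodes > 1 && steps > 0);
    size_t *next = malloc(n_nodes * sizeof *next);
    if (next == NULL) return -1.0;

    /* Sattolo's algorithm: always yields one cycle covering every node. */
    for (size_t i = 0; i < n_nodes; i++) next[i] = i;
    for (size_t i = n_nodes - 1; i > 0; i--) {
        size_t k = (size_t)rand() % i; /* k in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[k];
        next[k] = tmp;
    }

    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++) idx = next[idx]; /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = idx; /* keep the loop from being optimized away */
    (void)sink;
    free(next);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Choosing `n_nodes` so the array is much larger than the last-level cache makes most loads miss; a small array measures cache hit latency instead.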

13
Pipelined Memory Bandwidth
  • PipelinedBandwidth()
      StartTime(time)
      for (i = 0; i < size; i += stride)
          j = arry[i]
      EndTime(time)
      do_dummy(j)
      return (size / time)
  • Multiple memory accesses with independent
    addresses
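
The pseudocode above can be turned into a runnable C sketch. It keeps the slide's names `arry`, `size` and `stride`. The timing calls, the summation (used in place of a plain assignment so every load stays live under optimization) and the `volatile` sink standing in for `do_dummy` are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Strided read over arry; returns size / time in elements per second,
   mirroring the slide's return value. Returns 0.0 if no time elapsed. */
double pipelined_bandwidth(const int *arry, size_t size, size_t stride) {
    assert(arry != NULL && size > 0 && stride > 0);
    struct timespec t0, t1;
    long sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < size; i += stride) {
        sum += arry[i]; /* independent addresses, so misses can overlap */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile long sink = sum; /* plays the role of do_dummy(j) */
    (void)sink;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return secs > 0.0 ? (double)size / secs : 0.0;
}
```

Setting `stride` to at least one cache line's worth of elements makes each access touch a new line, which is what exposes memory bandwidth rather than cache bandwidth.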

14
Micro benchmarks for OpenMP directives
  • Compare time taken sequentially vs. in parallel with the
    directive
  • Used for
    • Comparing different implementations
    • Choosing between directives
    • Estimating synchronization and scheduling overheads
  • Results from Sun HPC 6500 and SGI Origin 3000
    systems
  • Compiler: Sun Workshop f90 versions 6.1 and 6.2

15
Contd
  • Tp = execution time on p processors
  • Ts = execution time of the serial version
  • Overhead = Tp - Ts/p
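
As a small worked illustration, the overhead formula on this slide can be written as a C function. The name `directive_overhead` is hypothetical, and the times can be in any consistent unit.

```c
#include <assert.h>

/* Overhead of a directive per the slide: Tp - Ts / p, where tp is the
   time on p processors and ts is the time of the serial version. */
double directive_overhead(double tp, double ts, int p) {
    assert(p > 0);
    return tp - ts / (double)p;
}
```

For example, with Ts = 4.0 s, p = 4 and Tp = 1.5 s, the ideal parallel time is 1.0 s, so the overhead attributed to the directive is 0.5 s.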

16
Synchronization Benchmarks (Directive: Workshare)
  • Parallel version
      !$OMP PARALLEL
      do j = 1, innerreps
      !$OMP WORKSHARE
         a = a + cos(a) - sin(a)
      !$OMP END WORKSHARE
      end do
      !$OMP END PARALLEL
  • Serial version
      do j = 1, innerreps
         a = a + cos(a) - sin(a)
      end do

17
Array Benchmarks (Directive: Private)
  • Parallel version
      do j = 1, innerreps
      !$OMP PARALLEL PRIVATE(a)
         call delay(delaylength, a)
      !$OMP END PARALLEL
      end do
  • Serial version
      do j = 1, innerreps
         call delay(delaylength, a)
      end do
  • Array size varied in powers of three
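
The same measurement pattern can be sketched in C with OpenMP. This is a rough analogue of the Fortran fragment, not the benchmark suite's actual code. The `delay` body, the `ARRAY_LEN` value and the function names are assumptions for illustration. Without an OpenMP flag such as `-fopenmp`, the pragma is ignored and both functions run serially.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

#define ARRAY_LEN 729 /* 3^6, a hypothetical size; the slides vary it */

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for delay(): spins, then touches the array. */
static void delay(int delaylength, double *a) {
    volatile double x = 0.0; /* volatile keeps the loop from being removed */
    for (int i = 0; i < delaylength; i++) x += 1.0;
    a[0] = x;
}

/* Time per repetition when each thread gets its own private copy of a. */
double time_parallel_private(int innerreps, int delaylength) {
    assert(innerreps > 0);
    double a[ARRAY_LEN];
    a[0] = 0.0;
    double start = now_seconds();
    for (int j = 0; j < innerreps; j++) {
        #pragma omp parallel private(a)
        {
            delay(delaylength, a);
        }
    }
    return (now_seconds() - start) / innerreps;
}

/* Time per repetition of the same work without the directive. */
double time_serial(int innerreps, int delaylength) {
    assert(innerreps > 0);
    double a[ARRAY_LEN];
    a[0] = 0.0;
    double start = now_seconds();
    for (int j = 0; j < innerreps; j++) {
        delay(delaylength, a);
    }
    return (now_seconds() - start) / innerreps;
}
```

Comparing the two per-repetition times for increasing `ARRAY_LEN` shows how the cost of creating private copies grows with array size.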

18
Results 1
  • Sun HPC 6500 with Workshop Version 6.1

19
Contd
  • Sun HPC 6500 with Workshop Version 6.2

20
Analysis
  • Linear scaling of overhead for parallel regions
  • Source of overhead unknown
  • Possibly a less efficient barrier

21
Results 2
  • Clauses with array arguments

22
Analysis
  • Large overheads with poor scaling
  • Possible resource contention during PRIVATE
    and FIRSTPRIVATE
  • Heap vs. stack allocation
  • One solution is to use a thread-aware malloc
    library, for example libmtmalloc in Fortran

23
Results 3
  • SGI Origin 3000 with MIPSPro Compiler

24
Analysis
  • Different barrier problems
  • Parallel Do issues
  • High Overhead associated with Single directive

25
Results 4
  • Mutual exclusion directives

26
Pitfalls Contd
  • Cache Effects
  • Write Buffers

27
Conclusions
  • Relevant only if measured behavior occurs in
    reality
  • Requires understanding subtleties in architecture
  • Gap between memory and CPU speed
  • Wide array of benchmarks to choose from

28
References
  • A Microbenchmark Suite for OpenMP 2.0 - J. Mark
    Bull and Darragh O'Neill
  • Using Microbenchmarks to Evaluate System Performance -
    Brian N. Bershad, Richard P. Draves, and Alessandro
    Forin
  • Lock Prefetching in Distributed Virtual Shared
    Memory Systems - Magnus Karlsson and Per Stenström
  • Characterizing the Performance Space of Shared
    Memory Computers Using Micro-Benchmarks - Rafael
    H. Saavedra, R. Stockton Gaines, and Michael J.
    Carlton
  • OpenMP Microbenchmarks Version 2.0 - Fiona J. L.
    Reid and J. Mark Bull