Parallelization of the Telemedicine Benchmark for the Xbox 360 Architecture

About This Presentation
Slides: 12
Provided by: howar1
Learn more at: http://www.urop.uci.edu

Transcript and Presenter's Notes

1
Parallelization of the Telemedicine Benchmark for
the Xbox 360 Architecture
  • Howard Wong, SURF-IT Fellow
  • Professor Jean-Luc Gaudiot, EECS
  • August 29, 2008

PASCAL: PArallel Systems and Computer Architecture Lab
University of California, Irvine
2
Outline
  • Background (Benchmark, Platform)
  • Current Work
  • Methodology (Compiler, Data Set)
  • Results
  • Conclusions
  • Future Work

3
Background Work
  • Why Parallel Programming?
  • Advent of everyday multicomputers
  • Ultimate goal: Auto-parallelization
  • Basic concepts
  • Problems
  • Programming primitives
  • Telemedicine Benchmark
  • Platform: Xbox 360
  • 3 Cores
  • Graphics Engine
  • Vector Processing

[Figure: diagram labeled "?", "Core 1", "Core 2", "Core n"]
4
Current Work
  • Goal: Identify the parallelization process
  • Efficiency measured in performance
  • Performance in relation to load
  • POSIX threads (pthreads) and OpenMP
  • Sorting Routines
  • 'fallbackSort'
  • Making search 'brackets'
  • 'mainSort'
  • Dependencies between loop iterations
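A minimal pthreads sketch of the kind of loop-level parallelization discussed on this slide. It is not the benchmark's 'fallbackSort' or 'mainSort'; it assumes an integer array can be split into independent chunks with no dependencies between them, sorts each chunk in its own thread, and merges the sorted runs sequentially. The names parallel_sort, sort_chunk, and merge_runs are hypothetical, and NUM_THREADS is set to 3 only to mirror the three cores listed for the platform.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NUM_THREADS 3 /* assumption: one worker per core */

typedef struct {
    int *data;  /* start of this thread's chunk */
    size_t len; /* number of elements in the chunk */
} chunk_t;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Thread body: chunks do not overlap, so no locking is needed here. */
static void *sort_chunk(void *arg)
{
    chunk_t *c = arg;
    qsort(c->data, c->len, sizeof(int), cmp_int);
    return NULL;
}

/* Merge sorted runs a[0..na) and b[0..nb) into out. */
static void merge_runs(const int *a, size_t na,
                       const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

/* Sort data[0..n) in place. Returns 0 on success, -1 on failure. */
int parallel_sort(int *data, size_t n)
{
    pthread_t tid[NUM_THREADS];
    chunk_t chunk[NUM_THREADS];
    size_t base = n / NUM_THREADS, start = 0;
    int created = 0, rc = 0;
    int *tmp;

    assert(data != NULL || n == 0);
    tmp = malloc((n ? n : 1) * sizeof(int));
    if (tmp == NULL) return -1;

    for (int t = 0; t < NUM_THREADS; t++) {
        chunk[t].data = data + start;
        chunk[t].len = (t == NUM_THREADS - 1) ? n - start : base;
        start += chunk[t].len;
        if (pthread_create(&tid[t], NULL, sort_chunk, &chunk[t]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);

    if (rc == 0) {
        /* Sequential part: fold each sorted chunk into the sorted prefix. */
        size_t done = chunk[0].len;
        for (int t = 1; t < NUM_THREADS; t++) {
            merge_runs(data, done, chunk[t].data, chunk[t].len, tmp);
            done += chunk[t].len;
            memcpy(data, tmp, done * sizeof(int));
        }
    }
    free(tmp);
    return rc;
}
```

Build with something like gcc -std=c99 -pthread. The merge step stays sequential, which is one reason a sketch like this only pays off once the chunks are large enough to outweigh thread start-up.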

5
Methodology
  • Compilation
  • gcc or g++ version 4.2
  • Data Sets
  • Monkey brain image in PPM format
  • Derived data via netpbm
  • Test Platform
  • Xbox 360 with Ubuntu Linux

Images courtesy of Neuroscience Center, UC Davis,
and Joerg Meyer, Center of GRAVITY, Calit2, UC
Irvine.
6
Initial Results
7
Analysis
  • Possible thread contention
  • 'bitmap' of data as former optimization
  • Optimized for long runs of 0's or 1's
  • Extra mutex locks required
  • Thread Creation
  • Sorting algorithm called at least 300 times for
    the large image
  • Thread creation efficiency
  • Thread management structures
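A small sketch of why a shared bitmap can force extra mutex locks once threads are introduced. It is not the benchmark's data structure; it assumes a bitmap packed 32 bits per word, so that setting one bit is a read-modify-write on a word shared with 31 neighbours. The names set_bit, mark_strided, mark_all_bits, and count_set_bits are hypothetical. The threads deliberately interleave their bits so that every word is touched by all of them, which is the contention pattern the slide points to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NBITS 4096
#define NUM_THREADS 3 /* assumption: one worker per core */

static uint32_t bitmap[NBITS / 32];
static pthread_mutex_t bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Setting a bit rewrites the whole 32-bit word, so concurrent updates
   to neighbouring bits must be serialized or one update can be lost. */
static void set_bit(size_t i)
{
    assert(i < NBITS);
    pthread_mutex_lock(&bitmap_lock);
    bitmap[i / 32] |= (uint32_t)1 << (i % 32);
    pthread_mutex_unlock(&bitmap_lock);
}

static int test_bit(size_t i)
{
    return (int)((bitmap[i / 32] >> (i % 32)) & 1u);
}

/* Each thread marks bits first, first + NUM_THREADS, ... so that all
   threads keep hitting the same words (and the same lock). */
static void *mark_strided(void *arg)
{
    size_t first = (size_t)(uintptr_t)arg;
    for (size_t i = first; i < NBITS; i += NUM_THREADS)
        set_bit(i);
    return NULL;
}

/* Returns 0 if every worker thread was created, -1 otherwise. */
int mark_all_bits(void)
{
    pthread_t tid[NUM_THREADS];
    int created = 0;

    memset(bitmap, 0, sizeof bitmap);
    for (int t = 0; t < NUM_THREADS; t++) {
        if (pthread_create(&tid[t], NULL, mark_strided,
                           (void *)(uintptr_t)t) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == NUM_THREADS ? 0 : -1;
}

/* Call only after the workers have been joined. */
int count_set_bits(void)
{
    int n = 0;
    for (size_t i = 0; i < NBITS; i++)
        n += test_bit(i);
    return n;
}
```

With a single global lock every bit update is serialized, so the three threads spend much of their time waiting on each other. Giving each thread a word-aligned range of its own would remove the lock in this toy case; whether that is possible in the benchmark is not stated in the slides.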

8
Results (Cont'd)
9
Conclusions & Discussion
  • Speedup dependent on the load size
  • Possible improvements
  • Use a 'threadpool'
  • Create other important compression functions
  • Examine alternative algorithms with a parallel
    mindset
  • End result
  • Thread creation
  • Thread management overhead
  • Heavy contention
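A minimal sketch of the 'threadpool' idea listed under possible improvements, assuming a fixed set of worker threads and a bounded job queue guarded by one mutex and two condition variables. Nothing here comes from the presentation's code: pool_t, pool_init, pool_submit, pool_shutdown, count_job, and run_jobs are hypothetical names, and POOL_THREADS and QUEUE_CAP are arbitrary. The point is that workers are created once and reused, so a routine called hundreds of times does not pay for thread creation on every call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_THREADS 3 /* assumption: one worker per core */
#define QUEUE_CAP 64   /* arbitrary bound on pending jobs */

typedef void (*job_fn)(void *);
typedef struct { job_fn fn; void *arg; } job_t;

typedef struct {
    pthread_t workers[POOL_THREADS];
    job_t queue[QUEUE_CAP]; /* ring buffer of pending jobs */
    size_t head, count;
    int started;            /* workers actually created */
    int shutting_down;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} pool_t;

static void *worker_main(void *arg)
{
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->shutting_down)
            pthread_cond_wait(&p->not_empty, &p->lock);
        if (p->count == 0) { /* shutting down and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        job_t job = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        pthread_cond_signal(&p->not_full);
        pthread_mutex_unlock(&p->lock);
        job.fn(job.arg); /* run the job outside the lock */
    }
}

/* Returns 0 if at least one worker started, -1 otherwise. */
int pool_init(pool_t *p)
{
    p->head = p->count = 0;
    p->started = 0;
    p->shutting_down = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    pthread_cond_init(&p->not_full, NULL);
    for (int t = 0; t < POOL_THREADS; t++) {
        if (pthread_create(&p->workers[t], NULL, worker_main, p) != 0)
            break;
        p->started++;
    }
    return p->started > 0 ? 0 : -1;
}

/* Blocks while the queue is full. */
void pool_submit(pool_t *p, job_fn fn, void *arg)
{
    assert(fn != NULL);
    pthread_mutex_lock(&p->lock);
    while (p->count == QUEUE_CAP)
        pthread_cond_wait(&p->not_full, &p->lock);
    size_t tail = (p->head + p->count) % QUEUE_CAP;
    p->queue[tail].fn = fn;
    p->queue[tail].arg = arg;
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}

/* Lets workers finish all queued jobs, then joins them. */
void pool_shutdown(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    p->shutting_down = 1;
    pthread_cond_broadcast(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
    for (int t = 0; t < p->started; t++)
        pthread_join(p->workers[t], NULL);
    pthread_mutex_destroy(&p->lock);
    pthread_cond_destroy(&p->not_empty);
    pthread_cond_destroy(&p->not_full);
}

/* Demo job: increment a shared counter under its own lock. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static void count_job(void *arg)
{
    pthread_mutex_lock(&counter_lock);
    (*(int *)arg)++;
    pthread_mutex_unlock(&counter_lock);
}

/* Submit njobs demo jobs; returns how many ran, or -1 on failure. */
int run_jobs(int njobs)
{
    pool_t pool;
    int counter = 0;
    if (pool_init(&pool) != 0) return -1;
    for (int i = 0; i < njobs; i++)
        pool_submit(&pool, count_job, &counter);
    pool_shutdown(&pool);
    return counter;
}
```

Calling run_jobs(300) pushes 300 jobs through the same three workers. The queue lock is itself a point of contention, so a pool addresses the thread-creation cost but not the contention the slides also report.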

10
Questions for Future Work
  • What is the impact of thread creation?
  • Do the other TMB programs have the same features?
  • Can vector instructions improve program
    performance?
  • Are new, more efficient parallel programming
    primitives needed for our application?

11
Acknowledgments
  • Professor Jean-Luc Gaudiot and the PASCAL group
  • UC Davis Neuroscience Center
  • Professor Joerg Meyer, Center of GRAVITY, Calit2
  • Calit2
  • UROP