
1
Analytical Modeling of High Performance
Reconfigurable Computers: Prediction and Analysis
of System Performance
  • A Dissertation Proposal for the Doctor of
    Philosophy Degree in Electrical Engineering
  • Melissa C. Smith
  • March 6, 2002

Research Partially Supported by Air Force
Research Laboratory (AFRL) Center for Information
Technology Research (CITR)
2
Outline
  • Introduction, Background, Related Work
  • Model Methodology Development
  • Model Validation
  • Model Applications
  • Status and Remaining Work

3
HPC, RC, HPRC
  • High Performance Computing (HPC): Advanced
    architectures (vector supercomputers, MPPs, NOWs,
    etc.) designed to work collectively on a common
    problem. My focus is narrowed to distributed
    memory, MIMD class machines.
  • Reconfigurable Computing (RC): Integration of
    reconfigurable logic with a processor to achieve
    hardware-like performance with software-like
    flexibility
  • High Performance Reconfigurable Computing (HPRC):
    Marriage of HPC and RC elements

4
HPRC Introduction
  • Independently, HPC and RC demonstrate performance
    advantages for many applications
  • Individually, HPC and RC are challenging to program
    and utilize

5
Problem Statement
HPRC performance analysis must address new issues
(compared to traditional HPC).
The rich design space of HPRC yields potentially
complex performance analysis.
  • Proposed research will bridge the analysis gap
    between the HPC and RC domains by developing:
  • An analytical model for characterizing RC system
    performance
  • An analytical model for the HPRC platform

6
Modeling Framework
What is it? What can it do?
  • Performance analysis and design tradeoffs of
    architecture
  • Equations to estimate/predict performance
  • Optimization cost functions
  • Potential performance metrics and design tradeoffs
    include:

7
Modeling Techniques
  • What are they?
  • Simulation
  • Behavioral model driven by an abstracted workload or
    trace data
  • Also used to validate other models
  • Measurement
  • HW/SW monitors
  • Often used to calibrate/validate other models
  • Probes often perturb behavior
  • Analytical Modeling
  • Mathematical model
  • Can be difficult to solve
  • Queuing, Petri net, and Markov models
  • Why analytical?
  • Even simple models can yield accurate results
  • Useful for trend analysis
  • Even intractable models can provide valuable
    insight

8
HPC Background and Related Work
  • Architecture to a large extent determines system
    performance (memory, processing nodes,
    interconnection network)
  • My focus: multiprocessor MIMD architectures
    with distributed memory (e.g., MPPs,
    grid computing, and Beowulf clusters)
  • HPC performance analysis: many studies exist in
    the literature
  • Common metrics are Speedup and Efficiency
  • Peterson, Atallah, Noble, and others
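The two common HPC metrics named above have standard definitions: speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p for p processors. A minimal sketch, assuming measured wall-clock runtimes keyed by processor count; the function names and the sample runtimes are hypothetical, not from the proposal:

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """E(p) = S(p) / p; 1.0 means perfect linear scaling."""
    return speedup(t_serial, t_parallel) / p


def scaling_table(runtimes):
    """runtimes: {processor_count: runtime_seconds}; must include p = 1 as the baseline."""
    t1 = runtimes[1]
    return {p: (speedup(t1, t), efficiency(t1, t, p)) for p, t in sorted(runtimes.items())}


# Hypothetical measurements, for illustration only.
table = scaling_table({1: 120.0, 2: 64.0, 4: 40.0})
```

With these made-up runtimes, 4 processors give a speedup of 3.0 and an efficiency of 0.75. Whether the same definitions carry over to RC (what is "p" for an FPGA?) is exactly the open question raised on the next slide.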

9
RC Background and Related Work
  • ASIC-like performance with software-like
    flexibility
  • Example systems
  • Annapolis (Wildforce/Firebird)
  • Pilchard (configuration offline)
  • ISI (SLAAC)
  • Nallatech
  • Virtual Computer
  • PipeRench
  • Many others
  • RC performance-related literature is limited
  • Are the Speedup and Efficiency definitions the same as for HPC?
    What does efficiency mean for RC?
  • Embedded users are often concerned with power, area,
    cost, etc.

10
HPRC Background and Related Work
  • Cluster of Processing Nodes with RC units
    connected by an Interconnection Network
  • Architecture options include:
  • RC coupling to Processor
  • Number of RC units
  • Size of FPGAs
  • Number of Nodes
  • Network Bandwidth
  • Configuration Latency
  • Dedicated FPGA network
  • Memory for FPGAs
  • New fertile territory
  • Combine HPC and RC metrics, or do we need new ones?
  • Plan is for an analytical modeling approach
  • CHAMPION and NetSolve tools expanded with model
    for HPRC use

11
Modeling Methodology
  • Phase 1: Isolate HPC issues (P-to-P
    communication, network setup, synchronization,
    serial overhead, load imbalance)
  • Phase 2: Isolate RC issues (FPGA configuration/setup,
    data distribution, HW/SW compute time, load
    imbalance)
  • Phase 3: Combine HPC and RC models (load balance
    studies, overall accuracy)
  • Iterate for accuracy and model generalizations

12
HPC Studies
  • Goal: isolate communication, synchronization, and
    relative workstation speed/performance issues
  • Initial measurements conducted on the UT ECE vlsi
    cluster
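Point-to-point communication measurements of this kind are commonly summarized with Hockney's linear model, t(m) = latency + m / bandwidth for an m-byte message. The slides do not say which communication model was fitted, so this is a swapped-in technique; a minimal least-squares sketch, with hypothetical function name and synthetic data:

```python
def fit_hockney(sizes, times):
    """Least-squares fit of t(m) = latency + m / bandwidth (Hockney's model).

    sizes: message sizes in bytes; times: measured one-way times in seconds.
    Returns (latency_seconds, bandwidth_bytes_per_second).
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx                 # seconds per byte = 1 / bandwidth
    latency = mean_y - slope * mean_x  # intercept at zero-byte message
    return latency, 1.0 / slope


# Synthetic, noise-free data: 50 us latency, 100 MB/s bandwidth (hypothetical values).
sizes = [0, 1000, 2000, 4000]
times = [50e-6 + m / 1e8 for m in sizes]
latency, bandwidth = fit_hockney(sizes, times)
```

Real ping-pong data is noisy and often piecewise (e.g., protocol switches at larger packet sizes), so a single line is only a first approximation.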

13
RC Model (1)
  • Single node running a synchronous iterative
    algorithm
  • Goal: model the interaction between the processor and
    the RC unit
  • HW/SW trees can be arbitrarily complex (simple
    here)

14
RC Model (2)
  • The runtime for a given iteration equals the time for the
    last task to complete (HW or SW) plus the total
    overhead
  • Model each as a random value; assume:
  • Each iteration requires roughly the same amount of
    computation
  • Random variables are independent and identically
    distributed (iid)
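Because the iteration ends when the last task finishes, the quantity of interest is the expected maximum of iid task times, plus overhead. A minimal sketch, assuming (hypothetically) that task times are Uniform(low, high), which gives the closed form E[max] = low + (high - low) * n / (n + 1); the Monte Carlo estimate checks the analytical value, in the spirit of using simulation to validate analytical models:

```python
import random


def expected_iteration_time_uniform(n_tasks, low, high, overhead):
    """Analytical E[t_iter] = E[max of n iid Uniform(low, high)] + overhead."""
    return low + (high - low) * n_tasks / (n_tasks + 1) + overhead


def simulate_iteration_time(n_tasks, low, high, overhead, trials=100_000, seed=1):
    """Monte Carlo estimate of E[max_i t_i] + overhead for iid Uniform task times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The iteration completes when the slowest (last) task completes.
        last_task = max(rng.uniform(low, high) for _ in range(n_tasks))
        total += last_task + overhead
    return total / trials


# One HW task and one SW task, each uniformly 8..12 time units, 1 unit of overhead.
analytical = expected_iteration_time_uniform(2, 8.0, 12.0, 1.0)
simulated = simulate_iteration_time(2, 8.0, 12.0, 1.0)
```

With two tasks the expected iteration time is 8 + 4 * 2/3 + 1, about 11.67, already above the 10-unit mean task time; as the number of tasks grows, E[max] approaches the upper bound, which is why synchronization cost rises with task count.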

15
RC Model (3)
  • Rewrite HW/SW tasks in terms of total work
    (assume HW and SW tasks take the same time, t_avg_task)
  • Account for HW/SW load imbalance
  • Combine and rewrite
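The slide's equations did not survive extraction. One common way to express the rewrite is a load imbalance factor beta = (slowest task time) / (mean task time), so that an iteration takes about beta * (W / N) + t_overhead for total work W split over N tasks. The sketch below uses that form as an assumption; it is not necessarily the proposal's exact equation, and the function names are hypothetical:

```python
def load_imbalance_factor(task_times):
    """beta = slowest task / mean task; 1.0 means perfectly balanced."""
    mean = sum(task_times) / len(task_times)
    return max(task_times) / mean


def iteration_time(total_work, n_tasks, beta, overhead):
    """Assumed model: t_iter = beta * t_avg_task + overhead, with t_avg_task = W / N."""
    t_avg_task = total_work / n_tasks
    return beta * t_avg_task + overhead


# Hypothetical measured HW and SW task times for one iteration.
beta = load_imbalance_factor([4.0, 6.0])
t_iter = iteration_time(10.0, 2, beta, 0.5)
```

Here beta is 1.2, so the 10-unit iteration split into two tasks costs 6.5 units instead of the ideal 5.5.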

16
RC Model (4)
  • SW-only runtime on a single processor (HW
    acceleration factor s)
  • RC runtime
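A minimal sketch of how an acceleration factor s enters an RC runtime estimate, under stated assumptions that are not from the slides: per iteration the SW-only work is W seconds; the RC unit runs its share s times faster; the work is split so the CPU and the RC unit finish together (each takes W / (s + 1)); beta >= 1 models HW/SW load imbalance; overhead covers configuration and data transfer. Function names are hypothetical:

```python
def rc_iteration_time(work, s, beta=1.0, overhead=0.0):
    """Assumed RC iteration time: beta * work / (s + 1) + overhead.

    The RC unit gets work * s / (s + 1) of the SW-time work (it runs s times faster),
    the CPU gets work / (s + 1), so both ideally finish after work / (s + 1) seconds.
    """
    return beta * work / (s + 1) + overhead


def rc_speedup(work, s, beta=1.0, overhead=0.0):
    """Speedup over the SW-only runtime; the iteration count cancels in the ratio."""
    return work / rc_iteration_time(work, s, beta, overhead)


ideal = rc_speedup(100.0, 4.0)                        # no imbalance, no overhead
with_overhead = rc_speedup(100.0, 4.0, overhead=5.0)  # 5 s of per-iteration overhead
```

The ideal case gives a speedup of s + 1 = 5; adding 5 seconds of per-iteration overhead drops it to 4, which illustrates why configuration and transfer costs matter in the model.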

17
HPRC Model (1)
  • Limit study to synchronous iterative algorithms
    (focus on communication and synchronization)
  • Begin with a dedicated, homogeneous system (i.e., no
    background load)

18
HPRC Model (2)
  • The runtime for a given iteration equals the time for the
    last task to complete plus the total overhead
  • Model each as a random value; assume:
  • Each iteration requires roughly the same amount of
    computation
  • Random variables are iid

19
HPRC Model (3)
  • Rewrite tasks in terms of total work
  • Account for application and RC load imbalance
  • Combine and rewrite

20
HPRC Model (4)
  • SW-only runtime on a single processor (HW
    acceleration factor s_k)
  • Parallel RC runtime
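Extending the single-node idea to N nodes with per-node acceleration factors s_k: a minimal sketch under assumptions not taken from the slides. Each node (one CPU plus one RC unit s_k times faster) is treated as having capacity 1 + s_k; work is distributed in proportion to capacity; application and RC load imbalance factors are assumed to multiply; overhead lumps communication and synchronization. Function names are hypothetical:

```python
def hprc_iteration_time(work, accel, beta_app=1.0, beta_rc=1.0, overhead=0.0):
    """Assumed parallel RC iteration time.

    work: SW-only work per iteration (seconds on one processor).
    accel: list of acceleration factors s_k, one RC unit per node.
    Balanced time is work / sum(1 + s_k); beta_app and beta_rc (>= 1) inflate it.
    """
    capacity = sum(1.0 + s for s in accel)
    return beta_app * beta_rc * work / capacity + overhead


def hprc_speedup(work, accel, **kwargs):
    """Speedup relative to the SW-only runtime on a single processor."""
    return work / hprc_iteration_time(work, accel, **kwargs)


# Two heterogeneous nodes: one RC unit 3x faster than its CPU, one 1x (hypothetical).
speedup_two_nodes = hprc_speedup(120.0, [3.0, 1.0])
```

The two hypothetical nodes have total capacity (1 + 3) + (1 + 1) = 6, so the ideal speedup is 6; any load imbalance or communication overhead pulls it below that.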

21
Validation Method
AFRL cluster of Pentium/Firebird nodes
UT cluster of Pentium/Pilchard nodes
  • CHAMPION Hi-Pass and START demos
  • Relatively simple algorithms allowing isolation
    of RC interface issues
  • Already implemented on Wildforce, allowing focus
    on the RC system, model, and measurements
    rather than debugging
  • Will need to be ported to the Pilchard and Firebird
    architectures
  • Elementary parallel application to study data
    distribution and synchronization/communication
    parameters

CHAMPION
K-means
Holography
  • Classification Algorithm
  • K-means clustering used for data organization
    and analysis
  • A hardware implementation with fixed-precision
    Manhattan distance calculation exists and is
    adaptable to our platforms
  • Permits study of load balance issues due to the large
    amount of data
  • Holography Reconstruction Algorithm
  • Uses FFT in the digital reconstruction of
    off-axis holograms
  • Only exists in software; needs a C/VHDL version of
    the 2-D FFT
  • Permits study and validation of the complete model
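The holography demo needs a 2-D FFT before it can move to C/VHDL. The usual route is the row-column method: a 2-D FFT is a 1-D FFT of every row followed by a 1-D FFT of every column, so a single 1-D FFT core can be reused. A minimal Python sketch (radix-2 Cooley-Tukey, power-of-two sizes assumed), which could serve as a golden reference when checking a C/VHDL version; function names are hypothetical:

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2d(grid):
    """2-D FFT by the row-column method; returns result[u][v]."""
    n_rows, n_cols = len(grid), len(grid[0])
    rows = [fft1d(row) for row in grid]  # 1-D FFT along each row
    # 1-D FFT along each column of the row-transformed data.
    cols = [fft1d([rows[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # cols[v][u] holds frequency (u, v); transpose back to [u][v] order.
    return [[cols[v][u] for v in range(n_cols)] for u in range(n_rows)]
```

In hardware the transpose between the row and column passes is typically the costly part, since it means buffering a full frame in memory.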

22
Validation Measurements
  • Possible Sources of Errors
  • Parameter measurement errors due to probe effects
  • Model assumptions or methods
  • Representation of total work
  • Representation of load balance
  • Need more data
  • Un-modeled effects
  • Caching
  • Packet size optimization
  • Other API optimization techniques
  • First-pass model validation using the Wildforce board
    on microsys8 under minimal background load
    conditions

23
Model Applications
  • Performance characterization of RC systems
  • Performance evaluation for HPRC platform
  • Tool for constructing and optimizing cost functions
    (e.g., power, size, cost)
  • Building block for other CAD tools (e.g., task
    scheduling, load balancing, CHAMPION, and NetSolve)
  • SoC design performance analysis

24
Status and Remaining Work
  • Three papers published; others submitted and
    planned
  • Phase 1
  • Initial communication measurements completed
  • Other ongoing work should provide more input
  • Phase 2
  • Sample application used to gather
    characterization data
  • CHAMPION demos used for validation
  • Need to port to Firebird and Pilchard for more
    measurement results
  • Phase 3
  • First pass of the HPRC model mathematically formulated
  • Need access to hardware to begin parameter
    measurements, demo development, and model validation
    measurements
  • Demo development (assistance from other students)
  • CHAMPION: need port to Firebird/Pilchard and
    parallel versions
  • K-means: need implementation on our HW and a
    parallel version
  • Holography: need C/VHDL for the 2-D FFT, HW
    implementation, and a parallel version

25
Model Status
26
Remaining Work
  • Develop prediction model for load balance factors
    building on previous work by Peterson
  • Plans to generalize model for heterogeneous
    processing sets
  • Review and revise as necessary the speedup and efficiency
    definitions for RC and HPRC systems
  • Complete validation with demos

27
Beyond Scope
  • FPGA reconfiguration latency studies
  • RC dedicated configurable network
  • Automated task scheduling and load balancing
  • Integration with CHAMPION and NetSolve

28
Extra Slides
29
Development Environment
  • AFRL heterogeneous HPC cluster: four
    Pentium nodes populated with Firebird boards
  • UT platform: cluster of eight Pentium nodes
    populated with Pilchard boards
  • Currently, no CAD tools specific to HPRC are available

30
RC Plots
  • Fixed work; vary number of tasks and load balance
    factor
  • Fixed work; vary configuration time and load
    balance factor

31
HPRC Plots
  • 1 RC unit/node: 6, 11, 16 nodes
  • 2 RC units/node: 6, 11, 16 nodes

32
Conferences and Journals
  • Published
  • Smith, M. C., Drager, S. L., Pochet, Lt. L., and
    Peterson, G. D., "High Performance Reconfigurable
    Computing Systems," Proceedings of the 2001 IEEE
    Midwest Symposium on Circuits and Systems, 2001.
  • Smith, M. C. and Peterson, G. D., "Programming
    High Performance Reconfigurable Computers (HPRC),"
    SPIE International Symposium ITCom 2001,
    Denver, CO, August 19, 2001.
  • Peterson, G. D. and Smith, M. C., "Programming
    High Performance Reconfigurable Computers," SSGRR
    2001, Rome, Italy, 2001.
  • Submitted
  • Smith, M. C. and Peterson, G. D., "Analytical
    Modeling for High Performance Reconfigurable
    Computers," The 2002 International Symposium on
    Performance Evaluation of Computer and
    Telecommunication Systems (SPECTS 2002), San Diego,
    CA, July 14-19, 2002.
  • Planned
  • Journal on Parallel and Distributed Computing
    Practices: "Algorithms, Systems and Tools for High
    Performance Computing on Heterogeneous Networks,"
    submission due April 30, 2002.
  • IEEE Design and Test, Special Issue on
    Platform-Based Design of System-on-Chip,
    submission due May 1, 2002, for November-December
    2002 publication.

33
High Pass Demo
  • 3x3 high pass filter
  • Output pixel value depends on the input pixel and its 8
    neighbor pixels
  • For hardware implementation, a mask of -1/8 is
    used
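A minimal sketch of the 3x3 high-pass filter, assuming the mask is a center weight of 1 with -1/8 on each of the 8 neighbors (so the weights sum to zero and flat regions map to 0). The center weight and the choice to skip border pixels are assumptions, not stated on the slide; the function name is hypothetical:

```python
def high_pass_3x3(image):
    """Apply out = center - (sum of 8 neighbors) / 8 to every interior pixel.

    image: list of equal-length rows of numbers. Border pixels are skipped,
    so the output is (height - 2) x (width - 2).
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            neighbors = sum(
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            row.append(image[r][c] - neighbors / 8)
        out.append(row)
    return out


flat = high_pass_3x3([[5] * 5 for _ in range(5)])   # uniform region -> all zeros
spot = high_pass_3x3([[8, 8, 8], [8, 16, 8], [8, 8, 8]])  # bright center stands out
```

A -1/8 neighbor weight suits hardware because (8 * center - sum) / 8 needs only adds, a 3-bit left shift, and a 3-bit right shift, with no multipliers.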

34
START ATR Demo (1)
35
START ATR Demo (2)
36
START ATR Demo (3)
37
K-means Demo
  • Iteration over fixed number of clusters
  • Maintain class center at mean position of member
    samples
  • Computations
  • Distance between samples and centers
  • Recalculation of mean
  • Simplifications needed for hardware implementation
  • Initialize
  • Loop until termination condition is met
  • For each sample, assign that sample to a class
    such that the distance from the sample to the
    center of that class is minimized
  • For each class, recalculate the means of the
    class based on the samples in that class
  • End loop
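The loop above can be sketched directly. This minimal version uses Manhattan distance, matching the hardware-friendly distance mentioned for the existing implementation, and stops when no sample changes class (one possible termination condition; the slide leaves it open). Initial centers are supplied by the caller; function names are hypothetical:

```python
def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences (no multipliers needed)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(samples, centers, max_iters=100):
    """Basic k-means loop; returns (centers, class index per sample)."""
    centers = [list(c) for c in centers]
    assignment = None
    for _ in range(max_iters):
        # Assign each sample to the class whose center is nearest.
        new_assignment = [
            min(range(len(centers)), key=lambda k: manhattan(s, centers[k]))
            for s in samples
        ]
        if new_assignment == assignment:
            break  # termination: no sample changed class
        assignment = new_assignment
        # Recalculate each class mean from its member samples.
        for k in range(len(centers)):
            members = [s for s, a in zip(samples, assignment) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


centers, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```

Pairing Manhattan distance with a mean update is a simplification: the mean minimizes squared Euclidean distance, not L1 distance, but it keeps the arithmetic cheap in hardware.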

38
K-means Demo Hardware
  • Hardware implementation by Lesser et al.
  • Hardware
  • Assign pixels to clusters
  • Accumulation step of cluster center calculation
  • Software
  • Compute cluster center
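The HW/SW partition above can be sketched as two functions: an integer-only step that assigns pixels and accumulates per-cluster sums and counts (the part mapped to hardware), and a software step that divides to get new centers. Using floor division to keep centers in fixed precision is an assumption, as are the function names; this is not the Lesser et al. design itself:

```python
def hw_assign_and_accumulate(pixels, centers):
    """Integer-only step (hardware side): nearest center by Manhattan distance,
    then accumulate per-cluster coordinate sums and member counts."""
    k, dims = len(centers), len(centers[0])
    sums = [[0] * dims for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in pixels:
        best = min(range(k), key=lambda j: sum(abs(a - b) for a, b in zip(p, centers[j])))
        labels.append(best)
        counts[best] += 1
        for d in range(dims):
            sums[best][d] += p[d]
    return labels, sums, counts


def sw_compute_centers(sums, counts, old_centers):
    """Software side: divide sums by counts; empty clusters keep their old center."""
    return [
        [s // n for s in acc] if n else list(old)
        for acc, n, old in zip(sums, counts, old_centers)
    ]


# Hypothetical 3-band pixels and starting centers.
pixels = [(0, 0, 0), (2, 2, 2), (100, 100, 100), (103, 101, 99)]
labels, sums, counts = hw_assign_and_accumulate(pixels, [(1, 1, 1), (90, 90, 90)])
new_centers = sw_compute_centers(sums, counts, [(1, 1, 1), (90, 90, 90)])
```

Only the k sums and counts cross the HW/SW boundary each iteration, not per-pixel results. That keeps transfer volume small and makes the load balance between the two sides easy to reason about.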

39
Holography Demo (1)
  • Holography image reconstruction signal processing
    block diagram
  • Holography image can be represented by

40
Holography Demo (2)
  • FFT result: autocorrelation term and sidebands
  • Isolate the information in the sideband by
    centering one of the sidebands and filtering
  • Final step, Inverse FFT
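The three steps above (FFT, center one sideband and filter, inverse FFT) can be sketched with NumPy. This assumes the sideband's carrier offset in FFT bins is known and uses a circular low-pass window of a chosen radius; both the window shape and the parameter names are assumptions:

```python
import numpy as np


def reconstruct_off_axis(hologram, carrier, radius):
    """Recover the complex object wave from an off-axis hologram.

    carrier: (ky, kx) integer bin offset; the kept sideband sits at -carrier.
    radius: radius, in FFT bins, of the circular low-pass window.
    """
    spectrum = np.fft.fft2(hologram)
    # Center the chosen sideband: roll it from -carrier to the zero-frequency bin.
    centered = np.roll(spectrum, shift=carrier, axis=(0, 1))
    # Circular window around DC using signed bin indices from fftfreq.
    ny, nx = hologram.shape
    fy = np.fft.fftfreq(ny) * ny
    fx = np.fft.fftfreq(nx) * nx
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    mask = FY**2 + FX**2 <= radius**2
    # The filter removes the autocorrelation term and the conjugate sideband.
    return np.fft.ifft2(centered * mask)


# Synthetic test: smooth phase object plus a tilted plane reference wave.
n = 64
X = np.tile(np.arange(n), (n, 1))
obj = np.exp(1j * 0.5 * np.cos(2 * np.pi * X / n))
ref = np.exp(1j * 2 * np.pi * 16 * X / n)
hologram = np.abs(obj + ref) ** 2
recovered = reconstruct_off_axis(hologram, carrier=(0, 16), radius=6)
```

On this synthetic hologram the recovered field matches the object wave closely, because the object's spectrum fits inside the window and the carrier keeps the sidebands away from the autocorrelation term.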

41
Gantt Chart