Research Overview SC08 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Research Overview SC08
  • Guang R. Gao
  • ACM Fellow and IEEE Fellow
  • Endowed Distinguished Professor University of
    Delaware
  • ggao@capsl.udel.edu

2
(No Transcript)
3
Selected Research Topics
  • Open64
  • Cyclops-64 (C64) Applications
  • Sorting
  • LU Decomposition
  • Ray Tracing
  • Mstack Benchmark
  • SWSWEEP3D
  • Cyclops-64 Architecture
  • FlashNode I/O
  • OpenOpell
  • The following slides will provide a brief look at
    these topics by providing
  • A selected slide
  • Members of the team associated with this project

4
OPEN64 SELECTED SLIDE
5
This slide describes the history of Open64 and
how it came to University of Delaware
6
Open64 Team
Ge Gan
Jean Christophe Beyler
Juergen Ributzka
Tom St. John
Handong Ye
7
C64 APP1 SORTING: SELECTED SLIDE
8
This slide describes our sorting algorithm which
is optimized for the memory hierarchy of the
Cyclops-64
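The slide's actual algorithm isn't reproduced in this transcript, but the general shape of a memory-hierarchy-aware sort can be sketched as a two-phase run-and-merge scheme: sort chunks small enough to fit a fast local memory, then k-way merge the sorted runs. This is an illustrative sketch, not the C64 code; `chunk_size` is a hypothetical tuning knob, not a C64 figure.

```python
import heapq

def hierarchical_sort(data, chunk_size=1024):
    """Two-phase sort: (1) sort chunks sized to fit a fast local memory
    (e.g. on-chip SRAM), (2) k-way merge the resulting sorted runs."""
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    return list(heapq.merge(*runs))
```

On a real memory hierarchy, the chunk size would be tuned so each run fits the fastest level, trading merge fan-in against per-run locality.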
9
C64 App1 Sorting Team
Jean Christophe Beyler
Kelly Livingston
Joseph Manzano
10
C64 APP2 LU DECOMPOSITION SELECTED SLIDE
11
This slide shows the Performance of the LU
Decomposition on the C64 after each optimization
was added.
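For reference, the unoptimized baseline behind such measurements is a plain Doolittle LU factorization. A minimal sketch (no pivoting, none of the C64-specific optimizations the slide measures):

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting (illustrative sketch).
    Returns (L, U) with A = L * U, L unit lower triangular."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # row i of U
            U[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        L[i][i] = 1.0
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (a[j][i]
                       - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U
```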
12
C64 App2 LU Team
Daniel Orozco
Ioannis Vennetis
Juergen Ributzka
13
C64 APP3 RAY TRACING: SELECTED SLIDE
14
This slide describes an optimization in Ray
Tracing that will be used in our Cyclops-64
implementation of Ray Tracing
15
C64 App3 Ray Tracing Team
Kelly Livingston
16
C64 APP4 MSTACK BENCHMARK: SELECTED SLIDE
17
This slide shows the memory access pattern of the
Mstack benchmark across the 3 dimensional input
array
18
C64 App4 Mstack Team
Ioannis Vennetis
Joseph Manzano
Mark Pellegrini
Ryan Taylor
Tom St. John
19
C64 APP5 SWSWEEP3D: SELECTED SLIDE
20
Sweep3D is an application with wavefront type
dependencies. So far, existing architectures have
failed to exploit fine grain parallelism. Cyclops
64 has adequate support for this kind of
parallelism, and it is likely to achieve good
scalability, even at a fine grain level.
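The wavefront dependence structure can be made concrete with a small scheduling sketch: each cell depends on its north and west neighbors, so all cells on one anti-diagonal are independent and form one parallel step. Illustrative code only, not the Sweep3D kernel:

```python
def wavefront_schedule(rows, cols):
    """Group the cells of a 2-D sweep by anti-diagonal d = i + j.
    Cells within one group have no mutual dependencies, so each group
    can execute in parallel (the fine-grain parallelism above)."""
    return [[(i, d - i)
             for i in range(max(0, d - cols + 1), min(rows, d + 1))]
            for d in range(rows + cols - 1)]
```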
21
C64 App5 SWSWEEP3D Team
Daniel Orozco
22
C64 ARCHITECTURE FLASHNODE I/O: SELECTED SLIDE
23
The following slide shows the storage hierarchy
of the C64 with the additions of Memory Mapped
Flash Memory and File Cache Flash Memory
24
FlashNode I/O Team
Yuhei Hayashi
Brian Lucas
Dimitrij Krepis
25
OPENOPELL SELECTED SLIDE
26
The following slide shows how the OpenOpell
toolchain generates code for the Cell Broadband
Engine's PPU and SPU processors from a single
source code.
27
OpenOpell Team
Joseph Manzano
Ge Gan
Ziang Hu
Yi Jiang
28
CAPSL Alumni
Seattle
Edmonton
Portland
CAPSL (1996-2007)
Boston
New York
Philadelphia
San Francisco
Washington
Los Angeles
Phoenix
Overseas: H. Sakane (Japan), R. Yakay (Turkey), I. Dogru, I. Vennetis (Greece), Xin Wang (China), Yan Xie (China)
29
DAPLDS
  • Dynamically Adaptive Protein Ligand Docking

30
Overview
  • The DAPLDS project aims to build a
    computational environment to assist scientists in
    understanding the atomic details of
    protein-ligand interactions

Global Computing Lab 2008
31
Objectives (I)
  • Explore the multi-scale nature of dynamic
    protocol model adaptations for protein-ligand
    docking

32
Objectives (II)
  • Develop methods and models that efficiently
    accommodate computational adaptations in VC
    environments supported by BOINC
  • Extend knowledge with respect to protein-ligand
    complexes and make this knowledge accessible to
    the scientific community via cyber-infrastructures

33
Protein-Ligand Docking
  • Computational methods simulate the protein-ligand
    interaction
  • The resulting structural information can be used
    for the design of new drugs


(Figure: protein + ligand combine into a protein-ligand complex)
34
Docking Algorithm
  • Model Initial 3D Conformation
  • Generate Random Rotations
  • Generate Random Conformations
  • MD Simulated Annealing Conformational Search
  • Dock onto 3D Grid Protein Model
  • Restore All-Atom Protein Model
  • Minimize Energy
  • Sort Conformations by Energy
35
Docking@Home
  • High-throughput, protein-ligand docking
    simulations are performed on a computational
    environment that deploys a large number of
    volunteer computers connected to the Internet

36
Docking@Home
37
Result Post-Processing
  • Protein-ligand docking complexes are scored based
    on energy values.
  • Estimation of energy values is inaccurate because
    of modelling assumptions
  • A structure with a minimum energy is not always a
    native-like structure

(Figure: protein docked in nature)
38
Result Post-Processing
Thousands of results provided by Docking@Home
Hypothesis: if protein-ligand docking is
simulated using a sufficiently accurate model, a
large number of independent simulations can
eventually converge to a native-like structure
39
Result Post-Processing
Clustering thousands of results provided by
Docking@Home
  • Adaptive k-means clustering is used to group
    similar ligand conformations
  • If the simulations converge, then the largest
    cluster with minimum energy is also the most
    likely to contain more native-like structures
  • The centroid of the biggest cluster is selected
    as a probable native-like structure

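As a toy illustration of the selection rule above, the sketch below runs a plain (non-adaptive) k-means on a 1-D stand-in coordinate for each conformation and returns the centroid of the biggest cluster. The adaptive variant, the minimum-energy tie-breaking, and the real multi-dimensional RMSD metric are all omitted here:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on 1-D values (illustrative stand-in for
    conformation coordinates)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        # keep the old centroid if a cluster goes empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids

def select_native_like(conformations, k=3):
    """Return the centroid of the largest cluster as the probable
    native-like structure (energy ranking omitted in this sketch)."""
    clusters, centroids = kmeans(conformations, k)
    biggest = max(range(k), key=lambda i: len(clusters[i]))
    return centroids[biggest]
```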
40
Result Post-Processing
The adaptive clustering is a promising method to
aid in the selection of native-like ligand
conformations from a significantly large set of
candidates
(Figure: 2D representation of protein-ligand conformations plotted by (Energy, RMSD); the selected cluster's centroid has RMSD = 0.102)
41
Docking@Home Screensaver
42
(No Transcript)
43
Acknowledgements
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Obaidur Rahaman, Kevin Kreiser
Sponsors
DAPLDS Collaborators: Sandeep Patel (UD), David Anderson (UC Berkeley), Kevin Reed (World Community Grid, IBM), Charles L. Brooks III and Roger Armen (U Mich), Pat Teller (UTEP)
Group Webpage: http://gcl.cis.udel.edu
44
RNAVLab
Case Study: Using Genetic Algorithms to Generate Training Data
Abel Licon, Reed Martz, and Michela Taufer
45
RNAVLab
  • A collection of tools written in Java for:
  • Prediction
  • Sampling
  • Analysis
  • Provides high-level interface to parallel
    resources

46
RNAVLab Framework
47
RNAVLab Framework
  • Provide users with a Web interface
  • Supply services with the RNAVLab backend
  • Parallel resources
  • Sequence alignment
  • Structure comparison
  • Pseudoknot classification

48
RNAVLab
49
RNAVLab
50
Web-Page Interface
51
Web-Page Interface
(Table: RNA virus sequence entries with dates from 8/17/87 to 12/27/12, including HIV Type 1, beet soil-borne virus, tobacco mild green mosaic virus, foot-and-mouth disease virus serotype C, Visna-Maedi virus, Bacillus subtilis, Escherichia coli, human coronavirus 229E, SARS coronavirus, cucurbit aphid-borne yellows virus, oilseed rape mosaic virus, Nemesia ring necrosis virus, and pepper mild mottle virus)
52
Web Service Backend
53
Web Service Backend
  • Provide RNAVLab services via a REST-style interface
  • Enable Java clients and other applications to use
    the Web service

54
Case Study
  • Predict very long RNA secondary structures
  • Attempt to build large structures from small
    sub-structures
  • Challenges
  • Search space is huge
  • Possible combinations are 2^(n²)
  • Searching the entire space is infeasible

55
Genetic Sampling
  • Use a genetic algorithm to search the space of
    possible sub-structures
  • Submit predictions to Condor Grid via RNAVLab
    interface
  • Use generated training data to train a classifier

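The steps above can be sketched as a minimal genetic algorithm: tournament selection, one-point crossover, and bit-flip mutation over bitstrings encoding candidate sub-structures. The caller-supplied `fitness` function is a stand-in for the real RNA sub-structure score; all parameters below are illustrative assumptions:

```python
import random

def evolve(fitness, genome_len, pop_size=30, gens=40, seed=0):
    """Minimal GA sketch over bitstrings (hypothetical encoding of
    sub-structure choices); not the RNAVLab implementation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        for _ in range(pop_size):
            p1 = max(rng.sample(pop, 2), key=fitness)   # tournament of two
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, genome_len)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[rng.randrange(genome_len)] ^= 1       # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

In the slide's workflow, each fitness evaluation would correspond to a prediction job submitted to the Condor grid through the RNAVLab interface.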
56
GA Evolution
57
Future Work
  • Use training data to train classifier
  • Introduce more prediction algorithms
  • MFE based
  • Alignment based
  • Use trained predictor on unknown set and quantify
    results

58
Acknowledgments
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Kevin Kreiser, Obaidur Rahaman
Sponsors
RNAVlab Collaborators: Ming-Ying Leung, Kyle L. Johnson, David Mireles, Roberto Araiza, and Olac Fuentes (UTEP), Thamar Solorio (UT Dallas)
Group Webpage: http://gcl.cis.udel.edu
59
MD on GPUs
Molecular Dynamics Simulations on Graphics
Processing Units Joe Davis, Adnan Ozsoy, Sandeep
Patel, and Michela Taufer
60
Introduction
  • Graphics Processing Units (GPUs) have been
    extensively used in graphics intensive
    applications
  • Development driven by economics, e.g. the video
    game and motion picture industries
  • The inherent parallelism of GPUs makes them
    suitable for scientific applications
  • Recent exploration of potential of GPUs for
    mathematics and scientific computing
  • Medical diagnostics
  • GPUs coupled to MRI Hardware (Stone et al. Proc.
    of 2007 Computing Frontiers conference, 7-9 May,
    2008)
  • Molecular modeling
  • Electrostatic Potential Calculation (Stone et al.
    J. Comp. Chem. 28, 16, pp. 2618-2640)
  • Ion Placement (Stone et al. J. Comp. Chem. 28,
    16, pp. 2618-2640)
  • Van der Waals Fluids / Polymers (Anderson et al.
    J. Comput. Physics 2008)

61
GPGPUs
  • Special-purpose hardware targets specific types of
    calculations
  • The Protein Explorer system and its LSI MDGRAPE-3
    chip (Taiji et al. in Proc. of 2003 ACM/IEEE
    Supercomputing Conference, 15-21 Nov. 2003)
  • Anton and its 12 identical MD-specific ASICs
    (Shaw et al. in Proc. of the 34th Annual
    International Symposium on Computer Architecture,
    9-13 June, 2007)
  • General Purpose GPUs (or GPGPUs) are cost effective
    and readily available in recent workstations
  • GeForce FX5600
  • 1.5 GBytes memory
  • Cost: $2,795
  • GeForce 9800 GX2
  • Dual GPU-based graphics card
  • 512 MBytes memory per GPU
  • Cost: $665

62
Programming GPUs
  • Past: APIs originally went through graphics
    interfaces, e.g., OpenGL
  • Not easy to use for general computation: one had to
    cast the computation in terms of graphics operations
  • Draw the calculation
  • Interpret image post-calculation
  • Present: NVIDIA CUDA (Compute Unified Device
    Architecture) language/library
  • Easy to use: CUDA provides a minimal set of
    extensions necessary to expose the power of GPGPUs
  • Includes C-compiler and development tools
  • CUDA optimization strategy
  • Maximize independent parallelism
  • Maximize arithmetic intensive computation
  • Take advantage of on-chip per-block shared memory
  • Do computation on the GPUs and avoid data transfer

From CUDA Programming Guide, NVIDIA
63
MD on GPUs
  • Why MD on GPU?
  • Accelerating non-bonded interactions expands the
    accessible scales of time and physical dimension
    (system complexity)
  • All-atom resolution (micro to milliseconds)
  • Coarse-graining (seconds)
  • Continuum physics with molecular detail?
  • MD on GPU Non-bond interactions (pair
    interactions)
  • Non-bond list is generated by checking all pair
    distances against the cut-off in parallel
    (efficient tiling approach)
  • A thread iterates through the non-bond list for a
    single atom and accumulates the non-bonded
    interactions

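The two bullets above can be mirrored in a serial toy version: an all-pairs cutoff check builds the non-bond list (done in parallel on C64/GPU), then each atom, standing in for one thread, walks its own list accumulating Lennard-Jones energy. The eps/sigma values and data layout are illustrative assumptions, not SPC/Fw parameters:

```python
def build_pairlist(pos, cutoff):
    """All-pairs distance check against the cutoff.
    pos is a list of (x, y, z) tuples; returns per-atom neighbor lists."""
    n = len(pos)
    nb = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((pos[i][k] - pos[j][k]) ** 2 for k in range(3))
            if r2 < cutoff * cutoff:
                nb[i].append(j)
                nb[j].append(i)
    return nb

def lj_energy_per_atom(pos, nb, eps=1.0, sigma=1.0):
    """Each 'thread' iterates through one atom's neighbor list and
    accumulates its Lennard-Jones energy (pair energy split 50/50
    between the two atoms so the total is not double-counted)."""
    out = []
    for i in range(len(pos)):
        e = 0.0
        for j in nb[i]:
            r2 = sum((pos[i][k] - pos[j][k]) ** 2 for k in range(3))
            s6 = (sigma * sigma / r2) ** 3
            e += 4.0 * eps * (s6 * s6 - s6)
        out.append(0.5 * e)
    return out
```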
64
Water Model
  • Flexible Water SPC/Fw (Wu et al, J. Chem. Phys.,
    2006)
  • Intra-molecular potential
  • Computed on GPU using lists (bond/angle lists)
  • Non-bonded potential
  • Lennard-Jones
  • Shifted-force electrostatics with cut-off only
    (no Ewald)
  • List-based evaluation
  • Computing system
  • GPU NVIDIA Quadro FX 5600
  • CPU (CHARMM) Intel Xeon 5150 2.66 GHz (Woodcrest)

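The shifted-force electrostatics mentioned above modifies plain Coulomb so that both the potential and the force go smoothly to zero at the cutoff, avoiding the discontinuity of a bare truncation. A minimal sketch in reduced (charge-squared-over-distance) units, illustrative only:

```python
def shifted_force_coulomb(qi, qj, r, rc):
    """Shifted-force Coulomb: V_sf(r) = V(r) - V(rc) + V'(rc)*(rc - r)
    rewritten so V_sf(rc) = 0 and dV_sf/dr(rc) = 0. Units are reduced;
    no physical constants included in this sketch."""
    if r >= rc:
        return 0.0
    v = qi * qj / r            # plain Coulomb potential
    vc = qi * qj / rc          # potential at the cutoff
    fc = qi * qj / rc ** 2     # force magnitude at the cutoff
    return v - vc + fc * (r - rc)
```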
65
Performance
  • Performance metric: number of MD time steps
    calculated in one second

GPU is 7x faster on average!
66
Accuracy
67
Conclusions
  • Current achievements
  • Implementation of a local version of MD code on
    current generation of GPUs
  • Straightforward, naive implementation
  • Promising results
  • Work in progress
  • Optimization and tuning of performance
  • Expand MD options (additional potentials, PME)
  • Final goals
  • Effective compilation of CHARMM on GPU
  • Study of large solvent systems for
    long simulation times, up to 100ns, with CHARMM

68
GCL at UD
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Obaidur Rahaman, Kevin Kreiser, Michela Taufer
Sponsors
GPU@GCL Collaborators: Sandeep Patel (UD), Charles L. Brooks III and Roger Armen (U Mich)
Group Webpage: http://gcl.cis.udel.edu
69
jTopaz
Plug your PC into the Grid using Mozilla
Patrick McClory, Martin Swany, Michela Taufer
70
What is GridFTP?
  • Extension of the standard File Transfer Protocol
    (FTP)
  • Designed with three main principles in mind
  • Security
  • Reliability
  • High Performance

71
Current Software
  • globus-url-copy - script provided by the Globus
    Toolkit
  • Can only transfer one file at a time
  • Requires reauthenticating/reauthorizing for each
    transfer
  • UberFTP - interactive client
  • Both require having the Globus Toolkit libraries
    installed on the user's machine

72
The Challenge
  • Although GridFTP has numerous advanced features,
    there is a lack of easy-to-use client software
    for end users to take advantage of the Grid.

73
jTopaz
  • Our GridFTP client software addresses this
    challenge by providing a simple, easy-to-use
    interface to GridFTP servers
  • jTopaz is packaged as a Firefox extension
  • jTopaz is portable across platforms
  • Works on Linux, Windows, and Mac machines

74
(No Transcript)
75
Java CoG Toolkit
  • Java Commodity Grid Toolkit
  • Allows Grid users, administrators, and developers
    to work with the Grid from a higher abstraction
    level
  • jGlobus Library
  • Provides basic APIs for interacting with Grid
    services such as GridFTP and MyProxy
  • Key component in jTopaz

76
Future Work
  • Currently jTopaz only implements a simple
    client-server file transfer model
  • Future work includes advanced features:
  • Third-party transfers
  • Parallel transfers
  • Partial file transfers

77
jTopaz Demo
78
jTopaz Demo
Select jTopaz in the Tools menu
79
jTopaz Demo
Enter GridFTP server info
80
jTopaz Demo
Local Files
Remote Files
81
Acknowledgements
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Kevin Kreiser, Obaidur Rahaman
Sponsors
jTopaz Collaborators: Martin Swany (UD), Karan Bhatia (SDSC)
Group Webpage: http://gcl.cis.udel.edu
82
Intelligent Compilers
John Cavazos (cavazos@cis.udel.edu)
The Adaptive Compilation Environment (ACE) Project
Dept. of Computer and Information Sciences, University of Delaware
83
Motivation
  • Architectures are getting increasingly more
    complex
  • Finding efficient heuristics to solve hard
    compiler problems is challenging
  • Quick retargeting of optimizing compilers for new
    architectures is needed

84
Solution Intelligent Compilers
  • Machine learning
  • Automates the process of tuning optimizing
    compilers
  • Allows specialization of compiler to targeted
    hardware

85
Overview Intelligent Compiler
86
Methodology Description
  • Phrase as a Machine Learning Problem
  • Feature Construction
  • Generate Training Instances
  • Feed Instances to Learning Algorithm
  • Integrate the Learned Heuristic
  • Evaluate the Learned Heuristic

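A drastically simplified stand-in for the methodology above: store (feature vector, best optimization sequence) training instances and predict by nearest neighbor over the features. The feature values and flag strings below are hypothetical illustrations, and a real ACE-style model would use far richer learners and feature construction:

```python
import math

def train(examples):
    """Trivial 1-nearest-neighbor 'model': each example maps a feature
    vector (hypothetical normalized performance-counter values) to the
    optimization sequence that was best for that training program."""
    return list(examples)

def predict(model, features):
    """Return the optimization sequence of the closest training program
    in feature space (the 'learned heuristic' integrated into the
    compiler in this sketch)."""
    return min(model, key=lambda ex: math.dist(ex[0], features))[1]
```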
87
Case Study PathScale
88
Case Study
  • PC Model trained using (1) program
    characteristics (from performance counters), (2)
    best optimization sequences for each program, and
    (3) speedups obtained from the best sequences
  • Predictive model predicts which optimizations
    will be beneficial for each
    application

89
SPEC C/C++ and Fortran Benchmarks
Obtained a 17% average improvement over the most
aggressive optimization level (-Ofast) in an
industry-strength compiler. Experiments were
performed using the PathScale compiler.
90
Future Work
  • Optimization Phase-Ordering
  • Multicore optimizations

For more information: http://www.cis.udel.edu/cavazos
91
Research Overview SC08
  • Murat Bolat, Liang Gu, Jakob Siegel, Ryan Taylor
  • Principal Investigator: Xiaoming Li
  • University of Delaware
  • xli@ece.udel.edu

92
Library Generation and Optimization Projects:
  • Model-driven optimization for FFTW
  • Lattice Boltzmann Method for CUDA
93
Model-driven optimization for FFTW
94
Model-driven optimization for FFTW
  • Goal
  • Understand why FFTW produces high-performance
    code.
  • Discover the role of FFTW's empirical search
    engine in code optimization.
  • Generate an FFT library of equally high quality
    without empirical search.

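For context on what the planner is choosing between, the baseline decomposition is the textbook radix-2 Cooley-Tukey recursion; FFTW's empirical search picks among many such decompositions plus hand-tuned codelets, which the model-driven approach aims to predict without searching. A minimal sketch of the baseline only:

```python
import cmath

def fft(x):
    """Textbook recursive radix-2 Cooley-Tukey FFT (power-of-two
    length). Illustrative baseline, not FFTW's generated code."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # twiddle factors applied to the odd half
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])
```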
95
Performance of our model-driven FFTW
96
Search time of our model-driven FFTW
97
Accelerate Lattice Boltzmann Method (LBM) on CUDA
98
Accelerate LBM on CUDA
  • The LBM models Boltzmann particle dynamics on a
    2D or 3D lattice.
  • LBM is one of the most important physical
    simulation methods.

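A toy D2Q9 stream-and-collide step (pure Python, periodic boundaries) illustrates the neighbor exchange that makes the CUDA memory layout challenging; grid sizes and relaxation time below are illustrative, not the tuned kernel:

```python
# D2Q9 lattice: discrete velocity set and weights
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4

def lbm_step(f, nx, ny, tau=0.8):
    """One stream-and-collide step of a D2Q9 BGK LBM sketch.
    f[i][x][y] holds the population moving in direction E[i]."""
    # streaming: each population hops to the neighboring cell
    # (the extensive, direction-dependent data exchange above)
    g = [[[f[i][(x - ex) % nx][(y - ey) % ny] for y in range(ny)]
          for x in range(nx)] for i, (ex, ey) in enumerate(E)]
    # BGK collision: relax each population toward local equilibrium
    for x in range(nx):
        for y in range(ny):
            rho = sum(g[i][x][y] for i in range(9))
            ux = sum(g[i][x][y] * E[i][0] for i in range(9)) / rho
            uy = sum(g[i][x][y] * E[i][1] for i in range(9)) / rho
            for i, (ex, ey) in enumerate(E):
                eu = ex * ux + ey * uy
                feq = W[i] * rho * (1 + 3 * eu + 4.5 * eu * eu
                                    - 1.5 * (ux * ux + uy * uy))
                g[i][x][y] += (feq - g[i][x][y]) / tau
    return g
```

On CUDA, the streaming step is exactly where the layout co-optimization and coalescing techniques of the next slides matter: each cell reads nine differently offset neighbors.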
99
Challenges of Optimizing LBM on CUDA (1)
  • Extensive and irregular data exchange between
    lattice cells

Our Optimization Techniques:
  • Co-optimize global memory and shared memory layout
  • Coalesce memory accesses
  • 2-D data padding and buffering
100
Challenges of Optimizing LBM on CUDA (2)
  • Boundary testing and barrier detection

Our Optimization Techniques:
  • Control-structure splitting
  • Kernel splitting
  • Adaptive thread grid and block size selection
101
  • Performance of our LBM code on CUDA
  • 140X speedup
  • Scales up well with problem size
