Russ Miller - PowerPoint PPT Presentation

Learn more at: https://cse.buffalo.edu
Transcript and Presenter's Notes

Title: Russ Miller


1
Enabling Collaborative Science Through Grid
Technology
  • Russ Miller
  • Director, Center for Computational Research
  • UB Distinguished Professor, Computer Science &
    Engineering
  • Senior Research Scientist, Hauptman-Woodward
    Medical Inst

Top 10 Worldwide Supercomputing Center -
www.gapcon.com
2
Outline
  • Bioinformatics in Buffalo
  • Supercomputing in Buffalo
  • Grid Computing
  • Grid Computing in Buffalo
  • Shake-and-Bake Computational Crystallography
  • ECCE Computational Chemistry

3
Biomedical Advances
  • PSA Test (screen for Prostate Cancer)
  • Avonex Interferon Treatment for Multiple
    Sclerosis
  • Artificial Blood
  • Nicorette Gum
  • Fetal Viability Test
  • Implantable Pacemaker
  • Edible Vaccine for Hepatitis C
  • Timed-Release Insulin Therapy
  • Anti-Arrhythmia Therapy
  • Tarantula venom
  • Direct Methods Structure Determination
  • Listed on Top Ten Algorithms of the 20th
    Century
  • Vancomycin
  • Gramicidin A
  • High Throughput Crystallization Method Patented
  • NIH National Genomics Center Northeast
    Consortium
  • Howard Hughes Medical Institute Center for
    Genomics & Proteomics

4
Bioinformatics in Buffalo: A $290M Initiative
  • UB Center for Advanced Bioengineering &
    Biomedical Technologies
  • $1M/yr NYS
  • Med Tech for Product Dev. & Commercialization
  • Center for Disease Modeling & Therapy Discovery
  • UB, HWI, RPCI, Kaleida
  • $15.3M NYS
  • Software, device development, and drug therapies
  • Buffalo Center of Excellence in Bioinformatics
  • UB, HWI, RPCI
  • $61M NYS
  • $10M Federal Government
  • $151M Corporate Funding
  • UB Faculty Funding $64M

5
Partnerships
  • Lead Partners: SUNY-Buffalo, Hauptman-Woodward
    Medical Research Institute, Roswell Park Cancer
    Institute
  • Corporate Partners: Amersham Pharmacia, AT&T,
    Beckman Coulter, BioPharma Ireland, Bristol Myers
    Squibb, Confederation of Indian Industries, Dell,
    General Electric, Human Genome Sciences, HP,
    Immco, InforMax, Invitrogen, Pfizer
    Pharmaceutical, Q-Chem, Sloan Foundation, SGI,
    Stryker, Sun, 3M, Veridian, Wyeth Lederle,
    Zeptometrix

6
Experimental Facilities I
  • Molecular Targeting Laboratory
  • Screen 30-50K compounds every 3 months
  • Apply compound to cell (different genes treated
    with fluorescent markers)
  • Rapidly identify effect on specific gene
    expression pathways
  • Gene Expression Laboratory
  • High-throughput microarray and gene chip
  • Discover new genes, their functions, and pathways
  • Proteomics and Molecular Kinetics Lab
  • Identify molecular targets found in Gene
    Expression Lab
  • Disease Modeling Laboratory
  • In vivo testing (flies, mice, baboons)
  • Gene targeting and genetic mapping facilities

7
Experimental Facilities II
  • Bioengineering Support Laboratory
  • Capabilities in photonics and nano-tech research
  • E.g., handheld devices to test for diseases
  • Protein Scale-Up and Purification
  • High-Throughput Robotic Combinatorial Chemistry/
    Parallel Synthetic Chemistry Capabilities
  • Drugs created robotically, then tested for
    interaction with target protein
  • Rapid identification of a large number of
    potential drugs
  • Public Health and Molecular Pathology
  • Tissue repositories, disease gene maps, medical
    informatics
  • High-Throughput Search Process for Structural
    Biology
  • Tests 1536 chemical cocktails to determine
    effective parameters for crystallization

8
SUNY-B 2002-03 Snapshot
  • Personnel
  • Hired Jeff Skolnick as Director (7/02)
  • Brought 13 additional staff to Buffalo
  • Authorized to hire 10 additional research groups
  • Hired Norma Nowak as co-Director (4/03)
  • Authorized to hire 10 additional research groups
  • Additional members TBD
  • External Funding (0)
  • Applications submitted
  • Deliverables
  • Six (6) scientific papers
  • Resources
  • Building
  • 6TF → 10TF Compute Cluster

9
Center for Computational Research
  • High-Performance Computing and High-End
    Visualization
  • 110 Research Groups in 27 Depts
  • 25 Companies and Institutions
  • Sample Areas
  • Urban Visualization and Simulation
  • Computational Chemistry
  • Ground Water Modeling
  • Geophysical Mass Flows
  • Networked Multimedia
  • Medical Imaging
  • Training
  • Workshops & Courses
  • Degree Programs

10
CCR 1999-2003 Snapshot
  • Personnel
  • 18 State-Supported Staff
  • 2 Grant-Supported Staff
  • External Funding
  • $111M External Funding
  • $13.5M as lead
  • $97.5M in support
  • $41.8M Vendor Donations
  • Deliverables
  • 350 Publications
  • Software, Media, Algorithms, Consulting,
    Training, CPU Cycles, etc.

Raptor Image
11
Computational Resources (9TF)
  • SGI Origin3800
  • 64 Processors (400 MHz)
  • 32 GB RAM 400 GB Disk
  • IBM RS/6000 SP
  • 78 Processors
  • 26 GB RAM 640 GB Disk
  • Sun Microsystems Cluster
  • 48 Sun Ultra 5s (333MHz)
  • 16 Dual Sunblades (750MHz)
  • 30 GB RAM, Myrinet
  • SGI Intel Linux Cluster
  • 150 PIII Processors (1 GHz)
  • 75 GB RAM, 2.5 TB Disk Storage
  • Apex Bioinformatics System
  • Sun V880 (3), 6800, 280R (2), PIIIs
  • Sun 3960 7 TB Disk Storage
  • HP/Compaq SAN
  • 25 TB Disk 250 TB Tape
  • Dell Linux Cluster - #22 on Top500
  • 600 P4 Processors (2.4 GHz)
  • 600 GB RAM 40 TB Disk Myrinet
  • Dell Linux Cluster - #187 on Top500
  • 4036 Processors (PIII 1.2 GHz)
  • 2TB RAM 160TB Disk 16TB SN

UBCOEB System
12
Sample Computational Research
  • Computational Chemistry (King, Kofke, Coppens,
    Furlani, Tilson, Lund, Swihart, Ruckenstein,
    Garvey)
  • Algorithm development & simulations
  • Groundwater Flow Modeling (Rabideau, Jankovic,
    Becker, Flewelling)
  • Predict contaminant flow in groundwater and
    possible migration into streams and lakes
  • Geophysical Mass Flows (Patra, Sheridan, Pitman,
    Bursik, Jones, Winer)
  • Study of geophysical mass flows for risk
    assessment of lava flows and mudslides
  • Bioinformatics (Zhou, Miller, Hu, Szyperski; NIH
    Consortium, HWI)
  • Protein Folding: computer simulations to
    understand the 3D structure of proteins
  • Structural Biology & Pharmacology
  • Computational Fluid Dynamics (Madnia, DesJardin,
    Lordi, Taulbee)
  • Modeling turbulent flows and combustion to
    improve design of chemical reactors, turbine
    engines, and airplanes
  • Physics (Jones, Sen)
  • Many-body phenomena in condensed matter physics
  • Chemical Reactions (Mountziaris)
  • Molecular Simulation (Errington)

13
Visualization Resources
  • Fakespace ImmersaDesk R2
  • Portable 3D Device
  • Tiled-Display Wall
  • 20 NEC projectors, 15.7M pixels
  • Screen is 11' × 7'
  • Dell PCs with Myrinet2000
  • Access Grid Node
  • Group-to-Group Communication
  • Commodity components
  • SGI Reality Center 3300W
  • Dual Barcos on 8' × 4' screen
  • VREX VR-4200 Stereo Imaging Projector
  • Portable projector works with PC

14
Sample Visualization Areas
  • Computational Science (Patra, Sheridan, Becker,
    Flewelling, Baker, Miller, Pitman)
  • Simulation and modeling
  • Urban Visualization and Simulation (CCR)
  • Public projects involving urban planning
  • Medical Imaging (Hoffmann, Bakshi, Glick,
    Miletich, Baker)
  • Tools for pre-operative planning & predictive
    disease analysis
  • Geographic Information Systems (CCR, Bisantz,
    Llinas, Kesavadas, Green)
  • Parallel data sourcing software
  • Historical Reenactments (Paley, Kesavadas, More)
  • Faithful representations of previously existing
    scenarios
  • Multimedia Presentations (Anstey, Pape)
  • Networked, interactive, 3D activities

15
3D Medical Visualization App
  • Collaboration with Children's Hospital
  • Leading miniature access surgery center
  • Application reads data output from a CT Scan
  • Visualize multiple surfaces and volumes
  • Export images, movies or CAD representation of
    model

16
Multiple Sclerosis Project
  • Collaboration with Buffalo Neuroimaging Analysis
    Center (BNAC)
  • Developers of Avonex, drug of choice for
    treatment of MS
  • MS Project examines patients and compares scans
    to healthy volunteers

17
Multiple Sclerosis Project
  • Compare caudate nuclei between MS patients and
    healthy controls
  • Looking for size as well as structure changes
  • Localized deformities
  • Spacing between halves
  • Able to see correlation between disease
    progression and physical structure changes

18
Grid Computing 2003
DISCOM SinRG APGrid IPG
19
Grid Computing Overview
Thanks to Mark Ellisman
[Figure: grid linking Imaging Instruments, Data Acquisition, Analysis, Advanced Visualization, Computational Resources, and Large-Scale Databases]
  • Coordinate Computing Resources, People,
    Instruments in Dynamic Geographically-Distributed
    Multi-Institutional Environment
  • Treat Computing Resources like Commodities
  • Compute cycles, data storage, instruments
  • Human communication environments
  • No Central Control; No Trust

20
Computational Grids & Electric Power Grids
  • Similarities/Goals of CG and EPG
  • Ubiquitous
  • Consumer is comfortable with lack of knowledge of
    details
  • Differences Between CG and EPG
  • Wider spectrum of performance services
  • Access governed by more complicated issues
  • Security
  • Performance
  • Socio-political factors

21
Growth of Data and Load vs. Moore's Law
Courtesy of Rick Stevens
[Chart, 1990-2010: Genome Data and Computational Load (ESTs, Human Genome, Combinatorial Chemistry, Pharmacogenomics, Metabolic Pathways) plotted against Moore's Law]
22
Biomedical Data: High Complexity and Large Scale
Courtesy of Rick Stevens
[Figure: biomedical data types, with scales ranging from hundreds of thousands to billions of items]
  • Protein-Protein Interactions: metabolism, pathways, receptor-ligand, 4º structure
  • Physiology: Cellular biology, Biochemistry, Neurobiology, Endocrinology, etc.
  • Polymorphism and Variants: genetic variants, individual patients, epidemiology
  • Proteins: sequence, 2º structure, 3º structure (e.g., MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT...)
  • ESTs: Expression patterns, Large-scale screens
  • Genetics and Maps: Linkage, Cytogenetic, Clone-based
  • DNA sequences: alignments (e.g., ...atcgaattccaggcgtcacattctcaattcca...)
23
Computational Motivation
Courtesy of Rick Stevens
24
A Short History of the Grid
  • Grand Challenge Problems (1980s)
  • NSF and DOE initiatives
  • Science is a team sport
  • Initiate multi-resource projects involving
    computation, instruments, visualization, data
  • Evolution of Related Communities
  • Parallel computation
  • Address resource limitations
  • Networking
  • Gigabit testbed program
  • Investigate potential testbed network
    architectures
  • Explore usefulness for end-users

CASA Gigabit Testbed (1990s)
25
The Globus Project (Ian Foster and Carl Kesselman)
The Grid as a Layered Set of Services
  • Globus model focuses on providing key Grid
    services
  • Resource access and management
  • Grid FTP
  • Information Service
  • Security services
  • Authentication
  • Authorization
  • Policy
  • Delegation
  • Network reservation, monitoring, control

26
Extensible TeraGrid Facility (ETF)
Figure courtesy of Rob Pennington, NCSA
[Diagram: five sites joined by the Extensible Backplane Network through LA and Chicago hubs, with 30-40 Gb/s links]
  • ANL (Visualization): 1.25 TF IA-64, 96 Viz nodes, 20 TB Storage
  • Caltech (Data collection analysis): 0.4 TF IA-64, IA32 Datawulf, 80 TB Storage
  • NCSA (Compute Intensive): 10 TF IA-64, 128 large memory nodes, 230 TB Disk Storage, GPFS and data mining
  • SDSC (Data Intensive): 4 TF IA-64, DB2 and Oracle Servers, 500 TB Disk Storage, 6 PB Tape Storage, 1.1 TF Power4
  • PSC (Compute Intensive): 6 TF EV68, 71 TB Storage, 0.3 TF EV7 shared-memory, 150 TB Storage Server
27
Enabling the Grid
  • Internet is Infrastructure
  • Increased network bandwidth and advanced services
  • Advances in Storage Capacity
  • Terabyte costs less than $5,000
  • Internet-Aware Instruments
  • Increased Availability of Compute Resources
  • Clusters, supercomputers, storage, visualization
    devices
  • Advances in Application Concepts
  • Computational science simulation and modeling
  • Collaborative environments: large and varied
    teams
  • Grids Today
  • Moving towards production; focus on middleware

28
X-Ray Crystallography
  • Objective: Provide a 3-D mapping of the atoms in
    a crystal.
  • Procedure
  • Isolate a single crystal.
  • Perform the X-Ray diffraction experiment.
  • Determine molecular structure that agrees with
    diffraction data.

29
X-Ray Data & Corresponding Molecular Structure
Underlying atomic arrangement is related to the
reflections by a 3-D Fourier transform.
Reciprocal or Phase Space
Real Space
  • Phases lost during the crystallographic
    experiment.
  • Phase Problem: Determine phases of the
    reflections.

30
Shake-and-Bake Method: Dual-Space Refinement
[Flowchart: Trial Structures → FFT → Trial Phases → Phase Refinement by Tangent Formula or Parameter Shift ("Shake") → FFT-1 → Density Modification by Peak Picking (LDE) ("Bake") → repeat the cycle, ending in Solutions]
31
Phasing and Structure Size
32
Ph8755 SnB Histogram
Atoms: 74; Phases: 740; Space Group: P1; Triples: 7,400
Trials: 100
Cycles: 40
Rmin range: 0.243 - 0.429
33
Grid-Based SnB: Objectives
  • Install Grid-Enabled Version of SnB
  • Job Submission and Monitoring over Internet
  • SnB Output Stored in Database
  • SnB Output Mined through Internet-Based
    Integrated Querying Tool
  • Serve as Template for Chem-Grid & Bio-Grid
  • Experience with Globus and Related Tools

34
Proof of Concept
  • Combine CCR's Heterogeneous Compute Platforms
    into a Grid
  • Client/Server Configurations
  • Rapid Prototype 4Q02 (not Globus)
  • Develop a user interface to monitor system
  • Dynamic HTML Grid Interface
  • Key Features for Proof of Concept
  • Load Balancing
  • Fault Tolerance
  • Result and Grid Statistics

35
Client/Server Configuration
[Diagram: Grid Server connected to Type 1, Type 2, and Type 3 platforms]
36
Internet Grid Console
  • Dynamic HTML Grid Status
  • Grid Server Information
  • Date/Completion Time
  • Parallel Run Time/Serial Run Time/Speedup
  • Trial Result Rate (Trial/Minute)
  • Shows Configured Platform Information
    Dynamically
  • Platform: Type/Name/Picture
  • Status: Idle/Working/Offline
  • Resources: Nodes/Total Processes/Available
    Processes/Running Processes
  • Shows Job Status Dynamically
  • Trials: Total Number/Amount Processed
  • Platform Server State: Block Queue/Float/Race
  • Result: Figure of Merit Histogram
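
As a rough illustration of how the console's run-time figures relate to one another, the sketch below derives speedup and trial result rate from a serial and a parallel run time. The class name `RunStats`, its field names, and the sample numbers are assumptions made for this example; they are not taken from the SnB grid software.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical container for the timing figures a grid console might show."""
    serial_seconds: float    # time the trials would take if run one after another
    parallel_seconds: float  # wall-clock time of the grid run
    trials_done: int         # number of trials completed so far

    @property
    def speedup(self) -> float:
        # Speedup is the usual ratio of serial run time to parallel run time.
        return self.serial_seconds / self.parallel_seconds

    @property
    def trials_per_minute(self) -> float:
        # Trial result rate, expressed in trials per minute of wall-clock time.
        return self.trials_done / (self.parallel_seconds / 60.0)


# Made-up sample run: one hour of serial work finished in five minutes.
stats = RunStats(serial_seconds=3600.0, parallel_seconds=300.0, trials_done=100)
print(stats.speedup, stats.trials_per_minute)
```

Keeping both raw times and deriving the ratios on demand means the console can refresh the figures as more trials complete.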

37
Grid Server Console (Vancomycin)
38
Status Report
  • Grid Portal
  • Access control lists, security groups
  • User attributes, history, proxies
  • Managed through MySQL database
  • Distributed data grid
  • Globus
  • Vers 2.2.4 installed and in production
  • Metacomputing Directory Services (MDS) stored in
    MySQL
  • Eliminates need for LDAP
  • Condor and Condor-G
  • Used for resource management and grid job
    submissions

39
Red queue color indicates that there are
currently running or queued jobs.
40
ECCE Grid at CCR
  • Import Scientific Information
  • Application independent input
  • ECCE automatically formats for target application
    (Gaussian98, NWChem)
  • Computing at CCR
  • 881 available CPUs (>2.5 TFlops)
  • (Xeon, P3, Power3, R12K)
  • Uniform access to all platforms via ECCE job
    launcher
  • Chemical Analysis
  • Full complement of visual tools for understanding
    data/publication quality graphics
  • Computational Chemistry
  • Relativistic effects/Heavy elements
  • Algorithm development
  • Theoretical physical chemistry
  • Structural/Systems Biology
  • Protein structure
  • Enzyme catalysis
  • Chemical Engineering
  • Condensed phases/Mixed phase predictions
  • Catalysis
  • Geology, Pharmacology, Medical School

41
(No Transcript)
42
BioGrids
Genomics is powering the new biology, but
Computing is in the driver's seat.
BioGrids provide scalable computing so that
biologists can focus on biology.
  • EUROGRID BioGRID
  • Asia Pacific BioGRID
  • NC BioGrid
  • Bioinformatics Research Network
  • Osaka University Biogrid
  • Indiana University BioArchive BioGrid

43
Contact Information
  • miller_at_buffalo.edu
  • www.ccr.buffalo.edu

44
Acknowledgments
  • Mark Green
  • Steve Gallo
  • Jason Rappleye
  • Jeff Tilson
  • Martins Innus
  • Betty Capaldi
  • Bruce Holm
  • Janet Penksa
  • George DeTitta
  • Herb Hauptman
  • Charles Weeks
  • Steve Potter
  • Rohit Bakshi
  • Philip Glick

45
Protein Folding
  • Ability of proteins to perform biological
    function is attributed to their 3-D structure.
  • Protein folding problem refers to the challenge
    of predicting 3-D structure from amino-acid
    sequence.
  • Solving the protein folding problem will impact
    drug design.

46
Protein Dynamics
  • Dynamics of Hemoglobin (Example)
  • 50 Days of Processing on 16 Processors (800 CPU
    Days)
  • Key
  • White Heme Groups
  • Red Phe97
  • Red Oxygen (in the subunit at bottom)
  • Green His 69 and 101
  • Blue Tyr 72
  • Cyan (Ball) Water Molecules
  • Yellow Helix E/F
  • Interest
  • Flip of the Phe97 ring at top
  • Water movement around Phe97
  • Heme-heme relative movement

47
Academic Programs
  • Bachelor's & Master's Program in Bioinformatics
  • Related Disciplines
  • Chemical Biology
  • Computational Chemistry
  • Environmental Analysis (Sloan Support)
  • Medical Informatics (Sloan Support)
  • Advanced Degrees under Development
  • Pharmacometrics, Biophotonics
  • UB-HWI Department of Structural Biology
  • Complementary Degrees
  • Canisius College, Niagara University

48
Support (2001-2002)
  • New York State $61M
  • Federal $3.1M
  • Competitive Grants $53M
  • Proteomics $1.5M
  • Disease Pathogens and Physiology $27M
  • Drug Discovery $6M
  • Genomic and Proteomic Infra. $1.8M
  • Genomics $4.7M
  • Information Technology $12.3M
  • Corporate $135M
  • Foundation $3.5M

49
Confocal Microscopy
  • 3D Reconstruction of an Oral Epithelial Cell
  • Translucent White Surface Represents the Cell
    Membrane
  • Reddish Surface Represents Groups of Bacteria

50
Bioinformatics
  • The creation and development of advanced
    information and computational technologies to
    solve problems in biology.
  • The use of advanced computational resources and
    techniques to analyze data generated by the Human
    Genome Project to improve medical treatment.
  • Precise sequences of 30K human genes have been
    mapped
  • Critical to elucidate the function of each gene.
  • Leads to greater understanding of human
    development.
  • Potential to treat many diseases, including AIDS,
    cancer, MS, and Alzheimer's, and provide
    personalized treatment.
  • From Human Genome
  • Locate genes (tens of thousands in human body)
  • Determine what protein a gene regulates (millions
    of proteins in body)
  • Determine structure
  • Determine protein function
  • Devise drugs to block or enhance protein function

51
Children's Hospital CT
  • 3D Reconstruction of CT Dataset
  • Created with the Visualization Toolkit (VTK) on a
    Linux Workstation
  • 3D Isosurface Clearly Shows Structure that is
    Nearly Impossible to Determine from 2D Slices

52
Miniature Access Surgery
53
Molecular Structure Determination
  • SnB Software by UB/HWI
  • Top Algorithms of the Century
  • Critical to Rational Drug Design
  • Important Link in Structural Biology
  • Current Effort
  • Grid
  • Collaboratory
  • Intelligent Learning

54
Animal Models and Preclinical Toxicology
55
Antibiotics & Supercomputers
  • Vancomycin solved with SnB (UB/HWI)
  • SnB: Top Algorithms of the Century
  • Antibiotic of Last Resort
  • Original molecular structure required 5 months
  • (Re)solved in a single day on CCR's
    supercomputers
  • Current Efforts: Grid, Collaboratory, Intelligent
    Learning

Result: New, better drugs in shorter time
56
Photograph of Crystal
57
Useful Relationships for Multiple Trial Phasing
Tangent Formula
Parameter Shift Optimization
58
Structure of SnB
SnB
Process Trials
Histogram
Visualization
59
Vancomycin Crystal Structure Views (courtesy of
P. Loll & P. Axelsen)
60
Computing Platforms
  • Workstations
  • SGI, Sun, DEC/Alpha
  • Linux
  • Parallel Computers
  • Cray T3D/E, TMC CM-5, IBM SP2
  • HP-Convex Exemplar
  • SGI Origin2/3000 Onyx 2/3
  • IBM SP heterogeneous
  • Linux Clusters
  • Sun Cluster
  • Condor Flock
  • Computational Grid

61
Molecular Structure Determination
  • SnB Software by UB/HWI
  • Top Algorithms of the Century
  • Critical to Rational Drug Design
  • Important Link in Structural Biology
  • Current Effort
  • Grid
  • Collaboratory
  • Intelligent Learning

62
Vancomycin Crystal (courtesy of P. Loll)
63
The Diffraction Pattern
  • Experiment yields
  • reflections
  • associated intensities
  • Phase angles are lost in experiment.

64
The Phase Problem
  • Experiment yields
  • reflections
  • associated intensities
  • Phase angles are lost in experiment.
  • Underlying atomic arrangement is related to the
    reflections by a 3-D Fourier transform.
  • Phase Problem: determine the set of phases
    corresponding to the reflections.
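
To make the phase problem concrete, here is a minimal one-dimensional sketch using only the Python standard library. It is a toy illustration, not part of SnB: the eight-point "density", the function names, and the choice of a plain discrete Fourier transform are all assumptions for this example. It shows that the density is recovered when both magnitudes and phases are known, and is not recovered when the phases are discarded.

```python
import cmath


def dft(values):
    """Plain discrete Fourier transform (real space to reciprocal space)."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
            for h in range(n)]


def inverse_dft(coeffs):
    """Inverse transform; returns the real part of each real-space value."""
    n = len(coeffs)
    return [sum(coeffs[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)).real / n
            for x in range(n)]


# Toy "crystal": two atoms of different weight on a 1-D grid (made-up values).
density = [0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

structure_factors = dft(density)
magnitudes = [abs(f) for f in structure_factors]      # obtainable from measured intensities
phases = [cmath.phase(f) for f in structure_factors]  # lost in the experiment

# Magnitudes combined with the true phases give back the original density.
recovered = inverse_dft([cmath.rect(m, p) for m, p in zip(magnitudes, phases)])

# Magnitudes alone (all phases set to zero) give a different map.
phaseless = inverse_dft([cmath.rect(m, 0.0) for m in magnitudes])

print([round(v, 3) for v in recovered])
print([round(v, 3) for v in phaseless])
```

The second map has the right magnitudes in reciprocal space but places its largest peak at the origin, which is why a method for determining phases is needed before a structure can be read off.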

65
Extensible TeraGrid Facility (ETF)
Figure courtesy of NSF
[Diagram: ANL (Visualization), Caltech (Data collection analysis), NCSA (Compute-Intensive), SDSC (Data-Intensive), and PSC (Heterogeneity) joined by the Extensible Backplane Network through LA and Chicago hubs, with a 30 Gb/s extension to PSC]
Resources shown: 48 Visualization nodes, 1.25 TF IA-64, 20 TB Storage; 0.4 TF IA-64, IA32 Datawulf, 80 TB Storage; 0.4 TF EV7 shmem, 50 TB Storage, Storage Server; 2.1 TF IA-64, 128 lg-mem nodes, 110 TB Storage; 280 TB Storage, DB2 Server, 1.1 TF Power4; 6 TF EV68, 70 TB Storage; 8 TF IA-64, 300 TB Storage; 5 TF IA-64, 300 TB Storage
66
TeraGrid: 13.6 TF, 6.8 TB memory, 79 TB internal
disk, 576 TB network disk
[Diagram: four sites linked through Chicago and LA DTF Core Switch/Routers (Cisco 65xx Catalyst Switch with 256 Gb/s Crossbar, Juniper M160), with external connections including vBNS, Abilene, Calren, ESnet, HSCC, MREN, Starlight, and NTON over OC-3, OC-12, OC-48, and GbE links]
  • ANL: 1 TF, .25 TB Memory, 25 TB disk
  • Caltech: 0.5 TF, .4 TB Memory, 86 TB disk
  • NCSA: 6+2 TF, 4 TB Memory, 240 TB disk
  • SDSC: 4.1 TF, 2 TB Memory, 225 TB SAN
Existing systems shown: 574p IA-32 Chiba City, 256p HP X-Class, 128p Origin, 128p HP V2500, 92p IA-32, HR Display & VR Facilities, 1024p IA-32, 320p IA-64, 1176p IBM SP Blue Horizon (1.7 TFLOPs), 15xxp Origin, 2 x Sun E10K, Sun Server, HPSS (300 TB), UniTree
67
Grids Form the Basis of a National Information
Infrastructure
August 9, 2001: NSF Awarded $53,000,000 to
SDSC/NPACI and NCSA/Alliance for TeraGrid
  • TeraGrid will provide in aggregate
  • 13.6 trillion calculations per second
  • Over 600 trillion bytes of immediately accessible
    data
  • 40 gigabit per second network speed
  • Provide a new paradigm for data-oriented
    computing
  • Critical for disaster response, genomics,
    environmental modeling, etc.

68
  • PIs: Berman, Foster, Messina, Reed, Stevens
  • Sites: SDSC/UCSD, Caltech, NCSA/UIUC, ANL
  • Partners: IBM, Intel, Qwest, Sun, Myricom,
    Oracle and others
  • Cool Things about the TeraGrid
  • Big data, simulation, modeling
  • Grid computing, Globus, portals, middleware
  • Clusters, Linux
  • Usability, impact, production facility
  • TeraGrid Software Environment
  • Linux
  • Basic and Core Globus Services
  • Advanced Services
  • Data Services
  • Over .6 Petabytes of on-line disk will provide
    ultimate environment for data-oriented
    computation
  • Linux environment provides more direct path from
    development on lab cluster to performance on
    high-end platform

69
Visualization Resources
  • Fakespace ImmersaDesk R2
  • Portable 3D Device
  • VREX VR-4200 Stereo Imaging Projector
  • Portable projector works with PC
  • Tiled-Display Wall
  • 20 NEC projectors Dell PCs Myrinet 15.7M
    pixels
  • Access Grid Node
  • Group-to-Group Communication
  • Commodity components
  • SGI Reality Center 3300W
  • Dual Barcos on 8' × 4' screen

70
Status of Grid Services
  • Core Grid Services have been Deployed in
    Large-Scale Testbeds
  • Availability of these Services is Enabling Tool
    & Application Development Projects
  • Major Challenges Remain
  • Advance reservation, policy, accounting
  • End-to-end application adaptation (events?)
  • Integration with commodity technology
  • Grid Forum: http://www.gridforum.org

71
Conventional Direct Methods
72
Ph8755 Trace of SnB Solution
Atoms 74
Space Group P1
SnB Cycles 40
73
Vancomycin
  • Interferes with formation of bacterial walls
  • Last line of defense against deadly
    streptococcal and staphylococcal bacteria strains
  • Vancomycin resistance exists (Michigan)
  • Can't just synthesize variants and test
  • Need structure-based approach to predict
  • Solution with SnB (Shake-and-Bake)
  • Pat Loll
  • George Sheldrick

74
End-to-End Factors
  • Cross Science Collaboration
  • Multiple Physical and Cultural Communities
  • Open and New Technologies
  • Broadly Accessible
  • Flexible and Extensible
  • Useful to all Scientists and Engineers
  • Contains a Broad Variety of Technologies

75
DTF/ETF Driver Applications
  • Genomics
  • National Virtual Observatory
  • National Ecological Observatory Network
  • National Earthquake Engineering Simulation
  • Neuroscience Imaging
  • Laser Interferometer Gravitational Wave
    Observatory

76
New Results Possible on ETF
  • Biomedical Informatics Research Network BIRN
  • Evolving reference set of brains
  • Essential data for developing therapies for
    neurological disorders (Multiple Sclerosis,
    Alzheimer's)
  • Pre-TeraGrid
  • One PET or MRI lab
  • Small patient base
  • 4 TB collection
  • Post-TeraGrid
  • Many collaborating labs
  • Larger population sample
  • 400 TB data collection
  • More brains, higher resolution
  • Multiple scale data integration and analysis

77
Client/Server Configurations
  • Three Main Types
  • Type 1: Standard Cluster Configuration
  • Represents most CCR platforms (same IP subnet)
  • Type 2: Firewall Cluster Configuration
  • Represents remote firewall-protected platforms
    (different IP subnets)
  • Type 3: Heterogeneous OS Cluster Configuration
  • Represents different OS architectures combined
    into one internally IP-addressed cluster platform

78
Type 1 Configuration
Grid Server
Type 1
  • Standard Cluster Configuration
  • Grid Server communicates with Relay Server that
    has a public IP address
  • Relay Server communicates with Platform Server
    that only has an internal IP address
  • Platform Server communicates with Node Servers
    that process Grid Server tasks
  • All Nodes are of the same OS architecture (Linux,
    AIX, Solaris, etc.), but processor class may be
    different (PIII-Xeon, Pwr2-Pwr3, Ultra3-Ultra2i,
    etc.)

79
Type 2 Configuration
Grid Server
Type 2
  • Firewall Cluster Configuration
  • Grid Server communicates with Platform Server
    that only has an internal IP address through SSH
    tunnels on a firewall
  • Platform Server communicates with Node Servers
    that process Grid Server tasks
  • All Nodes are of the same OS architecture (Linux,
    AIX, Solaris, etc.), but processor class may be
    different (PIII-Xeon, Pwr2-Pwr3, Ultra3-Ultra2i,
    etc.)

80
Type 3 Configuration
  • Heterogeneous OS Cluster Configuration
  • Grid Server communicates with Relay Server that
    has a public IP address
  • Relay Server communicates with Platform Server
    that only has an internal IP address
  • Platform Server communicates with Node Servers
    that process Grid Server tasks
  • Nodes can have different OS architecture
    (Linux-Alpha, etc.), and processor class may also
    be different (PIII-Xeon, Pwr2-Pwr3, etc.)

Grid Server
Type 3
81
Grid Server Console (Vancomycin)
82
Load Balancing Stages
  • Three Stages of Load Balance Implemented
  • Block Queues
  • 85% of the Shake-and-Bake trials are distributed
    based on the Cluster Platform speed
  • A block of Shake-and-Bake trials is reserved for
    each Platform Server in the Grid
  • Float
  • 15% of the Shake-and-Bake trials are reserved
    for dynamic load balancing
  • Wrapup
  • When the Float queue trials have been completed,
    unfinished trials are distributed to idle Node
    Servers
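
The split between Block queues and the Float queue can be sketched as follows. This is a minimal illustration, not the SnB grid implementation: the function name `split_trials`, the platform names, and the decision to push rounding leftovers into the Float queue are assumptions made for the example.

```python
def split_trials(total_trials, load_factors, block_percent=85):
    """Divide trials into per-platform Block queues plus one shared Float queue.

    load_factors maps a platform name to its share of the block work; the
    shares are expected to sum to 1. Any trials left over from rounding down
    are added to the Float queue (an assumption made for this sketch).
    """
    block_total = total_trials * block_percent // 100
    blocks = {name: int(block_total * share) for name, share in load_factors.items()}
    float_queue = total_trials - sum(blocks.values())
    return blocks, float_queue


# Hypothetical grid of three platforms with made-up load factors.
blocks, float_queue = split_trials(
    1000, {"cluster_a": 0.5, "cluster_b": 0.25, "cluster_c": 0.25}
)
print(blocks, float_queue)
```

Reserving part of the work for a shared queue lets faster platforms pick up extra trials once their own block is finished.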

83
Grid Server Console (Vancomycin)
84
Load Balancing: Block Queue
  • Block Queue Determination
  • A Platform Server's speed is determined by timing
    one trial on the cluster platform's node servers
  • Grid Server starts the timer when it requests the
    trial to be done and stops the timer when the
    trial solution has been received
  • A Harmonic Mean is calculated from all of the
    respective node servers' solution times, which
    determines the platform speed
  • The number of processes for each cluster
    platform is then used to determine an appropriate
    platform load factor
  • Platform load factor processes ((
    processes (avg platform speed platform speed)
    / avg platform speed ) / platform speed adjusted
    of processes)
  • The load factors range from 0 to 1 and the sum of
    all load factors is 1
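
A hedged sketch of the idea: time one trial per node server, take the harmonic mean per platform, and turn the results into load factors that sum to 1. The weighting used here (process count divided by the harmonic-mean trial time) is a simplified assumption chosen for illustration and is not necessarily the formula on the slide; the function names and sample timings are also hypothetical.

```python
from statistics import harmonic_mean


def platform_speed(node_trial_seconds):
    """Harmonic mean of the timed-trial durations reported by a platform's node servers."""
    return harmonic_mean(node_trial_seconds)


def load_factors(platforms):
    """Map each platform to a load factor between 0 and 1; the factors sum to 1.

    platforms: name -> (process_count, [seconds taken by one trial on each node server])
    Simplified weighting (an assumption): more processes and shorter trial
    times both earn a larger share of the Block queue.
    """
    weights = {
        name: process_count / platform_speed(times)
        for name, (process_count, times) in platforms.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Two made-up platforms with equal process counts; "fast" finishes a trial 3x sooner.
factors = load_factors({
    "fast": (10, [2.0, 2.0]),
    "slow": (10, [6.0, 6.0]),
})
print(factors)
```

Normalizing by the total weight is what guarantees the factors add up to 1, whatever weighting is used inside.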

85
Grid Server Console (Vancomycin)
86
Load Balancing: Float Queue
  • Float Queue
  • 15% of the total trials
  • When a platform has no more trials to distribute
    to its node servers, the Block Queue is set at
    100% complete
  • This platform is then at Float status and will
    receive the trial numbers to process directly
    from the Grid Server one at a time as node
    servers become idle
  • All other platforms that reach this level will
    also receive their trial numbers directly from
    the Grid Server until the Float queue has been
    completed

87
Grid Server Console (Vancomycin)
88
Load Balancing: Wrapup
  • Wrapup
  • Distributing unfinished trials to idle node
    servers
  • When a platform has completed its Block queue
    and the Float queue has also been completed the
    Grid Server determines which trial results have
    not been received and distributes them again one
    at a time
  • There are several possibilities for why trials
    did not complete
  • A much slower platform has not yet finished its
    Block queue trials and is still working on them
  • The trial result was lost during transmission
  • The platform that processed the trial has gone
    offline or lost network connectivity without
    transmitting the trial result
  • The Grid Server will continue to distribute all
    unfinished trials to idle node servers as long as
    a valid result has not been received
  • The first trial result received by the Grid
    Server is accepted and any subsequent trial
    results received by the Grid Server are flagged
    duplicate and ignored
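
The wrap-up bookkeeping described above can be sketched with a small class. This is an illustrative model only: the class name `GridServer`, its methods, and the rule of always re-issuing the lowest-numbered unfinished trial are assumptions for this example, not details of the actual Grid Server.

```python
class GridServer:
    """Toy model of result tracking during the wrap-up stage."""

    def __init__(self, trial_ids):
        self.pending = set(trial_ids)  # trials with no valid result yet
        self.results = {}              # trial id -> first result received
        self.duplicates = 0            # count of later results that were ignored

    def receive(self, trial_id, result):
        """Accept the first result for a trial; flag any later one as a duplicate."""
        if trial_id in self.results:
            self.duplicates += 1
            return False
        self.results[trial_id] = result
        self.pending.discard(trial_id)
        return True

    def next_unfinished(self):
        """Pick an unfinished trial for an idle node server, or None when all are done."""
        return min(self.pending) if self.pending else None


# Made-up run: five trials, and the result of trial 4 never arrives the first time.
server = GridServer(range(1, 6))
for trial in (1, 2, 3, 5):
    server.receive(trial, f"result-{trial}")

retry = server.next_unfinished()                  # re-issue the missing trial
accepted_first = server.receive(retry, "result-4")
accepted_late = server.receive(retry, "result-4-late")  # the original copy shows up afterwards
print(retry, accepted_first, accepted_late, server.duplicates)
```

Because a trial stays pending until a valid result arrives, the same trial may be handed to several idle node servers, and the duplicate check keeps only the first answer.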

89
Grid Server Console (Vancomycin)
90
Load Balancing: Fault Tolerance
  • Fault Tolerance
  • A platform has gone offline or lost network
    connectivity
  • The grid-enabled SnB implementation is extremely
    fault tolerant
  • The requested SnB trial results will be
    completed automatically even if all but one
    platform fail
  • One node server has the ability to
  • Complete its platform Block Queue
  • Process the Float Queue
  • Process all other failed platform Block Queues
  • Process failed trials from its own platform
    Block Queue
  • Automatically!

91
Grid Server Console (Vancomycin)
92
Grid Server Console (ILED)
93
Buffalo Center of Excellence in Bioinformatics
  • Act as a research, development, education, and
    economic resource for industries based on
    bioinformatics, including information technology,
    biotech, and pharmaceuticals.
  • Combine state-of-the-art computational facilities
    with high-throughput experimental facilities to
    enable the development of new medical treatments.
  • Develop and exploit new algorithms for data
    acquisition, storage, management, and
    transmission.

94
Life Sciences Complex (Buffalo-Niagara Medical
Campus)
Training interspersed throughout 3 buildings
HWI: $20M to replace old building (not shown)
  • UB: $52M CoE in Bioinformatics
  • Research and business partners
  • 225 employees and business associates
  • 150,000 sq ft: 50% labs, 50% computational
    facilities
  • RPCI: $60M Pharmacology/Genetics
  • 60 PIs and 200 support staff
  • 170,000 sq ft: 85% research labs, 15% support