1
Computer Infrastructures for Reliable,
Large-Scale Simulations
  • Michela Taufer
    Global Computing Lab, University of Texas at El Paso
2
What Infrastructures and Simulations?
  • Computer infrastructures
  • Dedicated high-performance systems: clusters and
    SMP machines on your campus
  • Grid computing: the NSF initiative TeraGrid
  • Desktop grid computing and volunteer computing:
    BOINC projects
  • Computer simulations
  • Different methods, e.g., Monte Carlo, Molecular
    Dynamics
  • Heterogeneous workflow, i.e., different phases
    (codes) to accomplish a complete simulation

Given a computer simulation, each infrastructure
has its strengths and weaknesses. Choosing the
proper infrastructure is vital for fast, reliable
simulation results.
3
Outline
  • TeraGrid initiative
  • Volunteer computing and BOINC
  • Two applications on volunteer computing
  • Protein structure prediction
  • Protein-ligand docking
  • Research challenges
  • Research opportunities

4
Grid Computing and TeraGrid
  • Short overview of the TeraGrid project

5
Volunteer Computing and BOINC Projects
  • Computing resources (e.g., desktops, notebooks)
    owned by volunteers and connected through the
    Internet
  • Normally used to address fundamental problems in
    science
  • BOINC (Berkeley Open Infrastructure for Network
    Computing) is a well-known representative of VC
  • The computing power of BOINC is currently about
    420 TeraFLOPS (based on credit granted across all
    projects)
  • The total free disk space on computers running
    SETI@home is 12 Petabytes

[Diagram: a BOINC project. Scientists and developers/administrators operate a
master server, which distributes work units over the Internet to workers
(volunteers' home PCs) and collects their results.]
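The diagram above is the standard master/worker pattern of volunteer computing. A minimal Python sketch of that pattern follows, in which the master issues each work unit redundantly and accepts a result only when a quorum of replies agree; the work-unit contents, the quorum of 2, and the simulated unreliable workers are hypothetical illustrations, not the BOINC API.

```python
# Minimal sketch (not the BOINC API) of the master/worker pattern in the
# diagram above: the master sends each work unit to several volunteer
# workers and accepts a result only when a quorum of replies agree.
import random
from collections import Counter

QUORUM = 2          # matching replies needed before a result is accepted
REPLICATION = 3     # copies of each work unit issued to different workers

def worker(work_unit):
    """Simulated volunteer PC: usually returns the right answer, sometimes junk."""
    if random.random() < 0.1:                 # unreliable or misconfigured host
        return None
    return sum(x * x for x in work_unit)      # stand-in for the real computation

def master(work_units):
    """Issue each work unit REPLICATION times and keep quorum-validated results."""
    accepted = {}
    for wu_id, wu in enumerate(work_units):
        replies = Counter(worker(wu) for _ in range(REPLICATION))
        result, votes = replies.most_common(1)[0]
        if result is not None and votes >= QUORUM:
            accepted[wu_id] = result          # validated by agreement
    return accepted

if __name__ == "__main__":
    units = [list(range(n, n + 5)) for n in range(10)]
    print(master(units))
```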
6
Predictor@home Project Goals
  • CASP: Critical Assessment of Techniques for
    Protein Structure Prediction
  • Biennial competition that aims to advance
    research in structure prediction methods
  • Between June 2004 and August 2004, 64 targets
    (sequences of amino acids whose protein structure
    was unknown) were ultimately solved
    experimentally for comparison with the
    participants' predictions
  • Our previous experiences in CASP4 / CASP5
  • Focus on development of algorithms for structure
    prediction
  • Our objective in CASP6
  • Improve predictions over previous methods by
    augmenting conformational sampling by orders of
    magnitude
  • Our approach
  • Deploy a structure prediction supercomputer
    based on the volunteer computing paradigm:
    Predictor@Home
  • Our final goal
  • Test the hypothesis that the significantly
    increased sampling affordable with Predictor@Home
    indeed improves the quality of structure prediction

7
Predictor@home Heterogeneous Workflow
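This slide illustrates the heterogeneous workflow mentioned earlier: different codes (phases) chained to accomplish one complete simulation. Below is a minimal sketch of that idea, assuming a hypothetical cheap sampling phase whose candidates feed a more expensive refinement phase; the functions, scores, and sequence are placeholders, not the actual Predictor@home programs.

```python
# Sketch of a heterogeneous workflow: two different codes (phases) chained to
# accomplish one complete simulation. Both phase functions are hypothetical
# placeholders, not the actual Predictor@home programs.
import random

def sampling_phase(sequence, n_conformations=1000):
    """Phase 1: cheap, massively parallel conformational sampling."""
    # Each candidate conformation gets a crude score; lower is better.
    return [(random.uniform(0.0, 100.0), f"conf_{i}") for i in range(n_conformations)]

def refinement_phase(candidates, keep=10):
    """Phase 2: a more expensive code refines only the most promising candidates."""
    best = sorted(candidates)[:keep]
    return [(score * 0.9, name) for score, name in best]   # pretend refinement lowers the score

def run_workflow(sequence):
    candidates = sampling_phase(sequence)   # output of phase 1 is the input of phase 2
    return refinement_phase(candidates)

if __name__ == "__main__":
    for score, name in run_workflow("MKTAYIAKQR")[:3]:
        print(f"{name}: {score:.2f}")
```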
8
Predictor@home Significant Results
  • Over three months, from June 2004 to August 2004
  • 6786 users joined the project, providing a total
    compute time of about 12 billion seconds (380
    years; see the quick check after this list)
  • P@H has identified four types of targets
  • Easy targets, based on good templates, do not
    benefit from extensive sampling: P@H and a
    dedicated cluster provide similar results over
    the same interval of time
  • Medium-difficulty targets, based on loose templates
    of unrelated proteins, benefit from high P@H
    sampling: P@H provides better results than a
    dedicated cluster over the same interval of time
  • Hard targets, without a template and up to 300
    amino acids long, still benefit from high P@H
    sampling, as for medium-difficulty targets
  • Very hard targets, without a template, longer than
    300 amino acids, and with multiple domains, show
    severe limitations of the model in capturing the
    multi-domain structure: P@H and a dedicated cluster
    provide poor results over the same interval of time
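A quick check of the compute-time figure quoted above, assuming a 365-day year:

```python
# Quick check: 12 billion CPU-seconds expressed in years (365-day year assumed).
total_seconds = 12e9
seconds_per_year = 365 * 24 * 3600       # 31,536,000 s
print(total_seconds / seconds_per_year)  # ~380.5, matching the "380 years" on the slide
```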

9
Predictor@home Prediction Samples
[Figure: experimental structures shown next to P@H predictions]
  • Comparative Modeling (easy): target t0277, 119
    residues, GDT 80.34, RMSD 1.88
  • Fold Recognition (medium): target t0274, 159
    residues, GDT 71.63, RMSD 3.40
  • New Fold (hard): target t0201, 94 residues,
    GDT 43.88, RMSD 5.80
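The quality numbers above are two standard metrics: RMSD (root-mean-square deviation between predicted and experimental coordinates) and GDT_TS (average percentage of residues within 1, 2, 4, and 8 Å of the experiment). A minimal sketch of both, assuming predicted and experimental C-alpha coordinates that are already superimposed; the random coordinates below are made-up test data, not CASP targets.

```python
# Minimal sketch of the two metrics quoted above, assuming the predicted and
# experimental C-alpha coordinates are already superimposed.
# RMSD: root-mean-square deviation over atoms (in Angstrom).
# GDT_TS: average percentage of residues within 1, 2, 4, and 8 A of the experiment.
import numpy as np

def rmsd(pred, ref):
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))

def gdt_ts(pred, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    dist = np.linalg.norm(pred - ref, axis=1)
    return 100.0 * float(np.mean([np.mean(dist <= c) for c in cutoffs]))

if __name__ == "__main__":
    ref = np.random.rand(119, 3) * 10.0                    # made-up 119-residue target
    pred = ref + np.random.normal(0.0, 1.0, ref.shape)     # made-up prediction
    print(f"RMSD {rmsd(pred, ref):.2f}  GDT_TS {gdt_ts(pred, ref):.2f}")
```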
10
Docking@home Objectives and Research Fields
  • Objectives
  • to explore the multi-scale nature of algorithmic
    adaptations in protein-ligand docking (sketched
    after this list)
  • protein-ligand representation: spanning a scale
    from rigid to flexible representations of
    protein-ligand interactions
  • solvent representation: spanning a scale from less
    accurate to more accurate treatment of water
  • sampling strategy: spanning a scale from fixed to
    adaptive sampling of the protein-ligand docking
    space
  • to develop cyber infrastructures based on
    volunteer computing that efficiently accommodate
    these adaptations
  • Interdisciplinary research fields
  • docking methods (Drs. Charles L. Brooks III at
    TSRI and Michela Taufer at UTEP)
  • decision theory (Dr. Martine Ceberio at UTEP)
  • modeling for dynamic adaptation (Drs. Patricia J.
    Teller and Michela Taufer at UTEP)
  • volunteer computing (Drs. David P. Anderson at UC
    Berkeley and Michela Taufer at UTEP)
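A small sketch of the three adaptation axes named under the objectives above, written as a run configuration. The enum labels (including the implicit/explicit water readings of "less accurate to more accurate") and the coarse/accurate example passes are hypothetical illustrations of the scales named on the slide, not Docking@home code.

```python
# Hypothetical run configuration capturing the three adaptation axes named in
# the objectives above; the labels and example passes are illustrative only.
from dataclasses import dataclass
from enum import Enum

class Representation(Enum):
    RIGID = "rigid protein-ligand representation"
    FLEXIBLE = "flexible protein-ligand representation"

class Solvent(Enum):
    IMPLICIT = "less accurate water treatment (e.g., implicit solvent)"
    EXPLICIT = "more accurate water treatment (e.g., explicit solvent)"

class Sampling(Enum):
    FIXED = "fixed sampling of the docking space"
    ADAPTIVE = "adaptive sampling of the docking space"

@dataclass
class DockingRun:
    representation: Representation
    solvent: Solvent
    sampling: Sampling

# Example: a cheap screening pass followed by a more accurate follow-up pass.
coarse = DockingRun(Representation.RIGID, Solvent.IMPLICIT, Sampling.FIXED)
accurate = DockingRun(Representation.FLEXIBLE, Solvent.EXPLICIT, Sampling.ADAPTIVE)
print(coarse)
print(accurate)
```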

11
Docking@home Portal
http://docking.utep.edu