CS273 Algorithms for Structure and Motion in Biology PowerPoint PPT Presentation


Title: CS273 Algorithms for Structure and Motion in Biology


1
CS273 Algorithms for Structure and Motion in Biology
Spring 2006 http://www.stanford.edu/class/cs273/
  • Instructors
  • Serafim Batzoglou and Jean-Claude Latombe
  • Teaching Assistant: Sam Gross
  • Emails: serafim, latombe, ssgross _at_
    cs.stanford.edu

2
Need a Scribe!!
3
Range of Bio-CS Interaction
Enormous range over space and time
Body system
Robotic surgery
Tissue/Organs
Soft-tissue simulation and surgical training
Cells
Simulation of cell interaction
Molecules
Molecular structures, similarities and motions
Gene
Sequence alignment
CS273
4
Focus on Proteins
  • Proteins are the workhorses of all living
    organisms
  • They perform many vital functions, e.g.:
  • Catalysis of reactions
  • Transport of molecules
  • Building blocks of muscles
  • Storage of energy
  • Transmission of signals
  • Defense against intruders

5
Proteins are also of great interest from a
computational viewpoint
  • They are large molecules (a few hundred to several
    thousand atoms)
  • They are made of building blocks drawn from a
    small library of 20 amino acids
  • They have an unusual kinematic structure: a long
    serial linkage (backbone) with short side-chains
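
The "small library" and "long serial linkage" points above can be sketched in a few lines of Python. This is only an illustration under a common simplified model (each residue contributes N, CA and C backbone atoms, and only the phi/psi torsion angles vary); the function names are hypothetical and do not come from the course material.

```python
# The 20 standard amino acids, by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_sequence(seq: str) -> bool:
    """True if seq is non-empty and uses only the 20 standard residues."""
    return len(seq) > 0 and all(res in AMINO_ACIDS for res in seq.upper())


def backbone_atoms(seq: str) -> list:
    """List the backbone as one serial chain: N, CA, C for each residue in order."""
    return [(i, res, atom)
            for i, res in enumerate(seq)
            for atom in ("N", "CA", "C")]


def backbone_torsion_count(seq: str) -> int:
    """Count phi/psi torsions (assumed model: fixed bond lengths and angles).

    Each residue has one phi and one psi, except that phi is undefined
    for the first residue and psi for the last.
    """
    return max(0, 2 * len(seq) - 2)


if __name__ == "__main__":
    seq = "MKTAYIAK"  # made-up example sequence
    print(is_valid_sequence(seq))
    print(len(backbone_atoms(seq)), "backbone atoms")
    print(backbone_torsion_count(seq), "phi/psi torsions")
```

Even this toy model shows why the chain is interesting computationally: the number of backbone degrees of freedom grows linearly with sequence length.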

6
Proteins are associated with many challenging
problems
  • Predict folded structures and motion pathways
  • Understand why some proteins misfold or partially
    fold, causing such diseases as cystic fibrosis,
    Parkinson's, and Creutzfeldt-Jakob (mad cow)
  • Find structural similarities among proteins and
    classify proteins
  • Find functional structural motifs in proteins
  • Predict how proteins bind to other proteins
    and smaller molecules
  • Design new drugs
  • Engineer and design proteins and protein-like
    structures (polymers)

7
Central Dogma of Molecular Biology
8
Central Dogma of Molecular Biology
9
Protein Sequence
(residue i-1)
  • Long sequence of amino-acids (dozens to
    thousands), also called residues
  • Dictionary of 20 amino-acids (several billion
    years old)

10
Protein Sequence
11
Central Dogma of Molecular Biology
Physiological conditions: aqueous solution,
37 °C, pH 7, atmospheric pressure
12
Levels of Protein Structures
Quaternary
hemoglobin (4 polypeptide chains)
13
Mostly α-helices
Mostly β-sheets
Mixed
14
Folding
Unfolded (denatured) state
Folded (native) state
15
How (we think) a protein folds ...
ΔG = ΔH - TΔS
http://www-shakh.harvard.edu/ProFold2.html
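
The Gibbs free-energy relation on this slide, ΔG = ΔH - TΔS, can be evaluated directly. The sketch below uses made-up illustrative numbers (not measurements for any real protein) to show how the sign of ΔG can flip with temperature; the function name is an assumption for illustration.

```python
def gibbs_free_energy(delta_h: float, temperature: float, delta_s: float) -> float:
    """Return dG = dH - T * dS.

    delta_h in kJ/mol, temperature in kelvin, delta_s in kJ/(mol*K).
    """
    return delta_h - temperature * delta_s


if __name__ == "__main__":
    # Hypothetical folding values: enthalpy favors folding (dH < 0),
    # entropy opposes it (dS < 0 because the chain loses freedom).
    d_h, d_s = -200.0, -0.6
    for t in (310.15, 350.0):  # body temperature, then a hotter solution
        d_g = gibbs_free_energy(d_h, t, d_s)
        state = "folding favored" if d_g < 0 else "unfolding favored"
        print(f"T = {t} K: dG = {d_g:.2f} kJ/mol ({state})")
```

With these assumed values the folded state is favored at 310.15 K (ΔG < 0) but not at 350 K, which mirrors the usual picture of thermal denaturation.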
20
Motion of Proteins in Folded State
HIV-1 protease
21
Structural variability of the overall ensemble
of native ubiquitin structures
Shehu, Kavraki, Clementi, 2005
22
Flexible Loop
Loop 7
Amylosucrase
23
Central Dogma of Molecular Biology
24
Binding
Inhibitor binding to HIV protease
Ligand-protein binding
Protein-protein binding
25
Binding of Pyruvate to LDH (reduction of pyruvate
to lactate)

ASP-195

HIS-193
THR-245
Pyruvate
ASP-166
NADH
Nicotinamide adenine dinucleotide (coenzyme)

ARG-169
Lactate dehydrogenase environment
26
What is CS273 about?
  • Algorithms and computational schemes for
    molecular biology problems
  • Molecular biology seen by computer scientists

27
The Shock of Two Cultures
  • y = f(x)
  • Biologists like experiments, specifics and
    classifications
  • They would rather know many (xi, yi), i.e.,
    facts, and classify them, than know f
  • Computer scientists like simulation,
    abstractions, and general algorithms
  • They want to know f, the explanation of the
    facts, and efficient ways to compute it, but
    rarely care for any (xi, yi)
  • One challenge of Computational Biology is to fuse
    these two cultures

28
Two Views of a BioComputation Class
  • Where are IT resources for biology available and
    how to use them
  • How to design efficient data structures and
    algorithms for biology

29
Main Ideas Behind CS273
  • The information is in the sequence
  • Sequence → Structure (shape) → Function
  • Sequence similarity → Structural/functional
    similarity
  • Sequences are related by evolution

30
Main Ideas Behind CS273
  • The information is in the sequence
  • Sequence → Structure (shape) → Function
  • Sequence similarity → Structural/functional
    similarity
  • Sequences are related by evolution
  • Biomolecules move and bind to achieve their
    functions
  • Deformation → folded structures of proteins
  • Motion + deformation → multi-molecule complexes
  • One cannot just jump from sequence to function

Ligand protein binding
Protein folding
31
Sequence
Structure
Function
32
Main Ideas Behind CS273
  • The information is in the sequence
  • Sequence → Structure (shape) → Function
  • Sequence similarity → Structural/functional
    similarity
  • Sequences are related by evolution
  • Biomolecules move and bind to achieve their
    functions
  • Deformation → folded structures of proteins
  • Motion + deformation → multi-molecule complexes
  • One cannot just jump from sequence to function
  • CS273 is about algorithms for sequence,
    structure and motion
  • - Finding sequence and shape similarities
  • - Relating structure to function
  • - Extracting structure from experimental data
  • - Computing and analyzing motion pathways
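
To make "finding sequence similarities" concrete, here is a minimal sketch of a global alignment score computed with the classic Needleman-Wunsch dynamic program. It is one standard technique, not necessarily the one used in the course, and the scoring values (match, mismatch, gap) are arbitrary assumptions; real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the best global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table,
    so memory is O(len(b)) and time is O(len(a) * len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Made-up toy sequences.
    print(needleman_wunsch_score("ACDE", "ACDE"))
    print(needleman_wunsch_score("ACDE", "ACE"))
```

A higher score means the two sequences can be lined up with more matches and fewer gaps, which is the raw signal behind the "sequence similarity → structural/functional similarity" idea.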

33
Vision Underlying CS273
  • Goal of computational biology: low-cost,
    high-bandwidth in-silico biology
  • Requirements
  • Reliable models and efficient algorithms
  • Algorithmic efficiency by exploiting properties
    of molecules and processes
  • Proteins are long kinematic chains
  • Atoms cannot bunch up together
  • Forces have relatively short ranges
  • Computational Biology is more than applying
    computers to biological problems or mimicking
    nature (e.g., performing MD simulation)
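
One way the properties listed above can buy algorithmic efficiency: if forces are short-range (a cutoff distance) and atoms cannot bunch up (bounded atoms per unit volume), a uniform grid finds all interacting pairs without testing every pair. The sketch below is an illustrative cell-list implementation, not the course's own method; the function name and the cutoff value are assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def neighbor_pairs(coords, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than cutoff.

    Atoms are binned into cubic cells of side `cutoff`, so any pair within
    the cutoff lies in the same cell or in adjacent cells. If the number of
    atoms per cell is bounded, the total work is linear in the atom count.
    """
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        cell = (math.floor(x / cutoff),
                math.floor(y / cutoff),
                math.floor(z / cutoff))
        grid[cell].append(idx)

    pairs = set()
    for (cx, cy, cz), members in grid.items():
        # Visit the cell itself and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            others = grid.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates (arbitrary units) forming two separate clusters.
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
    print(sorted(neighbor_pairs(atoms, cutoff=2.0)))
```

A brute-force check over all pairs gives the same answer but costs quadratic time, which is the gap that exploiting these physical properties is meant to close.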

34
Tentative Schedule
 #  Date      Topic
 1  April 5   Introduction
 2  April 10  Protein geometric and kinematic models
 3  April 12  Conformational space
 4  April 17  Inverse kinematics and applications
 5  April 19  Sequence similarity
 6  April 24  Sequence similarity
 7  April 26  Sequence similarity
 8  May 1     Structure comparison
 9  May 3     Structure comparison
10  May 8     Protein phylogeny, clustering, and classification
11  May 10    Protein phylogeny, clustering, and classification
12  May 15    Energy maintenance
13  May 17    Energy maintenance
14  May 22    Structure prediction
15  May 24    Roadmap methods
16  May 31    Structure prediction
17  June 5    Structure prediction
18  June 7    TBA
19  June 12   Project presentations (2 hours)
35
Instructors and TAs
  • Instructors
  • Serafim Batzoglou
  • Jean-Claude Latombe
  • TA
  • Sam Gross
  • Emails: serafim, latombe, ssgross _at_
    cs.stanford.edu
  • Class website: http://cs273.stanford.edu

36
Expected Work
  • Regular attendance at lectures and active
    participation
  • Class scribing (assignments will depend on the
    number of students)
  • Exciting programming project:
    http://www.stanford.edu/class/cs273/project/project.html
  • - Structure prediction
  • - Clustering and distance metrics
  • - Protein design
  • - Something else

37
Questions?