3D Protein - PowerPoint PPT Presentation

Provided by: saimin
Transcript and Presenter's Notes

1
3D Protein
Prof. Sai Ming Ngai, Ice
Office: EG05 East Block, Science Centre
Lab: EG08 East Block, Science Centre
Tel: 2609 6025
Email: smngai_at_cuhk.edu.hk
Homepage: http://www.ice.mbt.cuhk.edu.hk
2
Comparison between Biotechnology and Computer
(electronics)
  • High volume demand
  • Mass market
  • Advancement fostered by entrepreneurial companies

Abelson, PE (1983) Science
3
FOOD POISONING/DISEASES OUTBREAKS IN HONG KONG
4
Human Genome Project
5
TAGCTTTAGCAAAATCCGTCAAGCAAAATACATCTTCAGTGGGGCAGAAG
ATTATTAAAGATGATATAAA ATCACTTCAGTGTAAACAAAAAGATTTGG
AAAACAGGCTTGCATCTGCTAAGCAGGAGATGGAATGTTGT
CTCAACAACATTCTCAAATCAAAACGCTCAACAGAAAAGAAAGGAAAGTT
TACTCTGCCAGGCAGAGAGA AGCAGGCCACTTCTGATGTGCAGGAGTCT
ACTCAGGAATCAGCTACAGTGGAAAAGTTGGAGGAAGACTG
GGAAATAAACAAGGATTCAGCTGTGGAAATGGCTATGTCAAAACAACTTT
CTCTTAATGCTCAAGAAAGC ATGAAAAACACTGAAGATGAGCGGAAAGT
CAATGAGCTGCAAAATCAACCTTTAGAATTAGATACTATGT
TAAGAAATGAACAATTAGAAGAGATAGAGAAATTATATACCCAGTTGGAA
GCAAAGAAAGCAGCCATTAA GCCACTGGAACAAACAGAATGTCTTAACA
AAACAGAAACTGGGGCCTTGGTTCTCCACAATATAGGATAT
TCGGCACAGCATTTGGACAATTTGCTTCAGGCACTTATTACTTTGAAGAA
AAACAAAGAAAGCCAATATT GTGTCCTCAGAGATTTTCAGGAATACCTT
GCTGCAGTTGAATCTTCAATGAAAGCCTTGTTGACAGACAA
GGAAAGTCTTAAAGTAGGACCACTGGACAGTGTAACGTATCTGGACAAAA
TTAAAAAATTCATAGCATCC ATAGAAAAAGAGAAAGATTCTTTAGGCAA
CTTGAAAATCAAATGGGAGAATTTATCAAACCACGTGACTG
ACATGGATAAGAAATTGTTGGAAAGCCAGATTAAGCAACTTGAACATGGT
TGGGAACAAGTGGAACAGCA GATTCAAAAGAAGTATTCTCAGCAGGTAG
TGGAATATGATGAATTTACAACCCTCATGAATAAGGTACAG
GACACTGAGATTTCTCTGCAACAGCAGCAGCAACATCTACAGTTAAGGCT
GAAGTCTCCAGAAGAACGGG CAGGGAACCAAAGCATGATTGCCTTGACC
ACTGACCTCCAGGCTACCAAGCATGGATTTTCTGTTTTAAA
GGGGCAAGCTGAACTTCAGATGAAGAGGATTTGGGGAGAAAAAGAAAAGA
AGAATTTGGAGGATGGAATA AATAACTTGAAGAAACAATGGGAAACATT
GGAGCCATTACACTTAGAAGCAGAAAATCAGATTAAGAAGT
GTGACATAAGGAACAAGATGAAAGAGACTATCTTATGGGCCAAGAATTTG
TTGGGTGAACTTAATCCCTC CATTCCCCTTCTCCCAGATGACATTCTTT
CACAGATCAGAAAGTGCAAAGTGACACATGATGGCATTCTA
GCTAGGCAGCAGTCTGTGGAATCGTTGGCTGAAGAGGTCAAAGATAAGGT
TCCTAGCCTTACAACCTATG AGGGCGGTGATTTAAATAATACCCTAGAG
GACTTACGGAATCAATACCAAATGCTGGTTTTAAAATCAAC
TCAAAGATCACAGCAATTAGAATTTAAGTTGGAAGAAAGAAGCAATTTTT
TTGCTATAATAAGGAAGTTT CAACTTATGGTTCAAGAAAGTGAAACACT
GATAATTCCCAGGGTGGAGACAGCTGCCACGGAAGCTGAAC
TAAAACATCACCATGTTACTTTGGAGGCATCTCAGAAGGAATTGCAAGAA
ATTGACAGTGGAATCTCAAC ACATCTTCAGGAGCTAACAAACATCTATG
AGGAGCTGAATGTGTTTGAAAGATTATTTCTGGAAGATCAG
TTGAAAAATCTTAAGATTAGGACCAACAGAATACAAAGATTCATTCAGAA
TACATGTAATGAAGTGGAAC ACAAGGTAAAGTTTTGCAGACAATTCCAT
GAAAAAACATCAGCGCTTCAGGAGGAGGCTGACAGTATACA
GCGCAATGAACTATTACTTAATCAAGAAGTAAATAAAGGTGTTAAAGAGG
AGATCTATAATCTTAAAGAC AGACTCACCGCTATTAAGTGTTGCATCTT
ACAGGTATTGAAACTTAAAAAAGTGTTTGACTATATTGGAC
TAAACTGGGATTTTTCACAACTTGACCAATTACAAACCCAAGTATTTGAA
AAAGAAAAGGAACTTGAAGA AAAAATTAAGCAGTTGGACACATTTGAGG
AAGAACATGGCAAATATCAGGCATTATTAAGTAAAATGAGA
GCTATTGATTTGCAAATTAAGAAAATGACTGAAGTAGTACTAAAAGCTCC
TGATAGCTCTCCGGAAAGCA
6
Genome (DNA)
  • Total DNA content of the haploid cell
  • 1/2 the DNA content of a diploid cell
Proteome (Protein)
  • Structure and relationship (3D / function)
  • The complete protein content of a cell/organism
    (at a given time)
7
Bioinformatics: computational methods employed in
studying life sciences
  • Structural Genomics and Functional Genomics
  • Proteomics: protein profile studies,
    protein-protein interactions
  • Methodology development
8
The Cell
9
Watson and Crick describe structure of DNA(1953)
10
Central Dogma of Molecular Biology
11
DNA → mRNA → Protein
Transcription
Reverse Transcription
Translation
Post-Translation Modification PTM
Protein
12
A T(U) G C
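The pairing rules above (A with T, or U in RNA, and G with C) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and both strands are assumed to be written 5'→3' using only the letters A, C, G and T.

```python
# Illustrative sketch of Watson-Crick pairing; names are hypothetical.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Template DNA base -> base placed in the growing mRNA (U replaces T).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna.upper()))


def transcribe_template(template_dna: str) -> str:
    """Return the mRNA (5'->3') transcribed from a template strand (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(template_dna.upper()))


coding = "ATGGCC"
template = reverse_complement(coding)
print(template)                       # GGCCAT
print(transcribe_template(template))  # AUGGCC
```

The mRNA carries the same sequence as the coding strand, with U in place of T.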
13
(No Transcript)
14
Codon Table
R.W. Holley
H.G. Khorana
M.W. Nirenberg
The Nobel Prize in Physiology or Medicine 1968
The mystery underlying the genetic code was
deciphered between 1961-66.
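As a minimal sketch of how the codon table is used, the Python below builds the standard genetic code and translates an mRNA until the first stop codon. The function name `translate` and the choice to start reading at position 0 (rather than searching for a start codon) are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate an mRNA (or coding DNA) from position 0 to the first stop."""
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":          # stop codon: end of the reading frame
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCUAA"))  # MA
```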
15
(No Transcript)
16
Cellular Biology: the study of the chemistry of life
  • Chemical structure of biomolecules
  • Interactions of biomolecules
  • Synthesis and degradation of biomolecules
    (metabolism)
  • Conservation and use of energy
  • Mechanisms for organizing biomolecules and
    coordinating their activities
  • Storage, transmission and expression of genetic
    information
17
(No Transcript)
18
The goal of the Gene Ontology Consortium: to
produce a dynamic controlled vocabulary that can
be applied to all organisms, even as knowledge of
gene and protein roles in cells is accumulating
and changing.
19
(No Transcript)
20
(No Transcript)
21
Genome (DNA)
  • Total DNA content of the haploid cell
  • 1/2 the DNA content of a diploid cell
Proteome (Protein)
  • Structure and relationship (3D / function)
  • The complete protein content of a cell/organism
    (at a given time)
22
3D Protein Modeling: Concepts and Protocols
23
Overview
  • Homology Modeling
  • Hands-on exercise
  • Modeling using spdbv
  • Modeling using InsightII

24
Background - Protein-protein interaction
  • Drug target
  • Identify binding interface(s)
  • Active site(s) investigation
  • Drug design

25
Background
  • Interface anatomy has been extensively studied
  • Mutation studies
  • Energy calculations

26
Bond Energy (covalent bond)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Non-Bond Energy (non-covalent bond)
  • Electrostatic interaction (Coulomb's law)
  • Hydrogen bonds (dipole-dipole)
  • Van der Waals interaction (hydrophobic)
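The two pairwise non-bond terms can be sketched as follows. This is an illustration rather than the functional form of any particular force field: the function names are hypothetical, the Coulomb constant 332.0637 (kcal·Å/(mol·e²)) is an approximate value, the van der Waals term uses the common Lennard-Jones 12-6 form, and charges are in units of e with distances in Å.

```python
COULOMB_K = 332.0637  # approx. kcal*Angstrom/(mol*e^2); assumed value


def coulomb_energy(q1: float, q2: float, r: float, dielectric: float = 1.0) -> float:
    """Coulomb's law for two point charges (e) at distance r (Angstrom)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def lennard_jones_energy(r: float, epsilon: float, r_min: float) -> float:
    """12-6 Lennard-Jones energy; minimum of -epsilon at r == r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


# Opposite unit charges 4 Angstrom apart attract (negative energy).
print(coulomb_energy(+1.0, -1.0, 4.0))
# At the optimal contact distance the van der Waals energy is -epsilon.
print(lennard_jones_energy(3.8, epsilon=0.1, r_min=3.8))
```

Raising `dielectric` weakens the electrostatic term, which is one simple way solvent screening is approximated.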

31
Force fields
  • CHARMM19, 27
  • Cff
  • MM2, 3, 4

32
Objectives
  • Find out intrinsic and extrinsic physicochemical
    factors
  • Visualize and potentially utilize such factors
    for protein-protein recognition site
    identification

33
Determination of dominant thermodynamic factors
ΔG = RT ln Kd
ΔG = ΔH − TΔS
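A short worked example of the two thermodynamic relations on this slide, ΔG = RT ln Kd and ΔG = ΔH − TΔS. The function names are hypothetical, Kd is assumed to be expressed in mol/L relative to a 1 M standard state, and energies are in kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant: RT ln Kd."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def entropic_term(delta_g: float, delta_h: float) -> float:
    """Return -T*dS (kcal/mol), obtained by rearranging dG = dH - T*dS."""
    return delta_g - delta_h


dg = binding_free_energy(1e-9)          # a 1 nM complex
print(round(dg, 2))                      # about -12.28 kcal/mol
print(round(entropic_term(dg, -15.0), 2))
```

Comparing ΔH with −TΔS indicates whether binding is dominated by enthalpy or by entropy.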
34
Determination of dominant thermodynamic factors

35
NMR and X-ray
  • NMR
  • Dynamic
  • Multiple models (each conformation is a model)
  • Aqueous environment
  • Limitations: size of molecule (< 30 kDa)
  • Examples: 1BLQ, 1UBA
  • X-ray
  • Static
  • Only one model
  • Crystal
  • Limitations: not limited by size
  • Example: 7LYZ

36
3D Structure Database
  • PDB
  • Brookhaven National Laboratories
  • Research Collaboratory for Structural
    Bioinformatics (RCSB): collaborative effort of
    NIST, Rutgers and San Diego Super Computing
    Facility
  • http://www.rcsb.org
  • Publicly available 3-D structures of proteins,
    proteins with nucleic acids (DNA), and proteins
    complexed with metals and inhibitors
  • Experimental methods: X-ray and NMR

37
Why 3D Modeling?
  • Rate of structure solving through NMR or X-ray is
    slow compared to the deposition of DNA and
    Protein sequences
  • Crystallization is the bottle-neck (time in
    months/years). No generic recipe for
    crystallization
  • Swiss-Prot Release 42.4 of 14-Nov-2003: 138,347
    entries
  • PDB as of 11-Nov-03 has 23,188 structures
  • Membrane proteins are difficult to crystallize
  • 30% of proteome of living things
  • Knowledge of 3D structure is essential for the
    understanding of the protein function
  • Structural information enhances our understanding
    of protein-protein or protein-DNA interactions

38
Comparing Homologous enzymes
Family: Ubiquitin-conjugating enzyme
1QCQ Arabidopsis thaliana 2AAK Baker's yeast
Sequence identity: 43%
Russell et al., JMB 269, 423-439 (1997)
39
Overview of Homology Modeling
Sequence from experiment
Experiments
X-ray, NMR, e-Diffraction
Physicochemical Simulations
Comparative Modeling / Knowledge-Based Modeling
40
http://au.expasy.org/spdbv/text/download.htm
41
Proteomics: quantitative and physical mapping of
cellular proteins
A General Concept
42
2D Gel Electrophoresis
43
Contemporary Proteomic Processes
  • 2D-PAGE (1D-PAGE): 3 days (1 day)
  • Visualization: 1-3 hours
  • In-gel digestion: overnight
  • MALDI-TOF MS analysis: < 5 minutes/sample
  • Database search using peptide masses:
    10 minutes/sample (result: identified or not
    identified)
  • If not identified, peptide sequencing (PSD or
    MS/MS): 1 hour to 1 day
  • Database search using peptide sequences:
    10 minutes/sample
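The database search on peptide masses compares measured MALDI-TOF masses with masses predicted from protein sequences. A minimal sketch of that prediction step is given below, assuming trypsin as the protease (cleavage after K or R, but not before P), which the slide does not specify, and standard monoisotopic residue masses; the function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # added for the singly charged [M+H]+ ion


def tryptic_digest(protein: str) -> list:
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in tryptic_digest("MAGKRPLSWKTE"):
    print(pep, round(peptide_mass(pep) + PROTON, 4))
```

A real search would also allow for missed cleavages and modifications, which this sketch ignores.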
44
MALDI ToF Mass Spectrometer
45
Welcome!!
46
Applications of Homology Modeling
  • Ion Channel proteins
  • Transmembrane region-no 3D structure available
  • Used Homology Modeling to build a model for the
    channel protein
  • Used InsightII (Ludi) to model the binding of
    inhibitors
  • Docking to study the drug-receptor interaction

47
Homologous Proteins
  • Homologous Proteins
  • Having a common evolutionary origin
  • Evolved evolutionarily from a common ancestor
  • Many of the essential proteins (key regulators)
    present in humans are also present in other
    living organisms (e.g. rat, bacteria)
  • These essential proteins have to conserve their
    functionality throughout evolution
  • DNA polymerases
  • DNA replication
  • Necessary for all organisms
  • MHC: Major Histocompatibility Complex
  • Antigen presentation to trigger an immune
    response
  • Present in higher eukaryotes, e.g. rats and humans

48
Sequence Dissimilarity & Structural Similarity
  • What we already know about homologous proteins
  • Core region is pretty much conserved (main
    secondary structural features)
  • Most dissimilarity is observed in the surface
    (loop) regions
  • Within homologous proteins, secondary-structure
    elements can move relative to each other or even
    disappear, but neither their order nor their type
    will differ (α becoming β etc.)
  • Sequence is less conserved than structure

49
Homology Modeling: Terminology & Basic Assumptions
  • Terminology
  • Protein sequence we are modeling is called the
    Target
  • Homologous protein used in the modeling is called
    the Template
  • Basic Assumptions
  • Similar sequences have similar conformations
  • Core regions provide an excellent template for
    modeling the target protein. If the core regions
    share 50% identity, then the two proteins can
    almost always be superimposed with an RMSD of 1 Å
    or less
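The RMSD quoted above is the root-mean-square distance between equivalent atoms (typically Cα) of two structures. A minimal sketch of the calculation follows; it assumes the two coordinate sets are already superimposed and listed in the same atom order, and it does not perform the superposition itself. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD (same units as the input) between two pre-superimposed,
    equally ordered lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(target, template))  # 1.0
```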

50
Overview of Homology Modeling
Bioinformatics Basics, Rashidi & Buehler
51
Database mining
  • Why Sequence Comparison?
  • Search for potential homolog
  • Identification of evolutionary relationship is
    easy when the similarity level is high (> 50%)
  • In a Gene Family how many members are known?
  • For Comparative/Homology Modeling
  • two sequences related by divergence from a common
    ancestor
  • What kind of alignment is this?
  • Global Alignment
  • Overall alignment: sequence homologs with known
    3-D structure
  • Local Alignment
  • Best for searching local domains
  • Gaps cannot be introduced endlessly: that would
    be biologically meaningless
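The difference between global and local alignment, and the role of the gap penalty, can be sketched with one dynamic-programming routine. This is a simplified illustration: it uses a flat match/mismatch score and a linear gap penalty (all values assumed) instead of a PAM or BLOSUM matrix, and it returns only the score, not the alignment. The function name is hypothetical.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


print(alignment_score("ACGT", "AGT"))                        # 1 (one gap)
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))   # 3 (shared ACG)
```

Making `gap` more negative discourages the endless gap insertion the slide warns about.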

52
PAM250 Matrix (identities at 20% level)
  • Tryptophan: highly conserved hydrophobic core
    residue, important for the structure and
    difficult to mutate. W→F, W→Y (aromatic amino
    acids are the next choice to replace W)
  • Cysteine: well known for S-S linkage, important
    for structure
Unitary Matrix
53
Searching for Templates
  • Do a Blast/Fasta or use programs within GCG
    (Align, gap, bestfit, etc.) for sequence
    alignment. Restrict search only to PDB database
  • Why PDB?
  • Potentially suitable templates
  • Blast score < 0.001 (protein), < 10^-6
    (nucleotide)
  • Safe threshold is > 25-30% identity
  • In the Twilight Zone (< 25%): how to proceed?
  • Usually more than one protein is chosen as
    templates?
  • Avoid biasing, to model variants (loops etc),
    side chain conformations
  • Final model will be done using one representative
    template (called reference)
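The identity thresholds above can be applied to a pairwise alignment with a few lines of Python. This is a sketch under stated assumptions: identity is computed over columns where neither sequence has a gap (other definitions exist), the cut-offs follow the 25% and 30% figures on the slide, and the label "borderline" for the range in between, like the function names, is introduced only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned (gap-free) columns of two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def template_zone(identity: float) -> str:
    """Classify a template by identity to the target (assumed labels)."""
    if identity < 25.0:
        return "twilight zone"
    if identity <= 30.0:
        return "borderline"
    return "safe"


pid = percent_identity("ACDE-FG", "ACDQWFG")
print(round(pid, 1), template_zone(pid))  # 83.3 safe
```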

54
Structurally Conserved Region (SCR) Modeling
  • After identifying template(s), the next task is
    to identify the SCR
  • What are SCRs?
  • Inner core (not the surface exposed loops)
  • How do we identify them?
  • Multiple Sequence Alignments, secondary structure
    elements
  • The next step is to align the Structurally
    aligned templates with the unknown sequence
  • No gaps are allowed within the SCR regions
  • Special sequence alignment algorithm used which
    discourages gaps within SCR.

55
Structurally Variable Region (SVR) Modeling (3
methods)
  • If the reference protein has similar loops then
    it can be copied
  • Perform a database (derived from PDB) search for
    structures with loops
  • Criteria are the conserved residues flanking the
    loop area and the number of loop residues
  • Software usually keeps a loop database derived
    from PDB.
  • de novo method of building and constrained
    minimization
  • If the number of residues in the template and the
    reference differ

56
Modeling Side Chains
  • Given that each side-chain can be in one of many
    different conformations: multiple-minima problem
  • Following options are generally used
  • If the residues are the same
  • Copy the same conformation (why? see scoring
    matrices)
  • If they are different
  • Use built-in libraries based on known info (PDB)
  • Random conformations without any collisions

57
Homology Modeling By Example
58
Template Alignment
  • 5 template lysozyme proteins (only Cα shown),
    structurally uncorrected multiple sequence
    alignment
  • Reference: red
  • Query sequence: violet

59
Studying the corrected template alignment
  • Look at Cys. How about the structural
    conservation?
  • Which regions show structural variation?

60
Structurally corrected MSA
Made using InsightII, Accelrys
Do you see the location of the variable region
(core or surface)?
RMS deviation is kept to a minimum (< 1 Å)
61
Target Core Modeling
  • Target sequence is aligned with the template or
    the Structurally Corrected Multiple Sequence
    Alignment (in case of multiple templates)
  • Which residues can be aligned to the conserved
    block regions of the multiple sequence alignment
    of the reference protein, so that one can copy the
    coordinates from the reference to the sequence?
  • Do a sequence alignment of the reference with the
    model sequence using a chosen matrix, gap penalty
    etc.

62
Target Core Modeling
  • Target sequence is now aligned with the template
    or the Structurally Corrected Multiple Sequence
    Alignment (in case of multiple templates)

Made using InsightII, Accelrys
63
Sequence Alignment
Before Aligning the model sequence to the template
Are these insertions reasonable?
Gap insertion, conserved region split
After Aligning the model sequence to the template
Made using InsightII, Accelrys
Gap insertion
64
Suspect the alignment
  • Look at the alignment: if the gaps introduced
    are not in surface-exposed regions, re-examine
    the parameters of the alignment (gap penalty
    etc.)
  • If the deletions occur at a terminus, are
    surface exposed and are not in any recognized
    secondary structure, then they may be valid
    deletions
  • Finally, copy the coordinates of each conserved
    group from the most similar template sequence to
    the model sequence.

65

66
  • 1) Before alignment 2) Wrong alignment parameters
    3) Correct alignment parameters (higher gap
    penalty)
67
Loop Modeling
68
Side chains will be added if the template has
identical residues. Make sure side-chains are not
clashing with the backbone.
69
Final Model
70
Homology Model Evaluation
  • Most automated Homology Modeling software
    provides a model, even with an inappropriate
    template
  • How to judge the quality of the model?
  • Absence of R-factors: no way to evaluate the model
  • Correct models usually have atomic positions
    within the experimental uncertainty limit

71
Final Step Energy Minimization
  • Why? The final model now has backbone, side-chains
    and loops generated from the template(s)
  • Has atom clashes and non-optimal conformations
  • Choose a program to perform Energy Minimization
    to repair the model structure (bad contacts)
  • Swiss-Model uses GROMOS
  • How many steps of Minimization ?
  • Vacuum (non-solvent)
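To illustrate what a minimizer does and why the number of steps matters, here is a toy steepest-descent sketch that relaxes a single atom-atom distance on a 12-6 Lennard-Jones curve. It is not how Swiss-Model or GROMOS performs minimization; the function names, the step size and the convergence tolerance are assumptions chosen for the example.

```python
def lj_gradient(r: float, epsilon: float, r_min: float) -> float:
    """dE/dr of E = epsilon * ((r_min/r)**12 - 2 * (r_min/r)**6)."""
    ratio6 = (r_min / r) ** 6
    return 12.0 * epsilon / r * (ratio6 - ratio6 * ratio6)


def steepest_descent(r: float, epsilon: float = 0.1, r_min: float = 3.8,
                     step: float = 0.1, tol: float = 1e-6,
                     max_steps: int = 5000):
    """Move r downhill along -dE/dr until the gradient is below tol.
    Returns the final distance and the number of steps taken."""
    for n in range(max_steps):
        grad = lj_gradient(r, epsilon, r_min)
        if abs(grad) < tol:
            return r, n
        r -= step * grad
    return r, max_steps


# Two atoms placed too close (a bad contact) relax towards r_min = 3.8.
final_r, steps = steepest_descent(3.4)
print(round(final_r, 3), steps)
```

A looser tolerance stops earlier, which removes the bad contact without driving the model far from the template geometry.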

72
Identifying Incorrect Models
  • Hydrophobic residues exposed
  • Buried polar or ionic residues without the
    charges satisfied (H-bonds, salt-bridge etc)
  • Clashes
  • Unusual bond-lengths, bond-angles
  • Sequence alignment is not-optimal
  • Very large RMSD among the templates
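One of the checks listed above, steric clashes, is easy to sketch: flag every pair of atoms closer than a cut-off distance. The 2.0 Å cut-off, the function name and the way bonded pairs are passed in are assumptions for illustration; validation programs such as Procheck use more detailed criteria.

```python
import math
from itertools import combinations


def find_clashes(coords, min_dist: float = 2.0, bonded=frozenset()):
    """Return (i, j, distance) for atom pairs closer than min_dist.
    `bonded` holds index pairs (i, j), i < j, that are covalently bonded
    and should therefore be skipped."""
    clashes = []
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in bonded:
            continue
        d = math.dist(coords[i], coords[j])
        if d < min_dist:
            clashes.append((i, j, d))
    return clashes


atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(find_clashes(atoms))                     # [(0, 1, 1.5)]
print(find_clashes(atoms, bonded={(0, 1)}))    # []
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a sketch but slow for a full protein.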

73
Quality of Models
  • Procheck: stereochemical quality of the protein
    and residue-by-residue analysis in figures
    http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
  • PDBREPORT: http://www.cmbi.kun.nl/gv/pdbreport

74
CASP Test of the Models
  • Critical Assessment of Techniques for Protein
    Structure Prediction
  • http://predictioncenter.llnl.gov/
  • Showcase for the latest methods in the structure
    prediction area
  • Held once every two years
  • Competition open in three areas
  • Homology Modeling, Threading and ab-initio
  • CASP 1998, 2000 and 2002 showed the reliability of
    Homology Modeling when suitable templates are
    available (> 30% identity, above the Twilight Zone)

75
Database of Homology Models
  • Project, 3D-Crunch (1984)
  • Project submitted all sequences of Swiss-Prot and
    trEMBL to SWISS MODEL server
  • The resulting homology models (64000) are stored
    and available to public from SWISS-MODEL
    Repository
  • Database contains Final models, Entire modeling
    projects including aligned coordinates of
    templates

76
Database of Homology Models
  • ModBase: Sali and co-workers
  • Software used: Modeller
  • Models were built based on spatial restraints
  • Restraints: distances between alpha carbons,
    distances within the main chain etc.
  • Energy minimization techniques are employed to
    satisfy these restraints

77
Amino Acid
A. Structure of amino acids
An amino acid contains a carboxyl group and an
amino group. The α-carbon is a chiral (asymmetric)
centre.