1-Month Practical Master Course - PowerPoint PPT Presentation

About This Presentation
Title:

1-Month Practical Master Course

Description:

1-Month Practical Master Course. Genome Analysis. Jaap Heringa ... Salamander 100 × 10^9. Amoeba dubia 670 × 10^9. Three main principles. DNA makes RNA makes Protein ...

Slides: 37
Provided by: heri4
Learn more at: http://www.ibi.vu.nl
Transcript and Presenter's Notes

Title: 1-Month Practical Master Course


1
1-Month Practical Master Course: Genome Analysis
Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands
www.ibivu.cs.vu.nl heringa_at_cs.vu.nl
2
(No Transcript)
3
Biological Sequence Analysis
  • Pair-wise sequence alignment
  • Residue exchange matrices
  • Multiple sequence alignment
  • Phylogeny
4
DNA sequence
.....acctc ctgtgcaaga acatgaaaca nctgtggttc
tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg
ggcccaggac tggggaagcc tccagagctc aaaaccccac
ttggtgacac aactcacaca tgcccacggt gcccagagcc
caaatcttgt gacacacctc ccccgtgccc acggtgccca
gagcccaaat cttgtgacac acctccccca tgcccacggt
gcccagagcc caaatcttgt gacacacctc ccccgtgccc
ccggtgccca gcacctgaac tcttgggagg accgtcagtc
ttcctcttcc ccccaaaacc caaggatacc cttatgattt
cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag
ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac
ggcgtggagg tgcataatgc caagacaaag ctgcgggagg
agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac
cgtcctgcac caggactggc tgaacggcaa ggagtacaag
tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg
cctggtcaaa ggcttctacc ccagcgacat cgccgtggag
tgggagagca atgggcagcc ggagaacaac tacaacacca
cgcctcccat gctggactcc gacggctcct tcttcctcta
cagcaagctc accgtggaca agagcaggtg gcagcagggg
aacatcttct catgctccgt gatgcatgag gctctgcaca
accgctacac gcagaagagc ctctc.....
5
Genome size
Organism: number of base pairs
  • φX-174 virus: 5,386
  • Epstein-Barr virus: 172,282
  • Mycoplasma genitalium: 580,000
  • Haemophilus influenzae: 1.8 × 10^6
  • Yeast (S. cerevisiae): 12.1 × 10^6
  • Human: 3.2 × 10^9
  • Wheat: 16 × 10^9
  • Lilium longiflorum: 90 × 10^9
  • Salamander: 100 × 10^9
  • Amoeba dubia: 670 × 10^9

6
Three main principles
  • DNA makes RNA makes Protein
  • Structure more conserved than sequence
  • Sequence → Structure → Function

7
Regulation, signalling cascades, chaperonins,
compartmentalisation
8
How to go from DNA to protein sequence
A piece of double stranded DNA:
5' attcgttggcaaatcgcccctatccggc 3'
3' taagcaaccgtttagcggggataggccg 5'
DNA direction is from 5' to 3'
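The relation between the two strands above can be checked with a few lines of Python. This is a minimal sketch; the names `complement` and `reverse_complement` are illustrative and not part of the lecture.

```python
# Base-pairing rules: a-t and c-g; 'n' (unknown base) is left unchanged.
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")

def complement(strand):
    """Return the base-paired strand, written in the same left-to-right order."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand):
    """Return the opposite strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]

top = "attcgttggcaaatcgcccctatccggc"      # 5' -> 3', from the slide
print(complement(top))                    # bottom strand as drawn (3' -> 5')
print(reverse_complement(top))            # bottom strand read 5' -> 3'
```

The first print reproduces the lower strand as it is drawn on the slide; the second gives the same strand in the conventional 5' to 3' reading direction.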
9
How to go from DNA to protein sequence
6-frame translation using the codon table (last lecture)
5' attcgttggcaaatcgcccctatccggc 3'
3' taagcaaccgtttagcggggataggccg 5'
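A 6-frame translation reads both strands 5' to 3' at three offsets each. The sketch below assumes the standard genetic code and marks any codon containing an unknown base as `X`; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translation(dna):
    """Return frames +1, +2, +3 (given strand) then -1, -2, -3 (reverse complement)."""
    forward = dna.upper()
    reverse = forward.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return [translate(strand[offset:])
            for strand in (forward, reverse)
            for offset in range(3)]

for frame in six_frame_translation("attcgttggcaaatcgcccctatccggc"):
    print(frame)
```

Only one of the six frames, if any, corresponds to the real protein; choosing between them requires extra evidence such as open reading frame length or homology.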
10
Evolution and three-dimensional protein structure
information
Isocitrate dehydrogenase: the distance from the
active site (in yellow) determines the rate of
evolution (red = fast evolution, blue = slow
evolution)
Dean, A. M. and G. B. Golding, Pacific Symposium
on Biocomputing 2000
11
Protein Sequence-Structure-Function
Ab initio prediction and folding
Sequence → Structure → Function
Threading
Function prediction from structure
Homology searching (BLAST)
12
Widely used tool for homology detection: PSI-BLAST
  • Heuristic tool to cut down computations required
    for database searching (1M sequences in DB)
  • Sensitivity gained by iteratively finding hits
    (local alignments) and repeating search

[Diagram: iterative PSI-BLAST search loop; labels: Q (query), DB (database), hits, T, PSSM]
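The step from hits to a PSSM (position-specific scoring matrix) can be sketched as follows. This is a deliberately simplified illustration, not the PSI-BLAST procedure itself: it assumes ungapped hits of equal length, a uniform background, and a fixed pseudocount, whereas the real program also weights sequences and uses data-dependent pseudocounts. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_hits, pseudocount=1.0):
    """Log-odds score per column and residue from equal-length, ungapped hits."""
    background = 1.0 / len(AMINO_ACIDS)            # assumption: uniform background
    total = len(aligned_hits) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned_hits[0])):
        counts = Counter(seq[col] for seq in aligned_hits)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score_segment(pssm, segment):
    """Sum the column-specific scores of a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

hits = ["MKV", "MRV", "MKI"]                       # toy data, not from the lecture
pssm = build_pssm(hits)
print(score_segment(pssm, "MKV"), score_segment(pssm, "AAA"))
```

In an iterative search, segments in the database that score well against the PSSM would be added to the hit list and the matrix rebuilt, which is where the gain in sensitivity comes from.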
13
Threading
Template sequence
Compatibility score
Query sequence
Template structure
15
Fold recognition by threading
Fold 1 Fold 2 Fold 3 Fold N
Query sequence
Compatibility scores
16
Bioinformatics
  • Nothing in Biology makes sense except in the
    light of evolution (Theodosius Dobzhansky
    (1900-1975))
  • Nothing in bioinformatics makes sense except in
    the light of Biology

18
Divergent evolution
  • Ancestral sequence: ABCD
  • Descendant sequences: ACCD (mutation B → C) and
    ABD (deletion C → ø)
  • Pairwise alignment, two candidates:
    ACCD      ACCD
    AB-D  or  A-BD
  • True alignment, given the history above: ACCD over AB-D
19
Mutations under divergent evolution
[Figure: four panels (a)-(d), each following one site from the ancestral sequence to Sequence 1 and Sequence 2]
  • One substitution - one visible
  • Two substitutions - one visible
  • Two substitutions - none visible
  • Back mutation - not visible
Sequence 1: ACCTGTAATC
Sequence 2: ACGTGCGATC
D = 3/10 (fraction of different sites (nucleotides))
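The distance D for the two example sequences can be reproduced with a short function. This is a minimal sketch; the name `p_distance` is illustrative.

```python
def p_distance(seq1, seq2):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)

print(p_distance("ACCTGTAATC", "ACGTGCGATC"))
```

Because multiple and back substitutions at one site are not visible, this observed fraction underestimates the number of substitutions that actually occurred.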
20
Convergent evolution
  • Often with shorter motifs (e.g. active sites)
  • Motif (function) has evolved more than once
    independently, e.g. starting with two very
    different sequences adopting different folds
  • Sequences and associated structures remain
    different, but (functional) motif can become
    identical
  • Classical example: serine proteinase (subtilisin)
    and chymotrypsin

21
Serine proteinase (subtilisin) and chymotrypsin
  • Different evolutionary origins, no sequence
    similarity
  • Similarities in the reaction mechanisms.
    Chymotrypsin, subtilisin and carboxypeptidase C
    have a catalytic triad of serine, aspartate and
    histidine in common: serine acts as a
    nucleophile, aspartate as an electrophile, and
    histidine as a base.
  • The geometric orientations of the catalytic
    residues are similar between families, despite
    different protein folds.
  • The linear arrangements of the catalytic residues
    reflect different family relationships. For
    example the catalytic triad in the chymotrypsin
    clan (SA) is ordered HDS, but is ordered DHS in
    the subtilisin clan (SB) and SDH in the
    carboxypeptidase clan (SC).

22
A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
A DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----

23
What can sequence tell us about structure (HSSP)
Sander & Schneider, 1991
24
Searching for similarities
What is the function of the new gene?
The "lazy" investigation (i.e., no biological experiments, just bioinformatics techniques):
  • Find a set of protein sequences similar to the unknown sequence
  • Identify similarities and differences
  • For long proteins: identify domains first
25
  • Evolutionary and functional relationships
  • Reconstruct evolutionary relation
  • Based on sequence
    - Identity (simplest method)
    - Similarity
  • Homology (common ancestry: the ultimate goal)
  • Other (e.g., 3D structure)
  • Functional relation
  • Sequence → Structure → Function

26
Searching for similarities
Common ancestry is more interesting: it makes it
more likely that genes share the same function.
Homology means sharing a common ancestor. It is a
binary property (yes/no), and a nice tool: when
(an unknown) gene X is homologous to (a known)
gene G, we gain a lot of information on X, because
what we know about G can be transferred to X as a
good suggestion.
27
Biological definitions for related sequences
  • Homologues are similar sequences that have been
    derived from a common ancestor sequence.
    Homologues can be described as either
    orthologues or paralogues.
  • Orthologues are similar sequences in two
    different organisms that have arisen due to a
    speciation event. Orthologues typically retain
    identical or similar functionality throughout
    evolution.
  • Paralogues are similar sequences within a single
    organism that have arisen due to a gene
    duplication event.
  • Xenologues are similar sequences that do not
    share the same evolutionary origin, but rather
    have arisen out of horizontal transfer events
    through symbiosis, viruses, etc.

28
How to evolve
  • Important distinction:
  • Orthologues: homologous proteins in different
    species (all deriving from the same ancestor)
  • Paralogues: homologous proteins in the same
    species (internal gene duplication)
  • In practice, to recognise orthology, the
    bi-directional best hit is used in conjunction
    with a database search program (this is called an
    operational definition)
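The bi-directional best hit criterion can be sketched as below. The input format (a dictionary of search scores keyed by query and subject) and all gene names are assumptions made for illustration; in practice the scores would come from a database search program run in both directions.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject; scores is {(query, subject): score}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)

# Toy scores for genes of species A searched against species B, and back.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 190, ("b1", "a3"): 110, ("b2", "a2"): 85, ("b2", "a1"): 40}
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so only the two mutual pairs are reported as putative orthologues.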

29
So this means
Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
30
Example today: pairwise sequence alignment needs a sense of evolution
Global dynamic programming
[Diagram: the sequences MDAGSTVILCFVG and MDAASTILCGS, related by evolution, are compared in a search matrix using an amino acid exchange matrix and gap penalties (open, extension), giving the alignment below]
MDAGSTVILCFVG-
MDAAST-ILC--GS
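Global dynamic programming can be sketched as follows. To stay short, this version makes two simplifying assumptions that differ from the slide: it scores residues by identity (match/mismatch) instead of an amino acid exchange matrix, and it uses a linear gap penalty instead of separate open and extension penalties. The parameter values are illustrative.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back one optimal path from the bottom-right corner.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                row_a.append(a[i - 1]); row_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))

best, top, bottom = global_align("MDAGSTVILCFVG", "MDAASTILCGS")
print(best)
print(top)
print(bottom)
```

Several alignments can share the optimal score, so the printed alignment need not match the one on the slide, which was produced with an exchange matrix and affine gap penalties.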
31
How to determine similarity
Frequent evolutionary events at the DNA level:
  1. Substitution
  2. Insertion, deletion
  3. Duplication
  4. Inversion
We will restrict ourselves to these events
32
A DNA sequence alignment (nucleotide one-letter code)
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----
A protein sequence alignment (amino acid one-letter code)
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
33
Dynamic programming: Scoring alignments
  • Substitution (or match/mismatch): DNA, proteins
  • Gap penalty
    - Linear: $gp(k) = ak$
    - Affine: $gp(k) = b + ak$
    - Concave, e.g. $gp(k) = \log(k)$
The score for an alignment is the sum of the scores over all alignment columns
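The three gap penalty shapes can be compared numerically. In this sketch `k` is the gap length and `a` and `b` are the slide's constants; the values passed in are illustrative only.

```python
import math

def linear_gap(k, a):
    """Linear penalty: every gap position costs a."""
    return a * k

def affine_gap(k, b, a):
    """Affine penalty: a fixed cost b per gap plus a per gap position."""
    return b + a * k

def concave_gap(k):
    """Concave example: each extra gap position costs less than the previous one."""
    return math.log(k)

for k in (1, 2, 5, 10):
    print(k, linear_gap(k, a=2), affine_gap(k, b=10, a=1), round(concave_gap(k), 2))
```

With an affine penalty, one gap of length 2 is cheaper than two separate gaps of length 1, which favours alignments with fewer, longer gaps.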
34
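Dynamic programming: Scoring alignments
$S_{a,b} = \sum s(a_i, b_j) - \sum gp(k)$ (first sum over aligned residue pairs, second over gaps of length $k$)
$gp(k) = \text{gapinit} + k \times \text{gapextension}$ (affine gap penalties)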
35
DNA: define a score for match/mismatch of letters

Simple:
      A     C     G     T
A     1    -1    -1    -1
C    -1     1    -1    -1
G    -1    -1     1    -1
T    -1    -1    -1     1

Used in genome alignments:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
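Either matrix can be used as a lookup table when scoring aligned letters. The sketch below copies the two matrices from the slide and scores an ungapped comparison; the helper names are illustrative.

```python
BASES = "ACGT"

SIMPLE = [[ 1, -1, -1, -1],
          [-1,  1, -1, -1],
          [-1, -1,  1, -1],
          [-1, -1, -1,  1]]

GENOME = [[  91, -114,  -31, -123],
          [-114,  100, -125,  -31],
          [ -31, -125,  100, -114],
          [-123,  -31, -114,   91]]

def as_lookup(rows):
    """Turn a 4x4 matrix (rows and columns in ACGT order) into {(x, y): score}."""
    return {(x, y): rows[i][j]
            for i, x in enumerate(BASES)
            for j, y in enumerate(BASES)}

def ungapped_score(seq1, seq2, rows):
    """Sum the column scores of two equal-length sequences without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    lookup = as_lookup(rows)
    return sum(lookup[(x, y)] for x, y in zip(seq1.upper(), seq2.upper()))

print(ungapped_score("ACGT", "ACGA", SIMPLE))
print(ungapped_score("ACGT", "ACGA", GENOME))
```

In the second matrix the transitions (A/G and C/T, scored -31) are penalised less than the transversions, which the simple matrix cannot express.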
36
Dynamic programming: Scoring alignments
T D W V T A L K
T D W L - - I K
Amino Acid Exchange Matrix (20 × 20)
Affine gap penalties (open, extension): 10, 1
$\text{Score} = s(T,T) + s(D,D) + s(W,W) + s(V,L) - P_o - 2P_x + s(L,I) + s(K,K)$
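The score of this example alignment can be computed column by column. The sketch reads the slide's 10 and 1 as the gap open and extension penalties, which is an interpretation of the slide layout, and it replaces the 20 × 20 exchange matrix with a toy function `toy_s` (+2 for identical residues, -1 otherwise) because the matrix values are not given here.

```python
def toy_s(x, y):
    """Placeholder for an exchange matrix lookup (assumed values, not a real matrix)."""
    return 2 if x == y else -1

def alignment_score(row_a, row_b, s, gap_open, gap_ext):
    """Sum substitution scores and subtract affine penalties for each gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    total = 0
    gap_in = None                      # which row the current gap is in, if any
    for x, y in zip(row_a, row_b):
        current = "a" if x == "-" else "b" if y == "-" else None
        if current is None:
            total += s(x, y)
        else:
            if current != gap_in:      # a new gap starts: pay the opening penalty once
                total -= gap_open
            total -= gap_ext           # every gap position pays the extension penalty
        gap_in = current
    return total

print(alignment_score("TDWVTALK", "TDWL--IK", toy_s, gap_open=10, gap_ext=1))
```

With these toy values the four identities contribute 8, the two mismatches -2, and the single gap of length 2 costs 10 + 2 × 1 = 12, so the total is -6.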