Gene predictions for eukaryotes - PowerPoint PPT Presentation


Title: Gene predictions for eukaryotes


1
Gene predictions for eukaryotes
  • attgccagtacgtagctagctacacgtatgctattacggatctgtagc
    ttagcgtatctgtatgctgttagctgtacgtacgtatttttctagagctt
    cgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgtt
    agctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgta
    gtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttc
    taggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatc
    tgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtc
    tatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgta
    cgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcg
    tagtcgttagcttagtcgtgtagtcttgatctacgtacgtatttttctag
    agcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatg
    ctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctag
    tcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtat
    ttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgtta
    gcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcg
    tagtctatggctagtcgtagtcgtagtcgttagcatctgtatgtacgtac
    gtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgtt
    agcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgta
    gtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagct
    gtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtag
    tcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttct
    aggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatct
    gtatggtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtat
    ttttctaggggagcttcgtagtctatggctag


2
Gene predictions for eukaryotes


3
Gene predictions for eukaryotes

4
Gene predictions for eukaryotes
  • Three different approaches to computational
    gene-finding:
  • Intrinsic: use statistical information about
    known genes (Hidden Markov Models)
  • Extrinsic: compare genomic sequence with known
    proteins / genes
  • Cross-species sequence comparison: search for
    similarities among genomes

5
Hidden-Markov-Models (HMM) for gene prediction
  • 3 5 6 6 6 4 6 5 1 6 5 1 2 (sequence s)
  • B F F U U U U U F F F F F F E (parse f)
  • For sequence s and parse f:
  • P(f): probability of f
  • P(f,s) = P(f) P(s|f): joint probability of f and s
  • P(f|s): a-posteriori probability of f
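The factorization P(f,s) = P(f) P(s|f) can be made concrete for the dice example on this slide. The sketch below assumes a two-state model in which F stands for a fair die and U for an unfair one. The start, transition, and emission probabilities are hypothetical values chosen for illustration, not parameters from the presentation, and the B and E states are left out for brevity.

```python
import math

# Hypothetical parameters (assumptions, not taken from the slides).
START = {"F": 0.5, "U": 0.5}
TRANS = {"F": {"F": 0.95, "U": 0.05}, "U": {"F": 0.10, "U": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},            # fair die: uniform
    "U": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},  # unfair die: favors 6
}


def log_joint(parse, rolls):
    """Return log P(f, s) = log P(f) + log P(s | f) for parse f and rolls s."""
    if len(parse) != len(rolls) or not rolls:
        raise ValueError("parse and rolls must be non-empty and equally long")
    # First position: start probability times emission probability.
    logp = math.log(START[parse[0]]) + math.log(EMIT[parse[0]][rolls[0]])
    # Remaining positions: one transition and one emission each.
    for prev, cur, roll in zip(parse, parse[1:], rolls[1:]):
        logp += math.log(TRANS[prev][cur]) + math.log(EMIT[cur][roll])
    return logp


# The parse and roll sequence shown on the slide (without B and E).
slide_rolls = [3, 5, 6, 6, 6, 4, 6, 5, 1, 6, 5, 1, 2]
slide_parse = "FFUUUUUFFFFFF"
slide_log_joint = log_joint(slide_parse, slide_rolls)
```

Working in log space avoids numerical underflow when the sequence is long, which matters once s is a genomic sequence rather than thirteen die rolls.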


6
Hidden-Markov-Models (HMM) for gene prediction
  • 3 5 6 6 6 4 6 5 1 6 5 1 2
  • B F F U U U U U F F F F F F E
  • Goal: find path f with maximum a-posteriori
    probability P(f|s)
  • Equivalent: find path that maximizes joint
    probability P(f,s)
  • Optimal path calculated by dynamic programming
    (Viterbi algorithm)
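A minimal sketch of the Viterbi recursion for the dice example follows. It assumes the same two-state reading (F fair, U unfair); all probabilities are hypothetical illustration values, and the B and E states are omitted. Because P(s) does not depend on f, maximizing P(f,s) yields the same path as maximizing P(f|s).

```python
import math

# Hypothetical parameters (assumptions, not taken from the slides).
START = {"F": 0.5, "U": 0.5}
TRANS = {"F": {"F": 0.95, "U": 0.05}, "U": {"F": 0.10, "U": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "U": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}


def viterbi(rolls):
    """Return (best_path, log P(best_path, rolls)) by dynamic programming."""
    states = list(START)
    # score[k]: best log joint probability of any path ending in state k.
    score = {k: math.log(START[k]) + math.log(EMIT[k][rolls[0]]) for k in states}
    backpointers = []
    for roll in rolls[1:]:
        prev_score, score, pointers = score, {}, {}
        for k in states:
            # Best predecessor state for ending in k at this position.
            best = max(states, key=lambda j: prev_score[j] + math.log(TRANS[j][k]))
            pointers[k] = best
            score[k] = (prev_score[best] + math.log(TRANS[best][k])
                        + math.log(EMIT[k][roll]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda k: score[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), score[last]


best_path, best_score = viterbi([3, 5, 6, 6, 6, 4, 6, 5, 1, 6, 5, 1, 2])
```

The run time is linear in the sequence length and quadratic in the number of states, instead of exponential as it would be when enumerating every path.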


7
Hidden-Markov-Models (HMM) for gene prediction
  • 3 5 6 6 6 4 6 5 1 6 5 1 2
  • B F F U U U U U F F F F F F E
  • Program parameters learned from training data
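The slide does not say how the parameters are learned. One common option, shown here only as an assumption, is supervised maximum-likelihood estimation: count transitions and emissions in training sequences whose parse is known, then normalize. The function name, the pseudocount, and the state labels are illustrative choices.

```python
from collections import Counter


def estimate_parameters(training, states="FU", symbols=range(1, 7), pseudo=1.0):
    """Estimate transition and emission probabilities from labeled data.

    training: iterable of (rolls, parse) pairs with equal lengths.
    pseudo:   pseudocount added to every count to avoid zero probabilities.
    """
    symbols = list(symbols)
    trans_counts, emit_counts = Counter(), Counter()
    for rolls, parse in training:
        for state, roll in zip(parse, rolls):
            emit_counts[state, roll] += 1
        for prev, cur in zip(parse, parse[1:]):
            trans_counts[prev, cur] += 1

    trans_p, emit_p = {}, {}
    for a in states:
        t_total = sum(trans_counts[a, b] + pseudo for b in states)
        trans_p[a] = {b: (trans_counts[a, b] + pseudo) / t_total for b in states}
        e_total = sum(emit_counts[a, x] + pseudo for x in symbols)
        emit_p[a] = {x: (emit_counts[a, x] + pseudo) / e_total for x in symbols}
    return trans_p, emit_p


# The labeled example from the slide, used as a one-sequence training set.
training_set = [([3, 5, 6, 6, 6, 4, 6, 5, 1, 6, 5, 1, 2], "FFUUUUUFFFFFF")]
trans_p, emit_p = estimate_parameters(training_set)
```

A single short sequence is far too little data for reliable estimates; it is used here only to show the counting step.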


8
Hidden-Markov-Models (HMM) for gene prediction
Application to gene prediction
  • A T A A T G C C T A G T C (DNA sequence s)
  • Z Z Z E E E E E E I I I I (parse f)
  • Introns, exons etc. modeled as states in a
    GHMM (generalized HMM)
  • Given sequence s, find the parse that maximizes
    P(f|s)
  • (S. Karlin and C. Burge, 1997)
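In a generalized HMM a state emits a whole segment of the sequence rather than a single base, so a parse is naturally read as a list of labeled intervals. The sketch below collapses the per-base parse from this slide into such segments. The function name and the half-open (start, end) coordinate convention are assumptions made for the example.

```python
from itertools import groupby


def parse_to_segments(parse):
    """Collapse a per-base parse into (label, start, end) segments.

    Coordinates are 0-based and the end position is exclusive.
    """
    segments = []
    position = 0
    for label, run in groupby(parse):
        length = sum(1 for _ in run)
        segments.append((label, position, position + length))
        position += length
    return segments


segments = parse_to_segments("ZZZEEEEEEIIII")
```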


9



10
AUGUSTUS
  • Basic model for GHMM-based intrinsic gene finding
    comparable to GenScan (M. Stanke)


11
AUGUSTUS


12
AUGUSTUS


13
AUGUSTUS
  • Features of AUGUSTUS
  • Intron length model
  • Initial pattern for exons
  • Similarity-based weighting for splice sites
  • Interpolated HMM
  • Internal 3 content model


14
Hidden-Markov-Models (HMM) for gene prediction
  • A T A A T G C C T A G T C (DNA sequence s)
  • Z Z Z E E E E I I I I (parse f)
  • An explicit intron length model is computationally
    expensive.


15
AUGUSTUS


Intron length model

[Figure: state diagram with the states Exon, Intron (fixed), Intron (expl.), Intron (geo.), Exon]
  • Explicit length distribution for short introns
  • Geometric tail for long introns
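The combination of an explicit distribution for short introns and a geometric tail for long ones can be sketched as a single length distribution. This is a simplified illustration, not the AUGUSTUS implementation: the function name, the cutoff (the length of the explicit table), and all numbers are assumptions.

```python
def intron_length_prob(length, explicit, stay):
    """Probability of an intron of the given length.

    explicit: explicit[i] is P(length == i + 1) for the short lengths
              1..len(explicit); its sum must not exceed 1.
    stay:     self-transition probability of the geometric tail, 0 <= stay < 1.
    """
    short_mass = sum(explicit)
    if short_mass > 1.0 + 1e-12:
        raise ValueError("explicit probabilities sum to more than 1")
    if not 0.0 <= stay < 1.0:
        raise ValueError("stay must be in [0, 1)")
    if length < 1:
        return 0.0
    cutoff = len(explicit)
    if length <= cutoff:
        # Short intron: look the probability up in the explicit table.
        return explicit[length - 1]
    # Long intron: remaining probability mass, spread geometrically.
    tail_mass = 1.0 - short_mass
    return tail_mass * (1.0 - stay) * stay ** (length - cutoff - 1)


# Toy table for lengths 1..3; real intron tables would be far longer.
toy_explicit = [0.1, 0.2, 0.3]
p_len_5 = intron_length_prob(5, toy_explicit, stay=0.5)
```

A geometric length distribution is what a single HMM state with a self-loop produces, so the long tail costs no extra computation. Only the short, explicitly modeled lengths need the more expensive treatment.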

16
AUGUSTUS


17
AUGUSTUS

  • Extension of AUGUSTUS to include extrinsic
    information:
  • Protein sequences
  • EST sequences
  • Syntenic genomic sequences
  • User-defined constraints

18
Gene prediction by phylogenetic footprinting
  • Comparison of genomic sequences
    (human and mouse)

19
Gene prediction by phylogenetic footprinting

20
AUGUSTUS

  • Extended GHMM using extrinsic information
  • Additional input data: collection h of hints
    about possible gene structure f for sequence s
  • Consider s, f and h as the result of a random
    process. Define probability P(s,h,f)
  • Find parse f that maximizes P(f|s,h) for given s
    and h.

21
AUGUSTUS

  • Hints created using
  • Alignments to EST sequences
  • Alignments to protein sequences
  • Combined EST and protein alignment (EST
    alignments supported by protein alignments)
  • Alignments of genomic sequences
  • User-defined hints

22
AUGUSTUS



[Figure: EST aligned to sequence G1]
Alignment to EST: hint to (partial) exon
23
AUGUSTUS



[Figure: protein and EST aligned to sequence G1]
EST alignment supported by protein: hint to exon
(part), start codon
24
AUGUSTUS



[Figure: ESTs and protein aligned to sequence G1]
Alignment to ESTs, proteins: hints to introns,
exons
25
AUGUSTUS



[Figure: sequences G2 and G1 aligned]
Alignment of genomic sequences: hint to (partial)
exon
26
AUGUSTUS

  • Consider different types of hints
  • Types of hints: start, stop, dss, ass, exonpart,
    exon, introns
  • Each hint is associated with a position i in s
    (exons etc. are associated with their right end
    position)
  • At most one hint of each type is allowed per
    position in s
  • Each hint is associated with a grade g that
    indicates its source.

27
AUGUSTUS


  • h_{i,t}: information about the hint of type t at
    position i
  • h_{i,t} = [grade, strand, (length, reading frame)]
    if a hint is available (hints created by protein
    alignments contain information about the reading
    frame)
  • h_{i,t} is empty if no hint of type t is
    available at i
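The hint bookkeeping described on slides 26 and 27 maps naturally onto a small data structure keyed by position and type. The class and field names below are hypothetical and chosen for this sketch; they are not taken from AUGUSTUS. The list of hint types is copied from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hint types as listed on the slide.
HINT_TYPES = ("start", "stop", "dss", "ass", "exonpart", "exon", "introns")


@dataclass(frozen=True)
class Hint:
    grade: int                    # indicates the source of the hint
    strand: str                   # "+" or "-"
    length: Optional[int] = None  # optional, e.g. for exons
    frame: Optional[int] = None   # optional, e.g. from protein alignments


class HintCollection:
    """Stores at most one hint per (position, type) pair."""

    def __init__(self) -> None:
        self._hints: Dict[Tuple[int, str], Hint] = {}

    def add(self, position: int, hint_type: str, hint: Hint) -> None:
        if hint_type not in HINT_TYPES:
            raise ValueError(f"unknown hint type: {hint_type}")
        key = (position, hint_type)
        if key in self._hints:
            raise ValueError("at most one hint of each type per position")
        self._hints[key] = hint

    def get(self, position: int, hint_type: str) -> Optional[Hint]:
        # None plays the role of "no hint of type t available at i".
        return self._hints.get((position, hint_type))


hints = HintCollection()
hints.add(42, "exonpart", Hint(grade=1, strand="+", length=30, frame=0))
```

Segment-like hints such as exons would be stored under their right end position, following the convention stated on slide 26.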

28
AUGUSTUS
Standard program version, without hints
  • A T A A T G C C T A G T C (sequence s)
  • Z Z Z E E E E E E I I I I (parse f)
  • Find parse that maximizes P(f|s)



29
AUGUSTUS
AUGUSTUS using hints
  • A T A A T G C C T A G T C (sequence s)
  • [Hint tracks h (type 1), h (type 2), h (type 3), ...
    aligned to s; an X marks a position carrying a hint]
  • Z Z Z E E E E E E I I I I (parse f)
  • Find parse that maximizes P(f|s,h)



30
AUGUSTUS


As in standard HMM theory: maximize the joint
probability P(f,s,h). How to calculate P(f,s,h)?

31
AUGUSTUS


Simplifying assumption: hints of different types
t and at different positions i are independent of
each other (for redundant hints, ignore weaker
types).
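The formulas that accompanied this assumption are not preserved in the transcript. One way to write out its consequence, using the notation h_{i,t} from slide 27, is sketched below. The first line is the chain rule and holds in general. The second line reads the slide's assumption as conditional independence of the hints given f and s, which is an interpretation rather than something stated in the source.

```latex
\begin{align}
P(f, s, h) &= P(f, s)\, P(h \mid f, s) \\
           &= P(f, s) \prod_{i} \prod_{t} P(h_{i,t} \mid f, s)
\end{align}
```

Since P(s,h) does not depend on f, the parse that maximizes P(f,s,h) also maximizes P(f|s,h). Under this reading, each factor P(h_{i,t} | f, s) raises or lowers the score of a parse depending on whether its structure at position i agrees with the hint.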
32
AUGUSTUS


33
AUGUSTUS


34
AUGUSTUS
  • Results
  • Gene (sub-)structures supported by hints receive
    a bonus compared to non-supported structures
  • Gene (sub-)structures not supported by hints
    receive a malus (penalty)
  • (M. Stanke et al. 2006, BMC Bioinformatics)

35
AUGUSTUS

36
AUGUSTUS

  • Using hints from DIALIGN alignments
  • Obtain large human/mouse sequence pairs (up to
    50 kb) from UCSC
  • Run CHAOS to find anchor points
  • Run DIALIGN using CHAOS anchor points
  • Create hints h from DIALIGN fragments
  • Run AUGUSTUS with hints

37
AUGUSTUS

  • Hints from DIALIGN fragments
  • Consider fragments with score ≥ 20
  • Distinguish high scores (≥ 45) from low scores
  • Consider reading frame given by DIALIGN
  • Consider strand given by DIALIGN
  • ⇒ 2 × 2 × 2 = 8 grades

38
AUGUSTUS


  • EGASP: competition to evaluate and compare
    gene-prediction methods (Sanger Center, 2005)
  • AUGUSTUS was the best ab-initio method at EGASP

39
EGASP test results


40
EGASP test results


41
EGASP test results


42
EGASP test results


43
EGASP test results
44
Application of AUGUSTUS in genome projects

  • Brugia malayi (TIGR)
  • Aedes aegypti (TIGR)
  • Schistosoma mansoni (TIGR)
  • Tetrahymena thermophila (TIGR)
  • Galdieria sulphuraria (Michigan State Univ.)
  • Coprinus cinereus (Univ. Göttingen)
  • Tribolium castaneum (Univ. Göttingen)