Introduction to Bioinformatics

Slides: 19
Provided by: chiche

Transcript and Presenter's Notes

1
Introduction to Bioinformatics
  • Phylogenetics
  • Part III
  • Character-Based Methods

2
Distance Methods
  • Complexity
  • Distance-based methods much faster than other
    methods
  • Commonly used in multiple sequence alignment
  • UPGMA: PILEUP
  • Neighbor-joining: CLUSTALW (http://www.ebi.ac.uk/clustalw/)
  • Problems
  • Both UPGMA and neighbor-joining are greedy
    heuristics
  • Possible to be trapped in local optima (no
    backtracking)
  • Output is a single tree, even if there are many
    equal-cost alternatives
  • Big Picture only
  • May use as starting point
  • Tree generated provides upper bound for
    branch-and-bound
  • Initial tree for probabilistic branch-swapping
    techniques
  • Other approaches?

3
Character-Based Methods
  • Maximum Parsimony (MP) [Fitch 1971]
  • Minimize number of sequence changes in tree
  • Assume the tree with the fewest changes (mutations)
    is the most likely evolutionary history
  • Informative site
  • Position with useful change information (for
    parsimony)
  • I.e., number of changes at the position depends on
    the tree chosen
  • Must have at least 2 different bases / residues, such
    that each base / residue appears in at least 2 sequences
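
The informative-site rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function name `informative_sites` and the five-column alignment are made up for the example, and gaps or ambiguity codes are not handled.

```python
from collections import Counter

def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative columns.

    alignment: list of equal-length sequence strings (assumed gap-free).
    """
    sites = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        # Keep only the states that occur in at least two sequences.
        shared = [state for state, n in counts.items() if n >= 2]
        # Informative: at least two such states at this position.
        if len(shared) >= 2:
            sites.append(col)
    return sites

# Hypothetical alignment, invented for illustration.
alignment = ["AGGCT", "AGTCA", "AAGCT", "AATCC"]
print(informative_sites(alignment))
```

In this made-up alignment the constant columns and the column where only one base is shared are skipped, since they cost the same on every tree.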

4
MP Example
  • What are the informative sites in this example?
  • Build distance matrix

5
(No Transcript)
6
MP Example
  • Most parsimonious tree
  • Tree with fewest total number of changes at
    informative sites
  • Continue with our example
  • Informative sites
  • Seq1 GG
  • Seq2 GT
  • Seq3 AG
  • Seq4 AT
  • Site changed
  • Tree 1 __
  • Tree 2 __
  • Tree 3 __
  • Which tree?

7
MP Method
  • Algorithm
  • Generate all possible tree topologies
  • Count number of changes required
  • Select tree with minimum changes
  • Use branch-and-bound to reduce search
  • Search trees with increasing number of leaves
  • Abandon a subtree when its number of changes exceeds
    that of the best completed tree
  • Characteristics
  • Computationally expensive
  • Analyze only informative sites
  • Misleading if rates of change vary among
    branches
  • Evolution is not always parsimonious
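
To give a feel for why generating all topologies is expensive: the number of unrooted binary tree topologies on n leaves is (2n-5)!! (a standard counting result, not stated on the slide). A small sketch, with a hypothetical function name:

```python
def num_unrooted_trees(n_leaves):
    """Count unrooted binary topologies: (2n-5)!! = 1 * 3 * 5 * ... * (2n-5)."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Four leaves give only 3 topologies, but the count grows so fast that exhaustive search is impractical beyond a modest number of sequences, which is where branch-and-bound and heuristic starting trees come in.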

8
MP Method
  • Can infer ancestors
  • State set of an internal node
  • Intersection of its two children's sets, if it is not empty
  • Union of its two children's sets, otherwise
  • Number of unions = number of substitutions
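
The intersection / union rule can be sketched as a short recursive function. This is an illustrative sketch only: trees are written as nested pairs of leaf names (an assumed representation), and the three topologies below are labeled by their groupings because the transcript does not say which one the slides call Tree 1, 2 or 3. The sequences are the informative sites from the MP example.

```python
def fitch(tree, states):
    """Return (candidate state set, number of unions) for one site.

    tree:   a leaf name (str) or a pair (left, right) of subtrees
    states: dict mapping leaf name -> observed base at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra substitution needed here.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def tree_cost(tree, seqs):
    """Total number of changes over all sites for one topology."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )

seqs = {"Seq1": "GG", "Seq2": "GT", "Seq3": "AG", "Seq4": "AT"}
topologies = {
    "((Seq1,Seq2),(Seq3,Seq4))": (("Seq1", "Seq2"), ("Seq3", "Seq4")),
    "((Seq1,Seq3),(Seq2,Seq4))": (("Seq1", "Seq3"), ("Seq2", "Seq4")),
    "((Seq1,Seq4),(Seq2,Seq3))": (("Seq1", "Seq4"), ("Seq2", "Seq3")),
}
for label, tree in topologies.items():
    print(label, tree_cost(tree, seqs))
```

On these two sites the first two groupings need three changes each and the third needs four, so parsimony alone does not pick a single winner here.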

9
Tree Construction Issues
  • Selecting tree construction algorithm
  • If strong sequence similarity → maximum parsimony
  • If clearly recognizable sequence similarity →
    distance methods
  • Otherwise → maximum likelihood
  • Determining statistical significance
  • Multiple tree shapes possible
  • Find probability that tree shape is as described
  • Sample by bootstrapping [Efron & Tibshirani
    1993]
  • Generate artificial data set by repeatedly
    selecting random columns of alignment
    (pseudo-alignment) with replacement
  • Build tree for pseudo-alignments many (1000)
    times
  • Frequency with which a phylogenetic feature
    appears → confidence level
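
The bootstrap steps above might be sketched as follows. This is a rough outline under stated assumptions: `build_tree` and `has_feature` are hypothetical callables supplied by the caller (any tree-building method, and any test for the grouping of interest), and the function names are invented for this example.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_support(alignment, build_tree, has_feature,
                      replicates=1000, seed=0):
    """Fraction of replicate trees that contain the feature of interest.

    build_tree:  hypothetical callable, pseudo-alignment -> tree
    has_feature: hypothetical callable, tree -> bool
    """
    rng = random.Random(seed)
    hits = sum(
        bool(has_feature(build_tree(bootstrap_alignment(alignment, rng))))
        for _ in range(replicates)
    )
    return hits / replicates
```

Each pseudo-alignment has the same size as the original, and the returned fraction is the kind of value shown next to a branch in a tree with bootstrap values.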

10
Tree with Bootstrap Values
Source: http://fungal.genome.duke.edu/images/fungi_subset_tree.jpg
11
Phylogenetics Issues
  • Gene trees vs species trees
  • Gene duplication can complicate phylogenetic
    analysis
  • Paralogues (duplicated genes) do not fit in
    evolutionary tree
  • Choice of target sequence type
  • Ribosomal RNA (slowest change / mutation rate)
  • Use for very long-term evolutionary studies,
    spanning species boundaries and biological kingdoms
  • DNA / RNA (fastest change / mutation rate)
  • Use for short-term studies of closely-related
    species
  • Contains more evolutionary information than
    protein
  • Protein (medium change / mutation rate)
  • Use for wide species comparisons
  • More reliable alignment than DNA

12
Phylogenetics Summary
  • Phylogenetic prediction
  • Infer evolutionary relationships from shared
    features
  • May have application to sequence alignment,
    epidemiology
  • Phenotypic vs. genetic (i.e., molecular)
    characteristics
  • Phylogenetic trees
  • May be ultrametric and / or additive
  • Tree construction
  • Inexpensive distance-based (UPGMA,
    neighbor-joining)
  • Expensive (exhaustive) tree searches (parsimony,
    likelihood)
  • Assessing phylogenetic trees
  • Algorithms always produce some tree (of varying
    accuracy)
  • Expert biology knowledge to assess correctness /
    significance

13
Phenetics vs Cladistics
  • Phenetics uses all the data
  • Uses overall similarity to group taxa; the grouping is
    not necessarily evolutionary
  • Any kind of object could be subjected to a
    phenetic analysis 
  • Taxa that are more similar are grouped together 
  • Cladistics uses only informative sites
  • Taxa are grouped together based on patterns of
    sharing of derived character states
  • Taxa sharing a derived character state do so
    because they inherited this character state
    through a common ancestor
  • Advantages of the cladistic approach
  • Less susceptible to such rate variation
  • Shared, derived character states won't mislead
    you
  • Birds are dinosaurs: the cladistic perspective

14
Applications: Building Tree of Life
15
Applications: Building Tree of Life
Source: http://www.isem.univ-montp2.fr/PPP/PM/RES/Phylo/Mamm/PHYLMOL-Placentalia7EEnglish.jpg
16
Application - CSI
  • Which patients are more likely to be infected by
    the dentist?

Source: http://trc.ucdavis.edu/djbegun/Lect_12.1.html
17
Application - Human migration
  • Based on mtDNA genome
  • Africans have twice as much diversity among them
    as do non-Africans → Africans have a longer
    genetic history
  • More recent population expansion for non-Africans
  • Africans and non-Africans diverged recently
  • Out of Africa

Source: Discovering Genomics, Proteomics, and
Bioinformatics, by Campbell & Heyer
18
Software
  • http://evolution.genetics.washington.edu/phylip/software.html
  • 301 of the phylogeny packages
  • 39 free servers