Pairwise Sequence Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Pairwise Sequence Analysis

Slides: 13
Provided by: Chi66

Transcript and Presenter's Notes

Title: Pairwise Sequence Analysis


1
Pairwise Sequence Analysis
  • Why is it useful?
  • What are the underlying concepts?
  • What algorithms are used?
  • Which are the existing implementations?
  • Limitations/Open questions

2
Why
  • Given sequence A, what are its properties?
  • Find any sequence(s) B similar to A, where B is
    drawn from the set of sequences with known properties
  • Given sequences A and B, are they
      • Identical?
      • Similar?
      • Evolutionarily related?
  • Given sequence A, is this a new discovery?
  • Given a set of sequences, can they be clustered
    into families of related sequences?

3
Concepts
  • Use of a scoring scheme to assign a value to a
    candidate alignment
  • Finding the best alignment between two sequences
  • Use of a probability model to assess the
    significance of the similarity
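One widely used probability model for local alignment scores is the Karlin-Altschul statistic used by BLAST. The expected number of chance alignments scoring at least S between sequences of lengths m and n is E = K·m·n·e^(-λS), and the probability of at least one such hit is P = 1 - e^(-E). The sketch below only evaluates these formulas. K and λ depend on the scoring matrix and gap penalties, so the values used here are placeholders rather than published parameters, and the function names are hypothetical.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected number of chance local alignments scoring >= score.

    E = K * m * n * exp(-lam * score). K and lam must be derived for the
    specific scoring scheme; the values in the demo below are placeholders.
    """
    return K * m * n * math.exp(-lam * score)


def pvalue_from_evalue(e: float) -> float:
    """Probability of at least one chance hit, assuming a Poisson count: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e)


if __name__ == "__main__":
    K, lam = 0.1, 0.3      # hypothetical parameters, for illustration only
    m, n = 300, 1_000_000  # query length and database length in residues
    for s in (40, 60, 80):
        e = karlin_altschul_evalue(s, m, n, K, lam)
        print(f"score={s:3d}  E={e:.3g}  P={pvalue_from_evalue(e):.3g}")
```

With these placeholder parameters, raising the score from 40 to 80 lowers E by a factor of e^12 (about five orders of magnitude). For any fixed score, doubling the database length doubles E.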

4
Concepts - Alignment
  • Sequences may be aligned locally (look for
    regions/subsequences that are similar) or
    globally (align along entire length)
  • The result of a local alignment may differ from
    that of a global alignment
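To see how local and global results can differ, the sketch below computes the optimal global score (Needleman-Wunsch recurrence) and the optimal local score (Smith-Waterman recurrence, which floors every cell at zero) for the same pair under one simple linear-gap scheme. The scores (+2 match, -1 mismatch, -2 per gap) and the example sequences are arbitrary choices for illustration.

```python
def global_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefix of b against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]  # the global score must end in the bottom-right cell


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a local alignment start fresh at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])  # a local alignment may end anywhere
        prev = cur
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "TTACGTTT"), local_score("ACGT", "TTACGTTT"))
```

For `ACGT` against `TTACGTTT`, the local score is 8, which comes from the shared `ACGT` core. The global score is 0, because the global alignment must also pay for the four overhanging `T`s.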

5
Concepts - Alignment
  • Potentially large number of alignments to be
    considered
  • Each alignment is a path through a fully
    connected dot matrix graph, i.e., a subset of the
    set of all diagonal edges (aligned residue pairs),
    connected by horizontal or vertical edges (gaps).
    Complexity?
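The "Complexity?" question can be made concrete by counting paths. Suppose every path from the top-left to the bottom-right corner of the grid for sequences of lengths m and n may take a diagonal step (aligned pair), a horizontal step, or a vertical step (gap). The number of such paths is then the Delannoy number D(m, n), which obeys D(m, n) = D(m-1, n) + D(m, n-1) + D(m-1, n-1). This counting convention treats alignments that differ only in the order of adjacent gaps as distinct. The function name is hypothetical.

```python
def count_alignments(m: int, n: int) -> int:
    """Number of monotone paths from (0, 0) to (m, n) using horizontal,
    vertical and diagonal steps (the Delannoy number D(m, n))."""
    row = [1] * (n + 1)  # D(0, j) = 1: only horizontal steps are possible
    for _ in range(1, m + 1):
        new = [1] * (n + 1)  # D(i, 0) = 1: only vertical steps are possible
        for j in range(1, n + 1):
            # Arrive from above, from the left, or diagonally.
            new[j] = row[j] + new[j - 1] + row[j - 1]
        row = new
    return row[n]


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(k, count_alignments(k, k))
```

The count grows exponentially with sequence length, which is why the optimal methods rely on dynamic programming rather than enumerating every alignment.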

6
Concepts - Scoring scheme
  • Scheme may be based on identity or on similarity
  • A match is assigned a positive score
  • Gaps are assigned a penalty
  • Scores are summed over all aligned positions
  • The highest-scoring alignment(s) are presented
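The scheme above can be sketched as a function that walks an alignment column by column, sums the per-column scores, and then keeps the best of several candidate alignments. The values (+1 match, -1 mismatch, -2 gap), the helper name `score_alignment` and the example sequences are illustrative assumptions, not values from the slides.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of a pairwise alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            total += gap       # gap penalty
        elif x == y:
            total += match     # identical residues
        else:
            total += mismatch  # different residues
    return total


if __name__ == "__main__":
    # Two candidate alignments of GATTACA with GATACA.
    candidates = [("GATTACA", "GA-TACA"), ("GATTACA", "GATACA-")]
    for top, bottom in candidates:
        print(top, bottom, score_alignment(top, bottom))
    best = max(candidates, key=lambda pair: score_alignment(*pair))
    print("best:", best)  # ('GATTACA', 'GA-TACA'), score 4 versus -2
```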

7
Concepts - Scoring scheme
  • Scores are based on amino-acid substitution
    matrices (log-odds ratio of related versus random
    substitutions)
  • PAM (Point Accepted Mutation) matrices: based on
    an evolutionary model. E.g., PAM1, PAM250.
  • BLOSUM (Blocks Substitution Matrix): derived from
    conserved blocks of sequences clustered at a given
    percent identity. E.g., BLOSUM50, BLOSUM62.
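The log-odds idea behind both matrix families can be sketched directly. A substitution score compares p_ab, the frequency with which residues a and b are observed aligned in related sequences, against q_a·q_b, the frequency expected by chance: s(a, b) = (1/λ)·log(p_ab / (q_a·q_b)), rounded to an integer. In the sketch below, the frequencies are made-up numbers and the scale factor of 2 (scores in half-bit units) is an assumption. Real PAM and BLOSUM tables are estimated from curated alignments.

```python
import math


def log_odds_score(p_ab: float, q_a: float, q_b: float, scale: float = 2.0) -> int:
    """Integer substitution score in 1/scale-bit units.

    p_ab: frequency of the pair (a, b) in alignments of related sequences
    q_a, q_b: background frequencies of residues a and b
    Positive: pair seen more often than chance; negative: less often.
    """
    return round(scale * math.log2(p_ab / (q_a * q_b)))


if __name__ == "__main__":
    q = {"E": 0.05, "D": 0.05, "W": 0.01}  # hypothetical background frequencies
    print(log_odds_score(0.0075, q["E"], q["D"]))  # 3: pair seen 3x more often than chance
    print(log_odds_score(0.0001, q["E"], q["W"]))  # -5: pair seen 5x less often than chance
```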

8
Concepts - Scoring scheme
  • Identity-based scoring
        Isawa---dancer
        Isawatapdancer

        Isawa-----danc-er
        Isawablackpanther
  • Similarity-based scoring
        EYL
        DKV
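The identity-based examples above can be checked mechanically. The sketch counts the columns where both rows carry the same character, the columns with different characters, and the columns containing a gap. The function name `identity_stats` is hypothetical; it simply restates identity-based scoring in code.

```python
def identity_stats(top: str, bottom: str) -> tuple[int, int, int]:
    """Return (identities, mismatches, gap columns) for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    identities = mismatches = gaps = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identities += 1
        else:
            mismatches += 1
    return identities, mismatches, gaps


if __name__ == "__main__":
    print(identity_stats("Isawa---dancer", "Isawatapdancer"))        # (11, 0, 3)
    print(identity_stats("Isawa-----danc-er", "Isawablackpanther"))  # (9, 2, 6)
```

The second pair still has 9 identical positions, a reminder that identical positions can also arise by chance.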

9
Algorithms
  • Optimal/exact solutions
      • Take longer to run
      • Typically used for comparing a small number of
        sequences
      • (Needleman-Wunsch, Smith-Waterman)
  • Heuristic solutions
      • Frequently close to the optimal solution
      • Rapid
      • Typically used for searching large numbers of
        sequences
      • (BLAST, FASTA)
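Heuristic tools such as BLAST and FASTA avoid filling the full dynamic-programming matrix. They first look for short exact word matches ("seeds") between the query and a database sequence, and only then extend or rescore around promising regions. The sketch below shows only a simplified seeding step, with an assumed word length of k = 3, plus a grouping of seeds by diagonal in the spirit of FASTA. Real BLAST also uses neighbourhood words scored against a substitution matrix, and both tools add extension and statistics steps that are not shown here. The function names are hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word occurs in both."""
    index = defaultdict(list)  # word -> start positions in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


def diagonal_counts(seeds: list[tuple[int, int]]) -> dict[int, int]:
    """Count seeds per diagonal (subject_pos - query_pos); a dense diagonal
    marks a region worth aligning properly."""
    counts = defaultdict(int)
    for i, j in seeds:
        counts[j - i] += 1
    return dict(counts)


if __name__ == "__main__":
    seeds = find_seeds("ACGTAC", "TTACGTACTT")
    print(seeds)                   # [(3, 1), (0, 2), (1, 3), (2, 4), (3, 5)]
    print(diagonal_counts(seeds))  # {-2: 1, 2: 4}
```

Four of the five seeds fall on diagonal 2, which is where the shared `ACGTAC` stretch lies. The single hit on diagonal -2 is an isolated chance match.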

10
Algorithms - Optimal solution
  • Dynamic programming
      • Generate a matrix of optimal sub-scores
      • Keep a record of how each sub-score was derived
      • Trace back the maximum-scoring path through the
        matrix, using the sub-scores and the record of how
        they were derived
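The three steps map directly onto a global (Needleman-Wunsch) implementation. The code fills a score matrix, stores a pointer in each cell recording which neighbour produced its value, and then follows the pointers back from the bottom-right corner. This is a minimal sketch with a linear gap penalty and assumed scores (+1 match, -1 mismatch, -1 gap). Ties are broken in a fixed order (diagonal, then up, then left), so only one of possibly several optimal alignments is returned.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty.

    Returns (score, top_row, bottom_row); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # Step 1: F[i][j] holds the optimal sub-score for a[:i] versus b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Step 2: ptr[i][j] records how F[i][j] was derived:
    # "D" = pair a[i-1] with b[j-1], "U" = gap in b, "L" = gap in a.
    ptr = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            options = [
                (F[i - 1][j - 1] + s, "D"),
                (F[i - 1][j] + gap, "U"),
                (F[i][j - 1] + gap, "L"),
            ]
            # max() returns the first of tied options: D, then U, then L.
            F[i][j], ptr[i][j] = max(options, key=lambda t: t[0])
    # Step 3: follow the recorded pointers back from the bottom-right cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "D":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif move == "U":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
    print(score)  # 0 with these scores
    print(top)
    print(bottom)
```

For a local (Smith-Waterman) variant, the usual changes are to floor each cell at zero and to start the traceback from the highest-scoring cell rather than the bottom-right corner.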

11
Existing Implementations
  • Smith-Waterman http://www.ch.embnet.org/software/FDF_form.html
  • SIM http://us.expasy.org/tools/sim-prot.html
  • BLAST http://www.ncbi.nlm.nih.gov/BLAST/
  • FASTA http://www.ebi.ac.uk/fasta33/

12
Open Questions/Limitations
  • Highly similar and highly dissimilar sequences
    are easily identified, BUT
  • The gray area lies in weak similarity: the
    probability distribution is continuous, so there is
    no sharp cutoff
  • Other forms of similarity (e.g., structural
    similarity) are taken into consideration to reach
    a decision
  • Sequence similarity per se does not imply
    functional equivalence: a small difference can be
    responsible for a large difference in function