Sequence Based Analysis Tutorial

1
Sequence Based Analysis Tutorial
  • NIH Proteomics Workshop
  • Lai-Su L. Yeh, Ph.D.
  • Protein Science Team Lead
  • Protein Information Resource at
  • Georgetown University Medical Center

2
Retrieval, Sequence Search & Classification
Methods
  • Retrieve protein info by text / UID
  • Sequence Similarity Search
  • BLAST, FASTA, Dynamic Programming
  • Family Classification
  • Patterns, Profiles, Hidden Markov Models,
    Sequence Alignments, Neural Networks
  • Integrated Search and Classification System

3
Sequence Similarity Search
  • Based on Pair-Wise Comparisons
  • Dynamic Programming Algorithms
  • Global Similarity: Needleman-Wunsch
  • Local Similarity: Smith-Waterman
  • Heuristic Algorithms
  • FASTA: Based on K-Tuples (2-Amino Acid)
  • BLAST: Triples of Conserved Amino Acids
  • Gapped-BLAST: Allow Gaps in Segment Pairs
  • PHI-BLAST: Pattern-Hit Initiated Search
  • PSI-BLAST: Position-Specific Iterated Search
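
The dynamic programming idea behind local similarity can be illustrated with a minimal Smith-Waterman sketch. The scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration; real searches use a substitution matrix and usually affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not values from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman("ACDEFG", "XXCDEYY"))
```

The floor of 0 in the recurrence is what makes the alignment local. A global (Needleman-Wunsch) alignment drops that floor, initializes the first row and column with cumulative gap penalties, and reads the score from the bottom-right cell.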

4
Sequence Similarity Search
  • Similarity Search Parameters
  • Scoring Matrices: Based on Conserved Amino Acid
    Substitution
  • Dayhoff Mutation Matrix, e.g., PAM250 (20%
    Identity)
  • Henikoff Matrix from Ungapped Alignments, e.g.,
    BLOSUM 62
  • Gap Penalty
  • Search Time Comparisons
  • Smith-Waterman: 10 Min
  • FASTA: 2 Min
  • BLAST: 20 Sec

5
Feature Representation
  • Features: Residue Physicochemical Properties,
    Context (Local & Global) Features, Evolutionary
    Features
  • Alternative Alphabets: Classification of Amino
    Acids To Capture Different Features of Amino Acid
    Residues
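
As a rough illustration of an alternative alphabet, the sketch below recodes a protein sequence into four physicochemical classes. The grouping and the one-character class codes are assumptions made for this example; the slides do not specify which classification was used.

```python
# Hypothetical four-class grouping of the 20 standard amino acids.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar / small
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}

# Invert the grouping into a residue -> class-code lookup table.
REDUCED = {aa: code for code, members in GROUPS.items() for aa in members}


def reduce_alphabet(seq):
    """Recode a protein sequence; unknown symbols become 'x'."""
    return "".join(REDUCED.get(aa, "x") for aa in seq.upper())


print(reduce_alphabet("MKDS"))
```

Sequences recoded this way can be compared on the chosen property alone, so two residues that differ in identity but share a class count as the same symbol.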

6
Substitution Matrix
  • Likelihood of One Amino Acid Mutated into Another
    Over Evolutionary Time
  • Negative Score: Unlikely to Happen (e.g.,
    Gly/Trp, -7)
  • Positive Score: Conservative Substitution (e.g.,
    Lys/Arg, +3)
  • High Score for Identical Matches: Rare Amino
    Acids (e.g., Trp, Cys)
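
The sign conventions above follow from the usual log-odds form of a substitution score. The symbols below are notation introduced here for illustration: q_ab is the frequency with which residues a and b are observed aligned in related sequences, p_a and p_b are their background frequencies, and lambda is a scaling constant.

```latex
s(a,b) \;=\; \frac{1}{\lambda}\,\log\frac{q_{ab}}{p_a\,p_b}
```

A pair seen more often than chance (q_ab > p_a p_b) scores positive, and a pair seen less often scores negative. For an identical match of a rare residue such as Trp, the denominator p_a^2 is small, which is why such matches receive high scores.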

7
BLAST
  • BLAST (Basic Local Alignment Search Tool)
  • Extremely fast
  • Robust
  • Most frequently used
  • It finds very short segment pairs (seeds)
    between the query and the database sequence
  • These seeds are then extended in both directions
    until the maximum possible score for extensions
    of this particular seed is reached
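
The seed-and-extend idea can be sketched as follows. This is a simplified illustration, not BLAST itself: seeds here are exact k-letter matches, scoring is a plain match/mismatch scheme, and extension stops once the running score falls `x_drop` below the best seen. All function names and parameter values are assumptions for the example.

```python
def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where identical k-mers occur."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def _extend(query, subject, i, j, step, match, mismatch, x_drop):
    """Walk from (i, j) in direction step; return (best_gain, best_length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop:
            break  # score dropped too far below the best; stop extending
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (score, aligned query segment).
    """
    right, r_len = _extend(query, subject, i + k, j + k, 1, match, mismatch, x_drop)
    left, l_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, x_drop)
    return k * match + left + right, query[i - l_len:i + k + r_len]


query, subject = "MKTCDEFGW", "PPTCDEFGY"
print(find_seeds(query, subject))
print(extend_seed(query, subject, 3, 3))
```

Only positions that share a seed are ever extended, which is where the speed advantage over full dynamic programming comes from.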

8
BLAST Search
  • From BLAST Search Interface
  • Table-Format Result with BLAST Output and SSEARCH
    (Smith-Waterman) Pair-Wise Alignment

9
BLAST/SSEARCH Results
10
Family Classification Methods
  • Based on Family Information
  • ClustalW Multiple Sequence Alignment
  • ProSite Pattern Search
  • Profile Search
  • Hidden Markov Models (HMMs)
  • Neural Networks
  • Integrated Analysis

11
Multiple Sequence Alignment
  • ClustalW
  • Progressive Pairwise Approach
  • Based on Exhaustive Pairwise Alignments
  • Neighbor Joining
  • Joining Order Corresponding to a Tree
  • Alignment Varies
  • Dependent on Joining Order

12
How do you build a tree?
  • Pick sequences to align
  • Align them
  • Verify the alignment
  • Keep the parts that are aligned correctly
  • Build and evaluate a phylogenetic tree

13
Multiple Alignment and Tree
  • From Text/Sequence Search Result or ClustalW
    Alignment Interface

14
(No Transcript)
15
Motif Patterns (Regular Expressions)
  • Signature Patterns for Functional Motifs

ProClass Motif Alignments
16
PIR Pattern Search
  • From Text/Sequence Search Result or Pattern
    Search Interface
  • One Query Sequence Against PROSITE Pattern
    Database
  • One Query Pattern (PROSITE or User-Defined)
    Against Sequence DB
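
Since PROSITE patterns are essentially regular expressions, a one-sequence-versus-one-pattern search can be sketched by translating the pattern syntax into a Python regex. The helper names are hypothetical, and the translation covers only the common elements (`x`, `[..]`, `{..}`, `(n)` or `(n,m)` repeats, and the `<` / `>` anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # {P}  -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")   # x(2) -> x{2}
    regex = regex.replace("x", ".")                     # any residue
    return regex.replace("<", "^").replace(">", "$")    # terminus anchors


def pattern_search(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


# N-glycosylation site pattern, scanned against a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(pattern_search("N-{P}-[ST]-{P}.", "MNGTAANPSC"))
```

Searching one sequence against a whole pattern database amounts to looping `pattern_search` over every pattern; searching one pattern against a sequence database loops over the sequences instead.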

17
Pattern Search Result (I)
  • One Query Sequence Against PROSITE Pattern
    Database

18
Pattern Search Result (II)
  • One Query Pattern Against Sequence Database

19
Profile Method
  • Profile: A Table of Scores to Express Family
    Consensus Derived from Multiple Sequence
    Alignments
  • Num of Rows = Num of Aligned Positions
  • Each row contains a score for the alignment with
    each possible residue.
  • Profile Searching
  • Summation of Scores for Each Amino Acid Residue
    along Query Sequence
  • Higher Match Values at Conserved Positions
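
A minimal sketch of the profile idea, under stated assumptions: scores are log-odds values computed from column counts with a pseudocount, the background is uniform over the 20 amino acids, and the search is ungapped. Real profile methods use weighted sequences, realistic backgrounds, and position-specific gap penalties.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build one row per aligned column; each row maps residue -> score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]  # skip gap symbols
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, query):
    """Sum row scores along each query window; return (score, start) of the best.

    Returns None if the query is shorter than the profile.
    """
    width = len(profile)
    scored = [
        (sum(profile[k].get(query[start + k], 0.0) for k in range(width)), start)
        for start in range(len(query) - width + 1)
    ]
    return max(scored) if scored else None


profile = build_profile(["ACDW", "ACEW", "ACDW"])
print(best_window(profile, "MKACDWLL"))
```

In this toy alignment the fully conserved columns (A, C, W) yield higher match scores than the mixed D/E column, mirroring the point about conserved positions.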

20
PIR HMM Domain/Motif Search
  • From Text/Sequence Search Result or HMM Search
    Interface
  • HMMER: Model Building & Sequence Search
  • Search One Query Protein Against All HMMs
  • Search One HMM Against Sequence DB

21
HMM Search Result (I)
  • One Query Protein Against All Pfam HMMs

22
HMM Search Result (II)
  • Search User-Built HMM Against Protein Sequence DB
  • Input Sequences (Optional Residue Ranges) ->
    Multiple Sequence Alignment -> Model Building ->
    HMM Search

23
Secondary Structure Features
  • α Helix: Patterns of Hydrophobic Residue
    Conservation Showing I, I+3, I+4, I+7 Pattern Are
    Highly Indicative of an α Helix (Amphipathic)
  • β Strands That Are Half Buried in the Protein
    Core Will Tend to Have Hydrophobic Residues at
    Positions I, I+2, I+4, I+6
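
The positional patterns above can be checked with a short scan. This sketch is a simplification: it looks at a single sequence rather than at conservation across an alignment, and the set of residues treated as hydrophobic is an assumption.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def pattern_hits(seq, offsets):
    """Return start positions i where every i+offset residue is hydrophobic."""
    span = max(offsets)
    return [
        i for i in range(len(seq) - span)
        if all(seq[i + o] in HYDROPHOBIC for o in offsets)
    ]


print(pattern_hits("LKKLLKKL", HELIX_OFFSETS))
print(pattern_hits("VKVKVKV", STRAND_OFFSETS))
```

A hit is only a hint of amphipathic periodicity at that position; it is not a secondary structure prediction on its own.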

24
Integrated Bioinformatics System for Function and
Pathway Discovery
  • Data Integration
  • Associative Analysis

25
Analytical Pipeline
26
Integrated Bioinformatics System
  • Global Bioinformatics Analysis of 1000s of Genes
    and Proteins
  • Pathway Discovery, Target Identification

27
Lab Section
28
Peptide Search Results
29
Blast Similarity Search
30
Blast Search Results
31
Pair-Wise Alignment
32
Multiple Sequence Alignment
33
Pattern Search Results
34
HMM Domain Search Result
35
Building HMM Profile
36
Using HMM Profile for Searching
37
Rabbit Alpha Crystallin A Chain: An iProClass View
of the Entry
38
alpha-Crystallin and Related Proteins