Sequence Based Analysis Tutorial - PowerPoint PPT Presentation

Slides: 52
Provided by: wuc
Transcript and Presenter's Notes

1
Sequence Based Analysis Tutorial
  • NIH Proteomics Workshop
  • Lai-Su L. Yeh, Ph.D.
  • Protein Information Resource at
  • Georgetown University Medical Center

2
Retrieval, Sequence Search & Classification
Methods
  • Retrieve protein info by text / UID
  • Sequence Similarity Search
  • BLAST, FASTA, Dynamic Programming
  • Family Classification
  • Patterns, Profiles, Hidden Markov Models,
    Sequence Alignments, Neural Networks
  • Integrated Search and Classification System

3
Sequence Similarity Search (I)
  • Based on Pair-Wise Comparisons
  • Dynamic Programming Algorithms
  • Global Similarity: Needleman-Wunsch
  • Local Similarity: Smith-Waterman
  • Heuristic Algorithms
  • FASTA: Based on K-Tuples (2-Amino Acid)
  • BLAST: Triples of Conserved Amino Acids
  • Gapped-BLAST: Allow Gaps in Segment Pairs
  • PHI-BLAST: Pattern-Hit Initiated Search
  • PSI-BLAST: Position-Specific Iterated Search
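
The two dynamic programming methods named on this slide can be sketched in a few lines. The version below is a minimal illustration, not the implementation used by any PIR tool: it assumes a simple match/mismatch score and a linear gap penalty (the parameter values are arbitrary), and it returns only the optimal score, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The only difference between the two functions is the zero floor and the running maximum in the local version, which is what lets Smith-Waterman ignore poorly matching flanks.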

4
Sequence Similarity Search (II)
  • Similarity Search Parameters
  • Scoring Matrices: Based on Conserved Amino Acid
    Substitution
  • Dayhoff Mutation Matrix, e.g., PAM250 (20%
    Identity)
  • Henikoff Matrix from Ungapped Alignments, e.g.,
    BLOSUM 62
  • Gap Penalty
  • Search Time Comparisons
  • Smith-Waterman: 10 Min
  • FASTA: 2 Min
  • BLAST: 20 Sec

5
Feature Representation
  • Features of Amino Acids: Physicochemical
    Properties, Context (Local & Global) Features,
    Evolutionary Features
  • Alternative Amino Acids: Classification of Amino
    Acids To Capture Different Features of Amino Acid
    Residues

6
Substitution Matrix
  • Likelihood of One Amino Acid Mutated into Another
    Over Evolutionary Time
  • Negative Score: Unlikely to Happen (e.g.,
    Gly/Trp, -7)
  • Positive Score: Conservative Substitution (e.g.,
    Lys/Arg, 3)
  • High Score for Identical Matches: Rare Amino
    Acids (e.g., Trp, Cys)
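
As a rough illustration of how such a matrix is used, the sketch below scores an ungapped pair of equal-length segments by summing per-position substitution scores. The Gly/Trp and Lys/Arg entries are the values quoted on the slide; the identity scores are assumptions added only so the example runs, and the tiny dictionary is a stand-in for a full 20 x 20 matrix.

```python
# Tiny stand-in for a substitution matrix (NOT a complete PAM250 table).
SCORES = {
    ("G", "W"): -7,  # unlikely substitution, value from the slide
    ("K", "R"): 3,   # conservative substitution, value from the slide
    ("W", "W"): 17,  # assumed: high identity score for a rare residue
    ("C", "C"): 12,  # assumed: high identity score for a rare residue
    ("G", "G"): 5,   # assumed identity score
    ("K", "K"): 5,   # assumed identity score
}


def pair_score(x, y):
    """Look up a score; the matrix is symmetric, so try both orders."""
    if (x, y) in SCORES:
        return SCORES[(x, y)]
    return SCORES[(y, x)]


def ungapped_score(a, b):
    """Sum substitution scores over two aligned segments of equal length."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(pair_score(x, y) for x, y in zip(a, b))
```

For example, `ungapped_score("GKW", "WRW")` adds the three entries -7, 3 and 17.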

7
BLAST
  • BLAST (Basic Local Alignment Search Tool)
  • Extremely fast
  • Robust
  • Most frequently used
  • It finds very short segment pairs (seeds)
    between the query and the database sequence
  • These seeds are then extended in both directions
    until the maximum possible score for extensions
    of this particular seed is reached
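
The seed-and-extend idea described above can be sketched as follows. This is a deliberately simplified illustration, not the BLAST algorithm itself: it assumes exact word matches as seeds (BLAST also accepts similar words that score above a threshold), a match/mismatch score instead of a substitution matrix, and an arbitrary drop-off cut-off `x_drop` to decide when an extension stops. All function and parameter names are hypothetical.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) for every exact shared word of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (score, q_start, q_end), with q_end exclusive.
    """
    def walk(qi, sj, step):
        # Walk outward from the seed, remembering the best running score
        # and giving up once the score falls more than x_drop below it.
        best = run = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            run += match if query[qi] == subject[sj] else mismatch
            n += 1
            if run > best:
                best, best_len = run, n
            elif best - run > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right, r_len = walk(i + w, j + w, 1)
    left, l_len = walk(i - 1, j - 1, -1)
    return w * match + left + right, i - l_len, i + w + r_len
```

A typical use is to call `find_seeds` once and then `extend` each seed, keeping the highest-scoring segment pair.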

8
BLAST Search
  • From BLAST Search Interface
  • Table-Format Result with BLAST Output and SSEARCH
    (Smith-Waterman) Pair-Wise Alignment

Link to NCBI taxonomy
Link to PIRSF report
Click to see alignment
Links to iProClass and UniProtKB reports
Click to see SSearch alignment
9
Blast Result Pairwise Alignment
BLAST Alignment
10
Classification
  • What is classification?
  • Why do we need protein classification?
  • Different levels of classification
  • Basis for functional protein classification
  • How to classify a protein of unknown function?

11
Classification Databases
  • Protein motif
  • Protein domain
  • 3-D structure
  • Whole-protein

12
Family Classification Methods
  • Based on Other Classification Information
  • Multiple Sequence Alignment (ClustalW)
  • ProSite Pattern Search
  • Profile Search
  • Hidden Markov Models (HMMs)
  • Domain (Pfam) & Whole protein (PIRSF)
  • Neural Networks

13
How do you build a tree?
  • Pick sequences to align
  • Align them
  • Verify the alignment
  • Keep the parts that are aligned correctly
  • Build and evaluate a phylogenetic tree
  • Integrated Analysis

14
Multiple Sequence Alignment
  • ClustalW
  • Progressive Pairwise Approach
  • Based on Exhaustive Pairwise Alignments
  • Neighbor Joining
  • Joining Order Corresponding to a Tree
  • Alignment Varies
  • Dependent on Joining Order
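
To make the joining-order step concrete, the sketch below shows how neighbor joining picks the first pair of sequences to join from a matrix of pairwise distances. It covers only the pair selection (the Q criterion), not the full tree construction or the progressive alignment that ClustalW builds on top of it, and the distance matrix in the usage note is made up for illustration.

```python
def nj_pick_pair(names, dist):
    """Return ((name_i, name_j), q) for the pair minimising the NJ criterion.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(names)
    totals = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


# Hypothetical symmetric distance matrix for five sequences.
NAMES = ["a", "b", "c", "d", "e"]
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
```

With these distances, `nj_pick_pair(NAMES, DIST)` selects a and b even though d and e are the closest pair in raw distance, which shows why the joining order (and therefore the alignment) depends on the whole matrix rather than on single pairwise values.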

15
Multiple Alignment and Tree
  • From Text/Sequence Search Result or ClustalW
    Alignment Interface

16
(No Transcript)
17
Motif Patterns (Regular Expressions)
  • Signature Patterns for Functional Motifs

ProClass Motif Alignments
18
PIR Pattern Search
  • From Text/Sequence Search Result or Pattern
    Search Interface
  • One Query Sequence Against PROSITE Pattern
    Database
  • One Query Pattern (PROSITE or User-Defined)
    Against Sequence DB
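
A PROSITE-style pattern can be searched against a sequence by translating it into an ordinary regular expression. The sketch below is a minimal, assumed translation covering the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors); it is not the PIR pattern search code and it does not handle rarer forms such as an anchor inside square brackets.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.groups()
        if core == "x":               # any residue
            core = "."
        elif core.startswith("{"):    # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motifs(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For example, the N-glycosylation site pattern `N-{P}-[ST]-{P}` becomes the regex `N[^P][ST][^P]`. Scanning one sequence against many patterns, or one pattern against many sequences, is then a loop over `find_motifs`.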

19
Pattern Search Result (I)
  • One Query Sequence Against PROSITE Pattern
    Database

20
Pattern Search Result (II)
  • One Query Pattern Against Sequence Database

21
Profile Method
  • Profile: A Table of Scores to Express Family
    Consensus Derived from Multiple Sequence
    Alignments
  • Num of Rows = Num of Aligned Positions
  • Each row contains a score for the alignment with
    each possible residue.
  • Profile Searching
  • Summation of Scores for Each Amino Acid Residue
    along Query Sequence
  • Higher Match Values at Conserved Positions
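
The profile method outlined above can be sketched as a table with one row per aligned position and one score per residue, followed by a sliding-window summation along the query. The log-odds scoring, the uniform background frequency and the pseudocount used here are assumptions chosen to keep the example short; real profile tools use more elaborate weighting and also model gaps.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build one row per aligned column; each row maps residue -> log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, skipping gap characters.
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        row = {
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        }
        profile.append(row)
    return profile


def best_window(profile, query):
    """Slide the profile along the query; return (score, start) of the best window.

    Assumes the query is at least as long as the profile and contains only
    the 20 standard residues.
    """
    width = len(profile)
    scored = [
        (sum(row[res] for row, res in zip(profile, query[s:s + width])), s)
        for s in range(len(query) - width + 1)
    ]
    return max(scored)
```

Columns where one residue dominates the alignment give that residue a higher score than columns with mixed content, which is the "higher match values at conserved positions" behaviour noted on the slide.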

22
PIRSF scan
Shows PIRSF that the query belongs to
  • Search One Query Protein Against All the
    Full-length and Domain HMMs for the Fully
    Curated PIRSFs Using HMMER
  • The matched regions and statistics will be
    displayed.

Statistical data for all domains
Statistical data per domain
Alignment with consensus sequence
23
Secondary Structure Features
  • α Helix: Patterns of Hydrophobic Residue
    Conservation Showing an i, i+3, i+4, i+7 Pattern Are
    Highly Indicative of an α Helix (Amphipathic)
  • β Strands That Are Half Buried in the Protein
    Core Will Tend to Have Hydrophobic Residues at
    Positions i, i+2, i+4, i+6
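
The two periodicity rules above can be checked mechanically. The sketch below reports every start position i at which all residues at the given offsets are hydrophobic. The set of residues treated as hydrophobic is an assumption (several groupings are in use), and the check runs on a single sequence, whereas the slide refers to conservation patterns across an alignment.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: one of several possible groupings

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def periodic_hits(sequence, offsets):
    """Return start positions i where every residue at i + offset is hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(sequence) - span)
        if all(sequence[i + k] in HYDROPHOBIC for k in offsets)
    ]
```

Calling `periodic_hits(seq, HELIX_OFFSETS)` and `periodic_hits(seq, STRAND_OFFSETS)` on the same sequence gives a quick, crude indication of which pattern fits better.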

24
3D Structure
Proteins share the same fold, suggesting homology
Beta B1 Crystallin
Gamma Crystallin C
25
Creation and Curation of PIRSFs
26
Integrated Bioinformatics System for Function and
Pathway Discovery
  • Data Integration
  • Associative Analysis

27
Analytical Pipeline
28
Integrated Bioinformatics System
  • Global Bioinformatics Analysis of 1000s of Genes
    and Proteins
  • Pathway Discovery, Target Identification

29
Lab Section
30
Text Search
31
Text Search Result (I)
Extend your search or start over
Choose columns to be displayed
Expand view
Pre-computed BLAST Results
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
32
Text Search Result (III)
Number of Related Seq. at 3 different E-value
cut-offs
33
Text Search Result (II)
Extend your search or start over
Choose columns to be displayed
Curated domain architecture with links to
Pfam database
Link to PIRSF report
Extent of family curation
34
Peptide Search
35
Peptide Search Results
36
Batch Retrieval Results (I)
Retrieve more sequences
37
Batch Retrieval Results (II)
38
Blast Similarity Search
39
Blast Search Results
40
Blast / Related Sequences Results
41
Blast Result Pairwise Alignment
BLAST Alignment
42
Pairwise Alignment
43
Multiple Alignment: Interactive Phylogenetic Tree
and Alignment
44
Phylogenetic Tree and Alignment View
45
Pattern Search (I)
46

Pattern Search (II)
47
PIRSF scan
48
PIRSF Report
49
PIRSF Family Hierarchy
50
Taxonomic Distribution & Phylogenetic Pattern
51
Rabbit Alpha Crystallin A Chain: An iProClass View
of the Entry
Pre-computed BLAST results
See protein synonyms
See IDs from different databases
52
alpha-Crystallin and Related Proteins