Pfam: multiple sequence alignments and HMM-profiles of protein domains - PowerPoint PPT Presentation

Pfam: multiple sequence alignments and HMM-profiles of protein domains Xianhui Li 03-02-2004 – PowerPoint PPT presentation

Slides: 14
Provided by: Com373
Learn more at: http://www.sb.fsu.edu
Transcript and Presenter's Notes


1
Pfam multiple sequence alignments and
HMM-profiles of protein domains
  • Xianhui Li
  • 03-02-2004

2
Outline
  • What is Pfam?
  • What is a hidden Markov model (the methodology
    underlying Pfam)?
  • How to use Pfam and sample output

3
Pfam
  • Pfam is a database of multiple alignments of
    protein domains or conserved protein regions.
  • The alignments represent evolutionarily
    conserved structure, which has implications for
    the protein's function.
  • Profile hidden Markov models (profile HMMs) built
    from the Pfam alignments can be very useful for
    automatically recognizing that a new protein
    belongs to an existing protein family, even if
    the homology is weak.

4
Overview of Pfam Database
  • Pfam A: contains curated families, each with an
    associated profile HMM that can be used for
    alignment and database searching
  • Annotation: contains several compulsory fields
  • Seed alignment: a manually verified multiple
    alignment of a representative set of sequences
  • HMM profile: turns a multiple sequence
    alignment into a position-specific scoring
    system
  • Full alignment: generated automatically from the
    seed HMM profile by searching Swissprot for all
    detectable members and aligning them to the HMM
    profile
  • Pfam B: families are clustered automatically,
    allowing Pfam to be comprehensive
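
The idea of turning a seed alignment into a position-specific scoring system can be illustrated with a minimal sketch. It is not how Pfam or HMMER builds a profile HMM: it ignores insert and delete states and transition probabilities, and only scores residues column by column. The toy alignment, the uniform background frequency, and the pseudocount of 1 are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy seed alignment (hypothetical sequences, not taken from Pfam); '-' is a gap.
SEED = ["ACDE", "ACDE", "AC-E", "GCDE"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies


def column_scores(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    scores = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r != "-")
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        scores.append({
            r: math.log2((counts[r] + pseudocount) / total / BACKGROUND)
            for r in ALPHABET
        })
    return scores


def score_sequence(seq, scores):
    """Sum the per-column scores of an ungapped sequence of the same length."""
    return sum(col[r] for r, col in zip(seq, scores))


PSSM = column_scores(SEED)
```

Calling `score_sequence("ACDE", PSSM)` gives a higher score than `score_sequence("WWWW", PSSM)`, because every column rewards the residues seen in the seed alignment and penalizes the rest.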

5
Pfam Sequence Database Coverage
[Figure: Pfam coverage of the sequence database, shown per residue and per sequence]
Data shown are from Pfam v2.0 (1998), with 527
families. The current version, Pfam 12.0 (January
2004), contains alignments and models for 7316
protein families, based on the Swissprot 42.5 and
SP-TrEMBL 25.6 protein sequence databases.

6
Markov Model
  • Simplest example: each state emits (or,
    equivalently, recognizes) a particular element
    with probability 1.

Example sequences: 1234, 234, 14, 121214, 2123334

7
Probabilistic Emission
  • If we let the states define a set of emission
    probabilities for elements, we can no longer be
    sure which state we are in given a particular
    element of a sequence: BCCD or BCCD?

8
Hidden Markov Models (HMM)
  • Emission uncertainty means the sequence doesn't
    identify a unique path. The states are
    hidden.
  • The probability of a sequence is the sum over
    all paths that can produce it.

p(bccd) = 0.5 × 0.2 × 0.1 × 0.3 × 0.75 × 0.6 × 0.8 × 0.9
        + 0.5 × 0.7 × 0.75 × 0.6 × 0.2 × 0.6 × 0.8 × 0.9
        = 0.000972 + 0.013608
        = 0.01458

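
The arithmetic for p(bccd) can be checked with a short sketch. It assumes the sixteen factors on the slide split into two paths of eight factors each, in the order listed; that grouping is inferred from the two intermediate totals, and the slide text does not say which factors are transition probabilities and which are emission probabilities.

```python
import math

# Factors along each of the two paths that can emit "bccd", in slide order.
# Assumption: the first eight numbers belong to path 1, the last eight to path 2.
PATH_1 = [0.5, 0.2, 0.1, 0.3, 0.75, 0.6, 0.8, 0.9]
PATH_2 = [0.5, 0.7, 0.75, 0.6, 0.2, 0.6, 0.8, 0.9]


def path_probability(factors):
    """Probability of one path: the product of its transition and emission factors."""
    return math.prod(factors)


def sequence_probability(paths):
    """Probability of the sequence: the sum of the probabilities of all its paths."""
    return sum(path_probability(p) for p in paths)


P_BCCD = sequence_probability([PATH_1, PATH_2])
```

With these factors the two path probabilities come out at 0.000972 and 0.013608, and their sum at 0.01458, up to floating-point rounding. For models with many paths, the same sum is normally computed with the forward algorithm rather than by listing paths one by one.
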
9
HMMs for homology
  • Homology model: ancestral residue (match)
    states, insertion states, and deletion states.
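
How match, insertion, and deletion states connect can be sketched by listing the allowed transitions. The layout below is the common textbook profile HMM topology and is an assumption here, since the slide's diagram is not part of the transcript; HMMER's own architecture differs in detail. The state names (`B`, `E`, `M1`, `I0`, `D1`, ...) and the function name are hypothetical.

```python
def profile_hmm_transitions(length):
    """List (source, target) transitions for a textbook profile HMM.

    States: B (begin), E (end), match M1..Mn, insert I0..In, delete D1..Dn,
    where n is `length`, the number of match columns.
    """
    edges = []
    for k in range(length + 1):
        # States sitting at column k: the insert state, plus the match state
        # (or B for k == 0) and, from column 1 onward, the delete state.
        sources = ["B" if k == 0 else f"M{k}", f"I{k}"]
        if k > 0:
            sources.append(f"D{k}")
        next_match = "E" if k == length else f"M{k + 1}"
        for source in sources:
            edges.append((source, next_match))   # move on to the next column
            edges.append((source, f"I{k}"))      # insert extra residues here
            if k < length:
                edges.append((source, f"D{k + 1}"))  # skip the next column
    return edges
```

For example, `profile_hmm_transitions(3)` includes `("M1", "M2")` for a normal match step, `("M1", "D2")` for a deleted column, and `("I1", "I1")` for a run of inserted residues.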

10
Profile HMM

11
Searching Pfam
  • Web sites: users can search query protein
    sequences against one, a few, or all Pfam
    HMMs.
  • http://www.sanger.ac.uk/Pfam
  • http://genome.wustl.edu/Pfam
  • http://www.cgr.ki.se/Pfam
  • Software: users can search locally with the Pfam
    HMM profiles using the freely available
    HMMER software package at
    http://genome.wustl.edu/eddy/hmmer.html

12
Sample Pfam Query Results
Score    Query Start  Query End  HMM Start  HMM End  Pfam Family  Description
97.57    104          153        1          50       DAG_PE-bind  Phorbol ester/diacylglycerol binding domain
92.44    169          216        1          50       DAG_PE-bind  Phorbol ester/diacylglycerol binding domain
137.88   240          328        1          92       C2           C2 domain
276.16   413          674        1          247      pkinase      Eukaryotic protein kinase domain
84.44    675          741        1          69       pkinase_C    Protein kinase C terminal domain
70.99    807          857        17         69       pkinase_C    Protein kinase C terminal domain

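
One way to read such a result table is as a domain architecture along the query sequence. The sketch below hard-codes the six hits from the sample output; the `Hit` type and the helper names are hypothetical and are not part of any Pfam or HMMER interface. Since the model lengths are not given, a hit is flagged as partial only when it starts after HMM position 1.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    score: float
    query_start: int
    query_end: int
    hmm_start: int
    hmm_end: int
    family: str


# The six hits from the sample query results.
HITS = [
    Hit(97.57, 104, 153, 1, 50, "DAG_PE-bind"),
    Hit(92.44, 169, 216, 1, 50, "DAG_PE-bind"),
    Hit(137.88, 240, 328, 1, 92, "C2"),
    Hit(276.16, 413, 674, 1, 247, "pkinase"),
    Hit(84.44, 675, 741, 1, 69, "pkinase_C"),
    Hit(70.99, 807, 857, 17, 69, "pkinase_C"),
]


def domain_architecture(hits):
    """Family names ordered by where each hit starts on the query."""
    return [h.family for h in sorted(hits, key=lambda h: h.query_start)]


def partial_hits(hits):
    """Hits whose alignment begins after the first HMM position."""
    return [h for h in hits if h.hmm_start > 1]


def overlapping_pairs(hits):
    """Pairs of hits, adjacent in query order, whose query ranges overlap."""
    ordered = sorted(hits, key=lambda h: h.query_start)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.query_start <= a.query_end]
```

For the sample data, `domain_architecture(HITS)` gives two DAG_PE-bind domains, a C2 domain, a pkinase domain, and two pkinase_C hits; `partial_hits(HITS)` returns only the last pkinase_C hit, which starts at HMM position 17; and `overlapping_pairs(HITS)` is empty.
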
13
Acknowledgements
  • Some slides adapted from lectures by Larry Hunter
    at University of Colorado Health Sciences Center
  • Altmann Lab for critical comments