Pfam: multiple sequence alignments and HMM-profiles of protein domains

Pfam: multiple sequence alignments and HMM-profiles of protein domains Xianhui Li 03-02-2004 – PowerPoint PPT presentation

1
Pfam: multiple sequence alignments and
HMM-profiles of protein domains

Xianhui Li
03-02-2004

2
Outline

What is Pfam?
What is a hidden Markov model (the methodology
underlying Pfam)?
How to use Pfam and sample output

3
Pfam

Pfam is a database of multiple alignments of
protein domains or conserved protein regions.
The alignments represent evolutionarily
conserved structure, which has implications for
the protein's function.
Profile hidden Markov models (profile HMMs) built
from the Pfam alignments can be very useful for
automatically recognizing that a new protein
belongs to an existing protein family, even if
the homology is weak.

4
Overview of Pfam Database

Pfam-A contains curated families, each with an
associated profile HMM that can be used for
alignment and database searching. Each family has:
Annotation: contains several compulsory fields.
Seed alignment: a manually verified multiple
alignment of a representative set of sequences.
HMM profile: turns a multiple sequence
alignment into a position-specific scoring
system.
Full alignment: generated automatically from the
seed HMM profile by searching Swissprot for all
detectable members and aligning them to the HMM
profile.
Pfam-B families are clustered automatically,
allowing Pfam to be comprehensive.
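
To make "position-specific scoring system" concrete, here is a minimal sketch that turns a toy alignment into per-column log-odds scores. The alignment, the five-letter alphabet, the uniform background, and the pseudocount are all assumptions chosen for illustration; a real profile HMM also models insertions and deletions, which this sketch ignores.

```python
import math
from collections import Counter

# Hypothetical seed alignment: equal-length rows, '-' marks a gap.
SEED = ["ACDE", "ACDE", "ACEE", "GC-E"]
ALPHABET = "ACDEG"  # toy alphabet, not the 20 amino acids
BACKGROUND = {a: 1 / len(ALPHABET) for a in ALPHABET}  # assumed uniform
PSEUDOCOUNT = 1.0  # assumed; keeps unseen residues from scoring -infinity


def column_scores(alignment):
    """Return one {residue: log2-odds score} dict per alignment column."""
    scores = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values()) + PSEUDOCOUNT * len(ALPHABET)
        scores.append({
            a: math.log2(((counts[a] + PSEUDOCOUNT) / total) / BACKGROUND[a])
            for a in ALPHABET
        })
    return scores


def score_sequence(seq, scores):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(col[res] for res, col in zip(seq, scores))


PSSM = column_scores(SEED)
print(score_sequence("ACDE", PSSM))  # resembles the alignment
print(score_sequence("GGGG", PSSM))  # does not resemble the alignment
```

The same residue gets a different score in each column, which is what lets a profile pick up weak homology that a single substitution matrix would miss.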

5
Pfam Sequence Database Coverage
[Figure: Pfam coverage of the sequence database;
labels: residue, sequence]
Data shown are from Pfam v2.0 (1998), with 527
families. The current version, Pfam 12.0 (January
2004), contains alignments and models for 7316
protein families, based on the Swissprot 42.5 and
SP-TrEMBL 25.6 protein sequence databases.

6
Markov Model

Simplest example: each state emits (or,
equivalently, recognizes) a particular element
with probability 1.

Example sequences: 1234, 234, 14, 121214, 2123334

7
Probabilistic Emission

If we let the states define a set of emission
probabilities for elements, we can no longer be
sure which state we are in given a particular
element of a sequence. For example, the sequence
BCCD can be produced by more than one state path.

8
Hidden Markov Models (HMM)

Emission uncertainty means the sequence doesn't
identify a unique path. The states are
"hidden".
The probability of a sequence is the sum over all
paths that can produce it.

p(bccd) = (0.5 × 0.2 × 0.1 × 0.3 × 0.75 × 0.6 × 0.8 × 0.9)
        + (0.5 × 0.7 × 0.75 × 0.6 × 0.2 × 0.6 × 0.8 × 0.9)
        = 0.000972 + 0.013608
        = 0.01458
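
The sum over paths can be checked numerically. The sketch below first multiplies out the two paths using the factors given on this slide, then shows the same idea on a small two-state HMM whose states and probabilities are hypothetical (the slide's own model diagram is not reproduced here). On that toy model, brute-force enumeration of every path and the forward algorithm give the same value.

```python
import math
from itertools import product

# Factors read off the slide for the two state paths that can emit "bccd".
PATH_1 = [0.5, 0.2, 0.1, 0.3, 0.75, 0.6, 0.8, 0.9]
PATH_2 = [0.5, 0.7, 0.75, 0.6, 0.2, 0.6, 0.8, 0.9]
p_bccd = math.prod(PATH_1) + math.prod(PATH_2)

# Hypothetical two-state HMM (NOT the model from the slide).
STATES = ("s1", "s2")
START = {"s1": 0.6, "s2": 0.4}
TRANS = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
EMIT = {
    "s1": {"b": 0.5, "c": 0.3, "d": 0.2},
    "s2": {"b": 0.1, "c": 0.6, "d": 0.3},
}


def brute_force(seq):
    """Sum the joint probability of seq over every possible state path."""
    total = 0.0
    for path in product(STATES, repeat=len(seq)):
        p = START[path[0]] * EMIT[path[0]][seq[0]]
        for prev, cur, sym in zip(path, path[1:], seq[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][sym]
        total += p
    return total


def forward(seq):
    """Same quantity by dynamic programming (the forward algorithm)."""
    f = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for sym in seq[1:]:
        f = {
            cur: EMIT[cur][sym] * sum(f[prev] * TRANS[prev][cur] for prev in STATES)
            for cur in STATES
        }
    return sum(f.values())


print(p_bccd)
print(brute_force("bccd"), forward("bccd"))
```

Brute force visits a number of paths that grows exponentially with sequence length, while the forward recursion grows only linearly with it, which is why the latter is the practical choice for long sequences.
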
9
HMMs for homology

Homology model: ancestral residue (match)
states, insertion states, and deletion states.
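
The three kinds of states can be laid out as a small data structure. This is a sketch of the usual textbook topology, assumed here rather than taken from the slides: position k has a match state Mk, an insert state Ik (with a self-loop, so insertions can be any length), and a delete state Dk. The function name and the BEGIN/END labels are illustrative only, and some implementations leave out the insert-to-delete and delete-to-insert transitions.

```python
def profile_hmm_transitions(length):
    """Map each state to the states it may move to, for a profile HMM
    with `length` match positions (textbook topology, an assumption)."""
    trans = {
        "BEGIN": ["M1", "I0", "D1"],
        "I0": ["M1", "I0", "D1"],
    }
    for k in range(1, length + 1):
        if k < length:
            # Advance to the next column, insert after this one, or skip the next.
            nxt = [f"M{k + 1}", f"I{k}", f"D{k + 1}"]
        else:
            # Last column: finish, or insert trailing residues first.
            nxt = ["END", f"I{k}"]
        for kind in "MID":
            trans[f"{kind}{k}"] = list(nxt)
    return trans


for state, targets in profile_hmm_transitions(3).items():
    print(state, "->", ", ".join(targets))
```

A sequence shorter than the model is explained by passing through delete states, and a longer one by looping in insert states.
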

10
Profile HMM
11
Searching Pfam

Web sites: users can search query protein
sequences against one, a few, or all Pfam HMMs.
http://www.sanger.ac.uk/Pfam
http://genome.wustl.edu/Pfam
http://www.cgr.ki.se/Pfam
Software: users can search locally with Pfam
profile HMMs using the freely available HMMER
software package at
http://genome.wustl.edu/eddy/hmmer.html
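
A local search can be scripted. The sketch below assumes a HMMER 3 installation, where the program that searches a sequence against a profile database is `hmmscan` (in the HMMER 2 releases current when these slides were written, the equivalent program was `hmmpfam`). The file names are placeholders, and the profile database must first have been indexed with `hmmpress`.

```python
import shutil
import subprocess


def hmmscan_command(hmm_db, query_fasta, table_out):
    """Build the argument list for a HMMER 3 search of query sequences
    against a pressed profile database, writing a per-domain table."""
    return ["hmmscan", "--domtblout", table_out, hmm_db, query_fasta]


def run_search(hmm_db, query_fasta, table_out):
    """Run the search; raises if hmmscan is missing or exits with an error."""
    if shutil.which("hmmscan") is None:
        raise FileNotFoundError("hmmscan not found on PATH")
    subprocess.run(hmmscan_command(hmm_db, query_fasta, table_out), check=True)


# Placeholder file names, for illustration only.
print(" ".join(hmmscan_command("Pfam-A.hmm", "query.fasta", "hits.tbl")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names.
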

12
Sample Pfam Query Results
Score   Query start  Query end  HMM start  HMM end  Pfam family  Description
97.57   104          153        1          50       DAG_PE-bind  Phorbol ester/diacylglycerol binding domain
92.44   169          216        1          50       DAG_PE-bind  Phorbol ester/diacylglycerol binding domain
137.88  240          328        1          92       C2           C2 domain
276.16  413          674        1          247      pkinase      Eukaryotic protein kinase domain
84.44   675          741        1          69       pkinase_C    Protein kinase C terminal domain
70.99   807          857        17         69       pkinase_C    Protein kinase C terminal domain
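
The hits above can be read as a domain architecture for the query. The sketch below hard-codes the rows of the table (descriptions omitted), orders them along the query, checks for overlapping hits, and flags hits that do not start at the first position of the HMM. The `Hit` record and the variable names are illustrative, not part of any Pfam or HMMER output format.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    score: float
    q_start: int
    q_end: int
    hmm_start: int
    hmm_end: int
    family: str


# Rows copied from the sample results table.
ROWS = """\
97.57 104 153 1 50 DAG_PE-bind
92.44 169 216 1 50 DAG_PE-bind
137.88 240 328 1 92 C2
276.16 413 674 1 247 pkinase
84.44 675 741 1 69 pkinase_C
70.99 807 857 17 69 pkinase_C
"""


def parse(text):
    """Turn whitespace-separated rows into Hit records."""
    hits = []
    for line in text.splitlines():
        score, qs, qe, hs, he, family = line.split()
        hits.append(Hit(float(score), int(qs), int(qe), int(hs), int(he), family))
    return hits


hits = sorted(parse(ROWS), key=lambda h: h.q_start)

# Order of domains along the query sequence.
architecture = [h.family for h in hits]

# Adjacent hits whose query ranges overlap.
overlaps = [(a, b) for a, b in zip(hits, hits[1:]) if b.q_start <= a.q_end]

# Hits that begin partway into the HMM, i.e. partial matches to the model.
partial = [h for h in hits if h.hmm_start > 1]

print(" - ".join(architecture))
print("overlapping pairs:", len(overlaps))
print("partial hits:", [(h.family, h.q_start, h.hmm_start) for h in partial])
```
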
13
Acknowledgements