Scoring Matrices - PowerPoint PPT Presentation

1
Scoring Matrices
  • Scoring matrices, PSSMs, and HMMs

BIO520 Bioinformatics Jim Lund
2
Alignment scoring matrix
  • DNA matrix
  • A C G T
  • A 5 -4 -4 -4
  • C -4 5 -4 -4
  • G -4 -4 5 -4
  • T -4 -4 -4 5
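
As a minimal sketch of how such a matrix is applied, the Python below scores two equal-length, ungapped DNA sequences column by column using the 5 / -4 values from this slide. The function name `score_alignment` and the example sequences are illustrative assumptions, not part of the original presentation, and gaps are deliberately not handled.

```python
# Match/mismatch values taken from the DNA matrix on the slide.
BASES = "ACGT"
DNA_MATRIX = {(a, b): (5 if a == b else -4) for a in BASES for b in BASES}


def score_alignment(seq1: str, seq2: str) -> int:
    """Sum the matrix score of each aligned pair (no gap handling)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(DNA_MATRIX[(a, b)] for a, b in zip(seq1, seq2))


# Hypothetical example: three matches and one mismatch.
print(score_alignment("ACGT", "ACGA"))  # 5 + 5 + 5 - 4 = 11
```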

3
Alignment scoring matrix
  • Protein matrix

4
Use of a scoring matrix
  • P L S - - C F G
  • G L T - A C H L
  • 1 1 1 -2 -1 1 1 1
  • Score = 3

5
Consensus sequences
  • Different ways to describe a consensus, from
    crude to refined
  • Consensus site
  • Sequence logos
  • Position Specific Score Matrix (PSSM)
  • Hidden Markov Model (HMM)

6
Constructing a consensus
  • Collect sequences
  • Align sequences (consensus sites are descriptions
    of the alignment)
  • Condense the set of sequences into a consensus
    description (a consensus site, PSSM, or HMM).
  • Apply the scoring matrix in alignments/searches.

7
Position Specific Score Matrix (PSSM)
  • A position specific scoring matrix (PSSM) is a
    matrix based on the amino acid frequencies (or
    nucleic acid frequencies) at every position of a
    multiple alignment.
  • From these frequencies, the PSSM is calculated so
    that it assigns higher scores to residues that
    appear more often than expected by chance at a
    given position.

8
Creating a PSSM Example
  • NTEGEWI
  • NITRGEW
  • NIAGECC

Amino acid frequencies at every position of the
alignment
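
The per-column tally this slide refers to can be sketched in Python as follows. It counts how often each residue occurs in each column of the three aligned sequences above; the helper name `column_counts` is an illustrative assumption.

```python
from collections import Counter

# The three aligned sequences from the slide.
ALIGNMENT = ["NTEGEWI", "NITRGEW", "NIAGECC"]


def column_counts(alignment):
    """Return one Counter of residue counts per alignment column."""
    return [Counter(column) for column in zip(*alignment)]


counts = column_counts(ALIGNMENT)
for position, column in enumerate(counts, start=1):
    print(position, dict(column))
```

Residues that never occur in a column simply have a count of 0 here (a `Counter` returns 0 for missing keys).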
9
Creating a PSSM Example
  • Amino acids that do not appear at a specific
    position of a multiple alignment must also be
    considered in order to model every possible
    sequence and have calculable log-odds scores. A
    simple procedure called pseudo-counts assigns
    minimal scores to residues that do not appear at
    a certain position of the alignment according to
    the following equation:
  • Score(i, j) = (Frequency(i, j) + pseudocount) /
    (N + 20 × pseudocount)
  • Where:
  • Frequency(i, j) is the frequency of residue i in
    column j (the count of occurrences).
  • pseudocount is a number greater than or equal to 1.
  • N is the number of sequences in the multiple
    alignment.
  • 20 is the number of amino acid types.

10
Creating a PSSM Example
  • In this example, N = 3 and let's use
    pseudocount = 1.
  • Score(N) at position 1 = 3/3 = 1.
  • Score(I) at position 1 = 0/3 = 0.
  • Readjust:
  • Score(I) at position 1 -> (0 + 1) / (3 + 20) =
    1/23 = 0.043.
  • Score(N) at position 1 -> (3 + 1) / (3 + 20) =
    4/23 = 0.174.
  • The PSSM is obtained by taking the logarithm
    (base 10 here) of (the values obtained above
    divided by the background frequency of the
    residues).
  • To simplify for this example we'll assume that
    every amino acid appears equally in protein
    sequences, i.e. f_i = 0.05 for every i.
  • PSSM Score(I) at position 1 = log((1/23) / 0.05)
    = -0.061.
  • PSSM Score(N) at position 1 = log((4/23) / 0.05)
    = 0.541.
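
The arithmetic on this slide can be checked with a short Python sketch. It assumes the adjusted value is (count + pseudocount) / (N + 20 × pseudocount), which matches the 1/23 and 4/23 figures above, a uniform background of 0.05, and a base-10 logarithm, which is what reproduces the values -0.061 and 0.541. The function name `build_pssm` is illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALIGNMENT = ["NTEGEWI", "NITRGEW", "NIAGECC"]


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Return a list (one dict per column) of log10-odds scores."""
    n = len(alignment)
    k = len(AMINO_ACIDS)
    if background is None:
        # Uniform background, as assumed on the slide (f_i = 0.05).
        background = {aa: 1.0 / k for aa in AMINO_ACIDS}
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        scores = {}
        for aa in AMINO_ACIDS:
            # Pseudocount-adjusted frequency (assumed form, see lead-in).
            adjusted = (counts[aa] + pseudocount) / (n + k * pseudocount)
            scores[aa] = math.log10(adjusted / background[aa])
        pssm.append(scores)
    return pssm


pssm = build_pssm(ALIGNMENT)
print(round(pssm[0]["N"], 3))  # 0.541
print(round(pssm[0]["I"], 3))  # -0.061
```

N, which occurs in all three sequences at position 1, gets the positive score; I, which is absent there, gets the slightly negative one.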

11
Creating a PSSM Example
  • The matrix assigns positive scores to residues
    that appear more often than expected by chance
    and negative scores to residues that appear less
    often than expected by chance.

12
Using a PSSM
  • To search for matches to a PSSM, scan along the
    sequence using a window the length (L) of the
    PSSM.
  • The matrix is slid along a sequence one residue
    at a time, and the scores of the residues in each
    window of length L are summed.
  • Scores that are higher than an empirically
    predetermined threshold are reported.
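
A minimal Python sketch of this sliding-window search is given below. To keep it self-contained it rebuilds a PSSM from the three example sequences used earlier in the presentation, with a pseudocount of 1, a uniform background and base-10 log-odds. The target sequence and the threshold of 2.0 are hypothetical values chosen for illustration, not empirically derived.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log10-odds PSSM with pseudocounts and a uniform background."""
    n, k = len(alignment), len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({
            aa: math.log10(
                ((counts[aa] + pseudocount) / (n + k * pseudocount)) / (1.0 / k)
            )
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(pssm, sequence, threshold):
    """Return (start, score) for every window scoring above threshold."""
    length = len(pssm)
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        # Add up the position-specific score of each residue in the window.
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        if score > threshold:
            hits.append((start, score))
    return hits


pssm = build_pssm(["NTEGEWI", "NITRGEW", "NIAGECC"])
target = "AAANIEGEWCAAA"  # hypothetical sequence to search
hits = scan(pssm, target, threshold=2.0)  # threshold is an arbitrary choice
for start, score in hits:
    print(start, target[start:start + len(pssm)], round(score, 2))
```

Start positions are 0-based. In practice the threshold would be calibrated on known positives and negatives, as the slide notes.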

13
Advantages of PSSM
  • Weights sequence according to observed diversity
    specific to the family of interest
  • Minimal assumptions
  • Easy to compute
  • Can be used in comprehensive evaluations.

14
Review: Creating HMMs
  • To create an HMM to model data we need to
    determine two things:
  • The structure/topology of the HMM: states and
    transitions.
  • The values of the parameters: emission and
    transition probabilities.
  • Determining the parameters is called training.
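
One simple form of training applies when the state path for each training sequence is already known: count the observed emissions and transitions, then normalise the counts into probabilities. The Python sketch below assumes a toy model with hypothetical states M and I and made-up training data; it illustrates the counting idea only and is not HMMER's actual training procedure.

```python
from collections import Counter, defaultdict


def train_from_labelled_paths(examples):
    """Estimate emission and transition probabilities by counting.

    `examples` is a list of (symbols, states) pairs of equal length.
    """
    emit_counts = defaultdict(Counter)
    trans_counts = defaultdict(Counter)
    for symbols, states in examples:
        for symbol, state in zip(symbols, states):
            emit_counts[state][symbol] += 1
        for prev, nxt in zip(states, states[1:]):
            trans_counts[prev][nxt] += 1

    def normalise(table):
        # Turn each row of counts into probabilities that sum to 1.
        return {
            state: {key: n / sum(row.values()) for key, n in row.items()}
            for state, row in table.items()
        }

    return normalise(emit_counts), normalise(trans_counts)


# Hypothetical training data: M = match state, I = insert state.
examples = [("ACA", "MMM"), ("AGCA", "MIMM")]
emissions, transitions = train_from_labelled_paths(examples)
print(emissions["M"])
print(transitions["M"])
```

With so little data, unseen events get probability zero, which is the same problem that pseudocounts address for PSSMs.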

15
HMMER structure/topology
  • M: match state
  • I: insertion (w.r.t. profile; insert gap
    characters in profile)
  • D: deletion (w.r.t. sequence; insert gap
    characters in sequence)
  • N: N-terminal, un-aligned
  • C: C-terminal, un-aligned
  • J: joining segment, un-aligned
16
Example HMMER parameters
  • HMMER2.0 [2.3]
  • NAME rrm
  • ACC PF00076
  • DESC RNA recognition motif. (a.k.a. RRM, RBD, or
    RNP domain)
  • LENG 77
  • ALPH Amino
  • RF no
  • CS yes
  • MAP yes
  • COM ../src/hmmbuild -F rrm.hmm rrm.sto
  • NSEQ 90
  • DATE Tue Apr 29 11:01:43 2003
  • CKSUM 8325
  • GA 15.2 0.0
  • TC 15.2 0.3
  • XT -8455 -4 -1000 -1000 -8455 -4 -8455 -4
  • NULT -4 -8455
  • NULE 595 -1558 85 338 -294 453 -1158 (...) -21
    -313 45 531 201 384 -1998 -644
  • HMM A C D E F G H (...) m->m m->i m->d i->m i->i
    d->m d->d b->m m->e -16 -649
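
The integers in this file are not probabilities. As far as the HMMER 2 save-file format goes, they are log-odds scores in base 2, scaled by 1000 and rounded, and for the NULE line the reference is a uniform distribution over the alphabet (1/20 for amino acids). Assuming that convention, the Python sketch below converts the NULE values visible on the slide back to background probabilities. The helper name `nule_to_probability` is illustrative, and the residue order is assumed to follow the A C D E F G H header of the HMM line.

```python
INTSCALE = 1000      # scaling factor assumed from the HMMER 2 file format
ALPHABET_SIZE = 20   # "ALPH Amino"


def nule_to_probability(score: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Convert a scaled log2-odds NULE score to a probability."""
    return (1.0 / alphabet_size) * 2.0 ** (score / INTSCALE)


# First values of the NULE line on the slide, in the assumed order A..H.
nule = {"A": 595, "C": -1558, "D": 85, "E": 338,
        "F": -294, "G": 453, "H": -1158}
for residue, score in nule.items():
    print(residue, round(nule_to_probability(score), 4))
```

Under these assumptions alanine comes out at roughly 0.076 and cysteine at roughly 0.017, which is in line with typical amino acid background frequencies.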

17
Example HMMER parameters
  • NULE 595 -1558 85 338 -294 453 -1158 (...) -21
    -313 45 531 201 384
  • HMM A C D E F G H (...) m->m m->i m->d i->m i->i
    d->m d->d b->m m->e
  • 1 -1084 390 -8597 -8255 -5793 -8424 -8268
    (...) 1
  • - -149 -500 233 43 -381 399 106 (...)
  • C -1 -11642 -12684 -894 -1115 -701 -1378 -16
  • 2 -2140 -3785 -6293 -2251 3226 -2495 -727
    (...) 2
  • - -149 -500 233 43 -381 399 106 (...)
  • C -1 -11642 -12684 -894 -1115 -701 -1378
    (...)
  • 76 -2255 -5128 -302 363 -784 -2353 1398 (...)
    103
  • - -149 -500 233 43 -381 399 106 (...)
  • E -1 -11642 -12684 -894 -1115 -701 -1378
  • 77 -633 879 -2198 -5620 -1457 -5498 -4367
    (...) 104
  • - (...)
  • C 0
  • //