Optimization of a New Score Function for the Detection of Remote Homologs

1
Optimization of a New Score Function for the
Detection of Remote Homologs
  • Kann et al

2
Introduction
  • New method to calculate a score function, aiming
    to optimize the ability to discriminate between
    homologs and non-homologs
  • Existing software uses the following to compute
    an alignment score

3
Alignment Score
  • Counted from the alignment
      - Number of times amino acid (AA) i is aligned with AA j
      - Number of gaps in the alignment
      - Number of residues in each gap beyond the first
  • Score function parameters
      - Substitution matrix: contribution to score for
        AA match/mismatch
      - Contribution to score for gap initialization
      - Contribution to score for gap extension
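
The counts and contributions on this slide combine into a single alignment score. A minimal sketch, assuming the usual linear form (matrix terms summed over aligned residue pairs, plus one initialization term per gap and one extension term per extra gap residue). The function name, the toy matrix and the penalty values are illustrative and do not come from the slides:

```python
from collections import Counter

# Toy substitution matrix keyed by sorted residue pairs (illustrative values only).
TOY_MATRIX = {("A", "A"): 4, ("A", "G"): 0, ("G", "G"): 6}

def alignment_score(aln_a, aln_b, matrix, gap_open, gap_extend):
    """Score a gapped pairwise alignment in which '-' marks a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pair_counts = Counter()  # times AA i is aligned with AA j
    n_gaps = 0               # number of gaps in the alignment
    n_extra = 0              # residues in each gap beyond the first
    prev_gap_side = None     # sequence holding the gap in the previous column
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side == prev_gap_side:
                n_extra += 1   # same gap continues
            else:
                n_gaps += 1    # a new gap starts
            prev_gap_side = side
        else:
            pair_counts[tuple(sorted((a, b)))] += 1
            prev_gap_side = None
    match_part = sum(n * matrix[pair] for pair, n in pair_counts.items())
    return match_part + n_gaps * gap_open + n_extra * gap_extend

# Three matches (4 + 6 + 4) plus one gap of length two (-10 - 1).
print(alignment_score("AG--A", "AGGGA", TOY_MATRIX, gap_open=-10, gap_extend=-1))
```

Because the score is linear in the matrix entries and the gap contributions, all of them can be treated as adjustable parameters of one score function.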
4
Current Methods to Calculate Homology
  • p(Sr > x): probability that a random pair of
    proteins of the same length would score higher than x
  • E: expected number of random proteins in the database
    that would have at least that score
  • P: probability that there is at least one random
    pair with a higher score
  • As p(Sr > x), E, and P increase, the likelihood that
    the given pair is homologous decreases
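
These three quantities are linked. A minimal sketch, assuming independent comparisons against a database of N sequences and a Poisson-distributed number of random hits (a common approximation in database search statistics; the slides do not say which model is used), so that E = N * p and P = 1 - exp(-E). Function and parameter names are illustrative:

```python
import math

def expected_random_hits(p_single, db_size):
    """E: expected number of random database sequences scoring above x."""
    return p_single * db_size

def prob_at_least_one(e_value):
    """P = 1 - exp(-E), computed with expm1 for accuracy at small E."""
    return -math.expm1(-e_value)

e = expected_random_hits(p_single=1e-6, db_size=100_000)
print(e, prob_at_least_one(e))
```

For small E the two values nearly coincide; for large E, P saturates at 1 while E keeps growing.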

5
Current Score Matrices
  • PAM (point accepted mutation): Dayhoff
  • GCB, JTT: the same approach applied to larger
    sequence datasets
  • BLOSUM62: Henikoff & Henikoff, constructed using
    a dataset of aligned sequence blocks
  • STR: derived from protein sequences aligned based
    on their observed structures

6
Limitations of Current Score Functions
  • Current score functions assume independent
    evolution of each location, overlooking
    correlations
  • Score functions are derived from a database of
    properly aligned proteins, not from alignments
    between random sequences
  • Gap penalties are set a priori

7
Theory
  • Z score for alignment
  • Characterize the significance of alignment score
    by calculating the likelihood that this score or
    higher would be obtained by a random match
  • Account for variations in E with the length of
    the proteins
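
A Z score is conventionally the number of standard deviations by which the observed score exceeds the mean score of random matches. A minimal sketch, assuming the random-score distribution is estimated from a sample of scores (for example from shuffled sequences, which is an assumption here; the slides do not say how random scores are generated):

```python
import statistics

def z_score(score, random_scores):
    """Standard deviations separating `score` from the random-score mean."""
    mean = statistics.fmean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        raise ValueError("random scores have zero spread")
    return (score - mean) / sd

# Hypothetical scores from random matches, and an observed score of 11.
print(z_score(11, [2, 4, 4, 4, 5, 5, 7, 9]))
```

Drawing the random sample from sequences of the same lengths as the pair being scored is one simple way to keep the comparison length-aware.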

8
Theory
  • Score function optimized by maximizing the
    confidence <C> over the training set
  • Avoids dependence on extreme E values (easily
    detected or overly distant homologies)
  • Eliminates contribution of falsely identified
    homologies (overly distant)
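
The slides do not give the formula for the confidence C. The sketch below assumes, purely for illustration, a bounded form C = 1 / (1 + E). It shows why averaging C rather than E limits the influence of extreme values: a very small E saturates near 1 and a very large E contributes almost nothing.

```python
def confidence(e_value):
    """Hypothetical bounded confidence, assumed here to be 1 / (1 + E)."""
    return 1.0 / (1.0 + e_value)

def mean_confidence(e_values):
    """<C>: confidence averaged over the pairs of a training set."""
    values = [confidence(e) for e in e_values]
    return sum(values) / len(values)

# One easy pair, one borderline pair, one pair lost in the noise.
print(mean_confidence([1e-8, 1.0, 1e6]))
```

Under this assumed form, only pairs with E near 1 change <C> appreciably when the score function is adjusted.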

9
Database Preparation
  • Use set of known homologs whose homology cannot
    be reliably determined with standard pairwise
    comparison, in order to optimize score function
    for detection of distant homologs
  • Training set: 900 pairs of proteins in the same COG
    with < 25% sequence identity
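
A sketch of the identity filter, assuming percent identity is counted over aligned (non-gap) columns. Other denominators are in common use and the slides do not say which was chosen, so the function names and the default cutoff handling are illustrative:

```python
def percent_identity(aln_a, aln_b):
    """Percent of aligned (non-gap) columns holding identical residues."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)

def is_remote_pair(aln_a, aln_b, cutoff=25.0):
    """True when the pair falls below the identity cutoff."""
    return percent_identity(aln_a, aln_b) < cutoff

print(is_remote_pair("ACDEFGHI", "AKLMNPQR"))
```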

10
Optimization of Score Function
  • Align using the BLOSUM62 matrix
  • Calculate Z and C for each pair of homologs, then
    average over the pairs in the training set to yield <C>
  • Generate initial alignments using gap penalties
    that yielded highest C values
  • 10 cycles of optimization and realignments until
    score function converged
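
The slides do not name the optimizer. As a stand-in, the sketch below uses simple coordinate-wise hill climbing: nudge each parameter up or down, keep the change if the objective improves, and stop when a full cycle brings no improvement or the cycle limit is reached. The `objective` callable is hypothetical; in the procedure above it would return <C> for a parameter set, and the alignments would be recomputed between cycles, a step omitted here.

```python
def hill_climb(params, objective, step=1.0, max_cycles=10):
    """Coordinate-wise hill climbing; returns (best_params, best_value)."""
    names = list(params)
    best = dict(params)
    best_value = objective(best)
    for _ in range(max_cycles):
        improved = False
        for name in names:
            for delta in (step, -step):
                trial = dict(best)
                trial[name] += delta
                value = objective(trial)
                if value > best_value:   # keep only strict improvements
                    best, best_value = trial, value
                    improved = True
        if not improved:                 # converged: no parameter change helped
            break
    return best, best_value

# Toy objective with its maximum at gap_open = -11, gap_extend = -1
# (made-up numbers standing in for a real <C> evaluation).
def toy_objective(p):
    return -((p["gap_open"] + 11.0) ** 2 + (p["gap_extend"] + 1.0) ** 2)

print(hill_climb({"gap_open": -8.0, "gap_extend": -3.0}, toy_objective))
```

The same loop extends to substitution matrix entries by adding them to `params`, at the cost of many more objective evaluations per cycle.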

11
Results
  • Small changes in gap penalties; most of the
    improvement comes from refinements of the
    substitution matrix
  • OPTIMA: the resulting score function
  • OPTIMA has a significantly improved average
    confidence <C> compared with other score matrices
  • <p(Sr > x)> and <P> significantly decreased

12
Summary
  • Aim: optimize the score matrix to discriminate
    between homologs and non-homologs
  • The OPTIMA score function is more successful at
    discriminating between homologs and non-homologs
    than standard score matrices
  • Gap penalties treated as additional parameters to
    be optimized