Optimization of a New Score Function for the Detection of Remote Homologs

Learn more at: http://www.cs.cornell.edu

1
Optimization of a New Score Function for the
Detection of Remote Homologs

Kann et al

2
Introduction

New method to calculate a score function, aiming to optimize the ability to discriminate between homologs and non-homologs
Existing software uses the following to compute an alignment score:

3
Number of times amino acid (AA) i is aligned with AA j
Number of gaps in the alignment
Number of residues in each gap beyond the first
Score function / substitution matrix: contribution to the score for an AA match/mismatch
Contribution to the score for gap initiation (opening)
Contribution to the score for gap extension
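
The slide's formula did not survive in the transcript. A conventional way to combine the six quantities listed above into a single alignment score is sketched below. All symbol names are assumptions chosen for illustration and may differ from the presenters' notation: N_ij counts how often AA i is aligned with AA j, M_ij is the substitution-matrix entry, N_gap is the number of gaps, N_ext is the number of gap residues beyond the first in each gap, and g_open and g_ext are the (typically negative) gap contributions.

```latex
S \;=\; \sum_{i,j} N_{ij}\, M_{ij} \;+\; N_{\mathrm{gap}}\, g_{\mathrm{open}} \;+\; N_{\mathrm{ext}}\, g_{\mathrm{ext}}
```

Written this way, the score is linear in the parameters M_ij, g_open and g_ext, which is what allows the gap penalties to be tuned alongside the matrix entries.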

4
Current Methods to Calculate Homology

p(S_r > x): probability that a random pair of proteins of the same length would score higher than x
E: expected number of random proteins in the database that would have at least that score
P: probability that there is at least one random pair with a higher score
As p(S_r > x), E, and P increase, the likelihood that the given pair is homologous decreases
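
The three quantities above are linked by simple relations in standard database-search statistics. The sketch below is not taken from the slides: it assumes that database comparisons are independent, so E is the per-pair probability times the database size, and that the number of random hits is Poisson-distributed, so P = 1 - exp(-E). The function names and the `db_size` parameter are hypothetical.

```python
import math


def e_from_p(p_pair: float, db_size: int) -> float:
    """Expected number of random database hits scoring above x.

    Assumes db_size independent comparisons, each exceeding x
    with probability p_pair.
    """
    return p_pair * db_size


def p_from_e(e_value: float) -> float:
    """Probability of at least one random hit, assuming Poisson-distributed hits.

    Uses expm1 so that very small E values do not lose precision.
    """
    return -math.expm1(-e_value)


if __name__ == "__main__":
    e = e_from_p(1e-6, 100_000)  # E = 0.1 for this made-up example
    print(e, p_from_e(e))
```

For small E the two values nearly coincide (P ≈ E), which is why either one can serve as a significance measure for strong hits.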

5
Current Score Matrices

PAM (percent accepted mutations): Dayhoff
GCB, JTT: the PAM approach applied to larger sequence datasets
BLOSUM62: Henikoff & Henikoff, constructed using a dataset of aligned sequence blocks
STR: derived from protein sequences aligned based on their observed structures

6
Limitations of Current Score Functions

Current score functions assume that each location evolves independently, overlooking correlations
Score functions are derived from a database of properly aligned proteins, not from alignments between random sequences
Gap penalties are set a priori

7
Theory

Z score for alignment
Characterize the significance of alignment score
by calculating the likelihood that this score or
higher would be obtained by a random match
Account for variations in E with the length of
the proteins
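
The Z-score formula itself is missing from the transcript. A minimal sketch of the usual definition follows, assuming Z is the observed score's distance from the mean random score, measured in standard deviations of the random scores. How the random scores are generated (for example, by aligning shuffled sequences of the same lengths) is left to the caller. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def z_score(score: float, random_scores: Sequence[float]) -> float:
    """Z = (score - mean of random scores) / sample std. dev. of random scores."""
    if len(random_scores) < 2:
        raise ValueError("need at least two random scores")
    mean = statistics.mean(random_scores)
    spread = statistics.stdev(random_scores)
    if spread == 0:
        raise ValueError("random scores have zero variance")
    return (score - mean) / spread


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(z_score(12.0, [8.0, 12.0, 10.0]))
```

Because the random scores come from sequences of the same lengths as the real pair, the length dependence of the raw score is absorbed into the mean and standard deviation.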

8
Theory

Score function optimized by maximizing the average confidence <C> over the training set
Avoids dependence on extreme E values (easily detected or overly distant homologies)
Eliminates contribution of falsely identified homologies (overly distant)

9
Database Preparation

Use set of known homologs whose homology cannot
be reliably determined with standard pairwise
comparison, in order to optimize score function
for detection of distant homologs
Training set: 900 pairs of proteins in the same COG with < 25% sequence identity

10
Optimization of Score Function

Align using the BLOSUM62 matrix
Calculate Z and C for each pair of homologs, then average over the pairs in the training set to yield <C>
Generate initial alignments using the gap penalties that yielded the highest <C> values
10 cycles of optimization and realignment, until the score function converged
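
The alternation described above (realign with the current parameters, re-optimize the parameters on those alignments, repeat) can be sketched as a generic loop. This is only a skeleton under stated assumptions: `align` and `refit` are hypothetical callables supplied by the caller, the parameters are a flat list of floats, and convergence is judged by the largest parameter change. None of this is taken from the presenters' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def alternate_optimization(
    pairs: Sequence[Pair],
    params: List[float],
    align: Callable[[str, str, List[float]], object],
    refit: Callable[[list, List[float]], Tuple[List[float], float]],
    max_cycles: int = 10,
    tol: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Alternate realignment and parameter refitting.

    Returns the final parameters and the mean confidence reported
    by `refit` in each cycle.
    """
    history: List[float] = []
    for _ in range(max_cycles):
        # Step 1: realign every training pair with the current parameters.
        alignments = [align(a, b, params) for a, b in pairs]
        # Step 2: re-optimize the parameters on the fixed alignments.
        new_params, mean_conf = refit(alignments, params)
        history.append(mean_conf)
        change = max(abs(n - o) for n, o in zip(new_params, params))
        params = new_params
        if change < tol:  # parameters stopped moving: converged
            break
    return params, history
```

Separating the two steps keeps each one tractable: with the alignments held fixed, the counts that enter the score are constants, so the refit step only has to search over the matrix entries and gap penalties.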

11
Results

Small changes in gap penalties; most of the improvement comes from refinements of the substitution matrix
OPTIMA: the resulting score function has a significantly improved average confidence <C> compared with other score matrices
<p(S_r > x)> and <P> significantly decreased

12
Summary

Aim: optimize the score matrix to discriminate between homologs and non-homologs
The OPTIMA score function is more successful at discriminating between homologs and non-homologs than standard score matrices
Gap penalties are treated as additional parameters to be optimized
