SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence

Slides: 20

Provided by: great8

Transcript and Presenter's Notes

1
SVM-Prot: web-based support vector machine
software for functional classification of a protein from its
primary sequence

Nucleic Acids Research, 2003
Speaker: yfhuang_at_CSIE.NTU

2
Outline

Introduction
Method
Results and Discussion
Reference
Extension Reading

3
Introduction

Brief introduction
Currently developed approaches/methods
Problem
Goal

4
Brief Introduction

Protein function prediction as functional family prediction
Based on the protein primary sequence and its
physicochemical properties
Using an SVM to predict the functional family

5
Various methods

Sequence similarity
Evolutionary analysis
Structure-based approach
Protein/gene fusion
Protein interaction
Family classification by sequence clustering

6
Problems

Not all homologous proteins have analogous
functions
Many proteins share promiscuous domains

7
Goal

Distantly-related proteins
Closely-related proteins

8
SVMProt

SVMProt classification system
Web-based software
Target: use SVM classification to classify a
protein into a functional family from its primary
sequence
Trained on representative proteins of a number of functional
families and seed proteins of Pfam curated protein
families
URL: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi

9
Method

Database
SVMProt processing
Example of feature vector
Dataset
Scoring function and quality measurement

10
Database

Data
46 enzyme families from BRENDA
G-protein coupled receptors from GPCRDB
Nuclear receptors from NucleaRDB
5 families of channels and 1 family of transporters
from TCDB and LGICdb
DNA- and RNA-binding proteins derived from
SWISS-PROT

11
Processing (1)

Feature vector - properties
Amino acid composition
Hydrophobicity
Normalized Van der Waals volume
Polarity
Polarizability
Charge
Surface tension
Secondary structure
Solvent accessibility

12
(No Transcript)
13
Dimension of feature vector
From: "Protein function classification via support
vector machine approach", Mathematical
Biosciences 185 (2003) 111-122
14
Processing (2)

3 descriptors per property (21 elements)
Composition (C) - 3 elements
C is the number of amino acids of a particular
property divided by the total number of amino
acids
Transition (T) - 3 elements
T characterizes the percent frequency with which
amino acids of a particular property are followed by
amino acids of a different property
Distribution (D) - 15 elements
D is the chain length within which the first,
25%, 50%, 75% and 100% of the amino acids of a
particular property are located, respectively

15
Example of feature vector
AA: amino acid; sequence length (SL) = 30
16 alanines → n1 = 16; 14 glutamic acids → n2 = 14
C: n1 × 100 / (n1 + n2) = 53.33, n2 × 100 / (n1 + n2) = 46.67
T: 15 transitions (A → E or E → A) → (15 / 29) × 100 = 51.72
D: (Index(AA) / SL) × 100
As → 3.33, 16.67, 40, 66.67, 96.67
Es → 6.67, 26.67, 60, 76.67, 100
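The arithmetic on this slide can be checked with a short script. This is a minimal sketch, not the SVMProt implementation: the function name `ctd_descriptors` is hypothetical, and the 30-residue sequence is an assumption, chosen only because it has 16 alanines and 14 glutamic acids and reproduces the slide's numbers (the slide itself does not show the sequence). The sketch also assumes that a fractional residue count is rounded down when locating the 25% and 75% positions.

```python
from itertools import combinations


def ctd_descriptors(sequence, class_of):
    """Return (C, T, D) percentages for one property.

    class_of maps each residue letter to a class label.
    """
    labels = [class_of[res] for res in sequence]
    length = len(labels)
    classes = sorted(set(class_of.values()))

    # Composition: share of residues that fall in each class.
    composition = {c: round(100.0 * labels.count(c) / length, 2) for c in classes}

    # Transition: share of adjacent pairs that switch between two classes.
    transition = {}
    for a, b in combinations(classes, 2):
        switches = sum(1 for x, y in zip(labels, labels[1:]) if {x, y} == {a, b})
        transition[(a, b)] = round(100.0 * switches / (length - 1), 2)

    # Distribution: chain position (as % of length) of the first residue of a
    # class and of the residues marking 25%, 50%, 75% and 100% of that class.
    distribution = {}
    for c in classes:
        positions = [i + 1 for i, lab in enumerate(labels) if lab == c]
        values = []
        for pct in (0, 25, 50, 75, 100):
            if not positions:
                values.append(0.0)
                continue
            # pct = 0 stands for "first residue"; fractions are rounded down.
            k = max(1, len(positions) * pct // 100)
            values.append(round(100.0 * positions[k - 1] / length, 2))
        distribution[c] = values
    return composition, transition, distribution


# Assumed example sequence: 16 A and 14 E, length 30.
SEQ = "AEAAAEAEEAAAAAEAEEEAAEEAEEEAAE"
C, T, D = ctd_descriptors(SEQ, {"A": "A", "E": "E"})
print(C, T, D)
```

With two classes there is a single transition value; with the three classes used per property there would be three, which together with 3 composition and 15 distribution values gives the 21 elements.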
16
Dataset

Training set
Positive: all distinct protein members in each
family
Negative: seed proteins of the curated
protein families in the Pfam database, excluding
those that belong to the family under study
Testing set
Positive: all the remaining distinct proteins in
each functional family
Negative: all the remaining representative seed
proteins in Pfam curated families
Independent evaluation set (evaluate)
Both positive and negative samples

17
Scoring Function and Quality Measurement

Scoring for SVMProt
Reliability index
R-value
P-value
Quality measurement for the experiment
TP (true positive)
TN (true negative)
FP (false positive)
FN (false negative)
Q (overall accuracy)

d: the distance between the position of the
vector of the classified protein and the optimal
separating hyperplane in hyperspace
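The slide lists Q without its formula. The sketch below assumes the standard definition of overall accuracy, Q = (TP + TN) / (TP + TN + FP + FN), expressed as a percentage. The function name `overall_accuracy` and the counts in the usage line are hypothetical and are not results from the presentation.

```python
def overall_accuracy(tp, tn, fp, fn):
    """Overall accuracy Q as a percentage of correctly classified samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return 100.0 * (tp + tn) / total


# Hypothetical counts, for illustration only.
q = overall_accuracy(tp=90, tn=80, fp=20, fn=10)
print(f"Q = {q:.1f}%")
```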
18
P-value: probability of correct classification
9,932 positive samples and 45,999 negative samples
19
Results and Discussion

Q of protein classification ranges from 69.1% to
99.6% → provide more comprehensive sampling of
proteins not in a functional class
Prediction of distantly related proteins → test
on 24 randomly selected distantly related
proteins in 7 families → 14 proteins are
correctly classified (58.3%)
