PepArML: A model-free, result-combining peptide identification arbiter via machine learning presentation

Transcript and Presenter's Notes

1
PepArML: A model-free, result-combining peptide
identification arbiter via machine learning

Xue Wu, Chau-Wen Tseng, Nathan Edwards
University of Maryland, College Park, and
Georgetown University Medical Center

2
Comparison of Search Engines

No single score is comprehensive
Search engines disagree
Many spectra lack confident peptide assignment
Many spectra lack any peptide assignment

Searle et al. JPR 7(1), 2008
3
Black-box Techniques

Significance re-estimation
Target-Decoy search
Bimodal distribution fit
Supervised machine learning
Train predictors on synthetic datasets
Select and/or create (many) good features
Result combiners
Incorrect peptide IDs unlikely to match
Significance re-estimation
Independence and/or supervised model
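The target-decoy idea listed above can be illustrated with a minimal sketch. It assumes each peptide-spectrum match carries a score (higher is better) and a flag saying whether it hit a decoy sequence; the function name, the data layout, and the simple decoys-over-targets estimate are illustrative assumptions, not the presenters' implementation.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    The estimate is (# decoy hits) / (# target hits) above the threshold.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, decoys / targets)


# Hypothetical scores, invented purely for illustration.
psms = [(9.1, False), (8.4, False), (7.9, True), (7.2, False),
        (6.5, False), (5.0, True), (4.8, False), (3.1, True)]
fdr_at_6 = decoy_fdr(psms, 6.0)  # 1 decoy / 4 targets = 0.25
fdr_at_8 = decoy_fdr(psms, 8.0)  # 0 decoys / 2 targets = 0.0
```

Lowering the threshold accepts more spectra at the cost of a higher estimated error rate, which is the trade-off a re-estimated significance value is meant to expose.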

4
PepArML

Unified machine learning result combiner
Significance re-estimation too!
Model-free feature use and result combination
Use agreement and features if useful
Unsupervised training procedure
No loss of classification performance
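One way to picture "use agreement and features if useful" is to merge the top hit from each search engine into one feature row per candidate peptide and record how many engines agree, leaving it to the classifier to decide whether agreement matters. The sketch below assumes that reading; the engine keys, field names, and scores are hypothetical and are not taken from PepArML itself.

```python
def combine_results(engine_hits):
    """Merge per-engine top hits for one spectrum into feature rows.

    engine_hits: dict mapping engine name -> (peptide, score), or None
                 when that engine reported no hit for the spectrum.
    Returns a dict mapping peptide -> feature dict.
    """
    engines = sorted(engine_hits)
    rows = {}
    for engine in engines:
        hit = engine_hits[engine]
        if hit is None:
            continue
        peptide, score = hit
        row = rows.setdefault(peptide, {"agreement": 0})
        row[engine + "_score"] = score
        row["agreement"] += 1  # number of engines proposing this peptide
    # Engines that did not propose a peptide contribute a missing value.
    for row in rows.values():
        for engine in engines:
            row.setdefault(engine + "_score", None)
    return rows


# Hypothetical hits for a single spectrum.
hits = {"tandem": ("PEPTIDEK", 35.2),
        "mascot": ("PEPTIDEK", 41.0),
        "omssa": ("LSSPATLNSR", 2.3)}
rows = combine_results(hits)
```

Here `rows["PEPTIDEK"]` records agreement between two engines, while `rows["LSSPATLNSR"]` has a single supporting score and missing values for the other two.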

5
PepArML Overview
X!Tandem
PepArML
Mascot
OMSSA
Other
6
PepArML Overview
Feature extraction
X!Tandem
PepArML
Mascot
OMSSA
Other
7
Dataset Construction
[Diagram: X!Tandem, Mascot, OMSSA; true/false labels T, F, T, T]
8
Dataset Construction

Calibrant 8 Protein Mix (C8)
4594 MS/MS spectra (LTQ)
618 (11.2%) true positives
Sashimi 17mix_test2 (S17)
1389 MS/MS spectra (Q-TOF)
354 (25.4%) true positives
AURUM 1.0 (364 Proteins)
7508 MS/MS spectra (MALDI-TOF-TOF)
3775 (50.3%) true positives

9
PepArML Machine Learning

Machine learning (generally) helps single search
engines
PepArML result-combiner (C-TMO) improves on
single search engines
Sometimes combining two search engines works as
well, or better, than three

10
PepArML vs Search Engines (C8)
11
True vs. Est. FDR (C-TMO, C8)
12
PepArML vs Search Engines (C8)
13
PepArML Pairs vs PepArML (C8)
14
Sensitivity Comparison
15
Feature Evaluation
Tandem
Mascot
OMSSA
16
Application to Real Data

How well do these models generalize?
Different instruments
Spectral characteristics change scores
Search parameters
Different parameters change score values
Supervised learning requires
(Synthetic) experimental data from every
instrument
Search results from available search engines
Training/models for all parameters x search
engine sets x instruments

17
Model Generalization
Train S17 / Score S17
Train C8 / Score S17
18
Rescuing Machine Learning

Train a new machine learning model for every
dataset!
Generalization not required
No predetermined search engines, parameters,
instruments, features
Perhaps we can guess the true proteins
Most proteins not in doubt
Machine learning can tolerate imperfect labels
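The per-dataset training idea above can be sketched as a loop: guess the true proteins, label every identification by whether its protein is in the guess, train, rescore, and guess again until the protein set stops changing. The code below is a toy illustration under stated assumptions: the nearest-centroid classifier, the two-peptide protein rule, and all names and data are hypothetical stand-ins, not the method used in the presentation.

```python
def train_centroids(features, labels):
    """Toy classifier: the mean feature vector of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [f for f, y in zip(features, labels) if y]
    neg = [f for f, y in zip(features, labels) if not y]
    return mean(pos), mean(neg)


def predict(model, f):
    """Accept an identification if it lies closer to the positive centroid."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(f, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(f, neg))
    return d_pos < d_neg


def select_proteins(psms, accepted, min_peptides=2):
    """Guess true proteins: at least min_peptides distinct accepted peptides."""
    peptides = {}
    for (protein, peptide, _), ok in zip(psms, accepted):
        if ok:
            peptides.setdefault(protein, set()).add(peptide)
    return {p for p, peps in peptides.items() if len(peps) >= min_peptides}


def iterate(psms, initial_accept, max_rounds=10):
    """psms: list of (protein, peptide, feature_vector) tuples."""
    accepted = list(initial_accept)
    proteins = select_proteins(psms, accepted)
    for _ in range(max_rounds):
        # Imperfect labels: every peptide of a guessed protein counts as true.
        labels = [protein in proteins for protein, _, _ in psms]
        if all(labels) or not any(labels):
            break  # cannot train with only one class
        model = train_centroids([f for _, _, f in psms], labels)
        accepted = [predict(model, f) for _, _, f in psms]
        new_proteins = select_proteins(psms, accepted)
        if new_proteins == proteins:
            break  # converged
        proteins = new_proteins
    return proteins, accepted


# Hypothetical data: two search-engine scores per identification.
psms = [("P1", "AAK", [9.0, 8.5]), ("P1", "CCK", [8.0, 9.0]),
        ("P1", "DDK", [6.0, 6.5]), ("P2", "EEK", [8.5, 8.0]),
        ("P2", "FFK", [7.5, 7.0]), ("P3", "GGK", [2.0, 1.5]),
        ("P4", "HHK", [1.0, 2.5]), ("P5", "IIK", [2.5, 1.0])]
initial = [f[0] >= 8.0 for _, _, f in psms]  # crude starting guess
proteins, accepted = iterate(psms, initial)
```

In this toy run the starting guess supports only P1; after retraining, the moderate-scoring identifications for P1 and P2 are accepted and both proteins are selected, while the low-scoring single hits stay rejected.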

19
Unsupervised Learning
20
Unsupervised Learning (S17)
21
Unsupervised Learning (S17)
22
Protein Selection Heuristic

Modeled on typical protein identification
criteria
High confidence peptide IDs
At least 2 non-overlapping peptides
At least 10% sequence coverage
Robust, fast convergence
Easily enforce additional constraints
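The selection criteria above can be written as a small check. This sketch assumes peptides are given as half-open residue ranges on the protein and reads the coverage criterion as 10 percent of the protein length; the function name, parameters, and thresholds are illustrative assumptions rather than the presenters' code.

```python
def passes_heuristic(protein_length, peptide_spans,
                     min_peptides=2, min_coverage=0.10):
    """Decide whether a protein meets the selection criteria.

    peptide_spans: (start, end) half-open residue ranges of
                   high-confidence peptide identifications.
    """
    # Largest set of mutually non-overlapping peptides, found greedily
    # by earliest end position (interval scheduling).
    count, last_end = 0, 0
    for start, end in sorted(set(peptide_spans), key=lambda s: s[1]):
        if start >= last_end:
            count += 1
            last_end = end
    # Fraction of residues covered by at least one peptide.
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end))
    coverage = len(covered) / protein_length
    return count >= min_peptides and coverage >= min_coverage


ok = passes_heuristic(100, [(5, 15), (40, 52)])            # 2 peptides, 22% coverage
overlapping = passes_heuristic(100, [(5, 15), (10, 20)])   # peptides overlap
low_coverage = passes_heuristic(500, [(5, 15), (40, 52)])  # 4.4% coverage
```

Additional constraints, such as a minimum peptide length, could be added as further conditions in the final return expression.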

23
What about real data?

Dr. Rado Goldman (LCCC, GUMC)
Proteolytic serum peptides from clinical
hepatocellular carcinoma samples
200 MALDI MS/MS Spectra (TOF-TOF)
PepArML for non-specific search of IPI-Human
Increase in confidence and sensitivity
Observation of ragged proteolytic trimming

24
Protein Identification Example
M T O

25
Future Directions

Apply to more experimental datasets
Integrate
novel features
new search engines, spectral matching
multiple searches with varied parameters,
sequence databases
Construct meta-search engine
FDR by bimodal fit instead of decoys
Release as open source
http://peparml.sourceforge.org

26
http://PepArML.SourceForge.Net
27
Acknowledgements

Xue Wu, Dr. Chau-Wen Tseng
Computer Science, University of Maryland, College Park
Dr. Brian Balgley, Dr. Paul Rudnick
Calibrant Biosystems, NIST
Dr. Rado Goldman, Dr. Yanming An
Department of Oncology, Georgetown University Medical Center
Kam Ho To
Biochemistry Masters student, Georgetown University
Funding: NIH/NCI CPTAC

29
PepArML vs Search Engines (S17)
30
PepArML vs Search Engines (S17)
31
PepArML Pairs vs PepArML (C8)
32
PepArML Pairs vs PepArML (S17)
33
PepArML Pairs vs PepArML (S17)
34
Unsupervised Learning (C8)
35
Unsupervised Learning (C8)
