Analysis of Complex Proteomic - PowerPoint PPT Presentation

About This Presentation
Title:

Analysis of Complex Proteomic

Description:

Title: Slide 1; Author: jprenni; Last modified by: jprenni; Created Date: 5/2/2007 9:04:44 PM; Document presentation format: On-screen Show; Company: CVMBS

Slides: 35
Provided by: jprenni
Learn more at: http://proteome-software.wikispaces.com
Transcript and Presenter's Notes

Title: Analysis of Complex Proteomic


1
Analysis of Complex Proteomic Datasets Using Scaffold
The free Scaffold Viewer can be downloaded at www.proteomesoftware.com
2
Scaffold: Why do we need it?
Shotgun proteomics → analysis of complex mixtures
Whole cell extract
10,000 proteins
600,000 peptides
1.2 million spectra!
  • Beyond the realm of manual interpretation
  • How do we determine what is a valid protein
    identification?

3
Statistical Analysis Using Scaffold
  • All search engines use different scoring
    algorithms → results cannot be compared directly
  • Many search engine results are described by
    more than one value

Examples:
Mascot → Ion Score and Identity Score
SEQUEST → Xcorr and DeltaCn
4
Statistical Analysis Using Scaffold
Peptide Prophet
  • Creates a universal score (discriminant score)
    for the search engine result (e.g. XCorr and
    DeltaCn are compressed to one score for SEQUEST
    results, Ion score and Identity score for
    Mascot results)
  • Plots a histogram of the discriminant scores and
    fits a bimodal distribution based on standard
    statistics to differentiate between correct and
    incorrect hits
  • Computes the probability that the match is
    correct at a given discriminant score

Nesvizhskii, A. I. et al., Anal. Chem. 2003, 75, 4646-4658
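
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration only: it assumes both the incorrect and the correct populations are Gaussian and fits them with a plain EM loop, which is a simplification and not necessarily the distributions or fitting procedure Peptide Prophet or Scaffold actually use. The function names (`fit_mixture`, `posterior_correct`) and the simulated scores are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_correct(x, model):
    """P(correct | discriminant score x) under the fitted two-component mixture."""
    (m0, s0, w0), (m1, s1, w1) = model  # (mean, sd, weight) for incorrect, correct
    incorrect = w0 * normal_pdf(x, m0, s0)
    correct = w1 * normal_pdf(x, m1, s1)
    total = incorrect + correct
    return correct / total if total > 0 else 0.5

def fit_mixture(scores, iterations=100):
    """Fit two Gaussians (incorrect, correct) to the scores with EM."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Start with the lower half as "incorrect" and the upper half as "correct".
    model = [(sum(ordered[:half]) / half, 1.0, 0.5),
             (sum(ordered[half:]) / (len(ordered) - half), 1.0, 0.5)]
    for _ in range(iterations):
        # E-step: responsibility of the "correct" component for each score.
        resp = [posterior_correct(x, model) for x in scores]
        # M-step: re-estimate mean, sd and weight of each component.
        new_model = []
        for weights in ([1.0 - r for r in resp], resp):
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, scores)) / total
            var = sum(w * (x - mean) ** 2 for w, x in zip(weights, scores)) / total
            new_model.append((mean, max(math.sqrt(var), 1e-3), total / len(scores)))
        model = new_model
    return model

# Simulated discriminant scores (assumed data): 700 incorrect near -1, 300 correct near 4.
rng = random.Random(0)
scores = ([rng.gauss(-1.0, 1.0) for _ in range(700)]
          + [rng.gauss(4.0, 1.0) for _ in range(300)])
model = fit_mixture(scores)
print(posterior_correct(5.0, model), posterior_correct(-2.0, model))
```

A high discriminant score yields a probability close to 1 and a low score a probability close to 0; scores in the overlap region between the two fitted curves get intermediate values.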
5
Statistical Analysis Using Scaffold
[Figure: Histogram of discriminant scores. x-axis: Discriminant score (D), ticks from -3.9 to 7.3. y-axis: Number of spectra in each bin, 0 to 200.]
6
Statistical Analysis Using Scaffold
Assumes a mixture of standard statistical distributions: one for incorrect matches and one for correct matches
7
Statistical Analysis Using Scaffold
[Figure: Peptide probability threshold, with distributions labeled "incorrect" and "correct"]
8
Statistical Analysis Using Scaffold
One search engine may not be enough
[Figure labels: SEQUEST, X!Tandem, Mascot. Source: www.proteomesoftware.com]
9
Statistical Analysis Using Scaffold
  • Peptide Prophet statistics are applied
    separately for each search engine result
    (i.e. Mascot, SEQUEST, and X!Tandem)
  • Scaffold Merger combines the peptide
    probabilities from each search engine to
    generate a protein probability

The probability of identifying a spectrum and the probability of agreement between search engines → Protein Probability
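
As a rough illustration of how peptide probabilities can roll up into a protein probability, the sketch below uses the textbook "at least one peptide is correct" formula, P = 1 - Π(1 - p_i), under an independence assumption. This is an assumed stand-in for illustration only; the slides do not give the formula Scaffold uses, and `protein_probability` is a hypothetical name.

```python
from math import prod

def protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct,
    assuming the peptide identifications are independent (a simplification)."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)

print(protein_probability([0.95]))        # one peptide
print(protein_probability([0.95, 0.95]))  # two peptides
```

Note that this naive formula does not penalize single-peptide proteins, whereas the Statistics view slides later in this transcript state that a protein with only one unique peptide is capped below the value this formula would give. Real tools apply additional corrections.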
10
Statistical Analysis Using Scaffold
Advantages of using Scaffold
  • Allows you to choose a statistical error rate by
    setting probability thresholds
  • Allows you to compare and combine results from
    different experiments and different search
    engines
  • Allows sharing of raw data and search results
  • Accepted as a suitable statistical method to
    validate large datasets

11
This is the Samples view
12
List of all the proteins found in your samples
Homologous proteins (proteins matched to the same
peptides) are shown. You can link out directly
to database entries
13
How does Scaffold deal with peptides that can be
assigned to more than one protein?
General rule → Explain the spectral data with
the smallest set of proteins
Protein A and Protein B share all the same
peptides, so they will be grouped together
14
How does Scaffold deal with peptides that can be
assigned to more than one protein?
General rule → Explain the spectral data with
the smallest set of proteins
Protein A and Protein B each have one unique
peptide → they will be listed separately only if
the peptide probability is > 50%
15
How does Scaffold deal with peptides that can be
assigned to more than one protein?
General rule → Explain the spectral data with
the smallest set of proteins
Protein B has two unique peptides → it will be
listed separately
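
The general rule on the three slides above can be sketched as follows. This is a simplified illustration, not Scaffold's actual algorithm: it only groups proteins that share exactly the same peptides and drops proteins whose peptides are all contained in a larger group. It does not model the peptide-probability condition for single unique peptides. The function name `group_proteins` and the example proteins and peptides are hypothetical.

```python
def group_proteins(protein_peptides):
    """Group proteins with identical peptide sets and drop proteins whose
    peptides are a strict subset of another group's peptides."""
    groups = {}
    for protein, peptides in protein_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    kept = []
    for peptides, proteins in groups.items():
        # A group is redundant if a different group explains all of its peptides.
        subsumed = any(peptides < other for other in groups)
        if not subsumed:
            kept.append((sorted(proteins), sorted(peptides)))
    return sorted(kept)

# Hypothetical evidence: A and B share all peptides, C adds nothing new,
# D has its own unique peptides.
evidence = {
    "A": {"pep1", "pep2", "pep3"},
    "B": {"pep1", "pep2", "pep3"},
    "C": {"pep1", "pep2"},
    "D": {"pep4", "pep5"},
}
for proteins, peptides in group_proteins(evidence):
    print(proteins, peptides)
```

Here A and B end up in one group, C is dropped because the A/B group already explains its peptides, and D is listed separately.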
16
Scaffold will extract GO terms from NCBI
annotations
17
Gene Ontology (GO) terms
  • Controlled vocabulary containing consistent
    descriptions of gene products in different
    databases
  • Describe gene products in terms of their
    associated biological processes, cellular
    components and molecular functions in a
    species-independent manner

Gene Ontology Project: http://www.geneontology.org/GO.doc.shtml
18
List of samples
19
Probability thresholds for peptide and protein
identifications and required number of unique
peptides can be defined
Color coded to represent probability that
protein identification is correct
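
The three filters described on this slide can be pictured as a simple predicate. The sketch below is an assumption-laden illustration: the `ProteinHit` class, the `passes_filters` function, the default threshold values, and the accession strings are all hypothetical and are not taken from Scaffold.

```python
from dataclasses import dataclass, field

@dataclass
class ProteinHit:
    accession: str
    protein_prob: float
    peptide_probs: list = field(default_factory=list)  # one value per unique peptide

def passes_filters(hit, min_protein_prob=0.99, min_peptide_prob=0.95, min_peptides=2):
    """True if the protein meets the protein-probability threshold and has
    enough unique peptides that meet the peptide-probability threshold."""
    good_peptides = [p for p in hit.peptide_probs if p >= min_peptide_prob]
    return hit.protein_prob >= min_protein_prob and len(good_peptides) >= min_peptides

hits = [
    ProteinHit("PROT_1", 0.999, [0.99, 0.97, 0.60]),  # two peptides pass
    ProteinHit("PROT_2", 0.995, [0.99, 0.80]),        # only one peptide passes
    ProteinHit("PROT_3", 0.90, [0.99, 0.98]),         # protein probability too low
]
accepted = [h.accession for h in hits if passes_filters(h)]
print(accepted)
```

Tightening any of the three thresholds shrinks the accepted list, which is the trade-off between sensitivity and error rate that the settings control.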
20
This is the Proteins view
21
Spectrum of each peptide labeled with y and b
ions which can be used for manual validation
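
For manual validation it helps to know where the b and y ions should appear. The sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. It is a minimal illustration: it ignores modifications, higher charge states, and neutral losses, and the function name `fragment_ions` is hypothetical. The example sequence is the one shown on the "Good Spectrum" slide.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_ions(sequence):
    """Return ([b1, b2, ...], [y1, y2, ...]) as singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions

b_ions, y_ions = fragment_ions("IAELAGFSVPENTK")
for n, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{n} = {b:9.4f}   y{n} = {y:9.4f}")
```

Comparing these predicted values with the labeled peaks in a spectrum shows how much of each ion series is actually covered.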
22
Manual Spectrum Evaluation
  • Search engine scores → Is the peptide found by
    more than one search engine?
  • Mascot ion score > 40
  • SEQUEST Xcorr > 2 (2+ ion), > 2.5 (3+ ion)
  • deltaCn > 0.2
  • Good signal-to-noise
  • Long stretches of y and/or b ions
  • All dominant peaks are assigned as y or b ions
  • Fragmentation chemistry

N-terminal cleavage at P → dominant y-ion
C-terminal cleavage at D and E → dominant b-ion
Peptides containing W → abundant y-ions
S and T → tend to lose water (-18 Da)
R, N, and Q → tend to lose ammonia (-17 Da)
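
The score cutoffs in the checklist above can be written as a small helper. This sketch assumes the charge in parentheses on the slide refers to the parent ion charge state (2+ or 3+); the function name `check_score_rules` is hypothetical, and the example values are the ones reported on the "Good Spectrum" and "Bad Spectrum" slides that follow. It covers only the numeric cutoffs, not the visual criteria such as signal-to-noise or ion-series coverage.

```python
def check_score_rules(mascot_ion_score, xcorr, delta_cn, charge):
    """Apply the rule-of-thumb score cutoffs and report each result."""
    xcorr_cutoffs = {2: 2.0, 3: 2.5}  # cutoffs are given only for 2+ and 3+ ions
    if charge not in xcorr_cutoffs:
        raise ValueError("no Xcorr cutoff given for this charge state")
    return {
        "mascot_ion_score": mascot_ion_score > 40,
        "sequest_xcorr": xcorr > xcorr_cutoffs[charge],
        "sequest_delta_cn": delta_cn > 0.2,
    }

good = check_score_rules(mascot_ion_score=60.1, xcorr=2.61, delta_cn=0.4, charge=2)
bad = check_score_rules(mascot_ion_score=9.93, xcorr=2.26, delta_cn=0.2, charge=3)
print(good)
print(bad)
```

With these cutoffs the good spectrum passes every rule and the bad spectrum fails every rule, which matches how the two slides label them.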
23
Good Spectrum
Peptide sequence: IAELAGFSVPENTK
2+ charge on parent peptide
Good signal-to-noise
Mascot: Ion Score 60.1, Identity Score 37.3
SEQUEST: Xcorr 2.61, deltaCn 0.4
24
Bad Spectrum
Peptide sequence: YPLADYALTPDMAIVDANLVMDMPK
3+ charge on parent peptide
Poor coverage of y and b ion series
Mascot: Ion Score 9.93, Identity Score 37.3
SEQUEST: Xcorr 2.26, deltaCn 0.2
25
This is the Statistics view
26
Scaffold Statistics View
Score Histogram
Blue indicates incorrect proteins
Red indicates correct proteins
Important! Must have enough data to fit two
distributions for the statistics to be valid.
Protein is correct if it passes the peptide and
protein probability and minimum peptide
filters.
27
Scaffold Statistics View
With at least 2 unique peptides (95% peptide
prob) the maximum protein probability is 100%.
With only 1 unique peptide (95% peptide prob)
the maximum protein probability is < 90%.
28
Scaffold Statistics View
Missed IDs
SEQUEST only
29
Scaffold Statistics View
Mascot only
Missed IDs
30
Scaffold Statistics View
Using both Mascot and Sequest results in more
correct protein identifications
31
This is the Publish View
32
Publication Guidelines for Proteomic Data
Journal of Molecular and Cellular Proteomics
http://www.mcponline.org/misc/ParisReport_Final.shtml
33
Publication Guidelines for Proteomic Data
Data Analysis
  • Name and version of software used to extract
    peak list
  • Name and version of database searching software
    (Mascot, SEQUEST, Spectrum Mill, or X!Tandem)
  • Values of all search parameters used (enzyme,
    modifications, mass tolerance, etc.)
  • Name and size of the database searched
    (Swiss-Prot or NCBI and the number of sequence
    entries)
  • Name and version of any additional software
    used for statistical analysis and an explanation
    of the analysis (Scaffold, peptide requirements,
    probability settings)

34
Publication Guidelines for Proteomic Data
Each Peptide Identified
  • Peptide sequence noting any modifications or
    missed cleavages
  • Parent peptide ion mass and charge
  • All search engine scores

Each Protein Identified
  • Accession number
  • Sequence coverage and total number of unique
    peptides