Title: Proteomics: A Challenge for Technology and Information Science

1
Proteomics: A Challenge for Technology and Information Science
CBCB Seminar, November 21, 2005
Tim Griffin, Dept. of Biochemistry, Molecular Biology and Biophysics
tgriffin_at_umn.edu
2
What is proteomics?
"Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function."
- Stan Fields in Science, 2001
3
Genomics vs. Proteomics
Similarities
- Large datasets; tools needed for annotation and interpretation of results
Differences
- Genomics: generally mature technologies and data processing methods; questions asked usually involve quantitative changes in RNA transcripts (microarrays)
- Proteomics: still evolving; complexity of protein biochemical properties (expression changes, modifications, interactions, activities); many questions to ask and data to interpret; methods changing; different approaches (mass spec, arrays, etc.)
4
Genomics, Proteomics, and Systems Biology
5
Shotgun identification of proteins in mixtures by LC-MS/MS
Liquid chromatography coupled to tandem mass spectrometry (MS/MS)
µLC separation (50-100 µm)
Tandem mass spectra (thousands in a matter of hours)
6
Peptide sequence determination from MS/MS spectra
Collision-induced dissociation (CID) creates two prominent ion series
y-series (fragments containing the C-terminus): y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4 y3 y2 y1
H2N-N--S--G--D--I--V--N--L--G--S--I--A--G--R-COOH
b-series (fragments containing the N-terminus): b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14
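To make the two ion series concrete, the following is a minimal sketch that computes singly charged b- and y-ion m/z values for the peptide shown on this slide. It assumes standard monoisotopic residue masses and unmodified residues; the function name `fragment_ions` is hypothetical, and the sketch covers only the 13 backbone cleavages between residues, not the full-length ions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to every y ion
PROTON = 1.00728   # charge carrier for a singly charged ion


def fragment_ions(peptide):
    """Return ({i: m/z of b_i}, {i: m/z of y_i}) for singly charged ions.

    Hypothetical helper: assumes unmodified residues and charge 1+.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i holds the first i residues; y_i holds the last i residues plus water.
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}
    y = {i: sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


b_ions, y_ions = fragment_ions("NSGDIVNLGSIAGR")
for i in sorted(y_ions):
    print(f"b{i}: {b_ions[i]:9.4f}   y{i}: {y_ions[i]:9.4f}")
```

Neighboring peaks within one series differ by a single residue mass, which is what lets the sequence be read off the spectrum.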
7
Peptide sequence identifies the protein
[Figure: MS/MS spectrum of H2N-NSGDIVNLGSIAGR-COOH, relative abundance vs. m/z (200-1200). Peaks are labeled with the C-terminal fragments R, GR, AGR, IAGR, SIAGR, GSIAGR, LGSIAGR, NLGSIAGR, VNLGSIAGR, IVNLGSIAGR, DIVNLGSIAGR, GDIVNLGSIAGR.]
YMR134W, yeast protein involved in iron metabolism
8
High-throughput protein identification by LC-MS/MS and automated sequence database searching
Direct identification of 1000 proteins from complex mixtures
Workflow: Raw MS/MS spectrum -> Protein sequence and/or DNA sequence database search -> Peptide sequence match -> Protein identification
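The last two steps of this workflow (peptide sequence match, then protein identification) can be sketched as a lookup from peptides to the proteins that contain them. This is a toy illustration only: it assumes tryptic digestion (cleavage after K or R, not before P), the two protein sequences are invented for the example and are not real database entries, and the helper names are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def build_peptide_index(proteins):
    """Map each tryptic peptide to the set of proteins that contain it."""
    index = {}
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            index.setdefault(peptide, set()).add(name)
    return index


# Invented sequences for illustration; NOT real database entries.
TOY_DATABASE = {
    "toy_protein_A": "MKTAYIAKNSGDIVNLGSIAGRLLPK",
    "toy_protein_B": "MSEQLKDAVFGRPWTK",
}

index = build_peptide_index(TOY_DATABASE)
# The peptide sequenced from the spectrum points back to its parent protein.
print(index.get("NSGDIVNLGSIAGR", set()))
```

Real search engines match spectra rather than exact strings, but the peptide-to-protein mapping step works on the same principle.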
9
Dealing with the data

1. Data acquisition
- Experimental information, metadata capture

2. Peak analysis
- Sequence database searching
- Quantitative analysis

3. Knowledge annotation and interpretation
- Database mining
- Assignment of function, pathway, localization, etc.
- Output for database archiving, publication

Integrated workflow?
10
1. Data acquisition: capturing experimental information
Proteomics Experimental Data Repository (PEDRo)

Proposed schema

Similar to genomic needs, but the experimental information is a bit different
11
2. Peak Analysis
Computational algorithms for searching MS/MS
spectra against protein sequence databases, mRNA
sequences, DNA sequences

ProFound
Mascot
PepSea
MS-Fit
MOWSE
Peptident
Multident
Sequest
PepFrag
MS-Tag

Protein identification

Need CPU horsepower (parallel computing)
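Why parallel computing helps: each spectrum can be scored against the candidate peptides independently of every other spectrum, so the work splits cleanly across workers. The sketch below uses a simple shared-peak count as the score, which is a stand-in and not the scoring used by any of the programs listed above. The candidate fragment lists and spectra are hypothetical values, and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def shared_peak_count(observed, theoretical, tol=0.5):
    """Count theoretical fragment m/z values matched by an observed peak."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_match(observed, candidates, tol=0.5):
    """Return (peptide, score) for the best-scoring candidate peptide."""
    scores = {
        peptide: shared_peak_count(observed, fragments, tol)
        for peptide, fragments in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def search_all(spectra, candidates, workers=4):
    """Score every spectrum independently; results keep the input order."""
    # ProcessPoolExecutor offers the same map() interface and would be the
    # usual choice for CPU-bound scoring; threads keep this sketch simple.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: best_match(s, candidates), spectra))


# Hypothetical theoretical fragment m/z lists and observed peak lists.
CANDIDATES = {
    "PEPTIDE_A": [175.1, 232.1, 303.2, 416.3],
    "PEPTIDE_B": [147.1, 260.2, 361.2, 474.3],
}
SPECTRA = [
    [175.12, 232.15, 303.18, 500.0],
    [147.11, 260.20, 474.31, 610.4],
]

results = search_all(SPECTRA, CANDIDATES)
print(results)
```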

12
2. Peak Analysis: data formats
[Figure: Format 1, Format 2, Format 3; Output 1, Output 2, Output 3; question marks between them]

Lack of flexibility
Slow to evolve
Lack of incorporation of competing products, methods

13
2. Peak Analysis: need general, flexible, in-house solutions
[Figure: Format 1, Format 2, Format 3]
Reverse engineering of data formats
General tools for analysis of multiple data formats
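One way general tools can cope with multiple data formats is to convert each format into a single in-memory representation that downstream analysis shares. As a hedged illustration, the sketch below reads a text peak list laid out like Mascot Generic Format (BEGIN IONS / END IONS blocks) into a common record. The `Spectrum` class and `parse_mgf` function are hypothetical names, only the TITLE, PEPMASS, and CHARGE keys are handled, and the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    """Format-independent record shared by downstream tools (hypothetical)."""
    title: str = ""
    precursor_mz: float = 0.0
    charge: int = 0
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs


def parse_mgf(text):
    """Parse MGF-style text into a list of Spectrum records."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = Spectrum()
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # ignore anything outside an ion block
        elif "=" in line:
            key, value = line.split("=", 1)
            if key == "TITLE":
                current.title = value
            elif key == "PEPMASS":
                current.precursor_mz = float(value.split()[0])
            elif key == "CHARGE":
                current.charge = int(value.rstrip("+-"))
        else:
            mz, intensity = line.split()[:2]
            current.peaks.append((float(mz), float(intensity)))
    return spectra


# Made-up example input.
EXAMPLE = """BEGIN IONS
TITLE=toy spectrum 1
PEPMASS=672.36 1000
CHARGE=2+
175.12 250.0
232.14 80.5
END IONS
"""

spectra = parse_mgf(EXAMPLE)
print(spectra[0])
```

A second reader for another format would return the same `Spectrum` records, so analysis code never needs to know which instrument produced the data.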
14
2. Peak Analysis: reverse engineering data formats
http://sashimi.sourceforge.net/software_glossolalia.html
15
2. Peak analysis: quality control of protein matches
Filtering
Unfiltered: 10^5 matches (lots of noise and junk)
Filtered: thousands of true matches

Statistical analysis of database results (tools are available)
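The filtering step amounts to choosing a score cutoff that keeps mostly true matches. The slides do not name a specific statistical method, so the sketch below swaps in one common approach as an assumption: a target-decoy estimate, where matches to deliberately wrong (decoy) sequences indicate how many false matches survive a cutoff. Scores, field names, and function names are all hypothetical.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    Returns (estimated FDR, number of target matches kept).
    """
    kept = [m for m in matches if m["score"] >= threshold]
    decoys = sum(m["is_decoy"] for m in kept)
    targets = len(kept) - decoys
    return (decoys / targets if targets else 0.0), targets


def lowest_threshold_for_fdr(matches, max_fdr=0.05):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr."""
    for threshold in sorted({m["score"] for m in matches}):
        fdr, _ = fdr_at_threshold(matches, threshold)
        if fdr <= max_fdr:
            return threshold
    return None  # no cutoff satisfies the requested FDR


# Hypothetical search results: (score, matched a decoy sequence?).
RAW = [(4.1, False), (3.8, False), (3.5, False), (3.0, False), (2.6, True),
       (2.4, False), (2.0, True), (1.7, False), (1.5, True), (1.2, True)]
MATCHES = [{"score": s, "is_decoy": d} for s, d in RAW]

cutoff = lowest_threshold_for_fdr(MATCHES, max_fdr=0.05)
filtered = [m for m in MATCHES if m["score"] >= cutoff and not m["is_decoy"]]
print(cutoff, len(filtered))
```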

16
2. Peak Analysis: quantitative analysis

External chemical labeling
Metabolic labeling (SILAC)
Enzymatic incorporation (O16/O18)

Flexibility is key: need tools to handle different quantitative methods

17
2. Peak Analysis: quantitative analysis
[Figure: paired peaks labeled Sample 1 and Sample 2]
Relative intensity = relative protein abundance
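Treating relative intensity as a measure of relative protein abundance can be sketched as a ratio calculation: each peptide pair gives a Sample 2 / Sample 1 intensity ratio, and a protein's ratio summarizes its peptides. Using the median as the summary is an assumption of this sketch, as are the function name and the intensity values.

```python
from statistics import median


def protein_ratios(peptide_pairs):
    """Summarize Sample 2 / Sample 1 intensity ratios per protein.

    peptide_pairs: iterable of (protein, intensity_sample_1, intensity_sample_2).
    Pairs with a missing (non-positive) intensity are skipped.
    """
    by_protein = {}
    for protein, sample_1, sample_2 in peptide_pairs:
        if sample_1 > 0 and sample_2 > 0:
            by_protein.setdefault(protein, []).append(sample_2 / sample_1)
    # The median resists a single outlying peptide better than the mean.
    return {protein: median(r) for protein, r in by_protein.items()}


# Hypothetical peptide-level intensities.
PAIRS = [
    ("protein_X", 1000.0, 2000.0),
    ("protein_X", 500.0, 1100.0),
    ("protein_X", 800.0, 1520.0),
    ("protein_Y", 900.0, 450.0),
    ("protein_Y", 0.0, 300.0),   # skipped: no signal in Sample 1
]

ratios = protein_ratios(PAIRS)
print(ratios)
```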
18
Evolving methodologies: iTRAQ
Samples 1, 2, 3, 4: each digested to peptides
iTRAQ labels: 114, 115, 116, 117 (one per sample)
Multidimensional separation
[Figure: MS/MS spectrum, intensity vs. m/z, with peaks at 114, 115, 116, and 117 labeled 1, 2, 3, and 4]
Diagnostic ions used for quantitative analysis
Peptide fragments used for sequence identification

4-way multiplexing: simultaneous comparison of multiple states, replicates
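Reading the diagnostic ions out of an MS/MS peak list can be sketched as a windowed lookup around each reporter mass. The reporter m/z values below are commonly quoted theoretical values for the 4-plex reagent and should be treated as an assumption to check against reagent documentation. The tolerance, function names, and peak intensities are hypothetical; the observed m/z values reuse the numbers shown on the following slide.

```python
# Assumed theoretical reporter m/z values for 4-plex iTRAQ labels.
REPORTER_MZ = {114: 114.1112, 115: 115.1083, 116: 116.1116, 117: 117.1150}


def reporter_intensities(peaks, tol=0.05):
    """Sum the intensity found within +/- tol of each reporter m/z."""
    totals = {label: 0.0 for label in REPORTER_MZ}
    for mz, intensity in peaks:
        for label, target in REPORTER_MZ.items():
            if abs(mz - target) <= tol:
                totals[label] += intensity
    return totals


def relative_abundance(totals, reference=114):
    """Express each channel relative to the reference channel."""
    ref = totals[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {label: value / ref for label, value in totals.items()}


# Observed m/z values from the slides; intensities are made up.
PEAKS = [
    (114.1005, 1000.0),
    (115.0963, 2000.0),
    (116.0972, 3000.0),
    (117.1025, 500.0),
    (175.1190, 800.0),   # a sequence fragment, ignored for quantification
]

totals = reporter_intensities(PEAKS)
print(relative_abundance(totals))
```

Widening or narrowing `tol` is the kind of adjustment a changeable tool needs when instrument resolution changes.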

19
Need for changeable tools
[Figure: "old" and "new" spectra, intensity vs. m/z; peaks labeled 1-4 at m/z 114.1005, 115.0963, 116.0972, and 117.1025]
Automated analysis tools?
20
3. Knowledge annotation: making sense of lists of data
21
3. Knowledge annotation: mining proteomic/genomic databases
22
3. Knowledge annotation: needs

Annotation: accession numbers and protein names
Functional assignments (functional degeneracy?)
Pathway assignments
Subcellular localization
Disease implications
Comparison of different proteomic datasets (e.g., expression profiles compared to modification state profiles, other protein properties)
Automated and streamlined??
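A small sketch of what automated annotation and dataset comparison might look like: identified accessions are joined to an annotation table, and two result lists are compared as sets. The only annotation taken from these slides is YMR134W (yeast, iron metabolism); the other accessions, the table layout, and the function names are hypothetical.

```python
# Annotation table keyed by accession; only YMR134W comes from the slides.
ANNOTATIONS = {
    "YMR134W": {"organism": "yeast", "function": "iron metabolism"},
}


def annotate(accessions, annotations):
    """Attach known annotation fields to each accession; flag unknown ones."""
    rows = []
    for accession in accessions:
        info = annotations.get(accession)
        rows.append({"accession": accession,
                     "annotated": info is not None,
                     **(info or {})})
    return rows


def compare_datasets(first, second):
    """Report which accessions two datasets share and which are unique."""
    first, second = set(first), set(second)
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Hypothetical result lists, e.g. an expression and a modification dataset.
EXPRESSION_HITS = ["YMR134W", "HYPOTHETICAL_1"]
MODIFICATION_HITS = ["YMR134W", "HYPOTHETICAL_2"]

rows = annotate(EXPRESSION_HITS, ANNOTATIONS)
overlap = compare_datasets(EXPRESSION_HITS, MODIFICATION_HITS)
print(rows)
print(overlap)
```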