Proteomics: A Challenge for Technology and Information Science

1
Proteomics: A Challenge for Technology and Information Science
CBCB Seminar, November 21, 2005
Tim Griffin, Dept. of Biochemistry, Molecular Biology and Biophysics
tgriffin_at_umn.edu
2
What is proteomics?
"Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function."
- Stan Fields in Science, 2001.
3
Genomics vs. Proteomics
Similarities
  • Large datasets, tools needed for annotation and interpretation of results
Differences
  • Genomics: generally mature technologies and data processing methods; questions asked usually involve quantitative changes in RNA transcripts (microarrays)
  • Proteomics: still evolving; complexity of protein biochemical properties (expression changes, modifications, interactions, activities); many questions to ask and data to interpret; methods changing; different approaches (mass spec, arrays, etc.)
4
Genomics, Proteomics, and Systems Biology
5
Shotgun identification of proteins in mixtures by LC-MS/MS
Liquid chromatography coupled to tandem mass spectrometry (MS/MS)
µLC separation (50-100 µm)
Tandem mass spectrum (thousands in a matter of hours)
6
Peptide sequence determination from MS/MS spectra
Collision-induced dissociation (CID) creates two
prominent ion series
[Figure: peptide H2N-N--S--G--D--I--V--N--L--G--S--I--A--G--R-COOH with its fragment ions labeled: b-series (b1 to b14), counted from the N-terminus, and y-series (y1 to y14), counted from the C-terminus]
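The two ion series can be computed directly from the peptide sequence. The following is a minimal sketch, not the presenter's software: it assumes singly charged fragments and standard monoisotopic residue masses, and it produces one b/y pair per peptide bond. The function name `fragment_ions` is made up for illustration.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "R": 156.10111,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # mass of a proton


def fragment_ions(peptide):
    """Return singly charged b- and y-ion m/z values, keyed by ion index."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions, y_ions = {}, {}
    for i in range(1, n):  # one b/y pair per peptide bond
        b_ions[i] = sum(masses[:i]) + PROTON              # first i residues
        y_ions[i] = sum(masses[n - i:]) + WATER + PROTON  # last i residues
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("NSGDIVNLGSIAGR")
for i in sorted(y_ions):
    print(f"b{i:<2} {b_ions[i]:9.4f}    y{i:<2} {y_ions[i]:9.4f}")
```

Differences between neighboring peaks in one series equal a residue mass, which is what lets the sequence be read off the spectrum. Leucine and isoleucine share a mass, so they cannot be told apart this way.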
7
Peptide sequence identifies the protein
[Figure: MS/MS spectrum of H2N-NSGDIVNLGSIAGR-COOH, relative abundance vs. m/z (200 to 1200), with peaks labeled by fragment sequence: GDIVNLGSIAGR, DIVNLGSIAGR, IVNLGSIAGR, VNLGSIAGR, NLGSIAGR, LGSIAGR, GSIAGR, SIAGR, IAGR, AGR, GR, R]
YMR134W, yeast protein involved in iron metabolism
8
High-throughput protein identification by LC-MS/MS and automated sequence database searching
Raw MS/MS spectrum
Direct identification of 1000 proteins from complex mixtures
Protein sequence and/or DNA sequence database search
Peptide sequence match
Protein identification
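The last two steps, peptide sequence match to protein identification, can be illustrated with an in-silico digest. This is a hedged sketch rather than how any particular search engine works: it assumes trypsin digestion (cleavage after K or R, not before P), and the database entries are made-up sequences in which the example peptide has been embedded.

```python
def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (a common trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Toy database: names and sequences are hypothetical, for illustration only.
DATABASE = {
    "protein_A (hypothetical)": "MSTKNSGDIVNLGSIAGRLLPEK",
    "protein_B (hypothetical)": "MAAGKVDLRPEESWK",
}


def build_index(database):
    """Map every tryptic peptide to the set of proteins that contain it."""
    index = {}
    for name, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            index.setdefault(peptide, set()).add(name)
    return index


index = build_index(DATABASE)
print(index.get("NSGDIVNLGSIAGR", set()))
```

A peptide that maps to a single protein identifies it unambiguously. Peptides shared by several entries leave the identification ambiguous, which is why the index stores a set.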
9
Dealing with the data
1. Data acquisition
  • Experimental information, metadata capture

2. Peak analysis
  • Sequence database searching
  • Quantitative analysis

3. Knowledge annotation and interpretation
  • Database mining
  • Assignment of function, pathway, localization, etc.
  • Output for database archiving, publication

Integrated workflow?
10
1. Data acquisition: capturing experimental information
Proteomics Experimental Data Repository (PEDRo)

Proposed schema
  • Similar to genomic needs, but experimental info
    a bit different
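Capturing experimental information amounts to storing a structured record alongside each run. The sketch below only illustrates the idea: every field name and value is hypothetical, and none of it is taken from the PEDRo schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ExperimentRecord:
    # All field names are hypothetical; they are NOT the PEDRo schema.
    sample_name: str
    organism: str
    instrument: str
    separation: str
    raw_files: list = field(default_factory=list)


record = ExperimentRecord(
    sample_name="example lysate",
    organism="Saccharomyces cerevisiae",
    instrument="tandem mass spectrometer (model unspecified)",
    separation="LC",
    raw_files=["run_001.raw"],
)

# Serialize so the metadata can travel with the data files.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```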

11
2. Peak Analysis
Computational algorithms for searching MS/MS
spectra against protein sequence databases, mRNA
sequences, DNA sequences
  • ProFound
  • Mascot
  • PepSea
  • MS-Fit
  • MOWSE
  • Peptident
  • Multident
  • Sequest
  • PepFrag
  • MS-Tag

Protein identification
  • Need CPU horsepower (parallel computing)
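To give a sense of what such a search does, here is a minimal sketch of one simple scoring idea, the shared peak count. It is an assumption chosen for illustration and is not the algorithm of any tool listed above. The observed peak list is made up; the theoretical values are singly charged y1 to y3 ions for peptides ending in AGR and AGK.

```python
def shared_peak_count(observed, theoretical, tol=0.5):
    """Count theoretical fragment m/z values with an observed peak within tol (Da)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_match(observed, candidates, tol=0.5):
    """Score every candidate peptide and return the best one with all scores."""
    scores = {
        peptide: shared_peak_count(observed, fragments, tol)
        for peptide, fragments in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Made-up observed peaks (m/z only).
observed = [175.12, 232.14, 303.18, 450.00]

# Theoretical y1, y2, y3 ions for two candidate C-terminal sequences.
candidates = {
    "...AGR": [175.119, 232.140, 303.178],
    "...AGK": [147.113, 204.134, 275.171],
}

best, scores = best_match(observed, candidates)
print(best, scores)
```

Each spectrum is scored independently of the others, so thousands of spectra can be split across processors, which is where parallel computing helps.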

12
2. Peak Analysis: data formats
[Figure: Format 1, Format 2, Format 3 and Output 1, Output 2, Output 3, with "?" between them]
  • Lack of flexibility
  • Slow to evolve
  • Lack of incorporation of competing products,
    methods

13
2. Peak Analysis: need general, flexible, in-house solutions
[Figure: Format 1, Format 2, Format 3, followed by reverse engineering of data formats, leading to general tools for analysis of multiple data formats]
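One way to build general tools over multiple formats is to parse each format into a single common in-memory record. The sketch below assumes two inputs: a small subset of the Mascot Generic Format (MGF) and a hypothetical "mz,intensity" CSV layout invented for this example. The function names and the record layout are illustrative assumptions.

```python
def parse_mgf(text):
    """Parse a minimal subset of MGF: BEGIN IONS, TITLE, PEPMASS, peaks, END IONS."""
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"title": "", "precursor_mz": None, "peaks": []}
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is not None and line:
            if line.startswith("TITLE="):
                current["title"] = line[len("TITLE="):]
            elif line.startswith("PEPMASS="):
                current["precursor_mz"] = float(line[len("PEPMASS="):].split()[0])
            elif line[0].isdigit():  # peak line: "m/z intensity"
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


def parse_simple_csv(text):
    """Parse a hypothetical one-spectrum CSV with an 'mz,intensity' header row."""
    peaks = []
    for line in text.splitlines()[1:]:
        if line.strip():
            mz, intensity = line.split(",")
            peaks.append((float(mz), float(intensity)))
    return [{"title": "", "precursor_mz": None, "peaks": peaks}]


# Registry: adding a format means adding one parser, not changing downstream tools.
PARSERS = {"mgf": parse_mgf, "csv": parse_simple_csv}


def load_spectra(text, fmt):
    return PARSERS[fmt](text)


mgf_text = "BEGIN IONS\nTITLE=scan 1\nPEPMASS=672.4 1000\nCHARGE=2+\n175.1 50\n232.1 80\nEND IONS\n"
csv_text = "mz,intensity\n175.1,50\n232.1,80\n"

print(load_spectra(mgf_text, "mgf"))
print(load_spectra(csv_text, "csv"))
```

Downstream analysis code then sees the same record regardless of which format the data arrived in.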
14
2. Peak Analysis: reverse engineering data formats
http://sashimi.sourceforge.net/software_glossolalia.html
15
2. Peak analysis: quality control of protein matches
Filtering
Unfiltered: 10^5 matches (lots of noise and junk)
Filtered: thousands of true matches
  • Statistical analysis of database results (tools
    are available)
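The slide does not say which statistical method is used for filtering. As one illustration, the sketch below assumes a target-decoy approach: spectra are also searched against decoy sequences, and the number of decoy hits above a score threshold estimates the number of false target hits. Scores and labels are made up.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate FDR as decoy hits / target hits among matches scoring >= threshold."""
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def filter_matches(matches, max_fdr=0.05):
    """Return the lowest threshold with estimated FDR <= max_fdr, and the kept targets."""
    for threshold in sorted({score for score, _ in matches}):
        if fdr_at_threshold(matches, threshold) <= max_fdr:
            kept = [(s, d) for s, d in matches if s >= threshold and not d]
            return threshold, kept
    return None, []


# (score, is_decoy) pairs; values are made up for illustration.
matches = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, False), (6.5, True),
    (6.1, False), (5.8, True), (5.2, False), (4.9, True), (4.1, True),
]

threshold, kept = filter_matches(matches, max_fdr=0.25)
print(threshold, len(kept))
```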

16
2. Peak Analysis: quantitative analysis
  • External chemical labeling
  • Metabolic labeling (SILAC)
  • Enzymatic incorporation (O16/O18)
  • Flexibility is key: need tools to handle different quantitative methods

17
2. Peak Analysis: quantitative analysis
[Figure: spectrum with peaks labeled Sample 1 and Sample 2]
Relative intensity reflects relative protein abundance
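Turning relative intensities into a relative protein abundance can be sketched as follows. The choice of the median over peptide ratios is an assumption made here for robustness to outliers, not something the slides specify, and the intensities are invented.

```python
from statistics import median


def protein_ratios(peptide_pairs):
    """peptide_pairs: (protein, intensity_sample1, intensity_sample2) tuples.

    Returns {protein: median of the sample2/sample1 peptide ratios}.
    Pairs with a missing (zero) intensity are skipped.
    """
    per_protein = {}
    for protein, i1, i2 in peptide_pairs:
        if i1 > 0 and i2 > 0:
            per_protein.setdefault(protein, []).append(i2 / i1)
    return {protein: median(ratios) for protein, ratios in per_protein.items()}


# Made-up peptide intensities for two hypothetical proteins.
pairs = [
    ("P1", 1000.0, 2000.0),
    ("P1", 500.0, 1100.0),
    ("P1", 800.0, 1520.0),
    ("P2", 300.0, 150.0),
    ("P2", 400.0, 0.0),   # no signal in sample 2: skipped
]

ratios = protein_ratios(pairs)
print(ratios)
```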
18
Evolving methodologies: iTRAQ
[Figure: Samples 1, 2, 3, 4 are each digested to peptides and labeled with iTRAQ labels 114, 115, 116, 117, respectively, followed by multidimensional separation. In the MS/MS spectrum (intensity vs. m/z), the diagnostic ions at m/z 114, 115, 116, 117 are used for quantitative analysis and the peptide fragments are used for sequence identification.]
  • 4-way multiplexing: simultaneous comparison of multiple states, replicates
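Reading the four diagnostic ions out of an MS/MS peak list can be sketched like this. The channel centers are the nominal label masses from the slide; the tolerance, function names, and peak list are assumptions for illustration.

```python
# Nominal reporter m/z per channel; kept as a parameter so it can be changed.
REPORTER_CHANNELS = {"114": 114.0, "115": 115.0, "116": 116.0, "117": 117.0}


def reporter_intensities(peaks, channels=REPORTER_CHANNELS, tol=0.2):
    """Sum the intensity of peaks within tol (Da) of each reporter m/z."""
    totals = {name: 0.0 for name in channels}
    for mz, intensity in peaks:
        for name, center in channels.items():
            if abs(mz - center) <= tol:
                totals[name] += intensity
    return totals


def relative_abundance(totals):
    """Normalize channel intensities so they sum to 1."""
    total = sum(totals.values())
    return {name: (value / total if total else 0.0) for name, value in totals.items()}


# Made-up MS/MS peak list: four reporter ions plus two sequence fragments.
spectrum = [
    (114.1, 100.0), (115.1, 200.0), (116.1, 300.0), (117.1, 400.0),
    (175.1, 900.0), (232.1, 650.0),
]

totals = reporter_intensities(spectrum)
print(totals)
print(relative_abundance(totals))
```

Passing the channel table and tolerance as arguments means the same function still applies if the label masses or the instrument's mass accuracy change.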

19
Need for changeable tools
[Figure: "old" vs. "new" spectra (intensity axis), with peaks labeled 1 to 4 and m/z values 114.1005, 115.0963, 116.0972, 117.1025]
Automated analysis tools?
20
3. Knowledge annotation: making sense of lists of data
21
3. Knowledge annotation: mining proteomic/genomic databases
22
3. Knowledge annotation: needs
  • Annotation: accession numbers and protein names
  • Functional assignments (functional degeneracy?)
  • Pathway assignments
  • Subcellular localization
  • Disease implications
  • Comparison of different proteomic datasets (e.g. expression profiles compared to modification state profiles, other protein properties)
  • Automated and streamlined??
  • Publication and deposit in databases
  • Visualization of complex phenomena, interpretation of biological relevance
  • Modeling, integration with genomics data: computational and systems biology
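Two of these needs, attaching annotations to accession numbers and comparing datasets, can be sketched with plain lookups and set operations. Apart from the YMR134W function stated earlier in the slides, every accession and annotation value below is a placeholder.

```python
# Placeholder annotation table. Only the YMR134W function comes from the slides;
# the other accession and all remaining values are made up for illustration.
ANNOTATIONS = {
    "YMR134W": {"function": "iron metabolism", "localization": "not recorded"},
    "ACC0002": {"function": "hypothetical function", "localization": "hypothetical compartment"},
}


def annotate(accessions, annotations=ANNOTATIONS):
    """Attach annotation fields to each accession; unknown ones get an empty dict."""
    return {acc: annotations.get(acc, {}) for acc in accessions}


def compare_datasets(a, b):
    """Split two accession lists into shared and dataset-specific proteins."""
    a, b = set(a), set(b)
    return {"both": sorted(a & b), "only_a": sorted(a - b), "only_b": sorted(b - a)}


# Hypothetical results from an expression experiment and a modification experiment.
expression_hits = ["YMR134W", "ACC0002", "ACC0003"]
modification_hits = ["YMR134W", "ACC0004"]

print(annotate(expression_hits))
print(compare_datasets(expression_hits, modification_hits))
```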