Mass Spectrometry-Based Methods for Protein Identification presentation

1
Mass Spectrometry-Based Methods for Protein Identification
Joseph A. Loo
Department of Biological Chemistry, David Geffen School of Medicine
Department of Chemistry and Biochemistry
University of California, Los Angeles, CA, USA
2
Genomics and Proteomics: Characterizing many genes
and gene products simultaneously
3
Proteomics Aids Biological Research
[Figure: biology yields a complex protein mixture, which is analyzed by protein separation and mass spectrometry to give protein identification, protein modification, and protein abundance]
4
Proteomics - What is it?

An assay to systematically analyze the diverse
properties of proteins
Biological processes are dynamic
A quantitative comparison of states is required
The study of protein expression and function on a
genome scale
Purpose: Examine altered gene expression
pathways in disease states and under different
environmental conditions

The completion of the human genome has provided
researchers with the blueprint for life, and
proteomics offers scientists the means for
analyzing the expressed genome.
5
Genome to Proteome
[Figure: dsDNA (gene) -> transcription -> mRNA -> translation -> protein (H2N ... COOH), analyzed by mass spectrometry]
Example protein sequence (residue number at the start of each line):
1 MTDLKASSLRALKLMDLTTLNDDDTDEKVIALCHQAKTPVGNTAAICIYP
51 RFIPIARKTLKEQGTPEIRIATVTNFPHGNDDIDIALAETRAAIAYGADE
101 VDVVFPYRALMAGNEQVGFDLVKACKEACAAANVLLKVIIETGELKDEAL
151 IRKASEISI
6
Approaches for Protein Identification
What is this protein?

Molecular weight
Isoelectric point
Amino acid composition
Other physical/chemical characteristics
Partial or complete amino acid sequence
Edman (N-terminal sequence) - if N-term. not
blocked
C-terminal sequence - not commonly performed
Mass spectrometry-measured information

7
Protein Identification by Mass Spectrometry
2-D Gel Electrophoresis
[Figure: 2-D gel, MW x 10^3 (150 to 10) versus pI (6.5 to 4.5)]
Peptide mass fingerprint by MALDI-TOF or
LC-ESI-MS. Additional sequence information can
be obtained by MS/MS.
[Figure: mass spectrum, m/z 500 to 3000; labeled values include 717, 1089, 1272, 1401, 1547, 1700, 1857, 2384, 2791]
8
Mass Spectrometry: A method to weigh molecules
A simple measurement of mass is used to confirm
the identity of a molecule, but it can be used
for much more
9
Mass Spectrometer for Proteomics
Pre-Separation
Ion Source
Liquid Chromatography
10
The Nobel Prize in Chemistry 2002
"for the development of methods for
identification and structure analyses of
biological macromolecules"
"for their development of soft desorption
ionisation methods for mass spectrometric
analyses of biological macromolecules"
11
Electrospray: Generation of aerosols and droplets
12
Electrospray Ionization (ESI)

Multiple charging
More charges for larger molecules
MW range > 150 kDa
Liquid introduction of analyte
Interface with liquid separation methods, e.g.
liquid chromatography
Tandem mass spectrometry (MS/MS) for protein
sequencing
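
The effect of multiple charging on the m/z range an analyzer must cover can be sketched in a few lines of Python. This is a minimal illustration, assuming positive-ion mode in which every charge comes from an added proton. The function name `mz_of_ion`, the 150 kDa example mass, and the chosen charge states are illustrative assumptions, not values from the slides.

```python
PROTON_MASS = 1.00728  # approximate mass of a proton in Da


def mz_of_ion(neutral_mass, charge):
    """Return m/z of an (M + zH)z+ ion for a neutral mass given in Da."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # A hypothetical 150 kDa protein: more charges bring the ion
    # into an m/z range that a conventional analyzer can cover.
    for z in (1, 50, 60):
        print(z, round(mz_of_ion(150_000.0, z), 1))
```

With a single charge the ion would sit near m/z 150,000, while 50 to 60 charges place it at roughly m/z 2500 to 3000.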

[Figure: ESI source at high voltage producing highly charged droplets that enter the MS; example spectrum with charge states 14+ to 22+ over mass/charge (m/z) 500 to 1100]
13
ESI-MS of Large Proteins
distribution of multiply charged molecules
[Figure: ESI-MS (Q-TOF), pH 7.5; labeled ions (M+13H)13+, (M+14H)14+, (M+15H)15+, (M+16H)16+; labeled m/z values 3102, 3323, 3543]
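
Two neighbouring peaks in a charge-state distribution are enough to estimate the charge and the neutral mass. The sketch below uses the m/z values 3323 and 3102 that appear on this slide and assumes they are adjacent charge states of the same species differing by one proton, which the extracted slide text does not confirm. Function names are illustrative.

```python
PROTON_MASS = 1.00728  # approximate mass of a proton in Da


def charge_of_higher_mz_peak(mz_high, mz_low):
    """Charge of the peak at mz_high, assuming mz_low carries one more proton."""
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass(mz, charge):
    """Neutral mass M of an (M + zH)z+ ion."""
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    z = charge_of_higher_mz_peak(3323.0, 3102.0)
    print("charge of m/z 3323:", z)
    print("M from m/z 3323:", round(neutral_mass(3323.0, z)))
    print("M from m/z 3102:", round(neutral_mass(3102.0, z + 1)))
```

Under that assumption the two peaks give neutral masses that agree to within a few Da, consistent with the integer m/z labels.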
14
History of Electrospray Ionization

Malcolm Dole demonstrated the production of
intact oligomers of polystyrene up to MW 500,000
mass analysis of large ions was problematic
John Fenn (Yale University)
Chemical engineer - expert in supersonic
molecular beams
Began work on electrospray in 1981
Adapted ESI to operate on a more conventional
mass spectrometer
Recognized that multiply charged ions were
produced by ESI
Reduced the m/z range required

15
Electrospray process
10^6 charges for a 30 micron droplet

Analyte dissolved in a suitable solvent flows
through a small diameter capillary tube
Liquid in the presence of a high electric field
generates a fine mist or aerosol spray of
highly charged droplets

16
Matrix-assisted Laser Desorption/Ionization
(MALDI)
Time-of-Flight (TOF) Analyzer
[Figure: MALDI source (laser, sample at high voltage) coupled to a TOF analyzer; ions m1, m2, m3 separate in the drift region before reaching the detector]
17
MALDI Mass Spectrometry of Large Proteins
[Figure: MALDI-MS of rat MVP, intensity versus m/z (about 30,000 to 130,000); labeled ions (M+H)+ and (M+2H)2+; labeled values include 97430, 98563, 48658, 36446, 58309]
18
MALDI
sample and matrix

Developed by Tanaka (Japan) and Hillenkamp/Karas
(Germany)
Peptide/protein analyte of interest is
co-crystallized on the MALDI target plate with an
appropriate matrix
small, highly conjugated organic molecules which
strongly absorb energy at a particular wavelength
Energy is transferred to analyte indirectly,
inducing desorption from target surface
Analyte is ionized by gas-phase proton transfer
(perhaps from ionized matrix molecules)

pulsed laser light
peptide/protein ions desorbed from matrix
20 kV (sample stage or target)
19
MALDI matrices
2,5-dihydroxybenzoic acid (DHB): peptides and
proteins
4-hydroxy-α-cyanocinnamic acid (alpha-cyano or
4-HCCA): peptides
3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic
acid): proteins
matrices for 337 nm irradiation
20
MALDI

337 nm irradiation is provided by a nitrogen (N2)
laser
The target plate is inserted into the high vacuum
region of the source and the sample is irradiated
with a laser pulse. The matrix absorbs the laser
energy and transfers energy to the analyte
molecule. The molecules are desorbed and ionized
during this stage of the process.
MALDI is most commonly interfaced to a
time-of-flight (TOF) mass spectrometer.

21
R. Aebersold and M. Mann, Nature (2003), 422,
198-207.
22
Time-of-Flight Mass Spectrometer
v1 m1
v2 m2
v3 m3
detector
drift region (L)
high voltage
Principle of Operation of Linear TOF: A
time-of-flight mass spectrometer measures the
mass-dependent time it takes ions of different
masses to move from the ion source to the
detector. This requires that the starting time
(the time at which the ions leave the ion source)
is well-defined. Recall that the kinetic energy
of an ion is
$\tfrac{1}{2} m v^2 = eV$
where $v$ is ion velocity, $m$ is mass, $e$ is
charge on electron, and $V$ is the accelerating voltage.
The ion velocity, $v$, is also the length of the
flight path, $L$, divided by the flight time, $t$:
$v = L / t$
Substituting this expression for $v$ into the
kinetic energy relation, we can derive the
working equation for the time-of-flight mass
spectrometer:
$m = \dfrac{2 e V}{L^2}\, t^2$
mass is proportional to (time)^2
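
The working equation can be checked numerically. The sketch below is an illustration only: the 20 kV accelerating voltage is the value quoted on the MALDI slide, while the 1 m flight path, the example masses, and the optional `charge` parameter (a generalization of the singly charged case used in the derivation) are assumptions made here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per Da


def flight_time(mass_da, voltage, length_m, charge=1):
    """Flight time in seconds for an ion accelerated through `voltage` volts.

    Uses (1/2) m v^2 = z e V and v = L / t, so t = L * sqrt(m / (2 z e V)).
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return length_m / velocity


if __name__ == "__main__":
    for mass in (1000.0, 4000.0):
        t = flight_time(mass, voltage=20_000.0, length_m=1.0)
        print(f"{mass:.0f} Da: {t * 1e6:.1f} microseconds")
```

Quadrupling the mass doubles the flight time, which is the "mass is proportional to (time)^2" relationship stated on the slide.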
23
Approaches for Protein Sequencing and
Identification
Top Down
MS/MS
MIRERICACVLALGMLTGFTHAFGSKDAAADGKPLVVTTIGMIADAVKNI
AQGDVHLKGLMGPGVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEV
FSKLRGSRLVVAVSETIPVSQRLSLEEAEFDPHVWFDVKLWSYSVKAVYE
SLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQSLPAERRVLVTAHD
AFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESS
IPHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNID
TIVAALAR
MS/MS
Enzymatic or chemical degradation
Bottom Up
24
Identification of proteins from gels

Proteins are separated first by high resolution
two-dimensional polyacrylamide gel
electrophoresis and then stained. At this point,
to identify an individual or set of protein
spots, several options can be considered by the
researcher, depending on availability of
techniques.
For protein spots that appear to be relatively
abundant (e.g., more than 1 pmol), traditional
protein characterization methods may be employed.
Methods such as amino acid analysis and Edman
sequencing can be used to provide necessary
protein identification information. With 2-DE,
approximate molecular weight and isoelectric
point characteristics are provided. Augmented
with information on amino acid composition and/or
amino-terminal sequence, a confident
identification can be obtained.
The sensitivity gains of using MS allow for the
identification of proteins below the one pmol
level and in many cases in the femtomole regime.

25
Protein Identification by Mass Spectrometry
2-D Gel Electrophoresis
[Figure: 2-D gel, MW x 10^3 (150 to 10) versus pI (6.5 to 4.5)]
Peptide mass fingerprint by MALDI-TOF or
LC-ESI-MS. Additional sequence information can
be obtained by MS/MS.
[Figure: mass spectrum, m/z 500 to 3000; labeled values include 717, 1089, 1272, 1401, 1547, 1700, 1857, 2384, 2791]
26
Protein Cleavage

For the application of mass spectrometry for
protein identification, the protein bands/spots
from a 2-D gel are excised and are exposed to a
highly specific enzymatic cleavage reagent (e.g.,
trypsin cleaves on the C-terminal side of
arginine and lysine residues). The resulting
tryptic fragments are extracted from the gel
slice and are then subjected to MS methods. One
of the major barriers to high throughput in the
proteomic approach to protein identification is
the in-gel proteolytic digestion and subsequent
extraction of the proteolytic peptides from the
gel. Common protocols for this process are often
long and labor intensive.
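
The cleavage rule can be sketched as an in silico digest. This is a minimal illustration, not a description of any particular search engine: it cleaves after every arginine or lysine, and it also applies the commonly used "not before proline" exception, which is an assumption not stated on the slide. The example sequence is the one shown on the later "Protein identification from peptide fragments" slide.

```python
EXAMPLE_SEQUENCE = (
    "SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLF"
    "NSAKYLLDESSTYKLMHDDSV"
)


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


if __name__ == "__main__":
    for peptide in tryptic_digest(EXAMPLE_SEQUENCE):
        print(peptide)
```

A complete digest splits the N-terminal stretch into SEMHIK and HYTTK, whereas the slide lists SEMHIKHYTTK as a single peptide. Search programs usually handle such cases by allowing a number of missed cleavages.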

protein digestion robot
27
Protein cleavage - proteolysis and chemical
methods
28
Mass spectrometry-based protein identification

A mass spectrum of the resulting digest products
produces a peptide map or a peptide
fingerprint.
The measured masses can be compared to
theoretical peptide maps derived from database
sequences for identification. There are a few
choices of mass analysis that can be selected
from this point, depending on available
instrumentation and other factors. The resulting
peptide fragments can be subjected to MALDI-MS or
ESI-MS analysis.
A small aliquot of the digest solution can be
directly analyzed by MALDI-MS to obtain a peptide
map. The resulting sequence coverage (relative
to the entire protein sequence) displayed from
the total number of tryptic peptides observed in
the MALDI mass spectrum can be quite high, i.e.,
greater than 80% of the sequence, although it can
vary considerably depending on the protein,
sample amount, etc. The measured molecular
weights of the peptide fragments along with the
specificity of the enzyme employed can be
searched and compared against protein sequence
databases using a number of computer searching
routines available on the Internet.
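
The two quantities discussed above, theoretical peptide mass and sequence coverage, can be sketched as follows. This is a minimal illustration using standard monoisotopic residue masses for unmodified amino acids; the function names are chosen here, and the coverage helper only marks the first occurrence of each peptide, which is a simplification.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, (M+H)+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def sequence_coverage(protein, matched_peptides):
    """Fraction of protein residues covered by the matched peptides."""
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)  # first occurrence only
        if start >= 0:
            for i in range(start, start + len(peptide)):
                covered[i] = True
    return sum(covered) / len(protein)


if __name__ == "__main__":
    print(round(mh_plus("ILGFR"), 3))
    print(sequence_coverage("ILGFRQWDDSK", ["ILGFR"]))
```

Comparing a list of such theoretical (M+H)+ values with the measured peak list is the core of a peptide fingerprint search.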

29
Protein identification from peptide fragments
[Figure: a protein is digested to tryptic peptides and measured as a mass spectrum; in parallel, a protein sequence gives theoretical tryptic peptides and a theoretical mass spectrum]
Theoretical tryptic peptides: SEMHIKHYTTK ILGFR EEGDSCPLK QWDDSK ILVAVADK LLEYEEK ILLFNSAK YLLDESSTYK LMHDDSV
Protein sequence: SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV
30
MALDI-MS of tryptic peptides
[Figure: MALDI mass spectrum, m/z 1000 to 3000; all peaks are (M+H)+; one peak is labeled as trypsin autolysis]
Labeled peaks (m/z): 1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20, 1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48
Protein sequence:
ARIIVVTSGK GGVGKTTSSA AIATGLAQKG KKTVVIDFDI
GLRNLDLIMG CERRVVYDFV NVIQGDATLN QALIKDKRTE
NLYILPASQT RDKDALTREG VAKVLDDLKA MDFEFIVCDS
PAGIETGALM ALYFADEAII TTNPEVSSVR DSDRILGILA
SKSRRAENGE EPIKEHLLLT RYNPGRVSRG DMLSMEDVLE
ILRIKLVGVI PEDQSVLRAS NQGEPVILDI NADAGKAYAD
TVERLLGER PFRFIEEEKK GFLKRLFGG
31
ESI-MS and LC-MS for protein identification

An approach for peptide mapping similar to
MALDI-MS uses ESI-MS. A peptide map can be
obtained by analysis of the peptide mixture by
ESI-MS. An advantage of ESI is its ease of
coupling to separation methodologies such as
HPLC. Thus, alternatively, to reduce the
complexity of the mixture, the peptides can be
separated by HPLC with subsequent mass
measurement by on-line ESI-MS. The measured
masses can be compared to sequence databases.

[Figure: LC-MS with ESI; chromatogram peaks at 6.2, 6.8, 7.7, 8.4, 8.9, 9.4, and 9.8 min; mass spectrum showing (M+2H)2+ at m/z 965.3 (MW 1928.6 Da) and a peak at m/z 629.0]
32
LC-MS/MS for protein identification

An improvement in throughput of the overall
method can be obtained by performing LC-MS/MS in
the data-dependent mode. As full scan mass
spectra are acquired continuously in LC-MS mode,
any ion detected with a signal intensity above a
pre-defined threshold will trigger the mass
spectrometer to switch over to MS/MS mode. Thus,
the mass spectrometer switches back and forth
between MS mode (molecular mass information) and
MS/MS mode (sequence information) in a single LC
run. The data-dependent scanning capability can
dramatically increase the capacity and throughput
for protein identification.
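
The threshold trigger can be sketched as a simple selection step applied to each full scan. This is an illustration of the idea only, not instrument control code: the function name, the `max_precursors` limit (a "top N" setting assumed here), and the intensities in the example are hypothetical.

```python
def select_precursors(scan, threshold, max_precursors=3):
    """Pick ions from one full MS scan that should trigger MS/MS.

    `scan` is a list of (m/z, intensity) pairs. Ions above `threshold`
    are returned as m/z values, most intense first, up to `max_precursors`.
    """
    above = [peak for peak in scan if peak[1] > threshold]
    above.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in above[:max_precursors]]


if __name__ == "__main__":
    full_scan = [(629.0, 2.0e4), (965.3, 9.0e4), (700.1, 5.0e2)]
    print(select_precursors(full_scan, threshold=1.0e3))
```

An empty result means the instrument stays in MS mode for the next scan.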

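[Figure: LC-MS chromatogram (peaks at 6.8, 8.4, 9.4, and 9.8 min) and LC-MS/MS; the (M+2H)2+ ion at m/z 965.3 is selected for MS/MS, giving b- and y-type product ions (b3, b5, b6, b8, y4, y8 to y14); labeled m/z values include 629.0, 668.4, 838.5, 1261.4, 1374.5, 1474.4]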
33
Peptide sequencing by mass spectrometry
N-term.
C-term.

Peptide molecules are fragmented by collisionally
activated dissociation (CAD)
collisions with neutral background gas molecules
(nitrogen, argon, etc)
typically dissociate by cleavage of -CO-NH- bond

A
N-terminal product ions
34
Peptide sequencing by mass spectrometry

Ideally, one can measure the spacings between
product ion peaks to deduce the sequence
if each amide bond dissociates with equal
probability
if only a single amide bond fragments for each
molecule
if only C-terminal or N-terminal product ions
are formed
In reality, this is not the case
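
The ideal case, reading residues from peak spacings, can be sketched as a lookup of mass differences against residue masses. The spacings in the example come from peak values listed on the later MS/MS slide for LVDKVIGITNEEAISTAR, assuming that 555.4/668.4 and 990.5/1091.5 are adjacent ions of one series. The 0.3 Da tolerance and the function name are assumptions made here.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_spacing(delta, tolerance=0.3):
    """Residues whose mass matches a peak spacing within the tolerance."""
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items()
        if abs(mass - delta) <= tolerance
    )


if __name__ == "__main__":
    print(residues_for_spacing(668.4 - 555.4))   # spacing of about 113.0
    print(residues_for_spacing(1091.5 - 990.5))  # spacing of about 101.0
```

A spacing near 113.0 returns both isoleucine and leucine, which have identical mass, and a spacing near 128.1 returns both lysine and glutamine at this tolerance. This is one reason the spacing approach alone does not always give an unambiguous sequence.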

C-terminal product ions
35
Nomenclature for MS Sequencing of Peptides
Klaus Biemann, MIT
subscript denotes the number of residues
contained in product ion
N-terminal fragments
b1
b2
b3
H2N - C - C - N - C - C - N - C - C - N - C - COOH
y3
y2
y1
C-terminal fragments
36
Nomenclature for MS Sequencing of Peptides

Low-energy collisions promote fragmentation of a
peptide primarily along the peptide backbone
Peptide fragmentation which maintains the charge
on the C terminus is designated a y-ion
Fragmentation which maintains the charge on the N
terminus is designated a b-ion
Low-energy collisions: ion trap, QQQ, QTOF,
FT-ICR
High-energy collisions: TOF-TOF
cleavage of amino acid side chain bonds (d-ion
and w-ion)
differentiate Leu vs. Ile

37
Peptide Sequencing by Mass Spectrometry
LVDKVIGITNEEAISTAR (Cysteine Synthase A)
Observed product ion series: y4 to y14 and b3 to b17
MS/MS of 2+ charged tryptic peptides yields
(often) 1+ charged product ions (but 2+ charged
products can be observed as well)
A mixture of b-ions and y-ions is present
[Figure: MS/MS spectrum, relative abundance versus m/z; labeled values include 555.4, 668.4, 838.5, 990.5, 1091.5, 1261.4, 1374.5, 1474.4]
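
The b- and y-ion masses expected for the peptide on this slide can be computed from its sequence. The sketch below assumes singly charged fragments, monoisotopic masses, and no modifications; the function name is illustrative.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return ({n: b_n m/z}, {n: y_n m/z}) for singly charged fragments."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = {}, {}
    running = 0.0
    for n, mass in enumerate(masses[:-1], start=1):
        running += mass                    # first n residues
        b_ions[n] = running + PROTON
    running = 0.0
    for n, mass in enumerate(reversed(masses[1:]), start=1):
        running += mass                    # last n residues
        y_ions[n] = running + WATER + PROTON
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("LVDKVIGITNEEAISTAR")
    for n in (5, 6, 8):
        print(f"b{n}", round(b[n], 2))
    for n in (9, 10):
        print(f"y{n}", round(y[n], 2))
```

Several of the computed values fall close to peak labels that appear on this slide, for example 668.4 and 1091.5.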
38
Computer-based Sequence Searching Strategies

A list of experimentally determined masses is
compared to lists of computer-generated
theoretical masses prepared from a database of
protein primary sequences. With the current
exponential growth in the generation of genomic
data, these databases are expanding every day.
There are typically three types of search
strategies employed:
searching with peptide fingerprint data
searching with sequence data
searching with raw MS/MS data.
One limiting factor that must be considered for
all of the approaches is that they can only
identify proteins that have already been identified and
reside within an available database, or that are
highly homologous to one that resides in the database.

39
Searching with Peptide Fingerprints

The majority of the available search engines
allow one to define certain experimental
parameters to optimize a particular search.
Minimum number of peptides to be matched
Allowable mass error
Monoisotopic versus average mass data
Mass range of starting protein
Type of protease used for digestion
Information about potential protein modification,
such as N- and C-terminal modification,
carboxymethylation, oxidized methionines, etc.
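
Two of these parameters, the allowable mass error and the minimum number of matched peptides, can be sketched as a simple matching step. This is a toy illustration and does not reproduce how any real search engine scores a match; the function names, the ppm tolerance convention, and the default values are assumptions made here.

```python
def within_tolerance(observed, theoretical, tol_ppm):
    """True if the observed mass is within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def count_matches(observed_masses, theoretical_masses, tol_ppm=50.0):
    """Number of observed masses that match any theoretical peptide mass."""
    return sum(
        any(within_tolerance(obs, theo, tol_ppm) for theo in theoretical_masses)
        for obs in observed_masses
    )


def is_candidate(observed_masses, theoretical_masses,
                 tol_ppm=50.0, min_matches=4):
    """Apply the 'minimum number of peptides to be matched' criterion."""
    return count_matches(observed_masses, theoretical_masses, tol_ppm) >= min_matches


if __name__ == "__main__":
    observed = [1000.00, 1500.10, 2000.00]      # hypothetical peak list
    theoretical = [1000.02, 1500.00, 2500.00]   # hypothetical digest masses
    print(count_matches(observed, theoretical, tol_ppm=50.0))
    print(count_matches(observed, theoretical, tol_ppm=100.0))
```

Widening the tolerance from 50 to 100 ppm admits one more match in this example, which shows why the allowable mass error directly affects both sensitivity and the number of chance matches.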

40
Searching with Peptide Fingerprints

Most protein databases contain primary sequence
information only
Any shift in mass incorporated into the primary
sequence as a result of post-translational
modification will result in an experimental mass
that is in disagreement with the theoretical
mass.
Modifications such as glycation and
phosphorylation can result in missed
identifications.
A single amino acid substitution can shift the
mass of a peptide to such a degree that even a
protein with a great deal of homology with
another in the database cannot be identified.
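
The effect of a modification on matching can be sketched with mass shifts. The shifts used are standard monoisotopic values for phosphorylation (HPO3) and methionine oxidation; the peptide mass, the 0.2 Da tolerance, and the function names are hypothetical.

```python
PHOSPHORYLATION = 79.96633  # HPO3, monoisotopic shift in Da
OXIDATION = 15.99491        # one oxygen atom, monoisotopic shift in Da


def matches(observed, theoretical, tol_da=0.2):
    """True if observed and theoretical masses agree within tol_da."""
    return abs(observed - theoretical) <= tol_da


def matches_with_modifications(observed, theoretical, shifts, tol_da=0.2):
    """Also accept the theoretical mass plus any one listed shift."""
    candidates = [theoretical] + [theoretical + shift for shift in shifts]
    return any(matches(observed, mass, tol_da) for mass in candidates)


if __name__ == "__main__":
    unmodified = 1000.50                         # hypothetical peptide mass
    observed = unmodified + PHOSPHORYLATION      # phosphorylated form
    print(matches(observed, unmodified))
    print(matches_with_modifications(observed, unmodified,
                                     [PHOSPHORYLATION, OXIDATION]))
```

The phosphorylated peptide is missed when only the primary-sequence mass is considered and is recovered once the shift is allowed as a possible modification.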

41
Searching with Peptide Fingerprints

A number of factors affect the utility of peptide
fingerprinting.
The greater the experimental mass accuracy, the
narrower you can set your search tolerances,
thereby increasing your confidence in the match,
and decreasing the number of "false positive"
responses.
A common practice used to increase mass accuracy
in peptide fingerprinting is to employ an
autolysis fragment from the proteolytic enzyme as
an internal standard to calibrate a MALDI mass
spectrum.
Peptide fingerprinting is also amenable to the
identification of proteins in complex mixtures.
Peptides generated from the digest of a protein
mixture will simply return two or more results
that are a "good" fit.
Peptides that are "left over" in a peptide
fingerprint after the identification of one
component can be resubmitted for the possible
identification of another component.

42
Web addresses of some representative internet
resources for protein identification from mass
spectrometry data
43
Mascot

Among the first programs for identifying proteins
by peptide mass fingerprinting was MOWSE, developed
out of a collaboration between Imperial Cancer
Research Fund (ICRF) and SERC Daresbury
Laboratory, UK.
The name chosen was an acronym of Molecular
Weight Search. The MOWSE databases were fully
indexed so as to allow very rapid searching and
retrieval of sequence data. Subsequently, the
software was further developed and renamed
Mascot.
Licensed and distributed by Matrix Science Ltd.
Specialized tools include Peptide Mass
Fingerprint, Sequence Query, and MS/MS Ion
Search.
Search output: Web-based.
Good visual representation of search quality
(graphical probability chart).
Simple graphical user interface.
Reports MOWSE scores as a quantitative measure of
search quality.

44
Mowse Scoring

Rather than just counting the number of matching
peptides, Mowse uses empirically determined
factors to assign a statistical weight to each
individual peptide match. (Pappin et al. Rapid identification
of proteins by peptide-mass fingerprinting. Curr.
Biol. 3:327, 1993.)
Scoring scheme assigns more weight to matches of
higher molecular weight peptides (more
discriminating).
Compensates for the non-random distribution of
fragment molecular weights in proteins of
different sizes.
Was the first protein identification program to
recognize that the relative abundance of peptides
of a given length in a proteolytic digest depends
on the lengths of both peptide and protein.
Developed for MALDI peptide mass fingerprinting.
Probability-Based Mowse
Mascot incorporates a probability-based enhanced
Mowse algorithm, described in Perkins et al.
(Probability-based protein identification by
searching sequence databases using mass
spectrometry data. Electrophoresis 20:3551-3567,
1999).
A simple rule can be used to judge whether a
result is significant or not. Different types of
matching (peptide masses and fragment ions) can
be combined in a single search.
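
A probability-based score of this kind is commonly reported as -10 log10(P), where P is the probability that the observed match is a random event. The sketch below illustrates that conversion. The significance rule shown, a threshold of -10 log10(alpha / N) for N database entries, is a simple multiple-testing correction assumed here for illustration and may differ from what Mascot actually applies.

```python
import math


def probability_score(p_random):
    """Convert the probability of a random match into a -10*log10(P) score."""
    return -10.0 * math.log10(p_random)


def significance_threshold(n_entries, alpha=0.05):
    """Assumed threshold: score at which P equals alpha / n_entries."""
    return -10.0 * math.log10(alpha / n_entries)


def is_significant(score, n_entries, alpha=0.05):
    """True if the score exceeds the assumed significance threshold."""
    return score > significance_threshold(n_entries, alpha)


if __name__ == "__main__":
    print(probability_score(1e-5))
    print(round(significance_threshold(500_000), 1))
    print(is_significant(75.0, 500_000))
```

Under this assumption, a larger database raises the score needed for significance, because more entries give more opportunities for a chance match.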

45
Databases

Three components are required for database
searching in support of proteomics: MALDI or MS/MS
data, the algorithms used to search protein
databases with the MALDI or MS/MS data, and the
protein databases themselves.
The protein databases can be as small as one
protein, can be large, public domain databases of
all known and predicted proteins, or may be
predicted open reading frames based on genomic
sequence.
A major challenge for database searching is that
these protein databases are constantly changing,
making database search results potentially
obsolete as new entries are added that better fit
the MALDI or MS data.
Even as genomes are completed there is still flux
as new coding regions are identified and novel
mechanisms of increased translational complexity
are better understood, such as alternative splice
products, RNA editing, and ribosome slippage
leading to novel, unexpected translation products.

46
Databases

NCBI non-redundant (NCBInr)
Non-redundant database from the National Center
for Biotechnology Information for use with their
search tools BLAST and Entrez. It is comprised of
translated sequences from the GenBank/EMBL/DDBJ
consortium, SwissProt, Protein Information
Resource (PIR), and Brookhaven Protein Data Bank
(PDB).
New releases are published bimonthly while
updates occur daily.
OWL
OWL is comprised of Swiss-Prot, PIR, translated
Genbank, and NRL-3D (PDB). All sequences are
compared to Swiss-Prot to remove identical and
trivially different sequences. Has not been
updated since May, 1999.
SWISSPROT
While SwissProt contains only a subset of
proteins, the proteins in this database are much
better annotated and the sequences are much more
reliable than those available in any other
database.
MSDB
Comprehensive, non-identical protein sequence
database maintained by the Proteomics Department
at the Hammersmith Campus of Imperial College
London. Designed specifically for MS
applications.

47
Databases

EST Clusters (dbEST)
Division of GenBank that contains "single-pass"
cDNA sequences, or Expressed Sequence Tags
(ESTs), from a number of organisms.
ESTs are relatively short, usually 3' end
sequences from isolated mRNA.
ESTs tend to be highly redundant and the
sequence is of much lower quality than from other
sources. An advantage to using these ESTs is
that they represent only expressed sequences (no
introns) and include alternative splice variants.
Their length, redundancy, and low quality are much
improved by using clustered ESTs, such as the
Compugen clusters.
The EST database has some redundancy because it
contains all possible combinations of alternative
splice products, and so it can be very large (and
slow to search).
During a Mascot search, the nucleic acid
sequences are translated in all six reading
frames. dbEST is a very large database, and is
divided into three sections: EST_human,
EST_mouse, and EST_others. Even so, searches of
these databases take far longer than a search of
one of the non-redundant protein databases. You
should only search an EST database if a search of
a protein database has failed to find a match.
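
Translation in all six reading frames can be sketched as follows. This is a minimal illustration using the standard genetic code, not the code used by any search engine; unknown codons are written as "X", stop codons as "*", and the function names are chosen here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon, ignoring trailing bases."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (dna, reverse)
        for offset in range(3)
    ]


if __name__ == "__main__":
    for frame in six_frame_translations("ATGGCC"):
        print(frame)
```

Every nucleotide entry produces six candidate protein sequences, which is one reason searches of EST databases take longer than searches of protein databases.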

48
MALDI-MS peptide fingerprint (tryptic digest of a
single protein)
[Figure: MALDI mass spectrum, m/z 1000 to 3000; all peaks are (M+H)+; one peak is labeled as trypsin autolysis]
Labeled peaks (m/z): 1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20, 1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48
49
Mascot (Matrix Science) for peptide mass
fingerprints
enter peak list
50
Mascot (Matrix Science) for peptide mass
fingerprints
possible identification
51
Mascot (Matrix Science) for peptide mass
fingerprints
get more info on probable proteins
list of all possible matches
52
Mascot (Matrix Science) for peptide mass
fingerprints
53
Mascot (Matrix Science) for peptide mass
fingerprints
tryptic peptides that matched
peptides that did not match
54
Mascot (Matrix Science) for peptide mass
fingerprints
tryptic peptides in protein sequence
better mass accuracy improves identification
process
55
LC-MS/MS for protein identification

To provide further confirmation of the
identification, if a tandem mass spectrometer
(MS/MS) is available, peptide ions can be
dissociated in the mass spectrometer to provide
direct sequence information. Product ions from
an MS/MS spectrum can be compared to available
sequences using powerful software tools.
For a single sample, LC-MS/MS analysis includes
two discrete steps: (a) LC-MS peptide mapping to
identify peptide ions from the digestion mixture
and to deduce their molecular weights, and (b)
LC-MS/MS of the previously detected peptides to
obtain sequence information for protein
identification.

56
Automated LC-MS/MS and database searching

Current mass spectral technology permits the
generation of MS/MS data at an unprecedented
rate. Prior to the generation of powerful
computer-based database searching strategies, the
largest bottleneck in protein identification was
the manual interpretation of this MS/MS data to
extract the sequence information. Today, many
computer-based search strategies that employ
MS/MS data require no operator interpretation at
all.
Analogous to the approach described for peptide
fingerprinting, these programs take the
individual protein entries in a database and
electronically "digest" them to generate a list
of theoretical peptides for each protein.
However, in the use of MS/MS data, these
theoretical peptides are further manipulated to
generate a second level of lists which contain
theoretical fragment ion masses that would be
generated in the MS/MS experiment for each
theoretical peptide.

57
Automated LC-MS/MS and database searching

These programs simply compare the list of
experimentally determined fragment ion masses
from the MS/MS experiment of the peptide of
interest with the theoretical fragment ion masses
generated by the computer program.
The recent advent of data-dependent scanning
functions has permitted the unattended
acquisition of MS/MS data. An example of a raw
MS/MS data searching program that takes
particular advantage of this ability is SEQUEST.
SEQUEST will input the data from a data-dependent
LC/MS chromatogram and automatically strip out
all of the MS/MS information for each individual
peak, and submit it for database searching using
the strategy discussed above.
Each peak is treated as a separate data file,
making it especially useful for the on-line
separation and identification of individual
components in a protein mixture.
SEQUEST cross-correlates uninterpreted MS/MS mass
spectra of peptides with sequences from protein/nucleotide
databases. The software can analyze a single
spectrum or an entire LC-MS/MS peptide map.
No user interpretation of MS/MS spectra is
involved.
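
The comparison of experimental and theoretical fragment masses can be sketched with a shared peak count. This is a deliberately simple stand-in and is not the cross-correlation that SEQUEST computes. The experimental values are peak labels from the earlier MS/MS slide; the first candidate list holds rounded monoisotopic b- and y-ion masses computed for LVDKVIGITNEEAISTAR, and the decoy list, the tolerance, and the function names are made up for illustration.

```python
def shared_peak_count(experimental, theoretical, tol_da=0.5):
    """Number of theoretical fragment masses found in the experimental list."""
    return sum(
        any(abs(exp - theo) <= tol_da for exp in experimental)
        for theo in theoretical
    )


def best_candidate(experimental, candidates, tol_da=0.5):
    """Name of the candidate peptide with the most shared peaks."""
    return max(
        candidates,
        key=lambda name: shared_peak_count(experimental, candidates[name], tol_da),
    )


if __name__ == "__main__":
    experimental = [555.4, 668.4, 838.5, 990.5, 1091.5]
    candidates = {
        "LVDKVIGITNEEAISTAR": [555.35, 668.43, 838.54, 990.49, 1091.53],
        "decoy peptide": [500.0, 610.2, 838.9, 1001.7, 1100.3],  # made up
    }
    for name, masses in candidates.items():
        print(name, shared_peak_count(experimental, masses))
    print(best_candidate(experimental, candidates))
```

Real search programs use more discriminating scores, but the ranking principle is the same: the candidate whose theoretical fragments best explain the spectrum is reported first.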

58
[Figure: a proteolytic digest is analyzed by HPLC-MS (chromatogram, time axis 25 to 60 min, labeled peaks at 29.02, 33.59, 38.27, 41.63, 46.75, 49.20, 54.28, 60.16, and 62.59 min) and by MS/MS; the experimental fragment masses (m/z 212.1, 325.2, 410.3, 456.7, 475.3, 588.3, 701.3, 815.4, 851.5, 912.5) are tested for a match]
59
Direct identification of proteins using mass
spectrometry

Removes the requirement to separate proteins by
electrophoresis, etc
MudPIT: multidimensional protein identification
technology, or shotgun approach
Protein lysate is digested with trypsin
The peptide mixture is loaded onto a strong
cation exchange (SCX) column (to separate on the
basis of charge). A discrete fraction of peptides
is displaced from the SCX column using a salt
step gradient to a reversed-phase (RP) column (to
separate on the basis of hydrophobicity).
This fraction is eluted from the RP column into
the MS. This iterative process is repeated,
obtaining the fragmentation patterns of peptides
in the original peptide mixture.
MS/MS spectra are used to identify the proteins
in the original protein complex.

Link et al. Nature Biotechnology 17, 676 (1999)
60
Large-scale analysis of the yeast proteome by
MudPIT

Yates and coworkers, Nature Biotech. (2001) 19,
242-247
Assigned 5,540 peptides to MS spectra leading to
the identification of 1,484 proteins from the S.
cerevisiae proteome
Of 6,216 ORFs in yeast genome, 83% have CAI
values between 0 and 0.20 (i.e., predicted to be
present at low levels) (Fig. A)
MudPIT data: 791 or 53.3% of the proteins
identified have a CAI of <0.2 (1.7 peptides per
protein) (Fig. B)
Number of peptides per protein increases with
increasing CAI (Fig. C)

61
Approaches for Protein Sequencing and
Identification
Top Down
MS/MS
MIRERICACVLALGMLTGFTHAFGSKDAAADGKPLVVTTIGMIADAVKNI
AQGDVHLKGLMGPGVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEV
FSKLRGSRLVVAVSETIPVSQRLSLEEAEFDPHVWFDVKLWSYSVKAVYE
SLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQSLPAERRVLVTAHD
AFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESS
IPHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNID
TIVAALAR
The molecular mass of an intact protein defines
the native covalent state of a gene's product,
including the effects of post-transcriptional/translational
modifications, and associated
heterogeneity, that are modulated by the actions
of other gene products. Moreover, the
fragmentation pattern from large proteins can
generate sufficient information for
identification from sequence databases,
particularly when combined with accurate mass
measurements of both the intact molecule and its
product ions.
62
In-Source Decay (ISD) for Protein Sequencing

Peptides and large proteins can be fragmented by
ISD
Fragmentation occurs in the MALDI ion source
not generally well controlled
Reflectron TOF not necessary (linear TOF
sufficient to measure product ions)

Complete sequence information not present, but
extensive stretches of sequence from the N-
and/or C-termini observed

cut
...TIDE...
63
MALDI-ISD-TOF Mass Spectrometry of Proteins
[Figure: MALDI-ISD-TOF spectra of Protein 1 and Protein 2, m/z 5000 to 25000; regions labeled "Sequence Information (In-Source Decay)" and "Molecular Weight Information"]
64
MALDI-ISD-TOF Mass Spectrometry of Proteins
