Title: Mass Spectrometry-Based Methods for Protein Identification


1
Mass Spectrometry-Based Methods for Protein Identification
Joseph A. Loo
Department of Biological Chemistry, David Geffen School of Medicine
Department of Chemistry and Biochemistry
University of California, Los Angeles, CA, USA
2
Genomics and Proteomics: Characterizing many genes
and gene products simultaneously
3
Proteomics Aids Biological Research
[Figure: diagram linking biology, a complex protein mixture, protein separation, and mass spectrometry to protein identification, protein modification, and protein abundance]
4
Proteomics - What is it?
  • An assay to systematically analyze the diverse
    properties of proteins
  • Biological processes are dynamic
  • A quantitative comparison of states is required
  • The study of protein expression and function on a
    genome scale
  • Purpose: examine altered gene expression
    pathways in disease states and under different
    environmental conditions

The completion of the human genome has provided
researchers with the blueprint for life, and
proteomics offers scientists the means for
analyzing the expressed genome.
5
Genome to Proteome
[Figure: dsDNA (gene) → transcription → mRNA → translation → protein (H2N...COOH) → mass spectrometry]
Example protein sequence (residue numbers at left):
  1 MTDLKASSLRALKLMDLTTLNDDDTDEKVIALCHQAKTPVGNTAAICIYP
 51 RFIPIARKTLKEQGTPEIRIATVTNFPHGNDDIDIALAETRAAIAYGADE
101 VDVVFPYRALMAGNEQVGFDLVKACKEACAAANVLLKVIIETGELKDEAL
151 IRKASEISI
6
Approaches for Protein Identification
What is this protein?
  • Molecular weight
  • Isoelectric point
  • Amino acid composition
  • Other physical/chemical characteristics
  • Partial or complete amino acid sequence
  • Edman (N-terminal sequence) - if N-term. not
    blocked
  • C-terminal sequence - not commonly performed
  • Mass spectrometry-measured information

7
Protein Identification by Mass Spectrometry
[Figure: 2-D gel electrophoresis (MW × 10³ vs. pI) and a peptide mass spectrum (m/z 500 to 3000) with labeled peaks at m/z 717, 1089, 1272, 1401, 1547, 1700, 1857, 2384, and 2791]
Peptide mass fingerprint by MALDI-TOF or
LC-ESI-MS. Additional sequence information can
be obtained by MS/MS.
8
Mass Spectrometry: A method to weigh molecules
A simple measurement of mass is used to confirm
the identity of a molecule, but it can be used
for much more
9
Mass Spectrometer for Proteomics
[Figure: pre-separation by liquid chromatography coupled to the ion source of the mass spectrometer]
10
The Nobel Prize in Chemistry 2002
"for the development of methods for
identification and structure analyses of
biological macromolecules"
"for their development of soft desorption
ionisation methods for mass spectrometric
analyses of biological macromolecules"
11
Electrospray: Generation of aerosols and droplets
12
Electrospray Ionization (ESI)
  • Multiple charging
  • More charges for larger molecules
  • MW range > 150 kDa
  • Liquid introduction of analyte
  • Interface with liquid separation methods, e.g.
    liquid chromatography
  • Tandem mass spectrometry (MS/MS) for protein
    sequencing

[Figure: ESI schematic (a high voltage produces highly charged droplets that enter the MS) and an ESI mass spectrum (m/z 500 to 1100) showing a series of charge states labeled 14 to 22]
13
ESI-MS of Large Proteins
[Figure: ESI-MS (Q-TOF, pH 7.5) spectrum showing a distribution of multiply charged molecules, (M+13H)13+ through (M+16H)16+, with peaks labeled at m/z 3102, 3323, and 3543]
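The arithmetic behind such a charge-state series can be sketched in Python. This minimal sketch assumes every ion is a protonated [M+zH]z+ species (no salt adducts), uses 1.00728 Da for the proton, and takes two neighbouring peaks from one series; the function names are illustrative and not taken from any instrument software.

```python
PROTON_MASS = 1.00728  # Da; assumes protonated ions [M+zH]z+ with no other adducts


def mz_of(neutral_mass: float, z: int) -> float:
    """m/z of the [M+zH]z+ ion for neutral mass M."""
    return (neutral_mass + z * PROTON_MASS) / z


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M recovered from an [M+zH]z+ peak."""
    return z * (mz - PROTON_MASS)


def charge_from_adjacent(mz_lower: float, mz_upper: float) -> int:
    """Charge of the higher-m/z peak of two neighbouring peaks in one series.

    The lower-m/z peak carries one more proton (z + 1). Equating
    M = z * (mz_upper - H) = (z + 1) * (mz_lower - H) gives
    z = (mz_lower - H) / (mz_upper - mz_lower).
    """
    if mz_upper <= mz_lower:
        raise ValueError("mz_upper must be larger than mz_lower")
    return round((mz_lower - PROTON_MASS) / (mz_upper - mz_lower))


if __name__ == "__main__":
    # Synthetic 20 kDa protein: its 11+ and 10+ ions are neighbours in the series.
    lower, upper = mz_of(20000.0, 11), mz_of(20000.0, 10)
    z = charge_from_adjacent(lower, upper)
    print(z, round(neutral_mass(upper, z), 1))  # 10 20000.0
    # (M+2H)2+ ion at m/z 965.3 from the LC-MS example in this deck
    print(round(neutral_mass(965.3, 2), 1))  # 1928.6
```

Because neighbouring peaks differ by exactly one charge, their spacing alone fixes z; once z is known, any peak in the series gives M.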
14
History of Electrospray Ionization
  • Malcolm Dole demonstrated the production of
    intact oligomers of polystyrene up to MW 500,000
  • mass analysis of large ions was problematic
  • John Fenn (Yale University)
  • Chemical engineer - expert in supersonic
    molecular beams
  • Began work on electrospray in 1981
  • Adapted ESI to operate on a more conventional
    mass spectrometer
  • Recognized that multiply charged ions were
    produced by ESI
  • Reduced the m/z range required

15
Electrospray process
~10^6 charges for a 30-micron droplet
  • Analyte dissolved in a suitable solvent flows
    through a small diameter capillary tube
  • Liquid in the presence of a high electric field
    generates a fine mist or aerosol spray of
    highly charged droplets

16
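Matrix-assisted Laser Desorption/Ionization (MALDI)
Time-of-Flight (TOF) Analyzer
[Figure: a laser pulse desorbs ions from the MALDI sample held at high voltage; ions of masses m1, m2, and m3 separate in the drift region before reaching the detector]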
17
MALDI Mass Spectrometry of Large Proteins
[Figure: MALDI-MS of rat MVP (intensity vs. m/z, roughly m/z 30,000 to 130,000) showing (M+H)+ and (M+2H)2+ ions]
18
MALDI
  • Developed by Tanaka (Japan) and Hillenkamp/Karas
    (Germany)
  • Peptide/protein analyte of interest is
    co-crystallized on the MALDI target plate with an
    appropriate matrix: small, highly conjugated
    organic molecules that strongly absorb energy at
    a particular wavelength
  • Energy is transferred to the analyte indirectly,
    inducing desorption from the target surface
  • Analyte is ionized by gas-phase proton transfer
    (perhaps from ionized matrix molecules)

[Figure: pulsed laser light strikes the sample and matrix on the target (sample stage) at 20 kV; peptide/protein ions are desorbed from the matrix]
19
MALDI matrices for 337 nm irradiation
  • 2,5-dihydroxybenzoic acid (DHB): peptides and
    proteins
  • 4-hydroxy-α-cyanocinnamic acid (alpha-cyano or
    4-HCCA): peptides
  • 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic
    acid): proteins
20
MALDI
  • 337 nm irradiation is provided by a nitrogen (N2)
    laser
  • The target plate is inserted into the high vacuum
    region of the source and the sample is irradiated
    with a laser pulse. The matrix absorbs the laser
    energy and transfers energy to the analyte
    molecule. The molecules are desorbed and ionized
    during this stage of the process.
  • MALDI is most commonly interfaced to a
    time-of-flight (TOF) mass spectrometer.

21
R. Aebersold and M. Mann, Nature (2003), 422,
198-207.
22
Time-of-Flight Mass Spectrometer
[Figure: ions of masses m1, m2, m3, accelerated by a high voltage, travel with velocities v1, v2, v3 through the drift region (length L) to the detector]
Principle of Operation of a Linear TOF
A time-of-flight mass spectrometer measures the
mass-dependent time it takes ions of different
masses to move from the ion source to the
detector. This requires that the starting time
(the time at which the ions leave the ion source)
is well defined. Recall that the kinetic energy
of an ion accelerated through a potential V is
$$E_k = \tfrac{1}{2} m v^2 = zeV$$
where v is ion velocity, m is mass, z is the number
of charges, e is the charge on the electron, and V
is the accelerating voltage. The ion velocity, v,
is also the length of the flight path, L, divided
by the flight time, t:
$$v = \frac{L}{t}$$
Substituting this expression for v into the
kinetic energy relation gives the working equation
for the time-of-flight mass spectrometer:
$$\frac{m}{z} = \frac{2eV\,t^2}{L^2} \qquad\Longleftrightarrow\qquad t = L\sqrt{\frac{m}{2zeV}}$$
so, for fixed L and V, m/z is proportional to t².
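The working equation can be checked numerically. The sketch below computes ideal flight times for singly charged ions and fits a two-point calibration of the form t = t0 + k·sqrt(m/z). The 1.0 m flight path and 20 kV accelerating voltage are hypothetical values, and the t0 offset term is an assumption commonly used to absorb start-time delays, not something stated on the slide.

```python
import math

DALTON_KG = 1.66053906660e-27          # 1 Da in kg (CODATA 2018)
ELEMENTARY_CHARGE_C = 1.602176634e-19  # e in coulombs (exact SI value)


def flight_time(mz: float, length_m: float, voltage_v: float) -> float:
    """Ideal linear-TOF flight time t = L * sqrt(m / (2 z e V)), in seconds.

    mz is in Da per elementary charge. The ion is assumed to start at rest
    and to gain all of its kinetic energy (zeV) before the drift region.
    """
    kg_per_coulomb = mz * DALTON_KG / ELEMENTARY_CHARGE_C  # m / (z e)
    return length_m * math.sqrt(kg_per_coulomb / (2.0 * voltage_v))


def fit_calibration(t1: float, mz1: float, t2: float, mz2: float):
    """Two-point calibration t = t0 + k * sqrt(m/z); returns a time -> m/z function.

    t0 is an assumed constant offset (e.g. timing delays); k lumps L and V together.
    """
    k = (t2 - t1) / (math.sqrt(mz2) - math.sqrt(mz1))
    t0 = t1 - k * math.sqrt(mz1)
    return lambda t: ((t - t0) / k) ** 2


if __name__ == "__main__":
    L, V = 1.0, 20_000.0  # hypothetical 1 m flight path, 20 kV acceleration
    for mz in (1000.0, 4000.0):
        print(f"m/z {mz:.0f}: {flight_time(mz, L, V) * 1e6:.2f} microseconds")
    delay = 0.25e-6  # hypothetical constant timing offset, seconds
    to_mz = fit_calibration(flight_time(1000.0, L, V) + delay, 1000.0,
                            flight_time(3000.0, L, V) + delay, 3000.0)
    print(round(to_mz(flight_time(2000.0, L, V) + delay), 3))
```

Quadrupling m/z only doubles the flight time, because t scales with sqrt(m/z); this is the same statement as m/z being proportional to t².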
23
Approaches for Protein Sequencing and
Identification
[Figure: Top down: the intact protein is analyzed directly by MS/MS. Bottom up: the protein is first broken into peptides by enzymatic or chemical degradation, and the peptides are analyzed by MS/MS. Example protein sequence:]
MIRERICACVLALGMLTGFTHAFGSKDAAADGKPLVVTTIGMIADAVKNI
AQGDVHLKGLMGPGVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEV
FSKLRGSRLVVAVSETIPVSQRLSLEEAEFDPHVWFDVKLWSYSVKAVYE
SLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQSLPAERRVLVTAHD
AFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESS
IPHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNID
TIVAALAR
24
Identification of proteins from gels
  • Proteins are separated first by high resolution
    two-dimensional polyacrylamide gel
    electrophoresis and then stained. At this point,
    to identify an individual or set of protein
    spots, several options can be considered by the
    researcher, depending on availability of
    techniques.
  • For protein spots that appear to be relatively
    abundant (e.g., more than 1 pmol), traditional
    protein characterization methods may be employed.
  • Methods such as amino acid analysis and Edman
    sequencing can be used to provide necessary
    protein identification information. With 2-DE,
    approximate molecular weight and isoelectric
    point characteristics are provided. Augmented
    with information on amino acid composition and/or
    amino-terminal sequence, a confident
    identification can be obtained.
  • The sensitivity gains of using MS allow the
    identification of proteins below the 1 pmol
    level and, in many cases, in the femtomole regime.

26
Protein Cleavage
  • To apply mass spectrometry to protein
    identification, protein bands/spots from a 2-D
    gel are excised and exposed to a highly specific
    enzymatic cleavage reagent (e.g., trypsin cleaves
    on the C-terminal side of arginine and lysine
    residues). The resulting tryptic fragments are
    extracted from the gel slice and then analyzed
    by MS. One of the major barriers to high
    throughput in the proteomic approach to protein
    identification is the in-gel proteolytic
    digestion and subsequent extraction of the
    proteolytic peptides from the gel. Common
    protocols for this process are often long and
    labor intensive.

[Figure: protein digestion robot]
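The trypsin rule above can be applied in silico to predict peptide masses. This is a minimal sketch, assuming monoisotopic residue masses, unmodified cysteines, singly protonated [M+H]+ ions, and the commonly applied convention that trypsin does not cleave before proline (that exception is an assumption, not stated on the slide). The example protein is the sequence shown on slide 29.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added once per peptide (free N- and C-termini)
PROTON = 1.00728   # for the singly protonated [M+H]+ ion


def trypsin_digest(sequence: str, missed_cleavages: int = 0) -> list:
    """Cleave C-terminal to K or R, except when the next residue is P (assumed rule)."""
    # Zero-width split points: just after K/R, provided P does not follow.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        # Join up to `missed_cleavages` extra neighbouring pieces.
        for end in range(start, min(start + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ m/z of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    protein = ("SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADK"
               "LLEYEEKILLFNSAKYLLDESSTYKLMHDDSV")
    for peptide in trypsin_digest(protein):
        print(f"{peptide:<12} {mh_plus(peptide):10.4f}")
```

With missed_cleavages=0 the first two peptides come out as SEMHIK and HYTTK; the peptide list on slide 29 shows SEMHIKHYTTK, which this rule set reproduces only when one missed cleavage is allowed.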
27
Protein cleavage - proteolysis and chemical methods
28
Mass spectrometry-based protein identification
  • A mass spectrum of the resulting digest products
    produces a peptide map or a peptide
    fingerprint.
  • The measured masses can be compared to
    theoretical peptide maps derived from database
    sequences for identification. There are a few
    choices of mass analysis that can be selected
    from this point, depending on available
    instrumentation and other factors. The resulting
    peptide fragments can be subjected to MALDI-MS or
    ESI-MS analysis.
  • A small aliquot of the digest solution can be
    directly analyzed by MALDI-MS to obtain a peptide
    map. The resulting sequence coverage (the
    fraction of the entire protein sequence accounted
    for by the tryptic peptides observed in the
    MALDI mass spectrum) can be quite high, i.e.,
    greater than 80%, although it can vary
    considerably depending on the protein, sample
    amount, etc. The measured molecular weights of
    the peptide fragments, along with the specificity
    of the enzyme employed, can be searched against
    protein sequence databases using a number of
    computer searching routines available on the
    Internet.
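Comparing measured masses with theoretical peptide maps can be sketched as a tolerance search. This simplified version just counts observed peaks that fall within a ppm window of some theoretical peptide mass and ranks candidate proteins by that count; real search engines such as Mascot weight the matches statistically instead. The observed values are a few labeled peaks from the MALDI spectrum on slide 30, and the candidate "database" is invented for illustration.

```python
from bisect import bisect_left


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed, theoretical, tol_ppm=50.0):
    """Number of observed peaks within tol_ppm of at least one theoretical mass."""
    theo = sorted(theoretical)
    hits = 0
    for mz in observed:
        i = bisect_left(theo, mz)
        # Only the theoretical masses just below and just above mz can be closest.
        neighbours = theo[max(i - 1, 0):i + 1]
        if any(abs(ppm_error(mz, t)) <= tol_ppm for t in neighbours):
            hits += 1
    return hits


def rank_candidates(observed, database, tol_ppm=50.0):
    """Rank entries of {protein name: theoretical [M+H]+ list} by matched-peak count."""
    scores = {name: count_matches(observed, masses, tol_ppm)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    observed = [1116.67, 1247.70, 1375.76, 2005.07]
    database = {  # invented theoretical peptide masses, for illustration only
        "protein_A": [1116.65, 1247.69, 1375.74, 1800.90],
        "protein_B": [1120.50, 1500.20, 2005.10],
    }
    print(rank_candidates(observed, database))  # [('protein_A', 3), ('protein_B', 1)]
```

Tightening tol_ppm shrinks the window each theoretical mass covers, so chance matches become rarer as mass accuracy improves.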

29
Protein identification from peptide fragments
[Figure: experimental path (protein → tryptic peptides → mass spectrum) compared with the theoretical path (protein sequence → theoretical tryptic peptides → theoretical mass spectrum)]
Protein sequence:
SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV
Theoretical tryptic peptides:
SEMHIKHYTTK ILGFR EEGDSCPLK QWDDSK ILVAVADK LLEYEEK ILLFNSAK YLLDESSTYK LMHDDSV
30
MALDI-MS of tryptic peptides
[Figure: MALDI mass spectrum (m/z 1000 to 3000); all peaks are (M+H)+ ions, and one is marked as trypsin autolysis. Labeled peaks: m/z 1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20, 1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48]
ARIIVVTSGK GGVGKTTSSA AIATGLAQKG KKTVVIDFDI
GLRNLDLIMG CERRVVYDFV NVIQGDATLN QALIKDKRTE
NLYILPASQT RDKDALTREG VAKVLDDLKA MDFEFIVCDS
PAGIETGALM ALYFADEAII TTNPEVSSVR DSDRILGILA
SKSRRAENGE EPIKEHLLLT RYNPGRVSRG DMLSMEDVLE
ILRIKLVGVI PEDQSVLRAS NQGEPVILDI NADAGKAYAD
TVERLLGERP FRFIEEEKKG FLKRLFGG
31
ESI-MS and LC-MS for protein identification
  • An approach to peptide mapping similar to
    MALDI-MS uses ESI-MS: a peptide map can be
    obtained by ESI-MS analysis of the peptide
    mixture. An advantage of ESI is its ease of
    coupling to separation methods such as HPLC.
    Alternatively, therefore, to reduce the
    complexity of the mixture, the peptides can be
    separated by HPLC with subsequent mass
    measurement by on-line ESI-MS. The measured
    masses can be compared to sequence databases.

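[Figure: LC-MS with ESI: chromatogram (peaks labeled from 6.2 to 9.8 min) and the mass spectrum of one peak, showing the (M+2H)2+ ion at m/z 965.3 (MW 1928.6 Da); another ion is labeled at m/z 629.0]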
32
LC-MS/MS for protein identification
  • An improvement in throughput of the overall
    method can be obtained by performing LC-MS/MS in
    the data-dependent mode. As full-scan mass
    spectra are acquired continuously in LC-MS mode,
    any ion detected with a signal intensity above a
    predefined threshold triggers the mass
    spectrometer to switch over to MS/MS mode. Thus,
    the mass spectrometer switches back and forth
    between MS mode (molecular mass information) and
    MS/MS mode (sequence information) in a single LC
    run. The data-dependent scanning capability can
    dramatically increase the capacity and throughput
    for protein identification.
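The threshold trigger described above can be sketched as a precursor-selection function for one survey scan. This is a minimal sketch: the intensities are invented, and the cap on MS/MS events per cycle (a "top-N" limit) is an assumption added for illustration, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float
    intensity: float


def select_precursors(survey_scan, threshold, max_precursors=3):
    """Pick ions from one full (survey) scan for MS/MS.

    Ions above `threshold` are eligible; the most intense are chosen first, up to
    `max_precursors` per cycle (a "top-N" cap assumed here, not stated on the slide).
    """
    eligible = [p for p in survey_scan if p.intensity > threshold]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return [p.mz for p in eligible[:max_precursors]]


def acquisition_cycle(survey_scan, threshold, max_precursors=3):
    """One MS -> MS/MS cycle: the survey scan, then one MS/MS event per selected ion."""
    events = ["MS"]
    for mz in select_precursors(survey_scan, threshold, max_precursors):
        events.append(f"MS/MS of m/z {mz:.1f}")
    return events


if __name__ == "__main__":
    scan = [Peak(965.3, 8.0e5), Peak(629.0, 3.0e5), Peak(712.4, 2.0e3)]  # invented intensities
    print(acquisition_cycle(scan, threshold=1.0e4))
```

Repeating this cycle across the LC run is what lets one injection deliver both molecular-mass and sequence information.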

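[Figure: LC-MS chromatogram (peaks labeled 6.8 to 9.8 min) and LC-MS/MS: the (M+2H)2+ ion at m/z 965.3 is selected for MS/MS; the product-ion spectrum shows b ions (b3, b5, b6, b8) and y ions (y4, y8 to y14), with labeled peaks including b6 at m/z 668.4, b8 at 838.5, y12 at 1261.4, and y13 at 1374.5]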
33
Peptide sequencing by mass spectrometry
[Figure: peptide backbone from the N-terminus to the C-terminus]
  • Peptide molecules are fragmented by collisionally
    activated dissociation (CAD)
  • collisions with neutral background gas molecules
    (nitrogen, argon, etc.)
  • typically dissociate by cleavage of the -CO-NH-
    (amide) bond

[Figure: N-terminal product ions]
34
Peptide sequencing by mass spectrometry
  • Ideally, one could measure the spacings between
    product ion peaks to deduce the sequence,
    provided that:
  • each amide bond dissociates with equal
    probability
  • only a single amide bond fragments in each
    molecule
  • only C-terminal or N-terminal product ions
    are formed
  • In reality, this is not the case

[Figure: C-terminal product ions]
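When those ideal conditions do hold for one ion series, reading the sequence reduces to matching each spacing between neighbouring peaks to a residue mass. This sketch uses monoisotopic residue masses and an assumed tolerance; isobaric Leu/Ile are reported as L/I, and K and Q (about 0.036 Da apart) come out as K/Q when the tolerance cannot separate them, as in the ISD sequence read near the end of this deck. Read in ascending m/z, a b-ion ladder gives the sequence N→C and a y-ion ladder gives it C→N.

```python
# Monoisotopic residue masses (Da). Leu and Ile are isobaric, so they share one entry.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_gap(gap, tol):
    """Residues whose mass lies within tol of one peak spacing, joined by '/'; '?' if none."""
    hits = sorted(aa for aa, mass in RESIDUE_MASS.items() if abs(gap - mass) <= tol)
    return "/".join(hits) if hits else "?"


def read_ladder(ion_mz, tol=0.02):
    """Residue calls for each spacing between consecutive peaks of one ion series.

    Ascending b ions read N- to C-terminus; ascending y ions read C- to N-terminus.
    """
    ladder = sorted(ion_mz)
    return [residues_for_gap(hi - lo, tol) for lo, hi in zip(ladder, ladder[1:])]


if __name__ == "__main__":
    # Computed monoisotopic b4, b5, b6 of LVDKVIGITNEEAISTAR
    print(read_ladder([456.28165, 555.35006, 668.43412]))
```

A gap that spans two residues (for example, a missing b7 between b6 and b8) returns "?", which is one reason real spectra rarely read cleanly end to end.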
35
Nomenclature for MS Sequencing of Peptides
Klaus Biemann, MIT
[Figure: cleavage of each amide bond in the backbone H2N-C-C-N-C-C-N-C-C-N-C-COOH gives N-terminal fragments b1, b2, b3 and the complementary C-terminal fragments y3, y2, y1; the subscript denotes the number of residues contained in the product ion]
36
Nomenclature for MS Sequencing of Peptides
  • Low-energy collisions promote fragmentation of a
    peptide primarily along the peptide backbone
  • Peptide fragmentation which maintains the charge
    on the C terminus is designated a y-ion
  • Fragmentation which maintains the charge on the N
    terminus is designated a b-ion
  • Low-energy collisions: ion trap, QQQ, QTOF,
    FT-ICR
  • High-energy collisions: TOF-TOF
  • cleavage of amino acid side-chain bonds (d-ions
    and w-ions)
  • differentiate Leu vs. Ile
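The b/y nomenclature translates directly into arithmetic: a singly charged b_i ion is the sum of the first i residue masses plus a proton, and y_i is the sum of the last i residue masses plus water and a proton. A minimal sketch, assuming monoisotopic masses and singly charged, unmodified ions:

```python
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Singly charged b and y ion m/z values, keyed by the Biemann subscript i."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues + proton; y_i: last i residues + water + proton.
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}
    y = {i: sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


if __name__ == "__main__":
    b, y = b_y_ions("LVDKVIGITNEEAISTAR")
    print(f"b6 {b[6]:.2f}  b8 {b[8]:.2f}  y10 {y[10]:.2f}")
```

For LVDKVIGITNEEAISTAR, the tryptic peptide in the next spectrum, this gives b6 ≈ 668.43, b8 ≈ 838.54, and y10 ≈ 1091.53, consistent with the peaks labeled 668.4, 838.5, and 1091.5 there.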

37
Peptide Sequencing by Mass Spectrometry
[Figure: MS/MS spectrum (relative abundance vs. m/z) of the tryptic peptide LVDKVIGITNEEAISTAR from cysteine synthase A, showing ions y4 to y14 and b3 to b17; labeled peaks include b6 (m/z 668.4), b8 (838.5), y10 (1091.5), y12 (1261.4), and y13 (1374.5), plus m/z 555.4, 990.5, and 1474.4]
MS/MS of doubly charged tryptic peptides (often)
yields singly charged product ions, but doubly
charged products can be observed as well. A mixture
of b ions and y ions is present.
38
Computer-based Sequence Searching Strategies
  • A list of experimentally determined masses is
    compared to lists of computer-generated
    theoretical masses prepared from a database of
    protein primary sequences. With the current
    exponential growth in the generation of genomic
    data, these databases are expanding every day.
  • There are typically three types of search
    strategies employed
  • searching with peptide fingerprint data
  • searching with sequence data
  • searching with raw MS/MS data.
  • One limiting factor that must be considered for
    all of the approaches is that they can only
    identify proteins whose sequences reside within
    an available database, or that are highly
    homologous to one that does.

39
Searching with Peptide Fingerprints
  • The majority of the available search engines
    allow one to define certain experimental
    parameters to optimize a particular search.
  • Minimum number of peptides to be matched
  • Allowable mass error
  • Monoisotopic versus average mass data
  • Mass range of starting protein
  • Type of protease used for digestion
  • Information about potential protein modification,
    such as N- and C-terminal modification,
    carboxymethylation, oxidized methionines, etc.

40
Searching with Peptide Fingerprints
  • Most protein databases contain primary sequence
    information only
  • Any shift in mass incorporated into the primary
    sequence as a result of post-translational
    modification will result in an experimental mass
    that is in disagreement with the theoretical
    mass.
  • Modifications such as glycation and
    phosphorylation can result in missed
    identifications.
  • A single amino acid substitution can shift the
    mass of a peptide to such a degree that even a
    protein with a great deal of homology to another
    in the database cannot be identified.
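Search engines compensate for such shifts by letting specified modifications add fixed increments to a peptide's mass. The sketch below enumerates candidate masses for a few modifications mentioned on the previous slide; the monoisotopic shifts are the standard elemental increments (oxidation +O, phosphorylation +HPO3, carboxymethylation +C2H2O2), while the helper name, the dictionary keys, and the per-modification cap are illustrative choices.

```python
from itertools import product

# Monoisotopic mass shifts (Da) and the residues each modification can occupy.
MODIFICATIONS = {
    "Oxidation (M)": (15.99491, "M"),        # +O
    "Phospho (STY)": (79.96633, "STY"),      # +HPO3
    "Carboxymethyl (C)": (58.00548, "C"),    # +C2H2O2
}


def variable_mod_masses(base_mass, peptide, mods, max_per_mod=2):
    """Candidate masses when each listed modification may occur 0..n times.

    n is capped by the number of modifiable residues in the peptide and by
    max_per_mod (an illustrative limit). Returns (mass, {mod: count}) pairs
    sorted by mass.
    """
    count_ranges = []
    for mod in mods:
        _, sites = MODIFICATIONS[mod]
        n_sites = sum(peptide.count(aa) for aa in sites)
        count_ranges.append(range(min(n_sites, max_per_mod) + 1))
    candidates = []
    for counts in product(*count_ranges):
        shift = sum(c * MODIFICATIONS[m][0] for m, c in zip(mods, counts))
        candidates.append((base_mass + shift, dict(zip(mods, counts))))
    return sorted(candidates, key=lambda item: item[0])


if __name__ == "__main__":
    # Hypothetical unmodified mass of 1000.0 for a peptide with two Met and one Ser.
    for mass, counts in variable_mod_masses(1000.0, "AMKMSR", ["Oxidation (M)", "Phospho (STY)"]):
        print(f"{mass:10.4f}  {counts}")
```

Each allowed modification multiplies the number of candidate masses, which is why searches usually restrict variable modifications to a few likely ones.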

41
Searching with Peptide Fingerprints
  • A number of factors affect the utility of peptide
    fingerprinting.
  • The greater the experimental mass accuracy, the
    narrower you can set your search tolerances,
    thereby increasing your confidence in the match,
    and decreasing the number of "false positive"
    responses.
  • A common practice used to increase mass accuracy
    in peptide fingerprinting is to employ an
    autolysis fragment from the proteolytic enzyme as
    an internal standard to calibrate a MALDI mass
    spectrum.
  • Peptide fingerprinting is also amenable to the
    identification of proteins in complex mixtures.
  • Peptides generated from the digest of a protein
    mixture will simply return two or more results
    that are a "good" fit.
  • Peptides that are "left over" in a peptide
    fingerprint after the identification of one
    component can be resubmitted for the possible
    identification of another component.
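The mass-accuracy and internal-calibration points can be made concrete. This sketch computes mass error in ppm and builds a linear recalibration from one or more internal reference peaks (for example, a trypsin autolysis peptide whose theoretical mass is known). The linear model is an assumption for a narrow m/z range, and the numbers in the usage lines are hypothetical, not instrument values.

```python
def ppm(observed: float, reference: float) -> float:
    """Mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def linear_recalibration(measured_refs, true_refs):
    """Least-squares correction true = a * measured + b from internal standards.

    With a single reference peak only a scale factor is fitted (b = 0);
    with two or more, both slope and offset are fitted.
    """
    if len(measured_refs) == 1:
        a = true_refs[0] / measured_refs[0]
        return lambda mz: a * mz
    n = len(measured_refs)
    mean_x = sum(measured_refs) / n
    mean_y = sum(true_refs) / n
    sxx = sum((x - mean_x) ** 2 for x in measured_refs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured_refs, true_refs))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda mz: a * mz + b


if __name__ == "__main__":
    # Hypothetical: an autolysis peak with known [M+H]+ 2000.000 is observed at 2000.080.
    correct = linear_recalibration([2000.080], [2000.000])
    raw_peak = 1500.060
    print(round(ppm(raw_peak, 1500.0), 1), round(ppm(correct(raw_peak), 1500.0), 1))
```

Because the reference peak sits in the same spectrum as the analyte peaks, it corrects run-to-run drift that an external calibration would miss.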

42
Web addresses of some representative internet
resources for protein identification from mass
spectrometry data
43
Mascot
  • Among the first programs for identifying proteins
    by peptide mass fingerprinting, MOWSE, developed
    out of a collaboration between Imperial Cancer
    Research Fund (ICRF) and SERC Daresbury
    Laboratory, UK.
  • The name chosen was an acronym of Molecular
    Weight Search. The MOWSE databases were fully
    indexed so as to allow very rapid searching and
    retrieval of sequence data. Subsequently, the
    software was further developed and renamed
    Mascot.
  • Licensed and distributed by Matrix Science Ltd.
  • Specialized tools include Peptide Mass
    Fingerprint, Sequence Query, and MS/MS Ion
    Search.
  • Search output Web-based.
  • Good visual representation of search quality
    (graphical probability chart).
  • Simple graphical user interface.
  • Reports MOWSE scores as a quantitative measure of
    search quality.

44
Mowse Scoring
  • Rather than just counting the number of matching
    peptides, Mowse uses empirically determined
    factors to assign a statistical weight to each
    individual peptide match (Rapid identification
    of proteins by peptide-mass fingerprinting. Curr.
    Biol. 3, 327, 1993).
  • The scoring scheme assigns more weight to matches
    of higher molecular weight peptides (more
    discriminating).
  • Compensates for the non-random distribution of
    fragment molecular weights in proteins of
    different sizes.
  • It was the first protein identification program
    to recognize that the relative abundance of
    peptides of a given length in a proteolytic digest
    depends on the lengths of both peptide and protein.
  • Developed for MALDI peptide mass fingerprinting.
  • Probability-Based Mowse
  • Mascot incorporates a probability-based enhanced
    Mowse algorithm, described in Perkins et al.
    (Probability-based protein identification by
    searching sequence databases using mass
    spectrometry data. Electrophoresis 20, 3551-3567,
    1999).
  • A simple rule can be used to judge whether a
    result is significant or not. Different types of
    matching (peptide masses and fragment ions) can
    be combined in a single search.
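The probability-based score and its significance rule can be sketched in a few lines. Matrix Science describes the score as -10·log10(P), where P is the probability that the observed match is a random event. The threshold rule below (P must fall below 0.05 divided by the number of database entries tested) is a simplified, Bonferroni-style assumption for illustration, not Mascot's exact threshold calculation, and the function names are hypothetical.

```python
import math


def probability_score(p_random: float) -> float:
    """Score = -10 * log10(P), where P is the probability that the match is random."""
    return -10.0 * math.log10(p_random)


def significance_threshold(n_candidates: int, alpha: float = 0.05) -> float:
    """Score above which a match is called significant (simplified assumption).

    The per-match probability must be below alpha / n_candidates, i.e. a
    Bonferroni-style correction for the number of entries that could match by chance.
    """
    return probability_score(alpha / n_candidates)


def is_significant(p_random: float, n_candidates: int, alpha: float = 0.05) -> bool:
    return probability_score(p_random) > significance_threshold(n_candidates, alpha)


if __name__ == "__main__":
    n_entries = 50_000  # hypothetical database size
    print(round(significance_threshold(n_entries), 1))  # 60.0
    print(is_significant(1e-8, n_entries), is_significant(1e-5, n_entries))  # True False
```

On this logarithmic scale every 10 score units is a tenfold drop in the chance of a random match, and the threshold rises as the searched database grows.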

45
Databases
  • Three components are required for database
    searching of proteomics MALDI or MS/MS data: the
    MALDI or MS/MS data, the algorithms used to
    search protein databases with those data, and
    the protein databases themselves.
  • The protein databases can be as small as one
    protein, can be large, public domain databases of
    all known and predicted proteins, or may be
    predicted open reading frames based on genomic
    sequence.
  • A major challenge for database searching is that
    these protein databases are constantly changing,
    making database search results potentially
    obsolete as new entries are added that better fit
    the MALDI or MS data.
  • Even as genomes are completed there is still flux
    as new coding regions are identified and novel
    mechanisms of increased translational complexity
    are better understood, such as alternative splice
    products, RNA editing, and ribosome slippage
    leading to novel, unexpected translation products.

46
Databases
  • NCBI non-redundant (NCBInr)
  • Non-redundant database from the National Center
    for Biotechnology Information for use with their
    search tools BLAST and Entrez, composed of
    translated sequences from the GenBank/EMBL/DDBJ
    consortium, SwissProt, Protein Information
    Resource (PIR), and Brookhaven Protein Data Bank
    (PDB).
  • New releases are published bimonthly while
    updates occur daily.
  • OWL
  • OWL is comprised of Swiss-Prot, PIR, translated
    Genbank, and NRL-3D (PDB). All sequences are
    compared to Swiss-Prot to remove identical and
    trivially different sequences. Has not been
    updated since May, 1999.
  • SWISSPROT
  • While SwissProt contains only a subset of
    proteins, the proteins in this database are much
    better annotated and the sequences are much more
    reliable than those available in any other
    database.
  • MSDB
  • Comprehensive, non-identical protein sequence
    database maintained by the Proteomics Department
    at the Hammersmith Campus of Imperial College
    London. Designed specifically for MS
    applications.

47
Databases
  • EST Clusters (dBEST)
  • Division of GenBank that contains "single-pass"
    cDNA sequences, or Expressed Sequence Tags
    (ESTs), from a number of organisms.
  • ESTs are relatively short, usually 3′-end
    sequences from isolated mRNA.
  • ESTs tend to be highly redundant, and the
    sequence is much lower quality than from other
    sources. An advantage of using ESTs is that
    they represent only expressed sequences (no
    introns) and include alternative splice variants.
    Their length, redundancy, and low quality are much
    improved by using clustered ESTs, such as the
    Compugen clusters.
  • The EST database has some redundancy because it
    contains all possible combinations of alternative
    splice products, and so it can be very large (and
    slow to search).
  • During a Mascot search, the nucleic acid
    sequences are translated in all six reading
    frames. dbEST is a very large database and is
    divided into three sections: EST_human,
    EST_mouse, and EST_others. Even so, searches of
    these databases take far longer than a search of
    one of the non-redundant protein databases. You
    should only search an EST database if a search of
    a protein database has failed to find a match.
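Six-frame translation is what turns nucleotide entries such as ESTs into something a protein search can use. A minimal sketch, assuming the standard genetic code (NCBI translation table 1), uppercase A/C/G/T input, '*' for stop codons, and 'X' for codons containing other characters; the function names are illustrative and not Mascot's internals.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Frames 0-2 of the given strand ('+') and of its reverse complement ('-')."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement)):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCAAGTAA").items():
        print(frame, protein)
```

Each nucleotide entry therefore becomes six protein sequences to digest and score, which is part of why EST searches run so much longer than protein-database searches.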

48
MALDI-MS peptide fingerprint (tryptic digest of a
single protein)
[Figure: MALDI mass spectrum (m/z 1000 to 3000); all peaks are (M+H)+ ions, and one is marked as trypsin autolysis. Labeled peaks: m/z 1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20, 1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48]
49
Mascot (Matrix Science) for peptide mass fingerprints
[Screenshot: enter peak list]
50
Mascot (Matrix Science) for peptide mass fingerprints
[Screenshot: possible identification]
51
Mascot (Matrix Science) for peptide mass fingerprints
[Screenshot: list of all possible matches; get more info on probable proteins]
52
Mascot (Matrix Science) for peptide mass fingerprints
53
Mascot (Matrix Science) for peptide mass fingerprints
[Screenshot: tryptic peptides that matched and peptides that did not match]
54
Mascot (Matrix Science) for peptide mass fingerprints
[Screenshot: tryptic peptides in the protein sequence; better mass accuracy improves the identification process]
55
LC-MS/MS for protein identification
  • To provide further confirmation of the
    identification, if a tandem mass spectrometer
    (MS/MS) is available, peptide ions can be
    dissociated in the mass spectrometer to provide
    direct sequence information. Product ions from
    an MS/MS spectrum can be compared to available
    sequences using powerful software tools.
  • For a single sample, LC-MS/MS analysis included
    two discrete steps: (a) LC-MS peptide mapping to
    identify peptide ions from the digestion mixture
    and to deduce their molecular weights, and (b)
    LC-MS/MS of the previously detected peptides to
    obtain sequence information for protein
    identification.

56
Automated LC-MS/MS and database searching
  • Current mass spectral technology permits the
    generation of MS/MS data at an unprecedented
    rate. Prior to the generation of powerful
    computer-based database searching strategies, the
    largest bottleneck in protein identification was
    the manual interpretation of this MS/MS data to
    extract the sequence information. Today, many
    computer-based search strategies that employ
    MS/MS data require no operator interpretation at
    all.
  • Analogous to the approach described for peptide
    fingerprinting, these programs take the
    individual protein entries in a database and
    electronically "digest" them to generate a list
    of theoretical peptides for each protein.
  • However, in the use of MS/MS data, these
    theoretical peptides are further manipulated to
    generate a second level of lists which contain
    theoretical fragment ion masses that would be
    generated in the MS/MS experiment for each
    theoretical peptide.

57
Automated LC-MS/MS and database searching
  • These programs simply compare the list of
    experimentally determined fragment ion masses
    from the MS/MS experiment of the peptide of
    interest with the theoretical fragment ion masses
    generated by the computer program.
  • The recent advent of data-dependent scanning
    functions has permitted the unattended
    acquisition of MS/MS data. An example of a raw
    MS/MS data searching program that takes
    particular advantage of this ability is SEQUEST.
  • SEQUEST takes the data from a data-dependent
    LC/MS chromatogram, automatically strips out
    all of the MS/MS information for each individual
    peak, and submits it for database searching using
    the strategy discussed above.
  • Each peak is treated as a separate data file,
    making it especially useful for the on-line
    separation and identification of individual
    components in a protein mixture.
  • SEQUEST cross-correlates uninterpreted MS/MS
    spectra of peptides with theoretical spectra
    derived from protein/nucleotide databases. The
    software can analyze a single spectrum or an
    entire LC-MS/MS peptide map.
  • No user interpretation of MS/MS spectra is
    involved.
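The cross-correlation idea can be sketched as: bin both spectra, take their correlation at zero offset, and subtract the average correlation over a range of offsets, so that the score rewards true peak coincidence rather than merely crowded spectra. This is a simplified sketch in the spirit of SEQUEST's XCorr score, not its implementation: SEQUEST's own intensity preprocessing and ion weighting are omitted, and the 1 Da bins, ±75-bin window, and example peaks are assumptions.

```python
def bin_spectrum(peaks, bin_width=1.0, n_bins=2000):
    """Bin (m/z, intensity) pairs into a fixed-width vector, keeping the max per bin."""
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            vector[index] = max(vector[index], intensity)
    return vector


def correlation(x, y, offset):
    """Sum of x[i] * y[i + offset] over indices valid in both equal-length vectors."""
    n = len(x)
    start, stop = max(0, -offset), min(n, n - offset)
    return sum(x[i] * y[i + offset] for i in range(start, stop))


def xcorr(experimental, theoretical, max_offset=75):
    """Correlation at zero offset minus the mean correlation over +/- max_offset bins."""
    r0 = correlation(experimental, theoretical, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(correlation(experimental, theoretical, t) for t in offsets) / len(offsets)
    return r0 - background


if __name__ == "__main__":
    # Invented example: theoretical fragment ions with unit intensity.
    theo = bin_spectrum([(m, 1.0) for m in (212.1, 325.2, 475.3, 588.3, 701.3)])
    good = bin_spectrum([(212.2, 40.0), (325.1, 90.0), (475.4, 100.0), (588.2, 55.0)])
    bad = bin_spectrum([(230.0, 40.0), (340.0, 90.0), (500.0, 100.0), (610.0, 55.0)])
    print(round(xcorr(good, theo), 2), round(xcorr(bad, theo), 2))
```

The "bad" spectrum has the same intensities, merely shifted off the theoretical positions, so its zero-offset correlation vanishes and its score drops below zero.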

58
[Figure: proteolytic digest → HPLC-MS chromatogram (time, min) → MS/MS of a selected peptide → experimental fragment masses (e.g., m/z 212.1, 325.2, 410.3, 475.3, 588.3, 701.3, 815.4, 851.5, 912.5), which are compared with theoretical fragment masses (Match?)]
59
Direct identification of proteins using mass
spectrometry
  • Removes the requirement to separate proteins by
    electrophoresis, etc
  • MudPIT (multidimensional protein identification
    technology), or the shotgun approach
  • Protein lysate is digested with trypsin
  • The peptide mixture is loaded onto a strong
    cation exchange (SCX) column (to separate on the
    basis of charge). A discrete fraction of peptides
    is displaced from the SCX column using a salt
    step gradient to a reversed-phase (RP) column (to
    separate on the basis of hydrophobicity).
  • This fraction is eluted from the RP column into
    the MS. This iterative process is repeated,
    obtaining the fragmentation patterns of peptides
    in the original peptide mixture.
  • MS/MS spectra are used to identify the proteins
    in the original protein complex.

Link et al. Nature Biotechnology 17, 676 (1999)
60
Large-scale analysis of the yeast proteome by
MudPIT
  • Yates and coworkers, Nature Biotech. (2001) 19,
    242-247
  • Assigned 5,540 peptides to MS spectra leading to
    the identification of 1,484 proteins from the S.
    cerevisiae proteome
  • Of 6,216 ORFs in the yeast genome, 83% have CAI
    values between 0 and 0.20 (i.e., predicted to be
    present at low levels) (Fig. A)
  • MudPIT data: 791 (53.3%) of the proteins
    identified have a CAI < 0.2 (1.7 peptides per
    protein) (Fig. B)
  • Number of peptides per protein increases with
    increasing CAI (Fig. C)

61
Approaches for Protein Sequencing and
Identification
The molecular mass of an intact protein defines
the native covalent state of a gene's product,
including the effects of post-transcriptional/translational
modifications and associated heterogeneity that are
modulated by the actions of other gene products.
Moreover, the fragmentation pattern from large
proteins can generate sufficient information for
identification from sequence databases,
particularly when combined with accurate mass
measurements of both the intact molecule and its
product ions.
62
In-Source Decay (ISD) for Protein Sequencing
  • Peptides and large proteins can be fragmented by
    ISD
  • Fragmentation occurs in the MALDI ion source
  • not generally well controlled
  • Reflectron TOF not necessary (linear TOF
    sufficient to measure product ions)
  • Complete sequence information not present, but
    extensive stretches of sequence from the N-
    and/or C-termini observed

[Figure: backbone cleavage ("cut") within a peptide, ...TIDE...]
63
MALDI-ISD-TOF Mass Spectrometry of Proteins
[Figure: MALDI-ISD-TOF spectra of two proteins (Protein 1 and Protein 2, m/z 5000 to 25000) providing both molecular weight information and sequence information (in-source decay)]
64
MALDI-ISD-TOF Mass Spectrometry of Proteins
  • ISD generally yields c-ions or z-ions

[Figure: ISD spectrum with sequence information from the N-terminus; residues labeled A, L, Y, G, G, E, G, F, E, D, H, R, L, K/Q, E, D]