Computer Analysis of Mass Spectrometry Data - PowerPoint PPT Presentation

1
Computer Analysis of Mass Spectrometry Data
  • David Perkins
  • Proteomics Section,
  • Hammersmith Hospital Campus,
  • Imperial College School of Medicine.

2
Introduction
  • Software and computational techniques for the
    identification of proteins and residue
    modifications using both MALDI and MS/MS data.

3
(No Transcript)
4
Peptide Mass Fingerprinting
A mass spectrum of the peptide mixture resulting
from the digestion of a protein by a
proteolytic enzyme
  • Choice of Enzyme
  • Missed Cleavages
  • Search Masses
  • Constraining the Protein Molecular Weight
  • Which masses to include in a search
  • Autolysis products
  • Modifications

5
Enzymatic Cleavage
[Figure: a native protein is cleaved by an enzyme into peptide fragments.]
6
Choice of Enzyme
  • Enzymes of low specificity are next to useless, as
    they produce a complex mixture of similar masses
  • For MALDI, peptides with masses below 500 Da
    should be avoided, as this region is crowded with
    matrix ions

7
Enzyme Specificity
Enzyme         Cleave At   Don't Cleave   N or C Term
Trypsin        KR          P              C
Lys-C          K           P              C
Lys-C/P        K           -              C
Arg-C          R           P              C
V8-E           E           P              C
V8-DE          DE          P              C
Chymotrypsin   FYWLIVM     P              C
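The specificity table can be read as a small data structure. The Python sketch below is a minimal in-silico digest under the assumption that each row means: cut on the C-terminal side of any "Cleave At" residue unless the next residue is listed under "Don't Cleave". The names `ENZYMES` and `digest` are hypothetical; search engines keep their own enzyme definitions.

```python
# Hypothetical encoding of the specificity table:
# enzyme -> (residues cleaved after, residues that block cleavage when next)
ENZYMES = {
    "Trypsin": ("KR", "P"),
    "Lys-C": ("K", "P"),
    "Lys-C/P": ("K", ""),
    "Arg-C": ("R", "P"),
    "V8-E": ("E", "P"),
    "V8-DE": ("DE", "P"),
    "Chymotrypsin": ("FYWLIVM", "P"),
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Fully cleave `sequence`; every enzyme in the table cuts C-terminally."""
    cleave_at, blocked_by = ENZYMES[enzyme]
    peptides, start = [], 0
    for i, residue in enumerate(sequence[:-1]):
        next_residue = sequence[i + 1]
        if residue in cleave_at and next_residue not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides
```

For example, `digest("MKPAKRGELK", "Trypsin")` returns `['MKPAK', 'R', 'GELK']` because the K followed by P is not cut, while `"Lys-C/P"` does cut there and returns `['MK', 'PAK', 'RGELK']`.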
8
Missed Cleavages
  • Digests are usually not perfect
  • Cleavage sites may be missed by an enzyme
  • These partially cleaved peptides are known as
    partials
  • Allowing for partials reduces the discrimination
    of a search
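Partials can be generated from a complete digest by re-joining neighbouring peptides. This is a minimal sketch; `with_missed_cleavages` is a hypothetical helper that expects the fully cleaved peptides in sequence order.

```python
def with_missed_cleavages(peptides: list[str], max_missed: int) -> list[str]:
    """Return every peptide spanning 0..max_missed internal missed sites.

    Joining k + 1 consecutive fully cleaved peptides gives a partial
    with k missed cleavages.
    """
    result = []
    for start in range(len(peptides)):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end > len(peptides):
                break
            result.append("".join(peptides[start:end]))
    return result
```

With `["MKPAK", "R", "GELK"]` and one missed cleavage allowed, three peptides become five (`MKPAKR` and `RGELK` are added), which is why each extra missed cleavage dilutes the discrimination of a search.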

9
Search Masses
  • Select masses which are large enough to provide
    discrimination
  • Larger masses are more likely to be partials
  • With Trypsin, a mass range of 1000 to 3000 Da
    works well
  • A tight mass tolerance is important for good
    discrimination
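Search masses are usually compared as monoisotopic [M+H]+ values. The sketch below computes them from standard monoisotopic residue masses, keeps peptides in the 1000 to 3000 Da window suggested above, and compares masses within a tolerance. The 50 ppm default and the helper names are illustrative assumptions, not recommended settings.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def in_search_window(peptide: str, low: float = 1000.0, high: float = 3000.0) -> bool:
    """Keep peptides whose [M+H]+ lies in the chosen search range."""
    return low <= mh_plus(peptide) <= high


def within_tolerance(observed: float, calculated: float, tol_ppm: float = 50.0) -> bool:
    """True if the two masses agree to within tol_ppm parts per million."""
    return abs(observed - calculated) / calculated * 1e6 <= tol_ppm
```

A short tryptic peptide such as GELK ([M+H]+ about 446.26) falls outside the window; a 0.02 Da error at 1000 Da (20 ppm) passes the 50 ppm check, while a 0.1 Da error (100 ppm) does not.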

10
Constraining Protein Mass
  • To increase discrimination, the mass of the
    intact protein can be used in a search
  • This is risky, since the observed protein may be
    only a fragment of the full-length protein

11
Which Masses to Include?
The optimum dataset for a peptide mass
fingerprint is all the correct peptides and none
of the wrong ones! By correct, we mean that the
textbook cleavage rules were followed. In
practice, this rarely (if ever) happens.
  • Enzymatic cleavage not perfect
  • Sequence coverage may be poor
  • Noise

12
Autolysis Products
  • Some digests may be dominated by the autolysis
    peaks of the enzyme used
  • In these cases, the known masses of these
    products may be filtered out
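Filtering autolysis products amounts to dropping observed peaks that coincide with known enzyme peptides. In this sketch the default list holds commonly quoted [M+H]+ values for porcine trypsin autolysis peaks (842.51, 1045.56, 2211.10); treat them as an assumption and substitute the masses for the enzyme preparation actually used. The 0.2 Da tolerance and the helper name are likewise illustrative.

```python
# Commonly quoted [M+H]+ values (Da) for porcine trypsin autolysis peptides.
# Assumed values: check them against the enzyme preparation actually used.
TRYPSIN_AUTOLYSIS = [842.51, 1045.56, 2211.10]


def remove_autolysis(peaks, known=TRYPSIN_AUTOLYSIS, tol_da=0.2):
    """Drop observed peaks lying within tol_da of any known autolysis mass."""
    return [p for p in peaks if all(abs(p - k) > tol_da for k in known)]
```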

13
Residue Modifications
  • Some residues may be modified during the sample
    preparation procedure
  • This introduces discrepancies between the
    expected and observed masses
  • For example, Met residues are often oxidised
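A variable modification such as Met oxidation (+15.9949 Da per oxidised Met) turns one calculated mass into several. A minimal sketch, with a hypothetical helper name:

```python
OXIDATION = 15.99491  # monoisotopic mass gained by Met -> Met sulfoxide (Da)


def oxidation_variants(mass: float, peptide: str) -> list[float]:
    """Masses with 0, 1, ..., n oxidised Met residues (n = Met count)."""
    n_met = peptide.count("M")
    return [mass + k * OXIDATION for k in range(n_met + 1)]
```

A peptide containing two Met residues therefore has to be matched at three masses, which is one reason variable modifications slow searches down and reduce discrimination.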

14
Sample Preparation for MALDI
  • Excise band from gel
  • Tryptic Digestion of gel fragment
  • Supernatant transferred to a fresh Eppendorf tube
  • Sample transferred to target plate

15
Sample Preparation Robot
16
MALDI Mass Spectrometer
  • Ions are generated by a LASER firing at the
    target plate
  • Because the time at which the LASER fires and the
    arrival time of the ions at the detector are both
    known, the relative masses can be calculated from
    the time of flight
  • Almost exclusively singly charged ions are
    generated; other types of spectrometer may
    generate multiply charged ions
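The time-of-flight calculation can be sketched for an idealised linear instrument: an ion of charge ze accelerated through a potential U gains kinetic energy zeU = mv²/2 and crosses a flight tube of length L in t = L/v, so m/z grows with t². Real instruments (delayed extraction, reflectrons) are calibrated empirically against known masses; the 20 kV and 1 m values used with this sketch are assumptions for illustration only.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
DALTON = 1.66053906660e-27   # unified atomic mass unit (kg)


def flight_time(mass_da: float, charge: int, voltage: float, length: float) -> float:
    """Ideal linear-TOF flight time (s) from z*e*U = m*v**2/2 and t = L/v."""
    m = mass_da * DALTON
    v = math.sqrt(2 * charge * E_CHARGE * voltage / m)
    return length / v


def mass_from_time(t: float, charge: int, voltage: float, length: float) -> float:
    """Invert the relation: m = 2*z*e*U*t**2 / L**2, returned in Da."""
    m = 2 * charge * E_CHARGE * voltage * t ** 2 / length ** 2
    return m / DALTON
```

Under these assumptions a singly charged 1000 Da ion takes about 16 µs, and doubling the mass increases the flight time by a factor of √2.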

17
MALDI Internals
18
Micromass MALDI
19
Typical Fingerprint Spectrum
20
Isotopic Cluster
21
Poorly Resolved Peak
22
Database Searching with Peptide Mass Fingerprints
  • Produce a theoretical digest of all the proteins
    in a database with a specific enzyme
  • Compare these theoretical masses with
    experimentally observed masses
  • Assign a score to matching peptides/proteins
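The three steps above can be combined into a toy search. This sketch uses a tryptic digest written as a regular expression (cut after K or R, not before P), monoisotopic [M+H]+ masses, and the simplest possible score, the number of observed peaks matched. That score is a deliberate simplification, not the MOWSE or MASCOT scoring scheme; `score_proteins` and its 0.5 Da tolerance are assumptions.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_peptides(seq: str) -> list[str]:
    """Cut after K/R unless followed by P; drop the empty tail after a final K/R."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def mh_plus(peptide: str) -> float:
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def score_proteins(database: dict[str, str], observed: list[float], tol_da: float = 0.5):
    """Rank proteins by how many observed peaks match one of their peptides."""
    scores = {}
    for name, seq in database.items():
        theoretical = [mh_plus(p) for p in tryptic_peptides(seq)]
        scores[name] = sum(
            any(abs(obs - calc) <= tol_da for calc in theoretical)
            for obs in observed
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```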

23
Problems
  • Mixtures and contamination
  • Partial cleavage
  • Identifying real peaks
  • Residue modifications
  • Mass accuracy

24
MOWSE
  • One of the first programs for identifying
    proteins by peptide mass fingerprinting
  • Developed by Darryl Pappin and Alan Bleasby
  • Developed alongside the OWL non-redundant protein
    database

25
Problems with MOWSE
  • Databases had to be pre-indexed; these indexes
    are large and slow to build
  • Does not handle variable modifications
  • Indexing means that databases can't easily be
    updated regularly
  • Limited functionality

26
MASCOT
  • Takes advantage of multi-processor systems
  • Entirely web-based
  • No pre-indexing of databases
  • Increased functionality
  • Copes with multiple modifications
  • Easily expandable
  • Increased speed

27
Search Speed
Search speed is very important as databases
increase in size and automation leads to a high
throughput of samples. Also, if the algorithms
are efficient, more elaborate searches may be
undertaken, for instance with large numbers of
variable residue modifications and different
mass tolerances, to make more sense of data
derived from mixtures or contaminated samples.
  • Ability to use multiple processors when available
  • Very efficient I/O, databases may also be mapped
    to memory
  • Efficient cleavage site and mass calculation
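Memory mapping lets the operating system page a database file in on demand instead of copying it into program buffers. As a small illustration (not MASCOT's I/O code), this Python sketch counts FASTA entries by scanning a read-only memory map with the standard-library `mmap` module; `count_entries` is a hypothetical helper.

```python
import mmap
import os
import tempfile


def count_entries(path: str) -> int:
    """Count FASTA entries by scanning a read-only memory map for b'>'.

    Note: mmap cannot map an empty file, so the file must be non-empty.
    """
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count, pos = 0, mm.find(b">")
        while pos != -1:
            count += 1
            pos = mm.find(b">", pos + 1)
        return count


if __name__ == "__main__":
    # Write a tiny two-entry FASTA file, map it and count the headers.
    with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
        tmp.write(b">P1\nMKPAKRGELK\n>P2\nGGGGGG\n")
    print(count_entries(tmp.name))
    os.unlink(tmp.name)
```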

28
Thread Models
  • Boss/Worker
  • Peer
  • Pipeline
  • MASCOT is based on the Boss/Worker model

29
Boss/Worker Model
[Figure: a boss thread passing work to worker threads, which produce the output.]
The Boss accepts input and then distributes the
work to other threads
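The Boss/Worker model can be sketched with Python's standard `threading` and `queue` modules: the boss places tasks on a shared queue, and each worker takes tasks until it receives a stop marker. This illustrates the model only; `boss_worker` is a hypothetical helper and says nothing about how MASCOT itself is written.

```python
import queue
import threading


def boss_worker(tasks, work, n_workers=4):
    """Boss queues tasks; workers process them until they see a stop marker."""
    todo = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()
            if item is None:          # stop marker sent by the boss
                break
            value = work(item)
            with lock:                # results list is shared between workers
                results.append((item, value))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task in tasks:                # the boss distributes the work
        todo.put(task)
    for _ in threads:                 # one stop marker per worker
        todo.put(None)
    for t in threads:
        t.join()
    return dict(results)
```

In CPython the global interpreter lock prevents threads from running pure-Python code in parallel, so a CPU-bound search would normally use processes (for example `concurrent.futures.ProcessPoolExecutor`) with the same boss/worker structure.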
30
Peer Model
[Figure: several peer threads, each producing its own output.]
Each Thread is responsible for its own input
31
Pipeline Model
[Figure: an input stream passing through Thread A, Thread B and Thread C in turn to the output.]
A single thread accepts input, passing the data
on to the next thread for further processing
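A pipeline can be sketched with one queue between each pair of stages, so that every thread performs one step and passes its result on. The helper names are hypothetical and the stage functions in the usage example are placeholders.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the input stream


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)          # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:                # a single input stream feeds the first thread
        queues[0].put(item)
    queues[0].put(STOP)
    out = []
    while (item := queues[-1].get()) is not STOP:
        out.append(item)
    for t in threads:
        t.join()
    return out
```

`run_pipeline([" mkpak ", "gelk"], [str.strip, str.upper, len])` returns `[5, 4]`: thread A strips, thread B upper-cases and thread C measures length, and items leave in the order they entered.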
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Related Search Methods
  • Masses may be combined with sequence information,
    e.g. 1234.5 seq(c-ABCD) seq(EF)
  • These searches are very valuable, as even small
    amounts of sequence information may be very
    discriminating
  • Sequence information is derived from the partial
    interpretation of an MS/MS spectrum
  • Known as the sequence tag method
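A combined mass and sequence query can be checked against a candidate peptide as follows. This is a hedged sketch: it assumes the query mass is an [M+H]+ value, that `c-` anchors a tag at the peptide's C-terminus and `n-` at its N-terminus, and that an unprefixed tag may occur anywhere. MASCOT's own query syntax documentation is the authority on the real semantics; the helper names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728
TAG = re.compile(r"seq\((?:([nc])-)?([A-Z]+)\)")


def parse_query(query: str):
    """'1234.5 seq(c-ABCD) seq(EF)' -> (1234.5, [('c', 'ABCD'), ('', 'EF')])."""
    mass_text, _, rest = query.partition(" ")
    return float(mass_text), TAG.findall(rest)


def peptide_matches(peptide: str, query: str, tol_da: float = 0.5) -> bool:
    mass, tags = parse_query(query)
    calc = sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON  # [M+H]+ assumed
    if abs(calc - mass) > tol_da:
        return False
    for anchor, tag in tags:
        if anchor == "n" and not peptide.startswith(tag):
            return False
        if anchor == "c" and not peptide.endswith(tag):
            return False
        if anchor == "" and tag not in peptide:
            return False
    return True
```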

37
Composition Queries
  • Composition information may also be used with
    mass information to refine queries
  • Chemical or enzymatic analysis, such as N-terminal
    analysis with Edman degradation, may give
    composition information
  • A typical query would be 1234.5 comp(2H0M)
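A composition qualifier can be parsed and tested against a candidate peptide. The sketch assumes `comp(2H0M)` means exactly two His and no Met residues; whether the real syntax means exact or minimum counts should be checked in the MASCOT documentation. In a full query this check would be combined with a mass match. The helper names are hypothetical.

```python
import re


def parse_comp(text: str) -> dict[str, int]:
    """'comp(2H0M)' -> {'H': 2, 'M': 0}; counts are read as exact (an assumption)."""
    match = re.fullmatch(r"comp\(((?:\d+[A-Z])+)\)", text)
    if match is None:
        raise ValueError(f"not a comp() qualifier: {text!r}")
    return {aa: int(n) for n, aa in re.findall(r"(\d+)([A-Z])", match.group(1))}


def composition_ok(peptide: str, comp: dict[str, int]) -> bool:
    """True if the peptide contains exactly the requested number of each residue."""
    return all(peptide.count(aa) == n for aa, n in comp.items())
```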

38
MASCOT Queries
  • One of the most powerful features of MASCOT is
    the ability to mix all the types of query in one
    search
  • MASCOT allows the user to specify a particular
    species to further increase search discrimination

39
Databases Searched with Peptide Mass Fingerprint Data
  • Non-identical protein databases are ideal
  • EST sequences are too short to contain meaningful
    information for these searches
  • Non-redundant databases may be problematic
  • MASCOT translates nucleic acid databases on the
    fly

40
MSDB
  • A non-identical protein sequence database
    designed for mass spectrometry searches
  • Additional information, such as multiple species
    lines, included in the textual annotation
  • De-convolution of SWISSPROT and other sequences
  • Nightly updates
  • Links to source databases
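The idea of a non-identical database can be illustrated by collapsing entries with identical sequences into one, while keeping every source description (which is how one entry can carry lines for several species). This is a sketch of the concept only, not how MSDB was actually built; the input tuple layout and the helper name are assumptions.

```python
def make_non_identical(entries):
    """Merge entries with identical sequences, keeping every description.

    `entries` is an iterable of (accession, description, sequence) tuples;
    the result maps each distinct sequence to its list of source lines.
    """
    merged = {}
    for accession, description, sequence in entries:
        merged.setdefault(sequence, []).append(f"{accession} {description}")
    return merged
```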

41
Is The Protein Identified?
  • Most samples are identified using peptide mass
    fingerprinting alone
  • With the growth of databases, this trend will
    continue
  • Some samples do not have representatives in any
    of the databases; to sequence these proteins,
    more analysis is required

42
MS/MS Analysis
  • Also known as tandem MS
  • Individual peptides from the enzymatic digests
    are fragmented further
  • From the resulting fragment ladder, sequences may
    be reconstructed
  • Much more discriminating search than simple
    peptide mass fingerprinting

43
MS/MS Analysis
  • Carried out on nanospray/electrospray mass
    spectrometers
  • Rather than being spotted on a target plate, the
    sample is introduced through an inlet from a
    capillary
  • Peptides identified by the MALDI analysis are
    fragmented inside the mass spectrometer and the
    resultant daughter ions observed

44
Stylized Nanospray Mass Spectrometer
45
Micromass QTOF
46
Finnigan Ion Trap
47
Daughter Ions
  • Unlike MALDI, electrospray/nanospray machines
    produce ions that may carry multiple charges
  • Various types of ions are produced, categorized
    by their charge and by which terminus of the
    peptide sequence they contain
  • Fortunately, peptides fragment mainly at the
    peptide bonds
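With multiply charged ions the observed value is m/z, not mass. For an ion carrying z protons, m/z = (M + z·1.00728)/z, where M is the neutral monoisotopic mass and 1.00728 Da is the proton mass; the sketch converts in both directions (helper names are hypothetical).

```python
PROTON = 1.00728  # proton mass (Da)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M+zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass(mz_value: float, charge: int) -> float:
    """Recover the neutral mass from an observed m/z and a known charge."""
    return mz_value * charge - charge * PROTON
```

A 2000 Da peptide therefore appears at m/z 2001.01 when singly charged but at about 1001.01 when doubly charged. In practice the charge is usually read from the isotopic cluster, whose peaks are spaced roughly 1/z apart.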

48
B and Y Fragment Ions
Y-ions from C to N terminus
B-ions from N to C terminus
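For singly charged ions, a b ion is the sum of the N-terminal residue masses plus a proton, and a y ion is the sum of the C-terminal residue masses plus water and a proton. A minimal sketch using standard monoisotopic residue masses (unmodified peptides only; `b_and_y_ions` is a hypothetical helper):

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def b_and_y_ions(peptide: str):
    """Singly charged b1..b(n-1) and y1..y(n-1) ions of an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y
```

For GELK the first y ion is 147.11 (Lys + H2O + H+). Complementary ions give a useful check: b_i + y_(n-i) equals the peptide's [M+H]+ plus one proton mass.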
49
[Figure: MS/MS spectrum; relative intensity (%, full scale 2.71E05) against m/z from 500 to 2000, with labelled peaks at 1571.45, 1725.29 and 1814.37.]
50
MASCOT Searches with MS/MS Data
  • In a similar fashion to peptide mass
    fingerprinting, the predicted fragment ion masses
    for each peptide of a database sequence are
    calculated
  • The calculated and observed ion masses are
    compared and given a score
  • Individual peptide scores are combined to give a
    protein score
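The comparison and combination steps can be sketched with the simplest score: a count of matched fragment ions per peptide, summed over peptides. MASCOT's actual scoring is probability-based and is not reproduced here; these helpers and the 0.5 Da tolerance are illustrative assumptions.

```python
def peptide_score(predicted, observed, tol_da=0.5):
    """Number of predicted fragment ions found among the observed peaks."""
    return sum(any(abs(p - o) <= tol_da for o in observed) for p in predicted)


def protein_score(peptide_scores):
    """Combine per-peptide scores; a plain sum is used purely for illustration."""
    return sum(peptide_scores)
```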

51
Problems with MS/MS data
  • Many types of daughter ion may be produced, and
    which ones appear depends on the machine and the
    analytical procedure used
  • Searches tend to be run with a no-enzyme option,
    which greatly increases the number of
    calculations
  • Residue modifications are far more difficult to
    handle, as the number of mass permutations is
    very large
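The scale of the problem can be made concrete with two standard counting arguments (general combinatorics, not taken from the slides). With no enzyme, every contiguous subsequence of a protein of n residues is a candidate peptide; with a specific enzyme there are at most (s + 1)(k + 1) candidates, where s is the number of cleavage sites and k the maximum number of missed cleavages; and a peptide with m modifiable residues, each either modified or not, has 2^m mass variants:

```latex
N_{\text{no enzyme}} = \sum_{\ell=1}^{n} (n - \ell + 1) = \frac{n(n+1)}{2},
\qquad
N_{\text{enzyme}} \le (s+1)(k+1),
\qquad
N_{\text{variants}} = 2^{m}
```

For n = 500 the no-enzyme count is 125,250 candidate peptides per protein, whereas a protein with, say, 50 cleavage sites and one allowed missed cleavage gives at most 102; a peptide with 10 modifiable residues already has 1,024 variants.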

52
(No Transcript)
53
(No Transcript)
54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
(No Transcript)
58
(No Transcript)
59
(No Transcript)
60
(No Transcript)
61
(No Transcript)
62
Databases Searched with MS/MS Data
  • Non-identical protein databases are ideal
  • EST databases translated in 6 frames are very
    useful as individual peptides may be identified
  • Translated nucleic acid databases
  • Non-redundant databases create problems
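Translating nucleic acid sequences in all six reading frames (three on each strand) is what makes EST and other nucleotide databases searchable. A minimal sketch using the standard genetic code, built from the usual 64-character codon string in TCAG order; stops are shown as `*` and codons containing ambiguous bases as `X`. The helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str) -> str:
    """Translate complete codons starting at position 0."""
    return "".join(CODON.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    rev = dna[::-1].translate(COMPLEMENT)
    return [translate_frame(s[f:]) for s in (dna, rev) for f in range(3)]
```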

63
De-Novo Sequencing
  • If the protein is still not identified, the
    sequence of a peptide has to be reconstructed
    from the MS/MS data
  • Very time consuming and demands a great deal of
    skill; noisy data is very problematic
  • Sequencing is carried out by finding mass
    differences between peaks that correspond to
    amino acid masses
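The mass-difference approach can be tried on the b-ion values listed on slide 66. The sketch assigns each gap between consecutive peaks to the nearest standard residue mass within a tolerance (0.5 Da, an assumption suited to the precision of those values), writing Leu/Ile as Xle because they are isobaric. It assumes a clean, complete ladder; real de novo sequencing must also cope with missing peaks and noise. `read_ladder` is a hypothetical helper.

```python
# Monoisotopic residue masses (Da); Leu and Ile are isobaric, so both are "Xle".
RESIDUES = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Xle": 113.08406,
    "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858, "Lys": 128.09496,
    "Glu": 129.04259, "Met": 131.04049, "His": 137.05891, "Phe": 147.06841,
    "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}

# b-ion values listed on slide 66
B_IONS = [256.1, 327.3, 426.2, 555.5, 668.2, 739.2, 868.2, 969.2, 1083.1,
          1230.6, 1358.3, 1429.4, 1500.4, 1571.5, 1672.3, 1743.5, 1814.4, 1915.5]


def read_ladder(peaks, tol_da=0.5):
    """Assign each gap between consecutive peaks to the closest residue, else '?'."""
    sequence = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        name, mass = min(RESIDUES.items(), key=lambda kv: abs(kv[1] - gap))
        sequence.append(name if abs(mass - gap) <= tol_da else "?")
    return sequence


if __name__ == "__main__":
    print(" ".join(read_ladder(B_IONS)))
```

The result, Ala Val Glu Xle Ala Glu Thr Asn Phe Gln Ala Ala Ala Thr Ala Ala Thr, matches the sequence annotated on that slide between the first Gln and the final Lys. The Gln call is marginal: Gln and Lys differ by only 0.036 Da, far less than the error in these values, an example of the isobaric problems that tags help to reduce.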

64
Tags
  • Make it easy to find the initial masses in the ladder
  • Tags modify the fragmentation of the peptide
  • Reduce isobaric problems
  • Neutralise the adverse effects of certain
    residues on peptide fragmentation

65
Example Tags
66
[Figure: annotated MS/MS spectrum; relative intensity (%, full scale 2.71E05) against m/z from 500 to 2000, with labelled peaks at 1571.45, 1725.29 and 1814.37.
Sequence: Gln Ala Val Glu Xle Ala Glu Thr Asn Phe Gln Ala Ala Ala Thr Ala Ala Thr Lys
y-ion series: 2044.9 1933.7 1862.7 1763.7 1634.6 1521.6 1450.4 1321.4 1220.3 1106.5 959.4 831.3 760.4 689.3 618.2 517.0 446.3 375.1
b-ion series: 256.1 327.3 426.2 555.5 668.2 739.2 868.2 969.2 1083.1 1230.6 1358.3 1429.4 1500.4 1571.5 1672.3 1743.5 1814.4 1915.5]
67
(No Transcript)
68
Automation
Automation is critical to maintaining a high
throughput of samples. It is essential to achieve
closer integration between machine control and
data analysis software
  • New generation of Mass Spectrometers, quadrupole
    machines with LASER sources
  • Laboratory Information Management Systems
  • Automated sample preparation

69
Laboratory Information Management System
[Figure: LIMS workflow linking: Mass Spectrometer; Data Reduction / Peak Processing; Submission into Microarray/Proteomics database; MASCOT Search Engine; Re-search after database updates; Protein Identified; Protein not Identified; Automatic report generation for sample submitter via WWW; Results database.]
70
Future of MASCOT
  • Homology searching
  • Post processing of results for easier
    interpretation
  • Distributed processing on a Linux cluster: MASCOT
    is based on the Boss/Worker model, so it is easy
    to port
  • Development of a standard API to allow simpler
    automation and extensions to functionality

71
MASCOT Homology Searching
  • Identification depends on at least some of the
    peptide sequences being identical to a database
    sequence
  • Homology searching (for instance, allowing common
    substitutions to occur by default) would overcome
    this limitation
  • This would lead to lower selectivity and longer
    search times
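The simplest form of homology matching allows a fixed number of mismatched residues between a peptide and a database window. This sketch counts mismatches only (no gaps, no substitution matrix), so it is far cruder than real homology searching; `matches_with_substitutions` is a hypothetical helper.

```python
def matches_with_substitutions(tag: str, protein: str, max_subs: int = 1) -> list[int]:
    """Start positions where `tag` aligns to `protein` with <= max_subs mismatches.

    Ungapped comparison that simply counts differing residues.
    """
    hits = []
    for start in range(len(protein) - len(tag) + 1):
        window = protein[start:start + len(tag)]
        mismatches = sum(a != b for a, b in zip(tag, window))
        if mismatches <= max_subs:
            hits.append(start)
    return hits
```

`GELK` is not found in `MKPAKRGEIK` exactly, but is found at position 6 once one substitution is allowed; the same tolerance also admits more chance hits, which is the loss of selectivity noted above.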

72
Post processing of Results
  • Allows easier interpretation by, e.g., removing
    all identical peptide matches from the report
    page
  • Text mining to interpret the results of a search,
    for instance: are all the proteins identified
    involved in a particular cellular process?
  • Important when dealing with quantitative studies

73
Distributed Processing
  • Ability to use as much processing power as
    possible when dealing with high throughput data,
    for instance the thousands of peptides from LC
    MS/MS
  • Implemented in MASCOT using an MPI-style mechanism
  • Has the ability to dynamically add/remove
    processors for data processing

74
Processing Farm
75
Standard API
  • A standard interface to MASCOT routines allowing
    users to, e.g., produce a bespoke interface
  • Allows integration with instrument control
    software (although this is dependent on the
    goodwill of the manufacturers!)

76
MSDB developments
  • Inclusion of variable splicing regions from
    SWISSPROT
  • Integration of textual information from all
    source databases
  • Clustering of highly similar sequences into
    families with extra annotation
  • Inclusion of more translations from nucleic acid
    databases

77
Identification of proteins using short peptide sequences
  • FASTS is the most commonly used tool at the moment,
    but it is relatively slow and doesn't take into
    account peptide masses and other information
  • New functionality for MASCOT based on tri-peptide
    indices and using mass and residue modification
    information
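A tri-peptide index maps every three-residue word to the places where it occurs, so a short peptide can be looked up by its first tri-peptide and then verified in full. This is a minimal sketch of the index idea only: the mass and modification information mentioned above is not used, the peptide is assumed to be at least three residues long, and the helper names are hypothetical.

```python
from collections import defaultdict


def build_index(proteins: dict[str, str]):
    """Map every tri-peptide to the (protein, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in proteins.items():
        for i in range(len(seq) - 2):
            index[seq[i:i + 3]].append((name, i))
    return index


def find_peptide(peptide: str, proteins: dict[str, str], index) -> list[tuple[str, int]]:
    """Seed on the peptide's first tri-peptide, then verify the full match."""
    return [(name, i) for name, i in index.get(peptide[:3], [])
            if proteins[name].startswith(peptide, i)]
```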

78
MS/MS Data Mining
  • MS/MS data may contain useful information in
    addition to sequence
  • Statistical methods for mining MS/MS data for,
    e.g., fragmentation efficiency
  • Predictive tool for de-novo sequencing
  • Understanding of physical/chemical processes
    involved in fragmentation

79
Matrix Science
  • Dr. John Cottrell
  • Dr. David Creasy
  • URL http://www.matrixscience.com

80
Imperial College
  • Darryl Pappin (now director of research ABI)
  • Mike Bartlet-Jones
  • URL http://csc-fserve.hh.med.ic.ac.uk