Direct Experimental Observation of Functional Protein Isoforms by Tandem Mass Spectrometry

1
Direct Experimental Observation of Functional
Protein Isoforms by Tandem Mass Spectrometry
  • Nathan Edwards
  • Center for Bioinformatics and Computational
    Biology
  • University of Maryland, College Park

2
Synopsis
  • MS/MS spectra provide evidence for the amino-acid
    sequence of functional proteins.
  • Key concepts
  • Spectrum acquisition is unbiased
  • Direct observation of amino-acid sequence
  • Sensitive to small sequence variations

3
Synopsis
  • MS/MS spectra provide evidence for the amino-acid
    sequence of functional proteins.
  • Applications
  • Cancer biomarkers
  • Genome annotation

4
Mass Spectrometry for Proteomics
  • Measure mass of many (bio)molecules
    simultaneously
  • High bandwidth
  • Mass is an intrinsic property of all
    (bio)molecules
  • No prior knowledge required

5
Mass Spectrometer
  • Electron Multiplier (EM)
  • Time-Of-Flight (TOF)
  • Quadrupole
  • Ion-Trap
  • MALDI
  • Electrospray Ionization (ESI)

6
High Bandwidth
7
Mass is fundamental!
8
Mass Spectrometry for Proteomics
  • Measure mass of many molecules simultaneously
  • ...but not too many, abundance bias
  • Mass is an intrinsic property of all
    (bio)molecules
  • ...but need a reference to compare to

9
Mass Spectrometry for Proteomics
  • Mass spectrometry has been around since the turn
    of the century...
  • ...why is MS-based proteomics so new?
  • Ionization methods
  • MALDI, Electrospray
  • Protein chemistry automation
  • Chromatography, Gels, Computers
  • Protein / genome sequences
  • A reference for comparison

10
Sample Preparation for Peptide Identification
11
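Single Stage MS
[Figure: MS spectrum (m/z axis)]
12
Tandem Mass Spectrometry (MS/MS)
[Figure: precursor selection from an MS spectrum (m/z axis)]
13
Tandem Mass Spectrometry (MS/MS)
[Figure: precursor selection, then collision-induced dissociation (CID), giving an MS/MS spectrum (m/z axis)]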
14
Peptide Identification
  • For each (likely) peptide sequence
  • 1. Compute fragment masses
  • 2. Compare with spectrum
  • 3. Retain those that match well
  • Peptide sequences from (any) sequence database
  • Swiss-Prot, IPI, NCBI's nr, ESTs, genomes, ...
  • Automated, high-throughput peptide identification
    in complex mixtures
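The three identification steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not how Mascot, X!Tandem, or OMSSA actually score matches: it predicts only singly charged b- and y-ions, uses a fixed 0.5 Da fragment tolerance, and scores a candidate by how many predicted fragments land on an observed peak. The function names and the `min_matches` threshold are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # added to each fragment for charge 1+
WATER = 18.01056  # y-ions carry the peptide's C-terminal H2O


def fragment_masses(peptide):
    """Step 1: m/z of singly charged b- and y-ions (assumed ion types)."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    cuts = range(1, len(residues))
    b_ions = [sum(residues[:i]) + PROTON for i in cuts]
    y_ions = [sum(residues[i:]) + WATER + PROTON for i in cuts]
    return b_ions + y_ions


def match_score(peptide, peaks, tol=0.5):
    """Step 2: count predicted fragments with an observed peak within tol Da."""
    return sum(
        any(abs(mz - peak) <= tol for peak in peaks)
        for mz in fragment_masses(peptide)
    )


def identify(candidates, peaks, min_matches=3):
    """Step 3: keep candidates explaining at least min_matches peaks, best first."""
    scored = [(match_score(p, peaks), p) for p in candidates]
    return sorted(((s, p) for s, p in scored if s >= min_matches), reverse=True)
```

For example, with `peaks = fragment_masses("DLATVYVDVLK")` as an idealized spectrum, `identify(["DLATVYVDVLK", "KLVDVYVTALD"], peaks)` ranks the true sequence first, with all 20 predicted fragments matched.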

15
Peptide Identification
  • ...can provide direct experimental evidence for
    the amino-acid sequence of functional proteins.
  • Evidence for
  • Functional protein isoforms
  • Translation start and frame
  • Proteins with short open-reading-frames

16
Why is this useful for genome annotation?
  • Evidence for SNPs and alternative splicing stops
    at transcription
  • No genomic or transcript evidence for translation
    start-site.
  • Conservation doesn't stop at coding bases!
  • Statistical gene-finders struggle with
    micro-exons, translation start-site, and short
    ORFs.

17
Why is this useful for cancer biomarkers?
  • Alternative splicing is the norm!
  • Only 20-25K human genes
  • Each gene makes many proteins
  • Some splicing is believed to be silencing
  • Lots of splicing in cancer
  • Proteins have clinical implications
  • Statistical biomarker discovery
  • Putative malfunctioning proteins

18
What can be observed?
  • Known coding SNPs
  • Novel coding mutations
  • Alternative splicing isoforms
  • Micro-exons (non-canonical splice sites)
  • Alternative translation start-sites (codons)
  • Alternative translation frames
  • Dark open-reading-frames

19
Splice Isoform
  • Human Jurkat leukemia cell-line
  • Lipid-raft extraction protocol, targeting T cells
  • von Haller, et al. MCP 2003.
  • LIME1 gene
  • LCK interacting transmembrane adaptor 1
  • LCK gene
  • Leukocyte-specific protein tyrosine kinase
  • Proto-oncogene
  • Chromosomal aberration involving LCK in
    leukemias.
  • Multiple significant peptide identifications

20
Splice Isoform
21
Novel Splice Isoform
22
Novel Mutation
  • HUPO Plasma Proteome Project
  • Pooled samples from 10 male and 10 female healthy
    Chinese subjects
  • Plasma/EDTA sample protocol
  • Li, et al. Proteomics 2005. (Lab 29)
  • TTR gene
  • Transthyretin (pre-albumin)
  • Defects in TTR are a cause of amyloidosis.
  • Familial amyloidotic polyneuropathy
  • late-onset, dominant inheritance

23
Novel Mutation
Ala2→Pro associated with familial amyloid
polyneuropathy
24
Novel Mutation
25
Translation Start-Site
  • Human erythroleukemia K562 cell-line
  • Depth of coverage study
  • Resing et al. Anal. Chem. 2004.
  • THOC2 gene
  • Part of the heteromultimeric THO/TREX complex.
  • Initially believed to be a novel ORF
  • RefSeq mRNA in Jun 2007, no RefSeq protein
  • TrEMBL entry Feb 2005, no SwissProt entry
  • Genbank mRNA in May 2002 (complete CDS)
  • Plenty of EST support
  • 100,000 bases upstream of other isoforms

26
Translation Start-Site
27
Translation Start-Site
28
Translation Start-Site
29
Translation Start-Site
30
Easily distinguish minor sequence variations
  • Two B. anthracis Sterne α/β SASP annotations
  • RefSeq/Gb MVMARN... (7441 Da)
  • CMR MARN... (7211 Da)
  • Intact proteins differ by 230 Da
  • 7441 Da vs 7211 Da
  • N-terminal tryptic peptides
  • MVMAR (606.3 Da), MVMARNR (876.4 Da), vs
  • MARNR (646.3 Da)
  • Very different MS/MS spectra
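The peptide masses quoted above can be checked by summing monoisotopic residue masses and adding one water for the free termini. This sketch covers only the five residues needed here and assumes unmodified peptides (for instance, no oxidized Met); `peptide_mass` is a hypothetical helper name.

```python
# Monoisotopic residue masses (Da) for the residues in this example only.
RESIDUE_MASS = {
    "A": 71.03711, "M": 131.04049, "N": 114.04293,
    "R": 156.10111, "V": 99.06841,
}
WATER = 18.01056  # H2O added once for the free N- and C-termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in ("MVMAR", "MVMARNR", "MARNR"):
    print(pep, round(peptide_mass(pep), 1))

# The two annotated start sites differ by the leading residues "MV".
print("MV offset:", round(RESIDUE_MASS["M"] + RESIDUE_MASS["V"], 1))
```

This prints 606.3, 876.4, and 646.3 Da for the three peptides, and an "MV" offset of 230.1 Da, which matches the 230 Da gap between the two intact-protein annotations.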

31
Bacterial Gene-Finding
  • Find all the open-reading-frames...

TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA
(Figure annotations: two stop codons)
...courtesy of Art Delcher
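Finding candidate ORFs on one strand amounts to locating stop codons in each of the three reading frames. The sketch below assumes the standard bacterial stop codons (TAA, TAG, TGA) and treats an ORF as the codons strictly between two consecutive in-frame stops, ignoring start codons; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(dna):
    """Start positions of in-frame stop codons for frames 0, 1 and 2."""
    frames = {}
    for frame in range(3):
        frames[frame] = [
            i for i in range(frame, len(dna) - 2, 3)
            if dna[i:i + 3] in STOP_CODONS
        ]
    return frames


def open_stretches(dna):
    """Stop-to-stop ORFs as (frame, start, end), end exclusive, stops excluded."""
    orfs = []
    for frame, stops in stops_by_frame(dna).items():
        for left, right in zip(stops, stops[1:]):
            if right - left > 3:  # at least one codon between the stops
                orfs.append((frame, left + 3, right))
    return orfs


seq = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"
print(stops_by_frame(seq))
print(open_stretches(seq))
```

On the slide's sequence, frame 0 has stops at positions 0 and 39, so `open_stretches` reports the 12-codon stretch [3, 39); frame 2 contributes a shorter stretch [23, 29).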
32
Bacterial Gene-Finding
  • Find all the open reading frames... but
    they overlap: which ones are correct?

Reverse strand: ATCTTTTTACCGAGAAATCTATTTAAAGTACTTTTTATAACT
Forward strand: TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA
(Figure annotations: three stop codons; a shifted stop)
...courtesy of Art Delcher
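To see which ORFs collide, one can scan both strands and compare intervals. This sketch (hypothetical function names) uses the same stop-to-stop definition with a minimum length, maps reverse-strand ORFs back to forward-strand coordinates, and reports overlapping pairs. It does not decide which ORF is correct; that is the job of a gene finder's coding score.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def forward_orfs(dna, min_codons):
    """Stop-to-stop ORFs on one strand as (frame, start, end), end exclusive."""
    orfs = []
    for frame in range(3):
        stops = [i for i in range(frame, len(dna) - 2, 3)
                 if dna[i:i + 3] in STOP_CODONS]
        for left, right in zip(stops, stops[1:]):
            if (right - left) // 3 - 1 >= min_codons:
                orfs.append((frame, left + 3, right))
    return orfs


def six_frame_orfs(dna, min_codons=2):
    """ORFs on both strands, all reported in forward-strand coordinates."""
    n = len(dna)
    orfs = [("+", s, e) for _, s, e in forward_orfs(dna, min_codons)]
    for _, s, e in forward_orfs(reverse_complement(dna), min_codons):
        orfs.append(("-", n - e, n - s))  # map the reverse-strand interval back
    return sorted(orfs, key=lambda orf: orf[1])


def overlapping_pairs(orfs):
    """Pairs of ORFs whose half-open intervals share at least one base."""
    pairs = []
    for i, a in enumerate(orfs):
        for b in orfs[i + 1:]:
            if a[1] < b[2] and b[1] < a[2]:
                pairs.append((a, b))
    return pairs
```

For the slide's forward sequence, `six_frame_orfs` finds the frame-0 ORF [3, 39) and the frame-2 ORF [23, 29), and `overlapping_pairs` flags them as a conflicting pair. In this short window no reverse-strand stretch is bounded by two stops, so the reverse strand contributes nothing here.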
33
Coding-Sequence Score
...courtesy of Art Delcher
34
Glimmer3 Performance
  • Glimmer3 trained and compared against RefSeq genes
    with annotated function
  • Correct STOP: 99.6%
  • Correct START: 84.3%
  • Not all the genomes necessarily have
    carefully/accurately annotated start sites, so
    the results for number of correct starts may be
    suspect.
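One way to compute "correct STOP" and "correct START" figures is to key each gene by its stop codon, since in-frame ORFs that share a stop differ only in their start. A minimal sketch, assuming genes are hypothetical (strand, start, stop) tuples and that START accuracy is counted among genes whose stop was found; the slide does not say which denominator was used.

```python
def start_stop_accuracy(predicted, reference):
    """Return (fraction of reference stops predicted,
    fraction of those whose start also matches).

    Genes are (strand, start, stop) tuples; (strand, stop) identifies the ORF.
    """
    predicted_by_stop = {(strand, stop): start for strand, start, stop in predicted}
    stop_hits = start_hits = 0
    for strand, start, stop in reference:
        key = (strand, stop)
        if key in predicted_by_stop:
            stop_hits += 1
            if predicted_by_stop[key] == start:
                start_hits += 1
    return stop_hits / len(reference), start_hits / max(stop_hits, 1)


reference = [("+", 100, 400), ("+", 500, 900), ("-", 1000, 1300), ("+", 2000, 2300)]
predicted = [("+", 100, 400), ("+", 530, 900), ("-", 1000, 1300)]
print(start_stop_accuracy(predicted, reference))
```

Here three of four reference stops are found, and two of those three also have the reference start. As the slide cautions, the START figure is only as good as the reference start annotations.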

35
N-terminal peptides
  • (Protein) N-terminal peptides establish the
    start sites of known and unexpected ORFs
  • Use
  • Directly to annotate genomes
  • Evaluate and improve algorithms
  • Map cross-species

36
N-terminal peptide workflows
  • Typical proteomics workflows sample peptides from
    the proteome randomly
  • Caulobacter crescentus (70)
  • 3733 Proteins (RefSeq Genome annot.)
  • 66K tryptic peptides (600 Da to 3000 Da)
  • 2085 N-terminal tryptic peptides (3%)
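The gap between all tryptic peptides and the N-terminal ones can be reproduced in spirit with an in-silico digest. A minimal sketch, assuming the simple trypsin rule (cleave after K or R, not before P), no missed cleavages, unmodified peptides, and no initiator-Met removal. The protein list would come from the genome annotation and is not included; function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Cut after K/R unless followed by P (zero-width split needs Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def digest_in_window(proteins, low=600.0, high=3000.0):
    """Count peptides in [low, high] Da and how many are protein N-termini."""
    total = n_terminal = 0
    for protein in proteins:
        for i, pep in enumerate(tryptic_peptides(protein)):
            if low <= peptide_mass(pep) <= high:
                total += 1
                n_terminal += i == 0  # first peptide of the protein
    return total, n_terminal
```

Applied to a whole proteome, most proteins contribute many internal peptides but at most one N-terminal peptide, which is why random sampling rarely hits the N-termini.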

37
N-terminal peptide workflow
  • Protect protein N-terminus
  • Digest to peptides
  • Chemically modify free peptide N-term
  • Use chem. mod. to capture unwanted peptides

Nat Biotech, Vol. 21, pp. 566-569, 2003.
38
Increasing N-terminal peptide coverage
  • Multiple (digest) enzymes
  • trypsin-R 60 (80)
  • acid lys-C trypsin 85 (94)
  • Repeated LC-MS/MS
  • Precursor Exclusion / Inclusion lists
  • MALDI / ESI
  • Protein separation and/or orthogonal
    fractionation

Anal Chem, Vol. 76, pp. 4193-4201, 2004.
39
Proteomics Informatics
  • Search spectra against
  • Entire bacterial genome
  • All Met initiated peptides or
  • Statistically likely Met initiated peptides.
  • Easily consider initial Met loss PTM, too
  • Off-the-shelf MS/MS search engines (Mascot /
    X!Tandem / OMSSA)
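Enumerating "all Met-initiated peptides" can be sketched as follows: every Met in a translated stop-to-stop ORF is treated as a candidate start, its N-terminal tryptic peptide is emitted, and, to cover initiator Met loss, the same peptide without the Met. Assumptions: simple trypsin rule, no missed cleavages, and Met-loss variants emitted unconditionally (in cells, removal depends largely on the second residue). Function names are hypothetical.

```python
import re


def first_tryptic_peptide(protein):
    """N-terminal tryptic peptide: cut after K/R unless followed by P (Python 3.7+)."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    return pieces[0] if pieces else ""


def candidate_n_termini(orf_translation, include_met_loss=True):
    """N-terminal peptides for every Met in an ORF translation, viewed as a start."""
    candidates = []
    for i, aa in enumerate(orf_translation):
        if aa != "M":
            continue
        peptide = first_tryptic_peptide(orf_translation[i:])
        candidates.append(peptide)
        if include_met_loss and len(peptide) > 1:
            candidates.append(peptide[1:])  # initiator Met removed
    return candidates


print(candidate_n_termini("MVMARN"))
```

For the "MVMARN" prefix from the B. anthracis example this yields MVMAR, VMAR, MAR, and AR; the resulting peptides could then be searched with an off-the-shelf engine.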

40
Other Practical Issues
  • Suitable for commonly available instrumentation
  • Only the sample prep. is (somewhat) novel.
  • Need living organism
  • Stage of life-cycle?
  • Bang for buck?
  • N-terminal peptides /
  • In discussions with JCVI (ex TIGR)
  • Possible pilot project?

41
Other Research Projects
  • Improving peptide identification by MS/MS
  • Spectral matching using HMMs
  • Combining search engine results
  • Spectral matching for detection and quantitation
  • Microorganism identification using MS
  • Live public web-site and database
  • (Inexact) uniqueness guarantees
  • Primer/Probe oligo design
  • Pathogen detection (DNA and peptide)
  • Significant false-positive peptide identifications

42
Spectral Matching
  • Detection vs. identification
  • Increased sensitivity
  • No novel peptides
  • NIST GC/MS Spectral Library
  • Identifies small molecules,
  • 100,000s of (consensus) spectra
  • Bundled/Sold with many instruments
  • Dot-product spectral comparison
  • Current project: peptide MS/MS
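A dot-product spectral comparison reduces each spectrum to a vector of binned intensities and takes the normalized dot product (cosine). The sketch below assumes 1 Da bins and raw intensities and omits any intensity or m/z weighting; the function names and bin width are illustrative choices, not the NIST library's scoring.

```python
import math


def binned(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def normalized_dot(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity: 1.0 for identical shape, 0.0 for no shared bins."""
    a, b = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Because the score is normalized, a query spectrum that is a uniformly scaled copy of a library spectrum scores 1.0, so overall ion abundance does not affect the match.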

43
Peptide DLATVYVDVLK
44
Peptide DLATVYVDVLK
45
Hidden Markov Models for Spectral Matching
  • Capture statistical variation and consensus in
    peak intensity
  • Capture semantics of peaks
  • Extrapolate model to other peptides
  • Good specificity with superior sensitivity for
    peptide detection
  • Assign 1000s of additional spectra (with p-value < 10^-5)

46
www.RMIDb.org
47
www.RMIDb.org
  • Statistics
  • 16.7 × 10^6 (6.4 × 10^6) protein sequences
  • 40,000 organisms, 19,700 species
  • 557 (415) complete genomes
  • Sources
  • TIGR's CMR, SwissProt, TrEMBL, Genbank Proteins,
    RefSeq Proteins and Genomes
  • Inclusive Glimmer3 predictions on Genomes
  • Pfam and GO assignments using BOINC grid

48
www.RMIDb.org
Accessed from all over the world...
49
Uniqueness guarantees
  • 20-mer oligo signatures for B. anthracis
  • In all available strains as exact match
  • No (inexact) match to other Bacillus species
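The exact-match half of the signature criteria can be sketched with k-mer set operations: keep 20-mers found on either strand of every target strain, then discard any that occur exactly in other genomes. Inexact matching against other species needs a separate edit-distance check. Function names are hypothetical and the approach is an in-memory illustration, not a genome-scale implementation.

```python
def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def both_strands(genome, k):
    return kmers(genome, k) | kmers(reverse_complement(genome), k)


def exact_signatures(target_strains, other_genomes, k=20):
    """k-mers present exactly in every target strain and absent from all others."""
    if not target_strains:
        raise ValueError("need at least one target strain")
    shared = both_strands(target_strains[0], k)
    for genome in target_strains[1:]:
        shared &= both_strands(genome, k)
    for genome in other_genomes:
        shared -= both_strands(genome, k)
    return shared
```

Because both strands are included, each surviving signature also appears as its reverse complement in the result.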

50
Uniqueness guarantees
  • Human genome primer design problem
  • 4-unique DNA 20-mers
  • Edit distance ≥ 5 to any non-specific
    hybridization site
  • No such valid loci on Chr. 22!
  • Currently analyzing entire genome
  • 3-unique DNA 20-mers
  • Initial experiments suggest 0.01% valid
  • Approx. 1 valid oligo every 10,000 bases
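Reading "k-unique" as "no background site within edit distance k" (so a 4-unique 20-mer is at edit distance ≥ 5 from every site), the check can be sketched with the classic approximate-substring dynamic program (Sellers' algorithm), where a match may start anywhere in the background. This is an assumption about the definition, scans one strand only, and costs O(oligo length × background length), so genome-scale runs would need an index or filtering step; function names are hypothetical.

```python
def min_edit_distance_to_substring(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    best = prev[m]
    for ch in text:
        curr = [0]  # row 0 is free: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr.append(min(prev[i - 1] + cost,  # match or substitution
                            prev[i] + 1,         # extra base in text
                            curr[i - 1] + 1))    # base missing from text
        prev = curr
        best = min(best, prev[m])
    return best


def is_k_unique(oligo, background, k):
    """True if every background site is more than k edits from the oligo."""
    return min_edit_distance_to_substring(oligo, background) > k
```

Under this reading, a 20-mer passes the slide's criterion when `is_k_unique(oligo, background, 4)` is true for both strands of the background genome.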

51
Future Research Plans
  • Cancer biomarkers
  • Optimize proteomics workflow for protein sequence
    coverage
  • Improve informatics infrastructure to make
    interpretation easier
  • Identify splice variants in cancer cell-lines
    (MCF-7) and clinical brain tumor samples

52
Future Research Plans
  • Genome Annotation
  • Collect evidence for functional alternative
    splicing in public datasets into dbPEP.
  • Conduct pilot project for bacterial genome
    annotation with JCVI.
  • Improve informatics infrastructure to make
    interpretation easier.

53
Future Research Plans
  • Peptide Identification
  • Expand library of HMM models for high-confidence
    spectral matching
  • Spectral matching for biomarkers and quantitation
    (with Calibrant).
  • Specificity metric for peptides identified using
    MS/MS

54
Future Research Plans
  • Microorganism identification by mass
    spectrometry
  • Specificity of tandem mass spectra
  • Revamp RMIDb prototype
  • Incorporate spectral matching, top-down.

55
Future Research Plans
  • Oligonucleotide Design
  • Uniqueness oracle for inexact match in human
  • Integration with Primer3
  • Tiling, multiplexing, pooling, tag arrays

56
Acknowledgements
  • Catherine Fenselau, Steve Swatkoski
  • UMCP Biochemistry
  • Chau-Wen Tseng, Xue Wu
  • UMCP Computer Science
  • Cheng Lee, Brian Balgley
  • Calibrant Biosystems
  • PeptideAtlas, HUPO PPP, X!Tandem
  • Funding NIH/NCI, USDA/ARS