Direct Experimental Observation of Functional Protein Isoforms by Tandem Mass Spectrometry


1
Direct Experimental Observation of Functional
Protein Isoforms by Tandem Mass Spectrometry

Nathan Edwards
Center for Bioinformatics and Computational
Biology
University of Maryland, College Park

2
Synopsis

MS/MS spectra provide evidence for the amino-acid
sequence of functional proteins.
Key concepts
Spectrum acquisition is unbiased
Direct observation of amino-acid sequence
Sensitive to small sequence variations

3
Synopsis

MS/MS spectra provide evidence for the amino-acid
sequence of functional proteins.
Applications
Cancer biomarkers
Genome annotation

4
Mass Spectrometry for Proteomics

Measure mass of many (bio)molecules
simultaneously
High bandwidth
Mass is an intrinsic property of all
(bio)molecules
No prior knowledge required

5
Mass Spectrometer

Electron Multiplier (EM)

Time-Of-Flight (TOF)
Quadrupole
Ion-Trap

MALDI
Electrospray Ionization (ESI)

6
High Bandwidth
7
Mass is fundamental!
8
Mass Spectrometry for Proteomics

Measure mass of many molecules simultaneously
...but not too many, abundance bias
Mass is an intrinsic property of all
(bio)molecules
...but need a reference to compare to

9
Mass Spectrometry for Proteomics

Mass spectrometry has been around since the turn
of the century...
...why is MS based Proteomics so new?
Ionization methods
MALDI, Electrospray
Protein chemistry automation
Chromatography, Gels, Computers
Protein / genome sequences
A reference for comparison

10
Sample Preparation for Peptide Identification
11
Single Stage MS
[Figure: MS spectrum; axis: m/z]
12
Tandem Mass Spectrometry (MS/MS)
[Figure: MS spectrum with precursor selection; axis: m/z]
13
Tandem Mass Spectrometry (MS/MS)
[Figure: precursor selection followed by collision-induced dissociation (CID), giving the MS/MS spectrum; axes: m/z]
14
Peptide Identification

For each (likely) peptide sequence
1. Compute fragment masses
2. Compare with spectrum
3. Retain those that match well
Peptide sequences from (any) sequence database
Swiss-Prot, IPI, NCBI's nr, ESTs, genomes, ...
Automated, high-throughput peptide identification
in complex mixtures
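
The three steps above can be sketched as follows. This is a minimal illustration, not the scoring used by any particular search engine: it assumes singly charged b- and y-ions only, a fixed 0.5 Da tolerance, and a simple matched-peak count as the score. The function names (fragment_mz, match_count, identify) are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_mz(peptide):
    """Step 1: singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal suffix
    return ions


def match_count(peptide, spectrum_mz, tol=0.5):
    """Step 2: number of theoretical fragments with an observed peak within tol."""
    return sum(
        any(abs(frag - peak) <= tol for peak in spectrum_mz)
        for frag in fragment_mz(peptide)
    )


def identify(spectrum_mz, candidates, min_matches=1, tol=0.5):
    """Step 3: keep candidates that match well, best first."""
    scored = [(match_count(p, spectrum_mz, tol), p) for p in candidates]
    return sorted((s for s in scored if s[0] >= min_matches), reverse=True)


if __name__ == "__main__":
    # Synthetic, noise-free spectrum built from one peptide's own fragments.
    spectrum = fragment_mz("MVMAR")
    print(identify(spectrum, ["MARNR", "MVMARNR", "MVMAR"]))
```

Real search engines add intensity modelling, multiple charge states, and significance estimates; the sketch only shows the compute, compare, retain loop.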

15
Peptide Identification

...can provide direct experimental evidence for
the amino-acid sequence of functional proteins.
Evidence for
Functional protein isoforms
Translation start and frame
Proteins with short open-reading-frames

16
Why is this useful for... genome annotation?

Evidence for SNPs and alternative splicing stops
with transcription
No genomic or transcript evidence for translation
start-site.
Conservation doesn't stop at coding bases!
Statistical gene-finders struggle with
micro-exons, translation start-site, and short
ORFs.

17
Why is this useful for... cancer biomarkers?

Alternative splicing is the norm!
Only 20-25K human genes
Each gene makes many proteins
Some splicing is believed to be silencing
Lots of splicing in cancer
Proteins have clinical implications
Statistical biomarker discovery
Putative malfunctioning proteins

18
What can be observed?

Known coding SNPs
Novel coding mutations
Alternative splicing isoforms
Microexons (non-canonical splice-sites)
Alternative translation start-sites (codons)
Alternative translation frames
Dark open-reading-frames

19
Splice Isoform

Human Jurkat leukemia cell-line
Lipid-raft extraction protocol, targeting T cells
von Haller, et al. MCP 2003.
LIME1 gene
LCK interacting transmembrane adaptor 1
LCK gene
Leukocyte-specific protein tyrosine kinase
Proto-oncogene
Chromosomal aberration involving LCK in
leukemias.
Multiple significant peptide identifications

20
Splice Isoform
21
Novel Splice Isoform
22
Novel Mutation

HUPO Plasma Proteome Project
Pooled samples from 10 male and 10 female healthy
Chinese subjects
Plasma/EDTA sample protocol
Li, et al. Proteomics 2005. (Lab 29)
TTR gene
Transthyretin (pre-albumin)
Defects in TTR are a cause of amyloidosis.
Familial amyloidotic polyneuropathy
late-onset, dominant inheritance

23
Novel Mutation
Ala2→Pro associated with familial amyloid
polyneuropathy
24
Novel Mutation
25
Translation Start-Site

Human erythroleukemia K562 cell-line
Depth of coverage study
Resing et al. Anal. Chem. 2004.
THOC2 gene
Part of the heteromultimeric THO/TREX complex.
Initially believed to be a novel ORF
RefSeq mRNA in Jun 2007, no RefSeq protein
TrEMBL entry Feb 2005, no SwissProt entry
Genbank mRNA in May 2002 (complete CDS)
Plenty of EST support
100,000 bases upstream of other isoforms

26
Translation Start-Site
27
Translation Start-Site
28
Translation Start-Site
29
Translation Start-Site
30
Easily distinguish minor sequence variations

Two B. anthracis Sterne α/β SASP annotations
RefSeq/Gb: MVMARN... (7441 Da)
CMR: MARN... (7211 Da)
Intact proteins differ by 230 Da
7441 Da vs 7211 Da
N-terminal tryptic peptides
MVMAR (606.3 Da), MVMARNR (876.4 Da), vs
MARNR (646.3 Da)
Very different MS/MS spectra
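
The masses quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes monoisotopic residue masses and neutral, unmodified peptides; only the five residues needed here are tabulated, and peptide_mass is a hypothetical helper name.

```python
# Monoisotopic residue masses (Da), limited to the residues used on this slide.
RESIDUE_MASS = {
    "M": 131.04049, "V": 99.06841, "A": 71.03711,
    "R": 156.10111, "N": 114.04293,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    for pep in ("MVMAR", "MVMARNR", "MARNR"):
        print(pep, round(peptide_mass(pep), 1))
    # The extra N-terminal "MV" accounts for the shift between the two annotations.
    print("MV shift", round(RESIDUE_MASS["M"] + RESIDUE_MASS["V"]))
```

The two extra residues (Met, Val) sum to about 230 Da, which matches the difference between the 7441 Da and 7211 Da intact masses.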

31
Bacterial Gene-Finding

Find all the open-reading-frames...

TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA
Stop codon
Stop codon
...courtesy of Art Delcher
32
Bacterial Gene-Finding

Find all the open-reading-frames... but
they overlap. Which ones are correct?

Reverse strand
Stop codon
ATCTTTTTACCGAGAAATCTATTTAAAGTACTTTTTATAACT
TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA
Stop codon
Stop codon
Shifted stop
...courtesy of Art Delcher
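
As a rough illustration of the first step of gene-finding, the sketch below locates stop codons and the stop-free stretches between them in all three frames of the slide's example sequence, on either strand. It is not Glimmer's algorithm: it assumes the standard stop codons (TAA, TAG, TGA), ignores start codons and coding-sequence scoring, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Example sequence from the slide (forward strand).
SEQ = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"


def reverse_complement(dna):
    """Reverse strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def stop_positions(dna):
    """Map each frame (0, 1, 2) to the start indices of its stop codons."""
    return {
        frame: [i for i in range(frame, len(dna) - 2, 3) if dna[i:i + 3] in STOPS]
        for frame in range(3)
    }


def open_reading_frames(dna, min_codons=1):
    """Stop-free stretches as (frame, start, end), end exclusive."""
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i))
                start = i + 3
        # Trailing stretch, truncated to the last complete codon.
        end = start + ((len(dna) - start) // 3) * 3
        if (end - start) // 3 >= min_codons:
            orfs.append((frame, start, end))
    return orfs


if __name__ == "__main__":
    print(stop_positions(SEQ))
    print(stop_positions(reverse_complement(SEQ)))
    print(open_reading_frames(SEQ, min_codons=10))
```

Running the same functions on reverse_complement(SEQ) covers the other strand, which is where overlapping candidates such as the one on this slide come from.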
33
Coding-Sequence Score
...courtesy of Art Delcher
34
Glimmer3 Performance

Glimmer3 trained and compared to RefSeq genes with
annotated function
Correct STOP: 99.6%
Correct START: 84.3%
Not all the genomes necessarily have
carefully/accurately annotated start sites, so
the results for number of correct starts may be
suspect.

35
N-terminal peptides

(Protein) N-terminal peptides establish
start-site of known and unexpected ORFs
Use
Directly to annotate genomes
Evaluate and improve algorithms
Map cross-species

36
N-terminal peptide workflows

Typical proteomics workflows sample peptides from
the proteome randomly
Caulobacter crescentus (70)
3733 Proteins (RefSeq Genome annot.)
66K tryptic peptides (600 Da to 3000 Da)
2085 N-terminal tryptic peptides (3%)

37
N-terminal peptide workflow

Protect protein N-terminus
Digest to peptides
Chemically modify free peptide N-term
Use chem. mod. to capture unwanted peptides

Nat Biotech, Vol. 21, pp. 566-569, 2003.
38
Increasing N-terminal peptide coverage

Multiple (digest) enzymes
trypsin-R 60 (80)
acid lys-C trypsin 85 (94)
Repeated LC-MS/MS
Precursor Exclusion / Inclusion lists
MALDI / ESI
Protein separation and/or orthogonal
fractionation

Anal Chem, Vol. 76, pp. 4193-4201, 2004.
39
Proteomics Informatics

Search spectra against
Entire bacterial genome
All Met initiated peptides or
Statistically likely Met initiated peptides.
Easily consider initial Met loss PTM, too
Off-the-shelf MS/MS search engines (Mascot /
X!Tandem / OMSSA)
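
One way to build the "all Met initiated peptides" search space is sketched below: every ATG on a strand is treated as a possible start, translated to the next stop, and cut at the first tryptic site, with an initial-Met-loss variant alongside. This is an assumption-laden illustration rather than the presenter's pipeline: it uses the standard genetic code, ATG starts only, uppercase ACGT input, a simplified trypsin rule (cleave after K or R unless followed by P), and hypothetical function names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from(dna, start):
    """Translate in-frame from start until a stop codon or the sequence end."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def nterm_tryptic_peptide(protein):
    """Prefix up to and including the first K or R not followed by P."""
    for i, aa in enumerate(protein):
        if aa in "KR" and protein[i + 1:i + 2] != "P":
            return protein[:i + 1]
    return protein


def met_initiated_candidates(dna):
    """List of (ATG position, N-terminal peptide, same peptide without initial Met)."""
    out = []
    pos = dna.find("ATG")
    while pos != -1:
        pep = nterm_tryptic_peptide(translate_from(dna, pos))
        out.append((pos, pep, pep[1:]))
        pos = dna.find("ATG", pos + 1)
    return out


if __name__ == "__main__":
    seq = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"
    print(met_initiated_candidates(seq))
```

The resulting peptide list could be written out as a sequence database for an off-the-shelf search engine; applying the same function to the reverse complement covers the other strand.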

40
Other Practical Issues

Suitable for commonly available instrumentation
Only the sample prep. is (somewhat) novel.
Need living organism
Stage of life-cycle?
Bang for buck?
N-terminal peptides /
In discussions with JCVI (ex TIGR)
Possible pilot project?

41
Other Research Projects

Improving peptide identification by MS/MS
Spectral matching using HMMs
Combining search engine results
Spectral matching for detection and quantitation
Microorganism identification using MS
Live public web-site and database
(Inexact) uniqueness guarantees
Primer/Probe oligo design
Pathogen detection (DNA and peptide)
Significant false-positive peptide identifications

42
Spectral Matching

Detection vs. identification
Increased sensitivity
No novel peptides
NIST GC/MS Spectral Library
Identifies small molecules,
100,000s of (consensus) spectra
Bundled/Sold with many instruments
Dot-product spectral comparison
Current project: peptide MS/MS
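
A dot-product spectral comparison can be sketched as a cosine similarity between binned spectra. The version below is a generic illustration, not the NIST library's scoring function: the 1 Da bin width, the absence of intensity or m/z weighting, and the function names are all assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def dot_product_score(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two spectra, in [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    query = [(175.1, 10.0), (246.2, 5.0)]
    library = [(175.1, 100.0), (246.2, 50.0)]
    print(dot_product_score(query, library))
```

Because the score is normalized, a query and a library spectrum with the same peak pattern score close to 1 regardless of absolute intensity, which is what makes library matching a detection method rather than an identification method for novel peptides.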

43
Peptide DLATVYVDVLK
44
Peptide DLATVYVDVLK
45
Hidden Markov Models for Spectral Matching

Capture statistical variation and consensus in
peak intensity
Capture semantics of peaks
Extrapolate model to other peptides
Good specificity with superior sensitivity for
peptide detection
Assign 1000s of additional spectra (w/ p-value <
10^-5)

46
www.RMIDb.org
47
www.RMIDb.org

Statistics
16.7 x 10^6 (6.4 x 10^6) protein sequences
40,000 organisms, 19,700 species
557 (415) complete genomes
Sources
TIGR's CMR, SwissProt, TrEMBL, Genbank Proteins,
RefSeq Proteins and Genomes
Inclusive Glimmer3 predictions on Genomes
Pfam and GO assignments using BOINC grid

48
www.RMIDb.org
Accessed from all over the world...
49
Uniqueness guarantees

20-mer oligo signatures for B. anthracis
In all available strains as exact match
No (inexact) match to other Bacillus species

50
Uniqueness guarantees

Human genome primer design problem
4-unique DNA 20-mers
Edit-distance ≥ 5 to any non-specific
hybridization site
No such valid loci on Chr. 22!
Currently analyzing entire genome
3-unique DNA 20-mers
Initial experiments suggest 0.01% valid
Approx. 1 valid oligo every 10,000 bases
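
The uniqueness condition can be illustrated with a small approximate-matching routine. The sketch below assumes that "k-unique" means no off-target site lies within edit distance k of the oligo, and it uses a plain quadratic dynamic program, which would be far too slow for a whole genome; it shows the definition, not the method used for the analysis. Function names are hypothetical.

```python
def min_edit_distance(oligo, text):
    """Smallest edit distance between oligo and any substring of text."""
    # Row 0 is all zeros: an alignment may begin at any position in text.
    prev = [0] * (len(text) + 1)
    for i, a in enumerate(oligo, start=1):
        curr = [i]  # aligning oligo[:i] against an empty substring costs i
        for j, b in enumerate(text, start=1):
            curr.append(min(
                prev[j - 1] + (a != b),  # match or substitution
                prev[j] + 1,             # oligo base with no partner in text
                curr[j - 1] + 1,         # extra base in text
            ))
        prev = curr
    # The alignment may also end anywhere in text.
    return min(prev)


def is_k_unique(oligo, off_target, k):
    """True if no site in off_target is within edit distance k of oligo."""
    return min_edit_distance(oligo, off_target) > k


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    print(min_edit_distance("ACGT", "TTACGTTT"))
    print(min_edit_distance("ACGT", "TTAGGTTT"))
    print(is_k_unique("ACGT", "TTAGGTTT", 0))
```

In practice the oligo's own target locus has to be excluded from off_target, and both strands have to be checked.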

51
Future Research Plans

Cancer biomarkers
Optimize proteomics workflow for protein sequence
coverage
Improve informatics infrastructure to make
interpretation easier
Identify splice variants in cancer cell-lines
(MCF-7) and clinical brain tumor samples

52
Future Research Plans

Genome Annotation
Collect evidence for functional alternative
splicing in public datasets into dbPEP.
Conduct pilot project for bacterial genome
annotation with JCVI.
Improve informatics infrastructure to make
interpretation easier.

53
Future Research Plans