Title: Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms
1Proteomics and Glycoproteomics (Bio-)Informatics
of Protein Isoforms
- Nathan Edwards
- Department of Biochemistry and Molecular &
Cellular Biology
- Georgetown University Medical Center
2Outline
- Tandem mass-spectrometry of peptides
- Detection of alternative splicing protein
isoforms
- Phyloproteomics using top-down mass-spec.
- Characterization of glycoprotein
microheterogeneity by mass-spectrometry
3Mass Spectrometer
- Time-Of-Flight (TOF)
- Quadrupole
- Ion-Trap
- MALDI
- Electrospray Ionization (ESI)
4Mass Spectrum
5Mass is fundamental
6Sample Preparation for MS/MS
7Single Stage MS
MS
8Tandem Mass Spectrometry (MS/MS)
Precursor selection
9Tandem Mass Spectrometry (MS/MS)
Precursor selection + collision-induced
dissociation (CID)
MS/MS
10Why Tandem Mass Spectrometry?
- MS/MS spectra provide evidence for the amino-acid
sequence of functional proteins.
- Key concepts
- Spectrum acquisition is unbiased
- Direct observation of amino-acid sequence
- Sensitive to small sequence variations
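The "direct observation of amino-acid sequence" point can be illustrated with a minimal sketch that computes the singly charged b- and y-ion ladder of a peptide from standard monoisotopic residue masses. The function names `peptide_mass` and `by_ions` are illustrative only and do not come from the slides; modifications and higher charge states are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def by_ions(seq):
    """Return (b, y): lists of singly charged b- and y-ion m/z values."""
    b, y = [], []
    prefix = 0.0
    for aa in seq[:-1]:            # b1 .. b(n-1), N-terminal fragments
        prefix += RESIDUE_MASS[aa]
        b.append(prefix + PROTON)
    suffix = 0.0
    for aa in reversed(seq[1:]):   # y1 .. y(n-1), C-terminal fragments
        suffix += RESIDUE_MASS[aa]
        y.append(suffix + WATER + PROTON)
    return b, y
```

For a seven-residue peptide such as `"PEPTIDE"`, `by_ions` returns six b-ions and six y-ions. A single amino-acid substitution shifts every fragment that contains the changed residue, which is why the spectra are sensitive to small sequence variations.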
11Unannotated Splice Isoform
- Human Jurkat leukemia cell-line
- Lipid-raft extraction protocol, targeting T cells
- von Haller, et al. MCP 2003.
- LIME1 gene
- LCK interacting transmembrane adaptor 1
- LCK gene
- Leukocyte-specific protein tyrosine kinase
- Proto-oncogene
- Chromosomal aberration involving LCK in
leukemias.
- Multiple significant peptide identifications
12Unannotated Splice Isoform
13Unannotated Splice Isoform
14Splice Isoform Anomaly
- Human erythroleukemia K562 cell-line
- Depth of coverage study
- Resing et al. Anal. Chem. 2004.
- Peptide Atlas A8_IP
- SULT1A2 gene
- Sulfotransferase family, cytosolic, 1A
- 2 ESTs, 1 mRNA
- mRNA from lung, small-cell carcinoma sample
- Single (significant) peptide identification
- Five agreeing search engines
- PepArML FDR < 1%
- All source engines have non-significant E-values
15Splice Isoform Anomaly
16Splice Isoform Anomaly
17Translation start-site correction
- Halobacterium sp. NRC-1
- Extreme halophilic Archaeon, insoluble membrane
and soluble cytoplasmic proteins
- Goo, et al. MCP 2003.
- GdhA1 gene
- Glutamate dehydrogenase A1
- Multiple significant peptide identifications
- Observed start is consistent with Glimmer 3.0
prediction(s)
18Halobacterium sp. NRC-1 ORF GdhA1
- K-score E-value vs PepArML @ 10% FDR
- Many peptides inconsistent with annotated
translation start site of NP_279651
19Translation start-site correction
20What if there is no "smoking gun" peptide
21What if there is no "smoking gun" peptide
22What if there is no "smoking gun" peptide
23HER2/Neu Mouse Model of Breast Cancer
- Paulovich, et al. JPR, 2007
- Study of normal and tumor mammary tissue by
LC-MS/MS
- 1.4 million MS/MS spectra
- Peptide-spectrum assignments
- Normal samples (Nn): 161,286 (49.7%)
- Tumor samples (Nt): 163,068 (50.3%)
- 4270 proteins identified in total
- 2-unique generalized protein parsimony
24Nascent polypeptide-associated complex subunit
alpha
7.3 x 10^-8
25Pyruvate kinase isozymes M1/M2
2.5 x 10^-5
26Phyloproteomics
- Fragment intact proteins (top-down MS)
- Match the spectra to protein sequences
- Place the organism phylogenetically
- Works even for unknown microorganisms without any
available sequences
27CID Protein Fragmentation Spectrum from Y. rohdei
28CID Protein Fragmentation Spectrum from Y. rohdei
Match to Y. pestis 50S Ribosomal Protein L32
29Exact match sequence
30Phylogeny Protein vs DNA
Protein Sequence
16S-rRNA Sequence
31What about mixtures?
32Shared Small Ribosomal Proteins
33Shared Small Ribosomal Proteins
34Identified E. herbicola proteins
- DNA-binding protein HU-alpha
- m/z 732.71, z 13, E-value 7.5e-26, Δ -14.128
- Eight proteins identified with "large" Δ
35Identified E. herbicola proteins
- DNA-binding protein HU-alpha
- m/z 732.71, z 13, E-value 1.91e-58
- Use "Sequence Gazer" to find mass shift
- ΔM mode can "tolerate" one shift for free!
36Identified E. herbicola proteins
- DNA-binding protein HU-alpha
- m/z 732.71, z 13, E-value 7.5e-26, Δ -14.128
- Extract N- and C-terminus sequence supported by
at least 3 b- or y-ions
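The precursor values quoted for HU-alpha (m/z 732.71 at z 13) can be converted to a neutral protein mass with the usual charge-state arithmetic, sketched below. The helper names are illustrative. The comparison against a database sequence mass, and the sign convention of the reported mass shift, are assumptions that the slides do not spell out.

```python
PROTON = 1.007276  # mass of a proton (Da)


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z protons attached."""
    return z * (mz - PROTON)


def mz_from_mass(mass, z):
    """Inverse of neutral_mass: expected m/z for a neutral mass at charge z."""
    return mass / z + PROTON


def mass_shift(observed_mass, sequence_mass):
    """Observed minus theoretical mass (sign convention assumed here)."""
    return observed_mass - sequence_mass


# Values taken from the slide: m/z 732.71, z 13.
observed = neutral_mass(732.71, 13)   # roughly 9512.14 Da
```

A shift of about 14 Da between the observed mass and the matched sequence is what a tool that tolerates one mass shift would have to absorb to keep the match.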
37E. herbicola protein sequences
38E. herbicola sequences found in other species
39Phylogenetic placement of E. herbicola
Cladogram
Phylogram
phylogeny.fr "One-Click"
40Glycoprotein Microheterogeneity
- Glycosylation is important, but our analytic
tools are rather rudimentary
- Detach glycans (PNGase-F) and analyze glycans
- Detach glycans (PNGase-F) and analyze peptides
- Get glycan structures, but no association with
protein or protein site, or
- Get glycosylation sites, but no association with
glycan structures.
- We analyze glycopeptides directly
- Challenges all facets of glycoproteomics
41Altered N-Glycosylation in Cancer
Glycosyltransferase Expression or Glycan Analyses
Monosaccharide legend: GalNAc, Sialic Acid, Gal, GlcNAc, Man
K. Chandler
42The informatics challenge
- Identify glycopeptides in large-scale tandem
mass-spectrometry datasets
- Many glycopeptide enriched fractions
- Many tandem mass-spectra / fraction
- Good, but not great, instrumentation
- QStar Elite CID, good MS1/MS2 resolution
- Strive for hypothesis-generating analysis
- Site-specific glycopeptide characterization
- Glycoform occupancy in differentiated samples
43CID Glycopeptide Spectrum
44Observations
- Oxonium ions (204, 366) help distinguish
glycopeptides from peptides
- but do little to identify the glycopeptide
- Few peptide b/y-ions to identify peptides
- but intact peptide fragments are common
- If the peptide can be guessed, then
- the glycan's mass can be determined
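The two observations above, screening for oxonium ions and obtaining the glycan mass by subtraction once a peptide is guessed, can be sketched as follows. The oxonium m/z values are the standard monoisotopic values for HexNAc (204.0867) and Hex-HexNAc (366.1395); the tolerance, the `min_hits` threshold, and the function names are assumptions for illustration, not the filters used in the talk.

```python
PROTON = 1.007276                    # mass of a proton (Da)
OXONIUM_MZ = (204.0867, 366.1395)    # HexNAc+ and Hex-HexNAc+ oxonium ions


def has_oxonium_ions(peaks, tol=0.05, min_hits=2):
    """True if at least min_hits oxonium ions appear in the peak list.

    peaks is a list of (mz, intensity) pairs; intensity is ignored here.
    """
    hits = sum(
        any(abs(mz - ox) <= tol for mz, _ in peaks) for ox in OXONIUM_MZ
    )
    return hits >= min_hits


def glycan_mass(precursor_mz, charge, peptide_mass):
    """Glycan residue mass = glycopeptide neutral mass - peptide mass.

    The result is the sum of monosaccharide residue masses, because the
    glycosidic bond to the peptide is formed with loss of one water.
    """
    precursor_mass = charge * (precursor_mz - PROTON)
    return precursor_mass - peptide_mass
```

The oxonium check only flags a spectrum as a likely glycopeptide; the subtraction step is what ties a candidate peptide to a candidate glycan mass.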
45Haptoglobin Standard
- N-glycosylation motif (N-X-S/T)
- Site of GluC cleavage
Pompach et al. Journal of Proteome Research 11.3
(2012) 1728-1740.
46Tuning the filters
- Oxonium ions
- Number and intensity
- Match tolerance
- "Intact-peptide" fragments
- Number and intensity
- Match tolerance
- Glycan composition
- ICScore
- Constrain search space
- Match tolerance
- Glycan database
- Constrain search space
- Match tolerance
- Precursor ion
- Non-monoisotopic selection
- Sodium adducts
- Charge state
- Peptide search space
- Semi-specific peptides
- Non-specific peptides
- Peptide MW range
- Variable modifications
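As an illustration of the "glycan composition" filter with a constrained search space and a match tolerance, the sketch below enumerates monosaccharide compositions whose summed residue mass matches a target glycan mass. The residue masses are standard monoisotopic values; the count limits, the tolerance, and the function name are assumptions, and the ICScore mentioned on the slide is not modeled.

```python
from itertools import product

# Monoisotopic monosaccharide residue masses (Da).
MONOSACCHARIDE = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}

# Hypothetical upper bounds that constrain the search space.
DEFAULT_MAX_COUNTS = {"Hex": 10, "HexNAc": 8, "Fuc": 3, "NeuAc": 4}


def matching_compositions(target_mass, tol=0.02, max_counts=None):
    """List compositions (as dicts) within tol Da of target_mass."""
    max_counts = max_counts or DEFAULT_MAX_COUNTS
    names = list(MONOSACCHARIDE)
    matches = []
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        mass = sum(c * MONOSACCHARIDE[n] for c, n in zip(counts, names))
        if abs(mass - target_mass) <= tol:
            matches.append(dict(zip(names, counts)))
    return matches
```

Tightening `max_counts` or `tol` shrinks the candidate list, which is the trade-off a user makes when tuning these filters. Restricting candidates to a glycan database instead of all compositions is a further constraint of the same kind.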
47Tuning the filters
- We estimate the number of false positives so
that the user can tune the search parameters
48Application of Exoglycosidases to locate Fucose
K. Chandler
49NVVFVIDK ITIH4 Glycopeptide
K. Chandler
50Similar Glycopeptide Spectra (mass Δ 162 Da)
51Fragmented Glycopeptides (mass Δ 162 Da)
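A mass difference of 162 Da between two glycopeptides is consistent with one hexose residue (162.0528 Da monoisotopic). Assuming that interpretation, a minimal sketch for finding pairs of precursor masses that differ by one hexose could look like this; the function name and tolerance are illustrative.

```python
HEXOSE = 162.0528  # monoisotopic hexose residue mass (Da)


def hexose_pairs(masses, tol=0.02):
    """Return index pairs (i, j) with masses[j] - masses[i] ~ one hexose."""
    order = sorted(range(len(masses)), key=lambda i: masses[i])
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            diff = masses[j] - masses[i]
            if diff > HEXOSE + tol:
                break              # masses are sorted, so stop early
            if abs(diff - HEXOSE) <= tol:
                pairs.append((i, j))
    return pairs
```

Pairs found this way are candidates for carrying an annotation from an identified glycopeptide over to a related spectrum that differs by one hexose.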
52Propagating Annotations
G. Berry
53Summary
- Mass-spectrometry coupled with protein chemistry
and good informatics can look beyond the obvious
to the unexpected...
- and there is plenty to find!
54Acknowledgements
- Edwards lab
- Kevin Chandler
- Gwenn Berry
- Fenselau lab (UMD)
- Colin Wynne
- Avantika Dhabaria
- Goldman lab (GU)
- Kevin Chandler
- Petr Pompach
- NSF Graduate Fellowship (Chandler)
- Funding: NCI