Title: Annotation of the Human Genome by High-Throughput Sequence Analysis of Naturally Occurring Proteins
- Christopher Southan PhD
- Principal Scientist, Molecular Pharmacology, AstraZeneca R&D, Mölndal, Sweden
- Special Professor of Proteomics, School of Biosciences, University of Nottingham
Presentation Outline
- Issues with annotating human protein-coding genes
- Outline of the Oxford Genome Anatomy Project
- Examples of proteins with new biochemical information or inferences
- Conclusions
Lack of a Consensus Human ORFome: Defining the Need for MS-based Protein Verification
- Estimates of protein-coding gene number declining, but protein diversity increasing
- False positives and false negatives in public data
- Undiscovered small proteins encoded by small ORFs (SMORFs)
- 2,645 Ensembl novel proteins
- The International Protein Index (IPI) indicates churn and non-overlap
- Alternative splice forms underrepresented
- Patent protein sequences not included in genome annotation
- Parallel annotation efforts
- Pseudogenes difficult to define
- Non-coding transcripts
- Unvalidated amino-acid-changing SNPs
Oxford Genome Anatomy Project (OGAP)
- Developed by Oxford Genome Sciences, incorporating elements of the Confirmant Human Protein Atlas
- MALDI-TOF analysis of peptide pools from large-scale 1D- and 2D-gel plugs, followed by selected tandem MS (MS/MS)
- Peptides are identified against a maximal virtual tryptome search space that includes unverified proteins, nsSNPs and splice forms
- The database is constructed from protein sequences mapped to the Golden Path and optimally fitted to the MS/MS and MALDI data
- Identified proteins are linked, via the MS data, to gel positions and samples
- Details at www.OGAP.co.uk
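The virtual-tryptome idea can be sketched in Python. This is a minimal illustration, not the OGAP search engine: it assumes the simple textbook trypsin rule (cleave after K or R unless the next residue is P), and the variant names and toy sequences are hypothetical. Each peptide is indexed to every sequence variant (canonical, splice form or nsSNP) that can produce it, so a peptide unique to one variant can serve as evidence for that variant.

```python
import re


def tryptic_digest(seq: str, missed_cleavages: int = 0, min_len: int = 1) -> list[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    # Cleavage sites are the positions just after K/R that are not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", seq) if m.end() < len(seq)]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Allow up to `missed_cleavages` skipped sites inside one peptide.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = seq[bounds[i]:bounds[j]]
            if len(peptide) >= min_len:  # real pipelines usually drop very short peptides
                peptides.append(peptide)
    return peptides


def virtual_tryptome(variants: dict[str, str], **digest_args) -> dict[str, set[str]]:
    """Index every tryptic peptide to the set of sequence variants that contain it."""
    index: dict[str, set[str]] = {}
    for name, seq in variants.items():
        for peptide in tryptic_digest(seq, **digest_args):
            index.setdefault(peptide, set()).add(name)
    return index


# Hypothetical toy entries: a canonical protein and an nsSNP variant (A11V).
variants = {
    "canonical": "GGGKLLLRPAAARWWW",
    "nsSNP_A11V": "GGGKLLLRPAVARWWW",
}
index = virtual_tryptome(variants)
# Peptides seen in only one variant can discriminate between the variants.
print(sorted(p for p, names in index.items() if len(names) == 1))  # → ['LLLRPAAAR', 'LLLRPAVAR']
```

Including variant sequences enlarges the search space, which also raises the chance of spurious matches; this sketch does not model that trade-off.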
Data Mining, Links and Statistics
- 1 million MALDI determinations and 300,000 MS/MS spectra
- 13,000 proteins supported by MS/MS
- Samples of 70 cell types derived from 50 tissues
- Over 20 sub-cellular fractions
- Merge of three identification pipelines (ICAT, 2D and 1D)
- Peptide assignments supported by multiple MS scoring methods
- Data anchored on the Ensembl Golden Path
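One way to express the pipeline merge and the multiple-scoring requirement is sketched below, under stated assumptions. The pipeline names follow the slide, but the protein IDs, the scorer names (`scorer_a`, `scorer_b`) and their thresholds are hypothetical, and the rule "pass at least two scoring methods" is an illustrative choice rather than the project's documented criterion.

```python
from collections import defaultdict


def merge_pipelines(results: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each protein ID to the set of identification pipelines that reported it."""
    support = defaultdict(set)
    for pipeline, proteins in results.items():
        for protein in proteins:
            support[protein].add(pipeline)
    return dict(support)


def accepted_peptides(scores: dict[str, dict[str, float]],
                      thresholds: dict[str, float],
                      min_methods: int = 2) -> set[str]:
    """Keep peptide assignments that pass at least `min_methods` scoring methods."""
    accepted = set()
    for peptide, by_method in scores.items():
        passing = sum(
            1 for method, score in by_method.items()
            if method in thresholds and score >= thresholds[method]
        )
        if passing >= min_methods:
            accepted.add(peptide)
    return accepted


# Hypothetical example data.
results = {"ICAT": {"P1", "P2"}, "2D": {"P2", "P3"}, "1D": {"P2"}}
merged = merge_pipelines(results)
scores = {
    "LVNELTEFAK": {"scorer_a": 45.0, "scorer_b": 0.91},
    "AEFVEVTK": {"scorer_a": 52.0, "scorer_b": 0.30},
}
thresholds = {"scorer_a": 40.0, "scorer_b": 0.80}
print(sorted(merged["P2"]), accepted_peptides(scores, thresholds))  # → ['1D', '2D', 'ICAT'] {'LVNELTEFAK'}
```

Keeping the set of supporting pipelines per protein, rather than a bare count, preserves the provenance needed to link an identification back to a particular gel or sample type.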
Verifying and Correcting Gene Predictions: Transmembrane Anchor Protein (TMAP)
[Figure panels: Fractions, Cells / tissues, Exons, MS data]
- Tandem MS (MS/MS) peptide sequences confirmed that the NCBI (XP_301613) and Ensembl (ENSP00000332787) gene predictions on chromosome 17 were partial and incorrect
- 1D-gel samples from endothelial and neuronal stem cell membrane fractions
- Two full-length splice forms assembled from EST and MS data
- ESTs suggest broad tissue distribution
- Sequence matches the partial cDNA for HBV PreS1-transactivated protein 1
- No paralogues but many orthologues; no detectable structural homology
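The kind of check on this slide, testing whether MS/MS peptides fall inside a predicted protein, can be sketched as follows. Peptides that do not match a prediction point to missing or mispredicted exons, and low coverage shows how much of the prediction remains unsupported. The sequences are hypothetical toy data, not XP_301613 or ENSP00000332787.

```python
def check_prediction(peptides: list[str], predicted: str) -> tuple[list[str], float]:
    """Return peptides absent from a predicted protein and the fraction of residues covered."""
    covered = [False] * len(predicted)
    unmatched = []
    for peptide in peptides:
        start = predicted.find(peptide)
        if start == -1:
            # Evidence the prediction is missing or mis-splicing this region.
            unmatched.append(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = predicted.find(peptide, start + 1)
    return unmatched, sum(covered) / len(predicted)


# Hypothetical predicted protein and observed peptides.
print(check_prediction(["TAYIAK", "QISFVK", "LLDNEGR"], "MKTAYIAKQRQISFVK"))  # → (['LLDNEGR'], 0.75)
```

A real matcher would also treat leucine and isoleucine as equivalent, because they have identical mass and are not distinguished by standard MS/MS.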
TMAP: Bioinformatic Analysis of an Unknown Orphan Protein
- Pseudo signal peptide prediction
- Transmembrane (TM) prediction indicates a membrane anchor
- Helical sections from secondary structure prediction
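The slide does not name the TM predictor used. As a stand-in, a Kyte-Doolittle hydropathy scan (19-residue window, threshold 1.6, the values suggested in the original 1982 paper) shows how a single hydrophobic stretch gets flagged as a candidate membrane-spanning segment. This is a swapped-in, deliberately simple technique. A hydropathy scan alone cannot tell a cleaved signal peptide from an uncleaved N-terminal anchor, which is one reason such predictions stay ambiguous.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full sliding window (window i covers seq[i:i + window])."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of windows with mean hydropathy >= threshold into half-open [start, end) spans."""
    segments, start = [], None
    profile = hydropathy_profile(seq, window)
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            # The last passing window was i - 1; it ends at residue i - 1 + window (exclusive).
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


# Hypothetical toy sequence: a 20-residue hydrophobic stretch flanked by charged residues.
toy = "D" * 10 + "L" * 20 + "K" * 10
print(tm_segments(toy))  # → [(5, 35)]
```

For this toy sequence the flagged span (residues 5 to 34) is wider than the 20 leucines at positions 10 to 29, because window averaging smears the boundaries; dedicated TM predictors refine this considerably.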
Napsin A: Location in Urine
- Observed 2D-gel Mr/pI consistent with removal of the signal peptide and activation peptide
- Observation supported by two independent publications using antibodies:
  - Schauer-Vukasinovic et al. Detection of immunoreactive napsin A in human urine. BBA 2001, 1524(1):51-6.
  - Mori et al. Immunohistochemical localization of napsin and its potential role in protein catabolism in renal proximal tubules. Arch Histol Cytol 2002, 65(4):359-68.
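The Mr reasoning on this slide, comparing the observed spot with the chain left after signal and activation peptide removal, can be sketched with average residue masses. The precursor sequence and cleavage lengths in the example are hypothetical placeholders, not the napsin A sequence, and the calculation ignores glycosylation and other modifications, which also shift apparent Mr on a gel.

```python
# Average residue masses (Da) of the 20 standard amino acids (amino acid minus H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once per chain for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average mass (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def mature_chain_kda(precursor: str, signal_len: int, propeptide_len: int) -> float:
    """Mass (kDa) left after removing an N-terminal signal peptide and activation peptide."""
    mature = precursor[signal_len + propeptide_len:]
    return average_mass(mature) / 1000


# Hypothetical precursor and cleavage lengths, for illustration only.
precursor = "MKLLVLSAAGLLALSGCASPKDNTGFYEVLLAGSGSSKAW"
print(f"precursor {average_mass(precursor) / 1000:.2f} kDa, "
      f"mature {mature_chain_kda(precursor, 18, 10):.2f} kDa")
```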
Cytokine FAM3C: Inference of Phosphorylation
- Four-helix-bundle cytokine without biochemical characterisation
- 1D-gel implies the protein is membrane associated (signal peptide retention?)
- 2D-gel migration supports signal peptide removal and transition to the cerebrospinal fluid (CSF)
- A chain of 2D-gel spots across the pI axis suggests three phosphorylation sites
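To illustrate why a chain of spots across the pI axis points to phosphorylation, the sketch below estimates pI by bisection on the Henderson-Hasselbalch net charge and adds a chosen number of phosphate groups. The pKa values, including the two assumed phosphate ionisations, are illustrative assumptions, and the sequence is a hypothetical placeholder. Each added phosphate lowers the computed pI, while its mass change of about 80 Da is negligible on a 2D-gel Mr axis.

```python
# Illustrative pKa values (assumption): published sets differ, so absolute pI values will too.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
# Assumed first and second ionisation pKa values of one phosphate group.
PKA_PHOSPHATE = (2.0, 6.0)


def net_charge(seq: str, ph: float, n_phospho: int = 0) -> float:
    """Henderson-Hasselbalch net charge of a chain carrying n_phospho phosphate groups."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    phosphate = n_phospho * sum(1 / (1 + 10 ** (pka - ph)) for pka in PKA_PHOSPHATE)
    return positive - negative - phosphate


def isoelectric_point(seq: str, n_phospho: int = 0) -> float:
    """Bisect for the pH of zero net charge (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if net_charge(seq, mid, n_phospho) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


seq = "MSDEKRHLLSTPEGKAYQTVNKDE"  # hypothetical placeholder sequence
for n in range(4):
    print(n, "phosphates: pI", round(isoelectric_point(seq, n), 2))
```

Each phosphoform is a distinct, more acidic species of nearly identical Mr, which is why a horizontal train of spots is read as a series of phosphorylation states.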
Acyl Peptide Hydrolase: New Location
- Erythrocyte-specific degradation of oxidatively damaged proteins
- Novel observations in CSF, brain tissues and cancer cells
- Observed 2D-gel pI/Mr: 5.2/80 kDa
- Predicted pI/Mr: 5.29/81.2 kDa
- Independent electrospray MS result: Mr 81.2 kDa
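Several slides in this section compare an observed 2D-gel spot with the predicted Mr/pI. That comparison can be sketched as a simple rule; the tolerances (5% in Mr, 0.2 pI units) and the flag wording are hypothetical choices for illustration, not OGAP criteria.

```python
def compare_spot(obs_mr_kda: float, obs_pi: float,
                 pred_mr_kda: float, pred_pi: float,
                 mr_tol: float = 0.05, pi_tol: float = 0.2) -> list[str]:
    """Flag differences between an observed 2D-gel spot and the predicted Mr/pI."""
    flags = []
    rel_mr = (obs_mr_kda - pred_mr_kda) / pred_mr_kda  # relative Mr difference
    d_pi = obs_pi - pred_pi
    if rel_mr < -mr_tol:
        flags.append("lower Mr: possible proteolytic processing")
    elif rel_mr > mr_tol:
        flags.append("higher Mr: possible glycosylation or other additions")
    if d_pi < -pi_tol:
        flags.append("acidic pI shift: possible phosphorylation or other acidic modification")
    elif d_pi > pi_tol:
        flags.append("basic pI shift")
    return flags or ["consistent with prediction"]


# Acyl peptide hydrolase values from the slide.
print(compare_spot(80, 5.2, 81.2, 5.29))  # → ['consistent with prediction']
```

With these values the Mr differs by about 1.5% and the pI by 0.09, so the spot is reported as consistent with the predicted full-length protein, in line with the independent electrospray result.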
LOC146556: Verification of an Unknown Protein
- Initially a high-throughput unannotated cDNA
- Subsequently submitted from the Genentech Secreted Protein Discovery Initiative
- No signal peptide, but a 60-aa N-terminal anchor
- Three splice variants in UniProt
- Novel observation of a CNS protein with Mr/pI suggestive of proteolytic clipping
Calsyntenin-1: Partitioning into CSF
- Calsyntenin-1 binds synaptic Ca2+, thereby modulating Ca2+-mediated postsynaptic signals
- May be subject to regulation by extracellular proteolysis in the synaptic cleft
- Multiple observations in CSF, probably after proteolysis
- 2D spot spread suggestive of glycosylation
Limbic-System-Associated Membrane Protein (LAMP)
- LAMP: 64-68 kDa neuronal surface glycoprotein from the limbic system
- Predicted GPI anchor and eight putative N-glycosylation sites
- Observed Mr and acidic shift on the 2D-gel suggest loss of the GPI anchor on transition to the CSF, and possible glycosylation
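Counting putative N-glycosylation sites, such as the eight on this slide, usually means scanning for the N-X-S/T sequon, where X is any residue except proline. A minimal scan is sketched below; it finds sequence motifs only and says nothing about whether a site is actually occupied. The test sequence is hypothetical.

```python
import re


def n_glycosylation_sequons(seq: str) -> list[int]:
    """1-based positions of Asn in N-X-S/T sequons, where X is not proline."""
    # A zero-width lookahead lets overlapping sequons (e.g. "NNST") all be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


print(n_glycosylation_sequons("MNGSANPTQNVT"))  # → [2, 10]
```

The Asn at position 6 is skipped because it is followed by proline (N-P-T).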
Conclusions
- Annotation of the human genome by the OGAP high-throughput MS protein-verification pipeline can resolve some of the uncertainties of in silico annotation and contribute to closure of the basal proteome
- The expanding range of clinical samples, tissues, cell lines, sub-cellular fractions, data integration and mining tools produces a unique, high-value database
- The combination of pseudogel interpretation, sample knowledge and bioinformatic analysis can reveal novel details and inferences
- Many of these inferences are testable and can generate information that complements public data for both biochemically characterised and unknown proteins
References and Acknowledgments
- Rohlff (2004) New approaches towards integrated proteomics databases and depositories. Expert Review of Proteomics 1(3):267-274. Review.
- McGowan S, et al. (84 authors) (2004) Annotation of the human genome by high-throughput sequence analysis of naturally occurring proteins. Current Proteomics 1:41-48.
- Southan C. (2004) Has the yo-yo stopped? A human gene number update. Proteomics (6):1712-26. Review (also see poster).
- Rohlff, Southan (2002) Proteomic approaches to central nervous system disorders. Curr Opin Mol Ther 4(3):251-8. Review.
- http://oxfordgenomesciences.com/
Disclaimer: While the speaker is pleased to present material from his participation in this work before joining AstraZeneca, this does not constitute an endorsement of the Oxford Genome Anatomy Project of Oxford Genome Sciences by AstraZeneca.