Annotation of the Human Genome by High-Throughput Sequence Analysis of Naturally Occurring Proteins

Slides: 15

Provided by: chriss67

Transcript and Presenter's Notes

1
Annotation of the Human Genome by High-Throughput
Sequence Analysis of Naturally Occurring Proteins

Christopher Southan PhD
Principal Scientist, Molecular Pharmacology,
AstraZeneca R&D, Mölndal, Sweden
Special Professor of Proteomics, School of
Biosciences, University of Nottingham

Presentation Outline
Issues with annotating human protein-coding genes
Outline of the Oxford Genome Anatomy Project
Examples of proteins with new biochemical
information or inferences
Conclusions

2
Lack of a Consensus Human ORFome: Defining the
Need for MS-based Protein Verification

Estimates of protein number declining but
diversity increasing
False positives and false negatives in public
data
Undiscovered small proteins (SMORFs)
2,645 Ensembl novel proteins
The International Protein Index indicates churn
and non-overlap
Alternative splice forms underrepresented
Patent protein sequences not included in genome
annotation
Parallel annotation efforts
Pseudogenes difficult to define
Non-coding transcripts
Unvalidated aa-changing SNPs

3
Oxford Genome Anatomy Project (OGAP)

Developed by Oxford Genome Sciences with elements
of Confirmant Human Protein Atlas
MALDI-TOF of peptide pools from large-scale 1D-
and 2D-gel plugs is followed by selected tandem
MS/MS
Peptides are identified using a maximal virtual
tryptome search space to include unverified
proteins, nsSNPs and splice forms
Database constructed from protein sequences
mapped to Golden Path and optimally fitted to
the MS/MS and MALDI data
Identified proteins are linked, via the MS data,
to gel positions and samples
Details at www.OGAP.co.uk
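
The idea of a virtual tryptome search space can be illustrated with a minimal sketch. It assumes the common trypsin rule (cleave after K or R unless followed by P) and up to one missed cleavage; the slides do not state the rules OGAP actually used. The sequences, the entry names and the function names are hypothetical and serve only to show how an nsSNP variant adds a peptide that the reference sequence alone would miss.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def cleavage_fragments(sequence):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence, missed_cleavages=1):
    """Return all peptides made of 1..(missed_cleavages + 1) adjacent fragments."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def build_tryptome(proteins, missed_cleavages=1):
    """Map each peptide to the set of entries (proteins, variants) containing it."""
    index = {}
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, missed_cleavages):
            index.setdefault(peptide, set()).add(name)
    return index


# Hypothetical toy entries: a reference sequence and an nsSNP variant (Y -> F).
proteins = {"reference": "MKTAYLRPEK", "nssnp_variant": "MKTAFLRPEK"}
tryptome = build_tryptome(proteins, missed_cleavages=1)
for peptide, sources in sorted(tryptome.items()):
    print(peptide, round(peptide_mass(peptide), 4), sorted(sources))
```

Peptides shared by both entries map to both names, while the variant-specific peptide maps only to the nsSNP entry, which is what lets a matching spectrum support that variant.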

4
Data Mining, Links and Statistics

1 million MALDI determinations and 300K
MS/MS spectra
13,000 MS/MS supported proteins
Samples of 70 cell types derived from 50 tissues
Over 20 sub-cellular fractions
Merge of three identification pipelines (ICAT, 2D
and 1D)
Peptide assignments supported by multiple MS
scoring methods
Data anchored on Ensembl Golden Path

5
Verifying and Correcting Gene Predictions
Transmembrane Anchor Protein (TMAP)
[Figure labels: Fractions; Cells / tissues; Exons; MS data]

Tandem MS/MS peptide sequences confirmed partial
but incorrect gene predictions from NCBI
XP_301613 and ENSP00000332787 on chromosome 17
1D-gel samples from endothelial and neuronal stem
cell membrane fractions
Two full-length splice forms assembled from EST
and MS data
ESTs suggest broad tissue distribution
Sequence confirmed as partial cDNA HBV
PreS1-transactivated protein 1
No paralogues but many orthologues, no detectable
structural homology

6
TMAP: Bioinformatic Analysis of an Unknown
Orphan Protein
Pseudo signal peptide prediction
TM prediction indicates membrane anchor
Helical sections from 2D structure prediction
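
The slide does not name the transmembrane prediction tool that was used. As a stand-in, the sketch below shows the classic Kyte-Doolittle sliding-window hydropathy scan, where a 19-residue window averaging above about 1.6 is a conventional rule of thumb for a membrane-spanning segment. The test sequence is hypothetical and is not the TMAP sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [start for start, mean in enumerate(profile) if mean > threshold]


# Hypothetical sequence: a polar N-terminus, a 19-residue leucine stretch
# standing in for a membrane anchor, and a polar C-terminus.
toy_sequence = "DDDDD" + "L" * 19 + "KKKKK"
print(candidate_tm_windows(toy_sequence))
```

A scan of this kind only flags hydrophobic stretches; it cannot by itself separate a cleaved signal peptide from a retained membrane anchor, which is why the slide pairs it with a signal peptide prediction.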

7
Napsin A: Location in Urine

Observed 2D-gel Mr/pI consistent with removal of
signal and activation peptide
Observation supported by two independent
publications using antibodies
- Detection of immunoreactive napsin A in
human urine. Schauer-Vukasinovic et al. BBA.
2001, 1524(1):51-6.
- Immunohistochemical localization of napsin
and its potential role in protein catabolism in
renal proximal tubules. Mori et al. Arch Histol
Cytol. 2002, 65(4):359-68.

8
Cytokine FAM3C: Inference of Phosphorylation

Four-helix-bundle cytokine without biochemical
characterisation
1D-gel implies the protein is membrane associated
(signal peptide retention?)
2D-gel migration supports signal removal and
transition to the CSF
2D pI chain suggests 3 positions of
phosphorylation
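
The reasoning behind reading a chain of 2D-gel spots as phosphorylation is that each added phosphate contributes negative charge and so lowers the isoelectric point. The sketch below estimates pI by bisection on a Henderson-Hasselbalch net-charge curve. The side-chain and terminal pKa values are one commonly quoted set, the phosphate pKa values are illustrative assumptions, and the sequence is hypothetical, so the numbers show only the direction of the shift, not the values seen for this protein.

```python
# One commonly quoted pKa set for ionisable groups (assumed here).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERMINUS_PKA, C_TERMINUS_PKA = 8.6, 3.6
# Illustrative assumption: two ionisations per phosphate group.
PHOSPHATE_PKAS = (2.1, 7.2)


def net_charge(sequence, ph, n_phospho=0):
    """Net charge at a given pH with n_phospho phosphate groups on Ser/Thr."""
    charge = 1 / (1 + 10 ** (ph - N_TERMINUS_PKA))
    charge -= 1 / (1 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1 / (1 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1 / (1 + 10 ** (NEGATIVE_PKA[residue] - ph))
    for pka in PHOSPHATE_PKAS:
        charge -= n_phospho / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, n_phospho=0, tolerance=1e-4):
    """Bisection on pH 0-14; net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        middle = (low + high) / 2
        if net_charge(sequence, middle, n_phospho) > 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2


# Hypothetical sequence, not the protein discussed on the slide.
toy_sequence = "MKTSLDESRAHGKSTELKR"
for n in range(4):
    print(n, "phosphate(s): pI about", round(isoelectric_point(toy_sequence, n), 2))
```

A series of spots with the same apparent Mr and stepwise more acidic pI is therefore consistent with 0, 1, 2 and 3 phosphates, although other charge-altering modifications could give a similar pattern.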

9
Acyl Peptide Hydrolase: New Location

Erythrocyte-specific degradation of oxidatively
damaged proteins
Novel observations in CSF, brain tissues and
cancer cells
2D-gel observed pI/Mr 5.2/80 kDa
Predicted pI/Mr 5.29/81.2 kDa
Independent electrospray MS result Mr 81.2 kDa

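
The agreement between the observed and predicted values quoted above can be checked with simple arithmetic. In the sketch below the tolerances (0.3 pH units and 10% of Mr) are assumptions chosen for illustration, not thresholds taken from the OGAP pipeline, and the function name is hypothetical.

```python
def compare_spot(observed_pi, observed_mr_kda, predicted_pi, predicted_mr_kda,
                 pi_tolerance=0.3, mr_tolerance=0.10):
    """Return the pI difference, the relative Mr deviation and a consistency flag."""
    delta_pi = observed_pi - predicted_pi
    relative_mr = (observed_mr_kda - predicted_mr_kda) / predicted_mr_kda
    consistent = abs(delta_pi) <= pi_tolerance and abs(relative_mr) <= mr_tolerance
    return delta_pi, relative_mr, consistent


# Values from the slide: observed 5.2 / 80 kDa, predicted 5.29 / 81.2 kDa.
delta_pi, relative_mr, consistent = compare_spot(5.2, 80.0, 5.29, 81.2)
print(f"delta pI = {delta_pi:+.2f}, Mr deviation = {relative_mr:+.1%}, "
      f"consistent = {consistent}")
```

With the slide's numbers the spot sits 0.09 pH units below the prediction and about 1.5% below the predicted mass, so under these assumed tolerances the gel position is consistent with the intact, unmodified protein.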
10
LOC146556: Verification of an Unknown

Initially a high-throughput unannotated cDNA
Subsequently submitted from the Genentech
Secreted Protein Discovery Initiative
No signal peptide but 60 aa N-terminal anchor
Three splice variants in UniProt
Novel observation of a CNS protein with Mw/pI
suggestive of proteolytic clipping

11
Calsyntenin-1: Partitioning into CSF

Calsyntenin-1 binds synaptic Ca2+, thereby
modulating Ca2+-mediated postsynaptic signals
May be subject to regulation by extracellular
proteolysis in the synaptic cleft
Multiple observations in CSF, probably after
proteolysis
2D Spot spread suggestive of glycosylation

12
Limbic-system-associated Membrane Protein

LAMP: 64-68 kDa neuronal surface glycoprotein
from the limbic system
Predicted GPI anchor and eight putative
N-glycosylation sites
Observed Mr and acidic shift on 2D-gel suggest
loss of the GPI anchor on transition to CSF and
possible glycosylation

13
Conclusions

Annotation of the human genome by the OGAP
high-throughput MS protein verification pipeline
can resolve some of the uncertainties of in silico
annotation and contribute to closure of the basal
proteome
The expanding range of clinical samples, tissues,
cell lines, sub-cellular fractions, data
integration and mining tools produces a unique,
high-value database
The combination of pseudogel interpretation,
sample knowledge and bioinformatic analysis can
reveal novel details and inferences
Many of these inferences are testable and can
generate information that complements public data
for biochemically characterised and unknown
proteins

14
References and Acknowledgments

Rohlff (2004) New approaches towards integrated
proteomics databases and depositories. Expert
Review of Proteomics, 1(3), 267-274, review.
McGowan, S et al. (84 authors) (2004) Annotation
of the Human Genome by High-Throughput Sequence
Analysis of Naturally Occurring Proteins,
Current Proteomics, 1, 41-48.
Southan C. (2004) Has the Yo-yo stopped? A human
gene number update. Proteomics (6):1712-26,
review (also see poster).
Rohlff & Southan (2002) Proteomic approaches to
central nervous system disorders, Curr Opin Mol
Ther. 4(3):251-8, review.
http://oxfordgenomesciences.com/

Disclaimer: While the speaker is
pleased to present material from participation in
this work before joining AstraZeneca, this does
not constitute an endorsement of the Oxford Genome
Sciences Oxford Genome Anatomy Project by
AstraZeneca.
