Annotation of the Human Genome by High-Throughput Sequence Analysis of Naturally Occurring Proteins

1
Annotation of the Human Genome by High-Throughput
Sequence Analysis of Naturally Occurring Proteins
  • Christopher Southan PhD
  • Principal Scientist, Molecular Pharmacology,
    AstraZeneca R&D, Mölndal, Sweden
  • Special Professor of Proteomics, School of
    Biosciences, University of Nottingham
  • Presentation Outline
  • Issues with annotating human protein-coding genes
  • Outline of the Oxford Genome Anatomy Project
  • Examples of proteins with new biochemical
    information or inferences
  • Conclusions

2
Lack of a Consensus Human ORFome: Defining the
Need for MS-based Protein Verification
  • Estimates of protein number declining but
    diversity increasing
  • False positives and false negatives in public
    data
  • Undiscovered small proteins (SMORFs)
  • 2,645 Ensembl novel proteins
  • The International Protein Index indicates churn
    and non-overlap
  • Alternative splice forms underrepresented
  • Patent protein sequences not included in genome
    annotation
  • Parallel annotation efforts
  • Pseudogenes difficult to define
  • Non-coding transcripts
  • Unvalidated aa-changing SNPs

3
Oxford Genome Anatomy Project (OGAP)
  • Developed by Oxford Genome Sciences with elements
    of Confirmant Human Protein Atlas
  • MALDI-TOF of peptide pools from large-scale 1D-
    and 2D-gel plugs is followed by selected tandem
    MS (MS/MS)
  • Peptides are identified using a maximal virtual
    tryptome search space to include unverified
    proteins, nsSNPs and splice forms
  • Database constructed from protein sequences
    mapped to Golden Path and optimally fitted to
    the MS/MS and MALDI data
  • Identified proteins are linked, via the MS data,
    to gel positions and samples
  • Details at www.OGAP.co.uk
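
The "virtual tryptome" idea on this slide can be illustrated with a minimal in silico digest. This is only a sketch under stated assumptions, not the OGAP implementation: it uses the conventional trypsin rule (cleave after K or R, but not before P), and the function names, the missed-cleavage and minimum-length parameters, and the toy sequences (a reference form and a hypothetical nsSNP variant) are all invented for illustration.

```python
def cleave(protein):
    """Split a sequence after K or R unless the next residue is P (assumed rule)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(protein, missed_cleavages=1, min_len=1):
    """Return the set of peptides allowing up to `missed_cleavages` skipped sites."""
    fragments = cleave(protein)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            pep = "".join(fragments[i:j + 1])
            if len(pep) >= min_len:
                peptides.add(pep)
    return peptides


def build_tryptome(entries, missed_cleavages=0, min_len=1):
    """Index peptide -> set of entry names (reference, variant or splice form)."""
    index = {}
    for name, seq in entries.items():
        for pep in tryptic_peptides(seq, missed_cleavages, min_len):
            index.setdefault(pep, set()).add(name)
    return index


# Toy entries (hypothetical): a reference form and a single-residue variant.
toy = {"ref": "MKAAARPGGKTTTR", "snp": "MKAAARPGGKTTSR"}
tryptome = build_tryptome(toy)
```

Peptides that map to only one entry (here the variant-specific peptide) are the ones that can discriminate between alternative forms when matched against MS data.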

4
Data Mining, Links and Statistics
  • 1 million MALDI determinations and 300K
    MS/MS spectra
  • 13,000 MS/MS supported proteins
  • Samples of 70 cell types derived from 50 tissues
  • Over 20 sub-cellular fractions
  • Merge of three identification pipelines (ICAT, 2D
    and 1D)
  • Peptide assignments supported by multiple MS
    scoring methods
  • Data anchored on Ensembl Golden Path

5
Verifying and Correcting Gene Predictions:
Transmembrane Anchor Protein (TMAP)
[Figure: panels labelled Fractions, Cells / tissues, Exons, MS data]
  • Tandem MS/MS peptide sequences confirmed partial
    but incorrect gene predictions from NCBI
    XP_301613 and ENSP00000332787 on chromosome 17
  • 1D-gel samples from endothelial and neuronal stem
    cell membrane fractions
  • Two full-length splice forms assembled from EST
    and MS data
  • ESTs suggest broad tissue distribution
  • Sequence confirmed as partial cDNA HBV
    PreS1-transactivated protein 1
  • No paralogues but many orthologues, no detectable
    structural homology

6
TMAP: Bioinformatic Analysis of an Unknown
Orphan Protein
  • Pseudo signal peptide prediction
  • TM prediction indicates membrane anchor
  • Helical sections from 2D structure prediction

7
Napsin A: Location in urine
  • Observed 2D-gel Mr/pI consistent with removal of
    signal and activation peptide
  • Observation supported by two independent
    publications using antibodies
  • - Detection of immunoreactive napsin A in
    human urine. Schauer-Vukasinovic et al. BBA.
    2001, 1524(1):51-6.
  • - Immunohistochemical localization of napsin
    and its potential role in protein catabolism in
    renal proximal tubules. Mori et al. Arch Histol
    Cytol. 2002, 65(4):359-68.

8
Cytokine FAMC3: Inference of Phosphorylation
  • Four-helix-bundle cytokine without biochemical
    characterisation
  • 1D-gel implies the protein is membrane associated
    (signal peptide retention?)
  • 2D-gel migration supports signal removal and
    transition to the CSF
  • 2D-gel pI chain suggests three positions of
    phosphorylation
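
The reasoning behind reading a chain of 2D-gel spots as phosphorylation states is that each added phosphate contributes negative charge and so lowers the isoelectric point. The sketch below estimates pI by bisection on a Henderson-Hasselbalch net-charge model. It is a simplified illustration only: the pKa values are approximate textbook-style assumptions (published scales differ), the phosphate pKa values are assumed, the loss of the modified residue's own side-chain ionisation is ignored, and the sequence is a made-up toy peptide, not FAMC3.

```python
# Approximate pKa values (assumptions; published scales differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 9.0, 2.0
PHOSPHATE_PKAS = (1.2, 6.5)  # two ionisations per phosphate group (assumed)


def net_charge(seq, ph, n_phospho=0):
    """Net charge of a sequence at a given pH with n_phospho phosphate groups."""
    charge = 1 / (1 + 10 ** (ph - N_TERM_PKA))
    charge -= 1 / (1 + 10 ** (C_TERM_PKA - ph))
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1 / (1 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1 / (1 + 10 ** (NEGATIVE_PKA[aa] - ph))
    for pka in PHOSPHATE_PKAS:
        charge -= n_phospho / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, n_phospho=0, tol=1e-4):
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid, n_phospho) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


toy_seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical peptide
pi_series = [isoelectric_point(toy_seq, n) for n in range(4)]
```

In this model each additional phosphate gives a lower estimated pI, which is the pattern a train of spots at similar Mr but stepwise more acidic pI would reflect. The size of each step depends strongly on the protein and on the pKa set chosen.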

9
Acyl Peptide Hydrolase: New Location
  • Erythrocyte-specific degradation of oxidatively
    damaged proteins
  • Novel observations in CSF, brain tissues and
    cancer cells
  • 2D-gel observed pI/Mr 5.2 / 80 kDa
  • Predicted pI/Mr 5.29 / 81.2 kDa
  • Independent electrospray MS result Mr 81.2 kDa
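
The comparison of observed and predicted Mr on this slide can be made explicit with a small calculation. The sketch below computes an average molecular mass from a sequence using standard average residue masses, and expresses the gap between an observed and a predicted value as a percentage. The function names are hypothetical, and the only figures taken from the slide are the observed (5.2 / 80 kDa) and predicted (5.29 / 81.2 kDa) values.

```python
# Average residue masses in Da (residue = amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(seq):
    """Average mass in Da of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def percent_deviation(observed, predicted):
    """Signed deviation of the observed value from the prediction, in percent."""
    return 100.0 * (observed - predicted) / predicted


# Values quoted on the slide for acyl peptide hydrolase.
mr_dev = percent_deviation(80.0, 81.2)  # about -1.48 percent
pi_shift = 5.2 - 5.29                   # about -0.09 pH units
```

A deviation of this size is within what gel-based Mr estimates usually allow, which is consistent with the slide treating the spot as the intact, unprocessed protein.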

10
LOC146556: Verification of an Unknown
  • Initially a high-throughput unannotated cDNA
  • Subsequently submitted from the Genentech
    Secreted Protein Discovery Initiative
  • No signal peptide but 60 aa N-terminal anchor
  • Three splice variants in UniProt
  • Novel observation of a CNS protein with Mw/pI
    suggestive of proteolytic clipping

11
Calsyntenin-1: Partitioning into CSF
  • Calsyntenin-1 binds synaptic Ca2+ thereby
    modulating Ca2+-mediated postsynaptic signals
  • May be subject to regulation by extracellular
    proteolysis in the synaptic cleft
  • Multiple observations in CSF, probably after
    proteolysis
  • 2D-gel spot spread suggestive of glycosylation

12
Limbic-system-associated Membrane Protein
  • LAMP: 64-68 kDa neuronal surface glycoprotein
    from limbic system
  • Predicted GPI anchor and eight putative
    N-glycosylations
  • Observed Mr and acidic shift on 2D-gel imply
    loss of the GPI anchor on transition to CSF, and
    possible glycosylation
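
Putative N-glycosylation sites such as the eight mentioned above are usually predicted from the consensus sequon N-X-S/T, where X is any residue except proline. The sketch below scans a sequence for that motif. The function name and the toy sequence are hypothetical, and a sequon match only marks a candidate site; it does not show that the site is occupied.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues in N-X-S/T motifs (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


toy_sites = find_sequons("MNNTSANPSNGT")  # hypothetical sequence
```

For the toy sequence the scan reports positions 2, 3 and 10 and skips the N-P-S motif at position 7, where proline blocks the sequon.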

13
Conclusions
  • Annotation of the human genome by the OGAP
    high-throughput MS protein verification pipeline
    can resolve some of the uncertainties of in silico
    annotation and contribute to closure of the basal
    proteome
  • The expanding range of clinical samples, tissues,
    cell lines, sub-cellular fractions, data
    integration and mining tools produces a unique,
    high value database
  • The combination of pseudogel interpretation,
    sample knowledge and bioinformatic analysis can
    reveal novel details and inferences
  • Many of these inferences are testable and can
    generate information that complements public data
    for biochemically characterised and unknown
    proteins

14
References and Acknowledgments
  • Rohlff (2004) New approaches towards integrated
    proteomics databases and depositories. Expert
    Review of Proteomics, 1(3), 267-274, review.
  • McGowan, S et al. (84 authors) (2004) Annotation
    of the Human Genome by High-Throughput Sequence
    Analysis of Naturally Occurring Proteins,
    Current Proteomics, 1, 41-48.
  • Southan C. (2004) Has the yo-yo stopped? A human
    gene number update. Proteomics (6):1712-26,
    review (also see poster).
  • Rohlff & Southan (2002) Proteomic approaches to
    central nervous system disorders. Curr Opin Mol
    Ther. 4(3):251-8, review.
  • http://oxfordgenomesciences.com/

Disclaimer: While the speaker is
pleased to present material from participation in
this work before joining AstraZeneca, this does
not constitute an endorsement of the Oxford Genome
Sciences Oxford Genome Anatomy Project by
AstraZeneca.