Mass Spec Proteomics HUPO-PSI - PowerPoint PPT Presentation

About This Presentation
Title:

Mass Spec Proteomics HUPO-PSI

Description:

Mass Spec Proteomics HUPO-PSI & PRIDE Phil Jones (pjones_at_ebi.ac.uk) Proteomics Services Group www.ebi.ac.uk * Webservice -Can do searches programmatically -www ...

Slides: 61
Provided by: Lennart6
Learn more at: http://www.apo-sys.eu
Transcript and Presenter's Notes

Title: Mass Spec Proteomics HUPO-PSI


1
Mass Spec Proteomics: HUPO-PSI & PRIDE
Phil Jones (pjones_at_ebi.ac.uk)
Proteomics Services Group, www.ebi.ac.uk
2
Positioning The Technologies in Question
3
Classic 2D PAGE proteomics
4
New peptide-centric identification (shotgun strategy)
5
Public Standards for Proteomics: HUPO Proteomics
Standards Initiative
6
The HUPO Proteomics Standards Initiative
  • Mission
    • Develop minimal reporting guidelines
    • Data representation standards (often XML formats)
    • Annotation standards (ontology and controlled
      vocabularies)
    • Involve data producers, hardware vendors, database
      providers, software producers, publishers

http://psidev.info
7
What constitutes a PSI standard?
  • Four documents make up each individual standard
    • Formal requirements specification
    • Minimal reporting requirements -> MIAPE document
    • XML data exchange format
    • Domain-specific controlled vocabulary

8
MIAPE / MIMIx Guidelines
9
MIAPE & MIMIx
  • MIAPE: Minimum Information About a Proteomics
    Experiment
  • MIMIx: Minimum Information about a Molecular
    Interaction eXperiment
  • Understand, qualify and reproduce
  • Requirements to be enforced by journals,
    repositories, funders
  • Compatibility with the PSI data formats

10
What is a MIAPE / MIMIx document?
  • It is
    • A checklist of information and data to provide
      when an experiment is reported (it is a content
      descriptor)
    • An aid to assessing quality control
      • Number of replicates, expected error rate
  • It is not
    • A description of the way to run an experiment
    • A description of HOW to represent data
      • e.g. "Use Excel to create a table with the
        following five columns"
    • A guide to quality judgment

11
XML Data Exchange Formats
12
Available XML Exchange Formats
  • mzData: Mass spectrometry data
  • mzML: Replacement for mzData (since June 2008)
  • analysisXML: Mass spec. search engine output
  • PSI-MI: Molecular interactions (PPI)
  • GelML: Results of gel electrophoresis experiments
  • GelInfoML: Gel image analysis, manipulation and
    quantitation
  • spML: GC, LC, centrifugation, capillary
    electrophoresis etc.

13
PSI Mass Spectrometry Data Interchange: mzData -> mzML
  • mzData 1.05
    • Established 4 years ago
    • All major MS vendors generate mzData
    • All major search engines consume mzData
    • Data repositories accept mzData as input
    • Commercial applications are built on mzData
  • mzML 1.0.0
    • Completed document process on 1 June, 2008
    • Developed as a collaboration between PSI and ISB
    • PSI-MS Working group chaired by Eric Deutsch
      (ISB)
    • Supports
    • Merges best features of PSI's mzData and ISB's
      mzXML
  • We encourage the community to begin implementing
    mzML 1.0.0 and to phase out use of mzData and
    mzXML

14
mzData -> mzML: beyond the deliverable
[Diagram: PSI and ISB formats (mzXML, mzData, PepXML, mzIdent, ProtXML) converging on mzML and analysisXML]
15
Details of mzML
16
Details of mzML: run
17
Details of mzML: cvParam and userParam
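
As a rough illustration of the two parameter kinds named on this slide, the sketch below reads cvParam and userParam elements from a hand-written fragment. The fragment is an assumption made for illustration: it omits the mzML namespace and most required elements, and the userParam name and value are invented. Only the cvParam attribute names (cvRef, accession, name, value) are taken from the mzML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment written for illustration only;
# a real mzML file declares a namespace and many more elements.
FRAGMENT = """
<spectrum id="scan=1" index="0">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <userParam name="operator note" value="test injection"/>
</spectrum>
"""


def read_params(xml_text):
    """Return (cv_params, user_params) found directly under the root element."""
    root = ET.fromstring(xml_text)
    # cvParam: the meaning comes from a controlled vocabulary accession.
    cv_params = {
        p.get("accession"): (p.get("name"), p.get("value"))
        for p in root.findall("cvParam")
    }
    # userParam: free-form name/value pair with no CV term behind it.
    user_params = {p.get("name"): p.get("value") for p in root.findall("userParam")}
    return cv_params, user_params


cv_params, user_params = read_params(FRAGMENT)
print(cv_params)
print(user_params)
```

Keying the cvParam entries by accession rather than by name reflects the idea that the accession is the stable handle, while names can be reworded in the vocabulary.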
18
Details of mzML: spectrum
19
Details of mzML: chromatogram
20
PSI Mass Spectrometry Data Interchange: analysisXML
(Protein / Peptide Identifications)
  • Will become a common format for mass spectrometry
    search engine output
  • Provides support for multi-step analyses
  • Merges previous efforts of HUPO-PSI with ISB

21
Controlled Vocabularies
  • All interchange standards map to external CVs
  • CVs used to keep standards flexible and up to
    date; XML frozen for as long as possible
  • CVs assist in keeping curation consistent and
    database searching effective
  • All CVs maintained in OBO format and published on
    the Open Biomedical Ontologies website
    (http://www.obofoundry.org/)
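
To make the OBO format mentioned above a little more concrete, here is a minimal sketch of reading [Term] stanzas from OBO-style text. The sample terms and their EX: identifiers are hypothetical, and the parser handles only the id, name and is_a tags; it is not a full OBO reader.

```python
# Hypothetical OBO-style text; the EX: identifiers are invented for illustration.
SAMPLE = """
format-version: 1.2

[Term]
id: EX:0000001
name: instrument attribute

[Term]
id: EX:0000002
name: scan polarity
is_a: EX:0000001 ! instrument attribute

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas from OBO-formatted text into a dict keyed by term id."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or not line or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


terms = parse_obo_terms(SAMPLE)
for term_id, term in terms.items():
    print(term_id, term["name"], term["is_a"])
```

Because the data files carry only term identifiers, a vocabulary file like this can gain or rename terms without any change to the XML schema, which is the point of keeping the XML frozen.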

22
Available PSI Controlled Vocabularies
  • PSI-MS: Mass spectrometry data
  • MI: Molecular Interactions
  • PSI-MOD: Protein modifications (PTMs)
  • sepCV: Sample processing and separations
    controlled vocabulary
  • PI: Proteomics Informatics CV (accompanies
    analysisXML)
  • The four in bold are current and available from
    the OBO Foundry and the Ontology Lookup Service
    • http://obofoundry.org/
    • http://www.ebi.ac.uk/ols

23
PRIDE: The Proteomics Identifications Database
24
The origin: availability versus accessibility
  • Proteomics data is only made available as
    arbitrarily formatted PDF tables, carrying important
    limitations
    • Source data (mass spectra) are not made available
    • No peer review validation possible
    • Very little raw material for testing innovative
      in silico techniques is available
    • Automated (re-)processing of the identifications
      is impossible

25
Science Supported by PRIDE
26
Data In PRIDE
Current Statistics
  • 831,764 Protein Identifications
  • 4,947,353 Peptide Identifications (479,014
    unique)
  • 7,409,854 Mass spectra

Large Public Datasets
  • HUPO Plasma Proteome Project
  • HUPO Brain Proteome Project (including mass
    spectra)
  • HUPO Liver Proteome Project (including mass
    spectra)
  • Human Cerebrospinal Fluid (U Washington School of
    Medicine).
  • Cellzome data set

27
PRIDE Overview
Data Ownership Remains with Submitter: 84% Public, 16% Private
[Diagram: PRIDE architecture. Labels: Presentation, Data Submission, Persistence, Data Exchange; components: WEB, API, CORE]
Data Submission
  • Proteome Harvest Excel Data Submission Spreadsheet
  • Human Curation (Creation of XML in house)
  • Direct XML Submission Using the PRIDE Core API
Data Exchange
  • mzData XML: Peak Lists (MS), Instrumentation,
    Sample
  • PRIDE XML: Identifications of Proteins, Peptides,
    PTMs
28
A simplified schema of the PRIDE data store
Group-based access control system; reviewer access
29
THE LOOK OF PRIDE
30
PRIDE web interface: overview
31
PRIDE web interface: experiment and protein
32
PRIDE web interface: mass spectra
33
PRIDE web interface: project comparison
34
PRIDE BioMart: A Leap Forward in Query Capability
35
BioMart (http://www.biomart.org)
A query-oriented data management system.
Developed by the EBI and CSHL.
Powered by BioMart software
  • Central Server
  • Ensembl
  • HapMap
  • Dictybase
  • UniProt
  • Reactome
  • Array Express
  • Wormbase
  • Gramene
  • GermOnLine
  • DroSpeGe
  • PRIDE

36
BioMart and PRIDE
  • Perform powerful and fast queries across large,
    complex data sets
    • Specify simple or complex filters involving
      multiple attributes of the data
    • Specify precisely which attributes or columns
      of data are included in the output
    • Specify the format of the output, including
      • HTML table (with links)
      • Excel spreadsheet
      • Tab-delimited file
      • Comma separated format

37
Typical BioMart Usage
  • Step 1 (Dataset): Choose your dataset
  • Step 2 (Filters): Restrict your query
  • Step 3 (Attributes): Specify what information you
    want to include in the output
  • Step 4 (Results): Preview (including a simple
    count) and output or download the results in your
    chosen format.
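
The four steps above can be sketched on a small in-memory table. This is only an illustration of the dataset / filters / attributes / results pattern, not BioMart's own API; the rows, field names and protein labels are hypothetical.

```python
import csv
import io

# Step 1 (Dataset): hypothetical rows standing in for a chosen dataset.
DATASET = [
    {"experiment": "E1", "species": "Homo sapiens", "tissue": "plasma", "protein": "PROT_A"},
    {"experiment": "E2", "species": "Homo sapiens", "tissue": "brain", "protein": "PROT_B"},
    {"experiment": "E3", "species": "Mus musculus", "tissue": "brain", "protein": "PROT_C"},
]


def run_query(dataset, filters, attributes):
    """Apply equality filters, keep the requested columns, return (count, tsv_text)."""
    # Step 2 (Filters): keep rows that match every filter.
    rows = [r for r in dataset if all(r.get(k) == v for k, v in filters.items())]
    # Step 3 (Attributes): write only the requested columns, tab-delimited.
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(attributes)
    for r in rows:
        writer.writerow([r[a] for a in attributes])
    # Step 4 (Results): a simple count plus the formatted output.
    return len(rows), out.getvalue()


count, tsv = run_query(
    DATASET,
    filters={"species": "Homo sapiens", "tissue": "brain"},
    attributes=["experiment", "protein"],
)
print(count)
print(tsv)
```

Returning the count separately from the formatted text mirrors the preview-then-download flow in Step 4: the count can be checked before committing to a full export.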
39
PRIDE BioMart: Dataset Page
41
PRIDE BioMart: Defining a Complex Filter
43
PRIDE BioMart: Selecting Output Fields
45
PRIDE BioMart: Retrieving Results
46
PRIDE BioMart: Output to Microsoft Excel
47
The Ontology Lookup Service: Intelligent Query
for PRIDE and Beyond
48
Ontologies: more than just a list of terms
  • A vocabulary of terms (names for concepts)
    • Use stable identifiers for each concept
  • Definitions
    • Authoritative and unambiguous meaning for each
      concept and the context in which it should be
      used.
  • Defined logical relationships between terms
    • More complexity than a simple hierarchy. Child
      terms can be related to more than one parent and
      parent terms can have multiple children.
      Relationships themselves carry a significance.
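
The point about multiple parents and typed relationships can be sketched as a small graph traversal. The mini-ontology below is hypothetical (the EX: identifiers are invented), and is_a and part_of are used only as example relationship types.

```python
# Hypothetical mini-ontology: each term maps to a list of (relationship, parent) pairs.
ONTOLOGY = {
    "EX:1": [],                                        # root term
    "EX:2": [("is_a", "EX:1")],
    "EX:3": [("is_a", "EX:1")],
    "EX:4": [("is_a", "EX:2"), ("part_of", "EX:3")],   # two parents, two relationship types
}


def ancestors(term, ontology, relationships=("is_a", "part_of")):
    """Return every term reachable from `term` by following the given relationship types upward."""
    seen = set()
    stack = [term]
    while stack:
        for rel, parent in ontology[stack.pop()]:
            if rel in relationships and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("EX:4", ONTOLOGY)))
print(sorted(ancestors("EX:4", ONTOLOGY, relationships=("is_a",))))
```

Restricting the traversal to one relationship type changes the answer, which is what it means for the relationships themselves to carry significance: a query for "all kinds of X" should follow is_a but usually not part_of.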

49
http://www.ebi.ac.uk/ontology-lookup/
What is OLS?
  • A unified, single point of query for over 54
    ontologies (updated daily) and upwards of 530,000
    terms.
  • A tool that offers online and programmatic access
    to query ontologies about
    • Term names
    • Synonyms
    • Relationships
    • Annotations
    • Cross-references
  • Reusable code components to integrate such
    functionality in other projects

50
The Use of Controlled Vocabularies and Ontologies
in PRIDE
Required controlled vocabularies / ontologies are
used to define the search space
  • Species: Newt / NCBI Taxonomy ID
  • Tissue / organ / cell type: BRENDA Tissue
    ontology, Cell Type ontology
  • Sub-cellular component: GO
  • Disease: Human Disease (DOID)
  • Genotype: GO
  • Sample Processing: PSI Ontology
  • Mass Spectrometry: PSI-MS Ontology
  • Protein Modifications: PSI-MOD Ontology
  • Terms that fit nowhere else!? - PRIDE CV

OBO Ontologies
51
Ontology Lookup Service (OLS)
http://www.ebi.ac.uk/ols
52
The Protein Identifier Cross Reference Service:
Solving the Protein Accession Problem in PRIDE
53
Why do you need ID mapping?
  • Merging datasets to a common identifier space
  • Finding all aliases/synonyms for an identifier
    • (data integration submissions!)
  • Mapping from secondary IDs to more recent primary
    IDs
    • (data freshness)
  • Preparing data sets for specific tools
  • Querying in various primary databases
    • (data format requirements)

54
Protein identifier mapping is hard
  • The basic problem: the same protein sequence is
    referred to by multiple accession numbers
    assigned by multiple databases.
    • No universal identifier scheme
    • Redundant databases: multiple identifiers for
      the same sequence in the same database
    • Unstable identifiers (e.g. gi numbers)
    • Obsolete and deleted identifiers (hypothetical
      proteins)
    • Different production cycles for major databases
  • Tools exist, but are limited in their database
    and species coverage and in their usability and
    availability.

UniParc is a major component
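
One way to picture the sequence-centric idea behind this problem is to group accessions by a digest of their sequence, so that every accession pointing at the same sequence lands in one group. The sketch below is an illustration under stated assumptions, not PICR's implementation: the database names, accessions and toy sequences are invented, and SHA-256 is simply a convenient digest.

```python
import hashlib
from collections import defaultdict

# Hypothetical accessions and toy sequences, for illustration only.
ENTRIES = [
    ("DB_A", "A0001", "MKTAYIAKQR"),
    ("DB_B", "B-77", "MKTAYIAKQR"),   # same sequence, different database
    ("DB_A", "A0002", "MKTAYIAKQR"),  # redundant entry within one database
    ("DB_B", "B-78", "MSSHEGGKKK"),
]


def group_by_sequence(entries):
    """Group (database, accession) pairs by a digest of their sequence."""
    groups = defaultdict(list)
    for database, accession, sequence in entries:
        key = hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()
        groups[key].append((database, accession))
    return groups


def map_accession(accession, entries):
    """Return every (database, accession) that shares a sequence with `accession`."""
    for members in group_by_sequence(entries).values():
        if any(acc == accession for _, acc in members):
            return members
    return []


print(map_accession("B-77", ENTRIES))
```

Keying on the sequence rather than on any one database's identifier sidesteps the lack of a universal identifier scheme, although it does nothing for identifiers whose sequences have been deleted.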
55
PICR Home page
http://www.ebi.ac.uk/tools/picr
  • Limit search by taxonomy (pessimistic)
  • Submit accessions OR sequences (FASTA) with 500
    entry interactive limit (no batch limit)
  • Choose to return all mappings or only active ones
  • Select output format
  • Select one or many databases to map to in one
    request
  • Run search
56
PICR Result Page: simple view
57
PICR Result Page: detailed view
58
PICR Result Page: XLS view
59
PICR in PRIDE
60
The PRIDE Team
Overseen by Henning Hermjakob
Juan Antonio Vizcaino (Bioinformatician / Database Curator)
Lennart Martens (Started PRIDE when a student at the EBI, ProDac)
Richard Côté (OLS, PRIDE, Protein A/C Mapping tool)
Phil Jones (PRIDE, DAS)