Grid-Enablement of Protein Information Resource (PIR): caBIG ICR Face-to-Face Workspace, Columbia University, Irving Cancer Research Center, January 26-27, 2006

1
Grid-Enablement of Protein Information Resource
(PIR) caBIG ICR Face-to-Face Workspace
Columbia University, Irving Cancer Research
Center, January 26-27, 2006
  • Baris Ethem Suzek
  • Georgetown University
  • Lombardi Cancer Center, PIR
  • bes23@georgetown.edu

  • Craig Street
  • University of Pennsylvania
  • Biomedical Informatics Facility
  • street@mail.med.upenn.edu
2
Outline
  • Introduction: PIR/BMIF
  • Data Model
  • Overview of the Grid-Enablement of Protein
    Information Resource
  • Demo/Screenshots: API/caGRID Browser
  • Acknowledgements

3
Introduction - PIR
  • Protein Information Resource (PIR): Integrated
    Protein Informatics Resource for
    Genomic/Proteomic Research
  • UniProt (Universal Protein Resource): Central
    Resource of Protein Sequence and Function
  • PIRSF Family Classification System: Protein
    Classification and Functional Annotation
  • iProClass Integrated Protein Knowledgebase: Data
    Integration and Functional Analysis

http://pir.georgetown.edu
4
Introduction - PIR
  • UniProt (Universal Protein Resource): Central
    Resource of Protein Sequence and Function
  • International Consortium:
  • PIR at GUMC
  • European Bioinformatics Institute (EBI)
  • Swiss Institute of Bioinformatics (SIB)
  • Unifies the PIR-PSD, Swiss-Prot, and TrEMBL
    protein sequence databases

http://www.uniprot.org
5
Introduction - PIR
  • UniProt Databases

Primary data source for Grid-Enablement of PIR
6
Project Overview
  • The Grid-Enablement of PIR project is a data
    service
  • Developer: PIR @ Georgetown University
  • Adopter: BMIF @ University of Pennsylvania
  • All the objects in our model are exposed to the
    grid
  • The API is developed using caCORE SDK 1.0.3.1
  • All the PIR and UniProt databases are public, so
    no security layers are implemented

7
Data Model
  • Protein/Gene related objects

8
Data Model
  • Annotation-related objects: Protein Features

9
Data Model
  • Taxonomy related objects (Proposed as Taxonomy
    CDE)

10
Demo and/or Screenshots - API (Example 1)
  • Purpose of the script: rudimentary ID mapper
  • Find the corresponding PIR database
    cross-reference ID(s) that match EMBL M15034
  • Test script:

    public void Demo1_DBXR2DBXR_Reports() {
        try {
            // Criteria object: the EMBL cross-reference to start from
            DatabaseCrossReference source = new DatabaseCrossReferenceImpl();
            final String id = "M15034";
            source.setCrossReferenceId(id);
            source.setDataSourceName("EMBL");
            final String answer = "PIR";
            log.info("Find all PIR Database Cross Reference ids that match EMBL id " + id);
            try {
                // Search path: target class first, then the class to traverse
                String path = "edu.georgetown.pir.domain.DatabaseCrossReference,"
                            + "edu.georgetown.pir.domain.Protein";
                List resultList = appService.search(path, source);
                log.info("Size " + resultList.size());
                // ... (the slide excerpt ends here)
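
The lookup in Example 1 goes from one cross-reference to its protein and back out to that protein's other cross-references. A minimal in-memory sketch of that traversal follows, independent of the caCORE API. The `Xref` class, the `mapIds` helper, and the sample values `PROTEIN_1`, `PROTEIN_2`, `PIR_ID_1`, and `PIR_ID_2` are hypothetical placeholders, not PIR data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IdMapperSketch {
    // Hypothetical stand-in for a database cross-reference attached to a protein.
    static final class Xref {
        final String dataSourceName;
        final String crossReferenceId;
        final String proteinKey; // identifies the owning protein in this sketch only

        Xref(String dataSourceName, String crossReferenceId, String proteinKey) {
            this.dataSourceName = dataSourceName;
            this.crossReferenceId = crossReferenceId;
            this.proteinKey = proteinKey;
        }
    }

    // Returns the IDs in targetSource that share a protein with (source, id).
    static List<String> mapIds(List<Xref> xrefs, String source, String id, String targetSource) {
        List<String> result = new ArrayList<>();
        for (Xref from : xrefs) {
            if (!from.dataSourceName.equals(source) || !from.crossReferenceId.equals(id)) {
                continue;
            }
            // Step through the shared protein to cross-references in the target database.
            for (Xref to : xrefs) {
                if (to.proteinKey.equals(from.proteinKey)
                        && to.dataSourceName.equals(targetSource)) {
                    result.add(to.crossReferenceId);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder data: only the EMBL id M15034 comes from the slide.
        List<Xref> xrefs = Arrays.asList(
            new Xref("EMBL", "M15034", "PROTEIN_1"),
            new Xref("PIR", "PIR_ID_1", "PROTEIN_1"),
            new Xref("PIR", "PIR_ID_2", "PROTEIN_2"));
        System.out.println(mapIds(xrefs, "EMBL", "M15034", "PIR"));
    }
}
```

In the grid service the same hop is expressed by the search path (target class, then the class to traverse) rather than by explicit loops.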

11
Demo and/or Screenshots - API (Example 1)
  • Expected Response

12
Demo and/or Screenshots - API (Example 2)
  • Purpose of the script: return all organisms
    containing your favorite protein of interest
  • Find all organisms having a protein named
    Transferrin receptor protein 1
  • Test script:

    public void Demo2_TestProtein2Organism() {
        // Criteria object: the protein name to search for
        ProteinName source = new ProteinNameImpl();
        final String id = "Transferrin receptor protein 1";
        source.setValue(id);
        log.info("Find all Organisms which have Protein Name " + id);
        log.info("COMMON NAME" + "\t\t" + "SCIENTIFIC NAME");
        try {
            String path = "edu.georgetown.pir.domain.Organism,"
                        + "edu.georgetown.pir.domain.Protein,";
            List resultList = appService.search(path, source);
            for (Iterator it = resultList.iterator(); it.hasNext();) {
                Organism organism = (OrganismImpl) it.next();
                log.info(organism.getCommonName()
                         + "\t\t" + organism.getScientificName());
                // ... (the slide excerpt ends here)

13
Demo and/or Screenshots - API (Example 2)
  • Expected Response

14
Demo and/or Screenshots - API (Example 3)
  • Purpose of the script: identify a protein with a
    known molecular weight from a proteomics
    experiment
  • Find all proteins with a molecular weight of
    26266 Daltons
  • Test script:

    public void Demo3_ProteinMeolcularWeight() {
        try {
            // Criteria object: a sequence with the measured molecular weight
            ProteinSequence object = new ProteinSequenceImpl();
            final Integer id = new Integer(26266);
            object.setMolecularWeightInDaltons(id);
            log.info("Find all Proteins which have a Molecular Weight in Daltons of " + id + ".");
            try {
                List resultList = appService.search(Protein.class, object);
                for (Iterator it = resultList.iterator(); it.hasNext();) {
                    Protein protein = (ProteinImpl) it.next();
                    // Fetch the sequences of each matching protein to report the weight
                    List seqs = appService.search(ProteinSequence.class, protein);
                    for (Iterator i = seqs.iterator(); i.hasNext();) {
                        ProteinSequence seq = (ProteinSequence) i.next();
                        log.info(protein.getUniprotkbEntryName() + "\t\t"
                                 + seq.getMolecularWeightInDaltons());
                        // ... (the slide excerpt ends here)
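
Examples 1 to 3 each pass a partially populated object to `appService.search`, so the populated fields appear to act as the search criteria. A rough sketch of that query-by-example idea over an in-memory list is shown below. The `SequenceRecord` class, the `matches` and `search` helpers, and the entry names `ENTRY_A` and `ENTRY_B` are hypothetical; this illustrates the pattern only and is not how caCORE implements `search`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryByExampleSketch {
    // Hypothetical record; a null field in a template means "no constraint".
    static final class SequenceRecord {
        final String entryName;
        final Integer molecularWeightInDaltons;

        SequenceRecord(String entryName, Integer molecularWeightInDaltons) {
            this.entryName = entryName;
            this.molecularWeightInDaltons = molecularWeightInDaltons;
        }
    }

    // A candidate matches when every non-null template field equals its own field.
    static boolean matches(SequenceRecord template, SequenceRecord candidate) {
        boolean nameOk = template.entryName == null
            || template.entryName.equals(candidate.entryName);
        boolean weightOk = template.molecularWeightInDaltons == null
            || template.molecularWeightInDaltons.equals(candidate.molecularWeightInDaltons);
        return nameOk && weightOk;
    }

    // Linear scan that keeps the records matching the template.
    static List<SequenceRecord> search(List<SequenceRecord> all, SequenceRecord template) {
        List<SequenceRecord> hits = new ArrayList<>();
        for (SequenceRecord r : all) {
            if (matches(template, r)) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Placeholder records; only the weight 26266 comes from the slide.
        List<SequenceRecord> all = Arrays.asList(
            new SequenceRecord("ENTRY_A", 26266),
            new SequenceRecord("ENTRY_B", 51000));
        // Only the molecular weight is set, so only that field constrains the search.
        SequenceRecord template = new SequenceRecord(null, 26266);
        for (SequenceRecord hit : search(all, template)) {
            System.out.println(hit.entryName + "\t\t" + hit.molecularWeightInDaltons);
        }
    }
}
```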

15
Demo and/or Screenshots - API (Example 3)
  • Expected Response

16
Demo and/or Screenshots - caGRID Browser
  • Retrieve the proteins for gene BRCA2 (Breast
    Cancer Gene 2)

    <caBIGXMLQuery name="testGene2Protein">
      <Target name="edu.georgetown.pir.domain.Protein">
        <Objects name="edu.georgetown.pir.domain.Gene">
          <Property name="name" predicate="equal" value="BRCA2"/>
        </Objects>
      </Target>
    </caBIGXMLQuery>
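
The query document above has a fixed nesting: a `Target` class, the `Objects` class to filter on, and a `Property` criterion. A minimal sketch of assembling it with the JDK's DOM API follows, assuming the element and attribute names shown on the slide. The `buildQuery` helper and its parameters are hypothetical and not part of caGrid.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class QueryBuilderSketch {
    // Builds a single-criterion query with the nesting used on the slide.
    static String buildQuery(String queryName, String targetClass, String objectClass,
                             String property, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            Element query = doc.createElement("caBIGXMLQuery");
            query.setAttribute("name", queryName);
            doc.appendChild(query);

            // Class of the objects to return.
            Element target = doc.createElement("Target");
            target.setAttribute("name", targetClass);
            query.appendChild(target);

            // Class that the criterion applies to.
            Element objects = doc.createElement("Objects");
            objects.setAttribute("name", objectClass);
            target.appendChild(objects);

            Element prop = doc.createElement("Property");
            prop.setAttribute("name", property);
            prop.setAttribute("predicate", "equal");
            prop.setAttribute("value", value);
            objects.appendChild(prop);

            // Serialize the DOM tree to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build query", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("testGene2Protein",
            "edu.georgetown.pir.domain.Protein",
            "edu.georgetown.pir.domain.Gene", "name", "BRCA2"));
    }
}
```

Building the document through DOM rather than string concatenation keeps attribute values escaped, which matters once the criterion value comes from user input.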

17
Demo and/or Screenshots - caGRID Browser
  • Retrieve the proteins for gene BRCA2 (Breast
    Cancer Gene 2): RESPONSE

    <gridDataServiceResponse xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types">
    ...
    <uniprotkbPrimaryAccession>O35923</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>BRCA2_RAT</uniprotkbEntryName>
    <uniprotkbPrimaryAccession>P51587</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>BRCA2_HUMAN</uniprotkbEntryName>
    ...
    <value>MPIGSKERPTFFEIFKTRCNKADLGPISLNWFEELSSEAPPYNSEPAEES
    EHKNNNYEPNLFKTPQRKPSYNQLASTPIIFKEQGLTLPLYQSPVKELDK
    ...
    <uniprotkbPrimaryAccession>P97929</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>BRCA2_MOUSE</uniprotkbEntryName>
    ...
    </edu.georgetown.pir.domain.impl.ProteinImpl></result>
    </gridDataServiceResponse>

18
Demo and/or Screenshots - caGRID Browser
  • Find all the proteins that contain the domain
    BRCA2 repeat (PFAM: PF00634, a domain in Breast
    cancer type 2 susceptibility protein)

    <caBIGXMLQuery name="testPfam2Protein">
      <Target name="edu.georgetown.pir.domain.Protein">
        <Objects name="edu.georgetown.pir.domain.DatabaseCrossReference">
          <Property name="crossReferenceId" predicate="equal" value="PF00634"/>
        </Objects>
      </Target>
    </caBIGXMLQuery>

19
Demo and/or Screenshots - caGRID Browser
  • Find all the proteins that contain the domain
    BRCA2 repeat (PFAM: PF00634, a domain in Breast
    cancer type 2 susceptibility protein): RESPONSE

    <gridDataServiceResponse xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types">
    ...
    <uniprotkbPrimaryAccession>O35923</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>BRCA2_RAT</uniprotkbEntryName>
    ...
    <uniprotkbPrimaryAccession>P51587</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>BRCA2_HUMAN</uniprotkbEntryName>
    ...
    <uniprotkbPrimaryAccession>P70098</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>Q5TBJ7_HUMAN</uniprotkbEntryName>
    ...
    <uniprotkbPrimaryAccession>Q7RG20</uniprotkbPrimaryAccession>
    <uniprotkbEntryName>Q7RG20_PLAYO</uniprotkbEntryName>
    ...
    </gridDataServiceResponse>

20
Demo and/or Screenshots - caGRID Browser
  • ID mapping: find all the database
    cross-references from various databases
    corresponding to RefSeq accession NP_061820

    <caBIGXMLQuery name="testIDMapping">
      <Target name="edu.georgetown.pir.domain.DatabaseCrossReference" path="edu.georgetown.pir.domain.Protein">
        <Objects name="edu.georgetown.pir.domain.DatabaseCrossReference">
          <Property name="dataSourceName" predicate="equal" value="RefSeq"/>
          <Property name="crossReferenceId" predicate="equal" value="NP_061820"/>
        </Objects>
      </Target>
    </caBIGXMLQuery>

21
Demo and/or Screenshots - caGRID Browser
  • ID mapping: find all the database
    cross-references from various databases
    corresponding to RefSeq accession NP_061820:
    RESPONSE

    <gridDataServiceResponse xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types">
    ...
    <dataSourceName>EMBL</dataSourceName>
    <crossReferenceId>M22877</crossReferenceId>
    ...
    <dataSourceName>PIR</dataSourceName>
    <crossReferenceId>CCHU</crossReferenceId>
    ...
    <dataSourceName>GenPept</dataSourceName>
    <crossReferenceId>AAH09579</crossReferenceId>
    ...
    <dataSourceName>NCBI GI</dataSourceName>
    <crossReferenceId>14250124</crossReferenceId>
    ...
    </gridDataServiceResponse>
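
Reading such a response back into source/ID pairs might look like the following sketch. It assumes the `dataSourceName` and `crossReferenceId` elements occur in matching document order, as in the excerpt. The `response` and `xref` wrapper elements in the sample input are placeholders, since the slide elides the real enclosing structure, and the `parseXrefs` helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class ResponseParserSketch {
    // Pairs the i-th <dataSourceName> with the i-th <crossReferenceId>
    // and returns each pair as "SOURCE:ID".
    static List<String> parseXrefs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList sources = doc.getElementsByTagName("dataSourceName");
            NodeList ids = doc.getElementsByTagName("crossReferenceId");
            List<String> result = new ArrayList<>();
            int n = Math.min(sources.getLength(), ids.getLength());
            for (int i = 0; i < n; i++) {
                result.add(sources.item(i).getTextContent()
                    + ":" + ids.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        // Wrapper elements are assumed; the values are taken from the slide.
        String xml = "<response>"
            + "<xref><dataSourceName>EMBL</dataSourceName>"
            + "<crossReferenceId>M22877</crossReferenceId></xref>"
            + "<xref><dataSourceName>PIR</dataSourceName>"
            + "<crossReferenceId>CCHU</crossReferenceId></xref>"
            + "</response>";
        for (String pair : parseXrefs(xml)) {
            System.out.println(pair);
        }
    }
}
```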

22
Acknowledgements
  • 3rd Millennium: Juli Klemm, Brian Davis
  • BAH: Mark Adams, Arumani Manisundaram
  • NCI Center for Bioinformatics: Peter Covitz,
    George Komatsoulis, Avinash Shanbhag, Tara
    Akhavan, William Sanchez, Manav Kher, Jijin
    Yan, Clarie Wolfe, Nicole Thomas, Himanso
    Sahni, Jennifer Zeng, Nafis Zebarjani
  • Georgetown University: Cathy Wu (Faculty
    Lead), Hongzhan Huang (Chief Architect), Peter
    McGarvey (Domain Expert), Baris Suzek (Project
    Manager), Sehee Chung (SW Developer), Hsing-Kuo
    Hua (DB Developer), Jess Cannata (System
    Admin), Robert Clarke, Steve Moore, Arnie Miles
  • Panther Informatics: Brian Gilman (Consultant)
  • University of Pennsylvania - BMIF: David
    Fenstermacher, Craig Street, Vishal Nayak,
    Casey Overby