Introduction of a Grid Approach in the Biotechnology Industry - PowerPoint PPT Presentation

Slides: 17
Provided by: dimi2
Transcript and Presenter's Notes



1
Introduction of a Grid Approach in the
Biotechnology Industry
  • Grid Day
  • University of Cyprus, Nicosia, March 26, 2003

2
BioGrid team, Laboratory of Software Engineering and
Internet Technologies, University of Cyprus
  • Head of the BioGrid team
  • Prof. George Papadopoulos george_at_cs.ucy.ac.cy
  • Research group
  • Aristos Stavrou cs98sa2_at_cs.ucy.ac.cy
  • Dr. Dimitrios Vogiatzis dimitrv_at_cs.ucy.ac.cy

3
BioGrid
  • 2-year trial IST project
  • Sep. 2002 to Sep. 2004
  • Project leader: ZooRobotics
  • Site: www.bio-grid.net

4
Scientific Objectives
  • BioGrid is a trial IST project with the following
    objectives
  • Development and integration of grid technologies
    so that researchers obtain information output
    efficiently
  • Three tools to be integrated:
  • PSIMAP, protein interaction discovery and
    visualisation
  • Space Explorer, gene and protein visualisation
  • Classification Server, text data mining
  • The tools access information resources:
  • Databases (protein, gene expression)
  • Unstructured data (PubMed abstracts)
  • Software tools (TOPS, for protein structural
    comparison)

5
Business Objectives
  • Information Grid for large proteomics and
    genomics databases
  • Efficient transnational enterprise collaboration
  • Faster time to market for biotech innovations
  • Software licensing model for BioGrid
  • Target customers:
  • Pharmaceutical companies: Aventis, GSK, Novartis,
    AkzoNobel
  • SMEs: KeyGene, Inpharmatica, LionBioScience,
    Avantium

6
Data sources and formats I
  • Biological objects are complex and have no
    standard format (XML, ASN.1, proprietary)
  • Literature is unstructured, so text data mining
    is necessary
  • Protein structure
  • Protein Data Bank (PDB)
  • Structural classification of proteins (SCOP,
    CATH)
  • Gene expression databases
  • Stanford Microarray Database (SMD)
  • Gene Expression Omnibus
  • Biomolecular interaction database (ASN.1 format)
  • PubMed (Biology oriented journal abstracts)
  • TOPS (protein comparison tool)

7
Data sources and formats II
LITERATURE DATABASE FORMAT, XML BASED (PUBMED)
  <PubmedArticle>
    <PMID>12649791</PMID>
    <Year>2003</Year>
    <Journal>
      <ISSN>0941-3790</ISSN>
    </Journal>
    <ArticleTitle>Nutritional ecology chances ... services to shape procedures</ArticleTitle>
    <Abstract>
      <AbstractText>Nutrition ecology is the science that studies the impacts of human nutrition on nutrition ecology in their projects.</AbstractText>
    </Abstract>
  </PubmedArticle>
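The simplified PubMed record above can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch that assumes the flat element layout shown on the slide (`PMID`, `Year`, `Journal/ISSN`, `ArticleTitle`, `Abstract/AbstractText`); full PubMed exports nest these elements more deeply, so the paths would need adjusting. The helper name `parse_pubmed_article` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified record using the element names shown on the slide (not the full PubMed schema).
SAMPLE = """<PubmedArticle>
  <PMID>12649791</PMID>
  <Year>2003</Year>
  <Journal><ISSN>0941-3790</ISSN></Journal>
  <ArticleTitle>Nutritional ecology chances ... services to shape procedures</ArticleTitle>
  <Abstract>
    <AbstractText>Nutrition ecology is the science that studies the impacts
    of human nutrition on nutrition ecology in their projects.</AbstractText>
  </Abstract>
</PubmedArticle>"""


def parse_pubmed_article(xml_text):
    """Hypothetical helper: flatten one simplified PubMed record into a dict."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        # findtext returns None when the element is missing; collapse internal whitespace.
        value = root.findtext(path)
        return " ".join(value.split()) if value is not None else None

    year = text_of("Year")
    return {
        "pmid": text_of("PMID"),
        "year": int(year) if year else None,
        "issn": text_of("Journal/ISSN"),
        "title": text_of("ArticleTitle"),
        "abstract": text_of("Abstract/AbstractText"),
    }


record = parse_pubmed_article(SAMPLE)
print(record["pmid"], record["year"], record["issn"])
```

Once flattened like this, a record can be handed to a text-mining tool without that tool knowing the original XML layout.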
PROTEIN DATABASE FORMAT (PDB): fixed column ranges denote
attributes; e.g. in the ATOM records near the end of the
file, columns 31-54 hold the atomic coordinates (a 3D vector)
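Because the PDB format is column based, coordinates are extracted by fixed slices rather than by splitting on whitespace. In the PDB specification, x, y and z occupy columns 31-38, 39-46 and 47-54 of ATOM and HETATM records. The sketch below assumes well-formed records; the input lines are illustrative, with made-up coordinates, and the function name `atom_coordinates` is hypothetical.

```python
# Illustrative input: a HEADER line plus one ATOM record with made-up coordinates.
SAMPLE_LINES = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00",
]


def atom_coordinates(pdb_lines):
    """Return (x, y, z) for every ATOM/HETATM record."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # 1-based columns 31-38, 39-46 and 47-54 become 0-based slices.
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords


print(atom_coordinates(SAMPLE_LINES))  # → [(11.104, 6.134, -6.504)]
```

Slicing by column is what makes the format robust to adjacent fields running together (e.g. large negative coordinates), which whitespace splitting would not handle.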
GENE EXPRESSION FORMAT (SMD)
8
Tool 1: Space Explorer
  • High-dimensional data are mapped into a 1-, 2- or
    3-dimensional subspace
  • Interactive, web-enabled virtual reality
    environment
  • 3D visualisation complemented by hierarchical
    clustering
  • Clustering results are visualised as dendrograms
    linked to the scatter plots
  • Space Explorer facilitates visual data mining.
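The slide does not say which hierarchical clustering method Space Explorer uses. As a minimal sketch, assuming single-linkage agglomerative clustering with Euclidean distance, the merge steps below are exactly what a dendrogram draws: each step joins the two closest clusters at a given height. The function name `single_linkage` is hypothetical, and this naive version recomputes all pairwise distances on every step, so it only suits small inputs.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Agglomerative clustering with single linkage and Euclidean distance.

    Returns the merge steps (cluster_a, cluster_b, distance); new clusters get
    ids len(points), len(points) + 1, ... in merge order, as in a dendrogram.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> point indices
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: distance between the closest members of the two clusters.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Four made-up 2D points forming two tight pairs.
merges = single_linkage([(0, 0), (0, 1), (5, 5), (5, 6)])
print(merges[:2])  # → [(0, 1, 1.0), (2, 3, 1.0)]
```

The two tight pairs merge first at height 1.0; the final merge joins the pairs at their closest distance, which sets the height of the dendrogram's root.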

9
Tool 2: PSIMAP
  • PSIMAP is the first complete protein structural
    domain interaction map
  • It shows which kinds of protein domains are found
    to interact structurally
  • PSIMAP uses specific shapes reflecting the types
    of protein domains and their interaction partners
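PSIMAP's internal data model is not described here. A domain interaction map can, however, be sketched generically as an undirected graph in which each domain points to the set of domains it interacts with. The class name `InteractionMap` and the domain labels are hypothetical placeholders, not PSIMAP data.

```python
from collections import defaultdict


class InteractionMap:
    """Undirected map of structural domain interactions (generic sketch)."""

    def __init__(self):
        self._partners = defaultdict(set)  # domain -> set of interacting domains

    def add_interaction(self, domain_a, domain_b):
        # Store both directions so that lookups are symmetric.
        self._partners[domain_a].add(domain_b)
        self._partners[domain_b].add(domain_a)

    def partners(self, domain):
        # .get avoids creating an empty entry for unknown domains.
        return sorted(self._partners.get(domain, set()))

    def most_connected(self):
        # Domain with the most distinct partners; ties go to the alphabetically first name.
        return max(sorted(self._partners), key=lambda d: len(self._partners[d]))


# Placeholder domain labels, not real PSIMAP data.
imap = InteractionMap()
for a, b in [("domain_A", "domain_B"), ("domain_A", "domain_C"),
             ("domain_B", "domain_C"), ("domain_A", "domain_D")]:
    imap.add_interaction(a, b)
print(imap.partners("domain_A"), imap.most_connected())
```

Storing sets per domain keeps duplicate observations of the same interaction from inflating a domain's partner count.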

10
Tool 3: Classification Server
  • Builds a custom topic hierarchy from a sample of
    documents, producing a global, consistent view
  • Automatic classification of textual items into a
    hierarchy of topics
  • Classifications are output in XML for flexible,
    standard data interchange
  • APIs for easy integration with existing
    applications and new services
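The slide does not describe how the Classification Server assigns topics. The sketch below substitutes a deliberately simple keyword-matching classifier over a hand-built two-level hierarchy, only to show the shape of "text in, topic paths out as XML". The topics, keywords, function names and XML element names (`classification`, `topic`, `path`) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hierarchy: parent topic -> {child topic: trigger keywords}
HIERARCHY = {
    "Proteomics": {
        "Protein structure": {"fold", "domain", "crystal"},
        "Protein interaction": {"binding", "interaction", "complex"},
    },
    "Genomics": {
        "Gene expression": {"microarray", "expression", "transcript"},
    },
}


def classify(text):
    """Return every (parent, child) topic whose keywords occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [(parent, child)
            for parent, children in HIERARCHY.items()
            for child, keywords in children.items()
            if words & keywords]


def classification_xml(doc_id, text):
    """Serialise the classification of one document as XML."""
    root = ET.Element("classification", {"document": doc_id})
    for parent, child in classify(text):
        ET.SubElement(root, "topic", {"path": f"{parent}/{child}"})
    return ET.tostring(root, encoding="unicode")


xml_out = classification_xml("doc-1", "Microarray expression profiles of binding partners")
print(xml_out)
```

A real text-mining classifier would learn the hierarchy and term weights from the document sample; the XML output step would stay the same.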

11
Current situation in Biotechnology
  • Genetic sequencing databases
  • More than 10,000,000 entries in GenBank
  • Protein databases: >1,000,000 entries (PIR)
  • Tools: >500 online
  • PubMed: 11,000,000 abstracts online
  • There is no universal access to data objects
  • It is not possible to pass data automatically
    from gene expression → protein → protein families
    → visualisation

12
BioGrid project overview
13
Requirements for Grid I
Scenario: genes → expression → proteins →
relevant literature
  • Unified view of the data objects (XML) → unified
    interfaces for users and applications
  • Co-operation of the tools (gene expression,
    protein interaction, connection to literature)
  • Passing data objects to the three tools
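The requirement for a unified XML view of data objects can be illustrated with a small wrapper that places records from different sources into one common envelope before they are passed to a tool. The envelope layout (`dataObject`, `field`, and the `source`, `type`, `name` attributes) and the record values are hypothetical; the slides do not define BioGrid's actual schema.

```python
import xml.etree.ElementTree as ET


def wrap(source, obj_type, fields):
    """Place a record from any source in one common (hypothetical) XML envelope."""
    envelope = ET.Element("dataObject", {"source": source, "type": obj_type})
    for name, value in fields.items():
        field = ET.SubElement(envelope, "field", {"name": name})
        field.text = str(value)
    return envelope


def unwrap(envelope):
    """What a consuming tool reads back, independent of the original source format."""
    fields = {f.get("name"): f.text for f in envelope.findall("field")}
    return envelope.get("source"), envelope.get("type"), fields


# Made-up gene expression record, serialised as it might be sent to a tool,
# then parsed again on the receiving side.
message = ET.tostring(wrap("SMD", "gene_expression", {"gene": "geneX", "log_ratio": 1.5}),
                      encoding="unicode")
received = unwrap(ET.fromstring(message))
print(received)  # → ('SMD', 'gene_expression', {'gene': 'geneX', 'log_ratio': '1.5'})
```

Values come back as strings, so a consuming tool still needs per-type conversion; a real schema would also declare field types.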

Lower-level functionality
  • Transparent addition of new resources
  • Continuous operation when adding data sources or
    in the event of component failure
  • Functionality on different platforms (Unix,
    Windows)
  • Possibility of local caching and network traffic
    regulation (to be determined in the evaluation
    phase)
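Transparent addition of resources, continued operation despite component failure, and local caching can be combined in a registry that tries each replica of a resource in turn and caches successful answers. This is a generic sketch under stated assumptions: the class name `ResourceRegistry`, the replica callables and the unbounded in-memory cache are illustrative choices, not part of the BioGrid design, which leaves caching to the evaluation phase.

```python
class ResourceRegistry:
    """Hypothetical registry with replica failover and a simple local cache."""

    def __init__(self):
        self._replicas = {}  # resource name -> list of fetch callables
        self._cache = {}     # (resource name, key) -> cached result

    def register(self, name, fetch):
        # Adding a replica at runtime does not disturb existing callers.
        self._replicas.setdefault(name, []).append(fetch)

    def query(self, name, key):
        if (name, key) in self._cache:
            return self._cache[(name, key)]  # local cache: no network traffic
        errors = []
        for fetch in self._replicas.get(name, []):
            try:
                result = fetch(key)
            except Exception as exc:  # failed component: try the next replica
                errors.append(exc)
                continue
            self._cache[(name, key)] = result
            return result
        raise LookupError(f"all replicas of {name!r} failed: {errors}")


calls = []


def flaky_replica(key):
    raise ConnectionError("replica unavailable")


def healthy_replica(key):
    calls.append(key)
    return f"record for {key}"


registry = ResourceRegistry()
registry.register("literature", flaky_replica)
registry.register("literature", healthy_replica)
first = registry.query("literature", "12649791")
second = registry.query("literature", "12649791")  # served from the local cache
```

An unbounded dictionary is the simplest possible cache; a deployment would need eviction and invalidation when sources are updated.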

14
Requirements for Grid II
The requirements lead to the following Grid types:
  • Information grid: access to information sources
    and to tools for analysis and visualisation
  • Knowledge grid: data mining and machine learning
    for filtering literature abstracts

15
Schematic design integrating the 3 platforms and data
sources
[Figure: schematic with components PSIMAP, Space Explorer, Classification Server, XML, GRID and Text Mining]
16
BioGrid as part of the Health Grids
  • Health Grid projects
  • MammoGrid (databases of mammograms)
  • GEMSS (Grid-Enabled Medical Simulation Services):
    access to advanced simulation and image
    processing services (www.gemss.de)
  • BioMOBY (discovery and distribution of bio-data
    on the web)
  • TAMBIS: single user interface to bio-data
  • SRS: access to unstructured data
  • Grid technologies under investigation
  • Globus
  • Legion by Avaki
  • SDSC Storage Resource Broker