Dimitris Dimitropoulos: MSD Search Database

1
Dimitris Dimitropoulos
MSD Search Database
Database Replication
2
What is the MSDSD database?
  • A relational database primarily developed in
    Oracle that stores the cleaned up PDB together
    with reference and derived information
  • Simple for the average biologist with no
    database experience to understand, and fast in
    performance even for the database non-expert
  • Originates from the internal MSD deposition
    database that ensures accuracy and data integrity
  • In MSDSD, naming and other summary information
    is repeated from each level of the hierarchy to
    the next, so that the data stays close to the
    familiar PDB data
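
The repetition of names down the hierarchy can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the table and column names (`residue`, `entry_code`, `chain_code`, `residue_name`, `serial`) and the sample rows are hypothetical, not the MSDSD schema, and SQLite stands in for Oracle. The point is only that a residue-level query can filter by entry and chain names without joining parent tables.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the real MSDSD tables.
# Entry and chain names are repeated on every residue row.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE residue (
        residue_id   INTEGER PRIMARY KEY,  -- abstract identifier
        entry_code   TEXT,                 -- repeated from the entry level
        chain_code   TEXT,                 -- repeated from the chain level
        residue_name TEXT,
        serial       INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO residue (entry_code, chain_code, residue_name, serial) "
    "VALUES (?, ?, ?, ?)",
    [
        ("1abc", "A", "GLY", 1),
        ("1abc", "A", "HIS", 2),
        ("1abc", "B", "HIS", 1),
        ("2xyz", "A", "HIS", 7),
    ],
)


def residues_named(conn, entry_code, residue_name):
    """Return (chain_code, serial) pairs using a single-table query."""
    rows = conn.execute(
        "SELECT chain_code, serial FROM residue "
        "WHERE entry_code = ? AND residue_name = ? "
        "ORDER BY chain_code, serial",
        (entry_code, residue_name),
    )
    return rows.fetchall()


print(residues_named(conn, "1abc", "HIS"))
```

The cost of this design is redundant storage; the benefit is that simple questions stay simple queries.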

3
Main MSDSD features
  • The symmetry has been expanded and the
    information of the quaternary biological
    assemblies is directly available
  • External information such as binding sites and
    secondary structure has been derived at the
    assembly level
  • The original PDB asymmetric unit is also
    available
  • Includes and provides clear database relations
    with the ligands data mart and other reference
    information
  • Includes information from and cross-references
    to external databases (NCBI taxonomy, UniProt,
    SCOP, etc.)

4
MSDSD Conventions
  • Model nomenclature focuses on simplicity rather
    than terminological accuracy
  • Different sets of identifiers for different
    purposes:
  • Abstract database identifiers (e.g. CHAIN_ID)
  • Original PDB identifiers (e.g. PDB_CHAIN_CODE)
  • Cleaned-up official identifiers (e.g. CHAIN_CODE)
  • Several other model conventions to avoid user
    confusion and improve model clarity
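
A minimal sketch of how the three identifier sets might sit side by side. Only the column names CHAIN_ID, PDB_CHAIN_CODE and CHAIN_CODE come from the slide; the table name `chain`, the ENTRY_CODE column, the sample rows and the idea that a blank original code is cleaned up to "A" are assumptions for illustration, and SQLite stands in for Oracle.

```python
import sqlite3

# Only the three identifier column names come from the slide; the table
# name, ENTRY_CODE and the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE chain (
        CHAIN_ID       INTEGER PRIMARY KEY,  -- abstract database identifier
        ENTRY_CODE     TEXT,                 -- hypothetical parent entry name
        PDB_CHAIN_CODE TEXT,                 -- code as in the original PDB file
        CHAIN_CODE     TEXT                  -- cleaned-up code
    )
    """
)
conn.executemany(
    "INSERT INTO chain (CHAIN_ID, ENTRY_CODE, PDB_CHAIN_CODE, CHAIN_CODE) "
    "VALUES (?, ?, ?, ?)",
    [
        (101, "1abc", " ", "A"),  # assumed case: blank code named on clean-up
        (102, "1abc", "B", "B"),
    ],
)


def chain_ids(conn, entry_code, code, use_original=False):
    """Look up abstract CHAIN_IDs by the original or the cleaned-up code."""
    # The column name is chosen from two fixed strings, never from user input.
    column = "PDB_CHAIN_CODE" if use_original else "CHAIN_CODE"
    rows = conn.execute(
        f"SELECT CHAIN_ID FROM chain "
        f"WHERE ENTRY_CODE = ? AND {column} = ? ORDER BY CHAIN_ID",
        (entry_code, code),
    )
    return [row[0] for row in rows]


print(chain_ids(conn, "1abc", "A"))                     # cleaned-up code
print(chain_ids(conn, "1abc", " ", use_original=True))  # original PDB code
```

Both lookups resolve to the same abstract identifier, which is what joins between tables would use.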

5
Managing Data
  • The information available in the MSD database is
    organised in application areas (data marts)
  • Users may replicate only the data marts that
    they are interested in
  • Some data marts are quite valuable and still
    small enough to be used on desktop systems as in
    the demonstration
  • The data marts are also loosely interrelated and
    can be synchronised independently

6
DATA MARTS
  • Structure Data
  • Descriptions
  • Secondary Structure
  • Taxonomy
  • Ligands
  • Experimental details
  • Citations
  • Mapping to SWISS-PROT, SCOP, CATH, PFAM
  • Active Sites
  • Structural-Sequence alignment

7
Building The Search Database
[Diagram. Labels: Deposition, Deposition Copy, Pre-Search-Database, Replication, Transformation, Post Transformation, Search-Database, distribution, Replication]
8
Database Documentation
9
Database Documentation
http://www.ebi.ac.uk/msd-srv/docs/dbdoc/
10
(No Transcript)
11
Database Replication
  • By definition, the role of EBI is to collect and
    distribute biological information
  • Enables distribution and sharing of the enhanced,
    clean PDB in the powerful environment of a
    relational database
  • Researchers avoid the overhead of data
    management by adopting a consistent,
    ready-to-use database
  • Updates are applied in a semi-automated fashion

12
Why Replicate?
  • To take advantage of local hardware and CPU
    time: some operations are simply not possible
    on-line
  • To avoid continuous dependency on network and
    EBI resources
  • To extend or merge information with other
    databases or data sources
  • To utilise the information in new innovative
    ways
  • To ensure confidentiality of research

13
Replication Overview
[Diagram. Labels: SOURCE DATABASE, MSD SEARCH DATABASE, Database Copy, Oracle Export, SQL Loader Files, Incremental Replication]
14
Components of the Replication
  • Database copy on Sun Solaris
  • Schema export-import plus SQL Loader files for
    creating the initial database for Oracle on
    other platforms
  • Possibility to import into non-Oracle databases
    (MySQL)
  • Periodic synchronisation with the MSD master
    database using incremental scripts for all
    Oracle platforms
  • Use of two schemas: the main search database and
    an incremental one
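
One plausible reading of the two-schema arrangement is that new data lands in the incremental schema first and is then merged into the main one. The sketch below illustrates that reading only; it is not the MSD implementation. The `ligand` table, its columns and the rows are hypothetical, and SQLite's `ATTACH DATABASE` stands in for a second Oracle schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # plays the main schema
conn.execute("ATTACH DATABASE ':memory:' AS incr")  # plays the incremental schema

# Hypothetical table, created identically in both schemas.
ddl = "CREATE TABLE {schema}.ligand (ligand_code TEXT PRIMARY KEY, name TEXT)"
conn.execute(ddl.format(schema="main"))
conn.execute(ddl.format(schema="incr"))

conn.executemany(
    "INSERT INTO main.ligand VALUES (?, ?)",
    [("HEM", "heme"), ("ATP", "ATP (old name)")],
)
conn.executemany(
    "INSERT INTO incr.ligand VALUES (?, ?)",
    [("ATP", "adenosine triphosphate"), ("NAD", "NAD")],
)


def merge_increment(conn, table):
    """Upsert staged rows into the main schema, then empty the staging table."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            f"INSERT OR REPLACE INTO main.{table} SELECT * FROM incr.{table}"
        )
        conn.execute(f"DELETE FROM incr.{table}")


merge_increment(conn, "ligand")
print(conn.execute(
    "SELECT ligand_code, name FROM main.ligand ORDER BY ligand_code"
).fetchall())
```

Staging in a separate schema means a failed or partial download never touches the tables that users query.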

15
Incremental Data Export Import
  • Implemented in server side JavaScript
  • Data is exported as Oracle Export files
    organised in marts
  • Data files on the FTP server
  • Aim for weekly updates
  • Mechanism flexible enough to adapt to different
    data mart combinations
  • Prerequisites: Rhino, Java, Oracle JDBC driver,
    Oracle export-import
  • The user only has to download and run the
    periodic incremental import script of a data
    mart against their own database
  • Database version, data version and data mart
    maintenance are controlled via the
    administration tables through synchronisation
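
The version bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions, not the actual scripts, which the slide says are server-side JavaScript working with Oracle export files. The table `admin_mart_version`, its columns, the integer version numbers and the `load` callback are all hypothetical, and SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical administration table: one row per installed data mart.
conn.execute(
    "CREATE TABLE admin_mart_version ("
    "mart TEXT PRIMARY KEY, data_version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO admin_mart_version VALUES ('ligands', 3)")
conn.commit()


def pending_increments(conn, mart, published_versions):
    """Return published versions newer than the installed one, oldest first."""
    row = conn.execute(
        "SELECT data_version FROM admin_mart_version WHERE mart = ?", (mart,)
    ).fetchone()
    if row is None:
        raise LookupError(f"data mart {mart!r} is not installed")
    installed = row[0]
    return sorted(v for v in published_versions if v > installed)


def apply_increments(conn, mart, published_versions, load):
    """Apply pending increments in order, recording each new version."""
    applied = []
    for version in pending_increments(conn, mart, published_versions):
        with conn:  # one transaction per increment
            load(mart, version)  # placeholder for importing one dump file
            conn.execute(
                "UPDATE admin_mart_version SET data_version = ? WHERE mart = ?",
                (version, mart),
            )
        applied.append(version)
    return applied


loaded = []
first_run = apply_increments(
    conn, "ligands", [2, 3, 4, 5], lambda m, v: loaded.append((m, v))
)
second_run = apply_increments(
    conn, "ligands", [2, 3, 4, 5], lambda m, v: loaded.append((m, v))
)
print(first_run, second_run)
```

Because the installed version is stored in the target database itself, rerunning the import script is harmless: increments that are already applied are skipped.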

16
Incremental Replication Mechanism
[Diagram. Labels: DATA MARTS, Admin Tables, JDBC and crontab (each shown twice); Increment log; PERIODIC EXPORT SCRIPT; PERIODIC IMPORT SCRIPT; Oracle Dump Files; Web-FTP Service; Target Database; MSD Search Database]