Title: The EAV/CR WebDB Toolkit: An open source application framework for building evolvable neuroscience databases
1. The EAV/CR WebDB Toolkit: An open source application framework for building evolvable neuroscience databases
- Luis Marenco
- Center for Medical Informatics
- Yale University School of Medicine
- 2004
2. Outline
- Neuroscience data and knowledge are in constant evolution. This is a problem for traditional waterfall application design, where relational database schemas are designed up front and applications are hard-coded to them.
- A particular approach to this problem will be reviewed in these topics:
  - Motivation: the SenseLab project
  - Background: issues of traditional applications
  - Evolvable database application goals
  - Possible solution scenarios
  - EAV/CR and derived methodologies
  - EAV/CR applications: SenseLab and NDG
  - EAV/CR Solution Framework (EAV/CR WebDB Toolkit)
3. Motivation: The SenseLab Project
- The SenseLab project is an ongoing effort to integrate multidisciplinary sensory data derived from the olfactory system.
- This process involves the development of neuroinformatics databases and tools in support of this research.
- SenseLab currently consists of the following Web databases:
  - Neuronal research: NeuronDB, ModelDB, and CellPropDB
  - Olfactory research: Olfactory Receptors DB, OdorDB, and OdorMapDB
4. Background Issues: Traditional Databases
- In traditional database applications, the code is entwined with database schema elements.
- Research on a poorly understood process like olfaction involves constant revision of variables, which affects the DB schema and the code derived from it.
- Following this approach in SenseLab led to:
  - Increased code complexity as new elements were added to the DB
  - Limited code reusability (code specific to every data category)
  - Lack of robust interoperability (schema dependency)
- Changing knowledge embedded in schemas caused:
  - Downtime and application breakdown
  - Interface redesign (user and interoperability)
  - Introduction of code errors (when updating code)
  - Exponential maintenance burden (due to all of the above)
5. Background Issues: Traditional Databases (2)
- Web database applications additionally involve:
  - Data entry and security: elaborate, expensive, and with limited portability
  - Ad hoc searching mechanisms that are difficult to standardize and expensive to maintain
  - Hard-coded interoperability that can be cumbersome to adapt to new standardized formats
- At the database metadata level:
  - Built-in data dictionary lacks expressivity
  - Limited schema extensibility
  - Reduced set of data types
6. Evolvable Database Application Goals
- PRIMARY
  - Create a programmatic approach that allows DB structural changes without disrupting the existing data and code
  - Minimize code/metadata dependency, focusing on automated interface generation (for humans and automated agents)
  - Simplify the code as the project matures (Extreme Programming principles)
- SECONDARY
  - Facilitate system integration with a Web platform
  - Allow access from common web browsers
  - Incorporate role-based security for public and private data
  - Create generic interfaces and formats for data exchange
  - Improve code reusability by leveraging interfaces and formats
  - Provide for robust interoperability with extensible protocols
7. Some Possible Solution Scenarios
- Object-oriented or object-relational databases: at the time, immature and unsupported
- Leveraging other flexible application approaches (e.g. Protégé): lacking needed features (e.g. not distributed or web-based)
- Build a new ground-up solution to provide the needed features: the EAV/CR Application Framework (data storage and software practices)
8. EAV/CR Storage Approach
- EAV/CR (Entity-Attribute-Value with Classes and Relationships) is a data storage approach derived from EAV, a row-based data modeling technique widely used in AI, electronic patient record systems, the MS Windows Registry, and others.
- EAV/CR uses a limited number of tables and constraints to represent any number of tables, fields, and cells from an RDB.
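The row-based idea can be illustrated with a minimal sketch of plain EAV storage. It is not the SenseLab schema: the table name `eav`, its three columns, and the sample neurons are assumptions chosen for illustration, and every value is stored as text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table stands in for what would otherwise be many columns
# spread over many tables (hypothetical layout, not the SenseLab schema).
conn.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")

# Two hypothetical entities with different attribute sets.
# Entity 2 carries an extra attribute without any ALTER TABLE.
rows = [
    (1, "name", "mitral cell"),
    (1, "region", "olfactory bulb"),
    (2, "name", "granule cell"),
    (2, "region", "olfactory bulb"),
    (2, "neurotransmitter", "GABA"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)


def load_entity(conn, entity_id):
    """Reassemble one conventional 'row' from its attribute-value rows."""
    cur = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY attribute",
        (entity_id,),
    )
    return dict(cur.fetchall())


print(load_entity(conn, 2))
```

Adding a new kind of fact about an entity is an `INSERT` rather than a schema change, which is the property the slide refers to.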
9. EAV/CR Storage Approach
- Conceptual
- EAV/CR augments standard EAV by:
  - Allowing unlimited categories by grouping entities into Classes (the C)
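One way to picture class grouping is to keep class and attribute definitions as ordinary rows, so that a new category is added with inserts instead of `CREATE TABLE`. The sketch below assumes three hypothetical metadata tables (`classes`, `attributes`, `objects`); the real EAV/CR metadata is richer than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes    (class_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE attributes (attr_id  INTEGER PRIMARY KEY,
                         class_id INTEGER REFERENCES classes(class_id),
                         name     TEXT);
CREATE TABLE objects    (object_id INTEGER PRIMARY KEY,
                         class_id  INTEGER REFERENCES classes(class_id));
""")


def define_class(conn, name, attribute_names):
    """Define a new category purely by inserting metadata rows."""
    class_id = conn.execute(
        "INSERT INTO classes (name) VALUES (?)", (name,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO attributes (class_id, name) VALUES (?, ?)",
        [(class_id, attr) for attr in attribute_names],
    )
    return class_id


def attributes_of(conn, class_name):
    """List the attribute names declared for a class, in definition order."""
    cur = conn.execute(
        "SELECT a.name FROM attributes a JOIN classes c USING (class_id) "
        "WHERE c.name = ? ORDER BY a.attr_id",
        (class_name,),
    )
    return [row[0] for row in cur]


# Hypothetical classes; the physical schema still has only three tables.
neuron_id = define_class(conn, "Neuron", ["name", "region"])
odor_id = define_class(conn, "Odor", ["name", "formula"])
print(attributes_of(conn, "Odor"))
```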
10. EAV/CR Storage Approach (2)
- EAV/CR augments standard EAV by:
  - Implementing strong data typing for values
  - Extending data types (computed attributes)
  - Allowing entity relationships, the R (inter-class and hierarchical)
  - Including implicit versioning and timestamps for data and metadata
  - Including Web-oriented features: enriched web-oriented metadata to automate web-interface generation (Web forms, XML, ...)
  - Facilitating ontological representation: mapping standardized vocabulary and semantic relationship identifiers to data and metadata elements
  - Supporting database portals that present different subsets of the data to users with a particular research focus
  - Providing centralized role-based security, with a distributed administration model to minimize dedicated administration costs
  - Providing monitoring tools
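Three of the augmentations above (strong typing, relationships, and timestamped versioning) can be sketched together in a few lines. This is an in-memory illustration under stated assumptions: the type names `string`, `real`, and `ref:<Class>`, the attribute names, and the numeric values are all hypothetical, and the toolkit's own type system is not reproduced here.

```python
from datetime import datetime, timezone

# Hypothetical attribute metadata: attribute name -> declared data type.
# "ref:<Class>" marks a relationship to an entity of that class.
ATTRIBUTE_TYPES = {
    "name": "string",
    "resting_potential_mv": "real",
    "located_in": "ref:Region",
}

entities = {}   # entity_id -> class name
history = []    # append-only rows: (entity_id, attribute, value, timestamp)


def add_entity(entity_id, class_name):
    entities[entity_id] = class_name


def check_type(attribute, value):
    """Reject values that do not match the attribute's declared type."""
    declared = ATTRIBUTE_TYPES[attribute]
    if declared == "string":
        ok = isinstance(value, str)
    elif declared == "real":
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    else:  # "ref:<Class>": the value must be the id of an entity of that class
        ok = entities.get(value) == declared.split(":", 1)[1]
    if not ok:
        raise TypeError(f"{attribute!r} expects {declared}, got {value!r}")


def set_value(entity_id, attribute, value):
    """Validate, then append a timestamped row instead of overwriting."""
    check_type(attribute, value)
    history.append((entity_id, attribute, value, datetime.now(timezone.utc)))


def current_value(entity_id, attribute):
    """The latest row wins; older rows remain as version history."""
    for ent, attr, val, _timestamp in reversed(history):
        if ent == entity_id and attr == attribute:
            return val
    return None


add_entity(1, "Region")
add_entity(2, "Neuron")
set_value(2, "name", "mitral cell")
set_value(2, "located_in", 1)                # relationship to a Region entity
set_value(2, "resting_potential_mv", -65.0)  # illustrative number only
set_value(2, "resting_potential_mv", -62.5)  # revision; the old row is kept
print(current_value(2, "resting_potential_mv"))
```

Keeping old rows rather than updating in place is one simple way to get implicit versioning; whether EAV/CR stores versions this way is not stated in the slides.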
11. EAV/CR derived methodologies
- Expandable system architecture: allows parallel processing by scaling out. Parallel middle-tier servers connect to the same EAV/CR database, preserving security and the concurrency of data and metadata.
- Delegated user profile management: users are responsible for their own profiles; administrators grant access to, and set restrictions on, specific database resources (Web portal model for data and metadata).
- Distributed data: classes shared among databases allow tight data integration, minimizing redundancy.
12. EAV/CR derived methodologies (2)
- Data services: creation of the EAV/CR Dataset Protocol (EDSP), an InfoSet protocol that describes database structural ontology, metadata, and data in a simple XML format. (It brings the EAV/CR approach to the XML world.)
- The following processes depend on EDSP:
  - Data transfer
  - Middle-tier components
  - Automated ad hoc query interface generation
- Using EDSP as the source for these processes has improved the stability and reusability of software components.
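The idea of shipping metadata and data in one self-describing XML message can be sketched as follows. This is not the EDSP format: the element and attribute names (`dataset`, `class`, `attribute`, `object`, `value`) and the sample receptor are invented for illustration, since the slides do not show the actual protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical type names and the Python casts used when reading them back.
CASTS = {"string": str, "integer": int, "real": float}


def to_infoset(class_name, attribute_types, objects):
    """Serialize schema and data together so the receiver needs no prior schema."""
    root = ET.Element("dataset")
    cls = ET.SubElement(root, "class", {"name": class_name})
    for attr, dtype in attribute_types.items():
        ET.SubElement(cls, "attribute", {"name": attr, "type": dtype})
    for obj_id, vals in objects.items():
        obj = ET.SubElement(root, "object", {"id": str(obj_id), "class": class_name})
        for attr, val in vals.items():
            ET.SubElement(obj, "value", {"attribute": attr}).text = str(val)
    return ET.tostring(root, encoding="unicode")


def from_infoset(xml_text):
    """Rebuild typed values using only the metadata carried in the message."""
    root = ET.fromstring(xml_text)
    types = {a.get("name"): a.get("type") for a in root.iter("attribute")}
    objects = {}
    for obj in root.iter("object"):
        objects[int(obj.get("id"))] = {
            v.get("attribute"): CASTS[types[v.get("attribute")]](v.text)
            for v in obj.iter("value")
        }
    return types, objects


sample_types = {"name": "string", "length_aa": "integer"}
sample_objects = {1: {"name": "OR-A", "length_aa": 312}}  # made-up receptor
message = to_infoset("Receptor", sample_types, sample_objects)
print(message)
```

Because the receiver reads the types from the message itself, a consumer such as a query-form generator or a transfer tool does not need to be recompiled when a class gains an attribute.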
13. The EAV/CR Application Framework
- Programming model
  - Database component programmer
  - Domain programmer
- EAV/CR Framework Toolkit (version 1)
  - Database component: encapsulates EAV/CR logic, presenting interfaces for domain programmers. Created in C# MS.NET.
  - Plumbing code: generic web scripts for metadata-driven navigation and interface generation. ASP-VBScript migrated to C# MS.NET 2.0 (Visual Studio 2005).
  - Domain programmers customize the plumbing code to their research goals.
14. EAV/CR Summary
- EAV/CR and evolvability
  - High data integration
  - Flexibility in database schema evolution and maintenance
  - Code reuse and increased reliability
  - Extensible application architecture
- Disadvantages
  - Querying complexity
  - Performance penalty for multi-parameter queries
  - Complex programming of EAV/CR components
- Future directions
  - Address the disadvantages above
  - Serve as a test bed for designing evolvable interoperability mechanisms, such as the next SOAP version, WS-STAR (Microsoft, IBM, Oracle, etc.)
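The querying disadvantage can be made concrete with a small sketch. In a conventional table, a search on two columns is a single `WHERE` clause; over a plain EAV table, each criterion needs its own self-join. The table layout and sample rows below are assumptions for illustration, not SenseLab data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "region", "olfactory bulb"), (1, "neurotransmitter", "glutamate"),
        (2, "region", "olfactory bulb"), (2, "neurotransmitter", "GABA"),
        (3, "region", "piriform cortex"), (3, "neurotransmitter", "glutamate"),
    ],
)


def find_entities(conn, criteria):
    """Return ids of entities matching ALL criteria, one self-join per criterion."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    joins, params = [], []
    for i, (attribute, value) in enumerate(criteria.items()):
        joins.append(
            f"JOIN eav AS t{i} ON t{i}.entity = e.entity "
            f"AND t{i}.attribute = ? AND t{i}.value = ?"
        )
        params += [attribute, value]
    sql = (
        "SELECT DISTINCT e.entity FROM eav AS e "
        + " ".join(joins)
        + " ORDER BY e.entity"
    )
    return [row[0] for row in conn.execute(sql, params)]


matches = find_entities(
    conn, {"region": "olfactory bulb", "neurotransmitter": "glutamate"}
)
print(matches)
```

The number of joins grows with the number of search parameters, which is one plausible source of the performance penalty listed above.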
15. Links / Team
- SenseLab Project: http://senselab.med.yale.edu
- SfN Neuroscience Database Gateway: http://big.sfn.org/ndg
- EAV/CR Web site / WebDB toolkit / EDSP protocol: http://ycmi.med.yale.edu/EAVCR
- Team members
  - Gordon Shepherd: PI
  - Perry Miller: Project PI
  - Michael Hines: Project PI (ModelDB/Neuron design)
  - Luis Marenco: System/DB design
  - Prakash Nadkarni: System/DB design
  - Qin Zhang: EAV/CR WebDB Toolkit developer
  - Chiquito Crasto: OrDB/OdorDB administrator / domain programmer
  - Tom Morse: ModelDB/NeuronDB administrator / domain programmer
  - Nian Liu: OdorMapDB administrator / domain programmer
- Demo slides follow
16. Centralized Schema Management
17. Centralized Schema Management (2)
18. Metadata extensibility
- EAV/CR allows global ontological annotation of
any data or metadata element in the database.
19. Metadata-driven ad hoc interface generation
- This generic interface is built in real time by reading the metadata. Boolean expressions can be added for complex associations. Results can be retrieved in HTML, XML, text, and other formats.
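Building an interface by reading metadata can be sketched as a function that emits one search field per described attribute. The metadata tuples, the `/search` action, and the `combine` selector are hypothetical; the toolkit's actual generated pages are not shown in the slides.

```python
from html import escape

# Hypothetical attribute metadata for one class: (name, data type, caption).
METADATA = [
    ("name", "string", "Receptor name"),
    ("species", "string", "Species"),
    ("length_aa", "integer", "Sequence length (aa)"),
]

# Map declared data types to HTML input types.
INPUT_TYPES = {"string": "text", "integer": "number"}


def build_search_form(metadata, action="/search"):
    """Emit one labelled input per attribute, plus an AND/OR combinator."""
    lines = [f'<form method="get" action="{escape(action)}">']
    for name, dtype, caption in metadata:
        lines.append(
            f"  <label>{escape(caption)} "
            f'<input type="{INPUT_TYPES[dtype]}" name="{escape(name)}"></label>'
        )
    lines.append(
        '  <select name="combine"><option>AND</option><option>OR</option></select>'
    )
    lines.append('  <button type="submit">Search</button>')
    lines.append("</form>")
    return "\n".join(lines)


form_html = build_search_form(METADATA)
print(form_html)
```

Adding a row to the metadata adds a field to the form with no change to the generating code, which is why the same code can serve several databases.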
20. Metadata-driven ad hoc interface generation (2)
- The same generic code is reused by other databases, which adds to the value of this robust, evolvable design.
21. InfoSets and Evolvable Interoperability
- The creation of EDSP (the EAV/CR Dataset Protocol) allows database schema and data to be transferred in a simple, consistent, extensible format. This picture shows partial information for some olfactory receptor molecules from ORDB.
22. InfoSets and Evolvable Interoperability (2)
- Data exchange with standardized formats can be achieved through XML transformations. Below, the previous EDSP message is transformed into Microsoft XDR, a format used by the MS Office suite to import and export data and metadata to and from MS Access and SQL Server databases.
23. Importing EAV/CR database into MS Access
24. Importing EAV/CR database into MS Access (2)
- http://senselab.med.yale.edu/senselab/site/dbGate/Xtract.asp?o1798xsledsp-officedata
25. Importing EAV/CR database into MS Access (3)
26. Importing EAV/CR database into MS Access (4)
27. Importing EAV/CR database into MS Access (5)
- relationships, and the data (preserving strong data typing)
- All in one deEAVfication process.
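A deEAVfication step of this kind can be sketched as a pivot: the metadata decides the columns and their types, and the attribute-value rows fill them. The sketch below targets SQLite rather than MS Access, omits relationships, and uses hypothetical type names and sample receptors; table and attribute names are assumed to come from trusted metadata because they are interpolated into the SQL.

```python
import sqlite3

# Hypothetical metadata types mapped to SQL column types and Python casts.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "real": "REAL"}
CASTS = {"string": str, "integer": int, "real": float}


def de_eav(conn, table_name, attribute_types, eav_rows):
    """Create a conventional typed table for one class and pivot EAV rows into it."""
    columns = ", ".join(
        f'"{attr}" {SQL_TYPES[dtype]}' for attr, dtype in attribute_types.items()
    )
    conn.execute(f'CREATE TABLE "{table_name}" (id INTEGER PRIMARY KEY, {columns})')

    # Group the attribute-value rows by entity, casting each value to its type.
    pivoted = {}
    for entity, attr, value in eav_rows:
        pivoted.setdefault(entity, {})[attr] = CASTS[attribute_types[attr]](value)

    names = list(attribute_types)
    placeholders = ", ".join("?" for _ in range(len(names) + 1))
    for entity, vals in pivoted.items():
        # Attributes an entity lacks become NULL in the conventional table.
        conn.execute(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})',
            [entity] + [vals.get(n) for n in names],
        )


conn = sqlite3.connect(":memory:")
de_eav(
    conn,
    "receptor",
    {"name": "string", "length_aa": "integer"},
    [(1, "name", "OR-A"), (1, "length_aa", "312"), (2, "name", "OR-B")],
)
table_rows = conn.execute(
    "SELECT id, name, length_aa FROM receptor ORDER BY id"
).fetchall()
print(table_rows)
```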
28. Importing EAV/CR database into Protégé ontology
29. EAV/CR Physical DB Diagram
SenseLab physical schema: a mix of both worlds, EAV/CR and RDB
- http://senselab.med.yale.edu/senselab/site/dsArch/images/Visio-EAVCR_Physical_Schema_021205.png