Title: VIRTUAL INTEGRATION OF DISTRIBUTED HETEROGENEOUS DATABASES BASED ON GRID CONCEPT FOR AGRICULTURAL RESEARCH
1 VIRTUAL INTEGRATION OF DISTRIBUTED HETEROGENEOUS DATABASES BASED ON GRID CONCEPT FOR AGRICULTURAL RESEARCH
- Seishi Ninomiya
- snino_at_affrc.go.jp
- National Agricultural Research Center
- NARO
2 What is Grid?
- Concept and technology to share, integrate and coordinate distributed computer resources
  - Software and hardware
- Keeping autonomy of distributed resources
- Keeping heterogeneity of distributed resources
- The term was originally used for a framework to realize a virtual supercomputer with distributed CPUs
  - Computational Grid
- Data Grid seems to be more promising now
3 A lot of resources such as data and programs are available, but ...
4 Users need to obtain them one by one, knowing how to access each
5 e.g. Data Grid provides you with
A virtually integrated huge database
We do not need to know where the resources are or how to use them
6 Potential of Grid in agricultural research
- Data integration/comparison among different experiments/locations is highly required, particularly for evaluation of environment × genetic effects
- A tremendous number of data sets are left unused once they have been analysed within annual and/or locational experiments
- Data are managed by different organizations and are difficult to centralize
- Once they are integrated, we could expect to discover completely unknown facts through data mining
7 Outline: Implementations of Grid frameworks applicable to agricultural research
- Spreadsheet-based crop data sharing and integration
- Consistent access to heterogeneous meteorological databases
- Integration of crop data and meteorological data
8 Spreadsheet-based crop data sharing and integration
- Experimental data sets are usually stored using ordinary spreadsheet applications
- But it is not easy to collect and merge them, particularly among different locations
- A data grid based on spreadsheets is promising, particularly for agricultural research
- Table formats are not uniform among different locations
9 Spreadsheet-based crop data sharing and integration
- Multi-location data sharing and integration through daily data management by spreadsheet application
[Diagram labels: Internet, DBMS]
10 Spreadsheet-based crop data sharing and integration
- Once you enter your experimental data in spreadsheet software (e.g. MS Excel), the data automatically become sharable over the Internet among different locations
- No skill is required
- Just a part of everyday data management
- Uniformity of tables is not strictly required
- Low cost on the user side
11 Basic structure of application based on EJB
[Diagram labels: SOAP/XML, Container]
12 Direct data update with spreadsheet
- A client of the Web service
- Direct data upload from the spreadsheet application
- Use of an MS Excel VB macro and its SOAP toolkit
- Seamless action with daily data management
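The direct upload on this slide can be pictured as wrapping spreadsheet rows in a SOAP message. The sketch below is only an illustration in Python rather than the Excel VB macro named above: the service namespace `urn:example:cropdata`, the operation name `uploadRecords` and the column names are assumptions, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:cropdata"  # hypothetical service namespace (assumption)


def build_upload_envelope(header, rows):
    """Wrap spreadsheet rows in a SOAP envelope.

    The operation name "uploadRecords" is hypothetical; a real service
    would define its own operation and message layout.
    """
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    upload = ET.SubElement(body, f"{{{APP_NS}}}uploadRecords")
    for row in rows:
        record = ET.SubElement(upload, f"{{{APP_NS}}}record")
        # One <item name="..."> element per spreadsheet cell.
        for name, value in zip(header, row):
            item = ET.SubElement(record, f"{{{APP_NS}}}item", {"name": name})
            item.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Two example rows as they might appear in a sheet (values are made up).
header = ["station", "variety", "plant_height_cm"]
rows = [["Station1", "VarietyA", 92.5], ["Station1", "VarietyB", 88.0]]
envelope_xml = build_upload_envelope(header, rows)
```

The resulting string would be the body of an HTTP POST to the Web service; keeping the cell names as attributes lets the server decide how to map them onto its own table definition.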
13 Data search/modification/update by Web browser
- One can obtain any combination of records from different locations/experiments
- Data upload and download by spreadsheet files
14 Definition of data table by spreadsheet
- The structure of a data table is registered by spreadsheet
- The present version accepts heterogeneity among original data sheets only in the order of items and in missing items
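The kind of heterogeneity accepted here (different item order, missing items) can be handled by mapping every sheet onto the registered table definition. The following is a minimal sketch under assumptions: the item names in `TABLE_DEFINITION` are hypothetical, sheets are represented as CSV text, and rejecting unknown columns is one possible reading of the limit stated above.

```python
import csv
import io

# Registered table definition (item names are hypothetical).
TABLE_DEFINITION = ["station", "year", "variety", "plant_height_cm", "yield_t_per_ha"]


def align_sheet(csv_text, definition=TABLE_DEFINITION):
    """Map each row onto the registered items, whatever the column order.

    Items missing from the sheet become None. Columns that are not in
    the definition are rejected (an assumption about the stated limit).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    unknown = set(reader.fieldnames or []) - set(definition)
    if unknown:
        raise ValueError(f"items not in table definition: {sorted(unknown)}")
    return [{item: row.get(item) for item in definition} for row in reader]


# Sheet A follows the definition; sheet B is reordered and lacks plant height.
sheet_a = "station,year,variety,plant_height_cm,yield_t_per_ha\nS1,2004,VarietyA,92.5,5.4\n"
sheet_b = "variety,station,year,yield_t_per_ha\nVarietyA,S2,2004,5.1\n"

merged = align_sheet(sheet_a) + align_sheet(sheet_b)
```

After alignment, every record has the same keys in the same order, so rows from different locations can be stored in or queried from one table.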
15 Now updating the application, adopting a web ontology as a meta-DB to accept more heterogeneous tables
e.g. plant height and its equivalent terms
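To illustrate what a meta-DB of equivalent terms would do, the sketch below resolves differently labelled columns to one canonical item. A plain dictionary stands in for the web ontology here, which is a deliberate simplification; the synonym lists and canonical names are hypothetical.

```python
# A plain dictionary stands in for the ontology-backed meta-DB (assumption).
SYNONYMS = {
    "plant_height": {"plant height", "plant ht", "height"},
    "heading_date": {"heading date", "date of heading"},
}


def _normalise(label):
    """Lower-case, treat underscores as spaces and collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())


# Reverse index: normalised label -> canonical item name.
LOOKUP = {
    _normalise(label): canonical
    for canonical, labels in SYNONYMS.items()
    for label in labels
}


def canonical_item(label):
    """Return the canonical item for a column label, or None if unknown."""
    return LOOKUP.get(_normalise(label))


def rename_columns(row):
    """Rename known columns to canonical items; keep unknown ones unchanged."""
    return {canonical_item(key) or key: value for key, value in row.items()}
```

For example, `rename_columns({"Plant Ht": 92.5, "plot": 3})` maps the first column to `plant_height` and leaves `plot` as it is. A real ontology would add relations beyond flat synonyms, such as units or broader and narrower terms.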
16 Test operations
- About 10,000 records from rice adaptability tests at 20 experimental stations were loaded into the application
- The application proved practical in operation
- Users with little computer experience could use it easily
17 Consistent access to heterogeneous meteorological databases
18 Solution by Data Broker
- Data brokers provide consistent access to heterogeneous DBs
- Heterogeneity is absorbed by brokers (mediators)
[Diagram: applications (Rice Growth, Pest Management, Farm Management) reach heterogeneous and autonomous databases (DB A, DB B, DB C, DB D) through MetBroker and its Meta Data]
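The mediator idea on this slide can be sketched as a broker that hides per-database drivers behind one call. Everything in the example is hypothetical: the two drivers, their storage formats, the station identifiers and the `Observation` record are invented for illustration and do not describe MetBroker's actual interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Standardized record handed to every client (hypothetical layout)."""
    station: str
    day: date
    air_temp_c: float


class DriverA:
    """Hypothetical source storing Celsius keyed by ISO date strings."""
    _rows = {("A001", "2004-07-01"): 24.5}

    def fetch(self, station, day):
        value = self._rows[(station, day.isoformat())]
        return Observation(station, day, value)


class DriverB:
    """Hypothetical source storing Fahrenheit in (station, y, m, d, temp) tuples."""
    _rows = [("B007", 2004, 7, 1, 77.0)]

    def fetch(self, station, day):
        for st, y, m, d, temp_f in self._rows:
            if st == station and date(y, m, d) == day:
                # Convert to the broker's standard unit before returning.
                return Observation(station, day, round((temp_f - 32) * 5 / 9, 2))
        raise KeyError((station, day))


class Broker:
    """Routes a request to the right driver using station metadata."""

    def __init__(self, metadata):
        self._metadata = metadata  # station id -> driver instance

    def get(self, station, day):
        return self._metadata[station].fetch(station, day)


broker = Broker({"A001": DriverA(), "B007": DriverB()})
```

A client calls `broker.get("B007", date(2004, 7, 1))` without knowing which database holds the station or which units it uses; adding a database means adding one driver and one metadata entry.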
19 Database Broker Service
[Diagram labels]
- Client: Data Request
- Data Brokerage: Search
- Meta Database: where the data are, how to use them, data contents
- Data request translated to DB C
- Database Driver: Data acquisition (DB A, DB B, DB C, DB D)
- Data Standardization: Standardized Data
- Data Secondary Processing: Data Summarization (e.g. daily mean from hourly data)
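The summarization example named on this slide, a daily mean from hourly data, can be sketched as follows. The input layout (pairs of timestamp and value, with `None` for a missing observation) and the sample values are assumptions made for the illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean(hourly):
    """Summarize (timestamp, value) pairs into {date: mean value}."""
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        if value is not None:  # skip missing observations
            by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up observations; a real series would have up to 24 values per day.
hourly = [
    (datetime(2004, 7, 1, 0), 20.0),
    (datetime(2004, 7, 1, 12), 28.0),
    (datetime(2004, 7, 1, 18), None),
    (datetime(2004, 7, 2, 6), 21.0),
    (datetime(2004, 7, 2, 15), 30.0),
]
daily = daily_mean(hourly)
```

Doing this step inside the broker means a client that needs daily values can use a database that only stores hourly records.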
20 Data Brokers Developed
- Meteorological DBs
  - MetBroker (23 DBs, >22,000 stations)
- Map DBs
  - ChizuBroker (3 DBs: Japan, NZ, World)
- Digital Elevation DBs
  - DEMBroker (2 DBs: Japan 50 m, World 1 km)
- Soil DBs
  - SoilBroker
21 Adoption of EJB
[Diagram components]
- Without EJB: Web browsers, servlet container, application, DBMS
- With EJB: Web browsers, Web service client, Java application, servlet container, Web application, Web service engine, EJB container, application, DBMS
22 Present Coverage of MetBroker
23 Brokers Provided as Web Services
24 New MetBroker with Web ontology
[Diagram components]
- Clients: Decision-Making Support Services, Operational Products, Simulation Models, Detailed Digital Forecast
- Broker, Inference Engine
- Metadata database: Item Definition (OWL), Station metadata (RDF)
- Meteorological databases, each with a DB Wrapper
[Numbered steps in the diagram: 1. Register, 2. Request, 3. Request metadata, 4. Request data]
25 Integration of crop data and meteorological data
26 Integration of crop data and meteorological data
- Standardized interface for data exchange
[Diagram labels: Rice growth model, HyDRAS, MetBroker, WeatherDB1, WeatherDB2, WeatherDB3]
27 Crop data and meteorological data
[Diagram labels: Crop DB; Data Extraction by Spreadsheet-based DB; SOAP/XML; Crop Data; Location, Date; Corresponding weather data; XML (crop data and weather data); Models/Analysis]
28 Crop data and weather data are combined in an XML file
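Combining the two sources can be pictured as a join on location and date followed by serialization to XML. The sketch below is only an assumption about how such a file might look: the element names, the record fields and the sample values are hypothetical, and the weather data are given as an in-memory dictionary rather than fetched through a broker.

```python
import xml.etree.ElementTree as ET

# Made-up crop records, as if extracted from the spreadsheet-based DB.
crop_records = [
    {"location": "StationA", "date": "2004-07-01", "variety": "VarietyA", "plant_height_cm": 62.0},
    {"location": "StationB", "date": "2004-07-01", "variety": "VarietyA", "plant_height_cm": 58.5},
]
# Weather keyed by (location, date), as if already obtained for those keys.
weather = {
    ("StationA", "2004-07-01"): {"air_temp_c": 24.5, "rain_mm": 0.0},
}


def combine_to_xml(crops, weather_by_key):
    """Attach the matching weather record to each crop record and emit XML."""
    root = ET.Element("dataset")
    for crop in crops:
        rec = ET.SubElement(root, "record", location=crop["location"], date=crop["date"])
        crop_el = ET.SubElement(rec, "crop")
        for name, value in crop.items():
            if name not in ("location", "date"):
                ET.SubElement(crop_el, name).text = str(value)
        match = weather_by_key.get((crop["location"], crop["date"]))
        if match is not None:  # omit <weather> when nothing matches
            weather_el = ET.SubElement(rec, "weather")
            for name, value in match.items():
                ET.SubElement(weather_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


combined_xml = combine_to_xml(crop_records, weather)
```

A model or analysis tool can then read one file and find each crop observation next to the weather for the same place and day.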
29 Conclusions
- Grid-based approach accelerates data integration, helping several agricultural data analyses
- Standardized interfaces make development of an integration framework much less laborious, less time-consuming and less costly
- Next step
  - Need to evaluate scalability of this approach
  - Integration framework with other types of data, e.g. molecular data, soil data, variety/line data (dendrogram)
30 Thank you for your attention
http://www.agmodel.net/
31 Dead Storage Data Issue
- A lot of digital data sets are produced in agricultural experimental stations
- Using ordinary software such as spreadsheet applications
- But they are likely to be kept at the local station and individual scientist level
- The data sets are isolated and hardly integrated among different locations
- How to ease data publication for sharing by unskilled end users
32 The next step
- Multi DB environment
- Ontology to hide and interpret data heterogeneity