Title: A Hybrid Model for Data Synchronism in Data Warehouse Projetcs
1A Hybrid Model for Data Synchronism in Data
Warehouse Projetcs
- Isabel Cristina Italiano
- ici_at_ime.usp.br
João Eduardo Ferreira jef_at_ime.usp.br
Universidade de Sao Paulo - Brazil
Seventh International Database Engineering and
Applications Symposium IDEAS 2003 Hong Kong
2Data Warehouse Synchronism Model
CriticalInformation
3Objective
- Introduce a framework to define the most suitable
synchronism option - Parameters and function
- For each data portion
4Dynamic Synchronism Model
- Dynamic Synchronism updates on transactional
environment reflected immediately on analytical
environment - DW loading divided among various maintenance
transactions - Data portions based on business niches
- Solution for paralelization of static and dynamic
loadings
5Hybrid Synchronism Model
Static DW
- Portions are synchronized in different time
intervals - Based on Characteristics of
- Transaction/Operational Environment
- Analytical Applications Environment
Dynamic DW
6Transactional Environment Characteristics
- Transactional Environment
- Set of functions that generates information
stored in Data Warehouse - Functions that handle data sets
- Regarded as functional units in transactional
environment
7Transactional Parameter P1Applicability
- The need for dynamic maintenance
- Typically static
- Transactional system has characteristics that
require a typically static loading, with no
requirement for implementing dynamic updates - Dynamic
- Transactional system requires a data warehouse
dynamic update, considering the analyzed set of
transactions
8Transactional Parameter P2Adaptability
- The adaptation capacity of the transactional
environment - Dynamic loading is allowed to be implemented
- Requirements could be added for gathering and
propagating information to the data warehouse and
the transactions involved - Dynamic loading is not allowed to be implemented
- There is not a minimum control level to implement
the functions
9Transactional Parameter P3Time Interval
- The desirable time interval for synchronism
- Minimum Possible (Real Time)
- Sysnchronism must be in real time, keeping only
the minimum time required for information
transmission and processing - Determined Interval
- Application requires synchronism in a certain
period of time, longer than required for real
time, but shorter than the static loading
frequency
10Transactional Parameter P3Time Interval
- The desirable time interval for synchronism
(cont.) - Postponed Synchronism
- Synchronism can be kept at regular intervals
determined by the data warehouse update static
process during its periodic loading (eg. day-1,
day-2, month etc.)
11Transactional Parameter P4Complexity in
Propagation
- Complexity in Propagation number of different
objects involved in the transaction during a
transactional database update - Low
- Small number of distinct objects involved in the
update (1 or 2) - Average
- From 3 to 5 distinct objects involved
- High
- More than 5 distinct objects involved
12Transactional Parameter P5Complexity in
Consistency
- Complexity in Consistency degree of complexity
of check and validation operations - Low
- There is not, or it is too small the need to
validate or check other elements in addition to
the information handled by the transaction
(simple selections with no more complex
operations) - Average
- Average level of complexity when preparing
information for loading (joining-related
selection operations) - High
- Transaction requieres highly complex checks and
validations when preparing information to be
transmitted to DW (joins, group bys, having
operators)
13Analytical Environment Characteristics
- Analytical Environment
- Different types of users, queries
- Variety of views of the DW
- Numerous combinations of aggregations and
detailed data - Several strategies to implement dynamic
aggregations (on fly), virtual views,
materialized views, summarization tables, cubes
14Analytical Parameter P6Use of Aggregates
- The use of aggregates in analytical queries
- Just Detail
- Queries involved do not use aggregates em,
therefore, can only access information at a lower
granularity level - Aggregates
- Queries perform functions such as sum, count,
group by and others, supplying aggregate
information to the user
15Analytical Parameter P7Aggregates Implementation
- How aggregates are implemented in the analytical
base - None
- Analytical DB dos not contain any pre-aggregated
information. Any aggregated value requirement - Virtual Views with Aggregates
- Analytical DB contains aggregation views (not
materialized) that can be used by the queries
(computed during queries executions)
16Analytical Parameter P7Aggregates Implementation
- How aggregates are implemented in the analytical
base (cont.) - Materialized views with aggregates
- Analytical DB contains tabels or views of the
materialized aggregations, which contain
aggregated information required by the inquiry - Cubes
- All aggregation levels required by the queries
have been preciously prepared and stored in cubes
17Decision Trees
- Parameters must be jointly evaluated
- Displayed as decision trees representing all
possible combinations - Leaves represent the select synchronism level
- Dr dynamic model (real time synchronism)
- Ddt dynamic model with synchronism in a certain
time interval (?t) - St static model with synchronism based on
periodic loadings - -- combination path is invalid
18The Parameter Analysis Function
- Applied on information subsets contained in data
warehouse - Analysis considering that portion of DW
information and the characteristics of the
transactional environment that originates that
information - Based on values of decision trees parameters
19The Parameter Analysis Function
- Result the most suitable synchronism option to
an specific data portion - Prototype was developed
- Requirements change over the time
- New business requirements
- New analysis requirements
- DW volume different aggregation strategies
- Frequent monitoring to identify possible changes
in the synchronism model
20Conclusion
- We have presented
- Customized solution for different processes of
data update between transactional and analytical
environments - We have proposed
- Hybrid data synchronism model for data warehouse
projects - Guidelines for transactions classification
- Parameters and function to determine the most
suitable synchronism option
21Future Works
- Incorporation of the prototype developed through
data transfer tools - Evaluation and extension of the hybrid data model
for heterogeneous databases - Methodology for using hybrid models for data
synchronism in data warehouse projetcs