Title: City of Charlotte Data Warehousing and Business Intelligence and Building Mashups By Example by Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock
1City of Charlotte Data Warehousing and Business
Intelligenceand Building Mashups By Example by
Rattapoom Tuchinda, Pedro Szekely, and Craig A.
Knoblock
by Doris Phillips 6010 Data Integration UNC
Charlotte
2Business Decision Making
Which?
(Raper, J., Building a Data Warehouse in a
Heterogeneous Tool Environment)
3Overview
- Describe current data warehousing/business
intelligence methods at the City of Charlotte - Present Building Mashups By Example byRattapoom
Tuchinda, Pedro Szekely, and Craig A. Knoblock - Compare data warehousing techniques to data
integration methods for mashups
4Focus on Processes Common to Data Warehousing and
Data Integration
- Data Retrieval
- Data Modeling
- Data Cleaning
- Data Integration
- Common processes but not the same!
5Data Warehousing at the City of Charlotte
6DISCLAIMER
7City of CharlotteSimplified KBU Organization
Chart
(Raper, J., Building a Data Warehouse in a
Heterogeneous Tool Environment)
8Sample Data Sources
- Accounts Payable
- Accounts Receivable
- Asset Center
- Emerald
- Faster
- General Ledger
- Kronos
- Hansen
- PeopleSoft
- Remedy
- Unisys Helpdesk
- Utility Billing System
9HUB and Spoke Design
SOURCES
TARGET DATA MARTS
(Raper, J., Building a Data Warehouse in a
Heterogeneous Tool Environment)
10Data Vault Components
- Hubs - Key Topics
- Links Relationships
- Satellites Details
- Auxiliaries Support
- Implemented as Relational Tables
- Diagramed Using Distinctive Shapes
(Raper, J., Phillips, D., User-Managed Metadata
Oracle Application Express Meets Oracle Warehouse
Builder 10.2)
11Hub Tables
- Hub records contain the logical business and
physical keys to the business data and its
context. - Relatively Stable over Time
- Keys
- Primary Key Surrogate ID
- Natural Key Unique Logical Key Fields
(Raper, J., Phillips, D., User-Managed Metadata
Oracle Application Express Meets Oracle Warehouse
Builder 10.2)
12Link Tables
- Link records provide information related to
relationships between Hubs and Links. - The information in link records can and does
change over time. - May have one or more Satellite Type records
- Keys
- Primary Key Surrogate ID
- Natural Key Unique Logical Key Fields
- Foreign Key HUB or LNK SIDs
(Raper, J., Phillips, D., User-Managed Metadata
Oracle Application Express Meets Oracle Warehouse
Builder 10.2)
13Satellite Tables
- Satellite records provide the structure to hold
the context or descriptive type information from
operational systems. - Maintain these changes over time.
- Related Directly to Hub or Link Records
- Keys
- Primary Key Surrogate ID
- Foreign Key HUB or LNK SIDs
(Raper, J., Phillips, D., User-Managed Metadata
Oracle Application Express Meets Oracle Warehouse
Builder 10.2)
14Auxiliary Tables
- Auxiliary records contain a variety of cross
reference and lookup descriptions tied to logical
business keys. - Standalone Support Tables Not Directly Linked to
HUB, SAT, or LNK tables. - Types of Auxiliaries
- Lookups
- Cross References
- External Tables
- ETL Work Tables
- Keys
- Primary Key is NK Unique Logical Key Fields
- May have FK
(Raper, J., Phillips, D., User-Managed Metadata
Oracle Application Express Meets Oracle Warehouse
Builder 10.2)
15Generic Data Vault Schema
(Raper, J., Phillips, D., User-Managed Metadata
Oracle Application Express Meets Oracle Warehouse
Builder 10.2)
16Oracle Warehouse Builder MappingNew Code Capture
(Raper, J., Phillips, D., User-Managed Metadata)
17Oracle Warehouse Builder Mapping New Code Capture
Detail
(Raper, J., Phillips, D., User-Managed Metadata)
18Data Cleaning
- Convert to UPPER case
- Trim blank spaces
- Replace NULL within UNK or UNKNOWN
- Check for valid values using lookups
- Remove duplicates
19Metadata Capture - Cross Reference Organization
Codes APEX Application
(Raper, J., Phillips, D., User-Managed Metadata)
20Data Warehouse Approach
- Extracted data from multiple heterogeneous
sources - Converted to Data Vault Architecture
- Cleaned data and transformed into desired format
- Combined data from multiple sources
- Data provided to users for reporting and data
visualizations
21Building Mashups By Example
- byRattapoom Tuchinda, Pedro Szekely, and Craig
A. Knoblock
22Overview
- Mashup A web application that integrates data
from multiple web sources to provide a unique
service - Goal Create a mashup building framework where
an average Internet user with no programming
experience can build Mashups easily
23Current Solutions
- Widget Paradigm
- Current Solutions involve selecting, customizing,
and connecting widgets together - Disadvantages
- As number of widgets gets large, locating the
right widget becomes confusing and time consuming
- Connecting widgets required understanding
programming concepts
24Widgets Yahoo Pipes
(Tuchinda, et. al, p. 140)
25Microsoft Popfly
- United States Information Widget - EDIT
- United States Information Widget - RUN
26Mashup Building Process
- Data Retrieval
- Extracting data from web pages into a structured
source (table or XML) - Source Modeling
- Process of assigning the attribute name for each
data column - Data Cleaning
- Required to fix misspellings and transform
extracted data into the appropriate format - Data Integration
- Specifies how to combine two or more data sources
together
27Karma Solution
The left window is an embedded web browser. The
top right window contains a table that a user
would interact with. The lower right window shows
options that the user can select to get into
different modes of operation. (Tuchinda, et. al,
p. 140)
28Data Retrieval Karma Table
- User Selects Value from Page, List
Automatically Populated
- User Selects Address for Value, List
Automatically Populated
(Tuchinda, et. al, p. 140-141)
29Source Modeling - Attributes
- Karma compares extracted data with existing data
in its repository - Automatically populates some attributes
- User specifying the correct attribute
- Users search existing attributes in data
repository
30Data Cleaning
- Users specify what data to clean
- Karma tries to pick up desired transformations
and populate the remaining columns
31Data Integration
- Analyzes attributes and data to determine
possible join conditions - Suggests existing data sources in the repository
that can be linked to the new data in the table
32Data Integration
- Locating the related sources from the repository
- Figuring out the query to combine the new source
and existing valid sources
33Data Integration
- Uses table constraints
- Uses programming methods and procedures induced
from user interaction
34Mashup Building Approach
- Combines most problem areas in Mashup building
into a unified interactive framework that
requires no widgets - Allows users with no programming background to
easily create Mashups by example
35Conclusions
- Data Integration / Mashup
- Data from enterprise applications and maintained
sources - Historical data for trending and analysis
- Extract, Transform, and Load
- Often Strategic information
- Data from web that may not be maintained or may
contain errors - Real time data for current information
- Extract, Model, and Clean
- Often tactical information
36Resources
- Linstedt, D. http//www.danlinstedt.com/AboutDV.ph
p. Nov. 9, 2008. - Linstedt, D., Garziano, K., Hultgren, H., May
2008. The New Business Supermodel The Business
of Data Vault Modeling. - Raper, Jim., Dec. 2004. Building a Data Warehouse
in a Heterogeneous Tool Environment. - Raper, Jim., Dec. 2004. From Source to Loading
Dock with Oracle Warehouse Builder - Raper, Jim., Phillips, Doris., 2006.
User-Managed Metadata Oracle Application
Express Meets Oracle Warehouse Builder 10.2 - Tuchinda, R., Szekely, Pl., Knoblock, C., 2008.
ACM., Building Mashups by Example.