METADATA for Data Stewards: DQI and the Customer Intelligence Life Cycle - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

METADATA for Data Stewards: DQI and the Customer Intelligence Life Cycle

Description:

Consistency. With other instances. With other information. Timeliness ... Verify Consistency across all Copies. Verify Dependency Rules are Valid ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 40
Provided by: David1124
Category:

less

Transcript and Presenter's Notes

Title: METADATA for Data Stewards: DQI and the Customer Intelligence Life Cycle


1
METADATA for Data StewardsDQI and the Customer
Intelligence Life Cycle
  • DAMA
  • 9 January 2002
  • Columbus, Ohio
  • Derek J Strauss
  • (with input from Alan Kolber and Larry English)

2
Levels of Data
Meta Schema Object, Property, Constraint,
... Schema Entity, Attribute, Relationship,
Screen, Report, Structure, Field, Application,
... Metadata CUSTOMER EMPLOYEE RESOURCE
Cust-Name Emp-Day-Sal-Amt Res-Hm-St
r-Addr Character-32 Integer Variable
Character Not Null 100ltvaluelt3,000 Data Joe
Jones 300 4 Elm Street Real World

3
Customer Intelligence Lifecycle
ANALYZE
1. Data Integration Tools
Data Audit
Data Quality
Data Warehouse
ETL
Operational Data Store
Data Discovery
Exploration Warehouse
CAPTURE
PLAN
Data Mart
Data Mart
Data Mining Warehouse
Data Mart
Data Mart
INTERACT
4
Customer Intelligence Lifecycle
ANALYZE
2. Analytical Tools
1. Data Integration Tools
Data Audit
Query Reports
Data Quality
OLAP
Data Warehouse
ETL
Data Mining
Operational Data Store
Data Discovery
Exploration Warehouse
CAPTURE
PLAN
Portals
Data Mart
Data Mart
Data Mining Warehouse
Data Mart
Data Mart
INTERACT
5
Customer Intelligence Lifecycle
ANALYZE
2. Analytical Tools
1. Data Integration Tools
Data Audit
Query Reports
Data Quality
OLAP
Data Warehouse
ETL
Data Mining
Operational Data Store
Data Discovery
Exploration Warehouse
CAPTURE
PLAN
Portals
Data Mart
Data Mart
Data Mining Warehouse
Data Mart
Data Mart
Campaign Mgt
Segment Analysis
Profitability
Loyalty Analysis
3. Analytical Applications
INTERACT
6
Customer Intelligence Lifecycle
ANALYZE
2. Analytical Tools
1. Data Integration Tools
Data Audit
Query Reports
Data Quality
OLAP
Data Warehouse
ETL
Data Mining
Operational Data Store
Data Discovery
Exploration Warehouse
CAPTURE
PLAN
Portals
Data Mart
Data Mart
Data Mining Warehouse
Data Mart
Sales/Stores
Data Mart
Web/ Email
Campaign Mgt
Call Center
Segment Analysis
Wireless
Direct Mail
Profitability
Loyalty Analysis
3. Analytical Applications
4. Interaction Systems
INTERACT
7
Customer Intelligence Lifecycle
ANALYZE
2. Analytical Tools
1. Data Integration Tools
Data Audit
Query Reports
Data Quality
OLAP
ETL
Data Mining
Data Discovery
CAPTURE
PLAN
Portals
?
Sales/Stores
Web/ Email
Campaign Mgt
Call Center
Segment Analysis
Wireless
Direct Mail
Profitability
Loyalty Analysis
3. Analytical Applications
4. Interaction Systems
INTERACT
8
6 Kinds of Data Quality
  • Accuracy
  • Of Value
  • Of Meaning
  • Precision
  • Level of detail
  • Level of granularity
  • Completeness
  • Consistency
  • With other instances
  • With other information
  • Timeliness
  • Consistent at a point in time
  • Responsive delivery
  • Accessibility
  • Security
  • Technology

9
Metadata for Data Quality
  • Applications Inventory
  • Data Models
  • Logical Data Modeling
  • Domain Analysis
  • Transformations

10
Applications Inventory
  • Applications
  • People
  • Documents
  • File Structures
  • Platforms - HW,SW (OS,DBMS,Language)
  • Software Code
  • Screens, Forms, Reports

11
Application Inventory
12
Applications Inventory Issues
  • Useful for many other purposes
  • Application Development
  • Application Maintenance
  • Disaster Recovery
  • Other information, including more detail, needed
    for other purposes
  • Platform information tough to model to meet
    different needs

13
Model Management
14
Model Management Issues
  • No one tool available to manage same component
    across different model types supported by
    different CASE tools
  • Must manage versioning over time
  • Must manage versioning across environments
    (Requirements Analysis, Development, Test,
    Production)

15
Sources of Information
  • Business Units/Users
  • Technical Support
  • Forms, Screens, Reports
  • Documents
  • The Data

16
Data Quality Sources of Information
17
Data Models
  • Source (Legacy) System Physical
  • Enterprise/Warehouse Logical
  • Target Physical
  • Physical to Logical Mapping

18
Data Model Types
WHAT (DATA)
Scope
Lists of subjects, containing major
entities Subject Diagrams
(Planner)
Conceptual Data Model (ERM)
TECHNOLOGY INDEPENDENT
(Owner)
Logical Data Model (Normalised to 5NF)
(Designer)
multidimensional relational hierarchical, etc..
(Builder)
Physical Model
DBMS-specific models Data Definition Language
(DDL)
eg Sybase or UDB (UNIX) eg DB2 (Mainframe)
TECHNOLOGY DEPENDENT
(SubContractor)
Functioning System
Files/Tables Fields
19
Data Models Needed
Corporate Logical Data Model
Reverse Engineered Physical Data Model
Data Warehouse Logical Data Model
Data Mart Logical Data Model
Data Warehouse Physical Data Model
Data Mart Physical Data Model
20
Logical Data Modeling
  • Entities
  • Relationships
  • Attributes
  • Rules
  • About Entities
  • About Relationships
  • About Attributes (Domains)

21
Logical Data Modeling
22
Attribute Purposes
  • Identifier
  • Names
  • ID Numbers
  • Descriptor
  • Locator
  • Geographic (real)
  • Electronic (virtual)
  • Date/Time
  • Classifier
  • Codes/Types
  • Indicators
  • Measure
  • Amounts (money)
  • Counts (discrete)
  • Quantities (continuous)

23
Attribute Purposes
24
Quality Issues for Attributes
  • Classifier
  • Accuracy
  • Precision
  • Consistency
  • Measure
  • Accuracy
  • Precision
  • Consistency (UOM)
  • Timeliness
  • Identifier
  • Accuracy
  • Precision
  • Completeness
  • Consistency
  • Accessibility
  • Descriptor
  • Accuracy
  • Locator
  • Accuracy
  • Precision

25
Domain Rule Types
  • Note All Domains constrained by format
  • Other (for IDs, descriptors, and locators)
  • Value Expressions (for measures)
  • Range Limits
  • Derivations
  • Value Sets (for classifiers)
  • Multiple Symbol Sets
  • One Official Symbol Set

26
Domain Rules
27
Domain Analysis - Value Expressions
28
Domain Analysis - Value Sets
29
Value Sets Versus Symbol Sets
  • Value Sets composed of Values
  • U.S. State Alabama, Arkansas
  • U.S. State, Territory, Possession Alabama,
    District of Columbia, Puerto Rico...
  • Symbol Sets composed of Symbols
  • 4 Character U.S. State ALAB, ARK
  • 2 Digit U.S. State 01, 02
  • 2 Character U.S. State AL, AKofficial
  • 2 Character U.S. State, Territory, Possession
    AL, AK, DC, PRofficial

30
Logical to Physical Mapping
31
Data Modeling Issues
  • Tools dont support enough rows of ZFEA
  • N-ary versus binary relationships
  • Multiple type hierarchies (super/sub-types)
  • Domain information not supported well
  • Physical to Logical mapping
  • Many to one
  • Many to many

32
5 Cleansing Mechanisms
  • Fix the data at the source
  • Fix the code at the source
  • Fix the process at the source
  • Fix the metadata
  • Transform the data on the way into a data
    warehouse

33
Source to Target Transformations
  • Transformations
  • Transformation Components
  • Physical Data Elements involved in the various
    Transformation Components
  • Source Elements
  • Target Elements
  • Reference Elements
  • Metrics on the whole transformation process

34
Source to Target Transformations
35
Data Quality Assessment Audit Process

Implement Data Audit
After Larry English
  • Number of Records
  • Hash Totals
  • Financial Amount Balances
  • Reasonableness Checks
  • Row counts
  • Referential Integrity Checks

Analyze Data Definition Quality
Analyze Data Content Quality
  • Conformance to Meaningful
  • Enterprise Standards
  • Meaningful Business Names
  • Clear, Precise, Complete
  • Consensus Definition
  • Singularity of Definition
  • Complete, Exact Definition of
  • Domain
  • Completeness, Accuracy and
  • Usefulness of Integrity Policies
  • and Business Rules
  • Measure Completeness
  • Measure Validity (domain values,
  • ranges,derived data)
  • Identify Potential Duplicate Records
  • Verify Consistency across all Copies
  • Verify Dependency Rules are Valid
  • Verify Derived Data is Correctly Calculated
  • from its Base Data
  • Verify Format is Consistent/Complete
  • Interpret and Report Results
  • Track Over Time to Communicate Progress
  • Satisfies Stakeholder Expectations

Implement Data Cleanup Actions
  • Produce Data Assessment Report
  • Produce Assessment Procedure Report
  • Summary Reports
  • Exception Reports
  • Control Charts
  • Pareto Diagrams
  • Weight Errors (not all errors are equally bad)
  • Recommend Corrective Quality Improvement Actions

36
4 Key Questions from Users
  • Where did this data come from?
  • Where did my data go to (so I can find it)?
  • Nobody seems to need this data. What would
    happen if we get rid of it?
  • Is this new data clean? (Did the clean-up work
    and can I use this data now?)

37
(No Transcript)
38
Key Assumptions
  • ? The framework is a classification schema
  • ? The rows represent roles that use information
  • ? The columns represent aspects of information
  • -- must be understood
  • -- must be controlled
  • -- can be optimized
  • ? The cells represent unique types of
    components

39
The Repository Model - A Starter Set
Write a Comment
User Comments (0)
About PowerShow.com