The Data Warehousing 2001 - PowerPoint PPT Presentation

Slides: 48
Provided by: drjohn91
Transcript and Presenter's Notes

1
The Data Warehousing 2001
  • Continuing the Evolution

2
Agenda
  • The evolution
  • What are sites around the world doing with DW
    today
  • How are the biggest schools of thought coming
    together
  • Why is data warehousing more important than ever?
  • Is architecture still a dirty word?

3
The progression
  • 1st data warehouse in 1905 by Dupont Corp
  • 1st data cube by sales, branch and date
  • 1970s - Management Decision Systems developed a
    product called Express (later acquired by Oracle)
  • 1983 Metaphor - founded by Ralph Kimball and 2
    partners as standalone DSS
  • Lessons learned - manage information as corporate
    resource
  • 1980 - E.F.Codd - Promise of relational databases
    (data every which way)
  • Inmon 1993 - Popularisation of the term

4
Original Data Warehouses
  • Set up primarily to
  • Offload DSS from the mainframe platform
  • Get around security issues
  • Let the end users look at data in a safe
    environment
  • To provide a place to cleanup data, by
    replicating it somewhere else
  • IT was the Primary beneficiary

5
Evolution through 90s
  • Reporting
  • Summarisation
  • EIS applications
  • OLAP
  • Data Mining
  • Intelligent Agents
  • Active Warehouses

6
Why Did it Take off This Time?
  • We finally have the ability to store vast
    quantities of data
  • Intense competition in the business world
    requires ability to automatically adjust
  • This means understand trends/change as they
    happen
  • Parallel processing technologies make querying
    vast quantities of data possible
  • Availability of desk-top or web-based tools to
    slice and dice

7
Critical Success Factors in DW Yesterday
  • Sponsorship - Gotta have it
  • Requirements driven development versus build it
    and they will come
  • Data quality imperatives
  • Solid database design
  • End user deployment as opposed to information
    centre
  • DW Methodology versus standard AD cycle

8
Critical Success Factors in DW Today
  • So what's changed?
  • Application requirements, not data requirements,
    are driving need
  • Successful DW environments are more concerned
    about getting better and better usage
  • Hard benefits and soft benefits of users (DW ROI)
  • New pitfalls
  • Ignored Deployment
  • Obsessive cleansing
  • Some DSS have finite lifespans
  • Failing to socialise new applications
  • Advanced technology groups
  • Talking to the wrong end users!

9
Data Warehouse Institute San Diego 2000
  • Conference Purpose
  • Provide a vendor-independent forum for sharing
    information about data warehousing
  • No Hype, No Bias, No Fluff
  • Vendor exhibitions - equal footing
  • High quality education for all levels
  • Highlights for this year (and for me)
  • working with click-stream data to measure and
    improve e-business initiatives
  • Some success stories! (and some tragedies)
  • Measuring and justifying DW projects is being
    done successfully using solid return on
    investment figures
  • Solutions to the metadata problem

10
Some best practices
  • Web-based deployment
  • Power users still need client-server
  • Internet portals
  • near real-time updates
  • On demand ad-hoc
  • No query limits
  • Atomic data, used for predictive modeling
  • as opposed to predictable answer for predictable
    questions
  • Formalised acquisition of data
  • shared processes to reduce the re-work

11
What are other people doing?
  • Web Wide analytics
  • Collecting and integrating click streams
  • Goal to improve relevance and efficiency of site
    content and advertising targeting
  • Almost impossible to do if you don't build web
    applications with this in mind
  • Some new data types to confuse us
  • PII!! (personally identifiable information)
  • Sessionization
  • Anonymous cookie profiles - event pings
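
Sessionization, listed above, means grouping raw click-stream events into visits. The sketch below is only an illustration and is not taken from the presentation: the 30-minute inactivity timeout, the `sessionize` function, and the `(cookie_id, timestamp, url)` event layout are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes of inactivity.
# This threshold is a common convention, not something stated in the slides.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (cookie_id, timestamp, url) events into per-cookie sessions.

    Returns a list of (cookie_id, session_number, [urls]) tuples.
    """
    sessions = []
    last_seen = {}  # cookie_id -> timestamp of that cookie's previous event
    current = {}    # cookie_id -> index in `sessions` of the open session
    counts = {}     # cookie_id -> number of sessions started so far
    # Sort by cookie, then time, so each cookie's events are processed in order.
    for cookie_id, ts, url in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(cookie_id)
        if prev is None or ts - prev > timeout:
            # First event for this cookie, or the gap was too long: new session.
            counts[cookie_id] = counts.get(cookie_id, 0) + 1
            sessions.append((cookie_id, counts[cookie_id], []))
            current[cookie_id] = len(sessions) - 1
        sessions[current[cookie_id]][2].append(url)
        last_seen[cookie_id] = ts
    return sessions


# Hypothetical anonymous-cookie click events.
clicks = [
    ("cookie-a", datetime(2001, 3, 1, 9, 0), "/home"),
    ("cookie-a", datetime(2001, 3, 1, 9, 10), "/products"),
    ("cookie-b", datetime(2001, 3, 1, 9, 5), "/home"),
    ("cookie-a", datetime(2001, 3, 1, 11, 0), "/home"),  # gap over 30 min
]
sessions = sessionize(clicks)
for session in sessions:
    print(session)
```

Keying on an anonymous cookie identifier rather than a customer record keeps PII out of the example; linking sessions to known customers would be a separate, more sensitive step.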

12
Layers of BI
  • Knowledge Development - No Hypothesis
  • Modelling - With some Hypothesis
  • Multi-dimensional Analysis
  • Standard Ad-hoc Queries
  • (Diagram axis: degree of understanding)

13
Customer Relationship Management (CRM)
  • Is enabled by the Data Warehouse
  • Need information pulled together to turn data
    into knowledge
  • Detailed customer analysis, segmentation to
    segments of 1
  • Pattern recognition, and profiling through data
    mining
  • Enter the active data warehouse

14
What do we mean by CRM?
  • Embedding of customer management and relationship
    building into every organisational aspect
  • The use of information technologies to drive a
    share of wallet strategy in markets where
    personal account management is not appropriate or
    cost-effective
  • NOT
  • Some data warehousing
  • Some data mining
  • campaign management systems
  • some analytical tools with automation of sales
    force thrown in

15
CRM - Why the big deal now?
  • Customer needs -
  • Loyalty out the window
  • Competition fiercer due to globalisation and
    increased commoditization in all industries
    (especially services)
  • Retention - is cheaper than acquiring new
    customers (10X)
  • The challenge
  • Enabling strategies are just that, the key to
    achieving the value is in the doing
  • Enormity of the task
  • Where to start

16
Key Message: Think Big, but Start Small
  • Single campaign of high priority, tight scope
  • Support it minimally at first
  • Data
  • Target by analysing information
  • Study the channel mix
  • Measure
  • Absorb into organisation
  • Develop systems, architecture, skills transfer to
    other areas
  • Long term strategy

17
Critical Success Factors - CRM Related
  • Field of dreams - when do you get payback?
  • Cultural changes that are required
  • Sponsorship (not until it looks like a winner)
  • Training, training and more training
  • Knowledge and tools without action and the right
    attitude by the people does not translate into
    success
  • What does this look like?

18
(No Transcript)
19
What a Data Warehouse Architecture IS
  • A data warehouse architecture is a blueprint, an
    arrangement or a map
  • A data warehouse architecture is a plan which is
    the technical translation of the business
    requirements
  • Data Warehouse Architecture is the set of
    components required to deliver an organisation's
    information capability
  • A data warehouse architecture is a set of
    principles that an organisation determines to
    follow to achieve an information capability

20
What does it allow you to do?
  • Spend time understanding the priorities of the
    business
  • Determine which scale your company will fit into
  • Choose tools (if you're lucky)
  • Determine which components are required and which
    are luxuries
  • People, Processes and Technology
  • Choose the right religion

21
What a Data Warehouse Architecture IS NOT
  • It is NOT a detailed plan
  • It is not something that you go out and build or
    buy
  • It is not a set of imperatives that must be
    followed in all situations
  • It is not a diagram or a document
  • Although these may be useful means of
    communicating, explaining, and recording it

22
Why do you need an architecture anyway?
  • On its own, an architecture does not deliver
    anybody anything!
  • But the same could be said of Project Plans, Data
    Models, Road maps, methodologies!
  • If you're going to build anything, you need a
    plan. If you don't plan, plan to fail
  • Serves as communication point for the architects
    and the customers
  • Assists in breaking down work into manageable
    components so that you can determine overall
    costs

23
What will happen to it... (Unfortunately)
  • People will get sick of hearing about it! (0.7
    Probability)
  • At some point, you will be struggling to find
    another word for it
  • It will change (0.9 Probability)
  • The situation that you didn't plan for emerges
  • Someone will disobey architectural principles
    (0.9999 probability)
  • And this may be you (0.85 probability)

24
What happens without agreed architecture?
  • Failure to manage expectations - you'll have
    trouble selling the vision going forward
  • Constant rework due to duplication of effort
  • Difficulty in estimating costs
  • Departments and business units will do their own
    thing and you may never get the opportunity to
    catch up

25
Architecture as a religion...
  • When principles become more important than
    delivering business value
  • Zealotry / devotion to theory
  • Architectural trade-offs are seen as compromising
    principles
  • Standing on the sidelines of projects waiting for
    failure
  • I told you so (sinner repent)!
  • Elevator white-anting

26
Architecture as evolution
  • Agreeing on a first cut version, then selecting a
    piece to deliver the first slice of business
    value
  • Modifying the first version over time as business
    imperatives alter
  • Building or buying the components in stages when
    it makes sense
  • ETL tools, Multi-dimensional DBs, Metadata
    dictionary, ODS

27
Architectural deviation...
  • When you need to deliver a business solution
    sooner
  • when sponsorship appears to be at risk
  • To prove that you can actually deliver something
  • Previous projects have failed
  • Examples
  • Standalone data marts
  • Implemented prototypes

28
Some situations not to compromise.
  • Making up fun new dimensions and measures
  • Customer, Client, Debtor, Partner
  • All defined differently
  • How much more to do it right?
  • Recording dollars and Euros, average balances and
    opening balances, all as Balances
  • Exception - Analysis paralysis

29
Communicating the Architecture
  • Communication plan mandatory
  • List of key stakeholders
  • Devise the preferred way of informing them of
    progress and requesting support
  • Consistency
  • Frequency of communication
  • not too much, and not too little
  • Ensure that each stage/phase can be related to
    its predecessor

30
How far do you go... and when do you stop!
  • Symptom
  • Excessive data modeling
  • Sponsors don't return your phone calls
  • New products or technology innovations are
    sending you back to the drawing board
  • Diagnosis
  • Advanced analysis paralysis

31
The Cure for A.A.P.
  • Remember why we're doing this
  • To have a picture of where we are going
  • To be able to scope out the next project
  • To understand the costs in enough detail to get
    to the next stage
  • To be able to communicate internally and
    externally
  • When you're there, STOP
  • Get STARTED

32
The Great Debate
  • Bill Inmon
  • Father of DWing
  • Origin - Prism, Pine Cone Systems, Author
  • Corporate Information Factory
  • Dependent Data Marts
  • Normalised Warehouse
  • billinmon.com
  • Ralph Kimball
  • Lifecycle Toolkit Man
  • Redbrick
  • Dimensional Model
  • Data Mart + Data Mart + Data Mart = DW
  • Dimensional everything
  • lifecycle-toolkit.com

33
(No Transcript)
34
(No Transcript)
35
Where they meet
  • An architecture is required
  • Conform dimensions and measures across the
    enterprise
  • The star schema model is a useful way to present
    information to users
  • Build a data warehouse iteratively
  • Metadata is of crucial importance
  • That they both have it right!
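
The point about conformed dimensions can be illustrated with a small sketch. It is not from the presentation: the table names (`dim_customer`, `fact_sales`, `fact_support_calls`), the columns, and the sample rows are all hypothetical. It uses Python's built-in `sqlite3` module to show two star-schema fact tables sharing one customer dimension, so their results can be compared on the same `segment` definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed customer dimension shared by both subject areas.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name TEXT,
    segment TEXT
);
-- Two fact tables (think: two data marts) keyed on the same dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount REAL
);
CREATE TABLE fact_support_calls (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    minutes INTEGER
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'Corporate'), (2, 'Bob', 'Retail');
INSERT INTO fact_sales VALUES (1, 500.0), (1, 250.0), (2, 40.0);
INSERT INTO fact_support_calls VALUES (1, 30), (2, 45), (2, 15);
""")

# Aggregate each fact table separately, then line the results up on the
# shared dimension attribute. This only works because both marts agree on
# what a customer and a segment are.
drill_across = conn.execute("""
SELECT s.segment, s.total_sales, c.total_minutes
FROM (SELECT d.segment AS segment, SUM(f.amount) AS total_sales
      FROM fact_sales f
      JOIN dim_customer d ON d.customer_key = f.customer_key
      GROUP BY d.segment) AS s
JOIN (SELECT d.segment AS segment, SUM(f.minutes) AS total_minutes
      FROM fact_support_calls f
      JOIN dim_customer d ON d.customer_key = f.customer_key
      GROUP BY d.segment) AS c ON c.segment = s.segment
ORDER BY s.segment
""").fetchall()

for row in drill_across:
    print(row)
```

If each mart had defined its own customer table differently, the final join would have nothing reliable to match on, which is the stovepipe problem both schools warn about.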

36
Where they disagree
  • Granularity required
  • Which modeling technique to use and when
  • ER Modeling
  • Star Schema/Dimensional Modeling
  • The role of the data mart

37
Architectural Components
  • Ralph Kimball says.
  • Data staging area
  • Collection of DMs DW
  • ODS (Internal/external) really just atomic DM if
    for DSS)
  • Data Mart - not quite the same
  • N/A
  • star schema used for everything
  • Archived data
  • Metadata
  • Bill Inmon Says.
  • Integration and Transformation Layer
  • Enterprise data warehouse
  • ODS
  • Data Mart
  • Exploration DW
  • For outside square end user
  • Near line storage
  • Archived, less frequently used data
  • Metadata

38
Granularity in the Warehouse
  • Ralph Kimball says.
  • Declaring the grain of a data mart at the lowest
    level will make design impervious to changes
  • The grain of the time dimension will usually be
    individual days (doesn't this conflict?)
  • One of most important decisions is declaring the
    grain correctly
  • Bill Inmon Says.
  • Level of granularity in the data warehouse should
    be at the lowest possible level required for any
    data mart
  • If that isn't very low, go lower!
  • Atomic data warehouse data should be archived so
    that any new views/summaries can be recreated
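
Why both camps care so much about grain can be shown with a short sketch. The data and the `summarise` helper below are hypothetical and only illustrate the idea: from atomic rows any summary can be recreated, whereas a table stored at daily grain could not answer the hourly question at the end.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical atomic-grain facts: one row per individual transaction.
transactions = [
    (datetime(2001, 3, 1, 9, 15), "north", 120.0),
    (datetime(2001, 3, 1, 16, 40), "north", 80.0),
    (datetime(2001, 3, 2, 10, 5), "south", 50.0),
]


def summarise(rows, key):
    """Total the amounts of (timestamp, branch, amount) rows per key(ts, branch)."""
    totals = defaultdict(float)
    for ts, branch, amount in rows:
        totals[key(ts, branch)] += amount
    return dict(totals)


# Any coarser view can be derived from the atomic rows on demand.
daily = summarise(transactions, lambda ts, branch: ts.date().isoformat())
by_branch_month = summarise(
    transactions, lambda ts, branch: (branch, ts.strftime("%Y-%m"))
)
# A question nobody planned for: sales by hour of day. This is only
# answerable because the stored grain is finer than one day.
hourly = summarise(transactions, lambda ts, branch: ts.hour)

print(daily)
print(by_branch_month)
print(hourly)
```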

Who is right?
39
DW and the role of E/R Modelling
  • Ralph Kimball says.
  • ER Models are too complicated for end users to
    understand
  • ER Modeling/normalising only suitable for OLTP or
    in data staging area since it eliminates
    redundancy
  • Results in too many tables to be easy to query
  • ER models are optimised for update activity not
    high performance querying
  • Bill Inmon says.
  • ER Model is suitable for data warehouses because
    it is stable, and supports consistency and
    flexibility
  • Normalised data is ideal basis for the design of
    the Data Warehouse and the ODS
  • May not be suitable for the data mart, which
    deals heavily with regular query activity and
    time-variant analysis

Who is right?
40
Dimensional Modelling and Star Join
  • Ralph Kimball says.
  • DM is only viable technique for designing
    databases in the Data Warehouse environment
    because it provides a predictable framework
  • Even lowest level granular data should be in
    dimensional format
  • Every E/R model has an equivalent dimensional
    model representation
  • Any type of business data can be represented as a
    cube
  • Bill Inmon Says.
  • DM is a reasonably viable technique for designing
    data marts, when type of access is very
    predictable
  • DMs are not suitable for updating at all
  • Differing business areas will likely want a
    different dimensional model to look at similar
    data
  • Series of dimensional models are not flexible
    enough to support an enterprise's entire Data
    Warehouse

Who is right?
41
Role of the Data Mart
  • Ralph Kimball says.
  • Successive data marts built on a star schema
    model together form a data warehouse
  • The bad publicity about data marts comes from
    implementation of isolated stovepipe data marts
    done badly, and not conforming dimensions and
    measures
  • Data Marts can be atomic but should still be in
    dimensional view format
  • Bill Inmon says.
  • Data marts should be populated by the data
    warehouse and external data only
  • Can contain subsets, aggregated data or atomic
    data
  • Provide a departmental view of the world
  • May or may not reside on a different platform
    from DW
  • Provide for repeatable, predictable types of
    information delivery

Who is right?
42
Where Inmon works best (my opinion)
  • Large organisations with many different business
    units that need to share information
  • Multiple MISs/DSSs in place, and feeling the
    pain from inconsistencies and many interfaces
  • Traditional data modeling skills in-house and
    understood

43
Where Inmon fails us
  • Little attention is paid to the value and rigour
    required for dimensional modeling
  • Defined hierarchies are glossed over
  • Does not stress the concept of conformed
    dimensions and measures
  • This is implied but not stressed
  • Assumed as part of integration layer

44
Where Kimball works best
  • Small organisations, predictable measuring
    capability required
  • Where how you need to look at the data is a
    no-brainer
  • Dimensions and measures are well-known and not
    likely to change
  • Where lowest level of granularity does not create
    Terabytes
  • Traditional data modeling was not successful and
    is not practiced anyway

45
Where Kimball fails us
  • If you get the initial granularity wrong, you are
    in deep trouble
  • He warns you about this, but does not really
    provide a solution
  • Gut feeling says daily is often not enough
  • If a new way of looking at things emerges, could
    cost a rebuild
  • Assumes users are too dumb to make sense out of
    snow-flaking

46
Best of Both worlds?
  • Why not.
  • Pay strict attention to conforming dimensions and
    measures across the business
  • Also model hierarchies early in the piece
  • Have a permanent staging area (3rd normal form)
    and name it an atomic data warehouse
  • Feed dimensional data marts from this DW/Staging
    area
  • Build data marts for departments going through the
    staging area
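
A minimal sketch of this hybrid, assuming hypothetical table names and sample rows (none of them come from the presentation): normalised tables act as the atomic data warehouse, and a departmental star-schema mart is populated only from them. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Atomic, normalised (3NF-style) warehouse / permanent staging area.
CREATE TABLE branch (
    branch_id INTEGER PRIMARY KEY, branch_name TEXT, region TEXT
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY, customer_name TEXT
);
CREATE TABLE sale (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    branch_id INTEGER REFERENCES branch(branch_id),
    sale_date TEXT,
    amount REAL
);
INSERT INTO branch VALUES (1, 'Sydney', 'East'), (2, 'Perth', 'West');
INSERT INTO customer VALUES (10, 'Acme'), (11, 'Bob');
INSERT INTO sale VALUES
    (100, 10, 1, '2001-03-01', 500.0),
    (101, 11, 1, '2001-03-01', 40.0),
    (102, 10, 2, '2001-03-02', 250.0);

-- Dimensional data mart for one department, fed only from the tables above.
CREATE TABLE dim_branch AS
    SELECT branch_id AS branch_key, branch_name, region FROM branch;
CREATE TABLE fact_daily_sales AS
    SELECT branch_id AS branch_key,
           sale_date,
           SUM(amount) AS total_amount,
           COUNT(*) AS sale_count
    FROM sale
    GROUP BY branch_id, sale_date;
""")

# End users query the simple star; the atomic rows stay available if the
# mart ever needs rebuilding at a different grain.
rows = conn.execute("""
SELECT d.region, f.sale_date, f.total_amount, f.sale_count
FROM fact_daily_sales f
JOIN dim_branch d ON d.branch_key = f.branch_key
ORDER BY f.sale_date, d.region
""").fetchall()

for row in rows:
    print(row)
```

Because the mart is derived, a new way of looking at the data costs a rebuild of the mart only, not a reload from the source systems.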

47
Where does this leave us?
  • DW has become an integral part of today's decision
    making processes
  • DW is here to stay as an enabler of CRM
  • Will continue to evolve
  • Role will move to be more operational
  • Will not work without a sound information
    architecture