Data Warehousing: the New Knowledge Management Architecture for Humanities Research? - PowerPoint PPT Presentation

Title:

Data Warehousing: the New Knowledge Management Architecture for Humanities Research?

Description:

Title: PowerPoint Presentation Author: Dr. David Anderson Last modified by: DelveJ Created Date: 3/7/2003 3:36:06 PM Document presentation format – PowerPoint PPT presentation

Slides: 37
Provided by: DrDavidA7

1
Data Warehousing: the New Knowledge
Management Architecture for Humanities Research?
  • Janet Delve
  • University of Portsmouth, UK
  • UKAIS 2004

2
Introduction
  • Data Warehouses everywhere
  • Amazon
  • WalMart
  • Opodo
  • DWs are widely used in industry and in scientific
    research, but not in humanities research.
  • Written paper covers linguistics and history.
    Talk covers history in detail and gestures
    towards linguistics.

3
Overview
  • Introduction
  • Data modelling and traditional databases
  • Source-oriented data modelling
  • Data Mining
  • Philosophy of data warehousing
  • Background of DWs
  • Basic components of a data warehouse (DW)
  • Advantages of DWs
  • Findings: Humanities and DWs
  • Humanities and DWs: some issues
  • Examples of possible Humanities DWs
  • Ideas for the future?

4
Data Modelling
  • Relational data modelling: material is split into
    many tables in order to gain enhanced performance:
    no duplication, no updating or insertion anomalies,
    etc.
  • Source-oriented data modelling: emphasis on
    modelling data as closely as possible to the original
    source, which is included in its entirety for
    posterity.
  • DW data modelling: nearer to the source-oriented
    approach in spirit.

5
Traditional databases
  • ERD p117 Harvey and Press

6
Traditional databases
  • Harvey and Press p.129

7
Historical Data
  • This can be difficult to model because:
  • It is irregular in structure,
  • It is complex,
  • It is erratic in terms of when it occurs.
  • Using a relational database can mean data from a
    single source being split into many tables.

8
Source-oriented data modelling
  • "a semantic network tempered by hierarchical
    considerations" (Thaller 1991, 155).
  • Its flexible nature gives κλειω (Kleio) a "rubber band
    data structures" facility (Denley 1994, 37).
  • The fluid nature of creating a database with
    κλειω marks it out as an organic DBMS.

9
Data Mining
  • The whole field is often referred to as data
    mining, which is also a major component within
    the field.
  • Data mining (DM) is normally used on large
    quantities (terabytes) of data, to find
    meaningful patterns. Neural nets, statistical
    modelling, decision trees are just some AI
    methods used. SQL can be used too. Parallel data
    processing is used with DM.
  • In order to mine data, it must be kept in a
    suitable system - a data warehouse is ideal.

10
Philosophy of data warehousing
  • Data warehousing is an architecture, not a
    technology. There is the architecture, and there
    is the underlying technology, and they are two
    very different things. Unquestionably there is a
    relationship between data warehousing and
    database technology, but they are most certainly
    not the same. Data warehousing requires the
    support of many different kinds of technology.
  • Inmon 2002

11
Background of DWs
  • Business-oriented: DWs serve the analytical needs of
    a company. The ordinary DBMS is still needed for
    the day-to-day queries, and also to feed the DW.
  • W.H. Inmon, father of DW. Cabinet effect 1991
  • R. Kimball, expert on dimensional modelling
  • Need for a single, integrated source of clean data,
    particularly for multinational companies etc.
  • Supporting technology from e.g. Oracle, Prism
    Solutions, IBM

12
Data Marts
  • Data marts contain DW data but are restricted to
    one department or one business process.
  • The industry is divided about data marts:
  • Inmon recommends building the DW first, then
    siphoning off the data to data marts.
  • Kimball believes you should build several data
    marts first, then integrate them into a DW.

13
Basic components of a Data Warehouse (DW)
  • A DW is subject-oriented, integrated,
    non-volatile and time-variant.
  • The major subjects for an insurance company are
    customer, policy, premium and claim. Previously
    data was modelled around applications: car, health,
    life and accident.
  • Integration is the most important facet of a DW.
    Previous inconsistencies are ironed out and all
    data unambiguously entered into DW. Many sources
    of data can be placed in DW.

14
Basic components of a Data Warehouse (DW)
  • Non-volatile: data in a DW is not changed in the
    way data in an operational database is. Data is
    loaded en masse and isn't updated, which obviates
    the need for normalisation.
  • Time-variant: DW time horizon 5-10 years,
    operational database 2-3 months. A DW holds
    snapshots, an operational database holds current
    data. A DW always has an element of time, an
    operational database may or may not (Inmon 2002).
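
The integrated, non-volatile and time-variant properties can be illustrated with a small sketch. This is only an illustration under stated assumptions, not how any particular DW product works: the `Warehouse` class, the `integrate` function, the gender encodings and the sample records are all hypothetical.

```python
from datetime import date

# Hypothetical encodings of one attribute across different source systems.
GENDER_CODES = {"m": "male", "f": "female", "1": "male", "0": "female",
                "male": "male", "female": "female"}


def integrate(record):
    """Return a copy of a source record with gender in one agreed encoding."""
    cleaned = dict(record)
    cleaned["gender"] = GENDER_CODES[str(record["gender"]).strip().lower()]
    return cleaned


class Warehouse:
    """Append-only store: rows are loaded in bulk with a snapshot date
    and are never updated afterwards (non-volatile, time-variant)."""

    def __init__(self):
        self._rows = []

    def load_snapshot(self, snapshot_date, records):
        # Every row is stamped with the date of the snapshot it belongs to.
        for record in records:
            row = integrate(record)
            row["snapshot_date"] = snapshot_date
            self._rows.append(row)

    def as_of(self, snapshot_date):
        # Earlier snapshots remain queryable because nothing is overwritten.
        return [r for r in self._rows if r["snapshot_date"] == snapshot_date]


warehouse = Warehouse()
warehouse.load_snapshot(date(2003, 1, 1), [{"customer": "A", "gender": "m"}])
warehouse.load_snapshot(date(2004, 1, 1), [{"customer": "A", "gender": 1}])
```

Both snapshots of customer "A" stay in the store, each with the same cleaned gender value, so a query can compare the two points in time.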

15
Basic components of a Data Warehouse (DW)
  • Kimball p7

16
Typical Architecture of a Data Warehouse

17
Meta Data
  • Meta data is extremely important in a DW. It is
    used:
  • to log the extraction and loading of data into
    the warehouse
  • in query management, to locate the most
    appropriate data source and also to help end
    users to build queries
  • to show how the data has been mapped when
    carrying out data cleansing and transformations
  • to manage all the data in the DW, recording
    where data came from, when, etc.
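
A minimal sketch of the logging role of meta data follows. It assumes a plain Python list stands in for the warehouse, and the names `load_with_metadata`, `normalise_parish` and the source label `hearth_tax_1665` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def load_with_metadata(source_name, records, transform, warehouse, load_log):
    """Load transformed records into `warehouse` and append one meta data
    entry to `load_log` saying where the data came from, when it was
    loaded, and which transformation was applied."""
    loaded = [transform(r) for r in records]
    warehouse.extend(loaded)
    load_log.append({
        "source": source_name,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(loaded),
        "transformation": transform.__name__,
    })


def normalise_parish(record):
    """Illustrative cleansing step: trim and title-case the parish name."""
    return {**record, "parish": record["parish"].strip().title()}


warehouse, load_log = [], []
load_with_metadata("hearth_tax_1665", [{"parish": " st. john "}],
                   normalise_parish, warehouse, load_log)
```

Each entry in `load_log` can later answer the questions the slide lists: which source a row came from, when it arrived, and how it was mapped during cleansing.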

18
Basic components of a Data Warehouse (DW)
  • Fact Tables
  • A fact table is the primary table in a
    dimensional model where the numerical performance
    measurements of the business are stored
  • The measurement data resulting from a business
    process is stored in a single data mart
  • Since measurement data is overwhelmingly the
    largest part of any data mart, we avoid
    duplicating it in multiple places around the
    enterprise (Kimball 2002)

19
Basic components of a Data Warehouse (DW)
  • Dimension tables
  • These contain the textual descriptors of the
    business. Their depth and breadth define the
    usefulness of the DW.
  • Contains data that doesn't change frequently
  • Can have 50-100 attributes.
  • Not usually normalized. (Snowflake and starflake)
  • Coding disparaged (Long term view)
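
The relationship between a fact table and its dimension tables can be sketched with Python's built-in `sqlite3` module. The table names, columns and rows below are invented for illustration, loosely following the insurance subjects mentioned earlier. They are not taken from Kimball.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: textual descriptors, deliberately not normalised.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: foreign keys to the dimensions plus a numerical measurement.
CREATE TABLE fact_claim (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    claim_amount REAL
);
INSERT INTO dim_customer VALUES (1, 'A. Smith', 'South'), (2, 'B. Jones', 'North');
INSERT INTO dim_date VALUES (200301, 2003, 1), (200302, 2003, 2);
INSERT INTO fact_claim VALUES (1, 200301, 100.0), (1, 200302, 50.0), (2, 200301, 70.0);
""")

# Analysis joins the fact table to its dimensions and aggregates the measure.
totals = conn.execute("""
    SELECT c.region, d.year, SUM(f.claim_amount)
    FROM fact_claim AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY c.region, d.year
    ORDER BY c.region
""").fetchall()
```

The dimension attributes (`region`, `year`) supply the grouping and filtering vocabulary, while the fact table supplies the numbers being summed.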

20
Basic components of a Data Warehouse (DW)
  • Star schema Kimball p51

21
Basic components of a Data Warehouse (DW)
  • Kimball p43

22
Basic components of a Data Warehouse (DW)
  • Kimball p39

23
Data Warehousing Tools and Technologies
  • Building a data warehouse is a complex task
    because there is no vendor that provides an
    end-to-end set of tools.
  • Necessitates that a data warehouse is built using
    multiple products from different vendors.
  • Ensuring that these products work well together
    and are fully integrated is a major challenge.

24
Advantages of DWs
  • Flexibility in modelling data.
  • Time dimension: country-specific calendars and
    synchronization across multiple time zones.
  • Easy to add external data and summarised data.
  • Built for analysis.
  • Built for huge volumes of data (terabytes of data;
    a terabyte is a trillion, 10^12, bytes).
  • Can cope with idiosyncrasies of geographic
    location dimensions within GISs.

25
Possible advantages of DWs
  • Indexing facilities of DW.
  • Publishing the right data: data collected from
    a variety of sources and edited for quality and
    consistency.
  • DW seeks to collate all data so a variety of
    different subsets can be analysed whenever
    required.
  • Easy to extend DW and add material from a new
    source.
  • Data cleansing techniques.
  • Tracking facility afforded by meta data

26
Disadvantages of DWs
  • Some humanities data fits into the numerical
    fact topology, some doesn't.
  • The technology is not easy and is based on having
    existing databases to extract from.
  • Regular snapshots are not the same, but they could
    equate to data sets taken at different periods of
    time (e.g. 1841 census, 1861 census).
  • A lot to learn.

27
Findings: Humanities and DWs
  • NAGARA
  • (National Association of Government Archives and
    Records Administrators)
  • Article on DWs by Mary Klauda of the Minnesota
    Historical Society 1999 (archivist)
  • Eastern Connecticut schools DW 2002
  • Bo Wandschneider, University of Guelph, Canada:
    DW and the use of census data. ICPSR
    (Inter-university Consortium for Political and
    Social Research)

28
Findings: Humanities and DWs
  • University of California DW memo to Humanities
    department
  • Social Science DW Human Resources DW project of
    Human Sciences Research Council, South Africa
  • GEOBASE, Israel. DW of Israel's regional
    statistics, supported by National Planning
    Authority in the Ministry of Interior Affairs.

29
Humanities and DWs: some issues
  • Scale: can cope with really large country- /
    state-wide problems.
  • Can analyse e.g. British censuses 1841-1901
    (10^8).
  • Can put several databases together to produce a
    time run, e.g. hearth taxes, window taxes, poll
    taxes, land taxes, poor rates all in one DW.
  • Oracle site licenses.

30
Examples of possible History DWs

31
Examples of possible History DWs
HOLDING DETAILS
  • Holding ID
  • King
  • Tenant in Chief
  • Manor Lord
  • VILL
  • Etc.
PROPERTY INFORMATION
  • Property Id
  • Property description
  • Property value
  • Etc.
MANOR
  • ManorId
  • Holding Id
  • Property Id
  • Original Owner Id
  • Date
  • Manor Value
  • Tax (Hides)
  • Cottar Population
  • Bordar Population
  • Villein Population
  • Sokeman Population
  • Priest Population
  • Number of Burgesses
  • Number of slaves
  • Etc.
ORIGINAL OWNER
  • Original Owner ID
  • Etc.

32
Examples of possible History DWs
  • Data from a variety of sources over time: hearth
    tax, poor rates, trade directories, census,
    street directories, wills and inventories, GIS
    maps for a city e.g. Winchester.
  • Voting data: poll book data and rate book data
    up to 1870 for whole country (note some data
    missing).
  • Port data: all data from portbooks for all
    British ports together with yearly trade figures.
  • Street directories for whole country for last 100
    years.
  • Taxation overview: different types / areas /
    periods.

33
Examples of possible History DWs
  • 19th C British census data doesn't fit into the
    typical DW model as it doesn't have the numerical
    facts to go into a fact table.
  • However, there's a recent development in DWs:
    factless fact tables.
  • There is real scope to be able to model
    historical data using these.
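
A factless fact table holds only foreign keys to dimensions and no numerical measure, so analysis is done by counting rows. The sketch below shows how census-style data might be modelled this way, using Python's built-in `sqlite3` module. The schema, the parish name and every row are invented for illustration and are not drawn from any real census.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_census (census_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_place (place_key INTEGER PRIMARY KEY, parish TEXT);
CREATE TABLE dim_occupation (occupation_key INTEGER PRIMARY KEY, title TEXT);
-- Factless fact table: foreign keys only, one row per enumerated person.
CREATE TABLE fact_enumeration (
    census_key INTEGER REFERENCES dim_census(census_key),
    place_key INTEGER REFERENCES dim_place(place_key),
    occupation_key INTEGER REFERENCES dim_occupation(occupation_key)
);
INSERT INTO dim_census VALUES (1, 1841), (2, 1861);
INSERT INTO dim_place VALUES (1, 'Example Parish');
INSERT INTO dim_occupation VALUES (1, 'Shipwright'), (2, 'Labourer');
INSERT INTO fact_enumeration VALUES
    (1, 1, 1), (1, 1, 2), (2, 1, 1), (2, 1, 1), (2, 1, 2);
""")

# With no measure column, the "fact" is the existence of the row itself,
# so questions are answered with COUNT(*) over dimension attributes.
counts = conn.execute("""
    SELECT c.year, o.title, COUNT(*)
    FROM fact_enumeration AS f
    JOIN dim_census AS c ON c.census_key = f.census_key
    JOIN dim_occupation AS o ON o.occupation_key = f.occupation_key
    GROUP BY c.year, o.title
    ORDER BY c.year, o.title
""").fetchall()
```

Here `counts` gives the number of people per occupation in each census year, which is the kind of time run the earlier slides describe.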

34
Examples of possible History DWs
  • Kimball p247

35
Examples of possible Humanities DWs
  • Language DW: could contain databases of
    different languages for comparison, or many
    databases of the same language over a larger area.
  • DW of worldwide scholarly community / whole
    culture
  • GIS or archaeological DW by continent etc. rather
    than country.
  • DW of biographies.
  • DW of library catalogues or archives for enhanced
    public access.

36
Ideas for the future?
  • Instead of "me and my database" (emphasis on
    smallish, individual, national projects),
  • Maybe
  • "Our integrated warehouse" (emphasis on large
    scale, collaborative, international projects)?