Data Mining & Data Warehousing

Slides: 49
Provided by: Universi73

1
Data Mining & Data Warehousing
  • Ragib Ahsan
  • Andrew Beard
  • Ryan Dunlap
  • Christine LaBarre

2
Introduction
  • Data storage has become more available
  • Low cost of processing power
  • Low cost of storage
  • CHEAP DATA!

3
Traditional Ways of Collecting Data
  • OLTP (online transaction processing) systems
  • Put data into databases quickly, safely &
    efficiently
  • Not as good at delivering meaningful analysis
  • Statistical analysis: costly, pen-and-paper work

4
Data Mining
  • Data Mining is the process of extracting
    information from the company's various databases
    and re-organizing it for purposes other than what
    the databases were originally intended for.
  • Data Mining is the analysis of data with the
    intent to discover gems of hidden information in
    the vast quantity of data that has been captured
    in the normal course of running the business.

5
Advantages over Statistics
6
Insurance Fraud Example
7
The History of Data Mining
  • 1960s
  • Data collection was available in digital form,
    allowing for retrospective data analysis.
  • 1970s
  • Data-processing departments were not able to handle
    huge backlogs of requests for data analyses.
    Application data was hidden behind mainframe files
    and databases, and it was periodically recorded
    on tapes for specific information manipulation.

8
The History of Data Mining
  • 1980s
  • Relational databases arose, along with structured
    query languages
  • SQL, originally SEQUEL (Structured English Query
    Language)
  • Published as a standard in 1987
  • Allowed dynamic, on-demand analysis of data.
  • 1990s
  • Explosion in growth of data.
  • Data warehouses were beginning to be used for
    storage of massive amounts of data.
  • Data mining thus arose as a response to
    challenges faced by the database community
  • It applied statistical analysis and search
    techniques from Artificial Intelligence to these
    problems.

9
Data Mining Models
  • Two Data Mining Models
  • Verification Model: takes a hypothesis from the
    user and tests its validity against the data
  • Discovery Model: automatically discovers
    important information hidden in the data. The
    data is sifted in search of frequently occurring
    patterns, trends, and generalizations without
    intervention or guidance from the user
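To make the contrast between the two models concrete, here is a minimal Python sketch on hypothetical toy baskets. The data and the function names `verify` and `discover` are illustrative assumptions, not taken from the slides. The verification model checks one hypothesis the user supplies; the discovery model starts with no hypothesis and enumerates every item pair that occurs often enough.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one customer's basket.
TRANSACTIONS = [
    {"dvd player", "dvds", "cables"},
    {"dvd player", "dvds"},
    {"laptop", "carrying case"},
    {"dvd player", "cables"},
    {"laptop", "printer", "carrying case"},
]

def verify(hypothesis, transactions, min_fraction):
    """Verification model: test a user-supplied hypothesis of the form
    'baskets containing item A usually also contain item B'."""
    antecedent, consequent = hypothesis
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return False  # no evidence either way; treat as not verified
    hits = sum(1 for t in with_antecedent if consequent in t)
    return hits / len(with_antecedent) >= min_fraction

def discover(transactions, min_count):
    """Discovery model: no hypothesis; count every item pair and keep
    the pairs that occur together at least min_count times."""
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

print(verify(("dvd player", "dvds"), TRANSACTIONS, 0.6))
print(discover(TRANSACTIONS, 2))
```

Note that `verify` can only confirm or reject what the user already suspected, while `discover` may surface pairs nobody asked about.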

10
How Data Mining Works
  • The Seven Steps of Data Mining
  • 1. Precisely define the business issue being
    addressed.
  • 2. Map issues to mining data.
  • 3. Source and preprocess data.
  • 4. Explore and evaluate data.
  • 5. Choose your mining technique.
  • 6. Interpret results.
  • 7. Deploy results.

11
Data Warehouses
  • Data warehouse databases are popular sources for
    data mining
  • Their data has already been gathered, consolidated,
    validated, and passed through extract/transform/load (ETL)
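A minimal ETL sketch follows, assuming two hypothetical operational sources that disagree on date format, city spelling, and currency units, with an in-memory SQLite database standing in for the warehouse. The source lists, the `sales` table, and the function names are illustrative only.

```python
import sqlite3

# Hypothetical operational sources with inconsistent conventions.
STORE_SALES = [("2005-03-01", "TULSA", 19.99), ("2005-03-01", "tulsa", 5.00)]
WEB_SALES = [{"date": "03/02/2005", "city": "Tulsa ", "amount_cents": 2500}]

def extract():
    """Extract: pull raw rows from each source (here, in-memory lists)."""
    return STORE_SALES, WEB_SALES

def transform(store_rows, web_rows):
    """Transform: map both sources onto one schema
    (ISO date, title-case city, amount in dollars)."""
    rows = [(d, city.strip().title(), amt) for d, city, amt in store_rows]
    for r in web_rows:
        month, day, year = r["date"].split("/")
        rows.append((f"{year}-{month}-{day}",
                     r["city"].strip().title(),
                     r["amount_cents"] / 100))
    return rows

def load(conn, rows):
    """Load: append the cleaned rows into the warehouse fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(sale_date TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
print(conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city").fetchall())
```

Because the transform step normalizes the city names, all three rows land under a single "Tulsa" key once loaded; without it, the warehouse would report three separate cities.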

12
Four Characteristics of Data Warehouses
  • Subject-Oriented: data is organized by subject
    rather than by application (organized by
    characteristics rather than by different products)
  • Integrated: data must be made consistent when it
    moves from the operational environment to the warehouse
  • Time-Variant: stores historical data, typically
    5-10 years old (used for comparisons, trends,
    forecasting)
  • Non-Volatile: data are not changed or updated,
    only loaded and accessed

13
The Big Picture
14
Processes in Data Warehousing
  • Data Warehouses retrieve data from a variety of
    operational databases. The data is transformed
    and delivered to the data warehouse
  • Metadata is data describing data
  • Data is extracted from sources at regular intervals
    and pooled centrally, but must be cleansed to
    reconcile differences and remove duplicates
  • Data Marts are small warehouses (subsets of main
    warehouse) containing summarized information for
    a specific need
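The cleansing and data-mart steps can be sketched in a few lines of Python. This assumes hypothetical customer rows pooled from two systems, uses a normalized e-mail address as the matching key (one possible choice among many), and builds a tiny "mart" holding spend per region. None of these names come from the slides.

```python
from collections import defaultdict

# Hypothetical customer rows pooled from two operational systems.
POOLED = [
    {"source": "billing", "email": "Ann@Example.com ", "region": "South", "spend": 120.0},
    {"source": "crm",     "email": "ann@example.com",  "region": "South", "spend": 120.0},
    {"source": "crm",     "email": "bob@example.com",  "region": "West",  "spend": 80.0},
]

def cleanse(rows):
    """Reconcile differences (normalize the matching key) and remove
    duplicates, keeping the first record seen for each normalized email."""
    seen = {}
    for row in rows:
        key = row["email"].strip().lower()
        if key not in seen:
            seen[key] = {**row, "email": key}
    return list(seen.values())

def build_mart(rows):
    """Data mart: a small, summarized subset for one specific need
    (here, total spend per region)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["spend"]
    return dict(totals)

clean = cleanse(POOLED)
mart = build_mart(clean)
print(mart)
```

Skipping the cleansing step would double-count Ann and report 240.0 for the South region.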

15
OLTP vs. Data Warehouses
16
Data Warehouse Model
  • Process of extracting and transforming
    operational data into informational data and
    loading it into the data warehouse

17
Structure of Data Inside Warehouse
  • Current Detail: reflects the most recent happenings
  • Older Detail: infrequently accessed
  • Lightly Summarized
  • Highly Summarized
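The four levels can be illustrated with SQLite from Python's standard library. This is a sketch under assumptions: a hypothetical daily `detail` table, a hypothetical cutoff date separating current from older detail, "lightly summarized" taken to mean month-by-product totals, and "highly summarized" taken to mean one total per year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?)", [
    ("2004-11-03", "DVD player", 80.0),
    ("2004-11-17", "DVD player", 75.0),
    ("2004-12-02", "Laptop", 900.0),
    ("2005-01-09", "DVD player", 70.0),
])

CUTOFF = "2005-01-01"  # hypothetical boundary between current and older detail

# Current detail: the most recent rows, kept at full granularity.
current = conn.execute(
    "SELECT * FROM detail WHERE sale_date >= ?", (CUTOFF,)).fetchall()

# Older detail: infrequently accessed rows before the cutoff.
older = conn.execute(
    "SELECT * FROM detail WHERE sale_date < ?", (CUTOFF,)).fetchall()

# Lightly summarized: month x product totals (substr keeps 'YYYY-MM').
light = conn.execute("""
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount)
    FROM detail GROUP BY month, product ORDER BY month, product
""").fetchall()

# Highly summarized: one total per year.
high = conn.execute("""
    SELECT substr(sale_date, 1, 4) AS year, SUM(amount)
    FROM detail GROUP BY year ORDER BY year
""").fetchall()

print(light)
print(high)
```

In a real warehouse the summarized levels would usually be materialized as their own tables at load time rather than computed on every query.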

18
Structure of Data Inside Warehouse
  • Example

19
Interpreting the Data
  • Purchasing trends can be determined through two
    kinds of analysis
  • Association: suggests a specific course of action;
    its results can be:
  • Useful: contain high-quality, actionable
    information
  • Trivial: already known by anyone familiar with the
    business
  • Inexplicable: seem to have no explanation
    and do not suggest a course of action
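Association analysis is commonly quantified with support and confidence. The slides do not say which measures were used, so the Python sketch below is one standard approach applied to hypothetical grocery baskets. It only produces candidate rules; deciding whether a rule is useful, trivial, or inexplicable still requires someone who knows the business.

```python
from itertools import permutations

# Hypothetical baskets, loosely inspired by the grocery round of the game.
BASKETS = [
    {"chicken", "rice", "wine"},
    {"chicken", "rice"},
    {"chicken", "bbq sauce"},
    {"textbook", "highlighter"},
    {"rice", "wine"},
]

def rules(baskets, min_support, min_confidence):
    """Return single-item rules (a -> b) as (a, b, support, confidence).
    support    = fraction of baskets containing both a and b
    confidence = fraction of baskets containing a that also contain b"""
    n = len(baskets)
    items = set().union(*baskets)
    found = []
    for a, b in permutations(sorted(items), 2):
        both = sum(1 for t in baskets if a in t and b in t)
        has_a = sum(1 for t in baskets if a in t)
        support = both / n
        if has_a and support >= min_support:
            confidence = both / has_a
            if confidence >= min_confidence:
                found.append((a, b, support, confidence))
    return found

for a, b, s, c in rules(BASKETS, min_support=0.4, min_confidence=0.6):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

A rule such as "chicken -> rice" would likely be judged trivial by a grocer; the code has no way to tell.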

20
Interpreting the Data
  • Sequencing extends association by adding time
    comparisons between transactions.
  • Requires a primary ID (e.g., a customer ID) that
    relates transactions that occurred at different times
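Here is a minimal sequencing sketch, assuming hypothetical (customer ID, date, item) records. The customer ID plays the role of the primary ID that links purchases made at different times, and ISO-format dates sort correctly as strings. Each ordered pair is counted at most once per customer; ties on the same date are ordered arbitrarily, a simplification a real system would need to handle.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (customer_id, date, item).
PURCHASES = [
    ("c1", "2005-01-05", "laptop"),
    ("c1", "2005-02-10", "printer"),
    ("c2", "2005-01-20", "laptop"),
    ("c2", "2005-03-01", "printer"),
    ("c3", "2005-02-14", "printer"),
    ("c3", "2005-04-02", "laptop"),
]

def sequence_counts(records):
    """Count ordered (earlier item, later item) pairs across customers,
    counting each pair at most once per customer."""
    history = defaultdict(list)
    for cust, _date, item in sorted(records, key=lambda r: (r[0], r[1])):
        history[cust].append(item)
    counts = Counter()
    for items in history.values():
        pairs = {(items[i], items[j])
                 for i in range(len(items))
                 for j in range(i + 1, len(items))
                 if items[i] != items[j]}
        counts.update(pairs)
    return counts

print(sequence_counts(PURCHASES))
```

Unlike plain association, the order matters here: "laptop then printer" and "printer then laptop" are counted separately.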

21
Data Mining Applications
  • Retail/Marketing
  • Identify buying patterns from customers
  • Find associations among customer demographic
    characteristics
  • Predict response to mailing campaigns
  • Market basket analysis
  • Banking
  • Detect patterns of fraudulent credit card use
  • Identify 'loyal' customers
  • Predict customers likely to change their credit
    card affiliation
  • Determine credit card spending by customer groups
  • Find hidden correlations between different
    financial indicators
  • Identify stock trading rules from historical
    market data

22
Data Mining Applications
  • Insurance and Health Care
  • Claims analysis, i.e., which medical procedures
    are claimed together
  • Predict which customers will buy new policies
  • Identify behavior patterns of risky customers
  • Identify fraudulent behavior
  • Transportation
  • Determine the distribution schedules among
    outlets
  • Analyze loading patterns
  • Medicine
  • Characterize patient behavior to predict office
    visits
  • Identify successful medical therapies for
    different illnesses

23
Data Mining Problems
  • Limited Information: connections are restricted to
    the fields available in the database
  • Noise and Missing Values: databases are usually
    contaminated by errors
  • Uncertainty: the severity of the errors present
  • Size, Updates & Relevant Fields
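One simple way to deal with noise and missing values is sketched below. The field, its valid range, and the choice of median imputation are all assumptions for illustration; real projects often prefer other strategies, such as dropping rows or model-based imputation. Out-of-range entries are treated as noise and then filled the same way as blanks.

```python
import statistics

# Hypothetical field with missing entries (None) and a data-entry error (-1).
ages = [34, None, 29, -1, 41, None, 38]

def clean_ages(values, low=0, high=120):
    """Treat out-of-range values as noise, then replace both noise and
    missing entries with the median of the remaining valid values."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = statistics.median(valid)
    return [v if v is not None and low <= v <= high else fill for v in values]

print(clean_ages(ages))
```

Imputation hides the gap but does not remove the uncertainty the slide mentions: any pattern later found in the filled values is only as reliable as the fill rule.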

24
Privacy Concerns
  • The main issue with data mining is the lack of
    privacy for customers
  • More than 95% of people in the US are listed
    in the computers of companies that send junk mail
  • A government report released in August 2005 lists
    flaws in certain agencies' data mining
    systems.

25
Government Privacy Acts
  • Privacy Act of 1974: requires that, when collecting
    personal information from individuals, agencies
    provide those individuals with notice that
    includes the purpose for which the information
    is collected and the potential effect of not
    providing it.
  • E-Government Act of 2002: requires federal
    government agencies to conduct privacy impact
    assessments before developing information
    technology that contains personal information.
  • Federal Information Security Management Act of
    2002: defines federal requirements for securing
    information and information systems that support
    federal agencies; it requires agencies to develop
    agency-wide information security programs that
    extend to contractors and other providers of
    federal data and systems.

26
Government Privacy Concerns
  • FBI Foreign Terrorist Tracking Task Force
  • Citibank Custom Reporting System
  • IRS Reveal System
  • SBA Loan Lender Monitoring System
  • Risk Management Agency Data Mining Effort

27
Data Warehouse Cost
28
Return on Investment
  • Return on Investment Study: examined the
    implementation of data warehousing
  • Covered 62 companies in North America and
    Europe
  • 62% had revenues of $1B or more
  • Data warehouse sizes ranged from megabytes to
    over 1 terabyte
  • All data warehouses were in use for more than 6
    months
  • Usage ranged from 3 to 1300 users

29
Return on Investment
  • Range: -1,857% to 16,000%
  • (Excluding 8 low and 9 high outliers, the range
    becomes 3% to 1,838%)
  • Average 3-yr. ROI (AROI): 401%
  • Median 3-yr. ROI (MROI): 167%
  • Average Payback (APAY): 2.3 years
  • Median Payback (MPAY): 1.67 years
  • Average Cost: $2.3M
  • 14 companies had an ROI of over 1,000%
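The slides do not give the study's exact formulas, so the sketch below assumes the common textbook definitions: percentage ROI as (benefit - cost) / cost x 100 over the three-year window, and simple payback as initial cost divided by annual benefit. The figures in the usage line are hypothetical and are not taken from the study.

```python
def three_year_roi(total_benefit, total_cost):
    """Percentage ROI over the period: (benefit - cost) / cost * 100.
    A negative value means the benefits did not cover the cost."""
    return (total_benefit - total_cost) / total_cost * 100

def payback_years(initial_cost, annual_benefit):
    """Simple payback: years until cumulative benefit equals the cost
    (ignores discounting and uneven benefit streams)."""
    return initial_cost / annual_benefit

# Hypothetical project: $3.0M cost, $1.5M benefit per year for 3 years.
print(three_year_roi(3 * 1.5e6, 3.0e6))
print(payback_years(3.0e6, 1.5e6))
```

Under these definitions, a payback well inside the three-year window, like the study's median of 1.67 years, goes hand in hand with a clearly positive three-year ROI.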

30
Where Data Mining is Going
  • The Surveillance Creep
  • Cameras
  • Store Cards
  • Biometrics
  • Thumb-print transactions

31
Summary
32
THE END
  • Any Questions?

33
Resources
  • http://en.wikipedia.org/wiki/Metadata
  • http://www.interprisesoftware.com/resources/data_warehousing5.jpg
  • http://www.pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_2.html
  • http://www.businessintelligence.com/ex/asp/code.64/xe/article.htm
  • http://www.intelligententerprise.com//030405/606feat2_1.jhtml
  • http://www.cris.org.in/images/datawarehouse.gif
  • http://www.dwreview.com/Images/DataWarehouse_Overview.gif
  • http://www.netnam.vn/unescocourse/knowlegde/41.htm
  • http://www.villagevoice.com/news/0230,baard,36760,1.html
  • http://www.gao.gov/new.items/d05866.pdf
  • http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1058040,00.html?bucketETA
  • http://www.seattlepress.com/print-9715.html

34
Family Feud
  • A FUN FUN game!!!

35
Instructions
  • Guess what other items you would buy with the
    given product.
  • The store you are shopping in will be listed
    next to the product.
  • Each team will guess 5 items that would likely be
    purchased with the given product.
  • One point will be given for each item that
    matches with our list.
  • The team with the most points at the end of the
    game wins.

36
DVD Player (Best Buy)
  • DVDs
  • CDs
  • TV
  • Stereo system
  • DVD-Rs
  • Component Cables
  • Entertainment center
  • DVD rack
  • Receiver
  • CD/DVD cleaners

37
Laptop (Dell Online)
  • Printer
  • Scanner
  • Carrying case
  • Software
  • External Drive
  • Extra Battery
  • Docking Bay
  • Wireless Accessories
  • CD-Rs
  • Keyboard/Mouse

38
Car (Lexus of Tulsa)
  • Insurance
  • Gas
  • License Plate Covers
  • Stereo system
  • Steering wheel cover
  • Floor Mats
  • Service Plan
  • Warranty Plan
  • Alloy wheels
  • Air freshener

39
Chicken (Albertsons)
  • Rice
  • Pasta
  • Potatoes
  • Chicken Broth
  • Seasoning
  • Onions
  • Soda
  • Wine
  • Cheese
  • Sauce (BBQ/Spaghetti)

40
Textbook (TU Bookstore)
  • Paper
  • Pencil
  • Pen
  • Spiral Notebook
  • Calculator
  • Binder
  • Eraser
  • Highlighter
  • Stapler
  • Backpack

41
T-shirt (Dillards)
  • Pants
  • Shoes
  • Socks
  • Belt
  • Hat
  • Jewelry
  • Iron-on stickers
  • Jacket
  • Washer/Dryer
  • Deodorant

42
Bonus Round Instructions
  • You will be given a description of a customer, and
    each team will be asked to come up with 5 items
    the customer would purchase in the given store,
    given their current situation.
  • Points for the bonus round will be doubled

43
Bonus Round
  • Occupation: Student
  • Gender: Male
  • Age: 18
  • Major: MIS
  • Situation: He is moving away from home to attend
    TU in 2 weeks and is currently shopping at
    Target.
  • What will he buy?

44
His first 10 items
  • School supplies (paper, pencils, etc.)
  • TV
  • DVD Player
  • Camera
  • Room decorations
  • Toiletries
  • Refrigerator
  • Microwave
  • Towels
  • Alarm Clock

45
Bonus Round 2 Instructions
  • You will be given a list of items a customer has
    just purchased.
  • Based on the items on the list each team will
    make assumptions of what type of person the buyer
    is.
  • Each correct answer in this round is worth 2 points.

46
Bonus Round 2
  • These are items on a person's shopping list.
    Give your best description of the person who is
    buying these items.
  • List
  • Stilettos
  • Capri pants
  • Sunglasses
  • Belt
  • Jacket
  • Mood Ring
  • Watch
  • Hair supply
  • Make-up
  • Hand bag

47
Customer Description
  • Occupation: Student
  • Gender: Female
  • Age: 18-21
  • Major: Fashion Design
  • Situation: She is a full-time college student who
    just got a check from her parents for books.

48
YOU WIN!!!
THANKS FOR PLAYING!!!