By: Mihir Mehta - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: By: Mihir Mehta


1
KM
IRM
IT
Data
Mining
By: Mihir Mehta
Date: 11/11/03
2
Menu
  • Introduction
  • Terminology
  • Learn about Data Warehouse
  • Learn about Data Mining
3
Introduction
  • Key topics of this presentation:
  • Data Warehouse
  • OLTP and OLAP
  • Data Mining

Relation between Data Mining, IRM, KM and IT
Main focus on Data Warehouse and Data Mining
4
What is Data?
  • Data is an individual fact or multiple facts, a
    value, or a set of values that is not significant
    to a business in and of itself. Data is the raw
    material stored in a structured manner that,
    given context, turns into information.

http://www.kmtool.net/vocabulary.htm
5
Data Warehouse
  • a logical collection of information gathered
    from many different operational databases used
    to create business intelligence that supports
    business analysis activities and decision-making
    tasks.

Burlton www.datawarehouse.dci.com
6
What can Data Warehouse Do?
  • A data warehouse is an attempt to integrate
    separate decision support systems so that users
    can query one place to find the answers to their
    questions
  • A data warehouse holds the key corporate data of
    the organization
  • A data warehouse tracks historical data

Burlton www.datawarehouse.dci.com
7
Data Warehouse Architecture
  • A Data Warehouse is a repository store for
    information that will allow a company to make
    business decisions based on facts.  Many business
    decisions are made on historical business
    knowledge and intuition, which in this day and
    age is not enough to stay ahead of your
    competitors.

Diagram labels:
  • Data extraction from sites across the world
  • Business query reports to aid information-led
    decision making
  • Automatic population of data to spreadsheets,
    Word or e-mail
  • Decision making: identify new business
    opportunities
  • Graphical representation of data
  • Dimensional view of data, to provide drill-down
    and analysis of data
8
Data Warehouse Knowledge Management Cycle
Diagram labels: Explicit; Tacit; New Tacit (takes, create, made)
"It doesn't change tacit knowledge into explicit;
rather it takes explicit knowledge and helps
create new tacit knowledge." - Burlton
Burlton www.datawarehouse.dci.com
9
Data Warehouse: A Success Story
  • The largest data warehouse belongs to Wal-Mart
  • How Wal-Mart used its data warehouse:
  • Identifies where a new store should be built
    based on customer demand
  • Identifies how stores are performing across the
    nation
  • Contains every scan from every purchase
  • Benefits Wal-Mart gained from its data
    warehouse:
  • Provided competitive advantage over K-Mart
  • Reduced excess inventory in individual stores
  • Avoided wasting funds on building stores which
    would fail

www.walmart.com
10
Why do we need Data Warehouse?
  • Decisions can be made quickly and correctly.
  • Data warehouse is also a place to store and
    access historical data.
  • Users measure performance goals for their company
    over a period of time.
  • Company statistics are available
  • Single query can be used to access key data
  • A data warehouse by itself will respond to
    queries from users
  • It will not tell users about patterns in data
    that users may not have thought about.
  • To find patterns in data, data mining is used to
    try and mine key information from a data
    warehouse.
  • Data warehouses provide a single place to store
    key corporate data
  • The idea is that users can go to one place to
    find this key data using an enterprise
    information system (EIS)

Alex Berson, Stephen J. Smith
11
Enterprise Information System
  • An EIS (Enterprise Information System) allows
    users to query data in a data warehouse.
  • EIS tools predate reporting tools and managed
    query tools

http://www.iec.org/online/tutorials/bus_int
12
Security in Data Warehouse
  • Building a data warehouse does increase security
    risk because key corporate information is all in
    one place.
  • Database system components can be used to protect
    the data warehouse. They are:
  • Views
  • Access Control
  • Security Administration
  • Encryption
  • Audit

Alex Berson, Stephen J. Smith
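As one illustration of the first component in the list above, a view can expose only the non-sensitive columns of a table. The sketch below uses Python's built-in sqlite3 module; the table name, column names and sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table; names and values are illustrative only.
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT, card_number TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "East", "4111-0000"), (2, "West", "4222-0000")],
)
# The view exposes only the non-sensitive columns.
conn.execute("CREATE VIEW customers_public AS SELECT id, region FROM customers")

cursor = conn.execute("SELECT * FROM customers_public ORDER BY id")
visible_columns = [col[0] for col in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```

SQLite has no per-user permissions, so this only shows what a view hides. In a multi-user database the view would normally be combined with access control, so that analysts can read the view but not the underlying table.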
13
Introduction - Data Mining
14
Introduction - Data Mining
  • Data Mining is done by running software that
    examines a database and looks for patterns in the
    data.
  • Data mining is a powerful analytical tool that
    enables business executives to advance from
    describing historical customer behavior to
    predicting the future. It finds patterns that
    unlock the mysteries of customer behavior.

http://www.teradata.com
15
Data Mining
  • Data mining solves complex business problems:
  • Increase revenue
  • Reduce expenses
  • Identify business opportunities
  • Gain competitive advantages

Alex Berson, Stephen J. Smith
16
Data Mining Benefits
  • Fraud detection in banking and telecommunication
  • Marketing and market analysis, such as the stock
    market
  • Science: data analysis involving cataloging
    objects of interest in large data sets (for
    instance, finding atmospheric events in remote
    sensing data, or volcanoes on Venus)
  • Problem diagnosis in manufacturing

Reference book in Library - Computer Information
System pg 496-499
17
Data Mining Benefits
  • Data mining allows companies to collect
    information that makes them more productive and
    helps them beat their competition.
  • Data mining helps identify:
  • why customers buy certain products
  • ideas for very direct marketing
  • ideas for shelf placement
  • training of employees vs. employee retention
  • employee benefits vs. employee retention

Reference book in Library - Computer Information
System pg 496-499
18
Implementing Data Mining
  • Apply data mining tools to run data mining
    algorithms against data.
  • There are two approaches:
  • Copy data from the Data Warehouse and mine it
  • Mine the data in the Data Warehouse
  • Popular tools use a variety of different data
    mining algorithms:
  • association rules
  • genetic algorithms
  • decision trees
  • neural networks

Alex Berson, Stephen J. Smith
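To make the first algorithm family in the list above more concrete, here is a minimal sketch of the two measures usually reported for an association rule: support and confidence. The baskets and function names are hypothetical, and real tools use far more efficient search than this direct counting.

```python
# Hypothetical market baskets; each set is one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated share of antecedent baskets that also hold the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# Rule under test: customers who buy bread also buy milk.
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

Here two of the four baskets contain both items, and two of the three bread baskets also contain milk, which is where the two figures come from.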
19
Data Mining Using Separate Data
  • You can move data from the data warehouse to data
    mining tools
  • Advantages
  • Data mining tools may organize data so they can
    run faster
  • Disadvantages
  • Could be very expensive to move large amounts of
    data

Diagram labels: Data Warehouse; copy of data made by the Data Mining Tool; Data Mining Tool
Reference book in Library - Computer Information
System pg 496-499
20
Data Mining Against the Data Warehouse
  • Data mining tools can access data directly in the
    Data Warehouse
  • Advantage
  • No copy of data is needed for data mining
  • Disadvantage
  • Data may not be organized in a way that is
    efficient for the tool

Diagram labels: Data Warehouse; Data Mining Tool
Reference book in Library - Computer Information
System pg 496-499
21
How Data Transfers to Knowledge
Diagram stages: Selection and sampling; Preprocessing and cleaning; Transformation and reduction; Data mining; Evaluation
Diagram data labels: Database / Warehouse; Data; Target data; Pre-processed data; Input data; Graph
22
How Data Transfers to Knowledge (continued)
  • Selection: selecting or segmenting the data
    according to some criteria.
  • Preprocessing: this is the data cleansing stage,
    where information that is deemed unnecessary and
    may slow down queries is removed.
  • Transformation: the data is made usable and
    navigable.
  • Data mining: this stage is concerned with the
    extraction of patterns from the data.
  • Interpretation and Evaluation: the patterns
    identified by the system are interpreted into
    knowledge, which can then be used to support
    human decision-making.
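
The five stages above can be sketched as a chain of small functions. This is only an illustration under simple assumptions: the records, the region filter, the use of a mean as the "mining" step and the target threshold are all hypothetical.

```python
# Hypothetical raw records; None marks a missing value.
records = [
    {"region": "East", "amount": 120.0},
    {"region": "East", "amount": None},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 100.0},
]

def select(rows, region):
    """Selection: keep rows that match a criterion."""
    return [r for r in rows if r["region"] == region]

def preprocess(rows):
    """Preprocessing: drop rows with missing amounts."""
    return [r for r in rows if r["amount"] is not None]

def transform(rows):
    """Transformation: reduce each row to the value being analysed."""
    return [r["amount"] for r in rows]

def mine(values):
    """Data mining (stand-in): summarise the values with their mean."""
    return sum(values) / len(values)

def evaluate(mean_amount, threshold=100.0):
    """Interpretation: turn the result into a statement a person can act on."""
    return "above target" if mean_amount > threshold else "at or below target"

mean_east = mine(transform(preprocess(select(records, "East"))))
verdict = evaluate(mean_east)
print(mean_east, verdict)
```

Each function consumes the output of the previous one, which mirrors the order in which the slide lists the stages.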

23
Data Mining - Information Process
  • OLTP (Online-Transaction Processing) - the
    processing of transaction information.
  • OLAP (Online-Analytical Processing) - the
    manipulation of information to support decision
    making.

24
What is OLTP?
Diagram labels: OLTP; Client / Server; Web-based; Mainframe
25
Defining OLTP and DSS
  • An application that updates the database is
    called an on-line transaction processing (OLTP)
    application
  • An application that issues queries to the
    read-only database is called a decision support
    system (DSS)

OLTP Application
DSS Application
26
OLTP vs. OLAP
  • Online-Transaction Processing (OLTP):
  • Day-to-day operations
  • Application-oriented
  • Data: current, up-to-date, detailed
  • Database size: 100 MB to GB
  • Online-Analytical Processing (OLAP):
  • Decision support
  • Subject-oriented
  • Data: historical, multidimensional, summarized
  • Database size: 100 GB to TB

Principle of Knowledge Discovery in Database
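The contrast above can be sketched with Python's built-in sqlite3 module: a few small writes stand in for OLTP work, and one read-only aggregate stands in for OLAP work. The orders table and its contents are hypothetical, and a single in-memory database is used only for brevity; in practice the two workloads usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table used for both workloads.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)")

# OLTP-style work: small writes that record day-to-day transactions.
with conn:
    conn.execute("INSERT INTO orders (product, qty) VALUES (?, ?)", ("pen", 2))
    conn.execute("INSERT INTO orders (product, qty) VALUES (?, ?)", ("pen", 3))
    conn.execute("INSERT INTO orders (product, qty) VALUES (?, ?)", ("ink", 1))
    conn.execute("UPDATE orders SET qty = 4 WHERE id = 2")

# OLAP-style work: a read-only summary across many rows.
summary = conn.execute(
    "SELECT product, SUM(qty) FROM orders GROUP BY product ORDER BY product"
).fetchall()
print(summary)
```

The writes touch one row at a time, while the summary reads every row and returns one line per product.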
27
OLTP vs. OLAP (example)
McGraw-Hill Companies, Inc.
28
Data Mining Algorithm
Data mining algorithms consist of three parts:
  • Model: the purpose of the algorithm is to fit a
    model to the data.
  • Preference: a criterion used to prefer one model
    over another.
  • Search: a technique for searching the data.

Margaret H. Dunham
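The three parts can be seen in even a very small fitting routine. The sketch below is an assumption-laden illustration, not an algorithm taken from the cited book: the model is a line through the origin, the preference is the sum of squared errors, and the search is a plain scan over a hypothetical grid of candidate slopes.

```python
# Hypothetical observations, roughly y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def model(slope, x):
    """Model: a line through the origin, y = slope * x."""
    return slope * x

def preference(slope, data):
    """Preference: sum of squared errors; lower is better."""
    return sum((y - model(slope, x)) ** 2 for x, y in data)

def search(data, candidates):
    """Search: try each candidate slope and keep the preferred one."""
    return min(candidates, key=lambda s: preference(s, data))

candidate_slopes = [i / 10 for i in range(0, 51)]  # 0.0, 0.1, ..., 5.0
best_slope = search(points, candidate_slopes)
print(best_slope)
```

Swapping any one part (a different model family, a different error measure, or a smarter search) gives a different algorithm while the other two parts stay the same.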
29
What can Data Mining Do?
Basic Data Mining Tasks
  • Classification: maps data into predefined groups
    or classes.
  • Regression: is used to map a data item to a
    real-valued prediction variable. Assumes that
    the target data fit some known type of function
    (e.g. linear, logistic) and determines the best
    function.
  • Time Series Analysis: the value of an attribute
    is examined as it varies over time. Values are
    usually obtained at evenly spaced time points
    (hourly, daily, weekly).
  • Prediction: predicting a future state rather
    than a current state.

Margaret H. Dunham
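Two of the tasks above can be sketched in a few lines each. The nearest-neighbour rule and the moving average used here are simple stand-ins chosen for illustration, and the labelled incomes and daily sales figures are hypothetical.

```python
def classify(value, labelled):
    """Classification: assign the class of the closest labelled example."""
    nearest = min(labelled, key=lambda pair: abs(pair[0] - value))
    return nearest[1]

def moving_average(series, window):
    """Time series analysis: mean of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical labelled incomes (in thousands) with predefined classes.
training = [(20, "low"), (25, "low"), (60, "high"), (70, "high")]
label = classify(55, training)

# Hypothetical daily sales at evenly spaced time points.
daily_sales = [10, 12, 14, 16]
trend = moving_average(daily_sales, window=2)
print(label, trend)
```

Note that the classes ("low", "high") are fixed before any new value is seen, which is what separates classification from clustering.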
30
What can Data Mining Do?
Basic Data Mining Tasks
  • Clustering: similar to classification except that
    the groups are not predefined, but rather defined
    by the data alone. The most similar data are
    grouped into clusters.
  • Summarization: maps data into subsets with
    associated simple descriptions.
  • Association Rules: link analysis, alternatively
    referred to as affinity analysis or association,
    refers to the data mining task of uncovering
    relationships among data.
  • Sequence Discovery: is used to determine
    sequential patterns in data.

Margaret H. Dunham
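As a sketch of clustering, the function below runs a one-dimensional k-means: it repeatedly assigns each value to its nearest centre and then moves each centre to the mean of its group. k-means is one common clustering technique, not necessarily the one the cited book has in mind; the purchase amounts and starting centres are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Clustering: alternate assignment and centre update (1-D k-means)."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical purchase amounts with two visible groups.
amounts = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centers, final_clusters = kmeans_1d(amounts, centers=[0.0, 5.0])
print(final_centers, final_clusters)
```

No class labels are supplied; the two groups emerge from the data alone, as the clustering bullet describes.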
31
Different Styles - Data Mining
Two styles of Data Mining
  • Directed Data Mining: a top-down approach,
    used when we know what we are looking for. This
    often takes the form of predictive modeling,
    where we know exactly what we want to predict.
  • Ex.: Classification, Estimation, and Prediction.
  • Undirected Data Mining: a bottom-up approach
    that lets the data speak for itself.
  • Ex.: Clustering, Summarization, Sequence
    Discovery

Mastering Data Mining - Michael J. A. Berry
32
Data Mining Transfer Data into IRM, KM, and IT
33
How data transfer to IRM, IT, KM
34
If it is still unclear about IRM, KM and IT then
here is the easier
35
Overview - Business View Diagram
  • Taking a wide range of data sources and turning
    them into knowledge that can be used to make
    better business decisions.
  • The data warehouse/data mart is a repository for
    data that has been extracted from one or more
    sources, cleansed and transformed into a format
    suitable for analysis.

www.xwave.com/industries/telecom/solutions/images/bi_diagram_jpg
36
Data Mining
Important Considerations
  • Do you need a data warehouse?
  • Do all your employees need an entire data
    warehouse?
  • How up-to-date must the information be?
  • What data mining tools do you need?

37
Conclusion
  • I talked about how Data Mining can be used to
    turn information into knowledge.
  • Data mining is not a one-step procedure. Data
    mining is also not the final step in the
    decision-making process. It is only one part of
    the decision-making support system, which
    includes data warehousing, data mining, online
    analytical processing, and so on.
  • Data Mining is the natural evolution of query and
    reporting tools. Everyone who creates queries and
    reports benefits from having data mining
    capabilities.
  • The data mining process can discover information
    that is completely hidden.

38
Data Mining - Software
  • AIM Learning: offers fast data mining tools based
    on genetic programming and simulated annealing.
  • Acknosoft: developers of KATE-tools for induction
    and CBR, and other tools for decision support and
    data mining.
  • http://www.salford-systems.com
  • Demonstration about Data Mining
  • https://www.statsoft.com/dm2.html

39
The End
40
Question ???