Using Shapes of Trends in Active Data Mining

1
Using Shapes of Trends in Active Data Mining
  • Duy Lam
  • Norris Boothe

2
Shape Querying and Active Data Mining
  • Historical time sequences make up a large portion
    of data stored in computers
  • Mining trends in histories is useful
  • Many applications, including observing trends in
    stock prices, online bids, and rule mining

3
Overview
  • Overview of SDL
  • SDL language
  • Applications to data mining

4
A (Very) Simple History
5
Shape Definition Language
  • SDL is a shape definition language used to query
    the shapes of histories
  • Small, powerful language that allows blurry
    matching
  • Designed to make it easy and natural to query
  • Easily implementable
  • Little non-determinism

6
Alphabet
  • SDL allows you to specify an alphabet defining
    transitions
  • Example
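
As a rough illustration of what an alphabet of transitions does, the Python sketch below (not SDL) maps the change between consecutive values of a history to a symbol. The symbol names come from later slides; the thresholds, the disjoint ranges, and the function name `symbolize` are assumptions made only for this example.

```python
def symbolize(history, small=0.05, large=0.20):
    """Turn a list of values into one transition symbol per consecutive pair.

    The thresholds `small` and `large` are illustrative assumptions,
    not values taken from the presentation.
    """
    symbols = []
    for prev, cur in zip(history, history[1:]):
        delta = cur - prev
        if prev == 0 and cur > 0:
            symbols.append("appear")      # value goes from zero to non-zero
        elif prev > 0 and cur == 0:
            symbols.append("disappear")   # value drops to zero
        elif delta > large:
            symbols.append("Up")          # highly increasing
        elif delta > small:
            symbols.append("up")          # slightly increasing
        elif delta < -large:
            symbols.append("Down")        # highly decreasing
        elif delta < -small:
            symbols.append("down")        # slightly decreasing
        else:
            symbols.append("stable")
    return symbols


print(symbolize([0.0, 0.1, 0.4, 0.5, 0.2, 0.0]))
# ['appear', 'Up', 'up', 'Down', 'disappear']
```

A history of n values yields n - 1 symbols, and shape descriptors are then matched against that symbol sequence.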

7
Describing a shape
  • So with this alphabet we can describe a shape
  • Use such a description to query a history to
    produce all subsequences that match the shape

(shape name(parameters) descriptor)
8
(shape spike() (concat Up up down Down))
9
Derived Shapes
  • any: matches when any one of the listed shapes
    matches
  • concat: shapes are concatenated together
    contiguously

(any up Up)
(concat down up down up)
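
To make the behaviour of any and concat concrete, here is a small Python sketch of a matcher over a sequence of transition symbols. The nested-tuple encoding of shapes and the helper names `match` and `find` are hypothetical, chosen for illustration; they are not part of SDL or its implementation.

```python
def match(shape, symbols, start):
    """Return the set of end indices where `shape` can end if it starts at `start`."""
    if isinstance(shape, str):
        # Elementary shape: exactly one transition symbol.
        ok = start < len(symbols) and symbols[start] == shape
        return {start + 1} if ok else set()
    op, *args = shape
    if op == "any":
        # Any one of the alternatives may match.
        ends = set()
        for alternative in args:
            ends |= match(alternative, symbols, start)
        return ends
    if op == "concat":
        # Each part must start exactly where the previous one ended.
        positions = {start}
        for part in args:
            positions = {e for s in positions for e in match(part, symbols, s)}
        return positions
    raise ValueError(f"unsupported operator: {op}")


def find(shape, symbols):
    """List (start, end) index pairs of all subsequences matching `shape`."""
    return [(s, e) for s in range(len(symbols))
            for e in sorted(match(shape, symbols, s))]


symbols = ["up", "Up", "up", "down", "Down", "stable"]
spike = ("concat", "Up", "up", "down", "Down")
print(find(spike, symbols))                 # [(1, 5)]
print(find(("any", "up", "Up"), symbols))   # [(0, 1), (1, 2), (2, 3)]
```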
10
Multiple Occurrence Operators
  • Shapes made of multiple contiguous occurrences of
    the same shape P
  • A resulting subsequence is neither preceded nor
    followed by a subsequence that matches P

(exact 5 (any up Up))
(atleast 3 stable)
(atmost 2 (concat disappear appear))
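
The "neither preceded nor followed" condition means these operators work on maximal runs. The Python sketch below illustrates this for the simplified case where P is a single transition drawn from a set of accepted symbols; multi-transition shapes such as (concat disappear appear) are not handled, and all function names are assumptions for this example.

```python
from itertools import groupby


def maximal_runs(symbols, accepted):
    """Return (start, end) for each maximal run of symbols in `accepted`."""
    runs = []
    pos = 0
    for is_match, group in groupby(symbols, key=lambda s: s in accepted):
        length = len(list(group))
        if is_match:
            runs.append((pos, pos + length))
        pos += length
    return runs


def exact(n, symbols, accepted):
    return [(s, e) for s, e in maximal_runs(symbols, accepted) if e - s == n]


def atleast(n, symbols, accepted):
    return [(s, e) for s, e in maximal_runs(symbols, accepted) if e - s >= n]


def atmost(n, symbols, accepted):
    # Zero-length matches are ignored in this simplified sketch.
    return [(s, e) for s, e in maximal_runs(symbols, accepted) if e - s <= n]


symbols = ["up", "Up", "up", "down", "stable", "stable", "stable", "up"]
print(exact(3, symbols, {"up", "Up"}))   # [(0, 3)]
print(exact(2, symbols, {"up", "Up"}))   # []
print(atleast(3, symbols, {"stable"}))   # [(4, 7)]
```

Note that (exact 2 ...) finds nothing inside the run of three rising transitions: any two of them are adjacent to another match, which is how the operators avoid reporting overlapping fragments of one run.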
11
Bounded Occurrence Operators
  • in: permits blurry matching by allowing users to
    state an overall shape without specific details
  • Within the specified period length, a specified
    number of occurrences of a shape must appear
  • Occurrences can be separated by arbitrary gaps
    and can overlap

(in 7 (nomore 5 up))
(precisely n P)
(noless n P)
(nomore n P)
12
Bounded Occurrence Operators
  • inorder: specifies shapes that must appear in a
    specific order

(inorder P1 P2 ... Pn)
13
Shape Definition Examples
(in 5 (and (noless 2 (any up Up))
           (nomore 1 (any down Down))))
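
The shape above can be read as "within 5 transitions, at least 2 rises and at most 1 fall, in any positions". A minimal Python sketch of that reading follows. It assumes that an `in` window is a contiguous block of exactly 5 transitions and that occurrences of a one-transition shape are simply counted; the helper names are hypothetical.

```python
UPS = {"up", "Up"}
DOWNS = {"down", "Down"}


def count(window, accepted):
    """Number of transitions in `window` that belong to `accepted`."""
    return sum(1 for s in window if s in accepted)


def in_windows(length, symbols, condition):
    """Return (start, end) of every window of `length` transitions satisfying `condition`."""
    return [(i, i + length)
            for i in range(len(symbols) - length + 1)
            if condition(symbols[i:i + length])]


def slide_13_shape(window):
    # (and (noless 2 (any up Up)) (nomore 1 (any down Down)))
    return count(window, UPS) >= 2 and count(window, DOWNS) <= 1


symbols = ["up", "stable", "Up", "down", "stable", "down", "down"]
print(in_windows(5, symbols, slide_13_shape))   # [(0, 5)]
```

Only the first window qualifies: the rises in it are separated by a gap, which the exact-position operators such as concat would not tolerate.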
14
Shape Definition Examples
(in 7 (inorder (atleast 2 (any up Up))
               (in 4 (noless 3 (any down Down)))))
15
Parameterized Shapes
  • Can parameterize shape definitions instead of
    using concrete values

(shape spike(upcnt dncnt)
  (concat (exact upcnt (any up Up))
          (exact dncnt (any down Down))))
(shape doublepeak(width ht1 ht2)
  (in width (inorder spike(ht1 ht1) spike(ht2 ht2))))
16
Advantages of SDL
  • natural and powerful language for expressing
    shape queries
  • capability of blurry matching
  • reduction of output clutter
  • efficient implementation

17
SDL's Expressive Power
  • SDL is equivalent to regular expressions for
    regular matching
  • several features enhance its effectiveness,
    however
  • greedy matching and lookahead capabilities help
    reduce output clutter

18
SDL's Expressive Power
  • blurry matching enables a much more natural and
    compact specification of certain shapes
  • For example, suppose we want precisely one
    occurrence of each ai, in any order
  • In SDL, the specification grows linearly with n
    (see the expression below)
  • A regular expression requires at least
    exponential size to specify the same shape!

(and (precisely 1 a1) (precisely 1 a2) ... (precisely 1 an))
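
The size gap can be felt with a small Python experiment. The sketch below builds the most direct regular expression for "each symbol exactly once, in any order" by listing every permutation, and compares it with a counting check in the spirit of the SDL expression. It only illustrates the naive enumeration, which has n! alternatives; it does not prove the lower bound stated on the slide. Function names are assumptions for this example.

```python
import re
from itertools import permutations
from math import factorial


def permutation_regex(symbols):
    """Naive regex: one alternative per ordering of the symbols."""
    return "|".join("".join(re.escape(s) for s in p)
                    for p in permutations(symbols))


def each_exactly_once(sequence, symbols):
    """Counting check, analogous to (and (precisely 1 a1) ... (precisely 1 an))."""
    return (len(sequence) == len(symbols)
            and all(sequence.count(s) == 1 for s in symbols))


symbols = "abcd"
regex = permutation_regex(symbols)
print(len(regex.split("|")), factorial(len(symbols)))   # 24 24
print(re.fullmatch(regex, "cabd") is not None)          # True
print(each_exactly_once("cabd", symbols))               # True
```

Adding one more symbol adds one clause to the counting check but multiplies the number of regex alternatives by n + 1.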
19
SDL Summary
  • SDL is a small, powerful language for naturally
    and intuitively expressing shapes found in
    histories
  • Equivalent in power to regular expressions, but
    much more effective
  • Permits blurry matching

20
Using SDL in Active Data Mining
21
Static Data Mining
  • Discovery of rules for associations, sequences,
    and classification
  • Entire data set is mined
  • Inherent weakness: rules are not static

22
Active Data Mining
  • Partition into time periods
  • Run data mining algorithm on each period
  • Gather rules into a rulebase
  • Create triggers to discover trends in rules and
    associations between rules
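
The partition-and-mine steps above can be sketched in Python for a single association rule. This sketch does not run a mining algorithm; it assumes the rule (antecedent and consequent item sets) is already known and only computes its support and confidence per period, using the standard definitions of those two measures. The data layout and the function name `rule_history` are assumptions for illustration.

```python
from collections import defaultdict


def rule_history(transactions, antecedent, consequent):
    """Return {period: (support, confidence)} for the rule antecedent -> consequent.

    `transactions` is an iterable of (period, items) pairs.
    """
    by_period = defaultdict(list)
    for period, items in transactions:
        by_period[period].append(set(items))

    history = {}
    for period in sorted(by_period):
        baskets = by_period[period]
        with_antecedent = [b for b in baskets if antecedent <= b]
        with_both = [b for b in with_antecedent if consequent <= b]
        support = len(with_both) / len(baskets)
        confidence = (len(with_both) / len(with_antecedent)
                      if with_antecedent else 0.0)
        history[period] = (support, confidence)
    return history


transactions = [
    (1, {"bread", "milk"}), (1, {"bread"}),
    (2, {"bread", "milk"}), (2, {"bread", "milk"}),
]
print(rule_history(transactions, {"bread"}, {"milk"}))
# {1: (0.5, 0.5), 2: (1.0, 1.0)}
```

The per-period values form the support and confidence histories that shape queries and triggers are later evaluated against.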

23
Active Data Mining Process
[Diagram labels: Large Data Base; Period 1 Rules; Period 2 Rules; Period 3 Rules]
24
Active Data Mining Process (cont.)
[Diagram labels: Selected Rules; Shape Definition Language; Trigger Definition Language; Active Data Mining]
25
Active Data Mining Components
  • Shape definitions (SDL)
    (shape name(parameters) descriptor)
  • Example:
    (shape spike(upcnt dncnt)
      (concat (atleast upcnt (any up Up))
              (atleast dncnt (any down Down))))
  • Queries
  • Triggers

26
Queries
  • For rule selection
  • Syntax:
    (query (shape (history-name start-time end-time)))
  • start-time and end-time specify the end points of
    the history
  • Result: rules whose histories match the desired
    shape
  • Example:
    (shape ramp() (concat Up Up))
    (query (ramp() (confidence start end)))

27
Larger Query Example
(shape upramp(len cnt)
  (in len (noless cnt (any up Up))))
(shape dnramp(len cnt)
  (in len (noless cnt (any down Down))))
(query (and
  (upramp(5 3) (support start 10))
  (dnramp(5 3) (confidence start 10))))

Result: rules where support is increasing but
confidence is decreasing
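
A hedged Python sketch of how such a query could select rules from a rulebase is shown below. It assumes each rule carries a support history and a confidence history as plain lists, treats every increase as (any up Up) and every decrease as (any down Down), and reads `in len` as a contiguous window of `len` transitions. The rulebase layout and all names are hypothetical.

```python
def transitions(values):
    """Classify each consecutive pair as 'up', 'down' or 'stable'."""
    return ["up" if b > a else "down" if b < a else "stable"
            for a, b in zip(values, values[1:])]


def ramp(values, length, cnt, direction):
    """True if some window of `length` transitions has at least `cnt` moves in `direction`."""
    syms = transitions(values)
    return any(syms[i:i + length].count(direction) >= cnt
               for i in range(len(syms) - length + 1))


def query(rulebase):
    """Rules matching upramp(5 3) on support and dnramp(5 3) on confidence."""
    return [name for name, hist in rulebase.items()
            if ramp(hist["support"], 5, 3, "up")
            and ramp(hist["confidence"], 5, 3, "down")]


rulebase = {
    "r1": {"support":    [0.1, 0.2, 0.3, 0.3, 0.4, 0.5],
           "confidence": [0.9, 0.8, 0.7, 0.7, 0.6, 0.5]},
    "r2": {"support":    [0.1, 0.2, 0.3, 0.3, 0.4, 0.5],
           "confidence": [0.5, 0.6, 0.7, 0.7, 0.8, 0.9]},
}
print(query(rulebase))   # ['r1']
```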
28
Triggers
  • Datastream-type functionality
  • ECA (Event Condition Action) model used
    (Chakravarthy et al. 1989)
  • Syntax:
    (trigger trigger-name
      (events events-spec)
      (condition (shape history-spec))
      (actions action-spec))
  • Events: rule creation, history updates

29
Wave Execution Semantics
  • Stratified execution of triggers similar to
    Datalog

[Diagram: Set of Events -> Triggers for those Events -> Queries for those Triggers -> Set of Actions/Events]
30
Trigger Example
  • Identifying rules where support is increasing,
    but confidence is decreasing
(trigger detect_up
  (events updatehistory)
  (condition (upramp(5 4) (support (- end 5) end)))
  (actions upward))
(trigger detect_dn
  (events upward)
  (condition (dnramp(5 4) (confidence (- end 5) end)))
  (actions notify))
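
The chaining of these two triggers can be sketched as a wave loop in Python: the events of one wave select triggers, their conditions are checked, and the resulting actions become the events of the next wave. This is a simplified reading of the wave semantics, not the system's actual implementation; the conditions are stubbed with constant functions, and `run_waves` and the trigger table layout are assumptions for this example.

```python
def run_waves(triggers, initial_events, max_waves=10):
    """Fire triggers wave by wave; return the names of fired triggers in order.

    `triggers` maps a trigger name to (event, condition, action), where
    `condition` is a zero-argument callable standing in for a shape query.
    """
    fired = []
    events = set(initial_events)
    for _ in range(max_waves):
        if not events:
            break
        next_events = set()
        for name, (event, condition, action) in triggers.items():
            if event in events and condition():
                fired.append(name)
                next_events.add(action)   # the action is an event in the next wave
        events = next_events
    return fired


triggers = {
    # Stub conditions: pretend both shape queries succeed.
    "detect_up": ("updatehistory", lambda: True, "upward"),
    "detect_dn": ("upward", lambda: True, "notify"),
}
print(run_waves(triggers, ["updatehistory"]))   # ['detect_up', 'detect_dn']
```

If the condition of detect_up fails, no upward event is produced and detect_dn never runs, so notify is raised only for rules that satisfy both shapes.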

31
Implementation
  • Implemented on an AIX system
  • Part of IBM's Quest project
  • Successfully tested on:
  • A large set (5 years) of mail order data (2.9
    million records)
  • A large set (3 years) of POS (point-of-sale)
    transactions (6.8 million records)

32
Future Work
  • At the time of the paper:
  • Integrate constructs into a SQL relational system
  • Improve incremental computations using partial
    results of current trigger queries
  • Since then:
  • Integrated into the Quest Data Mining System
  • Subsumed into IBM's data mining products,
    including Intelligent Miner
  • Referenced for work in Active Data Mining and
    blurry pattern matching

33
References
  • Querying Shapes of Histories, by Rakesh
    Agrawal, Giuseppe Psaila, Edward L. Wimmers, and
    Mohamed Zait of the IBM Almaden Research Center,
    1995
  • Active Data Mining, by Rakesh Agrawal and
    Giuseppe Psaila of the IBM Almaden Research
    Center, 1995
  • The Quest Data Mining System, by Rakesh
    Agrawal, Manish Mehta, John Shafer, and
    Ramakrishnan Srikant of the IBM Almaden Research
    Center in coordination with Andreas Arning and
    Toni Bollinger of the IBM German Software
    Laboratory, 1996
  • IBM Almaden Research Center Website
    http://www.almaden.ibm.com/software/quest/