1
Department of Informatics, University of Minho
Braga, 22 February 2006
Analytical Data Mining for Stream Data Analysis
Ronnie Alves, Orlando Belo
{ronnie, obelo}@di.uminho.pt
http://alfa.di.uminho.pt/ronnie
Department of Informatics, University of Minho, Portugal
2
outline
  • motivation
  • analytical data mining
  • cube → lattice of cuboids
  • main issues
  • first efforts (2005)
  • current work
  • final discussion
  • research agenda

3
motivation
  • emerging applications, such as sensor networks,
    telecommunications, the web, power supply, and
    stock exchanges, have to handle various data
    streams
  • data stream characteristics: continuous, ordered,
    changing, fast, huge in volume

4
motivation
  • most stream data are at a rather low level or
    multi-dimensional in nature, and therefore need
    multi-level and multi-dimensional processing
  • analysts want to see changes, trends, and unusual
    patterns at reasonable levels of detail

5
motivation
  • stream data analysis approaches: query
    approximation, data mining, and on-line
    analytical processing (cube-based)
  • keywords: multi-dimensional, trends, unusual
    patterns

6
analytical data mining
  • analytical data mining combines ideas from
    cube-based algorithms with data mining functions
  • we want to provide a set of analytical data
    mining methods to reveal exceptional and trend
    patterns over data streams
  • cubing while mining or mining while cubing

7
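cube → lattice of cuboids
[Figure: lattice of cuboids for a 4-D data cube over the dimensions time, item, location, supplier]
  • 0-D (apex) cuboid: all
  • 1-D cuboids: (time), (item), (location), (supplier)
  • 2-D cuboids: (time, item), (time, location),
    (time, supplier), (item, location),
    (item, supplier), (location, supplier)
  • 3-D cuboids: (time, item, location),
    (time, item, supplier), (time, location, supplier),
    (item, location, supplier)
  • 4-D (base) cuboid: (time, item, location, supplier)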
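The lattice holds one cuboid per subset of the dimensions, so a flat cube over n dimensions has 2^n cuboids (16 for time, item, location, supplier). A minimal sketch, assuming no concept hierarchies; the function name `cuboid_lattice` is illustrative only:

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group the cuboids of a flat data cube by level (illustrative helper).

    Level k lists every k-dimensional group-by; level 0 is the apex
    cuboid (no dimensions, i.e. 'all') and level n is the base cuboid.
    """
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }

dims = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(dims)
for level, cuboids in lattice.items():
    names = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{level}-D: {names}")
```

With concept hierarchies the count grows to the product of (L_i + 1) over all dimensions, where L_i is the number of hierarchy levels of dimension i and the extra 1 stands for "all".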
8
main issues
  • (issue 1) given these characteristics of stream
    data, is it feasible to compute such a data cube,
    since its size is usually much larger than the
    original data set and its construction may take
    multiple database scans? (curse of
    dimensionality)
  • Online Analytical Processing Stream Data: Is It
    Feasible? (DMKD'02)
  • (issue 2) how can abnormal changes of cuboid
    cells be detected, given that on-line mining of
    changes is one of the core issues in stream data
    analysis?
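To make issue 1 concrete: every base tuple over n dimensions generalizes to 2^n cells (each dimension either kept or replaced by "*"), so the full cube can be far larger than the data it summarizes. A minimal sketch on a hypothetical three-tuple table, using count as the measure:

```python
from itertools import combinations

STAR = "*"  # marks an aggregated ("all") dimension in a cell

def full_cube_cells(rows):
    """Count every non-empty cell of the full data cube over `rows`.

    Each tuple contributes to 2**n cells; cells shared by several
    tuples are merged and their counts summed.
    """
    cells = {}
    for row in rows:
        n = len(row)
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(row[i] if i in kept else STAR for i in range(n))
                cells[cell] = cells.get(cell, 0) + 1
    return cells

# Hypothetical toy base table: (time, item, location, supplier).
rows = [
    ("t1", "i1", "l1", "s1"),
    ("t1", "i2", "l1", "s2"),
    ("t2", "i1", "l2", "s1"),
]
cells = full_cube_cells(rows)
print(len(rows), "tuples ->", len(cells), "cube cells")  # 3 tuples -> 40 cube cells
```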

9
main issues
  • (issue 3) compared to the history, what are the
    distinct features of the current status?
  • (issue 4) what are the relatively stable factors
    over time?
  • on-line mining of changes (SIGMOD'03)
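Issues 2 to 4 compare the current status of cuboid cells with their history. As a rough illustration only (a simple ratio test, not the method of the SIGMOD'03 work), one can compare each cell's relative frequency in a historical window and in the current window; the function name `changed_cells` and its thresholds are hypothetical:

```python
from collections import Counter

def changed_cells(history, current, min_ratio=2.0, min_count=2):
    """Report cells whose share of the stream changed sharply (hypothetical helper).

    `history` and `current` map a cube cell to its count in a historical
    window and in the current window. A cell is reported when it occurs
    at least `min_count` times in one window and its relative frequency
    differs between windows by a factor of at least `min_ratio`; cells
    that appear or vanish always qualify.
    """
    h_total = sum(history.values()) or 1
    c_total = sum(current.values()) or 1
    report = {}
    for cell in set(history) | set(current):
        h, c = history.get(cell, 0), current.get(cell, 0)
        if max(h, c) < min_count:
            continue  # too rare in both windows to be interesting
        h_freq, c_freq = h / h_total, c / c_total
        low, high = sorted((h_freq, c_freq))
        if low == 0 or high / low >= min_ratio:
            report[cell] = (h_freq, c_freq)
    return report

# Hypothetical cell counts for two windows.
history = Counter({("a1", "*"): 50, ("a2", "*"): 50})
current = Counter({("a1", "*"): 10, ("a2", "*"): 30, ("a3", "*"): 10})
print(changed_cells(history, current))
```

Here ("a1", "*") drops from 50% to 20% of the stream and ("a3", "*") is new, so both are reported, while ("a2", "*") (50% to 60%) is treated as stable.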

10
first efforts (2005)
  • itemset mining is a core problem in many data
    mining tasks
  • it can be used as a building block for more
    complex data mining processes and also for
    computing data cubes
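A level-wise (Apriori-style) miner shows the building block. The link to cubing is that a cube cell such as (a1, b1, *) is the itemset {A=a1, B=b1}, with the cell count as its support. A minimal sketch, not the authors' SQL-based pattern-growth method:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori-style frequent itemset mining.

    Returns {frozenset(itemset): support} for every itemset contained
    in at least `min_support` transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: all single items.
    candidates = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        result.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune any
        # candidate with an infrequent k-subset (anti-monotonicity).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result

# Hypothetical transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = frequent_itemsets(transactions, min_support=2)
for itemset, support in result.items():
    print(sorted(itemset), support)
```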

11
first efforts (2005)
  • pattern-growth SQL extensions (one dimension)
    (EPIA'05)
  • inter-transactional rules (two dimensions,
    distance measures) (JISBD'05)
  • cube-based mining method for multi-dimensional
    association rules (n dimensions, incremental and
    multi-level) (miuda project)
  • industrial projects on telecommunications and
    retail (real testbeds)

12
current work
  • iceberg data cubing: computing only the cuboid
    cells that satisfy a minimum support threshold
    (the curse of dimensionality remains)
  • closed data cubing: computing only the closed
    cuboid cells (on dense databases, the cuboids
    are still too large)
  • maximal data cubing: computing only the maximal
    cuboid cells (pure maximal cubing loses aggregate
    information)
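The three strategies keep different subsets of the cube. A minimal sketch on a hypothetical table, assuming the usual textbook definitions for a count measure: a cell is iceberg if its count meets the threshold, closed if no more specific (descendant) cell has the same count, and maximal if no descendant is itself an iceberg cell:

```python
from itertools import combinations

STAR = "*"  # marks an aggregated ("all") dimension in a cell

def cube_cells(rows):
    """Count every cell of the full cube (count measure)."""
    cells = {}
    for row in rows:
        n = len(row)
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(row[i] if i in kept else STAR for i in range(n))
                cells[cell] = cells.get(cell, 0) + 1
    return cells

def is_descendant(d, c):
    """True if cell d specializes cell c: d != c and d keeps all of c's values."""
    return d != c and all(cv == STAR or cv == dv for cv, dv in zip(c, d))

def classify(rows, min_count):
    """Split the cube into iceberg, closed iceberg and maximal iceberg cells."""
    cells = cube_cells(rows)
    iceberg = {c: n for c, n in cells.items() if n >= min_count}
    # closed: no descendant cell has the same count
    closed = {c: n for c, n in iceberg.items()
              if not any(is_descendant(d, c) and m == n for d, m in cells.items())}
    # maximal: no descendant cell is itself in the iceberg
    maximal = {c: n for c, n in iceberg.items()
               if not any(is_descendant(d, c) for d in iceberg)}
    return iceberg, closed, maximal

# Hypothetical toy base table with dimensions A, B, C.
rows = [("a1", "b1", "c1"), ("a1", "b1", "c1"), ("a1", "b2", "c2")]
iceberg, closed, maximal = classify(rows, min_count=2)
print(closed)   # {('a1', '*', '*'): 3, ('a1', 'b1', 'c1'): 2}
print(maximal)  # {('a1', 'b1', 'c1'): 2}
```

Of the 8 iceberg cells only 2 are closed and 1 is maximal; the maximal set alone no longer records that (a1, *, *) has count 3, which is the loss of aggregate information noted above.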

13
current work
  • real-world applications often involve dense and
    correlated databases
  • can we develop an algorithm that captures maximal
    correlated cuboid cells in dense databases?
  • we propose m3c-cubing

14
current work
  • Let the measure be count, the iceberg condition be
    count ≥ 2, and the correlated value 3CV ≥ 0.85.
    Then c1 (a1, b1, c1, *) : 3 and c2 (a1, *, *, *) : 4
    are closed cells; c1 is a maximal cell; c3
    (a1, b1, *, *) : 3 and c4 (*, b1, c1, *) : 3 are
    covered by c1, but c4 has a correlated exception
    (3CV = 1); c5 (a2, b2, c2, d4) : 1 does not satisfy
    the iceberg constraint. Therefore, c1 and c4 are
    maximal correlated cuboid cells

c1 (a1, b1, c1, *) : 3
c4 (*, b1, c1, *) : 3
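The slides do not define 3CV beyond saying the measure is inspired by all_confidence. As a hypothetical stand-in only, all_confidence (support of an itemset divided by the largest single-item support) can be transferred to cells by dividing a cell's count by the largest count among the 1-D cells built from its values. On the hypothetical toy data below, the covered cell (*, b1, c1) scores 1 while the more specific (a1, b1, c1) scores lower:

```python
from itertools import combinations

STAR = "*"  # marks an aggregated ("all") dimension in a cell

def cube_cells(rows):
    """Count every cell of the full cube (count measure)."""
    cells = {}
    for row in rows:
        n = len(row)
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(row[i] if i in kept else STAR for i in range(n))
                cells[cell] = cells.get(cell, 0) + 1
    return cells

def all_confidence(cell, cells):
    """all_confidence-style correlation of a cell (hypothetical stand-in for 3CV).

    count(cell) divided by the largest count among the 1-D cells built
    from each of its non-'*' values; returns None for the apex cell.
    """
    n = len(cell)
    singles = [
        cells[tuple(v if j == i else STAR for j in range(n))]
        for i, v in enumerate(cell) if v != STAR
    ]
    if not singles:
        return None
    return cells[cell] / max(singles)

# Hypothetical toy base table with dimensions A, B, C.
rows = [("a1", "b1", "c1"), ("a1", "b1", "c1"), ("a1", "b2", "c2")]
cells = cube_cells(rows)
print(all_confidence(("*", "b1", "c1"), cells))  # 1.0
print(all_confidence(("a1", "b1", "c1"), cells))  # 0.6666666666666666
```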
15
current work
  • we provide an interestingness measure that
    discloses true correlation (and dependence)
    relationships among cuboid cells (inspired by
    all_confidence)
  • the computation of cuboids is guided by an m3cTree
    (based on a set-enumeration tree)
  • the m3cTree is traversed in pure depth-first
    order
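The m3cTree itself is not specified here, but the set-enumeration tree it builds on is standard: the root is the empty set, and each node's children extend it with one item that comes later in a fixed order, so every subset appears exactly once. A minimal sketch of a pure depth-first (pre-order) traversal over dimensions; `se_tree_dfs` and its pruning hook are illustrative assumptions:

```python
def se_tree_dfs(items, visit, prefix=(), start=0):
    """Pre-order depth-first traversal of a set-enumeration tree.

    Children of a node extend it with one item that comes after its last
    item in the fixed order of `items`. If `visit` returns False, the
    node's whole subtree is skipped (a hook for anti-monotone pruning).
    """
    if visit(prefix) is False:
        return
    for i in range(start, len(items)):
        se_tree_dfs(items, visit, prefix + (items[i],), i + 1)

order = []
se_tree_dfs(["A", "B", "C"], order.append)
print(order)  # [(), ('A',), ('A', 'B'), ('A', 'B', 'C'), ('A', 'C'), ('B',), ('B', 'C'), ('C',)]
```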

16
current work
  • several pruning strategies are proposed for
    reducing the search space
  • data cube computation is optimized by sharing
    partitions and caching intermediate
    aggregations
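The slides do not list m3c-cubing's pruning strategies, so the sketch below swaps in the well-known BUC (bottom-up computation) scheme to show the general idea of shared partitions and anti-monotone pruning for an iceberg cube with a count measure. It does not show caching, and the function name `buc` is illustrative:

```python
from collections import defaultdict

STAR = "*"  # marks an aggregated ("all") dimension in a cell

def buc(rows, min_count):
    """BUC-style iceberg cubing: return {cell: count} with count >= min_count."""
    n = len(rows[0]) if rows else 0
    out = {}

    def recurse(part, cell, dim):
        # `part` holds exactly the base rows matching the current cell.
        out[tuple(cell)] = len(part)
        for d in range(dim, n):
            # Partition the current rows on dimension d; each sub-partition
            # is reused by every cell that further specializes it.
            groups = defaultdict(list)
            for row in part:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                # Count is anti-monotone: a partition below the threshold
                # cannot contain any cell that meets it, so skip it.
                if len(sub) >= min_count:
                    cell[d] = value
                    recurse(sub, cell, d + 1)
                    cell[d] = STAR

    if len(rows) >= min_count:
        recurse(rows, [STAR] * n, 0)
    return out

# Hypothetical toy base table with dimensions A, B, C.
rows = [("a1", "b1", "c1"), ("a1", "b1", "c1"), ("a1", "b2", "c2")]
iceberg = buc(rows, min_count=2)
for cell, count in iceberg.items():
    print(cell, count)
```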

17
final discussion
  • cubing involves a trade-off between size
    complexity and efficient computation
  • high-performance data cube computation is
    critical to analytical data mining

18
final discussion
  • the challenge could be how to share computation
    and explore optimization opportunities
  • further studies must deal with statistical
    aspects, proper constraints, data mining and data
    cubing functions, and tilted time window frames
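A tilted time window keeps recent history at fine granularity and older history at coarser granularity. One common variant is the logarithmic tilted time frame; the sketch below assumes that variant (the class name `LogTiltedWindow` and the two-buckets-per-level rule are illustrative choices). Level i stores aggregates covering 2^i time units, and when a level holds a third bucket its two oldest are merged into the next level, so T time units need only O(log T) buckets:

```python
class LogTiltedWindow:
    """Logarithmic tilted time window over per-unit counts (illustrative)."""

    def __init__(self):
        self.levels = []  # levels[i]: counts covering 2**i units, newest first

    def add(self, count):
        """Append the count of the newest time unit, merging as needed."""
        carry, i = count, 0
        while carry is not None:
            if i == len(self.levels):
                self.levels.append([])
            level = self.levels[i]
            level.insert(0, carry)
            carry = None
            if len(level) > 2:
                # Merge the two oldest buckets into one coarser bucket.
                carry = level.pop() + level.pop()
                i += 1

    def buckets(self):
        """(units covered, count) pairs from most recent to oldest."""
        return [(2 ** i, c) for i, level in enumerate(self.levels) for c in level]

w = LogTiltedWindow()
for _ in range(8):
    w.add(1)  # e.g. one occurrence of a cube cell per time unit
print(w.buckets())  # [(1, 1), (1, 1), (2, 2), (4, 4)]
```

In a stream cube, each cell would keep such a window for its measure instead of a single total.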

19
research agenda
[Figure: Gantt chart of the research activities by quarter (1st to 4th) from 2005 to 2008, divided into past and future work]
  • area background
  • cube-based mining
  • exceptional patterns
  • on-line changes
  • analytical data mining
  • thesis writing