Mining Sequential Patterns - PowerPoint PPT Presentation

Author: Mika Klemettinen

Slides: 18
Transcript and Presenter's Notes

1
Course on Data Mining (581550-4): Seminar Meetings
[Course overview diagram: Ass. Rules, Clustering, Episodes, KDD Process, Text Mining, Home Exam]
2
Course on Data Mining (581550-4): Seminar Meetings
Today: 16.11.2001
  • R. Feldman, M. Fresko, H. Hirsh, et al.,
    "Knowledge Management: A Text Mining Approach",
    Proc. of the 2nd Int'l Conf. on Practical Aspects
    of Knowledge Management (PAKM98), 1998.
  • B. Lent, R. Agrawal, R. Srikant, "Discovering
    Trends in Text Databases", Proc. of the 3rd Int'l
    Conference on Knowledge Discovery in Databases
    and Data Mining, 1997.

3
Course on Data Mining (581550-4): Seminar Meetings
Good to Read as Background
  • Both papers refer to the Agrawal and Srikant
    paper we had last week
  • Rakesh Agrawal and Ramakrishnan Srikant, "Mining
    Sequential Patterns", Int'l Conference on Data
    Engineering, 1995.

4
Knowledge Management: A Text Mining Approach
  • R. Feldman, M. Fresko, H. Hirsh, et al.
  • Bar-Ilan University and Instinct Software, Israel;
    Rutgers University, USA; LIA-EPFL, Switzerland
  • Published in PAKM'98 (Int'l Conf. on Practical
    Aspects of Knowledge Management)
  • Data Mining course Autumn 2001/University of
    Helsinki
  • Summary by Mika Klemettinen

5
KM: A Text Mining Approach
  • Basic idea (see selected phases on the next
    slides)
  • 1. Get input data in SGML (or XML) format
      • Select only the contents of the desired
        elements (title, abstract, etc.)
  • 2. Do linguistic preprocessing
      • 2.1 Term extraction (use linguistic software
        for this)
      • 2.2 Term generation (combine adjacent terms
        into morpho-syntactic patterns such as
        "noun-noun", "adj.-noun", etc. by calculating
        association coefficients)
      • 2.3 Term filtering (keep only the top M most
        frequent terms)
  • 3. Create taxonomies (there is a tool for this)
  • 4. Generate associations (the generation may be
    constrained)
  • 5. Visualize/explore the results
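
Steps 2.2 and 2.3 above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation. It assumes the terms have already been extracted as token lists, it uses pointwise mutual information (PMI) as the association coefficient because the slides do not say which coefficient is used, and the names `generate_terms` and `top_m` are hypothetical.

```python
import math
from collections import Counter

def generate_terms(docs, min_pmi=0.0):
    """Term generation: score adjacent token pairs as candidate terms.

    Assumption: PMI stands in for the association coefficient,
    which the slides leave unspecified.
    """
    unigrams = Counter(tok for doc in docs for tok in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi > min_pmi:
            scored[(a, b)] = pmi
    return scored

def top_m(docs, candidates, m):
    """Term filtering: keep the m most frequent candidate terms."""
    freq = Counter(
        pair
        for doc in docs
        for pair in zip(doc, doc[1:])
        if pair in candidates
    )
    return [term for term, _ in freq.most_common(m)]

# Toy documents, invented for illustration only.
docs = [
    ["text", "mining", "finds", "patterns"],
    ["text", "mining", "uses", "association", "rules"],
    ["association", "rules", "describe", "patterns"],
]
candidates = generate_terms(docs)
frequent_terms = top_m(docs, candidates, m=2)
```

Here the two pairs that occur twice, ("text", "mining") and ("association", "rules"), survive the filtering step. A real system would also check the part-of-speech pattern of each pair, which this sketch omits.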

6
2.1 Term Extraction
7
3 Taxonomy Construction
8
4 Association Rule Generation
9
4 Association Rule Generation
10
5.1 Visualization/Exploration
11
5.2 Visualization/Exploration
12
Discovering Trends in Text Databases
  • Brian Lent, Rakesh Agrawal and Ramakrishnan
    Srikant
  • IBM Almaden Research Center, USA
  • Published in KDD'97
  • Data Mining course Autumn 2001/University of
    Helsinki
  • Summary by Mika Klemettinen

13
Discovering Trends in Text Databases
  • Basic ideas
      • Identify frequent phrases using sequential
        pattern mining (see the slide summaries of
        the Agrawal et al. paper "Mining Sequential
        Patterns" (MSP))
      • Generate histories of phrases
      • Find phrases that satisfy a specified trend
  • Definitions
      • Phrase: a phrase p is <(w1)(w2)...(wn)>, where
        each wi is a word
      • 1-phrase: <<(IBM)> <(data)(mining)>>
      • 2-phrase: <<(IBM)> <(data)(mining)>>
        <<(Anderson)(Consulting)> <(decision)(support)>>
      • Itemset, sequence, "is contained", etc. as in
        the MSP paper
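
The last two basic ideas, phrase histories and trend selection, can be illustrated as follows. This is a minimal sketch, assuming the documents are already partitioned by time period and the phrases of each document are already mined. The `is_upward_trend` check is a plain Python stand-in for a trend query, and all function names and data are hypothetical.

```python
from collections import Counter

def phrase_histories(partitions):
    """Map each phrase to its support (fraction of documents) per period.

    `partitions` is a chronological list of (period, docs) pairs;
    each doc is the set of phrases found in that document.
    """
    histories = {}
    for index, (_, docs) in enumerate(partitions):
        counts = Counter(phrase for doc in docs for phrase in doc)
        for phrase, count in counts.items():
            history = histories.setdefault(phrase, [0.0] * len(partitions))
            history[index] = count / len(docs)
    return histories

def is_upward_trend(history):
    """Hypothetical trend test: support strictly increases every period."""
    return all(later > earlier for earlier, later in zip(history, history[1:]))

# Toy data, invented for illustration only.
partitions = [
    (1995, [{"data mining"}, {"expert system"}, {"expert system"}, {"neural net"}]),
    (1996, [{"data mining"}, {"data mining"}, {"expert system"}, {"neural net"}]),
    (1997, [{"data mining"}, {"data mining"}, {"data mining"}, {"neural net"}]),
]
histories = phrase_histories(partitions)
rising = sorted(p for p, h in histories.items() if is_upward_trend(h))
```

With this toy input, only "data mining" has a strictly increasing history, so it is the only phrase in `rising`.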

14
Discovering Trends in Text Databases
  • Gaps: minimum and maximum gaps between adjacent
    words identify relations between words/phrases
    inside sentences/paragraphs, between words/phrases
    in different paragraphs, between words/phrases in
    different sections, etc.
      • Sentence boundary: 1,000
      • Paragraph boundary: 100,000
      • Section boundary: 10,000,000
  • Phases
      • Partition the documents based on their time
        stamps and create phrases for each partition
        (Lent et al. use patent documents)
      • Select the frequent phrases and save their
        frequencies
      • Define shape queries using SDL (Shape
        Definition Language)
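
One way to read the boundary values above is as a word-numbering scheme: consecutive words in a sentence get consecutive positions, and crossing a sentence, paragraph, or section boundary adds the corresponding offset. A maximum-gap constraint then decides which unit two words must share. The sketch below follows that reading. The nested-list input format, the rule of adding only the largest boundary crossed, and the function names are assumptions, not details taken from the paper.

```python
SENTENCE_GAP = 1_000
PARAGRAPH_GAP = 100_000
SECTION_GAP = 10_000_000

def number_words(sections):
    """Assign a position to every word.

    `sections` is a list of sections, a section is a list of
    paragraphs, a paragraph is a list of sentences, and a sentence
    is a list of words.
    """
    positions = []
    pos = 0
    for s, section in enumerate(sections):
        for p, paragraph in enumerate(section):
            for t, sentence in enumerate(paragraph):
                # Assumption: add only the offset of the largest
                # boundary that was just crossed.
                if t > 0:
                    pos += SENTENCE_GAP
                elif p > 0:
                    pos += PARAGRAPH_GAP
                elif s > 0:
                    pos += SECTION_GAP
                for word in sentence:
                    pos += 1
                    positions.append((word, pos))
    return positions

def within_gap(pos_a, pos_b, min_gap=1, max_gap=SENTENCE_GAP - 1):
    """True if pos_b follows pos_a within the allowed gap."""
    return min_gap <= pos_b - pos_a <= max_gap

# One section with two paragraphs; the first has two sentences.
sections = [[
    [["data", "mining"], ["finds", "trends"]],
    [["patents", "help"]],
]]
positions = dict(number_words(sections))
same_sentence = within_gap(positions["data"], positions["mining"])
next_sentence = within_gap(positions["mining"], positions["finds"])
same_paragraph = within_gap(
    positions["mining"], positions["finds"], max_gap=PARAGRAPH_GAP - 1
)
```

With the default maximum gap, only words of the same sentence can be adjacent in a phrase, so `same_sentence` is true and `next_sentence` is false. Raising the maximum gap to just under the paragraph offset lets a phrase span sentences within one paragraph.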

15
Discovering Trends in Text Databases
16
Discovering Trends in Text Databases
17
Discovering Trends in Text Databases