Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data - PowerPoint PPT Presentation

About This Presentation
Title:

Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data

Description:

Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, Jan 2000. Copyright © 2000 ACM SIGKDD.

Slides: 34
Provided by: Supe2
Transcript and Presenter's Notes

Title: Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data


1
Web Usage Mining: Discovery and Applications of
Usage Patterns from Web Data
SIGKDD Explorations. Copyright © 2000 ACM SIGKDD,
Jan 2000
  • ?????????
  • ?????? m964020015
  • ??? m964020034
  • ??? m964020041

2
Outline
  • Introduction
  • Problem statement
  • Detailed data mining
  • Conclusions and critics

3
Introduction
  • Web mining: data mining efforts associated with
    the Web
  • Three categories: content, usage, and structure
  • Web usage mining aims to discover usage patterns
    from Web data, in order to understand and better
    serve the needs of Web-based applications.

4
Classes
  • Web Content Mining: mining the data on the Web
    (text, image, audio, video, metadata, and
    hyperlinks)
  • Web Structure Mining: mining the Web structure
    data
  • Web Usage Mining: mining the Web log data
    (preprocessing, pattern discovery, and pattern
    analysis)

5
Data types
  • Content: the real data in the Web pages (text and
    graphics)
  • Structure: describes the organization of the
    content (for example, as a tree structure)
  • Usage: describes the pattern of usage of Web
    pages (IP addresses, page references, and the
    date and time of accesses)
  • User Profile: provides demographic information
    about users of the Web site (registration data
    and customer profile information)

6
Data sources (Web traffic)
  • Server-level collection: log files, packet
    sniffing, cookies, query data, and CGI scripts
  • Client-level collection: JavaScript, Java applets,
    and modified browsers
  • Proxy-level collection: proxy caching
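
As an illustration of server-level collection, the following is a minimal sketch of turning one server log line into the usage fields named earlier (IP address, page reference, date and time of access). It assumes the Common Log Format, which the slides do not specify; the function name `parse_log_line` and the sample line are hypothetical.

```python
import re
from datetime import datetime

# Assumption: the server writes Common Log Format lines.
# The slides do not say which log format was actually used.
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return the usage fields of one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }

# Hypothetical sample line, not taken from the presentation's data.
sample = '192.0.2.1 - - [10/Sep/2002:13:55:36 +0800] "GET /index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)
```

Lines that do not match the assumed format return `None`, so a caller can count and inspect them rather than silently dropping them.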

7
Data abstractions
  • User: a single individual who is accessing files
    from one or more Web servers through a browser
  • Page view: every file that contributes to the
    display on a user's browser at one time
  • Click-stream: a sequential series of page view
    requests
  • User session: the click-stream of page views for a
    single user across the entire Web
  • Server session (visit): the set of page views in a
    user session for a particular Web site
  • Episode: any semantically meaningful subset of a
    user or server session
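
The abstractions above can be sketched in code. The example below groups a click-stream of page view requests into server sessions per user. It is only a sketch under stated assumptions: the user key (IP address joined with the browser agent) and the 30-minute inactivity timeout are illustrative heuristics, not something the slides prescribe, and `build_server_sessions` is a hypothetical name.

```python
from collections import defaultdict

# Assumption: a gap of more than 30 minutes starts a new server session.
SESSION_TIMEOUT = 30 * 60  # seconds

def build_server_sessions(requests, timeout=SESSION_TIMEOUT):
    """Split each user's click-stream into server sessions.

    requests: iterable of (user_key, timestamp_in_seconds, page) tuples.
    Returns a list of (user_key, [page, ...]) tuples, one per session.
    """
    by_user = defaultdict(list)
    for user, timestamp, page in requests:
        by_user[user].append((timestamp, page))

    sessions = []
    for user, views in by_user.items():
        views.sort()  # order each user's page views by time
        current = [views[0][1]]
        last_timestamp = views[0][0]
        for timestamp, page in views[1:]:
            if timestamp - last_timestamp > timeout:
                sessions.append((user, current))
                current = []
            current.append(page)
            last_timestamp = timestamp
        sessions.append((user, current))
    return sessions

# Hypothetical click-stream; the user key combines IP address and agent.
requests = [
    ("192.0.2.1|Mozilla", 0, "/index.html"),
    ("192.0.2.1|Mozilla", 60, "/products.html"),
    ("192.0.2.1|Mozilla", 4000, "/index.html"),
    ("192.0.2.7|Opera", 30, "/index.html"),
]
sessions = build_server_sessions(requests)
```

Here the first user produces two sessions, because the third request arrives more than 30 minutes after the second.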

8
Web usage mining phases

9
Preprocessing
  • Usage Preprocessing
  • Content Preprocessing
  • Structure Preprocessing

10
Preprocessing
  • Usage Preprocessing
  • ????????
  • ????????
  • ??IP??,??Server Session
  • ???????Proxy??????
  • ??IP??,??Server Session
  • ??ISP?????Session?,????IP???
  • ??IP??,??User
  • ????????????????
  • ??Agent,??User
  • ????????????????????????????,?????????????

11
Preprocessing
  • Content Preprocessing
  • ???????script???????????????????????????
  • ???????????(classification)???(clustering)????????

12
Preprocessing
  • Structure Preprocessing
  • ???????page view???????
  • ??????????????????
  • ???????????page view?????

13
Pattern Discovery
  • Statistical Analysis
  • Association Rules
  • Clustering
  • Classification
  • Sequential Patterns
  • Dependency Modeling

14
Statistical Analysis
  • ????????
  • ?? session file ???,??? page view???????????????,?
    ?????????????????????
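
A minimal sketch of the kind of descriptive statistics that can be computed from a session file: page view frequency and the mean and median navigation path length. The choice of these particular statistics, the function name `session_statistics`, and the sample sessions are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median

def session_statistics(sessions):
    """Summarize a list of sessions, each a list of page views."""
    page_views = Counter(page for session in sessions for page in session)
    path_lengths = [len(session) for session in sessions]
    return {
        "most_viewed": page_views.most_common(3),
        "mean_path_length": mean(path_lengths),
        "median_path_length": median(path_lengths),
    }

# Hypothetical session file contents.
sessions = [
    ["/index.html", "/products.html", "/order.html"],
    ["/index.html", "/products.html"],
    ["/index.html"],
]
stats = session_statistics(sessions)
```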

15
Association Rules
  • ??????? server session ?,??????????????
  • ??????????????,??????????
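
To make association rules over server sessions concrete, here is a simplified sketch that only considers pairs of pages and reports rules of the form "page A implies page B" with their support and confidence. It is not a full Apriori implementation; the thresholds, the function name `pair_rules`, and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) rules for page pairs."""
    total = len(sessions)
    page_sets = [set(session) for session in sessions]
    page_count = Counter(page for pages in page_sets for page in pages)
    pair_count = Counter(
        pair for pages in page_sets for pair in combinations(sorted(pages), 2)
    )

    rules = []
    for (a, b), count in pair_count.items():
        support = count / total  # share of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # share of sessions with lhs that also contain rhs
            confidence = count / page_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical server sessions.
sessions = [
    ["/index.html", "/products.html", "/order.html"],
    ["/index.html", "/products.html"],
    ["/index.html", "/about.html"],
    ["/products.html", "/order.html"],
]
rules = pair_rules(sessions)
```

With these sample sessions, every session that contains `/order.html` also contains `/products.html`, so that rule has confidence 1.0 at support 0.5.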

16
Clustering
  • ???????????????
  • usage cluster
  • ????????????????
  • ??????????,?????????????
  • ????????
  • page cluster
  • ?????????????
  • ?????????????

17
Classification
  • ??????????????????
  • ???????????????????????
  • e.g. ?/Product/Music ??????,? 30?????18-25????,?
    ?????

18
Sequential Patterns
  • ?? session ????,??????????????,?????????
  • ?????????????
  • ??????????????????

19
Dependency Modeling
  • ?????,??Web domain????????????
  • ???????????????,??????????????
  • ????????????????
  • ???????????

20
Pattern Analysis
  • ???????????
  • ?????????????,??????????
  • SQL?OLAP??????

21
????
  • ?????????2002?9?10???????log?
  • IP
  • ???????
  • ??????????????????
  • ??????
  • ????????
  • ??????????

22
????
23
????
24
????
  • ???(??)???(??)?????
  • ??????
  • ???????54>37?(Nielsen/NetRatings)

25
????-based on Apriori
26
?????
27
?????
  • (A001,R008)?B002????????????(A001)????????????(R00
    8)?????????????????(B002),????????????????????????
    ????????????????,??????????
  • ???????????????????,???????????????????????
  • ?????????????????,????????

28
???
  • ???????????,??????,?????1,???0
  • ???? Maximum levels ? Minimum support ????????????

29
?????
  • ??????????log?????
  • ??????????????????????,????????
  • ?????????????????

30
?????
  • ?????????35?,??????????????????????????????,??????
    ??????????35????,???????????????

31
?????
  • ??????????,??????D018(???)??????????D035(????????)

32
?????
  • ????,?????????(B002)??????(D479)??????????(D306)??
    ???,?????????(D221)?????

33
Conclusions
  • ??????????,??????????,????????????????
  • ?????????????
  • ????????,?????????????????
  • ??Web Mining ?????,????????????????,???????,?????,
    ?????