Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs - PowerPoint PPT Presentation

Slides: 19
Provided by: del95
1
Discovering Web Access Patterns and Trends by
Applying OLAP and Data Mining Technology on Web
logs
Data Engineering Lab
2
Abstract
  • Web server log file analysis
  • server performance improvement
  • system performance improvement
  • customer targeting in electronic commerce
  • Problems and difficulties
  • processing large amounts of raw log data is not easy
  • data reduction
  • size and time

3
  • Current web log miners
  • slow, inflexible, difficult to maintain
  • only frequency counts, which are not enough
  • WebLogMiner
  • Virtual University / data mining → WebLogMiner
  • OLAP and data mining techniques
  • multi-dimensional data cube
  • scalability, interactivity, variety, flexibility

4
Design of a Web Log Miner
  • Information in a Web server log file
  • domain name of the request / user name / date and
    time of the request / method of the request (GET,
    POST) / name of the requested file / result of the
    request (success, failure, error, etc.) / size of
    the data sent back / URL of the referring page /
    identification of the client agent
  • Example
  • 210.114.3.64 - - [01/Jul/1998:17:34:05 +0900]
    "GET /yjsung/sign.html HTTP/1.1" 200 740
  • 210.114.3.64 - - [01/Jul/1998:17:38:44 -0900]
    "POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352
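The two example entries appear to follow the Common Log Format: host, identity, user, bracketed timestamp, quoted request line, status code and bytes sent. A minimal parsing sketch, assuming exactly that format; the regular expression and the name `parse_clf_line` are illustrative and not WebLogMiner's code. The referring URL and client agent listed above appear only in extended formats such as the Combined Log Format, which this sketch does not handle.

```python
import re
from datetime import datetime

# Assumed Common Log Format:
#   host ident authuser [day/Mon/year:HH:MM:SS zone] "METHOD path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # e.g. "01/Jul/1998:17:34:05 +0900" -> timezone-aware datetime
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" byte count means no body was sent back.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


entry = parse_clf_line(
    '210.114.3.64 - - [01/Jul/1998:17:34:05 +0900] "GET /yjsung/sign.html HTTP/1.1" 200 740'
)
print(entry["method"], entry["path"], entry["status"], entry["size"])  # GET /yjsung/sign.html 200 740
```

Each parsed dict maps directly onto one row of the relational table built in the next stage.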

5
  • Cache information
  • frequent backtracking and reloading may indicate a
    deficient design
  • client-side logs
  • Access count
  • not always a measure of interestingness
  • Time and date
  • evaluate user interest by time spent
  • Domain name
  • the sequence of requests can predict the next
    request → improve traffic
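Two ideas on this slide, estimating interest from time spent and predicting the next request from request sequences, can be sketched under simple assumptions. Time spent on a page is approximated as the gap until the same host's next request, which ignores cached views and cannot tell reading time from idle time; prediction uses a first-order frequency model that returns the most common successor of the current URL. The function names and input shapes are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def time_spent(requests: list[tuple[str, datetime, str]]) -> list[tuple[str, str, float]]:
    """Estimate seconds spent on each page as the gap until the same host's next request.

    The last request of each host has no successor, so no estimate is produced for it.
    """
    by_host = defaultdict(list)
    for host, ts, url in requests:
        by_host[host].append((ts, url))
    spent = []
    for host, visits in by_host.items():
        visits.sort()  # chronological order per host
        for (ts, url), (next_ts, _) in zip(visits, visits[1:]):
            spent.append((host, url, (next_ts - ts).total_seconds()))
    return spent


def build_next_request_model(sessions: list[list[str]]) -> dict:
    """First-order model: for each URL, count which URL was requested right after it."""
    model = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            model[current][following] += 1
    return model


def predict_next(model: dict, url: str):
    """Return the most frequent follower of `url`, or None if it was never followed."""
    followers = model.get(url)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

With the two example requests from the log excerpt, the gap between 17:34:05 and 17:38:44 gives an estimated 279 seconds on `/yjsung/sign.html`.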

6
WebLogMiner: Four Stages
  • 1. Filtering the data and creating a relational
    database
  • 2. Data cube construction
  • 3. OLAP is applied
  • 4. Data mining techniques are applied

7
  • 1. DATABASE CONSTRUCTION FROM SERVER LOG FILES
  • Data cleansing and transformation
  • filter out requests for page graphics, sound and
    video
  • two types
  • without knowledge about the site
  • e.g., transforming the request time into day,
    month and year
  • with knowledge about the site
  • associating server requests with intended user
    actions requires knowledge of the site structure
  • relational database
  • the cleaned data and new implicit data are added
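The transformations "without knowledge about the site" can be illustrated by dropping media requests and deriving date columns before loading a relational table. A minimal sketch, assuming records already parsed into dicts and an in-memory SQLite database standing in for the relational database; the extension list, the table name `log`, the function `clean_and_load` and the second sample record are hypothetical.

```python
import sqlite3
from datetime import datetime

# Hypothetical set of extensions treated as page graphics, sound and video.
MEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png",
                    ".wav", ".au", ".mp3", ".avi", ".mpg", ".mov")


def clean_and_load(records, conn):
    """Drop media requests, derive year/month/day/hour, insert the rest into `log`.

    Each record is a dict with keys host, time (datetime), method, path, status, size.
    Returns the number of rows kept.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS log (
               host TEXT, method TEXT, path TEXT, status INTEGER, size INTEGER,
               year INTEGER, month INTEGER, day INTEGER, hour INTEGER)"""
    )
    kept = 0
    for r in records:
        if r["path"].lower().endswith(MEDIA_EXTENSIONS):
            continue  # filtered out as page graphics, sound or video
        t = r["time"]
        conn.execute(
            "INSERT INTO log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (r["host"], r["method"], r["path"], r["status"], r["size"],
             t.year, t.month, t.day, t.hour),
        )
        kept += 1
    conn.commit()
    return kept


conn = sqlite3.connect(":memory:")
sample = [
    {"host": "210.114.3.64", "time": datetime(1998, 7, 1, 17, 34, 5), "method": "GET",
     "path": "/yjsung/sign.html", "status": 200, "size": 740},
    # Hypothetical image request that the filter should drop.
    {"host": "210.114.3.64", "time": datetime(1998, 7, 1, 17, 34, 6), "method": "GET",
     "path": "/images/logo.GIF", "status": 200, "size": 1024},
]
kept = clean_and_load(sample, conn)
```

Transformations "with knowledge about the site", such as mapping requests to intended actions, would need extra site-structure tables and are not attempted here.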

8
  • 2. MULTI-DIMENSIONAL WEB LOG DATA CUBE
    CONSTRUCTION AND MANIPULATION
  • Data cube
  • the GROUP BY operator in SQL computes aggregates
    over a set of attributes
  • e.g., the sum of sales grouped by product P and
    customer C gives, for each product, a breakdown of
    how much of it was sold to each customer
  • CUBE is the n-dimensional generalization of
    GROUP BY
  • gives remarkable flexibility to manipulate and
    view the data
  • allows OLAP operations such as drill-down,
    roll-up, slice and dice
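The CUBE operator can be illustrated by computing every group-by over a set of dimensions (2^n of them for n dimensions), with a placeholder `ALL` standing for a dimension that has been aggregated away. A minimal in-memory sketch using the product/customer sales example; the names `cube` and `ALL` and the sample rows are hypothetical. SQL dialects that support the operator, for example PostgreSQL's `GROUP BY CUBE (...)`, mark aggregated dimensions with NULL instead.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def cube(rows, dimensions, measure):
    """Compute SUM(measure) for every subset of `dimensions`.

    Each result key has one entry per dimension; dimensions not grouped on hold ALL.
    """
    result = defaultdict(int)
    for k in range(len(dimensions) + 1):
        for group in combinations(dimensions, k):
            for row in rows:
                key = tuple(row[d] if d in group else ALL for d in dimensions)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sales rows: product P, customer C, amount sold.
sales = [
    {"P": "p1", "C": "c1", "sales": 10},
    {"P": "p1", "C": "c2", "sales": 5},
    {"P": "p2", "C": "c1", "sales": 7},
]
totals = cube(sales, ["P", "C"], "sales")
print(totals[("p1", ALL)])  # 15: p1 rolled up over all customers
```

In this representation, roll-up corresponds to reading an entry with more `ALL` positions, such as `("p1", ALL)`, drill-down to the reverse, and a slice to keeping only the entries with one dimension fixed, e.g. `P == "p1"`.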

9
  • Attributes
  • URL
  • domain name
  • size of resource
  • time
  • ...

10
  • 3. DATA MINING ON THE WEB LOG DATA CUBE
    AND WEB LOG DATABASE
  • Data characterization
  • find rules that summarize a user-defined data set
  • e.g., the traffic on a web server for a given type
    of media at a particular time of day
  • Class comparison
  • discover discriminant rules
  • e.g., compare requests from two different web
    browsers
  • Association
  • discover patterns in which accesses to different
    resources consistently occur together
  • Prediction
  • e.g., access to a new resource on a given day can
    be predicted from accesses to similar old
    resources on similar days
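Association is the easiest of these techniques to show in a few lines. A minimal sketch, assuming the log has already been split into sessions: it counts pairs of resources accessed in the same session and keeps those whose support (fraction of sessions containing both) reaches a threshold. This is only the pair-counting step of an Apriori-style miner, not a full rule generator, and `frequent_pairs` and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Return {(resource_a, resource_b): support} for pairs co-accessed often enough.

    Support is the fraction of sessions that contain both resources; repeated
    accesses within one session are counted once.
    """
    pair_counts = Counter()
    for session in sessions:
        # sorted(set(...)) gives each unordered pair one canonical (a, b) order
        for pair in combinations(sorted(set(session)), 2):
            pair_counts[pair] += 1
    total = len(sessions)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Hypothetical sessions, each a list of requested URLs.
sessions = [["/a", "/b", "/c"], ["/a", "/b"], ["/b", "/c"], ["/a", "/b", "/a"]]
print(frequent_pairs(sessions, min_support=0.5))  # {('/a', '/b'): 0.75, ('/b', '/c'): 0.5}
```

A rule such as `/a → /b` could then be scored by its confidence, support(/a, /b) divided by the fraction of sessions containing `/a`.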

11
  • Classification
  • can be used to develop a better understanding of
    each class in the web log database, and perhaps to
    restructure a web site or customize answers to
    requests based on classes of requests
  • Time-series analysis
  • analyzes data along time sequences to discover
    interesting time-related patterns
  • e.g., disclose patterns and trends in the
    improvement of the web server's services
  • The focus is on time-series analysis because web
    log records are highly time-related
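The simplest time-series view of a web log is a count of requests per day, smoothed by a moving average to expose a trend. A minimal sketch, assuming a list of request timestamps; the functions `daily_counts` and `moving_average` and the sample timestamps are hypothetical, and the slides do not say which time-series methods WebLogMiner itself uses.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_counts(timestamps: list[datetime]) -> list[tuple]:
    """Requests per calendar day, including days with zero requests."""
    counts = Counter(ts.date() for ts in timestamps)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts[day]))  # Counter yields 0 for days with no requests
        day += timedelta(days=1)
    return series


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; the first window - 1 positions get no value."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


# Hypothetical request timestamps spanning three days.
hits = [datetime(1998, 7, 1, 9, 0), datetime(1998, 7, 1, 17, 0), datetime(1998, 7, 3, 12, 0)]
series = daily_counts(hits)
trend = moving_average([count for _, count in series], window=2)
```

Filling empty days with zero matters: skipping them would make the moving average span uneven time intervals.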

12
Experiments with the web log miner
  • Virtual-U: six different major components
  • Goal: understand usage and user behavior patterns
  • Data cleaning and transformation
  • all entries were mapped one-to-one into a
    relational database
  • the fields site and user action were added
  • Problems
  • extraneous information → define those entries
    and eliminate them
  • multiple server requests caused by the same user
    action
  • the same server request caused by different user
    actions
  • local activities are not recorded
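The "multiple server requests caused by the same user action" problem can be approached with a heuristic: treat a page request as the start of a user action and attach embedded resources (images and similar) that the same host requests shortly afterwards. A sketch under those assumptions; the suffix list, the 5-second window and `group_user_actions` are hypothetical, and the sketch does not address the converse problem or unrecorded local activity.

```python
from datetime import datetime, timedelta

# Hypothetical heuristics: which paths count as embedded resources, and how soon
# after the page they must arrive to be attributed to the same user action.
EMBEDDED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
EMBED_WINDOW = timedelta(seconds=5)


def group_user_actions(requests: list[tuple[str, datetime, str]]) -> list[dict]:
    """Group (host, timestamp, path) requests into user actions.

    A non-embedded request starts a new action; an embedded resource requested by
    the same host within EMBED_WINDOW of that action's start is attached to it.
    """
    actions = []
    current = {}  # host -> the action currently open for that host
    for host, ts, path in sorted(requests, key=lambda r: (r[1], r[0])):
        action = current.get(host)
        is_embedded = path.lower().endswith(EMBEDDED_SUFFIXES)
        if is_embedded and action and ts - action["start"] <= EMBED_WINDOW:
            action["requests"].append(path)
        else:
            action = {"host": host, "start": ts, "requests": [path]}
            actions.append(action)
            current[host] = action
    return actions
```

The window trades off two errors: too short splits one page load into several actions, too long merges a quick follow-up click into the previous action.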

14
  • Multi-dimensional data cube construction and
    manipulation
  • summarization (group-bys on different dimensions)
  • request / domain / event / session / bandwidth /
    error / referring organization / browser summaries
  • Examples
  • Figure 2: OLAP analysis of the Web log
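A session summary requires splitting the log into sessions first. A common heuristic, assumed here rather than taken from the slides, starts a new session when a host has been idle longer than a timeout; the 30-minute value, `sessionize` and `session_summary` are hypothetical. A host (IP address) only approximates a user, since proxies and shared machines mix several users.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold, not from the slides


def sessionize(requests: list[tuple[str, datetime, str]]) -> list[list[str]]:
    """Split (host, timestamp, url) requests into per-host sessions of URLs."""
    sessions = []
    last_seen = {}     # host -> timestamp of that host's previous request
    open_session = {}  # host -> URL list of that host's current session
    for host, ts, url in sorted(requests, key=lambda r: (r[1], r[0])):
        if host not in last_seen or ts - last_seen[host] > SESSION_TIMEOUT:
            open_session[host] = []
            sessions.append(open_session[host])
        open_session[host].append(url)
        last_seen[host] = ts
    return sessions


def session_summary(sessions: list[list[str]]) -> dict:
    """Number of sessions and mean number of requests per session."""
    if not sessions:
        return {"sessions": 0, "mean_length": 0.0}
    return {
        "sessions": len(sessions),
        "mean_length": sum(len(s) for s in sessions) / len(sessions),
    }
```

The other summaries listed on the slide (domain, bandwidth, browser and so on) are ordinary group-bys over the cleaned log table.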

15
Figure 3: Typical event sequence and user behavior pattern analysis
Figure 4: Web traffic analysis of the Web log
17
  • Figure 6: Event trees for months one to four

18
Discussion and Conclusion
  • WebLogMiner
  • OLAP and data mining techniques
  • multi-dimensional data cube
  • major strengths
  • scalability, interactivity, variety, flexibility
  • Problems with current log files
  • web servers should collect more information
  • a new log structure is needed → would simplify
    pre-processing