Evolving dynamic web pages using web mining

Kartik Menon, Smart Engineering Systems Laboratory, Engineering Management Department, University of Missouri-Rolla

Slides: 22
Provided by: webMstEdu
Learn more at: http://web.mst.edu

1
Evolving dynamic web pages using web mining
Kartik Menon
Smart Engineering Systems Laboratory
Engineering Management Department
University of Missouri-Rolla
2
Overview
  • Goal
  • Web Mining
  • General Principle behind web mining
  • Web Data
  • Web Access Pattern Clustering
  • Evolving web pages using cluster information
  • Clustering Techniques
  • Fuzzy C Means
  • Experimental Set-up
  • Results
  • Conclusion and Future work
  • Questions

3
Goal
  • Cluster similar web access traversal patterns,
    train the system to understand the needs and
    demands of the different users accessing the
    website, and use this information to evolve web
    pages.

4
Web Mining
  • Learning about the different users accessing a web
    page
    • The needs and requirements of the users
    • Web access traversal patterns
    • Links that are more popular than others
  • For example, www.yahoo.com
    • Email
    • Search engine
    • News
    • Greeting cards

5
General Principle behind web mining
  • Gather web data from Web Log servers
  • Cluster web traversal patterns
  • Evolve web pages

6
Web Data
  • What information is important for mining?
    • Links traversed (URLs requested)
    • Documents downloaded
    • Time spent on the web page compared to the
      total time spent
    • Web traffic
    • GET or POST messages
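
As a rough illustration of how the items above could be pulled out of a server log, here is a minimal Python sketch. It assumes the log uses the NCSA Common Log Format, which the presentation does not state. The sample lines, the `parse_line` and `urls_by_host` helpers, and the host addresses are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout (not confirmed by the slides): NCSA Common Log Format,
#   host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def urls_by_host(lines):
    """Group requested URLs by client host, keeping the request order."""
    visits = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None and record["method"] in ("GET", "POST"):
            visits[record["host"]].append(record["url"])
    return dict(visits)


# Hypothetical sample lines, not taken from the presentation's data.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0600] "GET /students HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:02 -0600] "GET /registrar HTTP/1.0" 200 1024',
    '10.0.0.2 - - [10/Oct/2000:13:57:10 -0600] "POST /jobtrak/ HTTP/1.0" 200 -',
    'not a log line',
]

if __name__ == "__main__":
    for host, urls in urls_by_host(SAMPLE_LOG).items():
        print(host, urls)
```

Grouping by host is only a stand-in for real session identification. Time spent per page would additionally require differencing the timestamps of consecutive requests from the same host.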

7
Web Access Pattern Clustering
  • Find users with similar web access patterns
  • Grouping and separating users
  • Concise representation of a system's behavior
  • Generalize about user needs and interests

8
Evolving Web Pages using cluster information
  • The cluster information can be used to:
    • Learn about the users
    • Modify the web page
    • Personalize the web site
    • Evolve web pages

9
Clustering Techniques
  • Neural Nets
    • Kohonen's Self-Organizing Maps (SOMs)
  • Statistical
    • K-Means
  • Fuzzy Logic
    • Fuzzy C Means
    • Fuzzy ISODATA

10
Fuzzy C Means
  • A data clustering technique in which each data
    point belongs to a cluster to some degree that is
    specified by a membership function
  • Notation:
    • $X$ is a set of $n$ data sample vectors
    • $U$ is a partition of $X$ into $c$ parts
    • $V$ is the set of cluster centers
    • $d^2$ is the squared distance under an
      inner-product-induced norm
    • $u_{ik}$ is the grade of membership of $x_k$ in
      cluster $i$, between 0 and 1
    • $m$ is a parameter that increases or decreases
      the fuzziness

11
Fuzzy C Means (contd)
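
The body of this slide is not preserved in the transcript. For reference, the standard fuzzy c-means formulation, written with the symbols defined on the previous slide, is given below. That the slide showed exactly these expressions is an assumption, as is the matrix $A$ that defines the inner-product norm.

```latex
% Objective minimized by fuzzy c-means (m > 1)
J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \, d^2(x_k, v_i),
\qquad
d^2(x_k, v_i) = \lVert x_k - v_i \rVert_A^2 = (x_k - v_i)^{T} A \, (x_k - v_i)

% Constraints on the memberships
\sum_{i=1}^{c} u_{ik} = 1 \quad \text{for every } k,
\qquad 0 \le u_{ik} \le 1

% Alternating updates (the membership update assumes d(x_k, v_j) > 0 for all j)
v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} \, x_k}{\sum_{k=1}^{n} (u_{ik})^{m}},
\qquad
u_{ik} = \left[ \sum_{j=1}^{c}
  \left( \frac{d(x_k, v_i)}{d(x_k, v_j)} \right)^{\frac{2}{m-1}} \right]^{-1}
```

With $A$ equal to the identity matrix, $d$ reduces to the Euclidean distance.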
12
Experimental Set-up
  • Target the website http://campus.umr.edu.
  • Mine the web log files for web data.
  • The main problem is converting the web pages
    accessed into numeric values:
    • Identify all the URLs that can be reached from
      this web page
    • Number these URLs from 1 to N, where N is the
      number of URLs that can be accessed
    • Assign a fuzzy weight w(j) to each URL that can
      be accessed
    • Define a Boolean variable s(j), set to 1 if the
      jth URL is accessed by the user and to null (0)
      otherwise
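
The numbering and flagging steps above can be sketched in Python as follows. The URL list and the weight values are hypothetical. The rule for combining w(j) and s(j) into a single number, a weighted sum, is an assumption, since the transcript does not preserve the exact expression.

```python
def build_url_index(urls):
    """Number the reachable URLs from 1 to N in the order given."""
    return {url: j for j, url in enumerate(urls, start=1)}


def access_flags(url_index, session_urls):
    """Return s(j) for j = 1..N: 1 if the j-th URL was accessed, else 0."""
    visited = set(session_urls)
    return [1 if url in visited else 0 for url in url_index]


def session_value(weights, flags):
    """Assumed combination rule (not from the slides): sum of w(j) * s(j)."""
    return sum(w * s for w, s in zip(weights, flags))


# Hypothetical reachable URLs and fuzzy weights w(j), for illustration only.
REACHABLE = ["/students", "/departments", "/registrar", "/registrar/star"]
WEIGHTS = [0.1, 0.2, 0.4, 0.8]

if __name__ == "__main__":
    index = build_url_index(REACHABLE)
    flags = access_flags(index, ["/students", "/registrar", "/registrar/star"])
    print(index)
    print(flags)
    print(session_value(WEIGHTS, flags))
```

Under this assumed rule, choosing weights whose subset sums are all distinct gives every combination of visited URLs its own numeric value.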

13
Experimental Set-up (contd.)
  • Define the data point x as the number
    corresponding to all the sites accessed by the
    user in that particular user session.
  • Apply fuzzy c-means by calculating the Euclidean
    distance between each data sample and each
    cluster center as $d_{ij} = \lVert x_j - c_i \rVert$,
    where $x_j$ is the data point and $c_i$ is the
    center of cluster i.
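
A minimal pure-Python sketch of fuzzy c-means on scalar data points, using the distance between data point x_j and center c_i described above, might look like the following. The function name, the default fuzziness `m=2.0`, the random initialization, the stopping rule, and the sample values are assumptions rather than details from the presentation.

```python
import random


def fuzzy_c_means_1d(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Cluster scalar data points with fuzzy c-means.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which points[j] belongs to cluster i.
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from random memberships, normalized so each point's grades sum to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= total

    centers = [0.0] * c
    for _ in range(max_iter):
        # Update each center as the membership-weighted mean of the points.
        for i in range(c):
            weights = [u[i][j] ** m for j in range(n)]
            centers[i] = sum(w * x for w, x in zip(weights, points)) / sum(weights)

        # Update memberships from the distances d_ij = |x_j - c_i|.
        change = 0.0
        for j in range(n):
            dists = [abs(points[j] - centers[i]) for i in range(c)]
            if min(dists) == 0.0:
                # The point sits exactly on one or more centers: split the
                # membership evenly among those centers.
                zeros = [i for i in range(c) if dists[i] == 0.0]
                new = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new = [v / total for v in inv]
            for i in range(c):
                change = max(change, abs(new[i] - u[i][j]))
                u[i][j] = new[i]
        # Stop once no membership grade moved by more than tol.
        if change < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Hypothetical session values, not the presentation's data.
    data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
    centers, memberships = fuzzy_c_means_1d(data, c=2)
    print(sorted(centers))
```

Because every point keeps a membership grade in every cluster, a session that falls between two groups of users shows up with split grades instead of being forced into one group.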

14

15

IP Address       URLs accessed by the user
131.151.9.999    http://campus.umr.edu, /students, /departments, /departments/academic.htmlarts_science
181.147.7.970    http://campus.umr.edu, /students, /registrar, /registrar/star
181.147.7.972    http://campus.umr.edu, /students, http://web.umr.edu/career, /jobtrak/
181.148.7.979    http://campus.umr.edu, /students, http://web.umr.edu/career, /fairs
16
Results for 2 and 3 clusters
17
Results for 2 and 3 clusters (contd.)
18
Web Page Evolution
  • Use the cluster information as an input to modify
    the web page, so that users with similar access
    patterns get the same web page, distinct from the
    page shown to other users
    • Adjust the placement of links
    • Remove certain links (if possible)
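
One way the link-placement idea could be sketched in Python is to order the links for each cluster by how many of that cluster's sessions accessed them. The function name, the sample sessions, and the cluster labels are hypothetical. In practice the labels might come from each session's highest fuzzy membership grade, which is an assumption here.

```python
from collections import Counter


def rank_links_by_cluster(sessions, labels):
    """Return {cluster: [urls ordered from most to least accessed]}.

    Each URL is counted at most once per session. Ties are broken
    alphabetically so the ordering is deterministic.
    """
    counts = {}
    for urls, label in zip(sessions, labels):
        counts.setdefault(label, Counter()).update(set(urls))
    return {
        label: [url for url, _ in sorted(counter.items(),
                                         key=lambda kv: (-kv[1], kv[0]))]
        for label, counter in counts.items()
    }


# Hypothetical sessions and cluster labels, for illustration only.
SESSIONS = [
    ["/students", "/registrar", "/registrar/star"],
    ["/students", "/departments"],
    ["/students", "/registrar"],
]
LABELS = [0, 1, 0]

if __name__ == "__main__":
    for cluster, links in rank_links_by_cluster(SESSIONS, LABELS).items():
        print(cluster, links)
```

Links that end up at the bottom of a cluster's ordering would be the natural candidates for demotion or removal on the page served to that cluster.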

19
Conclusions
  • Fuzzy c-means is an easy way of clustering
    similar web access patterns for different user
    sessions.
  • Using Euclidean distance was very helpful in
    learning more about these web access patterns.
  • The experiment produced results and plots that
    were easy to interpret.
  • We observed that fuzzy c-means provided stable
    results for the different data sets we used.

20
Future Work
  • Use other clustering algorithms and compare the
    results
  • Develop self-evolving web sites: sites that
    improve themselves by learning from user access
    patterns
  • Use the results obtained with the fuzzy
    clustering algorithms to make recommendations to
    the webmaster of http://campus.umr.edu
  • Increase the popularity of the web page by
    tailoring it more closely to the needs of the
    users accessing it

21
Questions?