Building an Intelligent Web: Theory and Practice

1
Building an Intelligent Web: Theory and Practice
  • Pawan Lingras
  • Saint Mary's University
  • Rajendra Akerkar
  • American University of Armenia and SIBER, India

4
Information Retrieval
8
Data mining has emerged as one of the most exciting and dynamic
fields in computing science. Its driving force is the presence of
petabyte-scale online archives that potentially contain valuable
bits of information hidden in them. Commercial enterprises have been
quick to recognize the value of this concept; consequently, within
the span of a few years, the software market for data mining alone
is expected to exceed $10 billion. Data mining refers to a family of
techniques used to detect interesting nuggets of
relationships/knowledge in data. While the theoretical underpinnings
of the field have been around for quite some time (in the form of
pattern recognition, statistics, data analysis and machine
learning), the practice and use of these techniques have been
largely ad hoc. With the availability of large databases to store,
manage and assimilate data, the new thrust of data mining lies at
the intersection of database systems, artificial intelligence and
algorithms that efficiently analyze data. The distributed nature of
many databases, their size and the high complexity of many
techniques present interesting computational challenges.
16
Figure 2.43 Relationship between precision and
recall
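
The figure itself did not survive in this transcript. As a rough illustration of the relationship it names, the sketch below computes precision and recall at each cutoff of a ranked result list. The document IDs and the ranking are made-up values, not data from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four relevant documents and a ranked list of eight results.
relevant_docs = {"d1", "d2", "d3", "d4"}
ranking = ["d1", "d5", "d2", "d6", "d3", "d7", "d4", "d8"]

# Precision and recall after the top 1, top 2, ... top 8 results.
curve = [precision_recall(ranking[:k], relevant_docs)
         for k in range(1, len(ranking) + 1)]
```

In this toy ranking, recall can only grow as more results are returned, while precision tends to fall, which is the usual trade-off between the two measures.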
18
Semantic Web
19
Semantic Web: The layer language model
(Berners-Lee, 2001; Broekstra et al., 2001)
22
Figure 3.4 Representing classes and instances
(Noy et al., 2001)
26
Queries 1 and 2
27
Queries 3 and 4
32
An RDF model for automobiles
35
Classification and Association
36
Data Preparation
  • Database Theory
  • SQL
  • Data Transformation
  • http://www.ecn.purdue.edu/KDDCUP/data/

37
Classification
  • Find a rule, a formula, or a black-box classifier
    for organizing data into classes.
  • Classify clients requesting loans into categories
    based on the likelihood of repayment
  • Classify customers into Big or Moderate Spenders
    based on what they buy
  • Classify customers as loyal, semi-loyal, or
    infrequent based on the products they buy
  • The classifier is developed from the data in the
    training set
  • The reliability of the classifier is evaluated
    using the test set of data
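
The training-set/test-set split above can be sketched with a deliberately simple classifier. Everything here is an assumption for illustration: the spend figures, the "Big"/"Moderate" labels attached to them, and the one-threshold rule (the midpoint of the two class means) are not taken from the presentation.

```python
def train_threshold(training_set):
    """Learn a spend threshold: the midpoint between the two class means."""
    big = [spend for spend, label in training_set if label == "Big"]
    moderate = [spend for spend, label in training_set if label == "Moderate"]
    return (sum(big) / len(big) + sum(moderate) / len(moderate)) / 2


def classify(spend, threshold):
    """Apply the learned rule to one customer."""
    return "Big" if spend >= threshold else "Moderate"


def accuracy(test_set, threshold):
    """Fraction of held-out customers the rule labels correctly."""
    correct = sum(classify(spend, threshold) == label for spend, label in test_set)
    return correct / len(test_set)


# Hypothetical (total spend, label) pairs.
training_set = [(900, "Big"), (1100, "Big"), (200, "Moderate"), (400, "Moderate")]
test_set = [(1000, "Big"), (300, "Moderate"), (600, "Big"), (500, "Moderate")]

threshold = train_threshold(training_set)      # built from the training set only
test_accuracy = accuracy(test_set, threshold)  # evaluated on unseen data
```

The point of the split is that `test_accuracy` is measured on records the classifier never saw while it was being built, so it is a fairer estimate of reliability than accuracy on the training set.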

38
Classification
  • ID3 Algorithm
  • Numerical Illustration
  • Application to a Small E-commerce Dataset
  • C4.5 for Experimentation
  • Other approaches
  • Neural Networks
  • Fuzzy Classification
  • Rough Set Theory
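
ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that attribute-selection step, not the full recursive tree construction, and the visitor records (`returning`, `device`, `buys`) are hypothetical stand-ins for a small e-commerce dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """The attribute ID3 would split on first."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical visitor records.
visitors = [
    {"returning": "yes", "device": "mobile", "buys": "yes"},
    {"returning": "yes", "device": "desktop", "buys": "yes"},
    {"returning": "no", "device": "mobile", "buys": "no"},
    {"returning": "no", "device": "desktop", "buys": "no"},
]
root = best_attribute(visitors, ["returning", "device"], "buys")
```

Here `returning` separates buyers from non-buyers perfectly, so it has the larger gain and would become the root of the tree, while `device` carries no information about `buys`.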

39
Association
  • Market basket analysis
    • determine which things go together
  • Transactions might reveal that
    • customers who buy bananas also buy candles
    • cheese and pickled onions seem to occur
      frequently in a shopping cart
  • Information can be used for
    • arranging a physical shop or structuring the Web
      site
    • running a targeted advertising campaign
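
"Things that go together" is usually quantified with support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). A minimal sketch follows, using made-up baskets built around the items mentioned on the slide.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"banana", "candles", "milk"},
    {"banana", "candles"},
    {"banana", "bread"},
    {"cheese", "pickled onions"},
    {"cheese", "pickled onions", "bread"},
]

banana_candles = confidence(baskets, {"banana"}, {"candles"})
cheese_onions = confidence(baskets, {"cheese"}, {"pickled onions"})
```

In these toy baskets, two of the three banana baskets also contain candles, while every cheese basket contains pickled onions, so the second rule has the higher confidence.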

40
Association
  • Apriori Algorithm
  • Demonstration for an E-commerce Application
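
The slides do not reproduce the algorithm, so the following is a compact sketch of the usual Apriori idea rather than the authors' demonstration: count candidate itemsets level by level, and generate larger candidates only from itemsets that were already frequent. The carts and the minimum support count are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each size-k candidate and keep the frequent ones.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical e-commerce carts.
carts = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "bag"},
    {"laptop", "mouse", "bag"},
]
frequent_itemsets = apriori(carts, min_support=3)
```

With a minimum support of three carts, all single items and all pairs are frequent, but the triple appears in only two carts and is dropped.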

41
Clustering
42
Clustering
  • Breaks a large database into different subgroups
    or clusters
  • Unlike classification, there are no predefined
    classes
  • The clusters are put together on the basis of
    similarity to each other
  • The data miners determine whether the clusters
    offer any useful insight

44
Statistical Methods
  • k-means
  • Numerical Example
  • Implementation
  • Data Preparation
  • Clustering
  • Other Methods
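
As a minimal sketch of k-means (not the implementation referred to on the slide), the function below alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its cluster. Using the first `k` points as initial centroids is a simplifying assumption, and the 2-D points are made up.

```python
def kmeans(points, k, iterations=100):
    """Cluster tuples of floats; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # assumption: naive initialisation
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two numeric features.
customers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(customers, k=2)
```

On real data the result depends on the initial centroids, so implementations commonly use several random restarts; this sketch keeps the initialisation deterministic to stay short.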

45
Neural Network Based Approaches
  • Kohonen Self-Organising Maps
  • Numerical Demonstration
  • Application to Web Data Collection
  • Other Neural Network Based Approaches

46
Clustering of customers
48
Web Usage Mining
49
High-level web usage mining process (Srivastava
et al., 2000)
50
Applications of web usage mining (Romanko, 2006;
Srivastava et al., 2000)
67
Clustering exercise
70
Classification exercise
71
Association exercise
73
Sequence Pattern Analysis of Web Logs
77
Web Content Mining
78
Data Collection
  • Web Crawlers
  • Public Domain Web Crawlers
  • An Implementation of a Web Crawler
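
The crawler implementation itself is not in the transcript. The following is a minimal breadth-first sketch using only the Python standard library. The `fetch` callable is an assumption: it takes a URL and returns HTML or `None`, which lets the example run against an in-memory dictionary of hypothetical pages instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns URLs in the order they were fetched."""
    frontier = deque([seed])
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical site held in memory.
pages = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.test/b.html": '<a href="/missing.html">gone</a>',
    "http://example.test/c.html": "no links here",
}
order = crawl("http://example.test/", pages.get)
```

A crawler run against real sites would also need to honour robots.txt, rate-limit its requests, and handle network errors, all of which this sketch leaves out.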

79
Architecture of a search engine (Romanko, 2006)
83
Other topics in Web Content Mining
  • Search Engines
  • How to prepare for and set up a search engine
  • Types and listings of search engines (freeware,
    remote hosting services, commercial)
  • Multimedia Information Retrieval

84
Web Structure Mining
86
http://www.iprcom.com/papers/pagerank/
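
The linked paper concerns PageRank. As a hedged illustration, the sketch below runs a simple power iteration over a tiny hypothetical link graph. It uses the normalised variant in which ranks sum to 1, a damping factor of 0.85, and an even spread of rank from pages with no outlinks. These are common conventions assumed here, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """`links` maps each page to the pages it links to; every target must be a key."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Assumption: a page without outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

In this graph page C is linked from three pages and ends up with the highest rank, while D, which nothing links to, keeps only the random-jump share.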
91
Index quality for different search
engines (Henzinger et al., 1999)
92
Index quality per page for different search
engines (Henzinger et al., 1999)