Scalable Web Usage Mining and Soft Computing Approaches for High Performance Intelligent Web Recommender Systems

1
Scalable Web Usage Mining and Soft Computing
Approaches for High Performance Intelligent Web
Recommender Systems

Olfa Nasraoui
Research Assistants: Cesar Cardona, Carlos Rojas,
Fabio Gonzalez, Elizabeth Leon, Chris Petenes,
Mrudula Pavuluri
Dept. of Electrical & Computer Engineering
The University of Memphis
E-mail: onasraou_at_memphis.edu

Research sponsored by National Science Foundation
CAREER Award NSF-IIS 0133948
2
Outline

Web Usage Mining
Background: Evolutionary Computation
Mining Web Profiles with Hierarchical Unsupervised Niche Clustering (H-UNC)
H-UNC Web Mining: Experimental Results
Need for Scalability; Background: Artificial Immune Systems
Scalable Web Usage Mining Based on the Immune System Metaphor
Scalable Web Usage Mining: Preliminary Results
Tying It All Together: An Automated Real-Time Recommendation System
Conclusions and Future Prospects

3
Web Personalization

Web personalization: tailor a user's interaction with the Web information space based on information about the user,
e.g., recommend items/links based on prior ratings/visits
Manually entered profiles are subjective, static, not always available, and raise privacy concerns
Alternative: extract profiles from all users' access patterns (mass profiling)
→ anonymous profiles

4
Web Personalization System
[Diagram: Web Usage Mining → Profiles → Recommendation Engine]
5
Web Mining: Data Types and Challenges

Mining the Web data
  Content: Web pages
  Usage: Web access log files
  Structure: link structure of Web pages
Challenges
  Huge, semi-structured or unstructured, highly dynamic data
  Data corrupted with noise (not all information is relevant)

6
Web Usage Mining Framework
7
Knowledge Discovery Process for Web Usage Mining

Source of data: Web clickstreams → Web log files
Goal: extract interesting user profiles by categorizing user sessions into groups or clusters
Complete KDD process:
  Preprocessing: selecting and cleaning data
  Data mining (learning phase)
    A model for session data (bag of URLs visited)
    Similarity assessment
    Clustering algorithm to categorize sessions
  Derivation and interpretation of results
    Computing profiles
    Evaluating results

8
Different Ways to Mine Web User Profiles: Possible Solutions and Problems

Mining Web usage data using clustering:
K-Means (Shahabi et al., 1997): problems with Euclidean distance, feature vector representation, sparsity, and noise; the number of clusters must be known; assumes user profiles do not overlap; sensitive to initialization and local optima; etc.
Relational clustering: same as above, except that a distance/relation matrix replaces the vector representation; huge memory and computation requirements; etc.
Fuzzy relational clustering (Nasraoui et al., 1999): same as above, but allows overlapping clusters
Robust relational clustering (Nasraoui et al., 1999): same as above, but can handle noise in the data

9
Different Ways to Mine Web User Profiles: Possible Solutions and Problems

Evolutionary techniques: can avoid the feature vector representation dilemma (but appropriate coding is required); flexible (allow any similarity measure and any subjective fitness criterion); better global search (population based)
Hierarchical Unsupervised Niche Clustering, or HUNC (Nasraoui & Krishnapuram, 2001): based on the Darwinian evolution metaphor and niche speciation; handles noise and an unknown number of clusters; reliable with respect to initialization
Immune-based clustering (Nasraoui et al., 2002): based on the immune system metaphor, a microcosm of evolution
Need: a scalable immune-based clustering with linear complexity, even for huge clickstream data that cannot fit in main memory, and dynamic online learning of evolving user profiles (Nasraoui et al., 2003)

10
Different Ways to Mine Web User Profiles: Possible Solutions and Problems

Other approaches (Web session treated as a transaction):
Frequent itemset and association rule mining
  Very sensitive to the required parameters (support and confidence thresholds)
  Yields either too few profiles or too many, including spurious profiles
  Exorbitant computational complexity for low support thresholds
Association rule mining followed by hypergraph-partition-based clustering
  All of the above, plus the drawbacks of hypergraph partition clustering (huge complexity)

12
Background: Genetic Algorithms (Inspired by Nature: Darwinian Evolution)

Genetic Algorithms (GAs): evolve a population of individuals/solutions using selection, crossover, and mutation, as in nature

[Diagram label: Operators]
14
Mining Web Profiles with Hierarchical Unsupervised Niche Clustering, Step 1: Data Preprocessing

Access log: record of all URLs accessed by users on a Web site
Log entry: access time, IP address, URL viewed, etc.
___________________________________________
171148 141.225.195.29 GET /graphics/griffin.jpg 200
171148 141.225.195.29 GET /people/faculty/nasraoui.html 200
___________________________________________
Map the N_U URLs on the site to indices
User session vector s(i): temporally compact sequence of Web accesses by a user
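
The preprocessing step can be sketched roughly as follows. This is only an illustration, not the presenter's code: the record layout `(time_in_seconds, ip, url)`, the 30-minute inactivity timeout, and the function names `build_sessions` and `to_binary_vectors` are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed for illustration: requests from one IP belong to the same session
# until the gap between consecutive requests exceeds this timeout.
SESSION_TIMEOUT = 30 * 60  # seconds


def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (time, ip, url) records by IP and split on long idle gaps."""
    by_ip = defaultdict(list)
    for t, ip, url in sorted(records):
        by_ip[ip].append((t, url))
    sessions = []
    for ip in sorted(by_ip):
        current, last_t = [], None
        for t, url in by_ip[ip]:
            if last_t is not None and t - last_t > timeout:
                sessions.append(current)
                current = []
            current.append(url)
            last_t = t
        if current:
            sessions.append(current)
    return sessions


def to_binary_vectors(sessions):
    """Map each distinct URL to an index and encode sessions as 0/1 vectors."""
    urls = sorted({u for s in sessions for u in s})
    url_index = {u: i for i, u in enumerate(urls)}
    vectors = []
    for s in sessions:
        v = [0] * len(url_index)
        for u in s:
            v[url_index[u]] = 1
        vectors.append(v)
    return url_index, vectors
```

Each resulting vector has one entry per URL on the site, set to 1 if the URL was visited during the session.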

15
Step 2: Clustering Sessions

Clustering: dividing unlabeled data into groups
Unsupervised clustering: when the number of categories is unknown
Robust clustering: when the data contains noise and outliers
Genetic clustering: can deal with non-differentiable objective functions/similarity measures

16
Unsupervised Niche Clustering (UNC)

Representation: binary chromosome strings (one substring per feature)
Deterministic crowding selection: children replace their closest parent if they have better fitness
Density fitness measure
Robust weight
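
A minimal sketch of one deterministic-crowding step on binary chromosomes is given below. The density fitness and robust weight used by UNC are not reproduced in this transcript, so `fitness` is left as a caller-supplied function; the one-point crossover, the mutation rate, and all function names are assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of differing bits between two equal-length binary chromosomes."""
    return sum(x != y for x, y in zip(a, b))


def crossover(p1, p2, rng):
    """One-point crossover; chromosomes must have at least two bits."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def crowding_step(p1, p2, fitness, rng, rate=0.05):
    """Each child competes with the parent it is closest to and replaces
    that parent only if the child is strictly fitter."""
    c1, c2 = crossover(p1, p2, rng)
    c1, c2 = mutate(c1, rate, rng), mutate(c2, rate, rng)
    # Pair children with parents so that the summed distance is smallest.
    if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
        c1, c2 = c2, c1
    s1 = c1 if fitness(c1) > fitness(p1) else p1
    s2 = c2 if fitness(c2) > fitness(p2) else p2
    return s1, s2
```

Because a child only ever displaces a similar parent, distinct niches (candidate clusters) can coexist in the population instead of converging to a single optimum.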

17
Evolution Example
18
Adaptation to Web Mining: Hierarchical UNC (HUNC)

Encode binary session vectors
Perform UNC in hierarchical mode (HUNC)
→ Fast multi-resolution profiling: vary the number of levels (L)
Start by applying UNC to the entire data set with a small population size (L = 1)
Focus on each cluster recursively: reapply UNC to the data subset assigned to each cluster to extract more clusters at higher resolution (L > 1)
Repeat until the cluster size or scale becomes too small

20
Web Mining Experimental Results with HUNC
22
Level 2 Examples

General outside visitors: Profiles 1 and 3
Prospective students: Profiles 2 and 4
Insiders (students): Profiles 6, 7, etc.

23
Main Site of Univ. of Missouri
Profile 16: example of discovering associations without any prior knowledge of content
25
Towards Scalability: Dynamic Web Mining

Typically, data mining has to be completely re-applied periodically and offline on newly generated Web server logs in order to keep the discovered knowledge up to date.
An intelligent Web mining system should be able to continuously learn evolving usage trends without ungraceful stoppages, reconfigurations, or restarting from scratch.
It must scale to huge data sets given limited amounts of main memory.
We may view the problem as clustering data streams: a huge flux of data with very limited memory to store it, which calls for single-pass learning.
Applicable to both usage and content (text) data

26
Another Evolutionary System in Nature: The Immune System

Immune system: a parallel and distributed adaptive system with tremendous potential in many intelligent computing applications
Protects our bodies from foreign pathogens (viruses/bacteria)
Innate immune system (initial, limited; e.g., skin, tears, etc.)
Acquired immune system (learns how to respond to NEW threats adaptively through an evolutionary process)
Primary immune response
  First response to invading pathogens
Secondary immune response
  Encountering a similar pathogen a second time
  Remembers past encounters
  Faster and stronger response than the primary response

27
Points of Strength of The Immune System

Recognition (Anomaly detection, Noise tolerance)
Robustness (Noise tolerance)
Feature extraction
Diversity (can face an entire repertoire of
foreign invaders)
Reinforcement learning
Memory (remembers past encounters: basis for vaccines)
Distributed Detection (no single central system)
Multi-layered (defense mechanisms at multiple
levels)
Adaptive (Self-regulated)

28
Learning in the Immune System

Main purpose of the immune system: recognize all cells (or molecules) within the body and categorize those cells as self or non-self.
Non-self cells are further categorized in order
to stimulate an appropriate type of defensive
mechanism.
The immune system learns through micro-scale
evolution to distinguish between foreign antigens
(e.g., bacteria, viruses, etc.) and the body's
own cells or molecules.

29
B-Cells

Through a process of recognition and stimulation, B-cells clone and mutate to produce a diverse set of antibodies adapted to different antigens
B-cells secrete antibodies that can bind to specific antigens and destroy the invading agent that carries them through a KILL, SUICIDE, or INGEST signal
A B-cell's antibodies can also bind to antibodies on other B-cells, hence sending a STIMULATE or SUPPRESS signal → network!

30
Immune Recognition

Immune recognition is based on complementarity between the binding region of the receptor and a portion of the antigen called the epitope.
B-cell antibodies present a single type of receptor, whereas antigens might present several epitopes.
This means that different antibodies can recognize a single antigen.
Binding between B-cells and antigens is NOT exact (soft, error-tolerant binding)

31
Artificial Immune Systems (AIS): Recent History

Based on the immune network theory (Jerne, 1973)
The system consists of a network of B-cells
Antigens represent data
B-cells represent clusters
B-cells form a network by interacting with each other through stimulation and suppression (to form a memory of past antigens)
Exponential explosion in the B-cell population!
Huge immune network: bottleneck against scalability!
Unscalable (time and memory!)

33
General Architecture of Proposed Approach

Information in the immune network:
  Stimulation (competition and memory)
  Age (old vs. new)
  Co-stimulation/suppression → network interactions
  Outliers (based on activation)

[Diagram: evolving data → 1-pass adaptive immune learning → evolving immune network (compressed into subnetworks)]
34
Model for Artificial Immune Cell

Antigens: data
B-cells: clusters or patterns to be learned/extracted
Dynamic environment: antigens are presented to the immune network one at a time, with the stimulation and scale measures updated after each presentation.
Antigen index j: monotonically increasing with time; antigens are presented in chronological order x1, x2, ..., xN
Dynamic Weighted B-cell (D-W-B-cell): represents a neighborhood modeled by an activation function (a robust weight/membership function)

35
Model for Artificial Immune Cell

Each D-W-B-cell is allowed to have its own zone of influence with size/scale σi
D-W-B-cells dynamically adapt their influence zones, and hence their stimulation levels, in a struggle for survival.
The activation weight function dynamically adapts to evolving data (time decay)
Outliers are easily detected through weak activations
Flexible for different attribute types (numerical, categorical, etc.)
D-W-B-cells are cloned in proportion to their stimulation levels.
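
The equations for the activation weight did not survive in this transcript. As a rough sketch only, the code below assumes a Gaussian-style weight with exponential time decay, w_ij = exp(-(d_ij^2 / (2 σi^2) + (J - j) / τ)); this functional form, the time constant `tau`, and the function names are assumptions and should be checked against the cited publications. The 0.001 cutoff is the outlier threshold quoted later in the slides.

```python
import math


def activation_weight(dist_sq, scale_sq, age, tau):
    """Assumed robust weight of antigen j for B-cell i.

    dist_sq:  squared distance between the antigen and the cell
    scale_sq: squared scale (influence-zone size) of the cell
    age:      J - j, how many antigens have arrived since antigen j
    tau:      assumed time constant controlling how fast old data fades
    """
    return math.exp(-(dist_sq / (2.0 * scale_sq) + age / tau))


def is_outlier(weights, threshold=0.001):
    """An antigen that activates every cell only weakly is flagged as an outlier."""
    return all(w < threshold for w in weights)
```

Under this assumed form, a point far from a cell (relative to its scale) and a point seen long ago both contribute little, which is what lets the network forget stale trends and ignore noise.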

36
Incremental Update Equations and Network Interactions

Stimulation (fitness)
Influence zone (scale)
Stimulation and suppression from neighboring B-cells
Positive suppression (competition) but no stimulation: good population control and no redundancy, but no memory; the immune network will forget past encounters.
Positive stimulation but no suppression: good memory but no competition; proliferation of the D-W-B-cell population and maximum redundancy.
Natural tradeoff between redundancy/memory and competition/reduced costs.

37
Divide and Conquer: Compress the Immune Network into K Subnetworks

Assuming that the network is divided into roughly K equal-sized subnetworks (e.g., with 2-3 iterations of K-Means),
the number of internal interactions in an immune network of N_B D-W-B-cells can drop from (N_B)^2 in the uncompressed network to (N_B/K)^2 intra-subnetwork interactions and (K - 1) inter-subnetwork interactions in the compressed immune network.
Can approach linear complexity as K → (N_B)^(1/2)
Significant savings in computation
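
To make the savings concrete, the counts quoted above can be evaluated directly. This is just arithmetic on the two expressions from the slide; the function names are made up for the example.

```python
def interactions_uncompressed(n_b):
    """(N_B)^2 interactions in the uncompressed network."""
    return n_b ** 2


def interactions_compressed(n_b, k):
    """(N_B / K)^2 intra-subnetwork plus (K - 1) inter-subnetwork interactions."""
    return (n_b / k) ** 2 + (k - 1)


if __name__ == "__main__":
    n_b = 10_000
    k = 100  # K = sqrt(N_B)
    print(interactions_uncompressed(n_b))   # 100000000
    print(interactions_compressed(n_b, k))  # 10099.0
```

With K equal to the square root of N_B, the intra-subnetwork term becomes N_B itself, so the count grows roughly linearly in the number of B-cells.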

38
Internal and External Immune Interactions: Before and After Compression
39

[Flowchart; recoverable labels: Start/Reset; "Activates subNet?" (Yes/No); "Outlier?"; "Number of B-cells > MaxLimit?" (Yes/No); Secondary storage; Memory Constraints; Domain Knowledge Constraints; ImmuNet Stats and Visualization]
40
Immune Based Learning of Web profiles

The Web server plays the role of the human body,
and the incoming requests play the role of
antigens that need to be detected
The input data is similar to web log data (a
record of all files/URLs accessed by users on a
Web site)
The data is pre-processed to produce session
lists
A session list Si for user i is a list of URLs visited by the same user
In discovery mode, a session is fed to the learning system as soon as it is available
B-cell i: the i-th candidate profile (a list of URLs)
Each profile has its own influence zone defined by σi

44
Single-Pass Results (location and scale) on a noisy dataset presented one point at a time in the same order as the clusters
[Figure panels: after 350 samples, 1125 samples, 1925 samples, and all 3200 samples]
45
Ability to distinguish between core and outlier points (w_ij < 0.001) for a noisy data set presented in different orders
[Figure panels: cluster 1 to cluster 5; cluster 5 to cluster 1; random order]

Blank areas: first points used to start the network
Most recent noise points (depending on order) are shown in black; their fate is still uncertain until future data confirms it

47
Simulation Scenarios for Tracking Evolving Web Usage Trends

Scenario 1: We used 20 profiles previously discovered using Hierarchical Unsupervised Niche Clustering (HUNC) to partition the Web sessions into 20 distinct sets of sessions, each one assigned to the closest profile. Then we presented these sessions to the immune clustering algorithm one usage trend at a time, from trend 0 to 19. That is, we first present the sessions assigned to trend 0, then the sessions assigned to trend 1, etc.
Scenario 2: We used the same pre-partitioned session data set as in the previous scenario, but presented the trends in reverse order, from trend 19 to 0. That is, we first present the sessions assigned to trend 19, then the sessions assigned to trend 18, etc., ending with the sessions from trend 0.
Scenario 3: Natural chronological order, exactly as the sessions were received in real time by the Web server.

48
Distribution of input sessions

Trends 5, 9, 13, 14, 15, and 19 appear to be
weaker and noisier.
Also trends 6 and 7 emerge late in the 12-day
access log, while trend 0 weakens in the last days

49
Noise sessions
50
Evaluation Method

In each scenario, we track the actual composition of the B-cells in the immune network, i.e., the URLs present in each B-cell (profile).
We also track the number of B-cells that succeed in learning each of the 20 ground-truth profiles after each session is presented,
by computing an evolving number of hits per expected usage trend: the number of B-cells within a radius of 0.4 of the ground-truth profile.
Distance is computed as (1 - cosine similarity)^2.
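
The hit count can be sketched as below. B-cells and ground-truth profiles are assumed here to be equal-length URL weight vectors, and "within a radius of 0.4" is read as distance <= 0.4; both are assumptions for the example, as are the function names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance(a, b):
    """Distance used in the evaluation: (1 - cosine similarity)^2."""
    return (1.0 - cosine_similarity(a, b)) ** 2


def hits_per_trend(b_cells, ground_truth, radius=0.4):
    """For each ground-truth profile, count the B-cells within `radius` of it."""
    return [
        sum(1 for cell in b_cells if distance(cell, profile) <= radius)
        for profile in ground_truth
    ]
```

Recomputing this list after every presented session gives the "hits per usage trend vs. time" curves shown on the following slides.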

51
Hits per usage trend vs. time
52
Hits per usage trend vs. time: Scenario 1 (from trend 0 to 19)
53
Hits per usage trend vs. time: Scenario 2 (reverse order)
54
Hits per usage trend vs. time: Scenario 3 (chronological)
55
If centroids of compressed sub-networks are
allowed to clone
56
Hits Based on High Precision and Coverage
B-cells form a faithful synopsis of input usage
data in 1 pass.
Compare to Input Data (below)
57
Suitability for Real Time Web Mining

Single pass over all 1704 Web user sessions (non-optimized Java code): < 7 seconds on a 2 GHz Pentium IV PC running Linux.
Ability to learn an unknown number of evolving profiles in real time: an average of 4 milliseconds per user session → suitable for real-time personalization.
Old profiles can be handled in a variety of ways: they may be discarded, moved to secondary storage, or cached for possible re-emergence.
Even if discarded, older profiles that re-emerge later would be re-learned from scratch.
The logistics of maintaining old profiles are not crucial.
The same technique was used successfully to track and learn evolving topic categories/clusters in text data

59
Recommender Systems: Motivations

The move from traditional commerce → e-commerce
Space-limited physical inventory → huge virtual inventory (information overload)
Recommender systems:
  Help in navigation (adaptive websites)
  Enhance e-commerce sales (cross-sell, customer loyalty, turning browsers into buyers, etc.)
  Web request prediction: prefetching, caching, load balancing (for single websites and ISPs)
E-commerce sites simply cannot survive without recommender systems!
Same for huge Web portals (Yahoo!)
Same for huge digital libraries and Web information systems
On search engines: context awareness (modify the query or the order/rank of search results)

60
Web Personalization System
[Diagram: Web Usage Mining → Profiles → Recommendation Engine]
61
Recommender Systems

K-Nearest-Neighbor-based collaborative filtering: recommend items preferred by the K nearest neighbors (usually the top N items)
Association-rule-based:
  Discover association rules items_L → items_R (support, confidence)
  At recommendation time, find all rules supported by the customer (the customer's rated items include items_L)
  Sort rules by confidence
  Recommend the first N ranked products
Challenges: scalability, sparsity, low coverage or precision
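
The association-rule recommendation steps can be sketched as follows. The rule representation, a tuple `(items_L, items_R, support, confidence)` with sets on both sides, and the function name are assumptions for illustration; rule discovery itself is not shown.

```python
def recommend_from_rules(rules, rated_items, n):
    """Return up to n items from the right-hand sides of the rules whose
    left-hand side is contained in the customer's rated items, taking
    rules in order of decreasing confidence."""
    if n <= 0:
        return []
    rated = set(rated_items)
    # A rule is supported by the customer when all of items_L have been rated.
    fired = [rule for rule in rules if rule[0] <= rated]
    fired.sort(key=lambda rule: rule[3], reverse=True)
    recommendations = []
    for _lhs, rhs, _support, _confidence in fired:
        for item in sorted(rhs):
            if item not in rated and item not in recommendations:
                recommendations.append(item)
                if len(recommendations) == n:
                    return recommendations
    return recommendations
```

For example, with rules `({"a"}, {"c"}, 0.2, 0.6)` and `({"a", "b"}, {"d"}, 0.1, 0.9)` and a customer who rated `a` and `b`, the higher-confidence rule fires first, so `d` is recommended ahead of `c`.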

62
Fuzzy Inference (IF x is A THEN y is B) via a Fuzzy Relation Matrix

R: X × Y → [0, 1] encodes the strength of the relationship between x and y
(x, y) → μR(x, y)
R: P × U → [0, 1] encodes the strength of the relationship between profile i and URL j
(Pi, urlj) → μR(Pi, urlj) = Pij

[Diagram: session s → input membership computation (similarity to the profiles Pi) gives μs(i) → fuzzy inference through the relation R (profiles) → recommendations r with μr(j); general case: A with μA(x), B with μB(y)]
63
Fuzzy Recommendation Engine

Given the current session s, infer the recommendation r. Hence the following implication:
s → r
Given the Web user profiles discovered by mining the Web logs, the relation R is defined as follows:
Rik = pik
The input fuzzy set is derived from the current session x = s by computing the similarity value between s, defined in (1), and each profile pi, as follows:
μs(i) = sim(s, pi)
The inference procedure concludes the recommendation r as the possibility of URL relevance via the following composition:
μr(URLk) = μs(i) ∘ R

The composition ∘ combines a t-norm (intersection/AND, e.g., min) with a t-conorm (union/OR, e.g., max)
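
A minimal sketch of this inference with min as the t-norm and max as the t-conorm (max-min composition) follows. It assumes cosine similarity for the input membership and profiles stored as rows of URL weights in [0, 1]; the function names are made up for the example, and the other similarity and composition variants tested later are not covered.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_recommend(session, profiles):
    """Max-min composition: mu_r(k) = max over i of min(mu_s(i), R[i][k]),
    where mu_s(i) = sim(session, profile i) and R[i][k] = p_ik."""
    mu_s = [cosine(session, p) for p in profiles]
    n_urls = len(session)
    return [
        max(min(mu_s[i], profiles[i][k]) for i in range(len(profiles)))
        for k in range(n_urls)
    ]
```

Each output value is the possibility that the corresponding URL is relevant to the current session; URLs already in the session would typically be dropped before ranking.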
65
Simulation Experiments

Given the prediscovered profiles and a set of Web sessions extracted from the same Web log file, we treat every complete session as a ground-truth session.
For each such ground-truth session, all possible subsets of this session consisting of between 1 and 9 URLs are considered as current test subsessions,
recommendations are generated for each test subsession, and
coverage and precision measures are averaged for each subsession size.
This gives roughly 380,000 separate recommendation tests, each tested using 16 different recommendation scenarios corresponding to different combinations of input membership generation, composition, etc.
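
Generating the test subsessions can be sketched with the standard library as below. Whether the full session itself counts as a test subsession is not stated in the slides; this sketch simply enumerates every subset of 1 to 9 URLs, and the function name is an assumption.

```python
from itertools import combinations


def enumerate_subsessions(session, max_size=9):
    """Yield (size, subset) for every subset of the session's distinct URLs
    containing between 1 and max_size URLs."""
    urls = sorted(set(session))
    upper = min(max_size, len(urls))
    for size in range(1, upper + 1):
        for subset in combinations(urls, size):
            yield size, frozenset(subset)
```

The number of subsets grows combinatorially with session length, which is how a modest number of sessions leads to hundreds of thousands of recommendation tests.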

66
Tested Recommendation Scenarios

2 types of similarity: cosine or Web session (denoted as Cosine and WS respectively in the plots).
2 different compositions were tested: Max-Min and Bounded Sum-Min (denoted as MM and BS in the plots).
2 different types of profiles were tested:
  raw profiles, which generate real similarities (denoted as Real Cosine and Real WS in the plots), and
  crisp α-cuts with α = 0.2 (denoted as Binary Cosine - .2 Thresholded Profile or Binary WS - .2 Thresholded Profile in the plots; only tested for max-min composition).
Either raw recommendations or crisp α-cuts with α = 0.001, 0.2, and 0.3 (denoted Type of Input Similarity - α Thresholded To Bin in the plots).

67
Evaluation Measures

The actual completed session, sT, is treated as ground truth,
a subset of this session is treated as the incomplete current subsession, sj, and
rj denotes the fuzzy recommendations.
Then r*j = rj - sj: the recommendations obtained after omitting all URLs that are part of the current subsession.
Also, s*j = sT - sj: the ground-truth URLs, not including the ones in the current subsession being processed for recommendations.
Precision is given by |r*j ∩ s*j| / |r*j|
Coverage is given by |r*j ∩ s*j| / |s*j|
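
The formulas for the two measures are not legible in this transcript. The sketch below assumes the usual set-based definitions built from the quantities defined above: precision is the fraction of the remaining recommendations that appear in the remaining ground truth, and coverage is the fraction of the remaining ground truth that was recommended. F1 is their harmonic mean. Function and variable names are illustrative.

```python
def evaluate(recommended, subsession, ground_truth):
    """Return (precision, coverage, f1) for one test subsession.

    URLs already in the current subsession are removed from both the
    recommendations and the ground truth before scoring.
    """
    r_star = set(recommended) - set(subsession)
    s_star = set(ground_truth) - set(subsession)
    hits = len(r_star & s_star)
    precision = hits / len(r_star) if r_star else 0.0
    coverage = hits / len(s_star) if s_star else 0.0
    if hits == 0:
        return precision, coverage, 0.0
    f1 = 2 * precision * coverage / (precision + coverage)
    return precision, coverage, f1
```

Averaging these values over all subsessions of a given size gives one point per session size on the comparison plots.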

68
Coverage Comparison: Fuzzy vs. Nearest-Profile vs. K-NN
69
Precision: Fuzzy (better for longer sessions) vs. Nearest-Profile (middle performance) vs. K-NN
70
F1 Measure: Fuzzy (better for longer sessions) vs. Nearest-Profile vs. K-NN
71
Discussion of Results: Comparison with Nearest-Profile and K-NN

Fuzzy recommendations have better precision at larger session sizes (starting at 3 to 4 URLs)
K-NN has the highest precision at small session sizes (< 4)
However, Nearest-Profile performs midway between K-NN and Fuzzy at small session sizes (< 4)
Fuzzy recommendations have superior coverage regardless of session length
Overall F1 measure: Fuzzy recommendations are better when the session size is larger, and Nearest-Profile performs midway between K-NN and Fuzzy at small session sizes (< 4)
Best to combine Nearest-Profile and Fuzzy, and to alternate between Max-Min and Bounded Sum depending on session length

72
Time and Memory Complexity

Offline training takes longer with the profile
based approach.
However this can be done on a back end computer
and not on the server, and is therefore an
offline process that does not affect the
operation of the web server.
On the other hand, both fuzzy and nearest profile
based approaches are extremely fast at
recommendation time and require a minimal amount
of main memory to function
(a mere summary of the previous usage history
instead of the entire history as in collaborative
filtering).
In our simulations, fuzzy recommendations with
non-optimized Perl script (non-compiled) code
running on a 2 GHz Pentium 4 Linux PC operated at
an average of 48 recommendations per second.
The far more computationally complex K-Nearest
Neighbor generated recommendations at a leisurely
2 recommendations per second.

73
Two-Step Recommendation Process based on a
Committee of Profile-Specific URL-Predictor
Neural Networks
74
Average precision, coverage, F1, cosine for the
Two-Step Profile-Specific URL-Predictor
Recommender model
76
Conclusion And Future Prospects

Ill-posed feature representation problems, subjective dissimilarity between Web sessions, and subjective multi-modal fitness functions are handled well using evolutionary computation
Why evolutionary computation?
  Deals with ill-posed feature representation problems
  Handles subjective dissimilarity between Web sessions
  Handles subjective multi-modal fitness functions
  Insensitive to initialization
HUNC: first evolutionary-based technique for Web usage mining
Why NOT evolutionary computation?
  Scalability problems
Immune system clustering: scalable and flexible in the face of huge, dynamic Web usage access trends

77
Conclusion And Future Prospects

Immune system clustering: also used successfully to extract topics/cluster text documents in 1 pass.
Also used successfully to cluster and perform anomaly detection on KDD Cup 99 network data in 1 pass (unsupervised learning using only the normal data, with no labels)
Results in both attack detection and false alarms were superior to the best results (the KDD Cup winner used supervised learning trained with data labeled as normal and all the different attacks)
A completely automated real-time Web personalization system is feasible
Soft computing techniques for intelligent recommender systems:
  Web usage mining with fuzzy-inference-based recommendation: better for longer sessions (increased uncertainty and noise)
  Web usage mining with a two-stage neural network: unprecedented performance, but slow in training

78
Impact on WWW

Scalable and adaptive Recommendation engine for
personalization
Improve search results by taking profile/context
into account
Improve design of dynamic Web sites
Facilitate navigation

79
Some Related Publications

O. Nasraoui, R. Krishnapuram, and A. Joshi.
Mining Web Access Logs Using a Relational
Clustering Algorithm Based on a Robust Estimator,
8th International World Wide Web Conference,
Toronto, pp. 40-41, 1999.
O. Nasraoui, and R. Krishnapuram, A Novel
Approach to Unsupervised Robust Clustering using
Genetic Niching, Proc. of the 9th IEEE
International Conf. on Fuzzy Systems, San
Antonio, TX, May 2000, pp. 170-175.
O. Nasraoui and R. Krishnapuram. A New
Evolutionary Approach to Web Usage and Context
Sensitive Associations Mining, International
Journal on Computational Intelligence and
Applications - Special Issue on Internet
Intelligent Systems, Vol. 2, No. 3, pp. 339-348,
Sep. 2002.
Nasraoui O., Cardona C., Rojas C., and Gonzalez
F., "TECNO-STREAMS: Tracking Evolving Clusters in
Noisy Data Streams with a Scalable Immune System
Learning Model", in Proc. of Third IEEE
International Conference on Data Mining
(ICDM'03), Melbourne, FL, November 2003.
Nasraoui O., Petenes C., "Combining Web Usage
Mining and Fuzzy Inference for Website
Personalization", in Proc. of WebKDD 2003 KDD
Workshop on Web mining as a Premise to Effective
and Intelligent Web Applications, Washington DC,
August 2003, p. 37.
Nasraoui O., Gonzalez F., Cardona C., Rojas C.,
and Dasgupta D., "A Scalable Artificial Immune
System Model for Dynamic Unsupervised Learning",
Proc. of the Genetic and Evolutionary Computation
Conference (GECCO), Chicago, IL, July 2003, p.
219.
Nasraoui O., Cardona C., Rojas C., and Gonzalez
F., "Mining Evolving User Profiles in Noisy Web
Clickstream Data with a Scalable Immune System
Clustering Algorithm", in Proc. of WebKDD 2003
KDD Workshop on Web mining as a Premise to
Effective and Intelligent Web Applications,
Washington DC, August 2003, p. 71.