Title: Scalable Web Usage Mining and Soft Computing Approaches for High Performance Intelligent Web Recomme
1. Scalable Web Usage Mining and Soft Computing Approaches for High Performance Intelligent Web Recommender Systems
- Olfa Nasraoui
- Research Assistants: Cesar Cardona, Carlos Rojas, Fabio Gonzalez, Elizabeth Leon, Chris Petenes, Mrudula Pavuluri
- Dept. of Electrical and Computer Engineering
- The University of Memphis
- E-mail: onasraou_at_memphis.edu
Research sponsored by National Science Foundation CAREER Award NSF-IIS 0133948
2. Outline
- Web Usage Mining
- Background: Evolutionary Computation
- Mining Web Profiles with Hierarchical Unsupervised Niche Clustering (H-UNC)
- H-UNC Web Mining: Experimental Results
- Need for Scalability; Background: Artificial Immune Systems
- Scalable Web Usage Mining Based on the Immune System Metaphor
- Scalable Web Usage Mining: Preliminary Results
- Tying It All Together: An Automated Real-Time Recommendation System
- Conclusions and Future Prospects
3. Web Personalization
- WWW personalization: tailor a user's interaction with the Web information space based on information about the user
- E.g., recommend items/links based on prior ratings/visits
- Manually entered profiles are subjective, static, not always available, and raise privacy concerns
- Alternative: extract profiles based on all users' access patterns (mass profiling)
- → anonymous profiles
4. Web Personalization System
[Diagram labels: Web Usage Mining; Profiles; Recommendation Engine]
5. Web Mining: Data Types and Challenges
- Mining the Web: data
  - Content: Web pages
  - Usage: Web access log files
  - Structure: link structure of Web pages
- Challenges
  - Huge, semi-structured or unstructured, highly dynamic data
  - Data corrupted with noise (not all information is relevant)
6. Web Usage Mining Framework
7. Knowledge Discovery Process for Web Usage Mining
- Source of data: Web clickstreams → Web log files
- Goal: extract interesting user profiles by categorizing user sessions into groups or clusters
- Complete KDD process:
  - Preprocessing: selecting and cleaning data
  - Data mining (learning phase)
    - A model for session data (bag of URLs visited)
    - Similarity assessment
    - Clustering algorithm to categorize sessions
  - Derivation and interpretation of results
    - Computing profiles
    - Evaluating results
8. Different Ways to Mine Web User Profiles: Possible Solutions and Problems
- Mining Web usage data using clustering
  - K-Means (Shahabi et al. 1997): problems with Euclidean distance, feature vector representation, sparsity, and noise; the number of clusters must be known; assumes user profiles do not overlap; sensitive to initialization and local optima, etc.
  - Relational clustering: same as above, except that it uses a distance/relation matrix instead of a vector representation; huge memory and computation requirements, etc.
  - Fuzzy relational clustering (Nasraoui et al. 1999): same as above, but allows overlapping clusters, etc.
  - Robust relational clustering (Nasraoui et al. 1999): same as above, but can handle noise in the data
9. Different Ways to Mine Web User Profiles: Possible Solutions and Problems
- Evolutionary techniques: can avoid the feature vector representation dilemma (but appropriate coding is required); flexible, since they allow any similarity measure and any subjective fitness criterion; better global search (population based)
- Hierarchical Unsupervised Niche Clustering, or HUNC (Nasraoui & Krishnapuram, 2001): based on the Darwinian evolution metaphor and niche speciation; handles noise and an unknown number of clusters; reliable w.r.t. initialization
- Immune-based clustering (Nasraoui et al., 2002): based on the immune system metaphor, a microcosm of evolution
- Need a scalable immune-based clustering: linear complexity even for huge clickstream data that cannot fit in main memory; dynamic online learning of evolving user profiles (Nasraoui et al., 2003)
10. Different Ways to Mine Web User Profiles: Possible Solutions and Problems
- Other approaches (Web session = transaction)
  - Frequent itemset / association rule mining
    - Very sensitive to required parameters: support and confidence thresholds
    - Either too few profiles or too many, including spurious profiles
    - Exorbitant computational complexity for low support thresholds
  - Association rule mining, followed by hypergraph-partition-based clustering
    - All of the above, plus the drawbacks of hypergraph partition clustering (huge complexity)
12. Background: Genetic Algorithms (Inspired by Nature: Darwinian Evolution)
- Genetic Algorithms (GAs): evolve a population of individuals/solutions using selection, crossover, and mutation, as in nature
[Diagram label: Operators]
14. Mining Web Profiles with Hierarchical Unsupervised Niche Clustering, Step 1: Data Preprocessing
- Access log: record of all URLs accessed by users on a Web site
- Log entry: access time, IP address, URL viewed, etc.
- Example entries:
  171148 141.225.195.29 GET /graphics/griffin.jpg 200
  171148 141.225.195.29 GET /people/faculty/nasraoui.html 200
- Map the $N_U$ URLs on the site to indices
- User session vector $s^{(i)}$: temporally compact sequence of Web accesses by a user
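The preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the input is assumed to be already parsed into `(time_in_seconds, ip, url)` tuples, users are identified by IP address alone, the 30-minute idle `timeout` that closes a session is an assumed rule, and the function names are hypothetical.

```python
from collections import defaultdict

def build_sessions(entries, timeout=1800):
    """Group (time_sec, ip, url) entries into sessions, one list of URLs each.

    Assumption: a session for a given IP ends after `timeout` idle seconds.
    """
    by_ip = defaultdict(list)
    for t, ip, url in sorted(entries):
        by_ip[ip].append((t, url))
    sessions = []
    for hits in by_ip.values():
        current = [hits[0][1]]
        for (prev_t, _), (t, url) in zip(hits, hits[1:]):
            if t - prev_t > timeout:      # long gap: close the current session
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions

def to_binary_vectors(sessions):
    """Map URLs to indices and encode each session as a 0/1 vector."""
    urls = sorted({u for s in sessions for u in s})
    index = {u: k for k, u in enumerate(urls)}
    vectors = []
    for s in sessions:
        v = [0] * len(urls)
        for u in s:
            v[index[u]] = 1               # 1 if the URL was visited in this session
        vectors.append(v)
    return index, vectors

entries = [(0, "a", "/x"), (10, "a", "/y"), (5000, "a", "/x"), (3, "b", "/z")]
sessions = build_sessions(entries)
index, vectors = to_binary_vectors(sessions)
```

Each row of `vectors` is one session over the indexed URLs, which is the "bag of URLs visited" view that the clustering step consumes.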
15. Step 2: Clustering Sessions
- Clustering: dividing unlabeled data into groups
- Unsupervised clustering: when the number of categories is unknown
- Robust clustering: when the data contains noise and outliers
- Genetic clustering: can deal with non-differentiable objective functions/similarity measures
16. Unsupervised Niche Clustering (UNC)
- Representation: binary chromosome strings (one substring per feature)
- Deterministic crowding selection: children replace the closest parent if they have better fitness
- Density fitness measure
- Robust weight
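A rough sketch of the deterministic crowding replacement rule is given below. The slide does not show the fitness or weight formulas, so the density fitness used here (a sum of Gaussian-like robust weights over the data, divided by a fixed scale `sigma2`) is an assumed stand-in, Hamming distance is an assumed choice for binary chromosomes, and all function names are hypothetical.

```python
import math

def hamming(a, b):
    """Number of positions where two binary chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def density_fitness(center, data, sigma2=1.0):
    """Assumed density fitness: sum of robust weights exp(-d / (2*sigma2)) / sigma2."""
    return sum(math.exp(-hamming(center, x) / (2.0 * sigma2)) for x in data) / sigma2

def deterministic_crowding(p1, p2, c1, c2, fitness):
    """Pair each child with its closest parent; a child survives only if fitter."""
    if hamming(p1, c1) + hamming(p2, c2) <= hamming(p1, c2) + hamming(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    return [c if fitness(c) > fitness(p) else p for p, c in pairs]

# Two dense groups of identical sessions; the children sit exactly on them.
data = [(1, 1, 0, 0)] * 3 + [(0, 0, 1, 1)] * 3
survivors = deterministic_crowding(
    (1, 0, 0, 0), (0, 0, 0, 1),          # parents
    (0, 0, 1, 1), (1, 1, 0, 0),          # children
    lambda c: density_fitness(c, data),
)
```

Because each child competes only with its most similar parent, distinct niches (here the two groups) can coexist in the population rather than collapsing onto a single optimum.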
17. Evolution Example
18. Adaptation to Web Mining: Hierarchical UNC (HUNC)
- Encode binary session vectors
- Perform UNC in hierarchical mode (HUNC)
- → Fast multi-resolution profiling: vary the number of levels (L)
- Start by applying UNC to the entire data set with a small population size (L = 1)
- Focus on each cluster recursively: reapply UNC on the data subset assigned to each cluster to extract more clusters at higher resolution (L > 1)
- Repeat until the cluster size or scale becomes too small
20. Web Mining Experimental Results with HUNC
22. Level 2 Examples
- General outside visitors: Profiles 1 and 3
- Prospective students: Profiles 2 and 4
- Insiders (students): Profiles 6, 7, etc.
23. Main Site of Univ. of Missouri
Profile 16: example of discovering associations without any prior knowledge of content
25. Towards Scalability: Dynamic Web Mining
- Typically, data mining has to be completely re-applied periodically and offline on newly generated Web server logs in order to keep the discovered knowledge up to date.
- An intelligent Web mining system should be able to continuously learn evolving usage trends without ungraceful stoppages, reconfigurations, or restarting from scratch.
- It must scale to huge data sets given limited amounts of main memory.
- We may view the problem as clustering data streams: a huge flux of data with very limited memory to store it, hence the need for single-pass learning.
- Applicable to both usage and content (text) data
26. Another Evolutionary System in Nature: The Immune System
- Immune system: a parallel and distributed adaptive system with tremendous potential in many intelligent computing applications
- Protects our bodies from foreign pathogens (viruses/bacteria)
- Innate immune system (initial, limited; e.g., skin, tears, etc.)
- Acquired immune system (learns how to respond to NEW threats adaptively through an evolutionary process)
- Primary immune response
  - First response to invading pathogens
- Secondary immune response
  - Encountering a similar pathogen a second time
  - Remembers past encounters
  - Faster and stronger response than the primary response
27. Points of Strength of the Immune System
- Recognition (anomaly detection, noise tolerance)
- Robustness (noise tolerance)
- Feature extraction
- Diversity (can face an entire repertoire of foreign invaders)
- Reinforcement learning
- Memory (remembers past encounters: the basis for vaccines)
- Distributed detection (no single central system)
- Multi-layered (defense mechanisms at multiple levels)
- Adaptive (self-regulated)
28. Learning in the Immune System
- Main purpose of the immune system: recognize all cells (or molecules) within the body and categorize those cells as self or non-self.
- Non-self cells are further categorized in order to stimulate an appropriate type of defensive mechanism.
- The immune system learns through micro-scale evolution to distinguish between foreign antigens (e.g., bacteria, viruses, etc.) and the body's own cells or molecules.
29. B-Cells
- Through a process of recognition and stimulation, B-cells clone and mutate to produce a diverse set of antibodies adapted to different antigens.
- B-cells secrete antibodies that can bind to specific antigens and destroy the invading agent that carries them through a KILL, SUICIDE, or INGEST signal.
- A B-cell's antibodies can also bind to antibodies on other B-cells, thereby sending a STIMULATE or SUPPRESS signal → network!
30. Immune Recognition
- Immune recognition is based on complementarity between the binding region of the receptor and a portion of the antigen called the epitope.
- B-cell antibodies present a single type of receptor, whereas antigens may present several epitopes.
- This means that different antibodies can recognize a single antigen.
- Binding between B-cells and antigens is NOT exact (soft, error-tolerant binding)
31. Artificial Immune Systems (AIS): Recent History
- Based on the immune network theory (Jerne, 1973)
- The system consists of a network of B-cells
- Antigens represent data
- B-cells represent clusters
- B-cells form a network by interacting with each other through stimulation and suppression (to form a memory of past antigens)
- Exponential explosion in the B-cell population!
- Huge immune network: a bottleneck against scalability!
- Unscalable (time and memory!)
33. General Architecture of Proposed Approach
- Information in the immune network
  - Stimulation (competition and memory)
  - Age (old vs. new)
  - Co-stimulation/suppression → network interactions
  - Outliers (based on activation)
[Diagram: Evolving data → 1-Pass Adaptive Immune Learning → Evolving Immune Network (compressed into subnetworks)]
34. Model for Artificial Immune Cell
- Antigens = data
- B-cells = clusters or patterns to be learned/extracted
- Dynamic environment: antigens are presented to the immune network one at a time, with the stimulation and scale measures updated after each presentation.
- Antigen index j: monotonically increasing with time; antigens are presented in the chronological order $x_1, x_2, \ldots, x_N$
- Dynamic Weighted B-cell (D-W-B-cell)
  - Represents a neighborhood modeled by an activation function: a robust weight/membership function
35. Model for Artificial Immune Cell
- Each D-W-B-cell is allowed to have its own zone of influence with size/scale $\sigma_i$
- D-W-B-cells dynamically adapt their influence zones, and hence their stimulation levels, in a struggle for survival.
- The activation/weight function dynamically adapts to evolving data (time decay)
- Outliers are easily detected through weak activations
- Flexible with respect to attribute types (numerical, categorical, etc.)
- D-W-B-cells are cloned in proportion to their stimulation levels.
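As a hedged illustration of a time-decayed activation, the sketch below assumes a Gaussian-like robust weight multiplied by an exponential forgetting factor. The exact formula is not shown on the slide, so the functional form, the decay constant `tau`, and the function names are assumptions; the 0.001 outlier threshold matches the value quoted later in the deck for separating core from outlier points.

```python
import math

def activation(dist2, sigma2, age, tau):
    """Assumed robust weight of an antigen with respect to one D-W-B-cell.

    dist2  : squared distance between the antigen and the cell
    sigma2 : squared scale of the cell's influence zone
    age    : number of antigens presented since this one arrived
    tau    : forgetting time constant (larger means slower decay)
    """
    return math.exp(-(dist2 / (2.0 * sigma2) + age / tau))

def is_outlier(dist2_to_cells, sigma2s, w_min=0.001):
    """A freshly presented antigen that activates no cell above w_min is an outlier."""
    return all(
        activation(d2, s2, age=0, tau=1.0) < w_min
        for d2, s2 in zip(dist2_to_cells, sigma2s)
    )

fresh_and_close = activation(0.0, 1.0, age=0, tau=100.0)
old_but_close = activation(0.0, 1.0, age=100, tau=100.0)
```

Under this assumed form, an old antigen contributes as little as a distant one, which is one way a cell's influence zone and stimulation can follow evolving data without storing the past.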
36. Incremental Update Equations and Network Interactions
- Stimulation (fitness)
- Influence zone scale
- Stimulation and suppression from neighboring B-cells
  - Positive suppression (competition), but no stimulation: good population control and no redundancy, but no memory; the immune network will forget past encounters.
  - Positive stimulation, but no suppression: good memory but no competition; proliferation of the D-W-B-cell population and maximum redundancy.
  - Natural tradeoff between redundancy/memory and competition/reduced costs.
37. Divide and Conquer: Compress the Immune Network into K Subnetworks
- Assume that the network is divided into K roughly equal-sized subnetworks (e.g., with 2-3 iterations of K-Means).
- Then the number of internal interactions in an immune network of $N_B$ D-W-B-cells can drop from $N_B^2$ in the uncompressed network to $(N_B/K)^2$ intra-subnetwork interactions and $(K-1)$ inter-subnetwork interactions in the compressed immune network.
- Can approach linear complexity as $K \to \sqrt{N_B}$
- Significant savings in computation
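The interaction counts above can be checked with a short calculation. The sketch simply evaluates the two expressions as stated on the slide; the network size of 10,000 cells is an arbitrary illustrative value, and the function names are hypothetical.

```python
import math

def interactions_uncompressed(n_b):
    """Interactions in the full network: N_B squared."""
    return n_b ** 2

def interactions_compressed(n_b, k):
    """Slide's count after compression: (N_B / K)^2 intra plus (K - 1) inter."""
    return (n_b / k) ** 2 + (k - 1)

n_b = 10_000                       # illustrative network size (assumption)
k = round(math.sqrt(n_b))          # K chosen near sqrt(N_B), here 100
full = interactions_uncompressed(n_b)
compressed = interactions_compressed(n_b, k)
```

With $K = \sqrt{N_B}$ the intra-subnetwork term equals $N_B$ exactly, so the count grows roughly linearly in the number of cells instead of quadratically.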
38. Internal and External Immune Interactions: Before and After Compression
39. [Flowchart labels: Start/Reset; Activates subnetwork? (Yes/No); Outlier? (Yes/No); Domain knowledge constraints; number of B-cells > MaxLimit? (Yes/No); Secondary storage; ImmuNet stats and visualization]
40. Immune-Based Learning of Web Profiles
- The Web server plays the role of the human body, and the incoming requests play the role of antigens that need to be detected.
- The input data is similar to Web log data (a record of all files/URLs accessed by users on a Web site).
- The data is pre-processed to produce session lists.
- A session list $S_i$ for user i is a list of the URLs visited by that user.
- In discovery mode, a session is fed to the learning system as soon as it is available.
- B-cell$_i$ = i-th candidate profile
  - List of URLs
- Each profile has its own influence zone defined by $\sigma_i$
44. Single-Pass Results (location and scale) on a Noisy Data Set Presented One Point at a Time in the Same Order as the Clusters
[Figure panels: 350 samples; 1125 samples; 1925 samples; all 3200 samples]
45. Ability to Distinguish Between Core and Outlier Points ($w_{ij} < 0.001$) for the Noisy Data Set Presented in Different Orders
[Figure panels: cluster 1 to cluster 5; cluster 5 to cluster 1; random order]
- Blank areas: first points used to start the network
- The most recent noise points (depending on order) are shown in black: their fate is still uncertain until future data confirms it
47. Simulation Scenarios for Tracking Evolving Web Usage Trends
- Scenario 1: We used 20 profiles previously discovered using Hierarchical Unsupervised Niche Clustering (HUNC) to partition the Web sessions into 20 distinct sets of sessions, each session being assigned to the closest profile. We then presented these sessions to the immune clustering algorithm one usage trend at a time, from trend 0 to trend 19. That is, we first presented the sessions assigned to trend 0, then the sessions assigned to trend 1, and so on.
- Scenario 2: We used the same pre-partitioned session data set as in the previous scenario, but presented the trends in reverse order, from trend 19 to trend 0. That is, we first presented the sessions assigned to trend 19, then the sessions assigned to trend 18, and so on, ending with the sessions from trend 0.
- Scenario 3: The sessions were presented in natural chronological order, exactly as they were received in real time by the Web server.
48. Distribution of Input Sessions
- Trends 5, 9, 13, 14, 15, and 19 appear to be weaker and noisier.
- Also, trends 6 and 7 emerge late in the 12-day access log, while trend 0 weakens in the last days.
49. Noise Sessions
50. Evaluation Method
- In each scenario, we track the actual composition of the B-cells in the immune network, i.e., the URLs present in each B-cell (profile).
- We also track the number of B-cells that succeed in learning each of the 20 ground-truth profiles after each session is presented,
  - by computing an evolving number of hits per expected usage trend: the number of B-cells within a 0.4 radius of the ground-truth profile.
  - Distance is computed as $(1 - \text{cosine similarity})^2$.
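The hit count described above can be sketched as follows, assuming B-cells and ground-truth profiles are both represented as URL-weight vectors of equal length. Treating "within a 0.4 radius" as distance less than or equal to 0.4 is an assumption, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def distance(a, b):
    """Distance used for evaluation: (1 - cosine similarity) squared."""
    return (1.0 - cosine(a, b)) ** 2

def hits_per_trend(b_cells, truth_profiles, radius=0.4):
    """For each ground-truth profile, count the B-cells within the radius."""
    return [
        sum(distance(cell, profile) <= radius for cell in b_cells)
        for profile in truth_profiles
    ]

truth = [[1, 1, 0, 0], [0, 0, 1, 1]]
cells = [[1, 1, 0, 0], [1, 0.5, 0, 0], [0, 0, 1, 0]]
hits = hits_per_trend(cells, truth)
```

Recomputing `hits` after every presented session gives the "hits per usage trend vs. time" curves: a trend whose count rises above zero has been picked up by at least one B-cell.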
51. Hits per Usage Trend vs. Time
52. Hits per Usage Trend vs. Time: Scenario 1 (from trend 0 to 19)
53. Hits per Usage Trend vs. Time: Scenario 2 (reverse order)
54. Hits per Usage Trend vs. Time: Scenario 3 (chronological)
55. If Centroids of Compressed Subnetworks Are Allowed to Clone
56. Hits Based on High Precision and Coverage
B-cells form a faithful synopsis of the input usage data in 1 pass.
Compare to the input data (below).
57. Suitability for Real-Time Web Mining
- Single pass over all 1704 Web user sessions (non-optimized Java code): < 7 seconds on a 2 GHz Pentium IV PC running Linux.
- Ability to learn an unknown number of evolving profiles in real time: an average of 4 milliseconds per user session → suitable for real-time personalization.
- Old profiles can be handled in a variety of ways: they may be discarded, moved to secondary storage, or cached for possible re-emergence.
- Even if discarded, older profiles that re-emerge later would be re-learned from scratch.
- The logistics of maintaining old profiles are therefore not crucial.
- The same technique was used successfully to track and learn evolving topic categories/clusters in text data.
59. Recommender Systems: Motivations
- The move from traditional commerce → e-commerce
- Space-limited physical inventory → huge virtual inventory (information overload)
- Recommender systems
  - Help in navigation (adaptive websites)
  - Enhance e-commerce sales (cross-sell, customer loyalty, turning browsers into buyers, etc.)
  - Web request prediction: prefetching, caching, load balancing (for single websites and ISPs)
- E-commerce sites simply cannot survive without recommender systems!
- Same for huge Web portals (Yahoo!)
- Same for huge digital libraries and Web information systems
- On search engines: context awareness; modify the query or the order/rank of search results
60. Web Personalization System
[Diagram labels: Web Usage Mining; Profiles; Recommendation Engine]
61. Recommender Systems
- K-Nearest-Neighbor-based collaborative filtering: recommend items preferred by the K nearest neighbors (usually the top N items)
- Association rule based
  - Discover association rules
  - items_L → items_R (support, confidence)
  - At recommendation time, find all rules supported by the customer (the customer's rated items include items_L)
  - Sort rules by confidence
  - Recommend the first N ranked products
- Challenges: scalability, sparsity, low coverage or precision
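The association-rule recommendation steps listed above can be sketched as below. The rules are assumed to have been mined already and are given as `(items_L, items_R, confidence)` tuples; skipping items the customer already has and breaking ties alphabetically are illustrative choices, and the function name is hypothetical.

```python
def recommend_from_rules(rules, customer_items, n):
    """Return up to n items from the right-hand sides of the rules that fire.

    rules          : list of (items_L, items_R, confidence), items given as sets
    customer_items : items the customer has rated or visited
    """
    owned = set(customer_items)
    # A rule fires when its whole left-hand side is contained in the customer's items.
    fired = [rule for rule in rules if rule[0] <= owned]
    fired.sort(key=lambda rule: rule[2], reverse=True)   # highest confidence first
    recommendations = []
    for _, rhs, _ in fired:
        for item in sorted(rhs):
            if item not in owned and item not in recommendations:
                recommendations.append(item)
    return recommendations[:n]

rules = [
    ({"a"}, {"b"}, 0.6),
    ({"a", "c"}, {"d"}, 0.9),
    ({"e"}, {"f"}, 0.95),   # never fires below: the customer has no "e"
]
top2 = recommend_from_rules(rules, {"a", "c"}, n=2)
```

The coverage problem named on the slide is visible here: a customer whose items match no left-hand side receives no recommendations at all.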
62. Fuzzy Inference (IF x is A THEN y is B) via a Fuzzy Relation Matrix
- $R: X \times Y \to [0,1]$ encodes the strength of the relationship between x and y: $(x, y) \mapsto \mu_R(x, y)$
- $R: P \times U \to [0,1]$ encodes the strength of the relationship between profile i and URL j: $(P_i, url_j) \mapsto \mu_R(P_i, url_j) = P_{ij}$
[Diagram labels: A, $\mu_A(x)$ corresponds to s (session), $\mu_s(i)$; R corresponds to R (profiles $P_i$); B, $\mu_B(y)$ corresponds to r (recommendations), $\mu_r(j)$; input membership computation (similarity); fuzzy inference]
63. Fuzzy Recommendation Engine
- Given the current session s, infer the recommendation r. Hence the implication $s \to r$.
- Given the Web user profiles discovered by mining the Web logs, the relation R is defined as $R_{ik} = p_{ik}$.
- The input fuzzy set is derived from the current session, x = s, by computing the similarity between s, defined in (1), and each profile $p_i$: $\mu_{s_i} = \mu_s(i) = \mathrm{sim}(s, p_i)$
- The inference procedure concludes the recommendation r as the possibility of URL relevance via the composition $\mu_{r_k} = \mu_r(URL_k) = \mu_s(i) \circ R$
- The composition combines a t-norm (intersection/AND, e.g., min) with a t-conorm (union/OR, e.g., max)
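A minimal sketch of the max-min instance of this composition is shown below, assuming cosine similarity for the input memberships and a profile matrix whose rows are profiles and whose columns are URLs. The function names are hypothetical, and this illustrates only the max-min case, not the other scenarios tested in the deck.

```python
import math

def cosine(a, b):
    """Cosine similarity between a session vector and a profile vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def max_min_composition(mu_s, R):
    """mu_r(k) = max over profiles i of min(mu_s(i), R[i][k])."""
    n_profiles, n_urls = len(R), len(R[0])
    return [
        max(min(mu_s[i], R[i][k]) for i in range(n_profiles))
        for k in range(n_urls)
    ]

def fuzzy_recommend(session, R):
    """Input memberships from similarity, then max-min fuzzy inference."""
    mu_s = [cosine(session, profile) for profile in R]
    return max_min_composition(mu_s, R)

R = [[1.0, 0.5, 0.0],      # profile 0: URL weights p_0k
     [0.0, 0.9, 0.6]]      # profile 1: URL weights p_1k
mu_r = max_min_composition([0.8, 0.3], R)
```

Here min plays the role of the t-norm (a URL is relevant to the extent that the session matches a profile AND the profile contains the URL), and max plays the role of the t-conorm that merges the evidence from all profiles.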
65. Simulation Experiments
- Given the prediscovered profiles and a set of Web sessions extracted from the same Web log file, we treat every complete session as a ground-truth session.
- For each such ground-truth session, all possible subsets of the session consisting of between 1 and 9 URLs are considered as current test subsessions,
- recommendations are generated for each test subsession, and
- coverage and precision measures are averaged for each subsession size.
- This yields roughly 380,000 separate recommendation tests, each run under 16 different recommendation scenarios, corresponding to different combinations of input membership generation, composition, etc.
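The enumeration of test subsessions can be sketched with the standard library as below. Whether a subsession equal to the whole session was kept in the original experiments is not stated, so this sketch includes every subset size from 1 up to 9 (or the session length, if smaller); the function name is hypothetical.

```python
from itertools import combinations

def enumerate_subsessions(session, min_size=1, max_size=9):
    """Yield every subset of the session's URLs with min_size..max_size elements."""
    urls = sorted(set(session))
    largest = min(max_size, len(urls))
    for size in range(min_size, largest + 1):
        for subset in combinations(urls, size):
            yield set(subset)

subsessions = list(enumerate_subsessions(["/a", "/b", "/c"]))
```

The number of subsets grows combinatorially with session length, which is consistent with a modest log producing hundreds of thousands of recommendation tests.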
66. Tested Recommendation Scenarios
- 2 types of similarities: cosine or Web session (denoted Cosine and WS, respectively, in the plots).
- 2 different compositions: Max-Min and Bounded Sum-Min (denoted MM and BS in the plots).
- 2 different types of profiles:
  - raw profiles, which generate real-valued similarities (denoted Real Cosine and Real WS in the plots), and
  - crisp α-cuts with α = 0.2 (denoted "Binary Cosine - .2 Thresholded Profile" or "Binary WS - .2 Thresholded Profile" in the plots; tested only for max-min composition).
- Either raw recommendations or crisp α-cuts with α = 0.001, 0.2, and 0.3 (denoted "Type of Input Similarity - α Thresholded To Bin" in the plots).
67. Evaluation Measures
- The actual completed session, $s_T$, is treated as ground truth,
- a subset of this session is treated as the incomplete current subsession, $s_j$,
- $r_j$ = fuzzy recommendations
- Then $r_j^{*} = r_j - s_j$: the recommendations obtained after omitting all URLs that are part of the current subsession.
- Also, $s_j^{*} = s_T - s_j$: the ground-truth URLs, not including the ones in the current subsession being processed for recommendations.
- Precision is given by
- Coverage is given by
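The formulas themselves did not survive in this transcript. The sketch below uses the conventional set-based definitions, which is an assumption about what the slide showed: precision is the fraction of the retained recommendations that appear in the remaining ground truth, and coverage is the fraction of the remaining ground truth that was recommended. F1 is their harmonic mean. The function name is hypothetical.

```python
def evaluate(recommended, subsession, full_session):
    """Return (precision, coverage, f1) for one test subsession."""
    r_star = set(recommended) - set(subsession)     # recommendations minus current URLs
    s_star = set(full_session) - set(subsession)    # ground truth minus current URLs
    hits = len(r_star & s_star)
    precision = hits / len(r_star) if r_star else 0.0
    coverage = hits / len(s_star) if s_star else 0.0
    total = precision + coverage
    f1 = 2 * precision * coverage / total if total else 0.0
    return precision, coverage, f1

precision, coverage, f1 = evaluate(
    recommended={"a", "b", "c", "d"},
    subsession={"a"},
    full_session={"a", "b", "c", "e", "f"},
)
```

In this example two of the three retained recommendations are correct and two of the four remaining ground-truth URLs are found.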
68. Coverage Comparison: Fuzzy vs. Nearest-Profile vs. K-NN
69. Precision: Fuzzy (better for longer sessions) vs. Nearest-Profile (middle performance) vs. K-NN
70. F1 Measure: Fuzzy (better for longer sessions) vs. Nearest-Profile vs. K-NN
71. Discussion of Results: Comparison with Nearest-Profile and K-NN
- Fuzzy recommendations have better precision at larger session sizes (starting at 3 to 4 URLs)
- K-NN has the highest precision at small session sizes (< 4)
- However, Nearest-Profile performs midway between K-NN and Fuzzy at small session sizes (< 4)
- Fuzzy recommendations have superior coverage regardless of session length
- Overall F1 measure: Fuzzy recommendations are better when the session size is larger, and Nearest-Profile performs midway between K-NN and Fuzzy at small session sizes (< 4)
- Best to combine Nearest-Profile and Fuzzy, and to alternate between Max-Min and Bounded Sum depending on session length
72. Time and Memory Complexity
- Offline training takes longer with the profile-based approach.
- However, this can be done on a back-end computer rather than on the server; it is an offline process that does not affect the operation of the Web server.
- On the other hand, both the fuzzy and nearest-profile approaches are extremely fast at recommendation time and require a minimal amount of main memory
  - (a mere summary of the previous usage history instead of the entire history, as in collaborative filtering).
- In our simulations, fuzzy recommendations with non-optimized, non-compiled Perl script code running on a 2 GHz Pentium 4 Linux PC operated at an average of 48 recommendations per second.
- The far more computationally complex K-Nearest-Neighbor approach generated only 2 recommendations per second.
73. Two-Step Recommendation Process Based on a Committee of Profile-Specific URL-Predictor Neural Networks
74. Average Precision, Coverage, F1, and Cosine for the Two-Step Profile-Specific URL-Predictor Recommender Model
76. Conclusion and Future Prospects
- Ill-posed feature representation problems, subjective dissimilarity between Web sessions, and subjective multi-modal fitness functions are handled well by evolutionary computation
- Why evolutionary computation?
  - Deals with ill-posed feature representation problems
  - Handles subjective dissimilarity between Web sessions
  - Handles subjective multi-modal fitness functions
  - Insensitive to initialization
- HUNC: first evolutionary-based technique for Web usage mining
- Why NOT evolutionary computation?
  - Scalability problems
- Immune system clustering: scalable and flexible in the face of huge, dynamic Web usage access trends
77. Conclusion and Future Prospects
- Immune system clustering: also used successfully to extract topics/cluster text documents in 1 pass.
- Also used successfully to cluster and perform anomaly detection on the KDD-Cup99 network data in 1 pass (unsupervised learning using only the normal data, with no labels)
  - Results in both attack detection and false alarms were superior to the best results (KDD-Cup winner: supervised learning trained with data labeled as normal plus all the different attacks)
- A completely automated real-time Web personalization system is feasible
- Soft computing techniques for intelligent recommender systems
  - Web usage mining + fuzzy inference: better for longer sessions (increased uncertainty and noise)
  - Web usage mining + two-stage neural network: unprecedented performance, but slow in training
78. Impact on WWW
- Scalable and adaptive recommendation engine for personalization
- Improve search results by taking profile/context into account
- Improve the design of dynamic Web sites
- Facilitate navigation
79. Some Related Publications
- O. Nasraoui, R. Krishnapuram, and A. Joshi, "Mining Web Access Logs Using a Relational Clustering Algorithm Based on a Robust Estimator", 8th International World Wide Web Conference, Toronto, pp. 40-41, 1999.
- O. Nasraoui and R. Krishnapuram, "A Novel Approach to Unsupervised Robust Clustering using Genetic Niching", Proc. of the 9th IEEE International Conf. on Fuzzy Systems, San Antonio, TX, May 2000, pp. 170-175.
- O. Nasraoui and R. Krishnapuram, "A New Evolutionary Approach to Web Usage and Context Sensitive Associations Mining", International Journal on Computational Intelligence and Applications, Special Issue on Internet Intelligent Systems, Vol. 2, No. 3, pp. 339-348, Sep. 2002.
- O. Nasraoui, C. Cardona, C. Rojas, and F. Gonzalez, "TECNO-STREAMS: Tracking Evolving Clusters in Noisy Data Streams with a Scalable Immune System Learning Model", in Proc. of Third IEEE International Conference on Data Mining (ICDM'03), Melbourne, FL, November 2003.
- O. Nasraoui and C. Petenes, "Combining Web Usage Mining and Fuzzy Inference for Website Personalization", in Proc. of WebKDD 2003, KDD Workshop on Web Mining as a Premise to Effective and Intelligent Web Applications, Washington DC, August 2003, p. 37.
- O. Nasraoui, F. Gonzalez, C. Cardona, C. Rojas, and D. Dasgupta, "A Scalable Artificial Immune System Model for Dynamic Unsupervised Learning", Proc. of the Genetic and Evolutionary Computation Conference (GECCO), Chicago, IL, July 2003, p. 219.
- O. Nasraoui, C. Cardona, C. Rojas, and F. Gonzalez, "Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm", in Proc. of WebKDD 2003, KDD Workshop on Web Mining as a Premise to Effective and Intelligent Web Applications, Washington DC, August 2003, p. 71.