Adaptive Information Retrieval

Transcript and Presenter's Notes
1
Adaptive Information Retrieval
  • Eren Manavoglu1
  • Advisor: Dr. C. Lee Giles1,2
  • 1Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802
  • 2School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802

2
Information Retrieval over the Web Today
  • Main tools: search engines
  • User input: query terms/questions
  • Result presentation: ranked list of matching documents
  • Ranking algorithm: a combination of topical relevancy, citations, and document popularity (probably)

3
Why Do We Need Adaptive Information Retrieval?
  • Different users use the same query to access different information.
  • "Jaguar": animal or automobile?
  • Even the same user may have different information needs at different points in time about the same topic.
  • "Dick Cheney": Scooter scandal last month, hunting accident today.

4
Adaptive Information Retrieval Techniques
  • Personalization
  • Server-side personalization: relies mainly on explicit user input (MyYahoo, Personalized Google); the parameters of the ranking algorithm are personalized
  • Client-side personalization: uses both implicit and explicit user feedback; the user model is usually in the form of interest profile(s)
  • Recommender Systems
  • Will be discussed in more detail later
  • Clustering search results
  • Will be discussed in more detail later

A survey can be found in Manavoglu, Giles, Spink, Wang, The Power of One - Gaining Business Value from Personalization Technologies, Chapter 9
5
What this Proposal is about
  • Learn the user behavior models to capture
    individual differences
  • Group the documents at query time to capture the
    current state of available information sources
  • Use the user model to highlight or recommend groups of documents of interest to the user

6
Work Done So Far
  • Recommender Systems
  • User Behavior Modeling
  • Clustering Search Results

7
Recommender Systems
  • Goal: to automate word-of-mouth recommendations within a community.
  • Approaches
  • Content-based filtering: recommendations are based solely on the content of the items (e.g., topic, author)
  • Collaborative filtering: similar users are identified based on the items they view, and the items viewed by like-minded users are then recommended to a given user Resnick 94
  • Hybrid: both the content of the items and the similarity of the users are used in making recommendations Shani 2005

8
Recommender Systems - A Probabilistic Approach
  • Represent the data as a collection of ordered sequences of document requests.
  • Assume we are given a dataset of ordered sequences over some fixed alphabet, whose elements are documents.
  • For each document, define its history H as the so-far observed subsequence of the current sequence.
  • Our goal is to predict the next document Dnext given the history H.
  • Learn a probabilistic model P(Dnext | H, Data).

9
Recommender Systems - Probabilistic Models
  • A mixture-model-based approach can be taken for learning the probabilistic model P(Dnext | H, Data) (a prediction sketch follows this list):
    P(Dnext | H, Data) = Σ_k αk · P(Dnext | H, Data, k), k = 1, ..., Nc
  • Nc is the number of components and P(Dnext | H, Data, k) is the component model.
  • The component distribution P(Dnext | H, Data, k) should make use of the sequence information, and learning should be scalable.
  • Candidates: mixture of multinomials, mixture of Markov models.
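
As an illustration, here is a minimal sketch (in Python, with illustrative names; the component models and mixing weights are assumed to be already learned) of how such a mixture prediction could be computed:

import numpy as np

def mixture_predict(history, components, weights):
    # components: callables, each returning a probability vector over the
    # document alphabet given the history (the P(Dnext | H, Data, k) terms).
    # weights: the mixing proportions alpha_k, summing to one.
    probs = np.zeros_like(components[0](history), dtype=float)
    for alpha_k, component in zip(weights, components):
        probs = probs + alpha_k * component(history)
    return probs  # probs[d] approximates P(Dnext = d | H, Data)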

10
Proposed Recommendation Model - Maximum Entropy
(maxent)
  • Maximum entropy provides a framework to combine
    information from different knowledge sources.
  • Each knowledge source imposes a set of
    constraints.
  • Intersection of all constraints contains a set of
    probability functions.
  • The maximum entropy principle chooses the one with the highest entropy, i.e., the flattest function.
  • Advantage of the maxent approach:
  • It can combine 1st-order Markov model features with long-term dependencies in the data.

11
Representation for Long Term Dependence - Triggers
  • Long-term dependence of Dnext on H is represented with triggers.
  • The pair of actions (a, b) is a trigger iff P(Dnext = b | a ∈ H) is substantially different from P(Dnext = b).
  • Triggers are pairs with high mutual information scores (a scoring sketch follows this list).
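
A minimal sketch of how trigger pairs might be scored using mutual information over co-occurrence counts (the exact counting and scoring scheme of the original work is not stated; this is an illustrative assumption):

import math
from collections import Counter

def trigger_scores(sessions, min_count=5):
    # Count how often action a follows action b within a session, then
    # score each (a, b) pair by its mutual information contribution.
    single, pair = Counter(), Counter()
    for session in sessions:
        for i, b in enumerate(session):
            single[b] += 1
            for a in session[i + 1:]:  # a occurs after b in the session
                pair[(a, b)] += 1
    n_single, n_pair = sum(single.values()), sum(pair.values())
    scores = {}
    for (a, b), c in pair.items():
        if c < min_count:  # ignore rare, unreliable pairs
            continue
        p_ab = c / n_pair
        p_a, p_b = single[a] / n_single, single[b] / n_single
        scores[(a, b)] = p_ab * math.log(p_ab / (p_a * p_b))
    return scores  # high-scoring pairs are the triggers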

12
Maximum Entropy Model Definition
  • The maximum entropy objective function leads to the following model:
    P(D | H, Data, k) = (1 / Zλ,k(H)) · exp( Σ_{s=1..S} λs Fs(D, H) )
  • Fs(D, H) are the feature indicator functions; S is the total number of bigrams + triggers for document D.
  • λs are the maxent model parameters for component k.
  • If s corresponds to bigram (a, b), then Fs(D, H) = 1 iff D = a and Dprev = b, and Fs(D, H) = 0 otherwise.
  • Zλ,k(H) is a normalization constant (an evaluation sketch follows this list).
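
A minimal sketch of evaluating this model for one component (the feature functions and parameters are assumed given; names are illustrative):

import math

def maxent_prob(doc, history, features, lambdas, alphabet):
    # features: indicator functions F_s(d, history) -> 0 or 1
    # lambdas:  the matching parameters lambda_s for this component
    def score(d):
        return math.exp(sum(l * f(d, history)
                            for l, f in zip(lambdas, features)))
    z = sum(score(d) for d in alphabet)  # normalization constant Z(H)
    return score(doc) / z  # P(D = doc | H) for this component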

13
Maximum Entropy Model -A Complication and its
Solution
  • Maxent computation is expensive when the number of items is large: on a dataset of 500K items it could take months.
  • This problem can be solved by clustering the items before applying the maxent model.
  • We employed a user-navigation-based clustering algorithm built on a simple idea: documents that are requested consecutively are related.
  • Compute the document bigrams.
  • Apply greedy clustering using the bigram counts (a sketch follows this list).
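
One plausible reading of the greedy step, sketched under the assumption that clusters are grown by merging documents along the most frequent consecutive-request bigrams first (the size cap and exact merge order are illustrative; see the cited paper for the actual algorithm):

def greedy_cluster(bigram_counts, max_cluster_size=100):
    # bigram_counts: dict mapping (doc_a, doc_b) -> consecutive-request count
    cluster_of, members = {}, {}
    def get(doc):  # lazily place each document in its own cluster
        if doc not in cluster_of:
            cluster_of[doc] = doc
            members[doc] = {doc}
        return cluster_of[doc]
    for (a, b), _ in sorted(bigram_counts.items(), key=lambda kv: -kv[1]):
        ca, cb = get(a), get(b)
        if ca == cb or len(members[ca]) + len(members[cb]) > max_cluster_size:
            continue
        members[ca] |= members[cb]  # merge the two clusters
        for doc in members.pop(cb):
            cluster_of[doc] = ca
    return cluster_of  # document -> cluster representative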

Details of the algorithm can be found in Pavlov, Manavoglu, Pennock, Giles 2004
14
Experimental Evaluation
  • Competing Models
  • Maxent
  • 1st Order Markov model
  • Mixture of Markov models (30 components)
  • Mixture of Multinomials (60 components, with the similarity-based CiteSeer predictors as its feature set)
  • Correlation
  • Individual CiteSeer similarity-based predictors
  • CiteSeer merged-similarity-based predictor
  • We call "ResearchIndex Merge" the recommender obtained by pooling all similarity-based recommenders together; essentially, this corresponds to the currently available recommender system in CiteSeer.

15
Current Recommendation Algorithms in CiteSeer
  • Similarity recommenders
  • Content-based: text or citation similarity
  • Sentence Similarity, Text Similarity, Active Bibliography, Cited By, Co-citations, Cites
  • Collaborative: user similarity
  • Users Who Viewed
  • Based on source similarity
  • On Same Site

For detailed information on CiteSeer
recommenders please see Lawrence, Giles,
Bollacker, 1999.
16
Experimental Setup
  • Conducted on 6 months' worth of CiteSeer data in the form of <time, user, request>
  • Chronologically partitioned the data into 5 million training and 1.7 million test requests
  • Document requests were aggregated by userID and sessionized; an inactivity time of 300 seconds was used to break the requests into sessions (a sketch follows this list)
  • Robots were identified and eliminated based on the number of requests submitted
  • Documents that appeared fewer than 3 times were discarded (250,000 documents remained)
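
A minimal sketch of the sessionization step described above (the 300-second inactivity timeout is from the slide; names are illustrative):

SESSION_TIMEOUT = 300  # seconds of inactivity that ends a session

def sessionize(requests):
    # requests: (time, user, document) triples, sorted by time
    sessions, last_seen, current = [], {}, {}
    for t, user, doc in requests:
        if user in current and t - last_seen[user] > SESSION_TIMEOUT:
            sessions.append((user, current.pop(user)))  # close old session
        current.setdefault(user, []).append(doc)
        last_seen[user] = t
    sessions.extend(current.items())  # flush the still-open sessions
    return sessions  # list of (user, [doc, doc, ...]) pairs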

17
Evaluation Metrics
  • Hit ratio: the ratio of correct predictions to the total number of predictions made
  • Average height of prediction: predictions form a ranked list of actions, ordered by their probability values; the average height of prediction is the average height of the hit within this list
  • The first N predictions in the list are considered and performance is evaluated based on the success of these N predictions, for N = 1, ..., 5, 10 (called bins hereafter; a computation sketch follows this list)
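
A minimal sketch of computing both metrics for one bin (the exact height convention used in the experiments is not stated; scoring a hit at the top of the list highest is an assumption here):

def evaluate_bin(predictions, actuals, n=5):
    # predictions: one ranked candidate list per test case, most probable first
    # actuals:     the item that was actually requested next in each case
    hits, heights = 0, []
    for ranked, truth in zip(predictions, actuals):
        top_n = ranked[:n]
        if truth in top_n:
            hits += 1
            heights.append(n - top_n.index(truth))  # hit near the top scores high
    hit_ratio = hits / len(actuals)
    avg_height = sum(heights) / len(heights) if heights else 0.0
    return hit_ratio, avg_height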

18
Results
  • Among the individual CiteSeer recommenders, Active Bibliography was the best predictor. However, the merged predictor performed better than Active Bibliography; hence it represents the CiteSeer recommender in the remainder of the experiments.

19
Results
Although the merged CiteSeer predictor does better for longer recommendation lists, maxent is the best for a small number of guesses. Correlation improves as the recommendation list gets longer. Also presented are the average-height results of the competing models; with respect to average height, maxent outperformed all the other recommenders.
20
Conclusions - Discussion
  • Maxent proved to be one of the best models, especially when the number of recommendations to be made was limited.
  • However, clustering the documents is crucial for its feasibility.
  • Most experiments in the field of recommender systems have been offline. This could be misleading: the effect of the recommendation on the user should also be taken into account. We plan to set up a medium for online experiments.

21
User Behavior Models
22
Motivation
  • In the previous section we focused only on the items users viewed. But users do much more than just view items: they browse, query, look at recommendations, etc.
  • Can we improve the user experience by looking at these other actions as well?
  • Can we detect what users are trying to do in a given session?

23
User Behavior Modeling
  • What data do we have about users?
  • Web logs, in clickstream format
  • How can we make use of this data?
  • By applying web usage mining techniques

24
Previous Work on Web Usage Mining
  • Frequent item set identification by association
    rule extraction methods1
  • Collaborative Filtering for recommender
    systems2,3
  • Probabilistic graphical models (e.g. Bayesian
    nets) to discover and represent dependencies
    among different variables4
  • Sequential pattern analysis to discover patterns
    in time-ordered sessions5

1 Liu et al., Integrating Classification and Association Rule Mining. Fourth International Conference on Knowledge Discovery and Data Mining, 1998.
2 Resnick et al., GroupLens: An Open Architecture for Collaborative Filtering of Netnews. ACM 1994 Conference on Computer Supported Cooperative Work.
3 Pavlov, Manavoglu, Pennock, Giles, Collaborative Filtering with Maximum Entropy. IEEE Intelligent Systems, 2004.
4 D. Heckerman, Bayesian Networks for Data Mining. Data Mining and Knowledge Discovery, 1997.
5 Cadez et al., Predictive Profiles for Transaction Data Using Finite Mixture Models. TR, UC Irvine, 2001.
25
Problem Definition
  • We propose the following sequential framework for modeling user behavior:
  • Each user session is represented as a sequence, labeled with a user ID.
  • Each individual item in the sequence represents a user action.
  • For each action, the history H(U) is defined as the so-far observed ordered sequence of actions.
  • P(Anext | H(U), Data) is the behavior model for user U; it predicts the next action Anext given the history H(U).

Example (user U):
  time t:   Action At = Query;    H(U) = empty
  time t+1: Action At+1 = Browse; H(U) = At
  time t+2: Action At+2 = Help;   H(U) = At, At+1
26
Problem Definition
  • Problem
  • Infer P(Anext | H(U), Data) for each individual, given the training data.
  • Proposed Method
  • Mixture Models

27
Why Mixture Models?
  • Mixture models can handle heterogeneous data and
    provide a clustering framework.
  • Mixture models are weighted combinations of
    component models.
  • Each component represents a dominant pattern in
    the data.
  • A global mixture model captures the general
    patterns among users.
  • A mixture model can be personalized to capture
    individual behavior patterns.

Example: Customer A (frequently visits, but rarely buys) = 0.2 × Buyer Model + 0.8 × Browser Model
McLachlan & Basford 1988, Cadez 2001
28
Mixture Models
  • Global Mixture Models
  • Parameters are the same for all individuals. An Nc-component mixture model:
    P(Anext | H(U), Data) = Σ_k αk · P(Anext | H(U), Data, k), k = 1, ..., Nc

29
Personalized Mixture Models
  • Learn individual component probabilities αU,k for each user; the resulting model is therefore specific to each user:
    P(Anext | H(U), Data) = Σ_k αU,k · P(Anext | H(U), Data, k)
  • For the component distribution P(Anext | H(U), Data, k), Markov and maximum entropy distributions are used.

30
Parameter Estimation
  • Learn the parameters of the global model by the expectation-maximization (EM) algorithm.
  • Fix the component distributions.
  • Learn the individual component weights αU,k for known users by EM (a sketch follows this list).
  • For users who do not appear in the training data, the αU,k default to the global αk values.
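
A minimal sketch of this estimation scheme: with the component likelihoods held fixed, EM for the per-user weights reduces to alternating responsibilities and a weight update (the session-likelihood callables and shapes are illustrative assumptions):

import numpy as np

def personalize_weights(user_sessions, components, global_alpha, n_iter=20):
    # components:   callables giving the fixed likelihood P(session | k)
    # global_alpha: the global alpha_k values, used as the starting point
    alpha = np.array(global_alpha, dtype=float)
    lik = np.array([[comp(s) for comp in components] for s in user_sessions])
    for _ in range(n_iter):
        resp = alpha * lik                        # E-step (unnormalized)
        resp /= resp.sum(axis=1, keepdims=True)   # P(k | session)
        alpha = resp.mean(axis=0)                 # M-step: new alpha_{U,k}
    return alpha  # users absent from training keep global_alpha instead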

31
Visualization
  • Each component of a mixture model can be viewed as a cluster.
  • Each session is a weighted combination of these clusters.
  • The probability of user session SU belonging to cluster k is
    P(k | SU, Data) = αk · P(SU | k, Data) / Σ_j αj · P(SU | j, Data)

Sample Cluster (figure)
  • Each session is assigned to the cluster that maximizes P(k | SU, Data).

32
Visualization - Graphs
  • If the number of actions is large, viewing and interpreting sample sessions becomes impossible.
  • Build a graph for each cluster (a sketch follows this list):
  • Triggers and bigrams are computed and sorted for each cluster
  • Pairs of actions with scores greater than a threshold value are used to build the graph
  • The sequence of actions defines the direction of the edges
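
A minimal sketch of the graph-building step (threshold and score inputs as described above; the adjacency-list representation is an illustrative choice):

def cluster_graph(pair_scores, threshold):
    # pair_scores: dict mapping (earlier_action, later_action) -> score
    graph = {}
    for (a, b), score in pair_scores.items():
        if score > threshold:  # keep only strong dependencies
            graph.setdefault(a, []).append((b, score))  # directed edge a -> b
    return graph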

33
Experimental Setup
  • CiteSeer1 log files of approximately 2 months.
  • Log files are series of transactions in the form of <time, action, user ID, related info>.
  • <1014151912.455 event authorhomepage 3491213 197248 - - ip 42779 agent Mozilla/2.0 (compatible MSIE 3.02 Update a Windows NT)>
  • Returning users are identified by cookies.
  • Actions are broken into sessions based on time
    heuristics (300 seconds of inactivity indicates
    the end of a session).
  • User 1, Session 1: 37 31 34 31 32
  • User 1, Session 2: 37 31 34 31 34 33 34
  • User 2, Session 1: 31 34 33 20
  • User 2, Session 2: 33 20
  • Robots are identified based on the number of
    requests per session and are removed.

1 http://citeseer.ist.psu.edu
34
CiteSeer user actions and descriptions used
during experiments
35
Experiments
  • User behavior models are evaluated based on the
    accuracy of their predictions.
  • User behavior clusters are visualized to
    demonstrate the descriptive ability of the
    models.
  • For maxent models the history was empirically set
    to the last 5 actions.

36
Hit Ratio Results Returning Users
Hit ratio results on known users for 3 component
mixture model
  • Personalized models outperformed global models.
  • Personalization brought a significant improvement for the maxent model.
  • The personalized Markov mixture is better for returning users.

Hit ratio results on known users for 10 component
mixture model
37
Hit Ratio Results All Users
Hit ratio results on all users for 3 component
mixture model
  • Maxent performs better than Markov for all users.
  • Maxent chooses the flattest function and has more parameters; the mixture of Markov models can better fit the training data (for returning users).

Hit ratio results on all users for 10 component
mixture model
38
User Behavior Clusters - Maxent Model Graphs
  • Cluster 4: users querying CiteSeer through another search engine
  • Cluster 6: users who use the CiteSeer query interface
  • Cluster 9: users who look at recommendations and citations

39
Conclusions
  • Both mixture of maximum entropy and Markov models
    are able to generate strong predictive models.
  • Markov models are better for predicting future
    actions of returning users.
  • Maximum Entropy models perform better for the global model and for first-time users.
  • Predicting future actions could be useful in
    recommending shortcuts and personalizing the user
    interface.
  • CiteSeer users significantly differ in the way
    they use the engine.
  • Mixture models can be used to discover these
    different patterns.
  • Identifying the cluster a session belongs to can be used to recommend certain functionalities of CiteSeer (e.g., BibTeX entries, recommendations).

Manavoglu, Pavlov, Giles ICDM 2003
40
Navigation Graph
41
Query Behavior Analysis (Work in Progress)
42
Query Behavior Patterns - Motivation
  • Goal
  • To investigate how users interact with the query interface
  • To identify dominant patterns of querying behavior, which could then be used in recommendations
  • If the user is submitting the same query over and over, should we show the same recommendations?
  • If the user is persistently modifying the original query string, should we change the recommendation criteria to give more priority to diversity of the recommendations (because modifying the query could be an indication of unsatisfactory results/recommendations)?

43
Query Types
  • Query Types (a classification sketch follows this list)
  • New Query String: none of the terms in the query (excluding stop words and logical operators) were seen in the previous query
  • Modified Query String: a partially new query; some terms are eliminated and/or added
  • Same Query String: same as the immediately preceding query string
  • Document Query: the document database is queried
  • Citation Query: the citation database is queried
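
A minimal sketch of classifying one query against its predecessor (the stop-word list and tokenization are illustrative assumptions; document vs. citation queries would additionally be distinguished by the database queried):

STOP_WORDS = {"the", "a", "an", "and", "or", "not"}  # illustrative stop list

def query_type(prev_query, query):
    if prev_query is not None and query == prev_query:
        return "Same"
    terms = set(query.lower().split()) - STOP_WORDS
    prev_terms = (set(prev_query.lower().split()) - STOP_WORDS
                  if prev_query else set())
    if terms & prev_terms:
        return "Modified"  # partial overlap: terms eliminated and/or added
    return "New"           # no term carried over from the previous query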

44
Query Behavior Patterns - Methodology
  • Sequential Modeling
  • Sessionize the log files
  • Consider the queries in a session as an ordered sequence
  • Identify the query types within each session
  • Investigate the patterns in the ordered sequence of query types
  • Algorithm
  • Mixture of 1st-order Markov models
  • Experimental Setup
  • Same as the previous section

45
Query Types - Example
  • Input
  • <sessionID, queryEvent, userID, queryString>
  • 1 documentquery uid1 online routing
  • 1 documentquery uid1 online routing lipmann
  • 1 documentquery uid1 lipmann
  • 1 documentquery uid1 Maarten Lipmann
  • 1 documentquery uid1 oltsp
  • 1 citationquery uid1 oltsp
  • Output
  • uid1 docNew docModified docModified docModified
    docNew citeOld

46
Query Behavior - Results
47
Query Behavior Analysis - Observations
  • Users tend to submit the same query more than once in the same session
  • Only a small portion of CiteSeer users submit citation queries (15%)
  • CiteSeer users try to modify their queries instead of giving up
  • Can query modification be viewed as an indication to change the document recommendation or ranking strategy?

48
Online Clustering of Search Results
49
Online Clustering of Search Results -Motivation
  • So far we have looked at user behavior to improve the user experience and retrieval quality.
  • Is it also possible to organize the data adaptively to improve the quality of the results?
  • Grouping topically related items together on the result page may improve the coverage and diversity of the displayed results.

50
Why Online?
  • An offline clustering/classification of search results would not be query dependent.
  • Two documents could be related for the query "machine learning", but they may not be as related for a more specific one, "machine learning for recommender systems".
  • The time of the query changes the set of available documents.
  • A news search for "hurricane" today and the same query a week ago would return very different sets of documents.

51
Online Clustering of Search Results - Test Case
  • To evaluate the performance of clustering on a frequently updated index (with deletions as well as additions of documents), we chose to work on news articles.
  • Due to the architecture of the underlying news search engine (Yahoo! News), the implemented solution had to satisfy the following conditions:
  • Access to 100 documents per query
  • Work in a stateless/memoryless environment
  • Fetch the original results, cluster them, and return the clustered results in less than 300 milliseconds

This work was done during an internship at Yahoo!. The paper is under review and subject to confidentiality requirements; please ask about the details.
52
Online Clustering of Search Results - Algorithm
  • Clustering Algorithm
  • Hierarchical Agglomerative Clustering, Mixture of Multinomials, Cluto, KNN, and K-means were implemented and compared. The Hierarchical Agglomerative Clustering (HAC) algorithm outperformed the others in the news domain.
  • The HAC algorithm works on the pairwise similarities of news articles (a sketch follows this list).
  • The pairwise similarity matrix is computed at query time.
  • The similarity of one cluster to another is computed as a factor of the pairwise document similarities.
  • The similarity metric is an approximation of the cosine similarity measure.
  • The stopping criterion is a threshold on the similarity of the clusters. The optimum threshold value is found with an exhaustive search on the hold-out dataset.
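
A minimal sketch of the HAC loop (average linkage is an illustrative stand-in for the unspecified "factor of the pairwise document similarities"; the real implementation would need to be far more efficient to fit the 300 ms budget):

import numpy as np

def hac(similarity, threshold):
    # similarity: precomputed pairwise similarity matrix for the articles
    clusters = [[i] for i in range(len(similarity))]
    def sim(c1, c2):  # average pairwise similarity between two clusters
        return np.mean([similarity[i][j] for i in c1 for j in c2])
    while len(clusters) > 1:
        best, a, b = max((sim(c1, c2), a, b)
                         for a, c1 in enumerate(clusters)
                         for b, c2 in enumerate(clusters) if a < b)
        if best < threshold:  # stopping criterion: no pair is similar enough
            break
        clusters[a] += clusters.pop(b)  # merge the most similar pair
    return clusters  # lists of article indices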

For a survey on Clustering algorithms see
Berkhin, 2002
53
Online Clustering of Search Results - Clustering
Algorithm Evaluation
  • The HAC clustering algorithm was evaluated against the aforementioned algorithms and Google News clusters by an editorial review.
  • 300 queries were chosen at random from that day's user logs.
  • Reviewers were shown 4 documents per cluster.
  • Results of 2 different algorithms were displayed side by side, and reviewers were asked to choose between the 2.
  • HAC was shown to outperform the rest of the algorithms (the difference was statistically significant)

54
Online Clustering of Search Results - Naming the
Clusters
  • Users may not realize how and why the documents were grouped together. Cluster names serving as short descriptions would help users process and navigate the result set more easily.
  • Common naming approaches
  • Choose the most representative article for each cluster and use its title as the cluster label (Google News search, http://news.google.com)
  • Find the most representative phrases within the documents belonging to each cluster (Clusty, http://news.clusty.com/)

For Keyphrase Extraction see Zha, SIGIR 2002
55
Online Clustering of Search Results - Naming the
Clusters
  • Observations
  • News articles are usually formulated to answer
    more than one of the following questions where,
    when, who, what and how.
  • Articles belonging to the same cluster may be
    sharing answers to only some of these questions.
  • Goal
  • We want to partition the titles into substrings
    corresponding to one (or more) of these
    questions. And then choose among these substrings
    the most common one.

56
Online Clustering of Search Results - Naming the
Clusters
  • Methodology
  • Use POS tags to find the boundaries between compact substrings.
  • Break the sentence into smaller parts at these points and generate a candidate set composed of these shorter substrings and their contiguous concatenations (a sketch follows this list).
  • Example
  • Title: "Wilma forces Caribbean tourists to flee"
  • Substrings: "Wilma", "Caribbean tourists", "flee"
  • Candidates: "Wilma", "Wilma forces Caribbean tourists", "Caribbean tourists", "Wilma forces Caribbean tourists to flee", "flee", "Caribbean tourists to flee"
  • Score the candidates based on coverage, frequency, or length, or a combination of these metrics.
  • Choose the candidate with the highest score as the cluster label.
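
A minimal sketch of the candidate-generation step, assuming the title has already been split into chunks at the POS boundaries (the chunking itself is not shown; the boolean flags marking compact substrings are an illustrative device):

def title_candidates(chunks, is_compact):
    # chunks:     title pieces in order, e.g.
    #             ["Wilma", "forces", "Caribbean tourists", "to", "flee"]
    # is_compact: [True, False, True, False, True] for the example above
    candidates = set()
    for i, start_ok in enumerate(is_compact):
        if not start_ok:
            continue
        for j in range(i, len(chunks)):
            if is_compact[j]:  # spans must start and end on compact substrings
                candidates.add(" ".join(chunks[i:j + 1]))
    return candidates  # reproduces the six candidates of the Wilma example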

57
Online Clustering of Search Results - Naming the
Clusters
  • Scoring Algorithms
  • Coverage Score: a metric measuring how well the candidate covers the information in all titles.
  • Frequency Score: the frequency of the full candidate string.
  • Length-Normalized Frequency Score: candidates with fewer words tend to have higher frequencies; to avoid this bias towards shorter candidates, we normalize the frequency scores based on the number of words they include (a sketch follows this list).
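
A minimal sketch of the length-normalized frequency score (the exact normalization is not stated in the slides; weighting the raw frequency by the candidate's word count is an illustrative assumption):

def length_normalized_frequency(candidate, titles):
    # Raw frequency: in how many of the cluster's titles the candidate occurs.
    freq = sum(candidate in title for title in titles)
    # Weight by word count so longer candidates are not unduly penalized.
    return freq * len(candidate.split())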

58
Online Clustering of Search Results - Naming
Evaluation
  • 160 queries were chosen at random from the daily query logs. The results of these queries were clustered and saved.
  • Candidate names for each cluster were generated offline. Candidates included the titles as well.
  • 8 users were asked either to select the best label from the list of candidates or to type in their own choice if they could not find an appropriate candidate.
  • At least 3 judgments were collected per query.
  • 55% of the names chosen by reviewers were titles.
  • Experimental results showed that the length-normalized frequency score outperforms the others (60% match with user labels).
  • In the case of ties, coverage scores were found to be the best tie-breaker.

59
Work In Progress
  • Further query analysis
  • What are the users searching for: titles, abstracts, references, publication dates, venues?
  • Are users viewing/downloading linked documents, or not?
  • Investigating community-based adaptation strategies
  • Designing a framework for online evaluation of recommender systems
  • Such a platform for CiteSeer (called REFEREE) was designed by Cosley et al. in CiteSeer's early days. Our goal is to make it scale to, and be available in, the next-generation CiteSeer.

Recently published paper on discovering e-communities: Zhou, Manavoglu, Li, Giles, Zha, WWW 2006
60
My Proposal
  • MyCiteSeer
  • A unified adaptive information retrieval engine
    where document, action and query behaviors are
    all factored in

61
How?
  • The Action Model selects the modules and shortcuts to be displayed
  • User U is interested only in "users who viewed" document recommendations ⇒ only this document recommendation module is shown on the document page
  • The Query Model and Action Model select the query interface
  • User U searches for titles only and looks at more than 20 results ⇒ the default query field is "title" and results are clustered
  • The Document Model selects the content, with input from the Action and Query models.
  • The Action and Query models act as implicit feedback

62
Evaluation?
  • A user study would give the most reliable and comprehensive assessment of the system. Individual components as well as the overall system can be evaluated.
  • Some form of implicit feedback can also be used for evaluation:
  • Clickthrough rate on the results page, the number of actions taken to access a given document/page, time spent completing certain tasks.
  • Used together with some explicit user feedback (through random questionnaires, ratings, or feedback via email, blogs, or forums), the appropriate metrics could be identified and used.