Rank Aggregation Methods II: Experiments

Learn more at: https://eecs.ceas.uc.edu

1
Rank Aggregation Methods II: Experiments
  • CS728
  • Lecture 12

2
Recall the Rank Aggregation Problem
  • m candidates (a.k.a. alternatives)
  • $M = \{1, \dots, m\}$: set of candidates
  • n voters (a.k.a. agents or judges)
  • $N = \{1, \dots, n\}$: set of voters
  • Each voter i has a ranking $\sigma_i$ on M
  • $\sigma_i(a) < \sigma_i(b)$ means the i-th voter prefers a to b
  • A ranking may be a total or partial order
  • The rank aggregation problem:
  • Combine $\sigma_1, \dots, \sigma_n$ into a single ranking $\sigma$ on M,
    which represents the social choice of the
    voters.
  • Rank aggregation function: $f(\sigma_1, \dots, \sigma_n) = \sigma$
  • $\sigma$ may be a total or partial order

3
Experiments: Distance Measures
  • Goal: quantitatively compare different rank
    aggregation methods.
  • Performance measures:
  • (1) The Spearman footrule distance is the sum of
    the pointwise distances between the positions of
    each element in the two lists. It is normalized by
    dividing by the maximum value $|S|^2/2$, where S
    is the set of ranked elements, giving a value
    between 0 and 1.
  • (2) The Kendall tau distance counts the number of
    pairwise disagreements. Dividing by the maximum
    possible value $|S|(|S| - 1)/2$ gives a
    normalized version with a value between 0 and 1.
  • (3) The induced footrule distance is obtained by
    projecting a full list $\sigma$ onto the elements of
    each partial list and measuring the footrule
    distance between the projection and that partial
    list. The induced Kendall tau distance is defined
    in the same manner.
  • (4) The scaled footrule distance weights
    contributions of elements based on the length of
    the lists they are present in. If $\sigma$ is a full
    list and $\tau$ is a partial list, then
  • $SF(\sigma, \tau) = \sum_{i \in \tau} \left| \sigma(i)/|\sigma| - \tau(i)/|\tau| \right|$.
    Normalize SF by dividing by $|\tau|/2$.
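
The four measures above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the experiments: the function names are made up for this example, lists are assumed to hold distinct elements with at least two entries each, and positions are counted from 1.

```python
from itertools import combinations


def positions(ranking):
    """Map each element to its 1-based position in the list."""
    return {item: i for i, item in enumerate(ranking, start=1)}


def footrule(sigma, tau):
    """Spearman footrule between two lists over the same elements,
    divided by |S|^2 / 2 as on the slide."""
    ps, pt = positions(sigma), positions(tau)
    total = sum(abs(ps[x] - pt[x]) for x in sigma)
    return total / (len(sigma) ** 2 / 2)


def kendall_tau(sigma, tau):
    """Number of pairs ordered differently by the two lists,
    divided by |S|(|S| - 1) / 2."""
    ps, pt = positions(sigma), positions(tau)
    disagreements = sum(
        1
        for a, b in combinations(sigma, 2)
        if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0
    )
    n = len(sigma)
    return disagreements / (n * (n - 1) / 2)


def project(sigma, tau):
    """Restrict the full list sigma to the elements present in tau."""
    keep = set(tau)
    return [x for x in sigma if x in keep]


def induced_footrule(sigma, tau):
    """Footrule between tau and the projection of sigma onto tau."""
    return footrule(project(sigma, tau), tau)


def induced_kendall_tau(sigma, tau):
    """Kendall tau between tau and the projection of sigma onto tau."""
    return kendall_tau(project(sigma, tau), tau)


def scaled_footrule(sigma, tau):
    """Sum over i in tau of |sigma(i)/|sigma| - tau(i)/|tau||,
    divided by |tau| / 2."""
    ps, pt = positions(sigma), positions(tau)
    total = sum(abs(ps[x] / len(sigma) - pt[x] / len(tau)) for x in tau)
    return total / (len(tau) / 2)


full = list("abcd")
print(kendall_tau(full, list("dcba")))    # reversed list: 1.0
print(scaled_footrule(full, ["c", "a"]))  # 1.0
```

In the second call, c sits at relative position 3/4 in the full list and 1/2 in the partial list, while a sits at 1/4 and 2/2, so the unnormalized sum is 0.25 + 0.75 = 1 and the normalizer $|\tau|/2$ is also 1.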

4
Experiments: Distance Measures
  • For each aggregation method and each distance
    measure we get a vector of values, each component
    representing the distance from the aggregation
    to one voter list
  • Simplest is to take the average (or 1-norm)
  • Other norms are interesting:
  • Mean square distance (2-norm)
  • Max distance ($\infty$-norm)
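
As a small illustration of the three summaries, the sketch below reduces a vector of per-voter distances to a single number in each way. The function name `summarize` is invented for this example, and two choices are assumptions rather than something the slides specify: the 1-norm and 2-norm are divided by the number of voters, and "mean square" is reported as a root mean square so that all three values share the same scale.

```python
import math


def summarize(distances):
    """Return (average, root mean square, maximum) of a distance vector."""
    n = len(distances)
    average = sum(distances) / n                        # scaled 1-norm
    rms = math.sqrt(sum(d * d for d in distances) / n)  # scaled 2-norm
    worst = max(distances)                              # infinity-norm
    return average, rms, worst


# Distances from one aggregated list to three voter lists (made-up values).
avg, rms, worst = summarize([0.0, 0.5, 1.0])
print(avg, round(rms, 4), worst)
```

The average treats all voters equally, the root mean square penalizes a few large disagreements more heavily, and the maximum looks only at the least satisfied voter.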

5
Experiments: Minimizing Average
  • Search engines: Altavista (AV), Alltheweb (AW),
    Excite (EX), Google (GG), Hotbot (HB), Lycos (LY),
    and Northernlight (NL)
  • K: Kendall distance
  • SF: scaled footrule distance
  • IF: induced footrule distance
  • LK: Local Kemenization

6
Experiments in Spam Filtering
  • Define spam as web pages that are ranked low by
    majority opinion (machine and human; a
    simplifying assumption), although some search
    engines may rank them highly
  • Intuition: if a page spams most search engines
    for a particular query, then no combination of
    these search engines can filter the
    spam (garbage in, garbage out).
  • Spam pages are the Condorcet losers, and will
    occupy the bottom of any ranking that satisfies
    the extended Condorcet criterion
  • Similarly, good pages will be among the Condorcet
    winners, and will rank above the losers.

7

Condorcet Criteria
  • Condorcet Criterion:
  • A candidate of M that beats every other candidate
    in pairwise simple majority voting should be
    ranked first.
  • Extended Condorcet Criterion (XCC):
  • Version 1: If most voters prefer candidate a to
    candidate b (i.e., the number of voters i such
    that $\sigma_i(a) < \sigma_i(b)$ is at least n/2), then
    $\sigma$ should also prefer a to b
    (i.e., $\sigma(a) < \sigma(b)$).
  • Version 2: If there is a partition (W, L) of M
    such that for any x in W and y in L the majority
    prefers x to y, then every x in W must be ranked
    above every y in L. W is called the set of
    Condorcet winners and L the set of Condorcet
    losers
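
The criteria above reduce to pairwise majority counts, which the following sketch makes concrete. The helper names are hypothetical, each voter list is a Python list with the most preferred candidate first, and "majority" is taken to mean strictly more voters on one side than the other (the slide's "at least n/2" threshold differs only when there is a tie).

```python
def prefer_count(rankings, a, b):
    """Number of voters whose list contains both a and b and puts a first."""
    return sum(
        1 for r in rankings
        if a in r and b in r and r.index(a) < r.index(b)
    )


def majority_prefers(rankings, a, b):
    """True if strictly more voters prefer a to b than b to a."""
    return prefer_count(rankings, a, b) > prefer_count(rankings, b, a)


def condorcet_winner(rankings, candidates):
    """Return the candidate that beats every other one pairwise, or None."""
    for c in candidates:
        if all(majority_prefers(rankings, c, d) for d in candidates if d != c):
            return c
    return None


def xcc2_violations(ranking, winners, losers):
    """Pairs (y, x) with y in losers and x in winners where the
    aggregated ranking places y above x, violating XCC Version 2."""
    pos = {c: i for i, c in enumerate(ranking)}
    return [(y, x) for x in winners for y in losers if pos[y] < pos[x]]


voters = [list("abc"), list("acb"), list("bac")]
print(condorcet_winner(voters, "abc"))               # a
print(xcc2_violations(list("cab"), ["a", "b"], ["c"]))
```

In this toy profile a beats both b and c, and both a and b beat c, so W = {a, b}, L = {c} is a valid partition; the ranking c, a, b places the loser above both winners and is reported as two violations.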

8
XCC(2) and SPAM Filtering
  • Note that XCC(1) implies XCC(2), so Version 1 is
    stronger
  • But XCC(1) is not always realizable
  • As we will see, XCC(2) is always realizable via
    Local Kemenization
  • Hence using rank aggregation with XCC(2) should
    assist in SPAM filtering, since Condorcet losers
    will receive the lowest ranks
  • Let us look at where spam pages (as determined
    by humans) are ranked by good aggregation
    methods.

9
Experiments: Filtering SPAM

10
Experiment: Word association
  • Different search engines and portals have
    different (default) semantics for handling a
    multi-word query.
  • Some use OR semantics (documents contain one of
    the given query terms) while Google uses AND
    semantics (all the query words must appear). Both
    are inconvenient in many situations.
  • Consider searching for the job of a software
    engineer in an on-line job database. The user
    lists a number of skills and a number of
    potential keywords in the job description, for
    example, "Silicon Valley C Java CORBA TCP-IP
    algorithms start-up pre-IPO stock options". It is
    clear that the "AND" rule might produce no
    document or SPAM, and the "OR" rule is equally
    disastrous.
  • Experiment with rank aggregation using multiple
    queries based on small subsets of terms.
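
One way to generate such sub-queries is to enumerate all subsets of a fixed size. The sketch below is only an illustration: the function name `sub_queries` is invented, and the subset size of 2 is an assumption, since the slides do not say how large the subsets were. Each sub-query would be sent to a search engine, and the returned result lists would be the input to rank aggregation.

```python
from itertools import combinations


def sub_queries(terms, size):
    """Return every query formed from `size` of the given terms."""
    return [" ".join(subset) for subset in combinations(terms, size)]


terms = ["madras", "madurai", "coimbatore", "vellore"]
for query in sub_queries(terms, 2):
    print(query)
```

With four terms and pairs of terms this yields six sub-queries, starting with "madras madurai".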

11
  • Results for the query "madras madurai coimbatore
    vellore" (cities in the state of Tamil Nadu,
    India):
  • Google:
    www.mssrf.org/Fris9809/location-tamilnadu.html
    www.indiaplus.com/Info/schools.html
    www.focustamilnadu.com/tamilnadu/Policy20Note...Forests.html
    www.tn.gov.in/policy/environ.htm
    www.indiacolleges.com/Tamil_Nadu.htm
  • SFO with LK:
    www.madurai.com
    www.ozemail.com.au/clday/locations.htm
    www.utoledo.edu/homepages/speelam/coimbatore.html
    www.ozemail.com.au/clday/madras.htm
    www.madurai.com/around.htm
    www.indiatraveltimes.com/tamilnadu/tamil1.html
  • MC4 with LK:
    www.madurai.com
    www.surfindia.com/omsakthi/tourism.htm
    www.indiatraveltimes.com/tamilnadu/tamil1.html
    www.indiatraveltimes.com/tamilnadu/tamil2.html
    www.indiatravels.com/forts/vellore_fort.htm
    www.india-tourism.de/english/south/tamil_nadu.html

12
Locally Kemeny optimal aggregation and XCC(2)
  • Many existing aggregation methods do not
    satisfy XCC(1) or XCC(2).
  • It is possible to use your favorite aggregation
    method to obtain a full list, then apply local
    Kemenization to realize XCC(2), which filters
    Condorcet losers.

13
Locally Kemeny optimal
  • Recall that computing a Kemeny optimal
    aggregation is NP-hard
  • Definition of locally optimal: a permutation $\pi$ is
    a locally Kemeny optimal aggregation of partial
    lists $\tau_1, \tau_2, \dots, \tau_k$ if there is no permutation
    $\pi'$ that can be obtained from $\pi$ by performing a
    single transposition of an adjacent pair of
    elements and for which the Kendall distance satisfies
  • $K(\pi', \tau_1, \tau_2, \dots, \tau_k) < K(\pi, \tau_1, \tau_2, \dots, \tau_k)$.
  • In other words, it is impossible to reduce the
    total distance to the $\tau$'s by flipping an adjacent
    pair.

14
Example of LKO but not KO
  • Example 1:
  • $\tau_1 = (1,2)$, $\tau_2 = (2,3)$, $\tau_3 = \tau_4 = \tau_5 = (3,1)$.
  • $\pi = (1,2,3)$ satisfies the definition of
    LKO with $K(\pi, \tau_1, \tau_2, \dots, \tau_5) = 3$, but
    transposing 1 and 3 decreases the sum to 2.
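
Example 1 can be checked mechanically. The sketch below is a direct, unoptimized reading of the definition: the function names are illustrative, and K is taken to be the unnormalized sum, over all partial lists, of the pairs that the list ranks in the opposite order to the permutation.

```python
def kendall_partial(pi, tau):
    """Count pairs ranked by tau that pi orders the other way."""
    pos = {x: i for i, x in enumerate(pi)}
    return sum(
        1
        for i in range(len(tau))
        for j in range(i + 1, len(tau))
        if pos[tau[i]] > pos[tau[j]]
    )


def total_distance(pi, taus):
    """K(pi, tau_1, ..., tau_k): sum of distances to all lists."""
    return sum(kendall_partial(pi, t) for t in taus)


def is_locally_kemeny_optimal(pi, taus):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_distance(pi, taus)
    for i in range(len(pi) - 1):
        swapped = list(pi)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        if total_distance(swapped, taus) < base:
            return False
    return True


taus = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
pi = (1, 2, 3)
print(total_distance(pi, taus))             # 3
print(is_locally_kemeny_optimal(pi, taus))  # True
print(total_distance((3, 2, 1), taus))      # 2
```

Both adjacent swaps of (1,2,3), namely (2,1,3) and (1,3,2), raise the total to 4, which is why the permutation is locally optimal even though the non-adjacent swap of 1 and 3 lowers it to 2.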

15
LKO satisfies XCC(2)
  • Proof by contradiction: if the result is false,
    then there exist partial lists $\tau_1, \tau_2, \dots, \tau_k$, an
    LKO aggregation $\pi$, and a partition (W, L) for
    which $\pi$ violates XCC(2); that is, there is some
    pair c in W and d in L such that $\pi(d) < \pi(c)$.
    Let (d, c) be the closest such pair in $\pi$.
  • Consider the immediate successor of d in $\pi$, call
    it e. If e = c, then c is adjacent to d in $\pi$.
    Since a majority prefers c to d, transposing this
    adjacent pair of alternatives produces a $\pi'$ such
    that $K(\pi', \tau_1, \tau_2, \dots, \tau_k) < K(\pi, \tau_1, \tau_2, \dots, \tau_k)$,
    contradicting the assumption that $\pi$ is LKO.
  • If e does not equal c, then either e is in W, in
    which case the pair (d, e) is a closer pair in $\pi$
    than (d, c) and also violates XCC(2), or e is
    in L, in which case (e, c) is a closer pair than
    (d, c) that violates XCC(2). Both cases contradict
    the choice of (d, c).

16
Local Kemenization procedure
  • A local Kemenization of a full list $\mu$ with
    respect to the preference lists computes a locally
    Kemeny optimal aggregation that is maximally
    consistent with $\mu$.
  • This approach
  • (1) preserves the strengths of the initial
    aggregation
  • (2) ranks non-spam above spam
  • (3) gives a result that disagrees with $\mu$
    on a pair (i, j) only if a majority
    endorses this disagreement
  • (4) for every d, $1 \le d \le |\mu|$, the restriction
    of the output to the top d elements of $\mu$ is a
    local Kemenization of the top d elements of $\mu$

17
Local Kemenization procedure
  • A simple inductive construction.
  • Assume inductively that we have constructed
    $\pi$, a local Kemenization of the projection of the
    $\tau$'s onto the elements 1, ..., l-1.
  • Insert the next element x into the lowest-ranked
    "permissible" position in $\pi$: just below the
    lowest-ranked element y in $\pi$ such that
  • (a) no majority among the (original) $\tau$'s prefers
    x to y, and
  • (b) for all successors z of y in $\pi$ there is a
    majority that prefers x to z.
  • In other words, we try to insert x at the end
    (bottom) of the list $\pi$; we bubble it up toward
    the top of the list as long as a majority of the
    $\tau$'s insists that we do.
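
The bubbling procedure just described can be sketched as an insertion-sort-style loop. This is a minimal reading of the slide, not the authors' implementation: the function names are hypothetical, and "a majority prefers x to y" is taken to mean that strictly more lists rank x above y than y above x, counting only lists that contain both. The data are the five voter lists and the initial list B A D C E F from the worked example in these slides.

```python
def prefers(taus, x, y):
    """Count the lists that contain both x and y and rank x above y."""
    return sum(
        1 for t in taus
        if x in t and y in t and t.index(x) < t.index(y)
    )


def local_kemenize(mu, taus):
    """Insert the elements of mu one at a time, bubbling each one up
    while strictly more lists prefer it to the element just above."""
    pi = []
    for x in mu:
        i = len(pi)  # start at the bottom of the current list
        while i > 0 and prefers(taus, x, pi[i - 1]) > prefers(taus, pi[i - 1], x):
            i -= 1   # a majority wants x above pi[i - 1]: move up one slot
        pi.insert(i, x)
    return pi


voters = [list(s) for s in ("ABFECD", "BCAEFD", "ACFDEB", "BFDCAE", "CABFED")]
initial = list("BADCEF")
print("".join(local_kemenize(initial, voters)))  # ABCFED
```

Because each new element only moves past elements that a majority ranks below it, the relative order of the already placed elements never changes, which is what keeps the result close to the initial list.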

18
Example: local Kemenization procedure
  • Voter lists:
    A B F E C D
    B C A E F D
    A C F D E B
    B F D C A E
    C A B F E D
  • Initial aggregation: B A D C E F
  • Insertion steps:
    B
    B A → A B  (A > B: 3 voters, A < B: 2 voters)
    A B D  (B > D: 4 voters, B < D: 1 voter)
    A B D C → A B C D
  • Final result: A B C F E D

19
RA and Searching Workplace Web
  • Axiom 1: Intranet documents are not spam
  • Axiom 2: Queries usually have unique answers (not
    broad topic based)
  • Axiom 3: Intranet docs are not search engine
    friendly (docs are accessed through portals and
    database queries)
  • Rank aggregation allows us to combine a number of
    heuristic alternatives: static and dynamic, query
    dependent and independent