Rank Aggregation Methods II Experiments - PowerPoint PPT Presentation

Slides: 20

Provided by: 6649195

Learn more at: https://eecs.ceas.uc.edu

Transcript and Presenter's Notes

1
Rank Aggregation Methods II: Experiments

CS728
Lecture 12

2
Recall the Rank Aggregation Problem

m candidates (a.k.a. alternatives)
M = {1, ..., m}: set of candidates
n voters (a.k.a. agents or judges)
N = {1, ..., n}: set of voters
Each voter i has a ranking σi on M
σi(a) < σi(b) means the i-th voter prefers a to b
A ranking may be a total or partial order
The rank aggregation problem:
Combine σ1, ..., σn into a single ranking σ on M, which represents the social choice of the voters.
Rank aggregation function: f(σ1, ..., σn) = σ
σ may be a total or partial order

3
Experiments: Distance Measures

Goal: Quantitatively compare different rank aggregation methods.
Performance Measures
(1) Spearman footrule distance is the sum, over all elements, of the pointwise distances between an element's positions in the two lists. Dividing by the maximum value (1/2)|S|^2, where S is the set of ranked elements, gives a normalized value between 0 and 1.
(2) Kendall tau distance counts the number of pairwise disagreements between two lists. Dividing by the maximum possible value (1/2)|S|(|S| - 1) gives a normalized value between 0 and 1.
(3) The induced footrule distance between a full list s and a partial list t is the footrule distance between t and the projection of s onto the elements of t. The induced Kendall tau distance is defined in the same way.
(4) The scaled footrule distance weights contributions of elements based on the length of the lists they are present in. If s is a full list and t is a partial list, then
$SF(s, t) = \sum_{i \in t} \left| \frac{s(i)}{|s|} - \frac{t(i)}{|t|} \right|$.
Normalize SF by dividing by |t|/2.
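
The four measures above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments: the function names are made up for this example, lists are assumed to be Python lists of distinct items with at least two elements, and positions are counted from 1.

```python
from itertools import combinations


def positions(lst):
    """Map each element to its 1-based position in the list."""
    return {x: r for r, x in enumerate(lst, start=1)}


def footrule(s, t):
    """Normalized Spearman footrule between two lists over the same elements."""
    ps, pt = positions(s), positions(t)
    raw = sum(abs(ps[x] - pt[x]) for x in s)
    return raw / (len(s) ** 2 / 2)  # divide by (1/2)|S|^2


def kendall_tau(s, t):
    """Normalized Kendall tau: fraction of pairs ordered differently."""
    ps, pt = positions(s), positions(t)
    n = len(s)
    disagreements = sum(
        1 for a, b in combinations(s, 2)
        if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0
    )
    return disagreements / (n * (n - 1) / 2)


def project(s, t):
    """Restrict the full list s to the elements that appear in t."""
    keep = set(t)
    return [x for x in s if x in keep]


def induced_footrule(s, t):
    """Footrule between the partial list t and the projection of s onto t."""
    return footrule(project(s, t), t)


def scaled_footrule(s, t):
    """Scaled footrule SF(s, t), normalized by |t|/2."""
    ps, pt = positions(s), positions(t)
    raw = sum(abs(ps[x] / len(s) - pt[x] / len(t)) for x in t)
    return raw / (len(t) / 2)


full = ["a", "b", "c", "d"]
print(footrule(full, full[::-1]))           # reversed list: maximum distance
print(kendall_tau(full, full[::-1]))        # every pair disagrees
print(induced_footrule(full, ["a", "c"]))   # same relative order
print(scaled_footrule(full, ["a", "c"]))    # relative positions still differ
```

Note that the scaled footrule can be nonzero even when the partial list agrees with the full list on every pair, because it compares relative positions rather than order.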

4
Experiments: Distance Measures

So for each aggregation method and each distance measure we get a vector of values, each component representing the distance from the aggregation to one voter list.
Simplest is to take the average (1-norm).
Other norms are also interesting:
Mean square distance (2-norm)
Max distance (∞-norm)
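
As a small sketch of how the vector of per-voter distances might be summarized, the function below computes the three norms named above. The name summarize_distances is hypothetical, and taking the square root of the mean of squares for the 2-norm summary is an assumption; the slides do not say which scaling was used.

```python
import math


def summarize_distances(distances):
    """Summarize per-voter distances with scaled 1-, 2-, and infinity-norms."""
    d = list(distances)
    return {
        "average": sum(d) / len(d),                                 # 1-norm / n
        "mean_square": math.sqrt(sum(x * x for x in d) / len(d)),   # assumed RMS
        "max": max(d),                                              # infinity-norm
    }


# Hypothetical distances from one aggregate ranking to three voter lists.
print(summarize_distances([0.25, 0.5, 0.75]))
```

The average rewards methods that are close to most voters, while the max penalizes a method that is far from even a single voter.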

5
Experiments: Minimizing Average

Search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)
Legend: K = Kendall distance, SF = scaled footrule distance, IF = induced footrule distance, LK = Local Kemenization
[Results table not preserved in this transcript]

6
Experiments in Spam Filtering

Define spam to be web pages that are ranked low by majority opinion (machine and human; a simplifying assumption), although they may be ranked highly by some search engines.
Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam (garbage in, garbage out).
Spam pages are the Condorcet losers, and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion.
Similarly, good pages will be among the Condorcet winners, and will rank above the losers.

7

Condorcet Criteria

Condorcet Criterion
A candidate of M that beats every other candidate in pairwise simple-majority voting should be ranked first.
Extended Condorcet Criterion (XCC)
Version 1: If most voters prefer candidate a to candidate b (i.e., the number of voters i such that σi(a) < σi(b) is at least n/2), then σ should also prefer a to b (i.e., σ(a) < σ(b)).
Version 2: If there is a partition (W, L) of M such that for any x in W and y in L the majority prefers x to y, then x must be ranked above y. The elements of W are called Condorcet winners and the elements of L are called Condorcet losers.
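
The criteria above can be checked mechanically on small profiles. The sketch below is illustrative only: the helper names are invented for this example, and "majority" is taken to mean strictly more voters one way than the other among voters who rank both candidates, which is an assumption about how ties and partial lists are handled. The cyclic profile at the end is the classic case where no Condorcet winner exists and no ranking can satisfy Version 1.

```python
from itertools import permutations


def majority_prefers(rankings, a, b):
    """True if strictly more voters rank a above b than b above a."""
    a_wins = b_wins = 0
    for r in rankings:
        if a in r and b in r:
            if r.index(a) < r.index(b):
                a_wins += 1
            else:
                b_wins += 1
    return a_wins > b_wins


def condorcet_winner(rankings, candidates):
    """Return the candidate that beats every other one pairwise, or None."""
    for c in candidates:
        if all(majority_prefers(rankings, c, d) for d in candidates if d != c):
            return c
    return None


def satisfies_xcc1(rankings, aggregate):
    """Check Version 1: every pairwise majority is respected by the aggregate."""
    pos = {x: i for i, x in enumerate(aggregate)}
    return all(
        pos[a] < pos[b]
        for a in aggregate for b in aggregate
        if a != b and majority_prefers(rankings, a, b)
    )


clear = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
cycle = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]

print(condorcet_winner(clear, "abc"))  # 'a' beats both b and c
print(condorcet_winner(cycle, "abc"))  # None: majorities form a cycle
print(any(satisfies_xcc1(cycle, list(p)) for p in permutations("abc")))  # False
```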

8
XCC(2) and SPAM Filtering

Note that XCC(1) implies XCC(2), so Version 1 is stronger.
But XCC(1) is not always realizable.
As we will see, XCC(2) is always realizable via local Kemenization.
Hence using rank aggregation with XCC(2) should assist in SPAM filtering, since Condorcet losers will be ranked lowest.
Let us look at where spam pages (as determined by humans) are ranked by good aggregation methods.

9
Experiments: Filtering SPAM

10
Experiment: Word Association

Different search engines and portals have different (default) semantics for handling a multi-word query.
Some use OR semantics (documents contain at least one of the given query terms), while Google uses AND semantics (all the query words must appear). Both are inconvenient in many situations.
Consider searching for a software engineering job in an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document or SPAM, and the "OR" rule is equally disastrous.
Experiment with rank aggregation using multiple queries based on small subsets of terms.
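
One way to generate such sub-queries is to enumerate all term subsets of a fixed size. The sketch below assumes that approach; the slides do not say how the subsets were chosen, and the function name subset_queries is made up for this illustration. Each generated query would be sent to a search engine and the resulting lists aggregated.

```python
from itertools import combinations


def subset_queries(terms, size):
    """Build one query string per subset of `size` terms (order preserved)."""
    return [" ".join(subset) for subset in combinations(terms, size)]


terms = ["madras", "madurai", "coimbatore", "vellore"]
for query in subset_queries(terms, 2):
    print(query)
```

With four terms and subsets of size two, this produces six queries, each small enough that AND semantics is unlikely to return an empty result.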

11
Results for query "madras madurai coimbatore vellore" (cities in the state of Tamil Nadu, India)

Google:
www.mssrf.org/Fris9809/location-tamilnadu.html
www.indiaplus.com/Info/schools.html
www.focustamilnadu.com/tamilnadu/Policy20Note ...Forests.html
www.tn.gov.in/policy/environ.htm
www.indiacolleges.com/Tamil_Nadu.htm

SFO with LK:
www.madurai.com
www.ozemail.com.au/clday/locations.htm
www.utoledo.edu/homepages/speelam/coimbatore.html
www.ozemail.com.au/clday/madras.htm
www.madurai.com/around.htm
www.indiatraveltimes.com/tamilnadu/tamil1.html

MC4 with LK:
www.madurai.com
www.surfindia.com/omsakthi/tourism.htm
www.indiatraveltimes.com/tamilnadu/tamil1.html
www.indiatraveltimes.com/tamilnadu/tamil2.html
www.indiatravels.com/forts/vellore_fort.htm
www.india-tourism.de/english/south/tamil_nadu.html

12
Locally Kemeny optimal aggregation and XCC(2)

Many existing aggregation methods do not satisfy XCC(1) or XCC(2).
It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to realize XCC(2), which filters Condorcet losers.

13
Locally Kemeny optimal

Recall that computing a Kemeny optimal aggregation is NP-hard.
Definition of locally optimal: A permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2, ..., tk if there is no permutation p' that can be obtained from p by performing a single transposition of an adjacent pair of elements and for which the Kendall distance satisfies
K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk).
In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.
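
The definition translates directly into a check that tries every adjacent transposition. This is a minimal sketch with invented function names; K is taken to be the sum, over the partial lists, of the number of pairs in each list that p orders the opposite way.

```python
def kendall_partial(p, t):
    """Count pairs ranked by partial list t that permutation p reverses."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(
        1
        for i in range(len(t))
        for j in range(i + 1, len(t))
        if pos[t[i]] > pos[t[j]]
    )


def total_kendall(p, lists):
    """K(p, t1, ..., tk): total Kendall distance from p to all partial lists."""
    return sum(kendall_partial(p, t) for t in lists)


def is_locally_kemeny_optimal(p, lists):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_kendall(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if total_kendall(q, lists) < base:
            return False
    return True


votes = [("a", "b"), ("a", "b"), ("b", "a")]
print(is_locally_kemeny_optimal(["b", "a"], votes))  # False: swapping helps
print(is_locally_kemeny_optimal(["a", "b"], votes))  # True
```

The check costs one distance evaluation per adjacent pair, so it runs in polynomial time, in contrast to finding a Kemeny optimal aggregation.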

14
Example of LKO but not KO

Example 1
t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1).
p = (1,2,3). We have that p satisfies the definition of LKO, with K(p, t1, t2, ..., t5) = 3, but transposing 1 and 3 decreases the sum to 2.
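
Example 1 is small enough to verify by brute force over all six permutations. The sketch below uses hypothetical helper names and is only practical for tiny candidate sets, since it enumerates every permutation.

```python
from itertools import permutations


def total_kendall(p, lists):
    """Total number of pairs, over all partial lists, that p reverses."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(
        1
        for t in lists
        for i in range(len(t))
        for j in range(i + 1, len(t))
        if pos[t[i]] > pos[t[j]]
    )


lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]

print(total_kendall((1, 2, 3), lists))  # 3, the value for p
print(total_kendall((2, 1, 3), lists))  # 4, adjacent swap is worse
print(total_kendall((1, 3, 2), lists))  # 4, adjacent swap is worse
print(total_kendall((3, 2, 1), lists))  # 2, after transposing 1 and 3

# Exhaustive search for the smallest achievable total distance.
best = min(permutations((1, 2, 3)), key=lambda q: total_kendall(q, lists))
print(best, total_kendall(best, lists))
```

Both adjacent swaps of p increase the distance, so p is locally Kemeny optimal, yet the exhaustive search finds permutations with a strictly smaller total, so p is not Kemeny optimal.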

15
LKO satisfies XCC(2)

Proof by contradiction: If the result is false, then there exist partial lists t1, t2, ..., tk, an LKO aggregation p, and a partition (W, L) for which p violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c). Let (d, c) be the closest such pair in p.
Consider the immediate successor of d in p, call it e. If e = c, then c is adjacent to d in p. Since a majority prefers c to d, transposing this adjacent pair of alternatives produces a p' such that K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk), contradicting the assumption that p is LKO.
If e does not equal c, then either e is in W, in which case the pair (d, e) is a closer pair in p than (d, c) and also violates XCC(2), or e is in L, in which case (e, c) is a closer pair than (d, c) that violates XCC(2). Both cases contradict the choice of (d, c).

16
Local Kemenization procedure

A local Kemenization of a full list µ with respect to preference lists t1, ..., tk is a procedure that computes a locally Kemeny optimal aggregation of the t's that is maximally consistent with µ.
This approach
(1) preserves the strengths of the initial aggregation µ
(2) ranks non-spam above spam
(3) gives a result that disagrees with µ on any pair (i, j) only if a majority of the t's endorses this disagreement
(4) for every d, 1 ≤ d ≤ |µ|, the restriction of the output to the top d elements of µ is a local Kemenization of the top d elements of µ

17
Local Kemenization procedure

A simple inductive construction.
Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the first l-1 elements of the initial list.
Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that
(a) no majority among the (original) t's prefers x to y, and
(b) for all successors z of y in p, there is a majority that prefers x to z.
In other words, we try to insert x at the end (bottom) of the list p, then bubble it up toward the top of the list as long as a majority of the t's insists that we do.
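
The insertion procedure can be sketched as a variant of insertion sort driven by pairwise majorities. The function names are invented for this illustration, and "majority" is assumed to mean strictly more lists one way than the other among lists that contain both elements; the slides do not specify tie handling. The data are the five voter lists and the initial list B A D C E F from the example slide.

```python
def majority_prefers(lists, x, y):
    """True if strictly more lists rank x above y than y above x."""
    x_wins = y_wins = 0
    for t in lists:
        if x in t and y in t:
            if t.index(x) < t.index(y):
                x_wins += 1
            else:
                y_wins += 1
    return x_wins > y_wins


def local_kemenization(initial, lists):
    """Insert elements of `initial` one by one, bubbling each up while a
    majority of the lists prefers it to the element directly above it."""
    result = []
    for x in initial:
        result.append(x)  # start at the bottom of the current list
        i = len(result) - 1
        while i > 0 and majority_prefers(lists, x, result[i - 1]):
            result[i - 1], result[i] = result[i], result[i - 1]
            i -= 1
    return result


voters = [
    list("ABFECD"),
    list("BCAEFD"),
    list("ACFDEB"),
    list("BFDCAE"),
    list("CABFED"),
]
initial = list("BADCEF")

print(" ".join(local_kemenization(initial, voters)))  # A B C F E D
```

Because each new element only moves past elements it beats by majority, the relative order of earlier elements never changes, which is what keeps the output close to the initial list.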

18
Example: Local Kemenization Procedure

Voter lists:
A B F E C D
B C A E F D
A C F D E B
B F D C A E
C A B F E D

Initial aggregation:
B A D C E F

Construction:
B
B A
A B
A B D
A B D C
A B C D
A B C F E D

Pairwise vote counts:
A > B: 3, A < B: 2 (majority disagrees with the initial order B A)
B > D: 4, B < D: 1

19
RA and Searching the Workplace Web

Axiom 1: Intranet documents are not spam
Axiom 2: Queries usually have unique answers (not broad-topic based)
Axiom 3: Intranet docs are not search-engine friendly (docs are accessed through portals and database queries)
Rank aggregation allows us to combine a number of heuristic alternatives: static and dynamic, query-dependent and query-independent
