Title: Algorithms for Large Data Sets
1. Algorithms for Large Data Sets
Lecture 7, April 20, 2005
http://www.ee.technion.ac.il/courses/049011
2. Rank Aggregation
3. Outline
- The rank aggregation problem
- Applications
- Desired properties
- Arrow's impossibility theorem
- Rank aggregation methods
4. The Rank Aggregation Problem
- m candidates (a.k.a. alternatives)
  - $M = \{1,\ldots,m\}$: set of candidates
- n voters (a.k.a. agents or judges)
  - $N = \{1,\ldots,n\}$: set of voters
- Each voter i has a ranking $\sigma_i$ on M
  - $\sigma_i(a) < \sigma_i(b)$ means the i-th voter prefers a to b
  - A ranking may be a total or partial order
- The rank aggregation problem
  - Combine $\sigma_1,\ldots,\sigma_n$ into a single ranking $\sigma$ on M, which represents the social choice of the voters.
  - Rank aggregation function: $f(\sigma_1,\ldots,\sigma_n) = \sigma$
  - $\sigma$ may be a total or partial order
5. Examples
- m small, n large: elections (multi-party parliament, academies, boards, ...)
- m modest, n small: program committees, sports
- m large, n small: meta-search, travel plans, restaurant selection
6. Applications to Web Search
- Meta-search
  - Combine results of different search engines into a better overall ranking
- Combat spam
  - Spam results are unlikely to rank high in the aggregate ranking, even though they can rank high in one or two search engines.
- Search for multiple terms
  - AND: bad recall
  - OR: bad precision
  - Complex Boolean queries: too complicated for the average user
  - Solution: search for small subsets of terms and aggregate the results
- Combine multiple ranking functions
  - Use different ranking functions (e.g., VSM, PageRank, HITS, ...) and aggregate them into a single function
7. Applications to Databases
- Rank items in a database according to multiple criteria
  - Ex: Choose a restaurant by cuisine, distance, price, quality, etc.
  - Ex: Choose a flight ticket by price, # of stops, date and time, frequent flier bonuses, etc.
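8. Desired Properties: Unanimity
- Unanimity (a.k.a. Pareto optimality)
  - If all voters prefer candidate a to candidate b (i.e., $\sigma_i(a) < \sigma_i(b)$ for all i), then $\sigma$ should also prefer a to b (i.e., $\sigma(a) < \sigma(b)$).

Example (each column is one voter's ranking, top to bottom):

Voter 1   Voter 2   Voter 3
a         c         a
c         a         b
b         b         c

a:b = 3:0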
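The pairwise tally in the example above can be reproduced with a short sketch. It assumes each voter's ranking is stored as a list of candidates from most to least preferred; the function names `pairwise_tally` and `is_unanimous_pair` are hypothetical and not part of the lecture.

```python
def pairwise_tally(rankings, a, b):
    """Return (# voters preferring a to b, # voters preferring b to a)."""
    for_a = sum(1 for r in rankings if r.index(a) < r.index(b))
    return for_a, len(rankings) - for_a


def is_unanimous_pair(rankings, a, b):
    """True if every voter prefers a to b."""
    return pairwise_tally(rankings, a, b)[0] == len(rankings)


# The three voters from the unanimity example (assumed list encoding).
voters = [["a", "c", "b"], ["c", "a", "b"], ["a", "b", "c"]]
print(pairwise_tally(voters, "a", "b"))     # tally for the pair a, b
print(is_unanimous_pair(voters, "a", "b"))  # unanimity forces a above b
```

A unanimous aggregation function would then be required to place a above b whenever `is_unanimous_pair` holds.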
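9. Desired Properties: Condorcet
- Condorcet Criterion [Condorcet, 1785]
  - Condorcet winner: a candidate a which is preferred by most voters to any other candidate b (i.e., for all b, # of i s.t. $\sigma_i(a) < \sigma_i(b)$ is at least n/2).
  - Condorcet criterion: If a Condorcet winner exists, $\sigma$ should rank it first (i.e., $\sigma(a) = 1$).

Example 1 (each column is one voter's ranking, top to bottom):

Voter 1   Voter 2   Voter 3
c         b         a
a         a         b
b         c         c

a:b = 2:1, a:c = 2:1 (a is the Condorcet winner)

Example 2:

Voter 1   Voter 2   Voter 3
c         b         a
a         c         b
b         a         c

No Condorcet winner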
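A minimal sketch of the Condorcet winner test, run on the two examples above. It assumes full rankings stored as lists from most to least preferred and follows the "at least n/2" threshold from the definition; `condorcet_winner` is a hypothetical helper name.

```python
def condorcet_winner(rankings):
    """Return a candidate preferred to every other by at least n/2 voters, or None."""
    candidates = rankings[0]
    n = len(rankings)
    for a in candidates:
        beats_all = all(
            2 * sum(r.index(a) < r.index(b) for r in rankings) >= n
            for b in candidates
            if b != a
        )
        if beats_all:
            return a
    return None


# Example 1: a wins both of its pairwise contests 2:1.
example_1 = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
# Example 2: the pairwise majorities form a cycle.
example_2 = [["c", "a", "b"], ["b", "c", "a"], ["a", "b", "c"]]

print(condorcet_winner(example_1))
print(condorcet_winner(example_2))
```

With an even number of voters the "at least n/2" threshold can admit ties, in which case this sketch simply returns the first qualifying candidate.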
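10. Desired Properties: XCC
- Extended Condorcet Criterion (XCC)
  - If most voters prefer candidate a to candidate b (i.e., # of i s.t. $\sigma_i(a) < \sigma_i(b)$ is at least n/2), then $\sigma$ should also prefer a to b (i.e., $\sigma(a) < \sigma(b)$).
  - Not always realizable

Example 1 (each column is one voter's ranking, top to bottom):

Voter 1   Voter 2   Voter 3
c         b         a
a         a         b
b         c         c

$\sigma(a) < \sigma(b) < \sigma(c)$

Example 2:

Voter 1   Voter 2   Voter 3
c         b         a
a         c         b
b         a         c

Not realizable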
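Whether XCC is realizable can be checked by building the pairwise majority relation and looking for an order consistent with it. The sketch below is one way to do this under stated assumptions: rankings are lists from most to least preferred, only strict majorities impose a constraint, and `xcc_ranking` is a hypothetical name.

```python
from itertools import combinations


def xcc_ranking(rankings):
    """Return an order consistent with every strict pairwise majority,
    or None if the majorities form a cycle."""
    candidates = list(rankings[0])
    n = len(rankings)
    beats = {c: set() for c in candidates}
    for a, b in combinations(candidates, 2):
        a_wins = sum(r.index(a) < r.index(b) for r in rankings)
        if 2 * a_wins > n:
            beats[a].add(b)
        elif 2 * a_wins < n:
            beats[b].add(a)
    order, remaining = [], set(candidates)
    while remaining:
        # Candidates that no remaining candidate beats can be placed next.
        free = sorted(c for c in remaining
                      if not any(c in beats[d] for d in remaining))
        if not free:
            return None  # every remaining candidate is beaten: a cycle
        order.append(free[0])
        remaining.remove(free[0])
    return order


example_1 = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
example_2 = [["c", "a", "b"], ["b", "c", "a"], ["a", "b", "c"]]
print(xcc_ranking(example_1))
print(xcc_ranking(example_2))
```

The second example returns `None` because a beats b, b beats c, and c beats a, so no single ranking can respect all three majorities.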
11. XCC and Spam [Dwork et al. 2001]
- Definition: a page p is said to be spam with respect to a ranking $\sigma$ if there is a page q ranked lower than p which most human evaluators think should be ranked higher than p.
- Assumption: for any two pages p, q, the majority of human evaluators agrees with the majority of search engine rankings on the order of p, q.
- Conclusion
  - Spam pages are always Condorcet losers
  - If a rank aggregation function respects XCC, it eliminates spam.
12. Desired Properties: Independence from Irrelevant Alternatives
- Independence from Irrelevant Alternatives
  - The relative order of a and b in $\sigma$ should depend only on the relative order of a and b in $\sigma_1,\ldots,\sigma_n$.
  - Ex: if $\sigma_i = (a\ b\ c)$ changes to $(a\ c\ b)$, the relative order of a, b in $\sigma$ should not change.
13. Desired Properties: Neutrality and Anonymity
- Neutrality
  - No candidate should be favored over others.
  - If two candidates switch positions in $\sigma_1,\ldots,\sigma_n$, they should also switch positions in $\sigma$.
- Anonymity
  - No voter should be favored over others.
  - If two voters switch their orderings, $\sigma$ should remain the same.
14. Desired Properties: Monotonicity and Consistency
- Monotonicity
  - If the ranking of a candidate is improved by a voter, its ranking in $\sigma$ can only improve.
- Consistency
  - If voters are split into two disjoint sets, S and T, and both the aggregation of voters in S and the aggregation of voters in T prefer a to b, then the aggregation of all voters should also prefer a to b.
15. Dictatorship and Democracy
- Dictatorship: $f(\sigma_1,\ldots,\sigma_n) = \sigma_i$ for some fixed voter i
- Democracy (a.k.a. majoritarian aggregation)
  - Use the Extended Condorcet Criterion to rank candidates.
  - Always works for $m = 2$.
  - Not always realizable for $m \ge 3$.
- Theorem [May, 1952]: For $m = 2$, Democracy is the only rank aggregation function which is monotone, neutral, and anonymous.
16. Arrow's Impossibility Theorem [Arrow, 1951]
- Theorem: If $m \ge 3$, then the only rank aggregation function that is unanimous and independent from irrelevant alternatives is dictatorship.
- Won the Nobel prize (1972)
17. Positional Rank Aggregation Methods
- Plurality
  - score(a) = # of voters who chose a as #1
  - $\sigma$: order candidates by decreasing scores
- Top-k approval
  - score(a) = # of voters who chose a as one of the top k
  - $\sigma$: order candidates by decreasing scores
- Borda's rule [Borda, 1781]
  - score(a) = $\sum_i \sigma_i(a)$
  - $\sigma$: order candidates by increasing scores
- Positional methods violate independence from irrelevant alternatives
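18. Positional Methods: Example

Voter rankings (each column is one voter's ranking, top to bottom):

Voter 1   Voter 2   Voter 3   Voter 4
b         c         a         a
d         b         d         b
c         d         c         c
a         a         b         d

Scores:

Candidate   Borda                 Top-2 Approval   Plurality
a           1 + 1 + 4 + 4 = 10    2                2
b           2 + 4 + 2 + 1 = 9     3                1
c           3 + 3 + 1 + 3 = 10    1                1
d           4 + 2 + 3 + 2 = 11    2                0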
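The scores in this example can be recomputed with a short sketch of the three positional methods. The list encoding of rankings (most preferred first) and the function names are assumptions made for illustration.

```python
def plurality(rankings):
    """score(a) = number of voters who ranked a first."""
    return {c: sum(r[0] == c for r in rankings) for c in rankings[0]}


def top_k_approval(rankings, k):
    """score(a) = number of voters who ranked a among their top k."""
    return {c: sum(c in r[:k] for r in rankings) for c in rankings[0]}


def borda(rankings):
    """score(a) = sum of a's ranks (1 = best), so lower is better."""
    return {c: sum(r.index(c) + 1 for r in rankings) for c in rankings[0]}


# The four voters from the example, one list per column.
voters = [list("bdca"), list("cbda"), list("adcb"), list("abcd")]

for name, scores in [("Borda", borda(voters)),
                     ("Top-2 approval", top_k_approval(voters, 2)),
                     ("Plurality", plurality(voters))]:
    print(name, dict(sorted(scores.items())))
```

Note that the three methods pick different winners here: b has the best Borda score and the most top-2 approvals, while a wins under plurality.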
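19. Optimal Rank Aggregation
- d: distance measure among rankings
- Definition: The optimal rank aggregation for $\sigma_1,\ldots,\sigma_n$ w.r.t. d is the ranking $\sigma$ which minimizes $\sum_i d(\sigma,\sigma_i)$.

[Figure: the rankings $\sigma_1, \sigma_2, \ldots, \sigma_n$ and the aggregate ranking $\sigma$]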
20. Distance Measures
- Kendall tau distance (a.k.a. bubble sort distance)
  - $K(\sigma,\tau)$ = # of pairs of candidates (a,b) on which $\sigma$ and $\tau$ disagree
  - Ex: $K((a\ b\ c\ d), (a\ d\ c\ b)) = 0 + 2 + 1 = 3$
- Spearman footrule distance
  - $F(\sigma,\tau) = \sum_a |\sigma(a) - \tau(a)|$
  - Ex: $F((a\ b\ c\ d), (a\ d\ c\ b)) = 0 + 2 + 0 + 2 = 4$
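Both distances are easy to compute directly from their definitions. The following sketch assumes two full rankings over the same candidates, each given as a sequence from most to least preferred; it reproduces the two worked examples.

```python
from itertools import combinations


def kendall_tau(s, t):
    """Number of candidate pairs that s and t order differently."""
    pos_t = {c: i for i, c in enumerate(t)}
    # combinations(s, 2) yields (a, b) with a ahead of b in s.
    return sum(pos_t[a] > pos_t[b] for a, b in combinations(s, 2))


def footrule(s, t):
    """Sum over candidates of the absolute difference in rank."""
    pos_t = {c: i for i, c in enumerate(t)}
    return sum(abs(i - pos_t[c]) for i, c in enumerate(s))


print(kendall_tau("abcd", "adcb"))  # pairs (b,c), (b,d), (c,d) disagree
print(footrule("abcd", "adcb"))     # b and d each move by two positions
```

This quadratic-time Kendall tau is meant for clarity; a merge-sort based inversion count would bring it down to O(m log m).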
21. Kemeny Optimal Aggregation [Kemeny 1959]
- Optimal aggregation w.r.t. Kendall tau distance
- Theorem [Young & Levenglick, 1978] [Truchon 1998]: Kemeny optimal aggregation is the only rank aggregation function which is neutral, consistent, and satisfies the Extended Condorcet Criterion.
- Effective for fighting spam
- Generative model
  - $\sigma$ is the correct ranking
  - $\sigma_1,\ldots,\sigma_n$ are generated from $\sigma$ by swapping every pair with probability $< 1/2$.
  - Then Kemeny optimal aggregation gives the maximum likelihood $\sigma$ given $\sigma_1,\ldots,\sigma_n$. [Young 1988]
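For very small m, a Kemeny optimal aggregation can be found by exhaustive search over all m! rankings. The sketch below does exactly that; it is only an illustration of the definition under an assumed list encoding of rankings, not a practical algorithm, and `kemeny_optimal` is a hypothetical name.

```python
from itertools import combinations, permutations


def kendall_tau(s, t):
    """Number of candidate pairs that s and t order differently."""
    pos_t = {c: i for i, c in enumerate(t)}
    return sum(pos_t[a] > pos_t[b] for a, b in combinations(s, 2))


def kemeny_cost(candidate_ranking, rankings):
    """Total Kendall tau distance from one ranking to all voters."""
    return sum(kendall_tau(candidate_ranking, r) for r in rankings)


def kemeny_optimal(rankings):
    """Brute-force search over all m! rankings (feasible for tiny m only)."""
    return min(permutations(rankings[0]),
               key=lambda p: kemeny_cost(p, rankings))


voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
best = kemeny_optimal(voters)
print(best, kemeny_cost(best, voters))
```

On these three voters the minimizer agrees with every pairwise majority, which is what the Extended Condorcet Criterion requires.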
22. Complexity of Kemeny Optimal Aggregation
- NP-hard, even for $n = 4$ [Dwork et al. 2001]
- In P for $n = 2$.
- Unknown for $n = 3$.
- Can be approximated using Spearman footrule
  - Proposition [Diaconis-Graham]: $K(\sigma,\tau) \le F(\sigma,\tau) \le 2K(\sigma,\tau)$
- What is the complexity of footrule optimal aggregation?
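The Diaconis-Graham inequality can be sanity-checked numerically for small m. This is only an empirical check, not a proof; it assumes full rankings and relies on the fact that both distances are unchanged when the candidates are relabeled, so comparing every permutation against the identity covers all pairs.

```python
from itertools import combinations, permutations


def kendall_tau(s, t):
    """Number of candidate pairs that s and t order differently."""
    pos_t = {c: i for i, c in enumerate(t)}
    return sum(pos_t[a] > pos_t[b] for a, b in combinations(s, 2))


def footrule(s, t):
    """Sum over candidates of the absolute difference in rank."""
    pos_t = {c: i for i, c in enumerate(t)}
    return sum(abs(i - pos_t[c]) for i, c in enumerate(s))


def diaconis_graham_holds(m):
    """Check K <= F <= 2K for every ranking of m candidates vs. the identity."""
    base = tuple(range(m))
    for t in permutations(base):
        k, f = kendall_tau(base, t), footrule(base, t)
        if not (k <= f <= 2 * k):
            return False
    return True


print(all(diaconis_graham_holds(m) for m in range(1, 7)))
```

The inequality is what makes a footrule optimal aggregation a factor-2 approximation of the Kemeny optimal one.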
23. Footrule Optimal Aggregation
- Theorem [Dwork et al. 2001]: Footrule optimal aggregation can be computed in polynomial time.
- Proof
  - Want to find $\sigma$ which minimizes $\sum_i \sum_a |\sigma(a) - \sigma_i(a)|$
  - Define a weighted bipartite graph $G = (L,R,W)$ as follows:
    - $L = M$ (the candidates)
    - $R = \{1,\ldots,m\}$ (the available ranks)
    - $W(a,r) = \sum_i |r - \sigma_i(a)|$
  - A perfect matching in G = a ranking
  - Cost of a matching = $\sum_i \sum_a |\sigma(a) - \sigma_i(a)|$
  - Hence, the problem reduces to finding a minimum cost perfect matching in a bipartite graph
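The reduction above can be sketched end to end. The code builds the weight matrix W(a, r) and solves the assignment problem with a compact pure-Python Hungarian algorithm, chosen here only to keep the example dependency-free; a library solver such as SciPy's `linear_sum_assignment` could be substituted. The list encoding of rankings and all function names are assumptions.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square matrix, O(m^3).
    Returns a list where entry i is the column assigned to row i."""
    n = len(cost)
    INF = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)   # dual potentials
    p, way = [0] * (n + 1), [0] * (n + 1)  # p[j] = row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def footrule_optimal(rankings):
    """Return (ranking, total footrule distance) via min-cost matching."""
    candidates = list(rankings[0])
    m = len(candidates)
    # W[a][r] = sum over voters of |r - rank of a| (0-based ranks on both sides).
    W = [[sum(abs(r - rk.index(a)) for rk in rankings) for r in range(m)]
         for a in candidates]
    assignment = hungarian(W)
    result = [None] * m
    for a_idx, r in enumerate(assignment):
        result[r] = candidates[a_idx]
    total = sum(W[a_idx][r] for a_idx, r in enumerate(assignment))
    return result, total


voters = [list("bdca"), list("cbda"), list("adcb"), list("abcd")]
print(footrule_optimal(voters))
```

Because several rankings can share the minimum cost, the returned ranking is one optimum among possibly many.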
24. Local Kemenization [Dwork et al. 2001]
- Definition: A ranking $\sigma$ is a locally Kemeny optimal aggregation for $\sigma_1,\ldots,\sigma_n$ if there is no other ranking $\sigma'$ which
  - Can be obtained from $\sigma$ by flipping one pair
  - Satisfies $\sum_i K(\sigma',\sigma_i) < \sum_i K(\sigma,\sigma_i)$
- Features
  - Every Kemeny optimal aggregation is also locally Kemeny optimal, but the converse is not necessarily true.
  - Locally Kemeny optimal aggregations satisfy XCC.
  - Locally Kemeny optimal aggregations can be computed in $O(n m \log m)$ time.
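One way to obtain a locally Kemeny optimal ranking is to start from any initial ranking and insert its candidates one by one, letting each new candidate bubble up while a strict majority prefers it to the candidate directly above. The sketch below follows that idea under two assumptions that are not stated on the slide: the flipped pair is taken to be adjacent, and the initial ranking comes from some other aggregation method. The function names are hypothetical, and this simple version does not attempt to meet the O(n m log m) bound.

```python
def majority_prefers(rankings, a, b):
    """True if a strict majority of voters ranks a above b."""
    wins = sum(r.index(a) < r.index(b) for r in rankings)
    return 2 * wins > len(rankings)


def local_kemenize(initial, rankings):
    """Insert candidates in the order of `initial`, bubbling each one up
    past the candidate above it while a majority prefers the newcomer."""
    result = []
    for x in initial:
        pos = len(result)
        while pos > 0 and majority_prefers(rankings, x, result[pos - 1]):
            pos -= 1
        result.insert(pos, x)
    return result


voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
print(local_kemenize(["c", "b", "a"], voters))
```

In the output no adjacent pair is ordered against a strict majority, so swapping any adjacent pair cannot lower the total Kendall tau distance.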
25. Markov Chain Techniques [Dwork et al. 2001]
- Markov chain states = candidates
- Transitions depend on the voter rankings
- Basic idea: probabilistically switch to a better candidate
- Final ranking is induced by the stationary distribution
26. Four MC Methods
- Current state is candidate a.
- MC1: Choose uniformly from the multiset of all candidates that were ranked at least as high as a by some voter.
  - Probability to stay at a depends on the average rank of a.
- MC2: Choose a voter i u.a.r. and pick u.a.r. from among the candidates that the i-th voter ranked at least as high as a.
- MC3: Choose a voter i u.a.r. and pick u.a.r. a candidate b. If the i-th voter ranked b higher than a, go to b. Otherwise, stay in a.
- MC4: Choose a candidate b u.a.r. If most voters ranked b higher than a, go to b. Otherwise, stay in a.
  - Rank of a depends on the # of pairwise contests a wins.
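As an illustration of the Markov chain approach, the sketch below builds the MC4 transition matrix and ranks candidates by an approximate stationary distribution computed with power iteration. The small uniform `teleport` term is an assumption added here so that the iteration converges even when the chain is not ergodic; it is not part of the MC4 description above, and all function names are hypothetical.

```python
def mc4_matrix(rankings):
    """Row-stochastic MC4 matrix: from a, pick b uniformly at random and
    move to b only if a strict majority ranks b above a."""
    candidates = list(rankings[0])
    m, n = len(candidates), len(rankings)
    P = [[0.0] * m for _ in range(m)]
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i != j:
                b_wins = sum(r.index(b) < r.index(a) for r in rankings)
                if 2 * b_wins > n:
                    P[i][j] = 1.0 / m
        P[i][i] = 1.0 - sum(P[i])  # remaining mass: stay at a
    return candidates, P


def stationary(P, teleport=0.05, iters=1000):
    """Power iteration with a small uniform teleport (an added assumption)."""
    m = len(P)
    x = [1.0 / m] * m
    for _ in range(iters):
        nxt = [sum(x[i] * P[i][j] for i in range(m)) for j in range(m)]
        x = [(1 - teleport) * val + teleport / m for val in nxt]
    return x


def mc4_ranking(rankings):
    """Order candidates by decreasing stationary probability."""
    candidates, P = mc4_matrix(rankings)
    pi = stationary(P)
    return [c for _, c in sorted(zip(pi, candidates), key=lambda t: -t[0])]


voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
print(mc4_ranking(voters))
```

Candidates that win more pairwise contests are harder to leave, so they accumulate more stationary probability and end up higher in the final ranking.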
27. End of Lecture 7