1
Algorithms for Large Data Sets

Ziv Bar-Yossef

Lecture 7, April 20, 2005
http://www.ee.technion.ac.il/courses/049011
2
Rank Aggregation
3
Outline

The rank aggregation problem
Applications
Desired properties
Arrow's impossibility theorem
Rank aggregation methods

4
The Rank Aggregation Problem

m candidates (a.k.a. alternatives)
M = {1,…,m}: set of candidates
n voters (a.k.a. agents or judges)
N = {1,…,n}: set of voters
Each voter i has a ranking σ_i on M
σ_i(a) < σ_i(b) means the i-th voter prefers a to b
Ranking may be a total or partial order
The rank aggregation problem:
Combine σ_1,…,σ_n into a single ranking σ on M,
which represents the social choice of the
voters.
Rank aggregation function: f(σ_1,…,σ_n) = σ
σ may be a total or partial order

5
Examples

m small, n large: elections (multi-party
parliament, academies, boards, ...)
m modest, n small: program committees, sports
m large, n small: meta-search, travel plans,
restaurant selection

6
Applications to Web Search

Meta search
Combine results of different search engines into
a better overall ranking
Combat spam
Spam results unlikely to rank high in aggregate
ranking, even though they can rank high in one or
two search engines.
Search for multiple terms
AND: bad recall
OR: bad precision
Complex boolean queries: too complicated for the
average user
Solution: search for small subsets of terms and
aggregate results
Combine multiple ranking functions
Use different ranking functions (e.g., VSM,
PageRank, HITS, ...) and aggregate them into a
single function

7
Applications to Databases

Rank items in a database according to multiple
criteria
Ex: Choose a restaurant by cuisine, distance,
price, quality, etc.
Ex: Choose a flight ticket by price, # of stops,
date and time, frequent flier bonuses, etc.

8
Desired Properties: Unanimity

Unanimity (a.k.a. Pareto optimality):
If all voters prefer candidate a to candidate b
(i.e., σ_i(a) < σ_i(b) for all i), then σ
should also prefer a to b (i.e., σ(a) < σ(b)).

Example (each column is one voter's ranking, top = most preferred):
a c a
c a b
b b c
a vs. b: 3:0
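
The unanimity check can be sketched in a few lines. This is a minimal illustration, assuming each ranking is encoded as a Python list with the most preferred candidate first; the function names unanimous_pairs and is_unanimous are hypothetical and not part of the lecture.

```python
from itertools import permutations

def unanimous_pairs(rankings):
    """Return all ordered pairs (a, b) such that every voter ranks a above b."""
    pos = [{c: r for r, c in enumerate(ranking, start=1)} for ranking in rankings]
    candidates = rankings[0]
    return {
        (a, b)
        for a, b in permutations(candidates, 2)
        if all(p[a] < p[b] for p in pos)
    }

def is_unanimous(aggregate, rankings):
    """Check that the aggregate ranking respects every unanimous preference."""
    agg = {c: r for r, c in enumerate(aggregate, start=1)}
    return all(agg[a] < agg[b] for a, b in unanimous_pairs(rankings))

# The three voters from the example on this slide (assumed list encoding).
voters = [["a", "c", "b"], ["c", "a", "b"], ["a", "b", "c"]]
print(unanimous_pairs(voters))  # {('a', 'b')}
```

Only the pair (a, b) is unanimous here, so any aggregate that places a above b passes the check.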
9
Desired Properties: Condorcet

Condorcet Criterion [Condorcet, 1785]
Condorcet winner: a candidate a which is
preferred by most voters to any other candidate b
(i.e., for all b, the # of i s.t. σ_i(a) < σ_i(b) is at
least n/2).
Condorcet criterion: If a Condorcet winner exists,
σ should rank it first (i.e., σ(a) = 1).

Example 1 (each column is one voter's ranking, top = most preferred):
c b a
a a b
b c c
a vs. b: 2:1, a vs. c: 2:1 (a is the Condorcet winner)

Example 2:
c b a
a c b
b a c
No Condorcet winner
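
The two examples can be checked mechanically. The sketch below assumes rankings are lists with the most preferred candidate first, and it uses a strict majority (more than n/2 voters), which coincides with the slide's definition when n is odd; condorcet_winner is a hypothetical helper name.

```python
def condorcet_winner(rankings):
    """Return the candidate that beats every rival by strict majority, or None."""
    pos = [{c: r for r, c in enumerate(ranking)} for ranking in rankings]
    candidates = rankings[0]
    n = len(rankings)
    for a in candidates:
        # a wins if, against each rival b, more than half of the voters rank a above b.
        if all(
            2 * sum(p[a] < p[b] for p in pos) > n
            for b in candidates
            if b != a
        ):
            return a
    return None

# Columns of the two example tables, read top to bottom (assumed encoding).
example_1 = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
example_2 = [["c", "a", "b"], ["b", "c", "a"], ["a", "b", "c"]]
print(condorcet_winner(example_1))  # a
print(condorcet_winner(example_2))  # None
```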
10
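Desired Properties: XCC

Extended Condorcet Criterion (XCC):
If most voters prefer candidate a to candidate b
(i.e., the # of i s.t. σ_i(a) < σ_i(b) is at least
n/2), then σ should also prefer a to b (i.e.,
σ(a) < σ(b)).
Not always realizable

Example 1 (each column is one voter's ranking, top = most preferred):
c b a
a a b
b c c
σ(a) < σ(b) < σ(c)

Example 2:
c b a
a c b
b a c
Not realizable (the majority preferences form a cycle: a over b, b over c, c over a)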
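
Whether XCC is realizable for a given profile comes down to whether the pairwise majority relation has a cycle. A minimal sketch, assuming list-encoded rankings (most preferred first) and strict majorities; majority_pairs and xcc_ranking are hypothetical names. It repeatedly removes a candidate that no remaining candidate beats, which is a plain topological sort.

```python
from itertools import permutations

def majority_pairs(rankings):
    """Ordered pairs (a, b) where a strict majority of voters ranks a above b."""
    pos = [{c: r for r, c in enumerate(ranking)} for ranking in rankings]
    n = len(rankings)
    return {
        (a, b)
        for a, b in permutations(rankings[0], 2)
        if 2 * sum(p[a] < p[b] for p in pos) > n
    }

def xcc_ranking(rankings):
    """Return a ranking that respects all majority preferences, or None on a cycle."""
    beats = majority_pairs(rankings)
    remaining = list(rankings[0])
    order = []
    while remaining:
        # Pick a candidate that is not beaten by any candidate still remaining.
        top = next(
            (a for a in remaining if not any((b, a) in beats for b in remaining)),
            None,
        )
        if top is None:
            return None  # every remaining candidate is beaten: majority cycle
        order.append(top)
        remaining.remove(top)
    return order

example_1 = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
example_2 = [["c", "a", "b"], ["b", "c", "a"], ["a", "b", "c"]]
print(xcc_ranking(example_1))  # ['a', 'b', 'c']
print(xcc_ranking(example_2))  # None
```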
11
XCC and Spam [Dwork et al. 2001]

Definition: a page p is said to be spam with respect to a ranking
σ if there is a page q ranked lower than p
which most human evaluators think should be
ranked higher than p.
Assumption: for any two pages p,q, the majority of
human evaluators agrees with the majority of search
engine rankings on the order of p,q.
Conclusion:
Spam pages are always Condorcet losers
If a rank aggregation function respects XCC, it
eliminates spam.

12
Desired Properties: Independence from Irrelevant
Alternatives

Independence from Irrelevant Alternatives:
Relative order of a and b in σ should depend
only on the relative order of a and b in σ_1,…,σ_n.
Ex: if σ_i = (a b c) changes to (a c b), the relative
order of a,b in σ should not change.

13
Desired Properties: Neutrality and Anonymity

Neutrality:
No candidate should be favored over others.
If two candidates switch positions in σ_1,…,σ_n,
they should also switch positions in σ.
Anonymity:
No voter should be favored over others.
If two voters switch their orderings, σ should
remain the same.

14
Desired Properties: Monotonicity and Consistency

Monotonicity:
If the ranking of a candidate is improved by a
voter, its ranking in σ can only improve.
Consistency:
If voters are split into two disjoint sets, S
and T, and both the aggregation of voters in S
and the aggregation of voters in T prefer a to b,
then the aggregation of all voters should also
prefer a to b.

15
Dictatorship and Democracy

Dictatorship: f(σ_1,…,σ_n) = σ_i for some fixed voter i
Democracy (a.k.a. majoritarian aggregation):
Use the Extended Condorcet Criterion to rank
candidates.
Always works for m = 2.
Not always realizable for m ≥ 3.
Theorem [May, 1952]: For m = 2, Democracy is the
only rank aggregation function which is monotone,
neutral, and anonymous.

16
Arrow's Impossibility Theorem [Arrow, 1951]

Theorem: If m ≥ 3, then the only rank aggregation
function that is unanimous and independent from
irrelevant alternatives is dictatorship.
Arrow won the Nobel prize in economics (1972)

17
Positional Rank Aggregation Methods

Plurality:
score(a) = # of voters who chose a as #1
σ: order candidates by decreasing scores
Top-k approval:
score(a) = # of voters who chose a as one of the
top k
σ: order candidates by decreasing scores
Borda's rule [Borda, 1781]:
score(a) = Σ_i σ_i(a)
σ: order candidates by increasing scores
Positional methods violate independence from irrelevant alternatives
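
To make the violation concrete, here is a small sketch with Borda's rule. The two five-voter profiles are hypothetical and were chosen for illustration: the last two voters only move candidate c, so every voter keeps the same relative order of a and b, yet the aggregate order of a and b flips. Rankings are assumed to be lists with the most preferred candidate first.

```python
def borda_order(rankings):
    """Order candidates by increasing Borda score (sum of 1-based ranks)."""
    candidates = sorted(rankings[0])
    score = {c: sum(r.index(c) + 1 for r in rankings) for c in candidates}
    return sorted(candidates, key=score.get)

# Hypothetical profiles: only the position of c changes for the last two voters.
before = [["a", "b", "c"]] * 3 + [["b", "c", "a"]] * 2
after = [["a", "b", "c"]] * 3 + [["b", "a", "c"]] * 2

# Every voter keeps the same relative order of a and b in both profiles.
same_ab_order = all(
    (x.index("a") < x.index("b")) == (y.index("a") < y.index("b"))
    for x, y in zip(before, after)
)

print(same_ab_order)        # True
print(borda_order(before))  # ['b', 'a', 'c']
print(borda_order(after))   # ['a', 'b', 'c']
```

In the first profile the scores are a = 9, b = 8, c = 13; in the second they are a = 7, b = 8, c = 15.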

18
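Positional Methods: Example

Voter rankings (each column is one voter, top = most preferred):
b c a a
d b d b
c d c c
a a b d

Candidate   Borda              Top-2 Approval   Plurality
a           1+1+4+4 = 10       2                2
b           2+4+2+1 = 9        3                1
c           3+3+1+3 = 10       1                1
d           4+2+3+2 = 11       2                0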
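
The scores in this example can be recomputed with a short sketch. It assumes the four voters are the four columns read top to bottom, encoded as lists with the most preferred candidate first; positional_scores is a hypothetical helper name.

```python
def positional_scores(rankings, k=2):
    """Return (Borda, top-k approval, plurality) score dictionaries."""
    candidates = sorted(rankings[0])
    # Borda: sum of 1-based positions over all voters (lower is better).
    borda = {c: sum(r.index(c) + 1 for r in rankings) for c in candidates}
    # Top-k approval: number of voters placing the candidate among their top k.
    top_k = {c: sum(c in r[:k] for r in rankings) for c in candidates}
    # Plurality: number of voters placing the candidate first.
    plurality = {c: sum(r[0] == c for r in rankings) for c in candidates}
    return borda, top_k, plurality

voters = [list("bdca"), list("cbda"), list("adcb"), list("abcd")]
borda, top2, plurality = positional_scores(voters, k=2)
print(borda)      # {'a': 10, 'b': 9, 'c': 10, 'd': 11}
print(top2)       # {'a': 2, 'b': 3, 'c': 1, 'd': 2}
print(plurality)  # {'a': 2, 'b': 1, 'c': 1, 'd': 0}
```

Note that the three methods disagree: Borda puts b first, top-2 approval also puts b first, and plurality puts a first. Ties (such as a and c under Borda) need a separate tie-breaking rule, which this sketch does not choose.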
19
Optimal Rank Aggregation

d: distance measure among rankings
Definition: The optimal rank aggregation for
σ_1,…,σ_n w.r.t. d is the ranking σ which minimizes
Σ_i d(σ,σ_i).

[Figure: the aggregate ranking σ together with the voter rankings σ_1, σ_2, …, σ_n]
20
Distance Measures

Kendall tau distance (a.k.a. bubble sort
distance):
K(σ,τ) = # of pairs of candidates (a,b) on which
σ and τ disagree
Ex: K((a b c d), (a d c b)) = 0 + 2 + 1 = 3
Spearman footrule distance:
F(σ,τ) = Σ_a |σ(a) - τ(a)|
Ex: F((a b c d), (a d c b)) = 0 + 2 + 0 + 2 = 4
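
Both distances are short to compute directly from their definitions. A minimal sketch, assuming rankings are sequences with the most preferred candidate first; kendall_tau and footrule are hypothetical names. The Kendall tau version below is the straightforward quadratic count of discordant pairs.

```python
from itertools import combinations

def kendall_tau(sigma, tau):
    """Number of candidate pairs that sigma and tau order differently."""
    ps = {c: i for i, c in enumerate(sigma)}
    pt = {c: i for i, c in enumerate(tau)}
    # A pair is discordant when the position differences have opposite signs.
    return sum(
        (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0
        for a, b in combinations(sigma, 2)
    )

def footrule(sigma, tau):
    """Sum over candidates of the absolute difference between their positions."""
    ps = {c: i for i, c in enumerate(sigma)}
    pt = {c: i for i, c in enumerate(tau)}
    return sum(abs(ps[c] - pt[c]) for c in sigma)

print(kendall_tau("abcd", "adcb"))  # 3
print(footrule("abcd", "adcb"))     # 4
```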

21
Kemeny Optimal Aggregation [Kemeny 1959]

Optimal aggregation w.r.t. Kendall tau distance
Theorem [Young & Levenglick, 1978; Truchon
1998]: Kemeny optimal aggregation is the only
rank aggregation function which is neutral,
consistent, and satisfies the Extended Condorcet
Criterion.
Effective for fighting spam
Generative model:
σ is the correct ranking
σ_1,…,σ_n are generated from σ by swapping every
pair with probability < ½.
Then Kemeny optimal aggregation gives the
maximum likelihood σ given σ_1,…,σ_n. [Young 1988]
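
For very small m, a Kemeny optimal aggregation can be found by exhaustive search. The sketch below is only an illustration of the definition, not a practical algorithm: it tries all m! rankings. It assumes list-encoded rankings (most preferred first); kemeny_optimal is a hypothetical name, and ties between equally good rankings are broken arbitrarily.

```python
from itertools import combinations, permutations

def kendall_tau(sigma, tau):
    """Number of candidate pairs that sigma and tau order differently."""
    ps = {c: i for i, c in enumerate(sigma)}
    pt = {c: i for i, c in enumerate(tau)}
    return sum(
        (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0
        for a, b in combinations(sigma, 2)
    )

def kemeny_cost(sigma, rankings):
    """Total Kendall tau distance from sigma to all voter rankings."""
    return sum(kendall_tau(sigma, r) for r in rankings)

def kemeny_optimal(rankings):
    """Exhaustively search all m! rankings; only feasible for very small m."""
    best = min(permutations(rankings[0]), key=lambda s: kemeny_cost(s, rankings))
    return list(best)

# Three voters whose pairwise majorities are a over b, a over c, b over c.
voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
print(kemeny_optimal(voters))               # ['a', 'b', 'c']
print(kemeny_cost(["a", "b", "c"], voters))  # 3
```

Here the majority preferences are acyclic, so the optimum agrees with all of them, as the Extended Condorcet Criterion requires.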

22
Complexity of Kemeny Optimal Aggregation

NP-hard, even for n = 4 [Dwork et al. 2001]
In P, for n = 2.
Unknown for n = 3.
Can be approximated using Spearman footrule
Proposition [Diaconis-Graham]:
K(σ,τ) ≤ F(σ,τ) ≤ 2 K(σ,τ)
What is the complexity of footrule optimal
aggregation?
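
The reason the footrule gives an approximation can be written out in one chain of inequalities. In this sketch, \sigma_F denotes a footrule optimal aggregation and \sigma_K a Kemeny optimal aggregation; both symbols are introduced here for illustration and are not from the slides. The first and last steps apply the Diaconis-Graham bounds term by term, and the middle step uses the optimality of \sigma_F for the footrule.

```latex
\sum_{i=1}^{n} K(\sigma_F,\sigma_i)
  \;\le\; \sum_{i=1}^{n} F(\sigma_F,\sigma_i)
  \;\le\; \sum_{i=1}^{n} F(\sigma_K,\sigma_i)
  \;\le\; 2\sum_{i=1}^{n} K(\sigma_K,\sigma_i)
```

So a footrule optimal aggregation has Kendall tau cost at most twice that of a Kemeny optimal aggregation.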

23
Footrule Optimal Aggregation

Theorem [Dwork et al. 2001]:
Footrule optimal aggregation can be computed in
polynomial time.
Proof:
Want to find σ which minimizes Σ_i Σ_a |σ(a) -
σ_i(a)|
Define a weighted bipartite graph G = (L,R,W) as
follows:
L = M (the candidates)
R = {1,…,m} (the available ranks)
W(a,r) = Σ_i |r - σ_i(a)|
A perfect matching in G = a ranking
Cost of a matching = Σ_i Σ_a |σ(a) - σ_i(a)|
Hence, reduced to finding a minimum cost perfect matching
in a bipartite graph
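
The reduction can be sketched end to end. The proof only says "minimum cost matching"; the solver used below is the standard O(m^3) Hungarian algorithm, which is a choice made for this illustration. Rankings are assumed to be lists with the most preferred candidate first, and hungarian, footrule_optimal and total_footrule are hypothetical names.

```python
def hungarian(cost):
    """Min-cost perfect matching on a square matrix; returns column per row."""
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials
    v = [0] * (n + 1)      # column potentials
    p = [0] * (n + 1)      # p[j] = row matched to column j (1-based), 0 = free
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def total_footrule(sigma, rankings):
    """Sum of footrule distances from sigma to all voter rankings."""
    return sum(abs(sigma.index(c) - r.index(c)) for r in rankings for c in sigma)

def footrule_optimal(rankings):
    """Build W(a, r) = sum_i |r - sigma_i(a)| and solve the matching."""
    candidates = list(rankings[0])
    m = len(candidates)
    cost = [
        [sum(abs(r - (rk.index(a) + 1)) for rk in rankings) for r in range(1, m + 1)]
        for a in candidates
    ]
    result = [None] * m
    for k, rank in enumerate(hungarian(cost)):
        result[rank] = candidates[k]  # candidate k is matched to this rank
    return result

voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
best = footrule_optimal(voters)
print(total_footrule(best, voters))  # 6
```

The optimum need not be unique: for these three voters both (a b c) and (b a c) have total footrule cost 6, so only the cost is printed.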

24
Local Kemenization [Dwork et al. 2001]

Definition: A ranking σ is a locally Kemeny optimal
aggregation for σ_1,…,σ_n if there is no other
ranking σ' which
Can be obtained from σ by flipping one adjacent pair
Satisfies Σ_i K(σ',σ_i) < Σ_i K(σ,σ_i)
Features:
Every Kemeny optimal aggregation is also locally
Kemeny optimal, but the converse is not necessarily
true.
Locally Kemeny optimal aggregations satisfy XCC.
Locally Kemeny optimal aggregations can be
computed in O(n m log m) time.
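
One simple way to obtain a locally Kemeny optimal ranking is an insertion procedure: start from any initial ranking, insert its candidates one at a time at the bottom, and let each new candidate bubble up while a strict majority of voters prefers it to the candidate directly above. The sketch below is a naive version that recounts votes for every comparison, so it does not attain the O(n m log m) bound stated above. It assumes list-encoded rankings (most preferred first); locally_kemenize is a hypothetical name.

```python
def locally_kemenize(initial, rankings):
    """Turn an initial ranking into a locally Kemeny optimal one."""
    pos = [{c: r for r, c in enumerate(ranking)} for ranking in rankings]
    n = len(rankings)

    def majority_prefers(a, b):
        # True if strictly more than half of the voters rank a above b.
        return 2 * sum(p[a] < p[b] for p in pos) > n

    result = []
    for cand in initial:
        result.append(cand)
        i = len(result) - 1
        # Bubble the new candidate up while a majority prefers it to the one above.
        while i > 0 and majority_prefers(result[i], result[i - 1]):
            result[i - 1], result[i] = result[i], result[i - 1]
            i -= 1
    return result

voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
print(locally_kemenize(["c", "b", "a"], voters))  # ['a', 'b', 'c']
```

After the procedure, no adjacent pair is ordered against a strict majority, so no single adjacent flip can lower the total Kendall tau distance.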

25
Markov Chain Techniques [Dwork et al. 2001]

Markov chain states: candidates
Transitions: depend on the voter rankings
Basic idea: probabilistically switch to a
better candidate
Final ranking: induced by the stationary distribution

26
Four MC Methods

Current state is candidate a.
MC1: Choose uniformly from the multiset of all
candidates that were ranked at least as high as a
by some voter.
Probability to stay at a = 1 / (average rank of a).
MC2: Choose a voter i u.a.r. and pick u.a.r. from
among the candidates that the i-th voter ranked
at least as high as a.
MC3: Choose a voter i u.a.r. and pick u.a.r. a
candidate b. If the i-th voter ranked b higher than
a, go to b. Otherwise, stay at a.
MC4: Choose a candidate b u.a.r. If most voters
ranked b higher than a, go to b. Otherwise, stay
at a.
Rank of a ~ # of pairwise contests a wins.
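
MC4 can be sketched with plain power iteration. Two things below are assumptions of this illustration and not part of the slide: a small uniform teleport probability (the teleport parameter) is mixed in so that the chain has a unique stationary distribution, and "most voters" is read as a strict majority. Rankings are lists with the most preferred candidate first; mc4_ranking is a hypothetical name.

```python
def mc4_ranking(rankings, teleport=0.05, iterations=1000):
    """Rank candidates by the stationary distribution of an MC4-style chain."""
    candidates = list(rankings[0])
    m, n = len(candidates), len(rankings)
    pos = [{c: r for r, c in enumerate(ranking)} for ranking in rankings]

    def majority_prefers(b, a):
        return 2 * sum(p[b] < p[a] for p in pos) > n

    # P[i][j]: probability of moving from candidate i to candidate j.
    P = [[0.0] * m for _ in range(m)]
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i != j and majority_prefers(b, a):
                P[i][j] = 1.0 / m       # b is picked u.a.r. and beats a
        P[i][i] = 1.0 - sum(P[i])       # otherwise stay at a

    # Assumption: mix with a uniform jump so the chain is ergodic.
    P = [[(1 - teleport) * x + teleport / m for x in row] for row in P]

    # Power iteration from the uniform distribution.
    dist = [1.0 / m] * m
    for _ in range(iterations):
        dist = [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]

    probs = dict(zip(candidates, dist))
    order = sorted(candidates, key=lambda c: -probs[c])
    return order, probs

voters = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
order, probs = mc4_ranking(voters)
print(order)  # ['a', 'b', 'c']
```

Candidate a wins both of its pairwise contests, so the chain never leaves a except by teleporting, and a collects most of the stationary mass.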

27
End of Lecture 7
