1
Algorithms for Large Data Sets
  • Ziv Bar-Yossef

Lecture 7, April 20, 2005
http://www.ee.technion.ac.il/courses/049011
2
Rank Aggregation
3
Outline
  • The rank aggregation problem
  • Applications
  • Desired properties
  • Arrow's impossibility theorem
  • Rank aggregation methods

4
The Rank Aggregation Problem
  • m candidates (a.k.a. alternatives)
  • M = {1,…,m}: set of candidates
  • n voters (a.k.a. agents or judges)
  • N = {1,…,n}: set of voters
  • Each voter i has a ranking σ_i on M
  • σ_i(a) < σ_i(b) means the i-th voter prefers a to b
  • Ranking may be a total or partial order
  • The rank aggregation problem:
  • Combine σ_1,…,σ_n into a single ranking σ on M,
    which represents the social choice of the
    voters.
  • Rank aggregation function: f(σ_1,…,σ_n) = σ
  • σ may be a total or partial order

5
Examples
  • m small, n large: elections (multi-party
    parliament, academies, boards, ...)
  • m modest, n small: program committees, sports
  • m large, n small: meta-search, travel plans,
    restaurant selection

6
Applications to Web Search
  • Meta search
  • Combine results of different search engines into
    a better overall ranking
  • Combat spam
  • Spam results unlikely to rank high in aggregate
    ranking, even though they can rank high in one or
    two search engines.
  • Search for multiple terms
  • AND: bad recall
  • OR: bad precision
  • Complex boolean queries: too complicated for
    average user
  • Solution: search for small subsets of terms and
    aggregate results
  • Combine multiple ranking functions
  • Use different ranking functions (e.g., VSM,
    PageRank, HITS, ...) and aggregate them into a
    single function

7
Applications to Databases
  • Rank items in a database according to multiple
    criteria
  • Ex: Choose a restaurant by cuisine, distance,
    price, quality, etc.
  • Ex: Choose a flight ticket by price, number of stops,
    date and time, frequent flier bonuses, etc.

8
Desired Properties: Unanimity
  • Unanimity (a.k.a. Pareto optimality)
  • If all voters prefer candidate a to candidate b
    (i.e., σ_i(a) < σ_i(b) for all i), then also σ
    should prefer a to b (i.e., σ(a) < σ(b)).

Example (each column is one voter's ranking, top = most preferred):
a c a
c a b
b b c
a:b = 3:0
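
The unanimity condition can be checked mechanically. The following is a minimal sketch, not part of the original lecture: it assumes each ranking is a Python list ordered from most to least preferred, and the helper names (positions, unanimous_pairs, respects_unanimity) are hypothetical.

```python
def positions(ranking):
    """Map each candidate to its 1-based rank (1 = most preferred)."""
    return {cand: r for r, cand in enumerate(ranking, start=1)}


def unanimous_pairs(profile):
    """All ordered pairs (a, b) such that every voter prefers a to b."""
    pos = [positions(r) for r in profile]
    cands = profile[0]
    return {(a, b) for a in cands for b in cands
            if a != b and all(p[a] < p[b] for p in pos)}


def respects_unanimity(aggregate, profile):
    """True if the aggregate ranking agrees with every unanimous pair."""
    agg = positions(aggregate)
    return all(agg[a] < agg[b] for a, b in unanimous_pairs(profile))


# The three voters from the example above (one list per column).
profile = [["a", "c", "b"], ["c", "a", "b"], ["a", "b", "c"]]
print(unanimous_pairs(profile))
print(respects_unanimity(["a", "c", "b"], profile))
```

In this example only the pair (a, b) is unanimous, so any aggregate ranking that places a above b passes the check.
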
9
Desired Properties: Condorcet
  • Condorcet Criterion [Condorcet, 1785]
  • Condorcet winner: a candidate a, which is
    preferred by most voters to any other candidate b
    (i.e., for all b, the number of i s.t. σ_i(a) < σ_i(b)
    is at least n/2).
  • Condorcet criterion: If a Condorcet winner exists,
    σ should rank it first (i.e., σ(a) = 1).

Example 1 (each column is one voter's ranking, top = most preferred):
c b a
a a b
b c c
a:b = 2:1, a:c = 2:1 (a is the Condorcet winner)

Example 2:
c b a
a c b
b a c
No Condorcet winner
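
A small sketch of the Condorcet winner test, using the "at least n/2" definition from this slide. It assumes rankings are lists ordered from most to least preferred. The function names are illustrative, not from the lecture.

```python
def prefers_count(profile, a, b):
    """Number of voters who rank a above b."""
    return sum(1 for r in profile if r.index(a) < r.index(b))


def condorcet_winner(profile):
    """Return a candidate preferred by at least n/2 voters to every other
    candidate, or None if there is none. With an even number of voters,
    ties can make several candidates qualify; the first one found is returned.
    """
    n = len(profile)
    cands = profile[0]
    for a in cands:
        if all(2 * prefers_count(profile, a, b) >= n
               for b in cands if b != a):
            return a
    return None


# The two three-voter examples from this slide (one list per column).
example_1 = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
example_2 = [["c", "a", "b"], ["b", "c", "a"], ["a", "b", "c"]]
print(condorcet_winner(example_1))  # a
print(condorcet_winner(example_2))  # None
```
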
10
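Desired Properties: XCC
  • Extended Condorcet Criterion (XCC)
  • If most voters prefer candidate a to candidate b
    (i.e., the number of i s.t. σ_i(a) < σ_i(b) is at least
    n/2), then also σ should prefer a to b (i.e.,
    σ(a) < σ(b)).
  • Not always realizable

Example 1 (each column is one voter's ranking, top = most preferred):
c b a
a a b
b c c
σ(a) < σ(b) < σ(c)

Example 2:
c b a
a c b
b a c
Not realizable (the majority preferences form a cycle: a beats b, b beats c, c beats a)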
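
Whether XCC is realizable for a given set of votes can be tested by trying to order the candidates along the majority relation. The sketch below is one straightforward way to do this, under the assumption that all rankings are total orders given as lists (most preferred first); xcc_ranking is a hypothetical name.

```python
def xcc_ranking(profile):
    """Return a ranking that respects every majority preference
    (at least n/2 voters), or None if no such ranking exists."""
    n = len(profile)
    cands = list(profile[0])
    pos = [{c: r for r, c in enumerate(rk)} for rk in profile]
    # beats[a] = candidates b that at least n/2 voters rank below a
    beats = {a: {b for b in cands
                 if b != a and 2 * sum(p[a] < p[b] for p in pos) >= n}
             for a in cands}
    ranking, remaining = [], set(cands)
    while remaining:
        # A candidate can be placed next only if no remaining candidate beats it.
        free = [a for a in cands if a in remaining
                and not any(a in beats[b] for b in remaining if b != a)]
        if not free:
            return None  # majority preferences contain a cycle (or a tie)
        ranking.append(free[0])
        remaining.remove(free[0])
    return ranking


example_1 = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
example_2 = [["c", "a", "b"], ["b", "c", "a"], ["a", "b", "c"]]
print(xcc_ranking(example_1))  # ['a', 'b', 'c']
print(xcc_ranking(example_2))  # None
```
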
11
XCC and Spam [Dwork et al. 2001]
  • Definition: a page p is said to be spam with respect
    to a ranking σ if there is a page q ranked lower than p
    which most human evaluators think should be
    ranked higher than p.
  • Assumption: for any two pages p, q, the majority of
    human evaluators agrees with the majority of search
    engine rankings on the order of p, q.
  • Conclusion:
  • Spam pages are always Condorcet losers
  • If a rank aggregation function respects XCC, it
    eliminates spam.

12
Desired Properties: Independence from Irrelevant Alternatives
  • Independence from Irrelevant Alternatives
  • Relative order of a and b in σ should depend
    only on relative order of a and b in σ_1,…,σ_n.
  • Ex: if σ_i = (a b c) changes to (a c b), relative
    order of a, b in σ should not change.

13
Desired Properties: Neutrality and Anonymity
  • Neutrality
  • No candidate should be favored over others.
  • If two candidates switch positions in σ_1,…,σ_n,
    they should switch positions also in σ.
  • Anonymity
  • No voter should be favored over others.
  • If two voters switch their orderings, σ should
    remain the same.

14
Desired Properties: Monotonicity and Consistency
  • Monotonicity
  • If the ranking of a candidate is improved by a
    voter, its ranking in σ can only improve.
  • Consistency
  • If voters are split into two disjoint sets, S
    and T, and both the aggregation of voters in S
    and the aggregation of voters in T prefer a to b,
    then also the aggregation of all voters should
    prefer a to b.

15
Dictatorship and Democracy
  • Dictatorship: f(σ_1,…,σ_n) = σ_i
  • Democracy (a.k.a. Majoritarian aggregation)
  • Use the Extended Condorcet Criterion to rank
    candidates.
  • Always works for m = 2.
  • Not always realizable for m ≥ 3.
  • Theorem [May, 1952]: For m = 2, Democracy is the
    only rank aggregation function which is monotone,
    neutral, and anonymous.

16
Arrow's Impossibility Theorem [Arrow, 1951]
  • Theorem: If m ≥ 3, then the only rank aggregation
    function that is unanimous and independent from
    irrelevant alternatives is dictatorship.
  • Arrow won the Nobel prize in economics (1972)

17
Positional Rank Aggregation Methods
  • Plurality
  • score(a) = number of voters who chose a as #1
  • σ: order candidates by decreasing scores
  • Top-k approval
  • score(a) = number of voters who chose a as one of the
    top k
  • σ: order candidates by decreasing scores
  • Borda's rule [Borda, 1781]
  • score(a) = Σ_i σ_i(a)
  • σ: order candidates by increasing scores
  • Violate independence from irrelevant alternatives

18
Positional Methods Example
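Voter rankings (each column is one voter's ranking, top = most preferred):
b c a a
d b d b
c d c c
a a b d

Candidate   Borda             Top-2 Approval   Plurality
a           1+1+4+4 = 10      2                2
b           2+4+2+1 = 9       3                1
c           3+3+1+3 = 10      1                1
d           4+2+3+2 = 11      2                0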
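
The three positional scores for this example can be recomputed with a few lines of code. This is a sketch under the assumption that rankings are lists ordered from most to least preferred; the function names are made up for illustration, and ties in the final ordering are broken arbitrarily (by candidate order in the first ranking).

```python
def top_k_scores(profile, k):
    """score(a) = number of voters who placed a among their top k."""
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for c in ranking[:k]:
            scores[c] += 1
    return scores


def plurality_scores(profile):
    """Plurality is top-k approval with k = 1."""
    return top_k_scores(profile, 1)


def borda_scores(profile):
    """score(a) = sum over voters of a's rank (1 = most preferred)."""
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for rank, c in enumerate(ranking, start=1):
            scores[c] += rank
    return scores


def order_by(scores, decreasing):
    """Order candidates by score; ties are broken arbitrarily."""
    return sorted(scores, key=lambda c: scores[c], reverse=decreasing)


# The four voters from the example (one list per column).
profile = [["b", "d", "c", "a"], ["c", "b", "d", "a"],
           ["a", "d", "c", "b"], ["a", "b", "c", "d"]]
print(borda_scores(profile))
print(top_k_scores(profile, 2))
print(plurality_scores(profile))
print(order_by(borda_scores(profile), decreasing=False))
```

Note that the three methods disagree on this input: Borda and top-2 approval put b first, while plurality puts a first.
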
19
Optimal Rank Aggregation
  • d: distance measure among rankings
  • Definition: The optimal rank aggregation for
    σ_1,…,σ_n w.r.t. d is the ranking σ which minimizes
    Σ_i d(σ,σ_i).

[Figure: the voter rankings σ_1, σ_2, …, σ_n and the aggregate ranking σ]
20
Distance Measures
  • Kendall tau distance (a.k.a. bubble sort
    distance)
  • K(σ,τ) = number of pairs of candidates (a,b) on which
    σ and τ disagree
  • Ex: K((a b c d), (a d c b)) = 0 + 2 + 1 = 3
  • Spearman footrule distance
  • F(σ,τ) = Σ_a |σ(a) - τ(a)|
  • Ex: F((a b c d), (a d c b)) = 0 + 2 + 0 + 2 = 4
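
Both distances are short to implement directly from their definitions. The sketch below assumes rankings are lists ordered from most to least preferred; it uses the quadratic-time definition of the Kendall tau distance rather than a faster merge-sort based count.

```python
from itertools import combinations


def positions(ranking):
    """Map each candidate to its 1-based rank."""
    return {c: r for r, c in enumerate(ranking, start=1)}


def kendall_tau(sigma, tau):
    """Number of candidate pairs that sigma and tau order differently."""
    ps, pt = positions(sigma), positions(tau)
    return sum(1 for a, b in combinations(sigma, 2)
               if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0)


def footrule(sigma, tau):
    """Sum over candidates of the absolute difference in rank."""
    ps, pt = positions(sigma), positions(tau)
    return sum(abs(ps[a] - pt[a]) for a in sigma)


sigma = ["a", "b", "c", "d"]
tau = ["a", "d", "c", "b"]
print(kendall_tau(sigma, tau))  # 3
print(footrule(sigma, tau))     # 4
```
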

21
Kemeny Optimal Aggregation [Kemeny 1959]
  • Optimal aggregation w.r.t. Kendall tau distance
  • Theorem [Young & Levenglick, 1978; Truchon
    1998]: Kemeny optimal aggregation is the only
    rank aggregation function which is neutral,
    consistent, and satisfies the Extended Condorcet
    principle.
  • Effective for fighting spam
  • Generative model:
  • σ is the correct ranking
  • σ_1,…,σ_n are generated from σ by swapping every
    pair with probability < ½.
  • Then Kemeny optimal aggregation gives the
    maximum likelihood σ given σ_1,…,σ_n. [Young 1988]
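
For a handful of candidates, a Kemeny optimal aggregation can be found by exhaustive search. The sketch below is only meant to make the definition concrete: it tries all m! rankings, so it is not a practical algorithm. The names are illustrative, and ties between equally good rankings are broken by enumeration order.

```python
from itertools import combinations, permutations


def kendall_tau(sigma, tau):
    """Number of candidate pairs that sigma and tau order differently."""
    ps = {c: r for r, c in enumerate(sigma)}
    pt = {c: r for r, c in enumerate(tau)}
    return sum(1 for a, b in combinations(sigma, 2)
               if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0)


def kemeny_optimal(profile):
    """Brute-force search for a ranking minimizing the total Kendall tau
    distance to all voter rankings. Returns (ranking, total distance)."""
    best, best_cost = None, None
    for perm in permutations(profile[0]):
        cost = sum(kendall_tau(perm, r) for r in profile)
        if best_cost is None or cost < best_cost:
            best, best_cost = list(perm), cost
    return best, best_cost


# Three voters with Condorcet winner a; the majority order is a, b, c.
profile = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
print(kemeny_optimal(profile))  # (['a', 'b', 'c'], 3)
```

On this input the optimum coincides with the majority order, as expected from the XCC property stated above.
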

22
Complexity of Kemeny Optimal Aggregation
  • NP-hard, even for n = 4 [Dwork et al. 2001]
  • In P, for n = 2.
  • Unknown for n = 3.
  • Can be approximated using Spearman footrule
  • Proposition [Diaconis-Graham]:
  • K(σ,τ) ≤ F(σ,τ) ≤ 2 K(σ,τ)
  • What is the complexity of footrule optimal
    aggregation?
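
The Diaconis-Graham inequality can be sanity-checked numerically on small instances. The following sketch simply enumerates all pairs of rankings of m candidates. It is a check, not a proof, and the function names are illustrative.

```python
from itertools import combinations, permutations


def kendall_tau(sigma, tau):
    """Number of candidate pairs that sigma and tau order differently."""
    ps = {c: r for r, c in enumerate(sigma)}
    pt = {c: r for r, c in enumerate(tau)}
    return sum(1 for a, b in combinations(sigma, 2)
               if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0)


def footrule(sigma, tau):
    """Sum over candidates of the absolute difference in rank."""
    ps = {c: r for r, c in enumerate(sigma)}
    pt = {c: r for r, c in enumerate(tau)}
    return sum(abs(ps[a] - pt[a]) for a in sigma)


def diaconis_graham_holds(m):
    """Check K <= F <= 2K for every pair of rankings of m candidates."""
    perms = list(permutations(range(m)))
    return all(kendall_tau(s, t) <= footrule(s, t) <= 2 * kendall_tau(s, t)
               for s in perms for t in perms)


print(diaconis_graham_holds(4))
```

A consequence of the inequality is that a ranking minimizing the total footrule distance has total Kendall tau distance at most twice the Kemeny optimum.
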

23
Footrule Optimal Aggregation
  • Theorem [Dwork et al. 2001]:
  • Footrule optimal aggregation can be computed in
    polynomial time.
  • Proof:
  • Want to find σ which minimizes Σ_i Σ_a |σ(a) -
    σ_i(a)|
  • Define a weighted bipartite graph G = (L,R,W) as
    follows:
  • L = M (the candidates)
  • R = {1,…,m} (the available ranks)
  • W(a,r) = Σ_i |r - σ_i(a)|
  • A perfect matching in G = a ranking
  • Cost of a matching = Σ_i Σ_a |σ(a) - σ_i(a)|
  • Hence, reduced to finding a minimum cost perfect
    matching in a bipartite graph
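
The reduction can be sketched in code as follows. The lecture does not say which matching algorithm to use; this sketch plugs in a textbook O(m^3) Hungarian algorithm written in plain Python, and any minimum-cost assignment solver would do. Function names are hypothetical, and rankings are assumed to be lists ordered from most to least preferred.

```python
def hungarian(cost):
    """Minimum-cost perfect matching for a square cost matrix.
    Returns a list where assignment[row] = column."""
    n = len(cost)
    INF = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)   # dual potentials
    p = [0] * (n + 1)     # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:            # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def footrule_optimal(profile):
    """Footrule optimal aggregation via min-cost bipartite matching."""
    cands = list(profile[0])
    m = len(cands)
    pos = [{c: r for r, c in enumerate(rk, start=1)} for rk in profile]
    # W(a, r) = sum over voters of |r - sigma_i(a)|
    cost = [[sum(abs(r - p[a]) for p in pos) for r in range(1, m + 1)]
            for a in cands]
    assignment = hungarian(cost)
    ranking = [None] * m
    for idx, rank_slot in enumerate(assignment):
        ranking[rank_slot] = cands[idx]
    total = sum(cost[i][assignment[i]] for i in range(m))
    return ranking, total


profile = [["b", "d", "c", "a"], ["c", "b", "d", "a"],
           ["a", "d", "c", "b"], ["a", "b", "c", "d"]]
print(footrule_optimal(profile))
```
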

24
Local Kemenization [Dwork et al. 2001]
  • Definition: A ranking σ is a locally Kemeny optimal
    aggregation for σ_1,…,σ_n if there is no other
    ranking σ' which
  • Can be obtained from σ by flipping one adjacent pair
  • Satisfies Σ_i K(σ', σ_i) < Σ_i K(σ, σ_i)
  • Features
  • Every Kemeny optimal aggregation is also locally
    Kemeny optimal, but converse is not necessarily
    true.
  • Locally Kemeny optimal aggregations satisfy XCC.
  • Locally Kemeny optimal aggregations can be
    computed in O(n m log m) time.
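
One way to obtain a locally Kemeny optimal ranking is to start from any initial ranking and insert its candidates one at a time, letting each new candidate bubble up while a strict majority of voters prefers it to the candidate directly above. The sketch below follows that idea in its simplest form. It is quadratic in m and does not attempt the O(n m log m) bound quoted above, and the function name is an assumption.

```python
def local_kemenization(initial, profile):
    """Turn `initial` into a locally Kemeny optimal ranking for `profile`.
    Rankings are lists ordered from most to least preferred."""
    pos = [{c: r for r, c in enumerate(rk)} for rk in profile]

    def majority_prefers(x, y):
        # True if a strict majority of voters ranks x above y.
        wins = sum(1 for p in pos if p[x] < p[y])
        return 2 * wins > len(profile)

    result = []
    for x in initial:
        i = len(result)
        result.append(x)  # insert at the bottom ...
        # ... and bubble up while a majority prefers x to its predecessor
        while i > 0 and majority_prefers(x, result[i - 1]):
            result[i - 1], result[i] = result[i], result[i - 1]
            i -= 1
    return result


profile = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
print(local_kemenization(["c", "b", "a"], profile))  # ['a', 'b', 'c']
```

Swapping an adjacent pair changes the total Kendall tau distance by the difference between the two vote counts for that pair, so the output admits no improving adjacent flip.
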

25
Markov Chain Techniques [Dwork et al. 2001]
  • Markov chain: states = candidates
  • Transitions: depend on the voter rankings
  • Basic idea: probabilistically switch to a
    better candidate
  • Final ranking: induced by stationary distribution

26
Four MC Methods
  • Current state is candidate a.
  • MC1: Choose uniformly from the multiset of all
    candidates that were ranked at least as high as a
    by some voter.
  • Probability to stay at a: 1 / (average rank of a).
  • MC2: Choose a voter i u.a.r. and pick u.a.r. from
    among the candidates that the i-th voter ranked
    at least as high as a.
  • MC3: Choose a voter i u.a.r. and pick u.a.r. a
    candidate b. If the i-th voter ranked b higher than
    a, go to b. Otherwise, stay in a.
  • MC4: Choose a candidate b u.a.r. If most voters
    ranked b higher than a, go to b. Otherwise, stay
    in a.
  • Rank of a reflects the number of pairwise contests a wins.
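
As an illustration of the Markov chain approach, here is a minimal sketch of MC3 with the stationary distribution approximated by power iteration. It assumes full rankings given as lists (most preferred first) and a chain with a unique stationary distribution; it does not handle the non-ergodic case (for example, a candidate ranked first by every voter is an absorbing state). The function names and the iteration count are assumptions.

```python
def mc3_transition(profile):
    """P[a][b] = probability of moving from a to b under MC3:
    pick a voter and a candidate b uniformly at random and move to b
    if that voter ranks b higher than a."""
    cands = list(profile[0])
    n, m = len(profile), len(cands)
    pos = [{c: r for r, c in enumerate(rk)} for rk in profile]
    P = {a: {b: 0.0 for b in cands} for a in cands}
    for a in cands:
        for b in cands:
            if b != a:
                prefer_b = sum(1 for p in pos if p[b] < p[a])
                P[a][b] = prefer_b / (n * m)
        P[a][a] = 1.0 - sum(P[a][b] for b in cands if b != a)
    return P


def stationary(P, iters=1000):
    """Approximate the stationary distribution by power iteration."""
    cands = list(P)
    pi = {c: 1.0 / len(cands) for c in cands}
    for _ in range(iters):
        pi = {b: sum(pi[a] * P[a][b] for a in cands) for b in cands}
    return pi


def mc3_ranking(profile):
    """Rank candidates by decreasing stationary probability."""
    pi = stationary(mc3_transition(profile))
    return sorted(pi, key=lambda c: -pi[c]), pi


profile = [["c", "a", "b"], ["b", "a", "c"], ["a", "b", "c"]]
ranking, pi = mc3_ranking(profile)
print(ranking)  # ['a', 'b', 'c']
```

For this three-voter input the stationary probabilities work out to 0.5, 0.3, and 0.2 for a, b, and c, so the chain spends the most time at the Condorcet winner.
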

27
End of Lecture 7