Performance Evaluation Metrics - PowerPoint PPT Presentation

About This Presentation
Title:

Performance Evaluation Metrics

Description:

Title: PowerPoint Presentation
Author: C. Makris
Last modified by: xrhstos
Created Date: 2/20/2002 12:12:36 PM
Document presentation format: On-screen Show (4:3)

Slides: 41
Provided by: C936

Transcript and Presenter's Notes



1
Performance Evaluation Metrics
  • Based mainly on the book and slides of R. Baeza-Yates,
    B. Ribeiro-Neto, Modern Information Retrieval,
    Addison Wesley, 1999 (second edition,
    2011, http://mir2ed.org/)
2
Classical Metrics (Performance Evaluation)
  • Space/time complexity of the index structure
  • Interaction with the operating system
  • Delays in communication channels
  • Overheads from the many software layers involved

3
Specific Metrics (Retrieval Performance Evaluation)
  • A reference test collection, consisting of:
  • a document collection
  • a set Q of sample information needs
  • the set of relevant documents for each q ∈ Q
  • A suitable retrieval performance metric

4
Reference Collections
  • TREC (TREC evaluation collections: WSJ (Wall
    Street Journal), AP (Associated Press), ZIFF, FR,
    DOE, PATents)
  • GOV2 (25-million-page GOV2 web page collection,
    Terabyte track)
  • NTCIR (NII Test Collections for IR systems,
    focusing on East Asian and cross-language
    information retrieval)
  • CLEF (Cross Language Evaluation Forum,
    http://www.clef-campaign.org)
  • Reuters (Reuters-21578 and Reuters Corpus Volume
    1 collections)
  • Cranfield (1398 abstracts of aerodynamics journal
    articles, 225 queries)
  • CACM collection
  • ISI (Institute of Scientific Information)
    collection
  • Newsgroups

5
Recall and Precision
Let I be a sample information request and R the set of its relevant
documents. Suppose that a given retrieval strategy produces an answer set
A. Let Ra be the set of documents common to R and A, i.e. Ra = R ∩ A.
  • Recall: $\text{Recall} = |R_a| / |R|$, the fraction of the relevant documents that were retrieved
  • Precision: $\text{Precision} = |R_a| / |A|$, the fraction of the retrieved documents that are relevant
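A minimal Python sketch of the two definitions above, assuming the relevant set R and the answer set A are available as sets of document identifiers. The function name `recall_precision` is illustrative, and mapping an empty set to 0.0 is an assumption, not part of the definitions.

```python
def recall_precision(relevant: set, answer: set) -> tuple:
    """Return (recall, precision) for one information request.

    relevant -- the set R of documents judged relevant
    answer   -- the answer set A produced by the retrieval strategy
    """
    r_a = relevant & answer  # Ra = R ∩ A
    recall = len(r_a) / len(relevant) if relevant else 0.0
    precision = len(r_a) / len(answer) if answer else 0.0
    return recall, precision


# Hypothetical example: 4 relevant documents, 5 retrieved, 2 in common.
R = {"d1", "d2", "d3", "d4"}
A = {"d1", "d3", "d8", "d9", "d10"}
print(recall_precision(R, A))  # (0.5, 0.4)
```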

6
Precision/Recall Relationship
[Figure: the collection D, the relevant set R, the answer set A, and their intersection Ra]
7
Precision/Recall Relationship
8
Plotting the Curve
Let q be a query that belongs to the set of sample information needs, and
let Rq be the set of documents relevant to q, as determined by specialists.
For example, suppose that Rq contains the following documents:
Rq = {d1, d3, d5, d7, d9, d13, d21, d41, d43, d45}.
Suppose the retrieval algorithm under evaluation returns the following
ranking for q:
 1. d7     6. d5    11. d4
 2. d2     7. d28   12. d40
 3. d3     8. d12   13. d10
 4. d6     9. d22   14. d36
 5. d8    10. d13   15. d1
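The curve for this example is built from the recall and precision observed at each rank that holds a relevant document. A short Python sketch of that step; the function name is illustrative.

```python
def recall_precision_at_relevant(ranking, relevant):
    """List (rank, recall, precision) at every rank holding a relevant document."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((rank, hits / len(relevant), hits / rank))
    return points


Rq = {"d1", "d3", "d5", "d7", "d9", "d13", "d21", "d41", "d43", "d45"}
ranking = ["d7", "d2", "d3", "d6", "d8", "d5", "d28", "d12",
           "d22", "d13", "d4", "d40", "d10", "d36", "d1"]

for rank, rec, prec in recall_precision_at_relevant(ranking, Rq):
    print(f"rank {rank:2d}: recall {rec:.0%}, precision {prec:.1%}")
```

Relevant documents appear at ranks 1, 3, 6, 10 and 15, giving precision 100%, 66.7%, 50%, 40% and 33.3% at recall 10% to 50%. The other five relevant documents (d9, d21, d41, d43, d45) are never retrieved, so recall never exceeds 50%.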
9
Plotting the Curve
Assume that 30 documents are returned in each case. Plot the
recall-precision curves for the following two engines (for each, the number
of relevant documents and their positions in the answer list are given):
Engine 1: 10 relevant documents, at positions 1, 5, 7, 8, 9, 13, 17, 26, 27, 28
Engine 2: 10 relevant documents, at positions 2, 3, 4, 5, 7, 10, 11, 12, 16, 27
Using the two resulting curves, compare the two engines.
10
Plotting the Curve
  • The curve is usually based on 11 standard recall levels, 0%, 10%, ...,
    100%. At each level, precision is computed through an interpolation
    procedure of the following form: let $r_j$, $j \in \{0, 1, 2, \ldots, 10\}$,
    be the j-th standard recall level; then
  • $P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$
  • Computation steps (typical for TREC):
  • Compute interpolated precision at recall levels 0.0, 0.1, ..., 1.0
  • Do this for each query in each evaluation benchmark
  • Average the values over all queries, level by level
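The procedure above, applied to the two engines of the previous exercise, can be sketched in Python. This is a minimal sketch, not a reference solution: it uses the common TREC-style rule (maximum precision at any observed recall ≥ $r_j$) instead of the bounded interval $r_j \le r \le r_{j+1}$ written above, and it assumes that 0.0 is reported for recall levels the ranking never reaches. Function names are illustrative.

```python
def interpolated_11pt(relevant_ranks, num_relevant):
    """11-point interpolated precision using the TREC-style rule.

    relevant_ranks -- 1-based ranks at which relevant documents were retrieved
    num_relevant   -- |Rq|, the total number of relevant documents for the query
    """
    # Observed (recall, precision) after each relevant document.
    points = [(hits / num_relevant, hits / rank)
              for hits, rank in enumerate(sorted(relevant_ranks), start=1)]
    curve = []
    for j in range(11):
        level = j / 10
        # Highest precision at any observed recall >= this level; 0.0 if none.
        later = [p for r, p in points if r >= level]
        curve.append(max(later) if later else 0.0)
    return curve


def average_curves(curves):
    """Average interpolated precision level by level over several queries."""
    return [sum(c[j] for c in curves) / len(curves) for j in range(11)]


engine1 = interpolated_11pt([1, 5, 7, 8, 9, 13, 17, 26, 27, 28], 10)
engine2 = interpolated_11pt([2, 3, 4, 5, 7, 10, 11, 12, 16, 27], 10)
for j, (p1, p2) in enumerate(zip(engine1, engine2)):
    print(f"recall {j * 10:3d}%: engine 1 {p1:.2f}   engine 2 {p2:.2f}")
```

For these data, Engine 1 is ahead only at the 0% and 10% levels (its first result is relevant), while Engine 2 is ahead at every level from 20% on.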

11
Summarizing the Curves
  • Average precision over the relevant documents retrieved (Mean Average
    Precision, used in recent TREC conferences); it can also be seen as
    representing the total area under the precision/recall curve.
  • R-Precision
  • a single summary value, computed as the precision at the R-th
    position of the ranking, where R is the total number of relevant
    documents for the current query (i.e. the number of documents in the
    set Rq).
  • Precision Histograms
  • Let $RP_A(i)$ and $RP_B(i)$ be the R-precision values of two retrieval
    algorithms A and B for the i-th query. We define the difference
    $RP_{A/B}(i) = RP_A(i) - RP_B(i)$; a positive value means that A did
    better than B on query i.
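The three summaries can be sketched in Python on the ranking and Rq of the worked example above. The sketch follows the TREC convention of dividing average precision by |R|, so relevant documents that are never retrieved contribute zero; dividing by the number of relevant documents actually retrieved instead gives the "average precision at seen relevant documents" variant. All function names are illustrative.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision for one query (TREC convention)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: mean of average_precision over (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def r_precision(ranking, relevant):
    """Precision at rank R, where R = |relevant| for this query."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def precision_histogram(rp_a, rp_b):
    """RP_{A/B}(i) = RP_A(i) - RP_B(i) for each query i."""
    return [a - b for a, b in zip(rp_a, rp_b)]


Rq = {"d1", "d3", "d5", "d7", "d9", "d13", "d21", "d41", "d43", "d45"}
ranking = ["d7", "d2", "d3", "d6", "d8", "d5", "d28", "d12",
           "d22", "d13", "d4", "d40", "d10", "d36", "d1"]
print(round(average_precision(ranking, Rq), 3))  # 0.29
print(r_precision(ranking, Rq))                  # 0.4
```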

12
Receiver Operating Characteristics
  • true positives (tp): retrieved and relevant
  • false positives (fp): retrieved and non-relevant
  • true negatives (tn): non-relevant and not retrieved
  • false negatives (fn): relevant and not retrieved
  • sensitivity $= tp/(tp+fn)$; false-positive rate, or 1 - specificity,
    $= fp/(fp+tn)$
  • $P = tp/(tp+fp)$, $R = tp/(tp+fn)$ (so sensitivity equals recall)
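A Python sketch of these quantities, assuming the collection, the retrieved set and the relevant set are sets of document identifiers with the last two contained in the first. `roc_points` traces one ROC curve by treating each cutoff of a ranked list as a separate answer set. Names are illustrative, and the rates assume every denominator is non-zero.

```python
def confusion_counts(collection, retrieved, relevant):
    """Return (tp, fp, tn, fn) for one query."""
    tp = len(retrieved & relevant)               # retrieved and relevant
    fp = len(retrieved - relevant)               # retrieved, non-relevant
    fn = len(relevant - retrieved)               # relevant, not retrieved
    tn = len(collection - retrieved - relevant)  # non-relevant, not retrieved
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),          # identical to recall
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def roc_points(ranking, relevant, collection_size):
    """(false-positive rate, sensitivity) after each cutoff of a ranked list."""
    num_rel = len(relevant)
    num_nonrel = collection_size - num_rel
    tp = fp = 0
    points = [(0.0, 0.0)]
    for doc in ranking:
        if doc in relevant:
            tp += 1
        else:
            fp += 1
        points.append((fp / num_nonrel, tp / num_rel))
    return points


collection = {f"d{i}" for i in range(1, 11)}
counts = confusion_counts(collection, {"d1", "d2", "d3"}, {"d1", "d4"})
print(counts)  # (1, 2, 6, 1)
print(rates(*counts))
```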

13
Adequacy of Precision/Recall
  • They require detailed knowledge of all the documents in the
    collection, which is not available for large collections.
  • Reporting a single metric instead of two is usually more
    convenient.
  • In modern systems, the interface and the interaction with the user are
    key points in processing a query, which makes metrics that take
    interaction into account essential.
  • Recall and precision are appropriate when there is a linear ordering of
    the retrieved documents; otherwise they can be
    misleading.

14
Alternative Metrics
  • Harmonic Mean
  • The E Measure
  • User-Oriented Metrics

15
Harmonic Mean
The harmonic mean F of recall and precision is defined as
$$F(j) = \frac{2}{\frac{1}{R(j)} + \frac{1}{P(j)}}$$
where R(j) is the recall at the j-th document in the ranking, P(j) is the
precision at the j-th document in the ranking, and F(j) is the harmonic
mean of R(j) and P(j). The motivation for this choice is that the harmonic
mean stays close to the smaller of the two values rather than the larger,
so F(j) is high only when both recall and precision are high.
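A one-function Python sketch of F(j). Returning 0.0 when R(j) or P(j) is 0 (no relevant document seen yet) is an assumption that avoids division by zero; the name is illustrative.

```python
def harmonic_mean_f(recall, precision):
    """F(j) = 2 / (1/R(j) + 1/P(j)); defined as 0 when R(j) or P(j) is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)


# Recall 10% with precision 100%: the arithmetic mean would be 0.55,
# but F stays close to the smaller value.
print(round(harmonic_mean_f(0.1, 1.0), 3))  # 0.182
```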
16
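The E Measure
The E measure is defined as follows:
$$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{R(j)} + \frac{1}{P(j)}}$$
where R(j) is the recall at the j-th document in the ranking, P(j) is the
precision at the j-th document in the ranking, and b is a user-specified
parameter. For b = 1, E(j) = 1 - F(j), where F(j) is the harmonic mean of
R(j) and P(j). In this form b multiplies the recall term, so values b > 1
mean that the user cares more about recall, and values b < 1 that the user
cares more about precision.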
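A Python sketch of the E measure as written above, which can be used to check the role of b numerically: b = 1 gives 1 - F(j), b = 0 reduces to 1 - P(j), and a very large b approaches 1 - R(j). Returning 1.0 when R(j) or P(j) is 0 is an assumption.

```python
def e_measure(recall, precision, b=1.0):
    """E(j) = 1 - (1 + b^2) / (b^2 / R(j) + 1 / P(j))."""
    if recall == 0 or precision == 0:
        return 1.0  # worst value: no relevant document seen yet
    b2 = b * b
    return 1 - (1 + b2) / (b2 / recall + 1 / precision)


# Same (R, P) pair, different weightings.
for b in (0.5, 1.0, 2.0):
    print(b, round(e_measure(0.3, 0.5, b), 3))
```

Lower E is better; with R = 0.3 and P = 0.5, increasing b makes E move toward 1 - R, i.e. the low recall is penalized more.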
17
User-Oriented Metrics (1)
Let R be the set of relevant documents for the information request I, A
the set of documents retrieved, and U ⊆ R the set of documents the user
already knows to be relevant to the query. Let Rk = A ∩ U be the known
relevant documents that were retrieved, and let Ru be the relevant
documents previously unknown to the user that were retrieved now.
  • Coverage ratio: $|R_k| / |U|$, the fraction of the documents known to be relevant that were retrieved
  • Novelty ratio: $|R_u| / (|R_u| + |R_k|)$, the fraction of the relevant retrieved documents that were new to the user
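A Python sketch of both ratios, assuming R, A and U are sets of document identifiers with U ⊆ R. Returning 0.0 for an empty denominator is an assumption; the function name is illustrative.

```python
def coverage_and_novelty(relevant, retrieved, known):
    """Return (coverage ratio, novelty ratio).

    relevant  -- R, all documents relevant to the request
    retrieved -- A, the answer set
    known     -- U, the relevant documents the user already knew (U ⊆ R)
    """
    r_k = retrieved & known               # known relevant docs that were retrieved
    r_u = (retrieved & relevant) - known  # relevant docs retrieved that were new
    coverage = len(r_k) / len(known) if known else 0.0
    found = len(r_k) + len(r_u)
    novelty = len(r_u) / found if found else 0.0
    return coverage, novelty


# Hypothetical example: the user already knew d1, d2, d3.
R = {"d1", "d2", "d3", "d4", "d5", "d6"}
U = {"d1", "d2", "d3"}
A = {"d1", "d2", "d4", "d9"}
print(coverage_and_novelty(R, A, U))  # coverage = 2/3, novelty = 1/3
```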

18
Other Metrics
  • Relative recall: the ratio between the number of relevant documents
    retrieved and the number of relevant documents the user expected to
    find.
  • Recall effort: the ratio between the number of relevant documents the
    user expects to find and the number of documents examined until the
    user finds them.
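A Python sketch of both ratios. It assumes the user reads the ranking top-down and stops once the expected number of relevant documents has been found; returning `None` when that never happens is an assumption. The data reuse the ranking and Rq of the earlier worked example; function names are illustrative.

```python
def relative_recall(num_relevant_found, num_expected):
    """Relevant documents found divided by the number the user expected to find."""
    return num_relevant_found / num_expected


def recall_effort(ranking, relevant, num_expected):
    """num_expected divided by the number of documents examined until that many
    relevant documents were found; None if the ranking never gets that far."""
    found = 0
    for examined, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            if found == num_expected:
                return num_expected / examined
    return None


Rq = {"d1", "d3", "d5", "d7", "d9", "d13", "d21", "d41", "d43", "d45"}
ranking = ["d7", "d2", "d3", "d6", "d8", "d5", "d28", "d12",
           "d22", "d13", "d4", "d40", "d10", "d36", "d1"]
print(recall_effort(ranking, Rq, 3))  # 0.5: the 3rd relevant document is at rank 6
```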

19
Other Search Engine Metrics
  • How fast it indexes
  • number of documents indexed per hour
  • average document size
  • How fast it answers queries
  • Expressiveness of the query language
  • ability to express complex information needs
  • Speed on complex queries

20
Measuring User Satisfaction
  • The question is which user we want to satisfy
  • it depends on the application
  • Web engine: the user finds what they want and returns to the same
    engine
  • measure the rate at which users return
  • eCommerce site: the user finds what they want and makes a purchase
  • is it the end user, or the eCommerce site, whose satisfaction we
    measure?
  • measure time to purchase, or the fraction of users who become
    buyers

21
Measuring User Satisfaction
  • Enterprise (company/govt/academic): care about
    user productivity
  • How much time do my users save when looking for
    information?
  • Breadth of access, secure access, etc.

22
Web Search Evaluation
  • Recall is hard to compute on the Web.
  • Search engines often use precision at the top k, e.g. k = 10 documents,
    or metrics that reward retrieving the best pages early in the ranking.
  • Search engines also use non-relevance-based metrics:
  • Example 1: clickthrough on the first result (not very reliable for an
    individual query, but reliable on average)
  • Example 2: new techniques that are not yet established in the field
  • Example 3: A/B testing
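A Python sketch of the two measures mentioned above: precision at the top k, and first-result clickthrough aggregated over many query impressions. The click-log format (one boolean per impression) is a hypothetical simplification, and dividing P@k by k even when fewer results are returned is an assumed convention.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k positions that hold a relevant document.

    Divides by k even if fewer than k results were returned (assumed convention).
    """
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def first_result_ctr(clicked_first):
    """Aggregate clickthrough on the first result.

    clicked_first -- one boolean per query impression (True = first result clicked);
    this log format is hypothetical.
    """
    clicks = list(clicked_first)
    return sum(clicks) / len(clicks)


print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4))  # 0.5
print(first_result_ctr([True, False, False, True]))           # 0.5
```

A single impression says little, but the average over many impressions is a usable signal, which is the point made in Example 1.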
23
A/B Testing
  • Testing innovations
  • Prerequisite: an existing search engine in operation
  • Divert a small fraction of the traffic (around 1%) to a new system that
    incorporates the innovation
  • Evaluate with an automatic metric, such as clickthrough on the first
    result
  • Alternatively, give users the option to switch to the new algorithm
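One way to decide whether the diverted traffic really clicks the first result more often is a two-proportion z-test. This test is a standard statistical choice swapped in here for illustration; the source does not prescribe a particular test, and the traffic counts below are hypothetical.

```python
import math


def ab_test_ctr(clicks_a, views_a, clicks_b, views_b):
    """Compare first-result clickthrough of control A and treatment B.

    Returns (ctr_a, ctr_b, z, two_sided_p) using a two-proportion z-test.
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("all or no impressions were clicked; the test is undefined")
    z = (ctr_b - ctr_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return ctr_a, ctr_b, z, p_value


# Hypothetical split: 99% of traffic stays on the old system (A), 1% sees the new one (B).
print(ab_test_ctr(clicks_a=29_700, views_a=99_000, clicks_b=350, views_b=1_000))
```

Because B receives only about 1% of the traffic, its click rate is the noisier of the two; the pooled standard error accounts for the unequal group sizes.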
24
Benchmark collection
  • Document collection
  • - representative of the documents we manage
  • Collection of information needs
  • - ... incorrectly referred to as queries
  • - representative of the needs we expect
  • Relevance judgments
  • - require human judges, or several different relevance assessors
  • - a costly and time-consuming process
  • - the judgments must be representative of the users' own assessments
  • - the judgments must be consistent with each other
  • - assessor consistency can be measured with the kappa metric
  • - values of k from 2/3 to 1 are considered satisfactory
25
The Kappa Measure
  • K is a metric that measures how far two judges agree or
    disagree
  • Designed for categorical judgments
  • P(A) is the proportion of cases in which the two judges agree
  • P(E) is the proportion of agreement expected by chance
  • K is computed as follows:
  • $K = \frac{P(A) - P(E)}{1 - P(E)}$
  • Both probabilities are computed from the tables of the two judges'
    assessments.
  • Here, $P(E) = P(\text{relevant})^2 + P(\text{nonrelevant})^2$, where both
    marginal probabilities are computed by pooling all the assessments of
    both judges.
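A Python sketch of K with pooled marginals, as described above (this pooled form is the same as Scott's pi). Input is the 2×2 agreement table of the two judges; the counts in the example are illustrative. The formula is undefined when P(E) = 1, i.e. when every judgment uses the same label.

```python
def kappa_pooled(yes_yes, yes_no, no_yes, no_no):
    """Kappa for two judges with pooled marginals.

    yes_no = judge 1 said relevant and judge 2 said non-relevant, and so on.
    """
    n = yes_yes + yes_no + no_yes + no_no
    p_agree = (yes_yes + no_no) / n                  # P(A)
    # Pool both judges' labels: 2n judgments in total.
    p_rel = (2 * yes_yes + yes_no + no_yes) / (2 * n)
    p_nonrel = (2 * no_no + yes_no + no_yes) / (2 * n)
    p_chance = p_rel ** 2 + p_nonrel ** 2            # P(E)
    return (p_agree - p_chance) / (1 - p_chance)


# Illustrative table for 400 documents.
print(round(kappa_pooled(300, 20, 10, 70), 3))  # 0.776
```

With K ≈ 0.776, the two judges in this example fall in the 2/3 to 1 range considered satisfactory.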

26
Cranfield Collection
  • One of the first test collections, providing representative measures
    for quantifying retrieval effectiveness.
  • Late 1950s, UK
  • 1398 abstracts of aerodynamics journal articles, a set of 225
    queries, exhaustive relevance judgments for all query-document pairs
  • Too small, and no longer typical for a serious evaluation of
    information retrieval today.

27
TREC Collection
  • TREC (Text REtrieval Conference)
  • Organized by the U.S. National Institute of
    Standards and Technology (NIST)
  • TREC is a collection of different
    benchmarks
  • The best known, TREC Ad Hoc, was used for the
    first 8 TREC evaluations, 1992-1999.
  • 1.89 million documents, mainly news articles, 450
    information needs
  • Relevance judgments are not exhaustive, but fairly accurate
  • Relevance judgments exist only for documents that were among the top k
    returned, for a given information need, by the systems that took part
    in the TREC evaluation.

28
Collections
  • GOV2
  • -- another TREC/NIST collection
  • -- 25 million web pages
  • -- among the largest collections available
  • -- 3 orders of magnitude smaller than the indexes of
    Google/Yahoo/MSN
  • NTCIR
  • -- East Asian Language and Cross Language
    Information Retrieval
  • Cross Language Evaluation Forum (CLEF)
  • -- this collection focuses on European
    languages and cross-language
    information retrieval

29
Result List
  • Usually: title, URL, a list of metadata
  • A summary
  • How is the summary computed?
  • Two basic kinds of summaries, static and dynamic:
  • - static: independent of the query
  • - dynamic: dependent on the query

30
Static Summary
  • A summary of the document's contents
  • The first 50 or so words of the document
  • More advanced summaries use NLP techniques:
  • - NLP heuristics for scoring sentences
  • - the summary is built from the top-scoring
    sentences
  • More advanced approaches apply NLP to generate
    new sentences
  • - not yet ready for use in applications

31
Dynamic Summaries
  • Present one or more windows, or snippets, of the document that contain
    some of the query terms
  • They are computed at query time, together with the answer to the
    query
  • Snippets in which the terms appear as a phrase, or close to each other
    within a user-defined window, are usually preferred
  • A summary computed this way shows all the terms of the window, not only
    those contained in the query.
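A simplified Python sketch of query-dependent snippet selection: slide a fixed-size word window over the document, keep the window containing the most distinct query terms, and fall back to the static summary (the first 50 words) when no term occurs. Proximity is captured only through the window size; phrase matching and sentence boundaries are ignored, and the window size is an assumed parameter.

```python
import re


def static_summary(text, n_words=50):
    """Query-independent summary: roughly the first n_words words."""
    return " ".join(text.split()[:n_words])


def dynamic_snippet(text, query, window=12):
    """Window of `window` consecutive words containing the most distinct
    query terms (earliest window wins ties)."""
    words = text.split()
    # Lower-case and strip punctuation so "web." matches the query term "web".
    norm = [re.sub(r"\W+", "", w.lower()) for w in words]
    terms = set(re.findall(r"\w+", query.lower()))
    best_start, best_score = None, 0
    for start in range(max(1, len(words) - window + 1)):
        score = len(terms & set(norm[start:start + window]))
        if score > best_score:
            best_start, best_score = start, score
    if best_start is None:
        return static_summary(text)
    # Show every word in the window, not only the query terms.
    return " ".join(words[best_start:best_start + window])


text = ("Recall is hard to measure on the web. Precision at the top ranks "
        "matters most for web search engines.")
print(dynamic_snippet(text, "web search precision", window=6))
```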

32
Technical Issues
  • To compute snippets quickly, we need cached copies of the documents on
    which the computation is done (with the risk that these copies become
    outdated)
  • Perhaps caching should be applied to a suitably sized prefix of each
    document
  • Ideally, snippets should be short and still convey the content of the
    document well
  • Dynamic summaries are an important issue that must be handled
    carefully, so that the end user is satisfied.

33
Modeling
  • IR systems use index terms to deal with the user's
    information needs.
  • Index term
  • a keyword or a group of selected words
  • any word (more generally)
  • Stemming can be
    used:
  • connect: connecting, connection, connections
  • An inverted file is built for the chosen
    index terms.
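A toy Python sketch of stemming followed by building an inverted file. The suffix list is a hypothetical, deliberately tiny rule set that happens to conflate the connect example; it is not the Porter stemmer or any real stemming algorithm, and it will mis-stem many words.

```python
from collections import defaultdict

# Toy suffix list, longest first; NOT a real stemming algorithm.
SUFFIXES = ("ions", "ion", "ings", "ing", "s")


def toy_stem(word):
    w = word.lower()
    for suf in SUFFIXES:
        # Keep at least 3 characters of stem.
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w


def build_inverted_index(docs):
    """Map each stemmed index term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token.strip(".,;:!?"))].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "Connecting networks.", 2: "A new connection", 3: "No links here"}
index = build_inverted_index(docs)
print(index["connect"])  # [1, 2]
```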

34
[Figure: Documents → Index terms; Information need → Query; Matching → Ranking]
35
Ad-Hoc Retrieval and Filtering
  • Ad hoc retrieval
[Figure: queries Q1 to Q5 posed against a collection of fixed size]
36
Ad-Hoc Retrieval and Filtering
  • Filtering
[Figure: a stream of documents is matched against the profiles of User 1 and User 2, producing documents for each user]
37
  • Ranking is an ordering of the retrieved documents that reflects their
    relevance to the user's query.
  • A ranking is based on premises about the notion of relevance,
    such as:
  • a common set of index terms
  • sharing of weighted terms
  • likelihood of relevance
  • Different sets of premises lead to different
    IR models.

38
Formal Definition of an IR Model
An information retrieval model is a quadruple [D, Q, F, R(qi, dj)] where
1) D is a set of logical views (representations) of the documents in the
   collection;
2) Q is a set of logical views (representations) of the user's information
   needs; these representations are called queries;
3) F is a framework for modeling the representations of documents, queries,
   and the relationships between them;
4) R(qi, dj) is a ranking function that associates a real number with a
   query qi ∈ Q and a document representation dj ∈ D. Such a ranking
   defines an ordering among the documents with respect to the query qi.
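A minimal Python sketch of the quadruple, assuming the simplest possible logical view: documents and queries are sets of index terms (D and Q), the framework F is plain set operations, and R(qi, dj) counts shared index terms, i.e. the "common set of index terms" premise from the previous slide. The class and function names are illustrative, not a standard API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Doc = FrozenSet[str]    # logical view of a document: its index terms (element of D)
Query = FrozenSet[str]  # logical view of an information need (element of Q)


@dataclass
class IRModel:
    docs: Dict[str, Doc]                    # D, keyed by document id
    rank_fn: Callable[[Query, Doc], float]  # R(q_i, d_j)
    # F (the framework) is implicit here: sets of index terms and set operations.

    def rank(self, query: Query) -> List[str]:
        """Order document ids by decreasing R(q, d); ties broken by id."""
        scores = {d: self.rank_fn(query, rep) for d, rep in self.docs.items()}
        return sorted(scores, key=lambda d: (-scores[d], d))


def common_terms(q: Query, d: Doc) -> float:
    """Ranking premise 'common set of index terms': number of shared terms."""
    return float(len(q & d))


model = IRModel(
    docs={"d1": frozenset({"web", "search"}),
          "d2": frozenset({"web"}),
          "d3": frozenset({"kappa"})},
    rank_fn=common_terms,
)
print(model.rank(frozenset({"web", "search"})))  # ['d1', 'd2', 'd3']
```

Swapping `common_terms` for a function over weighted terms or a probability of relevance would give a different model over the same D and Q, which is the point of separating R from the representations.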
39
IR Models
40
IR Models
  • In an IR model, the logical view of the documents and the
    retrieval process are distinct aspects of the
    system.