IR-LAB - PowerPoint PPT Presentation

About This Presentation
Title:

IR-LAB

Description:

Lucene IR-LAB [D, Q, F, R(qi, dj)] D: Q: F ... – PowerPoint PPT presentation

Slides: 17
Provided by: huxg
Category:
Tags: lab


Transcript and Presenter's Notes

Title: IR-LAB


1
Lucene????
  • IR-LAB
  • ???

2
????
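  • An IR model is a quadruple [D, Q, F, R(qi, dj)]
  • D: the set of document representations
  • Q: the set of query representations
  • F: a framework (Frame) for modeling documents, queries, and the relationships between them
  • R(qi, dj): a ranking function that assigns a score to query qi and document dj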

3
??????
  • ?????????????????
  • ????????????????
  • ?????????????????
  • ?????????????????????????????

4
???????
5
?????
  • ???????( tf)????( idf)????(frequency)???????
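  • tf(i, j): frequency of term j in document i
  • df(j): document frequency of term j, i.e. the number of documents that contain term j
  • idf(j): inverse document frequency of term j, $\log_2(N / df_j)$, with N the total number of documents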
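
The three statistics above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization; the toy documents and the helper names `tf`, `df`, and `idf` are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

# Hypothetical toy collection; tokens are split on whitespace (an assumption).
docs = [
    "lucene scores documents",
    "lucene indexes documents quickly",
    "queries match terms",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)  # total number of documents

def tf(i: int, term: str) -> int:
    """Raw frequency of `term` in document i."""
    return Counter(tokenized[i])[term]

def df(term: str) -> int:
    """Number of documents that contain `term`."""
    return sum(1 for tokens in tokenized if term in tokens)

def idf(term: str) -> float:
    """log2(N / df); assumes the term occurs in at least one document."""
    return math.log2(N / df(term))

print(tf(0, "lucene"), df("lucene"), idf("lucene"))
print(df("queries"), idf("queries"))
```

A term that appears in fewer documents gets a larger idf, so here "queries" outweighs "lucene".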

6
?????
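  • Weight of term i in document j
  • $w_{i,j} = tf_{i,j} \times idf_i$
  • where $tf_{i,j}$ is normalized by the largest term frequency in document j
  • $tf_{i,j} := tf_{i,j} / \max_k tf_{k,j}$
  • Weight of term i in query q
  • Formula suggested by Salton and Buckley
  • $w_{i,q} = (0.5 + 0.5\, tf_{i,q} / \max_k tf_{k,q}) \times idf_i$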
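
As a rough sketch of the two weighting formulas, the functions below take raw term counts and precomputed idf values. The function names and the sample numbers are hypothetical; only the formulas follow the slide.

```python
def doc_weights(term_counts: dict, idf: dict) -> dict:
    """w_{i,j} = (tf_{i,j} / max_k tf_{k,j}) * idf_i for one document j."""
    max_tf = max(term_counts.values())
    return {t: (c / max_tf) * idf[t] for t, c in term_counts.items()}

def query_weights(term_counts: dict, idf: dict) -> dict:
    """Salton-Buckley: w_{i,q} = (0.5 + 0.5 * tf_{i,q} / max_k tf_{k,q}) * idf_i."""
    max_tf = max(term_counts.values())
    return {t: (0.5 + 0.5 * c / max_tf) * idf[t] for t, c in term_counts.items()}

# Hypothetical idf values and raw counts, chosen only for illustration.
idf = {"lucene": 1.0, "score": 2.0}
w_doc = doc_weights({"lucene": 4, "score": 1}, idf)
w_query = query_weights({"lucene": 2, "score": 1}, idf)
print(w_doc)
print(w_query)
```

The 0.5 offset in the query formula keeps a term that occurs only once in the query from being weighted too far below the most frequent query term.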

7
TermQuery
  • TermQuery is the most basic Query type in Lucene; it matches a single Term
  • TermQuery scoring formula
  • $score = \sqrt{freq} \times idf \times boost \times norm$
  • $idf = \ln(maxDoc / (docFreq + 1)) + 1.0$
  • $norm = fieldboost / \sqrt{fieldlength}$
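  • Notes
  • Within a single query, idf and boost are the same for every document, so they do not affect the ranking
  • Simplified: $\sqrt{freq} \times fieldboost / \sqrt{fieldlength}$
  • fieldboost is the boost assigned to the field; it defaults to 1.0
  • So Lucene's score for a term is essentially the square root of freq/fieldlength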
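
The TermQuery formula and its simplified form can be compared in Python. This is a sketch under stated assumptions: the norm is computed exactly rather than read from an index, and the per-document statistics, `MAX_DOC`, and `DOC_FREQ` are hypothetical values chosen for illustration.

```python
import math

def idf(max_doc: int, doc_freq: int) -> float:
    """idf = ln(maxDoc / (docFreq + 1)) + 1.0"""
    return math.log(max_doc / (doc_freq + 1)) + 1.0

def term_query_score(freq, field_length, max_doc, doc_freq,
                     boost=1.0, field_boost=1.0):
    """score = sqrt(freq) * idf * boost * norm, with an exact norm."""
    norm = field_boost / math.sqrt(field_length)
    return math.sqrt(freq) * idf(max_doc, doc_freq) * boost * norm

def simplified_score(freq, field_length, field_boost=1.0):
    """Ranking-equivalent form: sqrt(freq) * fieldboost / sqrt(fieldlength)."""
    return math.sqrt(freq) * field_boost / math.sqrt(field_length)

# Hypothetical statistics for one term: doc -> (freq, field_length).
stats = {"d1": (4, 100), "d2": (1, 4), "d3": (9, 400)}
MAX_DOC, DOC_FREQ = 10, 3  # hypothetical collection statistics

full_ranking = sorted(
    stats, key=lambda d: term_query_score(*stats[d], MAX_DOC, DOC_FREQ), reverse=True)
simple_ranking = sorted(
    stats, key=lambda d: simplified_score(*stats[d]), reverse=True)
print(full_ranking, simple_ranking)
```

Both orderings agree because idf and boost multiply every document's score by the same positive constant.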

8
BooleanQuery
  • BooleanQuery combines several sub-queries; each clause is itself a Query
  • BooleanQuery examples
  • ??? ?? ?? -??
  • (??? ??) ?? ??
  • Each sub-query can be given its own boost, which sets the weight of that sub-query within the BooleanQuery
  • ?? ???3.0 ??2.0 ??1.0

9
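BooleanQuery scoring formula
  • First compute querynorm
  • $querynorm = boost / \sqrt{\sum_i (idf_i \cdot idf_i \cdot boost_i \cdot boost_i)}$
  • Then compute the weight of each matching Term
  • $weight = queryWeight \times fieldWeight$
  • $queryWeight = boost \times idf \times querynorm$
  • $fieldWeight = tf \times idf \times fieldnorm$
  • Finally combine the weights into the document score
  • $score = coord \times \sum_i weight_i$
  • $coord$ = number of matched terms / number of query terms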
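
The three steps can be sketched as small Python functions. This is an illustration, not Lucene's implementation: the function names, the data layout, and the sample values are hypothetical, and `tf` is passed in already computed (the TermQuery slide uses sqrt(freq)).

```python
import math

def query_norm(terms, boost=1.0):
    """querynorm = boost / sqrt(sum_i (idf_i * boost_i)^2); terms is a list of (idf, boost)."""
    return boost / math.sqrt(sum((idf * b) ** 2 for idf, b in terms))

def term_weight(idf, boost, qnorm, tf, field_norm):
    """weight = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf * qnorm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

def boolean_score(query_terms, doc_terms, boost=1.0):
    """query_terms: {term: (idf, boost)}; doc_terms: {term: (tf, fieldnorm)} for terms in the doc."""
    qnorm = query_norm(list(query_terms.values()), boost)
    matched = [t for t in query_terms if t in doc_terms]
    total = sum(term_weight(*query_terms[t], qnorm, *doc_terms[t]) for t in matched)
    coord = len(matched) / len(query_terms)  # matched terms / query terms
    return coord * total

# Hypothetical two-term query where the document matches only term "a".
query = {"a": (1.0, 1.0), "b": (2.0, 1.0)}
doc = {"a": (1.0, 0.5)}
score = boolean_score(query, doc)
print(score)
```

Because coord is 0.5 here, the document is penalized for matching only one of the two query terms.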

10
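BooleanQuery scoring formula
  • Combining the steps gives the complete formula
  • $score_j = coord_j \times \sum_i (boost_i \cdot idf_i \cdot tf_{i,j} \cdot idf_i \cdot fieldnorm) / \sqrt{\sum_i (idf_i \cdot idf_i \cdot boost_i \cdot boost_i)}$
  • $fieldnorm = fieldboost / \sqrt{fieldlength}$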
  • ??sqrt(?i (idf i idf i boost i boost
    i))?????,?????????
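
The combined formula can be checked by substituting the per-term weight into the final sum. The short derivation below assumes that the top-level query boost is 1, so that $querynorm = 1 / \sqrt{\sum_i idf_i \cdot idf_i \cdot boost_i \cdot boost_i}$, and that fieldnorm is the norm of document j.

```latex
\begin{aligned}
weight_{i,j} &= (boost_i \cdot idf_i \cdot querynorm) \cdot (tf_{i,j} \cdot idf_i \cdot fieldnorm) \\
score_j &= coord_j \sum_i weight_{i,j} \\
        &= coord_j \cdot querynorm \sum_i boost_i \cdot idf_i \cdot tf_{i,j} \cdot idf_i \cdot fieldnorm \\
        &= \frac{coord_j \sum_i boost_i \cdot idf_i \cdot tf_{i,j} \cdot idf_i \cdot fieldnorm}
                {\sqrt{\sum_i idf_i \cdot idf_i \cdot boost_i \cdot boost_i}}
\end{aligned}
```

The denominator depends only on the query, so it is the same for every document j.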

11
Lucene????
  • ????????
  • ???????????
  • Weight of term i in document j
  • $w_{i,j} = tf_{i,j} \times idf_i$
  • Weight of term i in query q
  • $w_{i,q} = boost_q \times idf_q$
  • Document length: $|d_j| = \sqrt{fieldlength}$

12
????????????
  • Lucene?????????
  • ??? ?? ?? -??
  • (??? ??) ?? ??
  • ?????? ?-??????????????????????
  • ?????????????????????????

13
Lucene????
  • ????? ????
  • aaa.txt
  • You are a student. He is a student.
  • bbb.txt
  • I am a student.
  • ccc.txt
  • Lee is a student.He comes from China.
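  • Each document is indexed with fieldboost = 1.0
  • By the formula $fieldnorm = fieldboost / \sqrt{fieldlength}$
  • the fieldnorm values of the three documents are 0.3125, 0.5, 0.3125
  • Note: norm is stored in a single byte, so some precision is lost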
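
The listed value 0.3125 differs from the exact 1/sqrt(8) ≈ 0.3536. The sketch below reproduces the three values under two assumptions that the slides do not state: that every word is counted as a token (field lengths 8, 4, 8, with no stop-word removal), and that the stored norm keeps only the sign, the exponent, and the top two mantissa bits of a 32-bit float, as in the one-byte norm encoding of classic Lucene versions. The helper `truncate_to_small_float` is hypothetical and ignores the clamping of very small and very large values.

```python
import math
import struct

def truncate_to_small_float(x: float) -> float:
    """Keep sign, exponent and the top two mantissa bits of a float32 (truncating)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFE00000  # zero the low 21 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Assumed token counts for aaa.txt, bbb.txt, ccc.txt (every word counted).
field_lengths = {"aaa": 8, "bbb": 4, "ccc": 8}

exact = {d: 1.0 / math.sqrt(n) for d, n in field_lengths.items()}
stored = {d: truncate_to_small_float(v) for d, v in exact.items()}
for d in field_lengths:
    print(d, exact[d], stored[d])
```

Under these assumptions the truncated values come out as 0.3125, 0.5, and 0.3125, matching the slide.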

14
Lucene????
  • ?????? student ????????
  • $score = \sqrt{freq} \times idf \times boost \times norm$
  • $idf = \ln(maxDoc / (docFreq + 1)) + 1.0$
  • ????????

Document  docFreq  idf     freq  norm    score
aaa       3        0.7123  2     0.3125  0.3148
bbb       3        0.7123  1     0.5     0.3561
ccc       3        0.7123  1     0.3125  0.2225
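
The table can be recomputed from the TermQuery formula. This sketch takes freq and norm directly from the table and assumes maxDoc = 3 and boost = 1.0, which the table values imply but the slide does not spell out.

```python
import math

MAX_DOC = 3    # three documents in the index (assumed from the example)
DOC_FREQ = 3   # "student" occurs in all three documents
BOOST = 1.0    # assumed default query boost

# (freq, norm) per document, copied from the table.
rows = {"aaa": (2, 0.3125), "bbb": (1, 0.5), "ccc": (1, 0.3125)}

idf = math.log(MAX_DOC / (DOC_FREQ + 1)) + 1.0
scores = {doc: math.sqrt(freq) * idf * BOOST * norm
          for doc, (freq, norm) in rows.items()}

for doc, s in scores.items():
    print(doc, round(idf, 4), s)
```

The results agree with the table to within about 1e-4; the last digit of some table entries appears to be truncated rather than rounded. Note that bbb outranks aaa even though aaa contains the term twice, because aaa's longer field gives it a smaller norm.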
15
Lucene????
  • ??????? student china
  • $score_j = coord_j \times \sum_i (boost_i \cdot idf_i \cdot tf_{i,j} \cdot idf_i \cdot fieldnorm) / \sqrt{\sum_i (idf_i \cdot idf_i \cdot boost_i \cdot boost_i)}$
  • ????????

Document  queryNorm  student  china   coord  score
aaa       0.6346     0.1423   0.0     0.5    0.0711
bbb       0.6346     0.1610   0.0     0.5    0.0805
ccc       0.6346     0.1006   0.3917  1.0    0.4923
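
The same table can be recomputed with the combined formula. This sketch assumes maxDoc = 3, a boost of 1.0 for both terms, tf = sqrt(freq) as in the TermQuery formula, and docFreq = 1 for "china" (only ccc.txt contains it, assuming case-insensitive matching); freq and norm values are those used in the single-term example.

```python
import math

MAX_DOC = 3  # assumed from the three-document example

def idf(doc_freq: int) -> float:
    return math.log(MAX_DOC / (doc_freq + 1)) + 1.0

# Query terms with their idf; both boosts are assumed to be 1.0.
query = {"student": idf(3), "china": idf(1)}

# Matching terms per document: term -> (freq, fieldnorm).
docs = {
    "aaa": {"student": (2, 0.3125)},
    "bbb": {"student": (1, 0.5)},
    "ccc": {"student": (1, 0.3125), "china": (1, 0.3125)},
}

query_norm = 1.0 / math.sqrt(sum(v * v for v in query.values()))

def score(doc: str) -> float:
    matched = docs[doc]
    total = sum(query[t] * math.sqrt(freq) * query[t] * norm
                for t, (freq, norm) in matched.items())
    coord = len(matched) / len(query)
    return coord * total * query_norm

scores = {d: score(d) for d in docs}
print(query_norm, scores)
```

The document ccc ranks first by a wide margin: it matches both terms (coord = 1.0) and "china" is rare, so its idf is high.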
16
Any questions?