Title: Improving Search Effectiveness in the Legal E-Discovery Process using Relevance Feedback
1Improving Search Effectiveness in the Legal
E-Discovery Process using Relevance Feedback
- Feng Charlie Zhao, University of Washington
- Douglas W. Oard, University of Maryland
- Jason R. Baron, National Archives and Records Administration
- fcz_at_u.washington.edu, oard_at_umd.edu, jason.baron_at_nara.gov
2Meet-and-Confer Alternatives
3High Recall is Possible
- Precision = RelRet / Ret
- Recall = RelRet / Rel
TREC 2008 Interactive Task, Topic 103, High OCR-accuracy documents only
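The two measures on this slide can be computed directly from sets of document IDs. The following is a minimal sketch, assuming that RelRet is the number of retrieved documents that are relevant, Ret the number retrieved, and Rel the number relevant. The function name and the document IDs are hypothetical and are not taken from the TREC data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    rel_ret = len(retrieved & relevant)  # RelRet: relevant documents that were retrieved
    precision = rel_ret / len(retrieved) if retrieved else 0.0  # RelRet / Ret
    recall = rel_ret / len(relevant) if relevant else 0.0       # RelRet / Rel
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here two of the four retrieved documents are relevant, and two of the five relevant documents were retrieved.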
4Meet-and-Confer Alternatives
5Boolean Misses Many Relevant
- Mean estR = 0.33 (26 topics)
- Missed 67% of relevant documents (on average)
- Max estR = 0.99, Topic 127 (sanitation procedures)
- Min estR = 0.00, Topic 142 (contingent sales)
Figure: Estimated Recall across 26 topics (TREC 2008 Ad Hoc Task, Consensus Boolean Queries)
6Boolean Misses Many Highly Relevant
- Mean estR = 0.42 (24 topics)
- Missed 58% of highly relevant docs (on average)
- Max estR = 1.00, Topic 137 (intellectual property rights)
- Min estR = 0.00, Topic 147 (returns of cigarettes)
Figure: Estimated Recall across 24 topics (TREC 2008 Ad Hoc Task, Consensus Boolean Queries)
7Meet-and-Confer Alternatives
8Research Questions
- Can incremental disclosure (with query
renegotiation) increase recall without increasing
the total manual review effort?
- How many review stages are needed?
- What criteria can be used to recognize when
renegotiation might be helpful?
9Research Method
- Select a collection with relevance judgments
- We used the TREC 2005 Robust Track
- Almost 1 million news stories from 3 sources
- Model renegotiation as relevance feedback
- Issue initial query
- Simulate review of some number of documents
- Lock in those results
- Add the best terms from relevant docs to query
- Measure Recall@N at end of each stage
- Simulates completion of responsive review
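The staged procedure above can be sketched as a small simulation. This is a toy illustration under stated assumptions, not the method used in the study: the ranking function (a count of query-term occurrences), the term-selection rule (most frequent new terms in reviewed relevant documents), the fixed stage sizes, and the six-document collection are all hypothetical.

```python
from collections import Counter


def score(query_terms, doc_terms):
    """Toy ranking score: number of query-term occurrences in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def rank(query_terms, docs, exclude):
    """Rank documents not yet reviewed, best first; ties broken by document ID."""
    unseen = [d for d in docs if d not in exclude]
    return sorted(unseen, key=lambda d: (-score(query_terms, docs[d]), d))


def expand(query_terms, docs, relevant_seen, k):
    """Add the k most frequent new terms from reviewed relevant documents."""
    counts = Counter(
        t for d in relevant_seen for t in docs[d] if t not in query_terms
    )
    best = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
    return list(query_terms) + [t for t, _ in best]


def staged_review(query, docs, relevant, stage_sizes, k=2):
    """Simulate staged review; return (locked ranking, Recall@N after each stage)."""
    locked, recalls = [], []
    terms = list(query)
    for size in stage_sizes:
        batch = rank(terms, docs, set(locked))[:size]  # review the next `size` docs
        locked.extend(batch)                           # lock in those results
        found = [d for d in locked if d in relevant]
        recalls.append(len(found) / len(relevant))     # Recall@N, N = len(locked)
        terms = expand(query, docs, found, k)          # renegotiate the query
    return locked, recalls


# Hypothetical toy collection and judgments, for illustration only.
docs = {
    "d1": ["youth", "marketing", "tobacco"],
    "d2": ["marketing", "budget", "report"],
    "d3": ["sports", "results"],
    "d4": ["weather", "forecast"],
    "d5": ["tobacco", "youth", "advertising"],
    "d6": ["youth", "tobacco", "campaign"],
}
relevant = {"d1", "d5", "d6"}

one_stage = staged_review(["marketing"], docs, relevant, [4])
two_stage = staged_review(["marketing"], docs, relevant, [2, 2])
print(one_stage)
print(two_stage)
```

Both runs review four documents in total. On this contrived data the two-stage run reaches the remaining relevant documents because the expanded query picks up terms from the relevant document found in the first stage; real collections will not behave this cleanly.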
10Why Not Legal Track Collection?
Scanned
OCR
Metadata
Philip Moxx's. U.S.A. x.dramc.
cvrrespoaa.aa Benffrts Departmext Riehgtpwna,
Yfeia Ta Dishlbutfon Data aday 90,1997. From
Lisa Fislla Sabj.csr CIGNA WeWedng Newsbttsr
-Yntsre StratsU During our last CIGNA Aatfoa Plan
meadng, tlu iasuo of wLetSae to i0op
per'Irwng artieles aod discontinue mndia6 CIGNA
Well-Being aawslener to om employees was a msiter
of disanision . I Imvm done somme reaearcgtgt, and
wanted to pruedt you with my Sadings and
pcdiminary recwmmeadatioa for PM's atratezy
Ieprding l4aas aewelattee . I believe .vayone'a
input is valusble, and would epproolate hoarlng
fmaa aaeh of you on whetlne you concur with my
reeommendatioa
- Title: CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY
- Organization Authors: PMUSA, PHILIP MORRIS USA
- Person Authors: HALLE, L
- Document Date: 19970530
- Document Type: MEMO, MEMORANDUM
- Bates Number: 2078039376/9377
- Page Count: 2
- Collection: Philip Morris
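One way to see the effect of OCR noise on term matching is to compare the metadata title with the OCR text of what appears to be the same subject line. The sketch below assumes that the OCR fragment corresponds to the memo's subject line and uses a deliberately simple tokenizer; both are assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


# Both strings are copied from the sample document on this slide.
metadata_title = "CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY"
ocr_line = "Sabj.csr CIGNA WeWedng Newsbttsr -Yntsre StratsU"

title_terms = tokens(metadata_title)
matched = title_terms & tokens(ocr_line)
print(sorted(matched), len(matched), "of", len(title_terms))
```

Under exact term matching, only "cigna" survives from the six title terms in this fragment, so a query on a word such as "newsletter" would not match this OCR line.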
11How Many Expansion Terms?
Arithmetic Progression Partitions
12Relevance-Guided Partitioning
- Judge relevance in best-first order
- Requires a relevance ranking technique
- Stop when N1 relevant docs have been found
- Renegotiate the query
- Judge unseen docs in best-first order
- Stop when N2 relevant docs have been found
- Renegotiate the query
- Judge all remaining unseen documents
- Or iterate more renegotiation stages, if desired
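The stopping rule for one stage can be sketched as follows. This is a minimal illustration, assuming a ranking is already available as a best-first list of document IDs; the function name, the IDs, and the judgments are hypothetical.

```python
def review_until(ranking, relevant, target):
    """Judge documents best-first and stop once `target` relevant ones are found.

    Returns (judged, found): the documents judged in this stage and the
    relevant documents among them.
    """
    judged, found = [], []
    for doc in ranking:
        if len(found) >= target:
            break  # stage is complete; time to renegotiate the query
        judged.append(doc)
        if doc in relevant:
            found.append(doc)
    return judged, found


# Hypothetical ranking and relevance judgments, for illustration only.
ranking = ["a", "b", "c", "d", "e", "f"]
relevant = {"b", "d", "f"}
judged, found = review_until(ranking, relevant, target=2)
unseen = [doc for doc in ranking if doc not in judged]
print(judged, found, unseen)
```

With a target of two relevant documents, the stage ends after four judgments, and the two unseen documents would be re-ranked with the renegotiated query in the next stage.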
13How Many Iterations?
50 Topics, Title Queries, TREC-2005 Robust Track Collection
14Research Questions
- Can incremental disclosure (with query
renegotiation) increase recall without increasing
the total manual review effort?
- Yes, although renegotiation takes time and effort
- How many review stages are needed?
- At least two (maybe more, if many relevant docs)
- What criteria can be used to recognize when
renegotiation might be helpful?
- Number of relevant documents found so far