Genetic Learning for Information Retrieval

1
Genetic Learning for Information Retrieval
  • Andrew Trotman
  • Computer Science
  • 365 × 24 × 60 / 40 = 13,140

2
Genetic Learning
  • The Core Algorithm
  • Crossover, Mutation, Reproduction
  • Fitness proportionate selection
  • Genetic Algorithms
  • Chromosome is an array
  • Genetic Programming
  • Chromosome is an abstract syntax tree

[Figure: two array chromosomes, A B C D E F and 1 2 3 4 5 6, recombined by crossover]
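
The core loop on this slide can be sketched in a few lines of Python. This is a minimal illustration for array chromosomes, not the presenter's implementation: the function names, the one-point crossover, the crossover and mutation rates, and the choice of real-valued genes in [0, 1) are all assumptions made for the example.

```python
import random

def select(population, fitnesses, rng):
    """Fitness proportionate (roulette wheel) selection of one parent."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for chromosome, fitness in zip(population, fitnesses):
        running += fitness
        if pick <= running:
            return chromosome
    return population[-1]  # guard against floating-point round-off

def crossover(a, b, rng):
    """One-point crossover of two equal-length array chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rate, rng):
    """Replace each gene with a fresh random value with probability `rate`."""
    return [rng.random() if rng.random() < rate else g for g in chromosome]

def next_generation(population, fitness_fn, rng, p_cross=0.7, p_mut=0.01):
    """Breed one new generation; fitness values must be non-negative."""
    fitnesses = [fitness_fn(c) for c in population]
    children = []
    while len(children) < len(population):
        a = select(population, fitnesses, rng)
        b = select(population, fitnesses, rng)
        if rng.random() < p_cross:
            a, b = crossover(a, b, rng)
        # Otherwise this is plain reproduction: the parents pass through.
        children.extend([mutate(a, p_mut, rng), mutate(b, p_mut, rng)])
    return children[:len(population)]
```

Passing an explicit `random.Random(seed)` as `rng` keeps runs repeatable, which makes it easier to check whether a result depends on the random number stream.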
3
Information Retrieval (Text)
  • Online Systems
  • Dialog, LexisNexis, etc.
  • Web Systems
  • Alta Vista, Excite, Google, etc.
  • Scientific Literature Systems
  • CiteSeer, PubMed, BioMedNet, etc.
  • Question
  • How should scientific literature be ranked?
  • Less time searching / More time researching
  • Higher exposure for good work

4
How Google Works
  • PageRank
  • Document ranking from PageRank
  • A document's PageRank is some factor (d) of the
    rank of incoming citations
  • A document's influence is some factor of its rank
    and its outgoing citations
  • Characteristics of Scientific Literature
  • Citations unidirectional (backwards in time)
  • 12 month publication cycle
  • Scientific citation cliques
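
As a rough illustration of the PageRank bullets, the sketch below runs the standard iterative PageRank calculation over a small citation graph. The function name, the default d = 0.85, the fixed iteration count, and the even redistribution of rank from documents that cite nothing are assumptions for the example, not details from the slides.

```python
def pagerank(cites, d=0.85, iterations=50):
    """cites maps each document to the list of documents it cites."""
    docs = set(cites) | {t for targets in cites.values() for t in targets}
    n = len(docs)
    rank = {doc: 1.0 / n for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: (1.0 - d) / n for doc in docs}
        for doc in docs:
            targets = cites.get(doc, [])
            if targets:
                # A document passes a d-scaled share of its rank to each
                # document it cites.
                share = d * rank[doc] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling document (cites nothing): spread its rank evenly.
                for t in docs:
                    new_rank[t] += d * rank[doc] / n
        rank = new_rank
    return rank
```

With citations that only point backwards in time, for example `{"new": ["mid", "old"], "mid": ["old"], "old": []}`, the oldest document collects the most rank and the newest the least, which is the characteristic of scientific literature the slide points to.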

5
How IR works
  • Indexing
  • Build the dictionary
  • Construct the Postings (<d,f> pairs)
  • Searching
  • Look up terms in dictionary
  • Boolean resolution
  • Rank on density (probability, vector space, etc.)
  • Performance
  • Recall and precision
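
The indexing and searching steps can be sketched as follows. This is a toy version under stated assumptions: whitespace tokenisation, a TF-IDF style sum standing in for the density ranking, and no Boolean resolution step (any document containing a query term is scored). All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns {term: [(doc_id, frequency), ...]}."""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            postings[term].append((doc_id, freq))
    return postings

def search(postings, n_docs, query):
    """Look up each query term and rank documents by a TF-IDF style sum."""
    scores = defaultdict(float)
    for term in query.lower().split():
        plist = postings.get(term, [])
        if not plist:
            continue
        idf = math.log(1 + n_docs / len(plist))
        for doc_id, freq in plist:
            scores[doc_id] += freq * idf
    return sorted(scores, key=scores.get, reverse=True)

def precision_recall(ranked, relevant):
    """Precision: fraction of results that are relevant.
    Recall: fraction of relevant documents that were returned."""
    hits = len(set(ranked) & set(relevant))
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

The dictionary here is simply the key set of `postings`; a production system would store the two separately and compress the postings lists.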

6
Structured-IR
  • Sci-Lit documents have structure
  • Title, abstract, conclusions, etc.
  • <d,f> becomes <d,p,f>

7
Using Structure in Ranking
  • Documents have structure
  • Title, Abstract, Conclusions, etc.
  • Weight each structure on importance
  • Title higher than Abstract higher than
  • How to choose the weights
  • Specified in the query (XIRQL)
  • Query feedback
  • Learn with a Genetic Algorithm
  • Adapt ranking model to use structure
  • Each tree node is a locus
  • Weights are genes
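
One way to picture structure weighting: each posting carries a document part alongside the frequency, and the ranking multiplies the frequency by that part's weight. The sketch below assumes the extra postings field names the part of the document (title, abstract, and so on) and uses a plain weighted term-frequency sum in place of a real ranking model. The part names, the function names, and the chromosome layout are hypothetical.

```python
PARTS = ["title", "abstract", "conclusions"]  # one locus per structure

def weights_from_chromosome(chromosome):
    """Map an array chromosome onto structure weights (genes are weights)."""
    return dict(zip(PARTS, chromosome))

def structured_rank(postings, weights, query_terms):
    """postings: {term: [(doc_id, part, freq), ...]}; weights: {part: weight}."""
    scores = {}
    for term in query_terms:
        for doc_id, part, freq in postings.get(term, []):
            # Parts without a learned weight fall back to a neutral 1.0.
            weighted = weights.get(part, 1.0) * freq
            scores[doc_id] = scores.get(doc_id, 0.0) + weighted
    return sorted(scores, key=scores.get, reverse=True)
```

A GA would then score each chromosome by running the training queries through `structured_rank` and measuring retrieval performance, so better weight vectors are more likely to be selected.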

8
Experiment
  • 50 training queries
  • 50 evaluation queries
  • 25 generations
  • Probabilistic IR
  • Vector Space IR

Results
  • PROBABILISTIC IR
  • 75.5% of queries improved
  • 6.7% increase in MAP (8.8% max)
  • VECTOR SPACE IR
  • 61% of queries improved
  • 4.7% increase in MAP (5.4% max)
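
For readers unfamiliar with the measure, MAP (mean average precision) can be computed as in the sketch below. This is the standard definition; the function names are chosen for the example, and nothing here reproduces the experiment's data.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant
    document appears, divided by the number of relevant documents."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

A query counts as improved when its average precision under the learned ranking exceeds its value under the baseline, and the increase in MAP is the relative change between the two means.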

9
Ranking Algorithms
  • Multitude exist
  • Probability, vector space, Boolean
  • Several published nomenclatures
  • Over 100,000 published algorithms
  • Purpose
  • Put relevant documents first
  • Sorting
  • Performance measures with precision
  • Sources
  • Some guy thought it up

10
Experiment
  • 50 training queries
  • 50 evaluation queries
  • 31 runs
  • Weekend time limit
  • Compare to Probabilistic

Results
  • 67% of queries improved
  • 15% increase in MAP

11
Function Comparison
Vector Space
Probability
Learned
w_dq = Σ_{t∈q} (((((((((U / sqrt(sqrt(nt))) / (mq /
sqrt((((Lq / (sqrt(sqrt(Ld)) / sqrt((U / nc))))
× min(mq, N)) / sqrt(((((((Tmax / sqrt(U)) /
sqrt((((log2(sqrt(nt)) / sqrt(nt)) / sqrt(Umax))
/ (M / nc)))) / sqrt((U / nc))) - uq) / mq) /
sqrt(nt))))))) / sqrt((log(Tmax) / nc))) /
sqrt(nt)) / sqrt(nt)) / sqrt((Lq /
sqrt(((sqrt((sqrt(sqrt(Ld)) / sqrt((min(mq,
sqrt((((log(Tmax) / nc) / sqrt(Umax)) / (mq /
sqrt(((N × min((sqrt(nc) / sqrt(U)), Ld)) /
sqrt(N))))))) / sqrt(Ld))))) / sqrt((Tmax / nc)))
/ sqrt(nt)))))) / sqrt((min(mq, N) / nc))) /
sqrt((log(Tmax) / nc))) / sqrt(nt))
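
A learned function like this one is an abstract syntax tree, and evaluating it is a short recursive walk. The sketch below represents a tree as nested tuples and assumes "protected" operators (division by zero returns 1, square root takes the absolute value, log of a non-positive value returns 0), a common GP convention that the slides do not confirm. The variable values in the usage note are placeholders, since the slides do not define the symbols.

```python
import math

def evaluate(node, env):
    """Evaluate an expression tree; leaves are numbers or variable names."""
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return env[node]
    op, *args = node
    vals = [evaluate(a, env) for a in args]
    if op == "/":
        # Protected division: avoid crashing on a zero denominator.
        return vals[0] / vals[1] if vals[1] != 0 else 1.0
    if op == "-":
        return vals[0] - vals[1]
    if op == "sqrt":
        return math.sqrt(abs(vals[0]))  # protected square root
    if op == "min":
        return min(vals)
    if op == "log":
        return math.log(vals[0]) if vals[0] > 0 else 0.0
    raise ValueError(f"unknown operator: {op}")

# The innermost sub-expression of the learned function: U / sqrt(sqrt(nt))
tree = ("/", "U", ("sqrt", ("sqrt", "nt")))
```

For example, `evaluate(tree, {"U": 8.0, "nt": 16.0})` computes 8 / sqrt(sqrt(16)). GP crossover swaps subtrees between two such tuples, where GA crossover swaps array segments.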
12
Conclusions
  • Using document structure improved ranking
  • Structure weights can be learned with a GA
  • GP can be used to learn ranking functions
  • Speculation
  • Combining GA and GP to learn a structure ranking
    algorithm will better GA and GP alone

13
Questions?
14
Random Numbers
  • Are your results an artifact of your random number generator?