Combining KR and search: Crossword puzzles - PowerPoint PPT Presentation

About This Presentation
Title:

Combining KR and search: Crossword puzzles

Description:

... KR and search: Crossword puzzles. Next: Logic representations. Reading: ... Examine frequency in crossword puzzles and normalize to account for bias caused ... – PowerPoint PPT presentation

Number of Views:155
Avg rating:3.0/5.0
Slides: 30
Provided by: Kathleen268
Category:

less

Transcript and Presenter's Notes

Title: Combining KR and search: Crossword puzzles


1
Combining KR and search Crossword puzzles
  • Next Logic representations
  • Reading C. 7.4-7.8

2
Changes in Homework
  • Mar 4th Hand in written design, planned code for
    all modules
  • Mar 9th midterm
  • Mar 25th Fully running system due
  • Mar 30th Tournament begins

3
Changes in Homework
  • Dictionary
  • Use dictionary provided do not use your own
  • Start with 300 words only
  • Switch to larger set by time of tournament
  • Representation of dictionary is important to
    reducing search time
  • Using knowledge to generate word candidates could
    also help

4
Midterm Survey
  • Start after 9AM Friday and finish by Thursday,
    Mar. 4th
  • Your answers are important they will affect
    remaining class structure

5
Crossword Puzzle Solver
  • Proverb Michael Litman, Duke Univ
  • Developed by his AI class
  • Combines knowledge from multiple sources to solve
    clues (clue/target)
  • Uses constraint propogation in combination with
    probabilities to select best target

6
Algorithm Overview
  • Independent programs specialize in different
    types of clues knowledge experts
  • Information retrieval, database search, machine
    learning
  • Each expert module generates a candidate list
    (with probabilities)
  • Centralized solver
  • Merges the candidates lists for each clue
  • Places candidates on the puzzle grid

7
Performance
  • Averages 95.3 words correct and 98.1 letters
    correct
  • Under 15 minutes/puzzle
  • Tested on a sample of 370 NYT puzzles
  • Misses roughly 3 words or 4 letters on a daily
    15X15 puzzle

8
Questions
  • Is this approach any more intelligent than the
    chess playing programs?
  • Does the use of knowledge correspond to
    intelligence?
  • Do any of the techniques for generating words
    apply to Scrabble?

9
(No Transcript)
10
To begin research style
  • Study of existing puzzles
  • How hard?
  • What are the clues like?
  • What sources of knowledge might be helpful?
  • Crossword Puzzle database (CWDB)
  • 350,000 clue-target pairs
  • gt250,000 unique pairs
  • of puzzles seen over 14 years at rate of one
    puzzle/day

11
How novel are crossword puzzles?
  • Given complete database and a new puzzle, expect
    to have seen
  • 91 of targets
  • 50 of clues
  • 34 of clue target pairs
  • 96 of individual words in clues

12
(No Transcript)
13
Categories of clues
  • Fill in the blank
  • 28D Nothing ____ less
  • Trailing question mark
  • 4D The end of Plato?
  • Abbreviations
  • 55D Key abbr maj

14
Expert Categories
  • Synonyms
  • 40D Meadowsweet spiraea
  • Kind-of
  • 27D Kind of coal or coat pea
  • pea coal and pea coat standard phrases
  • Movies
  • 50D Princess in Woolfs Orlando sasha
  • Geography
  • 59A North Sea port aberdeen
  • Music
  • 2D Hold Me country Grammay winner, 1988 oslin
  • Literature
  • 53A Playwright/novelist Capek karel
  • Information retrieval
  • 6D Mountain known locally as Chomolungma everest

15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
Candidate generator
  • Farrow of Peyton Place mia
  • Movie module returns
  • 0.909091 mia
  • 0.010101 tom
  • 0.010101 kip
  • 0.010101 ben
  • 0.010101 peg
  • 0.010101 ray

19
(No Transcript)
20
(No Transcript)
21
Ablation tests
  • Removed each module one at a time, rerunning all
    training puzzles
  • No single module changed overall percent correct
    by more than 1
  • Removing all modules that relied on CWDB
  • 94.8 to 27.1 correct
  • Using only the modules that relied exclusively on
    CWDB
  • 87.6 correct

22
Word list modules
  • WordList, WordListBig
  • Ignore their clues and return all words of
    correct length
  • WordList
  • 655,000 terms
  • WordListBig
  • WordList plus constructed terms
  • First and last names, adjacent words from clues
  • 2.1 million terms, all weighted equally
  • 5D 10,000 words, perhaps novelette
  • Wordlist-CWDB
  • 58,000 unique targets
  • Returns all targets of appropriate length
  • Weights with estimates of their prior
    probabilities as targets of arbitrary clues
  • Examine frequency in crossword puzzles and
    normalize to account for bias caused by letters
    intersecting across and down terms

23
CWDB-specific modules
  • Exact Match
  • Returns all targets of the correct length
    associated with the clue
  • Example error it returns eeyore for 19A Pal of
    Pooh tigger
  • Transformations
  • Learns transformations to clue-target pairs
  • Single-word substitution, remove one phrase from
    beginning or end and add another, depluralizing a
    word in clue, pluralize word in target
  • Nice X lt-gt X in France
  • X for short lt-gt X abbr.
  • X start lt-gt Prefix with X
  • X city lt-gt X capital
  • 51D Bugs chaser elmer, solved by Bugs pursuer
    elmer and the transformation rule X pursuer lt-gt X
    chaser
  • http//www.oneacross.com

24
Information retrieval modules
  • Encyclopedia
  • For each query term, compute distribution of
    terms close to query
  • Counted 10-k times every times it apears at a
    distance of klt10 from query term
  • Extremely common terms (as, and) are ignored
  • Partial match
  • For a clue c, find all clues in CWDB that share
    words
  • For each such clue, give its target a weight
  • LSI-Ency, LSI-CWDB
  • Latent semantic indexing (LSI) identifies
    correlations between words synonyms
  • Return closest word for each word in the clue

25
Database Modules
  • Movie
  • www.imdb.com
  • Looks for patterns in the clue and formulates
    query to database
  • Quoted titles 56D The Thief of Baghdad role
    abu
  • Boolean operations Cary or Lee grant
  • Music, literary, geography
  • Simple pattern matching of clue (keywords city,
    author, band, etc) to formulate query
  • 15A Foundation of Trilogy author asimov
  • Geography database Getty Information Institute

26
Synonyms
  • WordNet
  • Look for root forms of words in the clue
  • Then find variety of related words
  • 49D Chop-chop apace
  • Synonyms of synonyms
  • Forms of related words converted to forms of clue
    word (number, tense)
  • 18A Stymied thwarted
  • Is this relevant to Scrabble?

27
Syntactic Modules
  • Fill-in-the-blanks
  • gt5 clues
  • Search databases (music, geography, literary and
    quotes) to find clue patterns
  • 36A Yerbys A Rose for _ _ _ Maria ana
  • Pattern for _ _ _ Maria
  • Allow any 3 characters to fill the blanks
  • Kindof
  • Pattern matching over short phrases
  • 50 clues of this type
  • type of (A type of jacket nehru)
  • starter for (Starter for saxon anglo)
  • suffix with (Suffix with switch or sock eroo

28
Implicit Distribution Modules
  • Some targets not included in any database, but
    more probable than random
  • Schaeffer vs. srhffeeca
  • Bigram module
  • Generates all possible letter sequences of the
    given length by returning a letter bigram
    distribution over all possible strings, learned
    from CWDB
  • Lowest probability clue-target, but higher
    probability than random sequence of letters
  • Honolulu wear hawaiianmuumuu
  • How could this be used for Scrabble?

29
Questions
  • Is this approach any more intelligent than the
    chess playing programs?
  • Does the use of knowledge correspond to
    intelligence?
  • Do any of the techniques for generating words
    apply to Scrabble?
Write a Comment
User Comments (0)
About PowerShow.com