RisuQL: Boolean Query Language for Image Data

1
RisuQL: Boolean Query Language for Image Data
  • Brendan Elliott
  • 2006/03/30

2
Background
  • Since 2002, I've been running RisuPicWeb, an
    image database website: risukun.com
  • Grown from 1,600 to 45,000 images
  • Investigating ways to manage this data
  • Multiple semantic hierarchies
  • Keyword search (captions, comments, etc.)
  • Content-based image retrieval

3
Hierarchies
[Diagram: example semantic hierarchies rooted at Places, People, Animals, and Plants, with lower-level nodes North America, Mammals, Family, Trees, Grass, CWRU, Deer, Brendan, and Top of the Hill]

4
RisuQL Overview
  • Data available for querying
  • Image metadata
  • Locations in semantic hierarchies (is-a or
    part-of)
  • Text in captions, descriptions, comments, and
    inherited from hierarchies
  • Goal (1): Design a Boolean query language that
    supports all three
  • Goal (2): Integrate an existing large ontology

5
Progress Overview
  • Focus on RisuQL implementation
  • Parser completed
  • Abstract syntax tree constructed
  • SQL generation system working
  • Most language features implemented

6
Parser
  • Use parser generator or write my own?
  • Examined existing systems
  • Lex/Yacc
  • GOLD parsing system
  • But, wanted a native C parser
  • Also, wanted an error-correcting/error-hiding
    parser (users make mistakes and don't want to
    see lots of parse errors)
  • RisuQL is not a large language -> wrote my own
  • Used an existing C tokenizer

7
Parser classes
BooleanExpression
OrExpression
AndExpression
NegativeExpression
ParenthesizedExpression
BooleanCondition
BooleanConditionList
KeywordCondition
FieldCondition
HierarchyCondition
AttributeCondition
KeywordPhrase
8
Boolean Condition List
  • Initial syntax didn't allow the most basic type
    of search: a list of keywords without typing
    AND/ORs
  • Don't want to give an explicit error if AND/OR
    is not specified between other types of conditions
  • Idea: allow lists of boolean conditions without
    AND/OR. Realized that negation and parenthesized
    expressions should be boolean conditions so they
    could be in the list, too
  • Imply OR: keyword, keyword phrase
  • Imply AND: others
9
SQL Generation
  • Data already in relational database
  • Strategy: implement RisuQL with SQL
  • Syntax tree classes have a common SQL generation
    interface
  • First attempted implementation using SQL
    AND/OR/NOT in one query -> cross joins!
  • Implemented sub-expressions as uncorrelated
    nested subqueries
  • Common interface: return nodeIds

10
Result Scoring
  • If the generator's sort-by option is set to score,
    then a score value is also returned
  • Keywords: score comes from support and relevancy
    feedback scores
  • Hierarchy/Attributes: score is 1.0 (0.0 implied
    if node not returned)

11
Evaluating AND/OR/NOT
  • AND
  • INNER JOIN between subqueries
  • Score is simple addition
  • OR
  • FULL OUTER JOIN
  • COALESCE() used with nodeIds and scores
  • NOT
  • LEFT JOIN on all nodeIds where subquery's nodeId
    is NULL
  • Score is 1.0
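
A minimal Python sketch of this join strategy follows. The class names are borrowed from the parser class list and the table names from the deck's example query, but the `to_sql` method, the aliases, and the simplified keyword score are assumptions for illustration, not the actual generator.

```python
class Expression:
    """Common interface: each node renders a subquery yielding (nodeId, score)."""

    def to_sql(self) -> str:
        raise NotImplementedError


class KeywordCondition(Expression):
    def __init__(self, word: str):
        self.word = word

    def to_sql(self) -> str:
        word = self.word.replace("'", "''")  # escape quotes inside the SQL literal
        # Simplified: only supportScore is used as the score here (assumption).
        return ("SELECT nk.nodeId, nk.supportScore AS score "
                "FROM NodesToKeywords nk, Keywords k "
                f"WHERE nk.keywordId = k.keywordId AND k.name = '{word}'")


class AndExpression(Expression):
    def __init__(self, left: Expression, right: Expression):
        self.left, self.right = left, right

    def to_sql(self) -> str:
        # AND: inner join of the two subqueries, scores added.
        return ("SELECT a.nodeId, a.score + b.score AS score "
                f"FROM ({self.left.to_sql()}) AS a "
                f"INNER JOIN ({self.right.to_sql()}) AS b ON a.nodeId = b.nodeId")


class OrExpression(Expression):
    def __init__(self, left: Expression, right: Expression):
        self.left, self.right = left, right

    def to_sql(self) -> str:
        # OR: full outer join; COALESCE fills in the side that did not match.
        return ("SELECT COALESCE(a.nodeId, b.nodeId) AS nodeId, "
                "COALESCE(a.score, 0.0) + COALESCE(b.score, 0.0) AS score "
                f"FROM ({self.left.to_sql()}) AS a "
                f"FULL OUTER JOIN ({self.right.to_sql()}) AS b ON a.nodeId = b.nodeId")


class NegativeExpression(Expression):
    def __init__(self, inner: Expression):
        self.inner = inner

    def to_sql(self) -> str:
        # NOT: keep every node that the inner subquery did not return.
        return ("SELECT n.nodeId, 1.0 AS score FROM Nodes n "
                f"LEFT JOIN ({self.inner.to_sql()}) AS x ON n.nodeId = x.nodeId "
                "WHERE x.nodeId IS NULL")
```

Because every node returns the same two columns, the expressions nest freely, for example `AndExpression(OrExpression(KeywordCondition("watashi"), KeywordCondition("mayumi")), NegativeExpression(KeywordCondition("spam"))).to_sql()`.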

12
Evaluating Hierarchy Conditions
  • Folders pre-processed to locate target folder
  • Node instances have string labels based on
    counting children of a parent node
  • Ex: 37.4.3.5.
  • Find all subtree children: label LIKE _
  • Exact folder queries turn into simple parent
    equals condition
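
The label scheme can be tried out against an in-memory SQLite table. This is a sketch under assumptions: the `parentLabel` column and the demo rows are invented here, and `%` is used as the prefix wildcard because the exact pattern on the slide is not recoverable.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Build a tiny NodeInstances table with path-style labels (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE NodeInstances (nodeId INTEGER, nodeLabel TEXT, parentLabel TEXT)")
    conn.executemany("INSERT INTO NodeInstances VALUES (?, ?, ?)", [
        (1, "45.", None),
        (2, "45.1.", "45."),
        (3, "45.1.1.", "45.1."),
        (4, "45.1.2.", "45.1."),
        (5, "45.1.2.1.", "45.1.2."),
        (6, "45.2.", "45."),
        (7, "45.10.", "45."),
    ])
    return conn


def subtree_node_ids(conn: sqlite3.Connection, folder_label: str) -> list[int]:
    """Everything below the folder: labels that start with the folder's label."""
    rows = conn.execute(
        "SELECT DISTINCT nodeId FROM NodeInstances "
        "WHERE nodeLabel LIKE ? AND nodeLabel <> ? ORDER BY nodeId",
        (folder_label + "%", folder_label))
    return [row[0] for row in rows]


def exact_folder_node_ids(conn: sqlite3.Connection, folder_label: str) -> list[int]:
    """Direct children only: a simple parent-equals condition."""
    rows = conn.execute(
        "SELECT DISTINCT nodeId FROM NodeInstances "
        "WHERE parentLabel = ? ORDER BY nodeId",
        (folder_label,))
    return [row[0] for row in rows]
```

The trailing dot in each label matters in this sketch: a prefix match on `45.1.` does not pick up the sibling folder `45.10.`.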

13
Evaluating Attribute-Value Conditions
  • Take the form <attrib> <op> <value>
  • Different <attrib> map very differently in
    database
  • Simple relational field values (ex: name)
  • Name-value lookup tables (ex: metadata)
  • Evaluated via a subquery (ex: children)
  • AttributeTypeManager maps names to types and maps
    canonical names
  • Some attributes have a type, so user input must be
    processed to avoid generating bad SQL
  • Operators: >, <, =, !=, >=, <=, CONTAINS, STARTS
    WITH, ENDS WITH, LIKE
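
One way to picture the type checking is the Python sketch below. It is not the deck's AttributeTypeManager: the type table, the hyphen-to-underscore canonicalisation, the use of the canonical name as a stand-in column, and the parameterised output are all assumptions for illustration.

```python
# Hypothetical subset of attributes and their value types.
ATTRIBUTE_TYPES = {
    "name": str,
    "camera_make": str,
    "iso": int,
    "width": int,
    "height": int,
}

COMPARISON_OPS = {">", "<", "=", "!=", ">=", "<=", "LIKE"}

# Text operators rewritten as LIKE patterns.
PATTERN_OPS = {"CONTAINS": "%{}%", "STARTS WITH": "{}%", "ENDS WITH": "%{}"}


def canonical_name(name: str) -> str:
    """Assumed rule: case-insensitive, hyphens and underscores interchangeable."""
    return name.strip().lower().replace("-", "_")


def build_condition(name: str, op: str, raw_value: str):
    """Return (sql_fragment, params); raise ValueError on anything unexpected."""
    attr = canonical_name(name)
    if attr not in ATTRIBUTE_TYPES:
        raise ValueError(f"unknown attribute: {name}")
    value_type = ATTRIBUTE_TYPES[attr]
    op = " ".join(op.upper().split())  # normalise case and inner whitespace
    if op in PATTERN_OPS:
        if value_type is not str:
            raise ValueError(f"{op} needs a text attribute")
        return f"{attr} LIKE ?", [PATTERN_OPS[op].format(raw_value)]
    if op not in COMPARISON_OPS:
        raise ValueError(f"unknown operator: {op}")
    # Typed attributes reject malformed input here, e.g. int("abc") raises ValueError.
    return f"{attr} {op} ?", [value_type(raw_value)]
```

For example, `build_condition("Camera-Make", "=", "Canon")` yields a fragment with a placeholder plus the value, so user text never lands in the SQL string itself.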

14
Supported attributes
  • Node table: name, date_created
  • Tag table: tag (type), leaf
  • File table: filename, file_date_created,
    file_date_modified, file_date_discovered,
    file_date_last_updated
  • File metadata: date_picture_taken, aperture,
    max_aperture, camera_make, camera_model,
    brightness, exposure_bias, expose_time, flash,
    focal_length, iso, metering_mode, shutter_speed,
    width, height, orientation, sensing_method,
    color_space
  • Node attributes: recursive_files,
    recursive_folders, recursive_size, file_size,
    highlight
  • Calculated: visits, comments, url_referrers,
    files, folders

15
Example: (title:watashi OR mayumi) AND /Plants/Trees
    SELECT n1.*, i2.*, t3.*, cond4.score
    FROM Nodes n1, NodeInstances i2, NodeTags t3,
      (SELECT exp1.nodeId, exp1.score + exp2.score AS score FROM
        (SELECT pexp1.nodeId, pexp1.score FROM
          (SELECT COALESCE(exp1.nodeId, exp2.nodeId) AS nodeId,
                  COALESCE(exp1.score, 0.0E0) + COALESCE(exp2.score, 0.0E0) AS score FROM
            (SELECT phrsnk1.nodeId,
                    ((phrsnk1.supportScore + phrsnk1.rateScore + phrsnk1.clickScore)/3.0) AS score
             FROM NodesToKeywords phrsnk1, Keywords phrsk2
             WHERE phrsnk1.keywordId = phrsk2.keywordId
               AND phrsk2.name = 'watashi'
               AND phrsnk1.titleSupport > 0)
            AS exp1
            FULL OUTER JOIN
            (SELECT nk1.nodeId,
                    ((nk1.supportScore + nk1.rateScore + nk1.clickScore)/3.0) AS score
             FROM NodesToKeywords nk1, Keywords k2
             WHERE nk1.keywordId = k2.keywordId
               AND k2.name = 'mayumi')
            AS exp2 ON exp1.nodeId = exp2.nodeId)
          AS pexp1)
        AS exp1
        INNER JOIN
        (SELECT DISTINCT ni1.nodeId, 1.0E0 AS score
         FROM NodeInstances ni1
         WHERE ni1.nodeLabel LIKE '45.1._')
        AS exp2 ON exp1.nodeId = exp2.nodeId)
      AS cond4
    WHERE n1.primaryNodeInstanceId = i2.nodeInstanceId
      AND n1.tagId = t3.tagId
      AND n1.hidden = 0
      AND n1.nodeId = cond4.nodeId
    ORDER BY score DESC

16
User Interface Work
  • Integrated RisuQL into the main search interface
    on test server: http://www.risukun.com:8000
  • Created a separate prototype of using
    auto-complete for typing hierarchy names at
    http://elliott.cwru.edu/AtlasTest/

17
Remaining Work
  • Debugging and refinement of RisuQL
  • Implement // in hierarchy paths
  • WordNet integration tasks
  • Import additional links
  • Created WordNet browsing interface
  • Implement synonym:<word> operator in RisuQL for
    fuzzy search
  • More AJAX (integrate auto-complete)
  • More options for viewing search results

18
Demo
19
Questions?
20
Backup Slides
21
Language Basics
  • Logic operators: AND, OR, NOT
  • (Parentheses)
  • Keyword search: trees AND mammals
  • Metadata: camera-make = Canon
  • In a hierarchy: /Animals/Mammals/Squirrel

22
Keywords
  • Syntax: <keyword>
  • Implied joining operator: OR
  • Searching within special fields
  • <field>:<value>
  • Fields: title, comment, note, inherited
  • Ex: title:Tekin, comment:"more spam"

23
Metadata Attributes
  • Syntax: <field> <op> <value>
  • Basic fields: name, date-created, tag/type,
    date-picture-taken, flash, iso, shutter-speed,
    width, height, file-extension, etc.
  • Aggregate fields: visits, comments, files,
    folders, url-referrers
  • <op>: > < >= <= = != <>
  • <value>: <string no space> or "<string>"

24
Hierarchy Folders
  • In a sub-tree: <hierarchy path>
  • In an exact folder: <hierarchy path>
  • A hierarchy path is a set of steps and folder
    names, e.g. /Animals/Mammals/Squirrels
  • Path abbreviation: Descendant Axis //
  • //<folder name> finds a folder with the correct
    name that is a descendant (child, grandchild,
    etc.) of previous step
  • Ex: /Places//CWRU, /People//CWRU,
    //Squirrels
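
To illustrate how `/` and `//` steps could resolve, here is a small Python sketch over a nested dict. The tree contents, the function names, and the in-memory representation are hypothetical; the deck itself lists `//` as not yet implemented and stores folders in a database.

```python
# Hypothetical folder tree: each key is a folder name, each value its subfolders.
TREE = {
    "Places": {"North America": {"CWRU": {}}},
    "People": {"Friends": {"CWRU": {}}},
    "Animals": {"Mammals": {"Squirrels": {}}},
}


def parse_steps(path: str) -> list[tuple[str, str]]:
    """Split a path into (axis, name) steps; an empty segment marks a '//' step."""
    steps = []
    axis = "child"
    for segment in path.split("/")[1:]:  # skip the text before the leading '/'
        if segment == "":
            axis = "descendant"
        else:
            steps.append((axis, segment))
            axis = "child"
    return steps


def descendants(prefix: tuple, subtree: dict):
    """Yield (path, subtree) for every folder below `subtree`, depth first."""
    for name, child in subtree.items():
        yield prefix + (name,), child
        yield from descendants(prefix + (name,), child)


def resolve(path: str, tree: dict) -> list[str]:
    """Return the full paths of all folders matched by `path`."""
    current = [((), tree)]
    for axis, name in parse_steps(path):
        matched = []
        for prefix, subtree in current:
            if axis == "descendant":
                candidates = descendants(prefix, subtree)
            else:
                candidates = ((prefix + (n,), c) for n, c in subtree.items())
            matched.extend((p, c) for p, c in candidates if p[-1] == name)
        current = matched
    return ["/" + "/".join(p) for p, _ in current]
```

With this toy tree, `resolve("/Places//CWRU", TREE)` finds only the folder under Places, while `resolve("//CWRU", TREE)` finds both.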

25
Keyword Relevancy Scores
  • Support score: title, notes, comments, inherited
  • User feedback score: passive, active
  • Combining scores from multiple keywords
  • AND: find average keyword score
  • OR: find max keyword score
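
The two combination rules can be written out in a few lines of Python. This is a sketch only: representing each keyword's matches as a dict from nodeId to score, and the function names, are assumptions made for the example.

```python
def combine_and(per_keyword: list[dict[int, float]]) -> dict[int, float]:
    """AND: keep nodes matched by every keyword; score is the average."""
    if not per_keyword:
        return {}
    common = set(per_keyword[0]).intersection(*per_keyword[1:])
    return {
        node: sum(scores[node] for scores in per_keyword) / len(per_keyword)
        for node in common
    }


def combine_or(per_keyword: list[dict[int, float]]) -> dict[int, float]:
    """OR: keep nodes matched by any keyword; score is the maximum."""
    result: dict[int, float] = {}
    for scores in per_keyword:
        for node, score in scores.items():
            result[node] = max(score, result.get(node, 0.0))
    return result
```

For instance, if node 2 scores 0.25 for one keyword and 0.75 for another, the AND rule gives it 0.5 and the OR rule gives it 0.75.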

26
WordNet Overview
  • "WordNet is an online lexical reference system
    whose design is inspired by current
    psycholinguistic theories of human lexical
    memory. English nouns, verbs, adjectives and
    adverbs are organized into synonym sets, each
    representing one underlying lexical concept.
    Different relations link the synonym sets."
    (http://wordnet.princeton.edu/)

27
WordNet Features
  • Nouns are organized into hierarchies
  • is-a
  • part-of
  • member-of
  • substance-of
  • Word synonyms grouped by meaning (synsets) can be
    used to find similar words

28
WordNet Integration
  • In progress: Loading WordNet hierarchies into
    PicWeb to augment my ad-hoc hierarchies
  • Browse WordNet with implied keyword image search
  • Find images containing keywords of a synset
  • Provide UI to allow user to add images to the
    WordNet hierarchy folder(s)
  • Integration into RisuQL
  • synonym:<keyword(s)> (keyword search for
    synsets, and search using the keywords associated
    with the synsets)
  • Could be expanded to other WordNet link types,
    such as antonym, etc.
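
As a rough picture of the synonym operator, the Python sketch below expands a keyword into the words of every synset that contains it. The synsets here are a hand-made toy list, not WordNet data, and the function names and the OR rendering are assumptions for illustration.

```python
# Toy synsets for illustration only; a real system would load these from WordNet.
TOY_SYNSETS = [
    {"car", "auto", "automobile"},
    {"tree"},
]


def expand_synonym(keyword: str, synsets: list[set[str]]) -> list[str]:
    """Collect the words of every synset containing `keyword`, sorted for stable output."""
    words = {keyword}
    for synset in synsets:
        if keyword in synset:
            words |= synset
    return sorted(words)


def synonym_query(keyword: str, synsets: list[set[str]]) -> str:
    """Render the expansion as an explicit OR list of keywords."""
    return " OR ".join(expand_synonym(keyword, synsets))
```

A keyword with no matching synset simply expands to itself, so the operator degrades to an ordinary keyword search.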