Understanding Web Query Interfaces: Best-Efforts Parsing with Hidden Syntax

Provided by: ZhenZ7
1
Understanding Web Query Interfaces: Best-Efforts
Parsing with Hidden Syntax
  • Zhen Zhang, Bin He and Kevin C. Chang

2
MetaQuerier Goals: Exploring and integrating the
deep Web
FIND sources
QUERY sources
  • Integrator
  • source selection
  • schema integration
  • query mediation
  • Explorer
  • source discovery
  • source modeling
  • source indexing

The Deep Web: Databases on the Web
Cars.com, Amazon.com, 411localte.com, Apartments.com
3
Problem: Source capability extraction, or query
interface understanding.
Book sources
Music sources
4
Form understanding: What are the essential tasks?
  • Output all the conditions. For each:
  • Grouping elements (into query conditions)
  • Tagging elements with their semantic roles

attribute
operator
value
5
Demo summary
Query form
Understanding form structure
Multiple interpretations
6
Certainly not a trivial task - Recall the
butterfly ballot in U.S. Election 2000.
Even just grouping can be hard!
7
Baseline approach? The problem seems to be rather
heuristic in nature
  • There seem to be no clear criteria, but only
    fuzzy heuristics
  • Grouping is hard: it is often n-ary
  • Heuristic: Group two elements if they are close
  • But
  • Tagging is hard: there is no semantic labeling in
    HTML forms
  • Heuristic: Tag the closest text as the
    attribute
  • But
  • We need many such heuristics!
  • Goal: A principled mechanism to encode and use
    the various heuristics systematically?
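
The "tag the closest text" heuristic above can be sketched in a few lines. This is only an illustration, not the authors' code: the Element type, the coordinates, and the two sample forms are assumptions made up for the example. The second form shows one way such a proximity rule can go wrong when labels sit above their fields.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Element:
    name: str   # hypothetical identifier for a form element
    kind: str   # "text" (a label) or "field" (an input control)
    x: float    # assumed center coordinates on the rendered page
    y: float


def distance(a, b):
    return hypot(a.x - b.x, a.y - b.y)


def tag_closest_text(elements):
    """Baseline heuristic: tag each field with its closest text label."""
    texts = [e for e in elements if e.kind == "text"]
    fields = [e for e in elements if e.kind == "field"]
    return {f.name: min(texts, key=lambda t: distance(t, f)).name
            for f in fields}


# Labels to the left of their fields: the heuristic works.
side_by_side = [
    Element("Author", "text", 0, 0),
    Element("author_box", "field", 100, 0),
    Element("Title", "text", 0, 20),
    Element("title_box", "field", 100, 20),
]
tags = tag_closest_text(side_by_side)

# Labels above their fields: "To" is nearer to from_box than "From" is,
# so the same heuristic mislabels the first field.
stacked = [
    Element("From", "text", 0, 0),
    Element("from_box", "field", 0, 30),
    Element("To", "text", 0, 40),
    Element("to_box", "field", 0, 70),
]
stacked_tags = tag_closest_text(stacked)
```

Each new layout convention would need another special-case rule, which is the motivation for looking for a more systematic mechanism.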

8
Our observation: Concerted structures of QI
  • Condition pattern as building blocks
  • Convergence condition patterns

9
Our insight: Cope with form complexity by their
composition patterns.
  • Lego-like building blocks
  • Pattern of elements composed into conditions
  • Pattern of conditions composed into a form
  • So, how to realize our divide-and-conquer idea?
    Any computation paradigm?

Source
Q-Form
Lego Building Blocks
?
Semantic Structure
10
Our Hypothesis: Existence of Hidden Syntax
  • Query-form creation is guided by hidden syntax

Semantic Structure (Query Conditions)
Presentation (Query Interface)
Attr: title; Operator: title words, ...; Value: string
Parsing is thus a principled mechanism for the
inverse
11
This language paradigm enables a principled
solution to a seemingly heuristic problem
  • Essential notions: Grammar and Parser
  • Grammar: Pattern specification
  • Declarative
  • No need to hard-code heuristics
  • Collective
  • Capture both micro and macro patterns
  • Parser: Pattern recognition
  • Global
  • Coherently interpret an entire query form
  • Systematic
  • Systematically assembles the building blocks

12
However, the hidden-syntax hypothesis itself
entails challenges in its realization
  • Hidden syntax is only hypothetical
  • We must derive a grammar in its place
  • What should be captured in a derived grammar?
  • 2P-Grammar: Productions + Preferences
  • Productions for patterns, preferences for their
    precedence
  • Derived grammar is secondary to any input
  • Inherently incomplete and ambiguous
  • What should be the machinery of a soft parser?
  • Best-effort Parser
  • multiple, maximal-partial parse trees

13
Our Paradigm: Best-Effort Visual Language
Parsing Framework
Input: HTML query form
2P Grammar: Productions, Preferences
BE-Parser: Ambiguity Resolution, Error Handling
Output: semantic structure
14
Grammar: Layout-based
Traditional grammar (sequential, 1-D)
  Presentation: 3 + 5
  Grammar: E → E + E, or E → sequential(E, +, E)
Our grammar (layout-based, 2-D)
  Grammar: TextCond → left(TextAttr, TextVal) ∨
  above(TextAttr, TextVal) ∧ above(TextVal, TextOp)
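
A layout-based production such as the TextCond rule on this slide can be read as a boolean test over bounding boxes. The sketch below is an assumption-laden illustration: the Box type, the exact definitions of left and above, and the sample coordinates are invented here, and the rule is read with "and" binding tighter than "or", which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str      # hypothetical element name
    left: float    # assumed bounding box in page coordinates
    top: float
    right: float
    bottom: float


def left(a, b):
    """a lies to the left of b and the two overlap vertically."""
    return a.right <= b.left and a.top < b.bottom and b.top < a.bottom


def above(a, b):
    """a lies above b and the two overlap horizontally."""
    return a.bottom <= b.top and a.left < b.right and b.left < a.right


def text_cond(attr, val, op=None):
    """TextCond: left(attr, val) or (above(attr, val) and above(val, op))."""
    if left(attr, val):
        return True
    return op is not None and above(attr, val) and above(val, op)


# Attribute text to the left of its text box.
title = Box("Title", 0, 0, 40, 10)
title_box = Box("title_box", 50, 0, 150, 10)

# Attribute above the text box, operator below it.
author = Box("Author", 0, 0, 40, 10)
author_box = Box("author_box", 0, 12, 100, 22)
exact_name = Box("exact name", 0, 24, 100, 34)

matches = [
    text_cond(title, title_box),
    text_cond(author, author_box, exact_name),
    text_cond(author, author_box),
]
```

Because the constraints are spatial relations rather than adjacency in a string, the same rule applies wherever the elements appear on the page.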
15
Parser: Logic programming style
  • Traditional parsing
  • Scan input sequentially
  • Our parsing
  • Nonlinear input
  • Arbitrary constraints

Parse trees
. . .
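
One way to picture parsing over nonlinear input with arbitrary constraints is a bottom-up loop that keeps applying productions to the set of known instances until nothing new appears. The sketch below is a toy under stated assumptions, not the BE-Parser itself: the token grid, the three productions, and the left_of constraint are all hypothetical.

```python
from itertools import permutations

# Hypothetical tokens: name -> (terminal symbol, column, row).
TOKENS = {
    "Author": ("text", 0, 0),
    "author_box": ("textbox", 1, 0),
    "Title": ("text", 0, 1),
    "title_box": ("textbox", 1, 1),
}


def always(*_):
    return True


def left_of(a, b):
    """Constraint: a sits directly left of b on the same row."""
    return a[2] == b[2] and a[1] + 1 == b[1]


# Each production: (head symbol, body symbols, constraint on the body).
PRODUCTIONS = [
    ("Attr", ("text",), always),
    ("Val", ("textbox",), always),
    ("TextCond", ("Attr", "Val"), left_of),
]


def parse(tokens, productions):
    # An instance is (symbol, column, row, names of covered tokens).
    instances = {(sym, x, y, frozenset([name]))
                 for name, (sym, x, y) in tokens.items()}
    changed = True
    while changed:  # iterate to a fixed point; there is no input order
        changed = False
        snapshot = list(instances)
        for head, body, constraint in productions:
            for combo in permutations(snapshot, len(body)):
                if tuple(c[0] for c in combo) != body:
                    continue
                if not constraint(*combo):
                    continue
                covered = frozenset().union(*(c[3] for c in combo))
                if len(covered) != sum(len(c[3]) for c in combo):
                    continue  # components must not share a token
                new = (head, combo[0][1], combo[0][2], covered)
                if new not in instances:
                    instances.add(new)
                    changed = True
    return instances


text_conds = {i[3] for i in parse(TOKENS, PRODUCTIONS) if i[0] == "TextCond"}
```

The loop never scans the tokens in sequence; it only asks which combinations of existing instances satisfy a production's constraint.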
16
That's not all: Complications of hypothetical
syntax
  • Hidden syntax is only hypothetical!

Ambiguous
Incomplete
Grammar
Parser
Multiple parse trees
Partial parse trees
17
Ambiguity
TextCond → Below(Attr, Selection)
  • Grammar
  • Preferences to capture the conventional
    precedence
  • e.g., RButton over TextCond
  • Parser
  • Just-in-time pruning by preference
  • Multiple trees possible

RButton → Left(radio, text)
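
Pruning by preference can be sketched as follows: when two candidate interpretations claim the same token and a preference ranks one symbol over the other, the lower-ranked one is dropped. Everything here is assumed for illustration: the Instance type, the single (winner, loser) preference pair, and the token names. The sketch also prunes a finished batch of candidates, whereas the slide describes pruning just in time during parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    symbol: str          # grammar symbol, e.g. "RButton"
    tokens: frozenset    # ids of the form tokens this instance covers


# Hypothetical preference table of (winner, loser) symbol pairs.
PREFERENCES = {("RButton", "TextCond")}


def prune(instances, preferences=PREFERENCES):
    """Drop every instance that shares a token with a preferred instance."""
    losers = set()
    for a in instances:
        for b in instances:
            if (a.symbol, b.symbol) in preferences and a.tokens & b.tokens:
                losers.add(b)
    return [i for i in instances if i not in losers]


rb = Instance("RButton", frozenset({"radio1", "text1"}))
tc = Instance("TextCond", frozenset({"text1", "select1"}))   # reuses text1
other = Instance("TextCond", frozenset({"text2", "box2"}))   # no conflict

kept = prune([rb, tc, other])
```

Instances that do not conflict with a preferred one are left alone, so several interpretations can still survive.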
18
Incompleteness
  • Grammar
  • Cannot capture all patterns
  • Parser
  • Cannot interpret entire query interfaces
  • Interpret as much as possible
  • Greedily choose the maximum parse trees
  • Reasoning: They look at the big picture and consider
    more context
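
A greedy choice of maximum partial parse trees might look like the sketch below. It assumes each partial tree is summarized by the set of tokens it covers and that chosen trees must not overlap; both assumptions, along with the tree and token names, are illustrative and not taken from the slides.

```python
def choose_maximal(trees):
    """Greedy: repeatedly take the largest tree disjoint from those chosen."""
    chosen, covered = [], set()
    # Largest coverage first; ties broken by name for determinism.
    ordered = sorted(trees.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, tokens in ordered:
        if covered.isdisjoint(tokens):
            chosen.append(name)
            covered |= tokens
    return chosen, covered


# Hypothetical partial parse trees and the tokens each one covers.
trees = {
    "t1": {"a", "b", "c"},
    "t2": {"c", "d"},
    "t3": {"d", "e"},
}
all_tokens = {"a", "b", "c", "d", "e", "f"}

chosen, covered = choose_maximal(trees)
uncovered = all_tokens - covered   # tokens no chosen tree interprets
```

Preferring larger trees reflects the reasoning on the slide: a tree that covers more tokens has taken more context into account.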

19
Error Handling: Best-effort parser can output
multiple and partial parse trees
  • Union all the conditions interpreted by all the
    parse trees.
  • Report both conflicts and missing errors
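
The union step and the two error reports can be sketched as below. The definitions used are assumptions for illustration: a conflict is taken to be a token claimed by more than one distinct condition, and a missing error is a token claimed by none. The Condition type and sample data are hypothetical.

```python
from typing import NamedTuple


class Condition(NamedTuple):
    attribute: str
    tokens: frozenset   # ids of the tokens grouped into this condition


def merge(trees, all_tokens):
    """Union the conditions of all parse trees and report errors."""
    conditions = set()
    for tree in trees:
        conditions |= set(tree)
    owners = {}
    for cond in conditions:
        for tok in cond.tokens:
            owners.setdefault(tok, set()).add(cond)
    conflicts = {tok for tok, conds in owners.items() if len(conds) > 1}
    missing = set(all_tokens) - set(owners)
    return conditions, conflicts, missing


author = Condition("author", frozenset({"t1", "t2"}))
title = Condition("title", frozenset({"t3", "t4"}))
price = Condition("price", frozenset({"t4", "t5"}))

tree1 = [author, title]
tree2 = [author, price]     # disagrees with tree1 about token t4
tokens = {"t1", "t2", "t3", "t4", "t5", "t6"}

conditions, conflicts, missing = merge([tree1, tree2], tokens)
```

Reporting the errors alongside the merged conditions lets a downstream consumer decide how to handle the uncertain parts of the form.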

Parsing
Union
20
Experiment: How will a global grammar do?
  • Global grammar
  • Derived from the Basic dataset; captures 21 patterns
  • 82 productions, 39 non-terminals, 16 terminals
  • Datasets
  • Basic: 3 domains (Airfare, Autos, Books), 150
    sources
  • NewSource: same domains, 30 sources
  • NewDomain: 6 new domains (Music, ...), 42 sources
  • Random: 30 sources (from invisible-web.net)
  • Correctness judgment
  • Number of correctly identified (grouping and
    tagging) conditions

21
Conclusion: Syntactic Parsing for Interface
Understanding
  • Query interface understanding by syntactic
    parsing with hidden grammars
  • Insight
  • Exploit how semantics connects to presentation,
    in a syntactic way
  • Future work
  • Constructing the grammar automatically
  • Developing a more sophisticated preference
    framework
  • Extending the framework to other applications

22
Thank you!
  • For more information
  • Online demo at MetaQuerier project Web site
  • http://metaquerier.cs.uiuc.edu
  • We invite you to our MetaQuerier demo in the
    afternoon