XPath and Beyond: Formal Foundations

Slides: 36
Provided by: jeanyvesvi
Transcript and Presenter's Notes


1
XPath and Beyond: Formal Foundations
  • Jean-Yves Vion-Dury Xerox Research Centre
    Europe / INRIA

Pierre Genevès INRIA
2
Roadmap: Part 1
  • XPath: a cornerstone of the XML architecture
  • Theory and Engineering
  • Some key problems
  • The trends around XPath theoretical studies
  • A logic-based approach
  • Mathematical Characterization
  • Why use the Coq Proof Assistant?

3
XPath: a cornerstone of the XML architecture
  • Expresses node selection and/or structural
    properties
  • Currently used in XSLT, XQuery, XML Schema,
    XLink, XPointer, ...
  • XPath is elegant, compact, effective and powerful
  • Claim: it will be increasingly used and studied in
    the future
  • Indexing large document bases
  • Checking integrity constraints / global
    structural properties
  • Linking increasing document volumes

4
Theory and Engineering in Computer Science
  • Some decades ago, theoretical studies
    prepared the ground for engineering
  • Relational algebra enabled a huge market
    around data storage and access
  • Information theory prepared digital processing
    (networks, image and sound processing,
    compression algorithms, ...)
  • Linguistics, logic and formal mathematics prepared
    programming languages
  • A strange situation today around documents
  • W3C standardization activities produce
    specifications, and many problems remain open
  • Some theoreticians try to capture problems and to
    understand underlying issues, long after the
    publication of the specifications!
  • This induces new difficulties and requires
    different approaches
  • In order to deal with low-level issues, close
    to implementations
  • In order to face the complexity of systems

5
Some Key Problems around XPath
  • Formal semantics definition
  • Formal model of documents (trees, streams,
    graphs, strings, ...?)
  • Precise, useful and simple denotational/operational
    semantics
  • Type checking
  • Constraints on document structure (tree grammars,
    graph grammars, pattern matching)
  • Valid/invalid path expressions with respect to a
    particular schema
  • Rewriting path expressions
  • In order to customize compilation/interpretation
  • Normalization
  • Optimization
  • Reduction of the complexity of suitable models
  • Simplifying expressions while preserving
    semantics
  • Equivalence p1 ≡ p2
  • Gives a fundamental understanding of the
    language
  • Containment p1 ≤ p2
  • Gives an even more fundamental view
  • Key inference: if p is a key for a schema S, then
    all p' such that p' ≤ p are keys too

6
Linking Key Problems around XPath
  • Invalid expression and containment
  • p ≤ ⊥
  • Rewriting and equivalence
  • (p1 | p2)/p → p1/p | p2/p and
  • (p1 | p2)/p ≡ p1/p | p2/p
  • Optimization and containment
  • If p1 ≤ p2 then (p1 | p2)/p → p2/p
  • Equivalence and containment
  • p1 ≡ p2 iff p1 ≤ p2 and p2 ≤ p1
  • Containment and type checking
  • Structural constraints can be captured in XPath
    expressions
  • Structural constraint satisfaction can thus be
    checked
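
The rewriting rule on this slide, which distributes a union over "/", can be illustrated with a small term rewriter. This is only a sketch under stated assumptions, not the authors' implementation: the tuple encoding of paths and the function name `distribute` are hypothetical and chosen for illustration.

```python
# Hypothetical encoding of paths as nested tuples (not from the slides):
#   ("step", name)      a single step such as a
#   ("union", p1, p2)   p1 | p2
#   ("slash", p1, p2)   p1/p2

def distribute(p):
    """Rewrite every (p1 | p2)/p into p1/p | p2/p, working bottom-up."""
    if p[0] == "step":
        return p
    left, right = distribute(p[1]), distribute(p[2])
    if p[0] == "slash" and left[0] == "union":
        # (p1 | p2)/p -> p1/p | p2/p; recurse because p1 or p2 may be unions too
        return ("union",
                distribute(("slash", left[1], right)),
                distribute(("slash", left[2], right)))
    return (p[0], left, right)

a, b, c = ("step", "a"), ("step", "b"), ("step", "c")
before = ("slash", ("union", a, b), c)   # (a | b)/c
after = distribute(before)               # a/c | b/c
print(after)
```

The rewriter only applies the rule from left to right; it says nothing about whether the two sides are equivalent, which is the separate semantic claim stated on the slide.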

7
The problem of containment (expression)
8
The problem of typed containment (expression)
9
The Trends around XPath Theoretical Studies
10
A Logic-Based Approach
  • A set of axioms to reason about term comparison
  • As opposed to model-based approaches
  • A partial equivalence relation to minimize the
    axiom set
  • Fully congruent (e.g. p1 ≤ p2 and p1 ≡ p3
    implies p3 ≤ p2)
  • Theorems for simplifying the containment proofs
  • E.g. reflexivity, transitivity
  • Drawback: syntactic level
  • More combinatorial than model-based
    approaches
  • Advantage: syntactic level
  • More extensible, provided the previous point is
    addressed
  • Gives more indication of the underlying issues
    due to language peculiarities

11
XPath abstract syntax (Wadler99, Olteanu01)
12
Denotational semantics (Wadler99, Olteanu01)
13
Denotational semantics (Wadler99, Olteanu01)
14
Denotational semantics (Wadler99, Olteanu01)
15
Basic axioms
16
Union and Intersection
17
Qualifiers
18
The equivalence relation (Olteanu01)
19
Using equivalence in proofs
20
Mathematical Characterization
  • Soundness of the equivalence
  • Soundness of rules (e.g.)
  • Completeness of rule system (e.g.)

21
Why Use the Coq Proof Assistant?
  • Coq (http://coq.inria.fr) is a proof assistant
    based on the Calculus of Inductive Constructions
  • Higher Order Logic
  • Constructive Logic
  • Typed
  • To address the complexity problem related to
    proofs
  • To benefit from the help of the Proof Assistant
    in case analysis
  • To maintain all the mathematical architecture
    throughout exploratory work
  • To work in a rigorous framework
  • To produce rock-solid and readable results
  • The challenge
  • Requires powerful data structure modelling
    capabilities
  • Learning Coq is an additional difficulty!
  • Developing a proof in Coq is more demanding
  • But
  • Coq is quite mature now (v8.0, 25 years of
    research!) and very expressive

22
Roadmap: Part 2
  • Modelling XPath using inductive constructions
  • Formal semantics and interpretations
  • Interpreter based on the denotational semantics
  • A relational semantics for XPath
  • Modelling the containment relation
  • Using the proof system: containment checking
  • Current work on characterization
  • Methodology and expected outcomes

23
Modelling XPath using inductive constructions
  • Paths are defined inductively
  • void (⊥), top (⊤) are atoms
  • /, | and ∩ are binary constructors
  • [ ] involves qualifiers
  • _true, _false are atoms
  • and, or, not are constructors
  • leq (≤): a cross-inductive definition
  • Functional notation, example:
  • a/b[c]
  • slash a (qualif b c)
24
Interpreter based on the denotational semantics
  • Evaluates a path p from the context node x of the
    tree t
  • The evaluation of a path returns a set of nodes
  • Cross-Recursive and terminating functions
  • The evaluation of a qualifier returns a boolean

25
Need for a logic-based semantics
  • The classical semantics describes an interpreter
    that computes node sets
  • This computational view leads to needless
    complexity in proofs
  • Is there another way to capture XPath Semantics?

26
A Relational Semantics for XPath
  • An Interpretation of paths in First-Order Logic
  • A path is translated into a dyadic formula
  • Rp holds for all pairs (x,y) of nodes such that y
    is accessed from x through the path p.
  • Advantages
  • interpretations of paths and qualifiers are
    unified
  • Direct translation in Coq

Mathematical semantics from the paper
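
To make the idea of a dyadic interpretation concrete, here is one way such clauses are commonly written. These formulas are an assumption for illustration and are not taken from the slides or the paper: R_p is the binary relation for a path p, and Q_q is a hypothetical unary predicate for a qualifier q.

```latex
\begin{align*}
R_{\bot}(x,y)            &\iff \mathit{false} \\
R_{p_1/p_2}(x,y)         &\iff \exists z.\; R_{p_1}(x,z) \land R_{p_2}(z,y) \\
R_{p_1 \mid p_2}(x,y)    &\iff R_{p_1}(x,y) \lor R_{p_2}(x,y) \\
R_{p[q]}(x,y)            &\iff R_{p}(x,y) \land Q_{q}(y) \\
Q_{p}(x)                 &\iff \exists y.\; R_{p}(x,y) \\
Q_{q_1 \text{ and } q_2}(x) &\iff Q_{q_1}(x) \land Q_{q_2}(x) \\
Q_{\text{not } q}(x)     &\iff \lnot\, Q_{q}(x)
\end{align*}
```

Under this reading, containment p1 ≤ p2 amounts to the implication R_{p1}(x,y) ⇒ R_{p2}(x,y) for all nodes x and y of every tree.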
27
Modelling the containment relation (1)
  • A binary logical relation Ple
  • Gathers all containment rules in a single
    inductive construction
  • Suited to Coq's built-in tactics
    (constructor, inversion)

28
Modelling the containment relation (2)
  • The containment relation for paths
  • Is inductive
  • Is defined using its dual relation ⇒ for
    qualifiers (Qipl)
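
The duality between path containment and qualifier implication can be sketched as two mutually recursive checkers. This is a hedged illustration only: the rules below are a small, sound but incomplete set chosen for the example, not the authors' axioms, and the tuple encoding and the names `ple` and `qimpl` (echoing Ple and Qipl) are assumptions.

```python
# Hypothetical tuple encoding:
#   paths:      ("void",), ("step", n), ("slash", p1, p2),
#               ("union", p1, p2), ("qualif", p, q)
#   qualifiers: ("true",), ("false",), ("and", q1, q2), ("path", p)

def ple(p1, p2):
    """Syntactic containment p1 <= p2 (sound, far from complete)."""
    if p1 == p2 or p1 == ("void",):
        return True                                   # reflexivity, void <= p
    if p1[0] == "union":                              # p1a | p1b <= p2
        return ple(p1[1], p2) and ple(p1[2], p2)
    if p2[0] == "union" and (ple(p1, p2[1]) or ple(p1, p2[2])):
        return True                                   # p1 <= p2a | p2b
    if p1[0] == "slash" and p2[0] == "slash":         # monotonicity of "/"
        return ple(p1[1], p2[1]) and ple(p1[2], p2[2])
    if p1[0] == "qualif":
        if (p2[0] == "qualif" and ple(p1[1], p2[1])
                and qimpl(p1[2], p2[2])):             # uses the dual relation
            return True
        return ple(p1[1], p2)                         # p[q] <= p
    return False

def qimpl(q1, q2):
    """Syntactic qualifier implication q1 => q2."""
    if q1 == q2 or q1 == ("false",) or q2 == ("true",):
        return True
    if q1[0] == "and" and (qimpl(q1[1], q2) or qimpl(q1[2], q2)):
        return True                                   # weakening a conjunction
    if q2[0] == "and":
        return qimpl(q1, q2[1]) and qimpl(q1, q2[2])
    if q1[0] == "path" and q2[0] == "path":
        return ple(q1[1], q2[1])                      # back to path containment
    return False

a, b, c, d, e = (("step", n) for n in "abcde")
lhs = ("slash", ("qualif", a, ("and", ("path", b), ("path", c))), d)  # a[b and c]/d
rhs = ("slash", ("union", ("qualif", a, ("path", b)), e), d)          # (a[b] | e)/d
print(ple(lhs, rhs), ple(rhs, lhs))
```

A `False` answer here only means that no rule in this small set applies; it does not prove non-containment.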

29
Using the proof system: Containment Checking
  • We have modelled
  • XPath terms
  • Their interpretation
  • The containment relation (that gathers our
    containment axioms)
  • We can now check containment facts with the proof
    engine
  • Demo of a tactical which proves the fact
  • .//b ≤ ./descendant::b
  • Underlying goal: extend the tactical in order to
    automate the checking of all containment facts
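
For contrast with the proof-based check, the same containment fact can be tested on a single sample tree. This is a sketch under stated assumptions: the tree and all function names are hypothetical, and `.//b` is expanded as `./descendant-or-self::node()/child::b`, its standard XPath 1.0 meaning. A test on one tree can refute a containment but can never prove it, which is exactly the gap the proof system is meant to close.

```python
# Sample tree (hypothetical): node ids, labels and child lists
LABEL = {0: "root", 1: "a", 2: "b", 3: "c", 4: "b"}
CHILDREN = {0: [1, 2], 1: [3], 2: [], 3: [4], 4: []}

def descendants(x):
    """All proper descendants of node x."""
    out = set()
    for c in CHILDREN[x]:
        out |= {c} | descendants(c)
    return out

def dot_slash_slash(x, name):
    """.//name, i.e. ./descendant-or-self::node()/child::name."""
    return {z for y in {x} | descendants(x)
              for z in CHILDREN[y] if LABEL[z] == name}

def descendant(x, name):
    """./descendant::name."""
    return {y for y in descendants(x) if LABEL[y] == name}

def child(x, name):
    """./child::name."""
    return {y for y in CHILDREN[x] if LABEL[y] == name}

def contained(p1, p2):
    """True if p1 selects a subset of p2 from every context node of the tree."""
    return all(p1(x) <= p2(x) for x in LABEL)

holds = contained(lambda x: dot_slash_slash(x, "b"),
                  lambda x: descendant(x, "b"))
fails = contained(lambda x: descendant(x, "b"),
                  lambda x: child(x, "b"))
print(holds, fails)
```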

30
Proving Properties: Characterization
  • Proving the equivalence of semantics (done)
  • Current work: proving the validity of our
    axiomatization
  • Soundness
  • Completeness
  • Finding relevant induction schemes
  • Mutual induction (duality between
    paths and qualifiers)
  • Induction on a measure of the term complexity
  • Finding generic and modular Coq tactics (to
    reduce combinatorial issues)

31
Methodology and Possible outcomes
[Flowchart. Recoverable labels: Inductive relation Ple; Not sound / Sound;
Fix wrong rules; Incomplete / Complete / Intrinsically incomplete;
Add missing rules; Extend the fragment; Decidable / Undecidable (why?);
Algorithm / Incomplete algorithm]
32
Conclusion
  • We proposed a logic-based framework for static
    analysis of XPath
  • Modelling with inductive constructions (XPath
    terms and interpretations, containment relation)
  • Preliminary result: a simpler semantics
  • Ongoing work on characterization

33
Backup slides
  • Applications

34
Some Applications (1)
  • Optimization of XPath queries
  • Detecting contradictions (p ≡ void)
  • Eliminating redundancies
  • Example:
  • //a[./b/c and descendant::b] → /descendant::a[./b/c]
  • ./b/c ⇒ descendant::b
  • An optimization not currently performed at runtime
    by XPath engines
Xalan C
35
Some Applications (2)
  • Static analysis of XPath host languages
  • Example: XSLT
  • Checking XSLT stylesheets
  • Optimization of XSLT stylesheets
  • Extending XPath expressive power with an
    inclusion constraint p[p1 ⊆ p2]
  • Integrity constraint checking
  • id(//book/@authors) ⊆ //persons/@name
  • Transformation languages strongly based on XPath
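
The integrity constraint in the example can be checked dynamically on one document, which shows what a static inclusion check would guarantee for all documents. This is a rough sketch under stated assumptions: the sample XML and the helper name `dangling_authors` are hypothetical, and `id()` is simplified to a plain comparison of whitespace-separated reference tokens against `name` attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
DOC = """
<library>
  <persons name="knuth"/>
  <persons name="wadler"/>
  <book authors="knuth"/>
  <book authors="wadler knuth"/>
</library>
"""

def dangling_authors(root):
    """Author references in //book/@authors with no matching //persons/@name."""
    names = {p.get("name") for p in root.findall(".//persons")}
    refs = {ref
            for book in root.findall(".//book")
            for ref in book.get("authors", "").split()}
    return refs - names

root = ET.fromstring(DOC)
print(dangling_authors(root))   # an empty set means the constraint holds here
```

A static containment check would establish the inclusion once for every document valid against a schema, instead of re-running a test like this one per document.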