Slides courtesy of Serge Abiteboul - PowerPoint PPT Presentation

About This Presentation
Title:

Slides courtesy of Serge Abiteboul

Description:

Typing semistructured data Slides courtesy of Serge Abiteboul Web Data Management Typing semistructured data – PowerPoint PPT presentation

Slides: 111
Provided by: Serge82
Learn more at: http://www.cs.bgu.ac.il
Transcript and Presenter's Notes

Title: Slides courtesy of Serge Abiteboul


1
Typing semistructured data
  • Slides courtesy of Serge Abiteboul
  • Web Data Management

2
Organization
  • Motivations
  • Automata
  • Automata on words
  • Ranked tree automata
  • Unranked tree automata
  • Automata and monadic second-order logic
  • Automata to compute
  • XML typing: DTD, XML schema
  • Graphs and bisimulation

3
Motivation
4
XML typing
  • Not compulsory
  • Simplify writing software for XML
  • Improve interoperability between programs
  • Improve storage and performance
  • Ease querying: data guide
  • Simplify data protection
  • Reject illegal updates, like relational
    dependencies

5
Improve storage
Lower-bound schema
Store rest in overflow graph
6
Improve performance
select X.title from Bib._ X where X.*.zip = 12345
select X.title from Bib.book X where X.address.zip = 12345
7
Type checking
  • Who checks?
  • XML editor: checks that the data conforms to its
    type
  • XML exchange, e.g., with a Web service
  • Server, when delivering the data
  • Client/application, when receiving it
  • Dynamic verification: after the data is produced
  • Static verification: verification of the program
    that generates the data

8
Static verification
  • Input: input type T and code of function f
  • f is XQuery, XPath, XSLT, etc.
  • Verification of T′
  • Is it true that ∀d ∈ T, f(d) ∈ T′?
  • Type inference
  • Find the smallest T′ such that ∀d ∈ T, f(d) ∈ T′
  • Rapidly undecidable because of joins

9
Example
  • for $p in doc("parts.xml")//part[color="red"]
  • return <part>
  • <name>{$p/name/text()}</name>
  • <desc>{$p/desc/node()}</desc>
  • </part>
  • Result type
  • (part (name (string) desc (any)))*
  • If the type of parts.xml//part/desc is string
  • (part (name (string) desc (string)))*

10
Difficulty
  • for X in Input, Y in Input do print ( <b/> )
  • Input: <a/> <a/>
  • Result: <b/> <b/> <b/> <b/>
  • Problem: { b^i | i = n^2 for n ≥ 0 } cannot be
    described in XML schema
  • There is no "best" result
  • b
  • ? b2 b
  • ? b2 b4b
  • ? b2 b4 b9b

11
Why tree automata?
  • XML: unranked trees
  • No theory for XML
  • Rich theory for strings: automata
  • Extend to
  • rich theory for ranked trees: tree automata
  • Nice algorithms
  • Nice theorems
  • Can this carry to unranked trees and XML?
  • Yes!

12
From strings to trees
[Figure: a word, a binary tree, and an unranked tree (no bound on the number of children) over the labels a and b, handled respectively by finite state automata, ranked tree automata, and unranked tree automata.]
13
Why not then use unranked tree automata?
  • Missing practical gadgets
  • Complexity of verification
  • Goal: typing at reasonable cost

14
Automata
  • Automata on words

15
Finite state automata on words
  • An automaton is given by an alphabet, a set of states, an initial state, a set of accepting states, and transitions
16
Nondeterministic automaton: example
[Figure: runs of a nondeterministic automaton with states q0, q1, q2 on the input words "a b a a b" and "a b a"; one run is marked OK, the other KO.]
17
Reminder
  • Deterministic
  • No ε-transition
  • No alternative transitions, i.e., no two transitions
    from the same state on the same letter
  • Determinization
  • It is possible to obtain an equivalent
    deterministic automaton
  • State of the new automaton: a set of states of the
    original one
  • Possible exponential blow-up
  • Minimization
  • Limitations: cannot handle
  • context-free languages in general
  • Essential tool, e.g., for lexical analysis
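
The determinization step above can be sketched in Python as the classical subset construction. This is only an illustrative sketch: the function names and the example NFA (words over {a, b} ending in "ab") are assumptions, not taken from the slides, and ε-transitions are not handled.

```python
from itertools import chain

def determinize(alphabet, delta, start, accepting):
    """Subset construction for an NFA without epsilon transitions.

    delta maps (state, letter) to a set of successor states.
    Returns (dfa_delta, dfa_start, dfa_accepting); DFA states are frozensets
    of NFA states, which is where the exponential blow-up can come from.
    """
    dfa_start = frozenset([start])
    dfa_delta, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        current = todo.pop()
        for letter in alphabet:
            target = frozenset(chain.from_iterable(
                delta.get((q, letter), ()) for q in current))
            dfa_delta[(current, letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return dfa_delta, dfa_start, dfa_accepting

def dfa_accepts(dfa, word):
    """Run the deterministic automaton on a word."""
    dfa_delta, state, dfa_accepting = dfa
    for letter in word:
        state = dfa_delta[(state, letter)]
    return state in dfa_accepting

# Hypothetical NFA: q0 loops on a and b, guesses the final "ab" via q1, q2.
nfa_delta = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
dfa = determinize("ab", nfa_delta, "q0", {"q2"})
```

Calling dfa_accepts(dfa, "abaab") follows a single deterministic run, whereas the NFA has to guess where the final "ab" starts.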

18
Reminder (2)
  • L(A): set of words accepted by automaton A
  • Regular languages
  • Can be described by regular expressions, e.g.
    a(b+c)*d
  • Closed under complement
  • Closed under union, intersection
  • Product automaton with states (s, s′)
  • where s is from A and s′ is from A′
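
A minimal sketch of the product construction, run on the fly rather than materialized. The two example automata (even number of a's; last letter is b) and all names are assumptions chosen for illustration.

```python
def product_accepts(dfa1, dfa2, word):
    """Run the product automaton: its state is the pair (s, s2).

    Each dfa is (delta, start, final) with delta[(state, letter)] -> state.
    Accepting when both components accept gives the intersection;
    replacing "and" by "or" would give the union.
    """
    (delta1, s, final1), (delta2, s2, final2) = dfa1, dfa2
    for letter in word:
        s, s2 = delta1[(s, letter)], delta2[(s2, letter)]
    return s in final1 and s2 in final2

# Hypothetical DFAs over {a, b}.
even_a = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
```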

19
Automata on words versus trees
[Figure: a word over a and b, read left to right or right to left (no difference), next to a tree over a and b, read bottom up or top down (differences).]
20
Automata
  • Automata on ranked trees

21
Binary tree automata
  • Parallel evaluation
  • For leaves
  • For other nodes

[Figure: bottom-up run on a binary tree with nodes labeled a and b; the nodes are annotated with the states q, q1 and q2.]
22
Bottom-up tree automata
  • Bottom-up: if a node labeled a has its children
    in states q, q′ then the node moves
    nondeterministically to state r or r′
  • Accepts if the root is in some state in F
  • Not deterministic if alternatives or
    ε-transitions
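
The bottom-up rule can be sketched as follows, assuming binary trees written as nested tuples. The encoding, the rule tables and the Boolean-circuit example are illustrative assumptions; ε-transitions are left out.

```python
def reachable_states(tree, leaf_rules, node_rules):
    """States a nondeterministic bottom-up automaton can reach at the root.

    tree is a leaf label (str) or a tuple (label, left, right).
    leaf_rules: label -> set of states
    node_rules: (label, q, q2) -> set of states
    """
    if not isinstance(tree, tuple):
        return set(leaf_rules.get(tree, ()))
    label, left, right = tree
    left_states = reachable_states(left, leaf_rules, node_rules)
    right_states = reachable_states(right, leaf_rules, node_rules)
    result = set()
    for q in left_states:
        for q2 in right_states:
            result |= node_rules.get((label, q, q2), set())
    return result

def accepts(tree, leaf_rules, node_rules, final):
    """Accept if the root can be in some final state."""
    return bool(reachable_states(tree, leaf_rules, node_rules) & final)

# Boolean circuit evaluation: states are truth values, F = {1}.
leaf_rules = {"0": {0}, "1": {1}}
node_rules = {}
for x in (0, 1):
    for y in (0, 1):
        node_rules[("and", x, y)] = {x & y}
        node_rules[("or", x, y)] = {x | y}
circuit = ("or", ("and", "1", "0"), ("and", "1", "1"))
```

This particular automaton happens to be deterministic: every rule yields exactly one state, so each node carries the value of its subcircuit.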

23
Example deterministic bottom-up
24
Boolean circuit evaluation
OK
25
Regular tree language: set of trees accepted by
a bottom-up tree automaton
26
Regular tree languages
  • The following are equivalent
  • L is a regular tree language
  • L is accepted by a nondeterministic bottom-up
    automaton
  • L is accepted by a deterministic bottom-up
    automaton
  • L is accepted by a nondeterministic top-down
    automaton
  • Deterministic top-down is weaker

27
Top-down tree automata
  • Top-down: if a node labeled a is in state q,
    then its left child moves to state q′ (right to
    q″)
  • Accepts if all leaves are in states in F
  • Not deterministic if there are alternative transitions for the same label and state

28
Why deterministic top-down is weaker?
  • Consider the language
  • L = { f(a,b), f(b,a) }
  • It can be accepted by a bottom-up TA
  • Exercise: write a BUTA A such that L = L(A)
  • Suppose that B is a deterministic top-down TA
    with L = L(B)
  • Exercise: show that B also accepts f(a,a)
  • A contradiction
  • Fact: no deterministic top-down tree automaton
    accepts L
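
One possible answer to the first exercise, as a sketch: a deterministic bottom-up automaton with assumed state names qa, qb and ok. The tuple encoding of trees is also an assumption.

```python
def run(tree, leaf_state, node_state):
    """Deterministic bottom-up run; returns the root state, or None if stuck."""
    if isinstance(tree, str):
        return leaf_state.get(tree)
    label, left, right = tree
    return node_state.get((label,
                           run(left, leaf_state, node_state),
                           run(right, leaf_state, node_state)))

# Leaves remember their label; f accepts only when the two children differ.
leaf_state = {"a": "qa", "b": "qb"}
node_state = {("f", "qa", "qb"): "ok", ("f", "qb", "qa"): "ok"}
FINAL = {"ok"}

def in_L(tree):
    return run(tree, leaf_state, node_state) in FINAL
```

The bottom-up automaton sees both child states at once when it processes f. A deterministic top-down automaton must fix the state of each child from the label f alone, so if it accepts both f(a,b) and f(b,a), the left child state accepts a and so does the right child state, and f(a,a) is accepted too.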

29
Ranked trees automata Properties
  • Like for words
  • Determinization
  • Minimization
  • Closed under
  • Complement
  • Intersection
  • Union

30
But
  • XML documents are unranked
  • book (intro, section*, conclusion)

31
Automata
  • Automata on unranked tree

32
Unranked tree automata
  • Issue: represent an infinite set of transitions
  • Solution: a regular language
33
Unranked tree automata (2)
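  • Rule: for a label a and a regular expression Q over states, a set of target states {r1, ..., rm}
  • Meaning: if the states of the children of some
    node labeled a form a word in L(Q), this node
    moves to some state in {r1, ..., rm}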

34
Building on ranked trees
[Figure: an unranked tree over the labels a and b and its encoding as a ranked (binary) tree.]
  • Ranked tree: FirstChild-NextSibling
  • F: encoding into a ranked tree
  • F is a bijection
  • F⁻¹: decoding
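
A sketch of F and F⁻¹, assuming unranked trees are written as (label, [children]) and binary trees as (label, first_child, next_sibling), with None for a missing child. The representation and function names are illustrative assumptions.

```python
def encode(forest):
    """First-child/next-sibling encoding of a list of unranked trees."""
    if not forest:
        return None
    (label, children), *siblings = forest
    return (label, encode(children), encode(siblings))

def decode(binary):
    """Inverse of encode: rebuild the list of unranked trees."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)

# An unranked tree: a(b, a(b, b), b)
tree = ("a", [("b", []), ("a", [("b", []), ("b", [])]), ("b", [])])
binary = encode([tree])
```

In the encoded tree, the left pointer of a node leads to its first child and the right pointer to its next sibling, so the root never has a right child.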

35
Building on bottom-up ranked trees (2)
  • For each unranked TA A, there is a ranked TA
    accepting F(L(A))
  • For each ranked TA A, there is an unranked TA
    accepting F⁻¹(L(A))
  • Both are easy to construct
  • Consequence: unranked TA are closed under union,
    intersection, complement

36
Determinization
  • Determinization is always possible for bottom-up
  • Can we use the FirstChild-NextSibling encoding?
  • No: it does not preserve determinism

37
Top-down?
  • This is more delicate
  • Transition: δ(a,q) = A(a,q)
  • The state of the automaton A(a,q) when reading the
    labels of the children of a node labeled a
    determines the states of the children of that
    node
  • Accepts if all the leaves are in accepting states

38
Boolean circuit evaluation
A tree is accepted if, for some possible run, the
states of all leaves are final
[Figure: a Boolean circuit drawn as a tree, with inner nodes labeled by Boolean connectives and leaves labeled 0 or 1.]
39
Automata
  • Automata and
  • monadic second-order logic

40
Monadic second-order logic
  • Representation of a tree as a logical structure
  • E(1,2), E(1,3), ..., E(3,9)
  • S(2,3), S(3,4), S(4,5), ..., S(8,9)
  • a(1), a(4), a(8)
  • b(2), b(3), b(5), b(6), b(7), b(9)

41
Monadic second-order logic
  • E(1,2), E(1,3), ..., E(3,9)
  • S(2,3), S(3,4), S(4,5), ..., S(8,9)
  • a(1), a(4), a(8)
  • b(2), b(3), b(5), b(6), b(7), b(9)
  • MSO syntax: first-order logic extended with set variables and quantification over set variables
42
Example of MSO
  • Each a node has a b-descendant
  • This corresponds to the formula
  • For each node x labeled a, every set X that
    contains x and that is closed under
    descendants contains some y labeled b
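
One way to write the property above as an MSO formula, assuming E(y,z) denotes the child relation as in the logical representation of the tree, and that a(x) and b(y) are the label predicates. This is a reconstruction from the English reading, not necessarily the exact formula of the slide.

```latex
\forall x \, \Bigl( a(x) \rightarrow
  \forall X \, \bigl( \bigl( x \in X \wedge
      \forall y \, \forall z \, ( ( y \in X \wedge E(y,z) ) \rightarrow z \in X ) \bigr)
    \rightarrow \exists y \, ( y \in X \wedge b(y) ) \bigr) \Bigr)
```

The smallest set X that contains x and is closed under E is x together with all its descendants, so the formula holds exactly when some descendant of x is labeled b.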

43
Bridge
  • Theorem: for a set L of trees, the following are
    equivalent
  • L = L(A) for some bottom-up tree automaton A
  • i.e. L is definable with bottom-up tree automata
  • L = { T | T satisfies φ } for some MSO formula φ
  • i.e. L is definable in MSO

44
XML typing
  • DTDs

45
DTD
  • Describe the children of a node labeled a by a
    regular expression
  • Syntax
  • <!ELEMENT populationdata (continent) >
  • <!ELEMENT continent (name, country) >
  • <!ELEMENT country (name, province)>
  • <!ELEMENT province (name, city) >
  • <!ELEMENT city (name, pop) >
  • <!ELEMENT name (#PCDATA) >
  • <!ELEMENT pop (#PCDATA) >

46
DTD and determinism
  • Regular expressions in DTD should be
    deterministic
  • Complicated definition
  • Intuition: the corresponding automaton should be
    deterministic
  • (a+b)*a is not
  • When reading <a>, one cannot tell whether it is
    an a from (a+b)* or if it is the a of the end
  • (b*a)(b*a)* is an equivalent expression that is
    deterministic

47
Very efficient validation
  • It suffices to verify for each node a that the
    word formed by the labels of its children is
    accepted by the finite state automaton Aa
  • Possible to type check the document while
    scanning it, e.g. with a SAX parser
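
The per-node check can be sketched as below. Several simplifications are assumptions of this sketch: the content models are hypothetical, Python's re module stands in for the automata Aa, and the document is loaded as a tree with xml.etree.ElementTree instead of being scanned with SAX.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: one regular expression per tag, matched
# against the children tags, each followed by a space.
CONTENT_MODEL = {
    "a": r"(b )(c )",
    "b": r"(d )+",
    "c": r"",
    "d": r"",
}

def valid(element):
    """Check every node's children word against the model of its tag."""
    word = "".join(child.tag + " " for child in element)
    model = CONTENT_MODEL.get(element.tag)
    if model is None or re.fullmatch(model, word) is None:
        return False
    return all(valid(child) for child in element)

doc = ET.fromstring("<a><b><d/><d/></b><c/></a>")
```

Each node is checked only against the word formed by its children, which is why validation stays local and cheap.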

48
Very efficient validation (2)
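  • <!ELEMENT a ( b c ) >
  • <!ELEMENT b ( d ) >

<a><b><d/><d/></b><c/></a>
[Figure: the document tree with root a, children b and c, and two d nodes under b, checked with the word automata Aa and Ab over states s, t, u; the run ends in Accept.]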
49
Warning
  • The previous example can be checked with a simple
    automaton on words
  • But not the following one
  • <!ELEMENT part ( part ) >
  • The stack is needed for accepting
  • <a><a></a></a>
  • n times <a> followed by n times </a>

50
Some bad news for DTD
  • Not closed under union
  • DTD1
  • <!ELEMENT used ( ad ) >
  • <!ELEMENT ad ( year, brand ) >
  • DTD2
  • <!ELEMENT new ( ad ) >
  • <!ELEMENT ad ( brand ) >
  • L(DTD1) ∪ L(DTD2) cannot be described by a DTD
    but can be described easily by a tree automaton
  • Problem with the type of ad, which depends on its
    parent
  • Also not closed under complement
  • Limited expressive power

51
Car example continued
  • The best DTD we can choose does not distinguish
    between ads for used and new cars
  • <!ELEMENT ad (year?, brand) >

[Figure: a Car tree with a Used branch and a New branch; the ads carry Brand and Year values such as Renault, 2008 and BMW.]
52
Decoupled types in XML schema
  • Each type corresponds to a label, not conversely
  • car: car ( used new )
  • used: used (ad1)
  • new: new (ad2)
  • ad1: ad (year, brand)
  • ad2: ad (brand)
  • On the slide, tags are in green and type names in blue; here each line gives the type name, then the tag and its content
  • Nice closure properties
  • Many other "gadgets" in XML schemas
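
A rough sketch of validation with decoupled types for the car example. The occurrence operators (*) in the content models, the leaf types year and brand, and the helper name has_type are assumptions for illustration. The sketch also assumes that, inside one content model, no two types share the same tag, so the type of a child is determined by its parent's type and its own tag.

```python
import re
import xml.etree.ElementTree as ET

# type name -> (tag, regular expression over child TYPE names, each
# followed by a space). The stars are assumptions of this sketch.
TYPES = {
    "car":   ("car",   r"(used |new )*"),
    "used":  ("used",  r"(ad1 )*"),
    "new":   ("new",   r"(ad2 )*"),
    "ad1":   ("ad",    r"year brand "),
    "ad2":   ("ad",    r"brand "),
    "year":  ("year",  r""),
    "brand": ("brand", r""),
}

def has_type(element, type_name):
    """Check that element is a valid instance of the given type."""
    tag, model = TYPES[type_name]
    if element.tag != tag:
        return False
    # Types that may occur among the children of this type.
    candidates = set(re.findall(r"\w+", model))
    child_types = []
    for child in element:
        match = [t for t in candidates if TYPES[t][0] == child.tag]
        if not match:
            return False
        child_types.append(match[0])
    word = "".join(t + " " for t in child_types)
    return (re.fullmatch(model, word) is not None
            and all(has_type(c, t) for c, t in zip(element, child_types)))

good = ET.fromstring(
    "<car><used><ad><year/><brand/></ad></used><new><ad><brand/></ad></new></car>")
bad = ET.fromstring("<car><new><ad><year/><brand/></ad></new></car>")
```

The two ad types share the tag ad but differ in content; the parent (used or new) decides which one applies, which a DTD cannot express.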

53
XML typing
  • XML Schemas

54
XML Schema
  • Often criticized: unnecessarily complicated
  • Boosted by Web services
  • Richer than DTD: decoupled types
  • Deterministic top-down tree automata (close to)
  • XML schemas are extensible
  • Many other useful functionalities
  • Namespaces
  • Atomic types
  • Integrity constraints, etc.

55
An XML schema is an XML document
  • Since it is an XML syntax, it can use XML tools
  • Editor
  • Type checker
  • Etc.
  • The type of all XML schemas can be described with
    an XML schema

56
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.net-language.com">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string"
                          minOccurs="0" maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
57
Simple elements and atomic types
  • Definition: <xs:element name="xxx"
    type="yyy"/>
  • with common types
  • xs:string xs:decimal xs:integer xs:boolean
    xs:date xs:time
  • Examples
  • <xs:element name="lastname" type="xs:string"/>
  • <xs:element name="age" type="xs:integer"/>
  • <xs:element name="dateborn" type="xs:date"/>
  • Instances of such elements
  • <lastname>Refsnes</lastname>
  • <age>34</age>
  • <dateborn>1968-03-27</dateborn>

58
Attributes
  • Definition: <xs:attribute name="xxx"
    type="yyy"/>
  • Example
  • <xs:attribute name="lang" type="xs:string"/>
  • Instance of such an attribute
  • <lastname lang="EN">Smith</lastname>

59
Complex elements
  • Empty element
  • <product pid="1345"/>
  • Contains only other elements
  • <employee> <firstname>John</firstname>
    <lastname>Smith</lastname> </employee>
  • Contains only text
  • <food type="dessert">Ice cream</food>
  • Contains both elements and text
  • <description> It happened on <date
    lang="norwegian"> 03.03.99</date> ....
    </description>

60
Restriction of simple elements
  • <xs:element name="age">
  • <xs:simpleType>
  • <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
  • <xs:maxInclusive value="100"/>
  • </xs:restriction>
  • </xs:simpleType>
  • </xs:element>
  • Other restrictions: enumerated types, patterns,
    etc.

61
Restriction on complex elements
  • <xs:element name="person">
  • <xs:complexType>
  • <xs:sequence>
  • <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  • </xs:complexType>
  • </xs:element>

62
Possible to name a type
  • <xs:element name="employee">
  • <xs:complexType> <xs:sequence> <xs:element
    name="firstname" type="xs:string"/> <xs:element
    name="lastname" type="xs:string"/> </xs:sequence>
  • </xs:complexType>
  • </xs:element>
  • Only the "employee" element can use the specified
    complex type
  • (<sequence> indicates an order on child
    elements)
  • Alternative
  • <xs:element name="employee" type="personinfo" />
  • <xs:complexType name="personinfo">
  • <xs:sequence> <xs:element name="firstname"
    type="xs:string"/> <xs:element name="lastname"
    type="xs:string"/> </xs:sequence>
  • </xs:complexType>

63
Other gadgets
  • Import of types associated to a namespace
  • <import namespace="http:// ..."
  • schemaLocation="http:// ..."
    />
  • Possible to include an existing schema
  • <include schemaLocation="http:// ..."/>
  • Possible to extend/redefine an existing schema
  • <redefine schemaLocation="http:// ...">
  • .... Extensions ...
  • </redefine>

64
Example: a DTD
  • <!ELEMENT EMAIL (TO, FROM, CC, BCC, SUBJECT?,
    BODY?)>
  • <!ATTLIST EMAIL
  • LANGUAGE (Western|Greek|Latin|Universal)
    "Western"
  • ENCRYPTED CDATA #IMPLIED
  • PRIORITY (NORMAL|LOW|HIGH) "NORMAL">
  • <!ELEMENT TO (#PCDATA)>
  • <!ELEMENT FROM (#PCDATA)>
  • <!ELEMENT CC (#PCDATA)>
  • <!ELEMENT BCC (#PCDATA)>
  • <!ATTLIST BCC
  • HIDDEN CDATA #FIXED "TRUE">
  • <!ELEMENT SUBJECT (#PCDATA)>
  • <!ELEMENT BODY (#PCDATA)>
  • <!ENTITY SIGNATURE "Bill">

65
The same in a variant of XML schema (more verbose)
  • <?xml version="1.0" ?>
  • <Schema name="email" xmlns="urn:schemas-microsoft-com:xml-data"
  • xmlns:dt="urn:schemas-microsoft-com:datatypes">
  • <AttributeType name="language"
  • dt:type="enumeration"
    dt:values="Western Greek Latin Universal" />
  • <AttributeType name="encrypted" />
  • <AttributeType name="priority"
    dt:type="enumeration" dt:values="NORMAL LOW HIGH"
    />
  • <AttributeType name="hidden" default="true" />
  • <ElementType name="to" content="textOnly" />
  • <ElementType name="from" content="textOnly" />
  • <ElementType name="cc" content="textOnly" />
  • <ElementType name="bcc" content="mixed">
  • <attribute type="hidden" required="yes" />
  • </ElementType>
  • <ElementType name="subject" content="textOnly"
    />
  • <ElementType name="body" content="textOnly" />
  • <ElementType name="email" content="eltOnly">
  • <attribute type="language" default="Western" />
  • <attribute type="encrypted" />

66
Where to place XML schemas
[Figure: diagram positioning DTD and XML schema relative to deterministic top-down tree automata and general tree automata.]
  • Some bizarre restriction
  • Inside an element, no two types with the same tag
  • Closer to DTDs than to tree automata
  • Efficient type validation

67
Exercise: coupled vs decoupled
  • Write a realistic DTD1 for new cars
  • With make, model, engine
  • Write a realistic DTD2 for used cars
  • Also year, miles, zipcode
  • Write an XML schema for L(DTD1) ∪ L(DTD2)
  • Using decoupled schema

68
Automata
  • Automata to compute

69
Another use of automata: XPath, x in //a/b
[Figure: the NFA for //a/b, the corresponding DFA with states (0), (01) and (02), and a tree with nodes labeled a and b; selected nodes are marked x.]
70
Example //a/b
[Figure, slides 70 to 83: step-by-step animation of the DFA run over the tree. Nodes are annotated with the DFA states (0), (01) and (02), and the selected nodes are marked x.]
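
The evaluation animated in these slides can be sketched as a top-down traversal that carries the DFA state, i.e. the set of NFA states reached on the path from the root. The tree encoding (label, [children]), the numbering of NFA states and the function name are assumptions of this sketch.

```python
def select(tree, states=frozenset([0]), path=()):
    """Yield the positions of the nodes selected by //a/b.

    NFA: state 0 loops on every label, 0 -a-> 1, 1 -b-> 2 (accepting).
    states is the DFA state before reading this node's label.
    A position is the tuple of child indexes leading from the root.
    """
    label, children = tree
    after = {0}                        # the loop on state 0
    if label == "a":
        after.add(1)                   # 0 -a-> 1 (0 is always present)
    if label == "b" and 1 in states:
        after.add(2)                   # 1 -b-> 2
    if 2 in after:
        yield path
    for i, child in enumerate(children):
        yield from select(child, frozenset(after), path + (i,))

# Hypothetical tree: b(a(a(b), b), b)
tree = ("b", [("a", [("a", [("b", [])]), ("b", [])]), ("b", [])])
```

Only three DFA states ever occur: {0}, {0, 1} and {0, 2}, written (0), (01) and (02) in the slides.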
84
Determinization: exponential blow-up
//a/*/*/b
85
Proposal: k-pebble transducers (Milo, Suciu, Vianu)
[Figure: a k-pebble transducer with its stack.]
86
k-pebble transducers: result
Capture a core aspect of XQuery but not the data
management part
87
Graphs and bisimulation
88
Graph
  • Graph semistructured data
  • Graph simulation
  • Graph bisimulation
  • Data guides

89
Semistructured data: labeled graph
  • Possibly a root, in red

[Figure: a labeled graph with root r, person nodes p1 to p8 and a company node c; edges are labeled employee, company, manages, managedby and worksfor.]
90
Rooted graph
  • OEM: Object Exchange Model
  • With ID-IDREF, XML is a graph model as well
  • Labeled (rooted) graph (E,r)
  • Set N of nodes
  • A finite ternary relation E ⊆ N × N × Label
  • E(s,t,l): there is an edge from s to t labeled l
  • r is a node in the graph

91
Equality revisited
  • {1,2,2,1,5} = {1,2,5}
  • Ignores the order
  • For trees, if we ignore the order of siblings and
    use a set semantics

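[Figure: two trees over the labels a, b, c and d that are equal when the order of siblings is ignored and set semantics is used.]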
92
Simulation
  • A simulation ρ of (E,r) with (E′,r′) is a
    relation between the nodes of E and E′ such that
  • ρ(r,r′)
  • if ρ(s,s′) and E(s,t,l) for some l then there
    exists t′ with ρ(t,t′) and E′(s′,t′,l)
  • (we simulate a move in E by a move in E′)

93
Bisimulation
  • Given ρ, E, E′,
  • ρ is a bisimulation if
  • ρ is a simulation of E with E′ and
  • ρ⁻¹ is a simulation of E′ with E

94
Examples
[Figure: three graphs with edges labeled a and d; one pair is marked "bisimulation" and another "not bisimulation".]
The three graphs all have the same paths from the root
95
A more complex example of graph bisimulation
[Figure: a data graph with nodes root, c1, c2, e1 to e4 and p1 to p9; edges labeled programmer, statistician, employee, project, workson, leads and consults; string values "exercise", "lecture", "finance", "adminstr.", "PR", "undergrad", "grad", "postgrad" and "web". Next to it, a type graph with nodes R, t1, t2 and STRING and edges labeled programmer, statistician, employee, projects and _.]
96
Graph bisimulation
[Figure, slides 96 to 101: the same data graph and type graph, animated step by step to show which data nodes correspond to the type nodes R, t1 and t2.]
102
Computing bisimulation in ptime
  • Start with ρ = N × N′ (for N, N′ the sets of
    nodes)
  • While there exists (x,x′) in ρ that violates the
    definition of simulation (in either direction), remove (x,x′) from ρ
  • This computes the maximal bisimulation in ptime
  • (Note: this maximal bisimulation exists because ∅
    is a bisimulation, and if ρ1, ρ2 are
    bisimulations, ρ1 ∪ ρ2 is also one)
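
The refinement loop above can be sketched as follows. Graphs are given as sets of (source, target, label) triples; the function name and the small example graphs are assumptions. The sketch returns the largest relation satisfying both simulation conditions and leaves the root condition ρ(r,r′) to the caller. It favors readability over efficiency.

```python
def maximal_bisimulation(nodes1, edges1, nodes2, edges2):
    """Largest relation rho such that rho and its inverse are simulations."""
    def simulated(s, s2, src_edges, dst_edges, related):
        # Every move s -l-> t must be matched by some move s2 -l-> t2
        # with t related to t2.
        return all(
            any(u2 == s2 and l2 == l and related(t, t2)
                for (u2, t2, l2) in dst_edges)
            for (u, t, l) in src_edges if u == s)

    rho = {(x, y) for x in nodes1 for y in nodes2}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(rho):
            forth = simulated(x, y, edges1, edges2,
                              lambda t, t2: (t, t2) in rho)
            back = simulated(y, x, edges2, edges1,
                             lambda t2, t: (t, t2) in rho)
            if not (forth and back):
                rho.discard((x, y))
                changed = True
    return rho

# Hypothetical graphs. G and H have the same root paths but are not
# bisimilar (y has no d-edge); K and H are bisimilar.
nodes_g, edges_g = {"r", "x", "y", "z"}, {("r", "x", "a"), ("r", "y", "a"), ("x", "z", "d")}
nodes_h, edges_h = {"r2", "x2", "z2"}, {("r2", "x2", "a"), ("x2", "z2", "d")}
nodes_k = {"r3", "x3", "y3", "z3", "w3"}
edges_k = {("r3", "x3", "a"), ("r3", "y3", "a"), ("x3", "z3", "d"), ("y3", "w3", "d")}
```

Each pass removes at least one pair or stops, so the loop runs at most |N| × |N′| times, which keeps the computation polynomial.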

103
What does this have to do with typing?
  • Take a very complex graph E
  • How do you describe it?
  • By a smaller graph T that is a bisimulation of
    E
  • There may be several bisimulations, with more and
    more details

104
Rough bisimulation
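[Figure: type graph with the following nodes and edge labels.]
  • Root: r
  • Bosses: p1, p4, p6
  • Regulars: p2, p3, p5, p7, p8
  • Company: c
  • Edge labels: employee, company, manages, managedby, worksfor
105
More precise one
[Figure: type graphs with the following nodes and edge labels.]
  • Root: r
  • Employees: p1, p2, p3, p4, p5, p6, p7, p8
  • Bosses: p1, p4, p6
  • Regulars: p2, p3, p5, p7, p8
  • Company: c
  • Edge labels: employee, company, manages, managedby, worksfor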
106
Other typing: data guide
  • See the graph as an automaton with the root as the
    start state and only accepting states
  • This graph accepts all the paths from the root
  • Obtain an equivalent, minimal, deterministic
    automaton
  • This is the data guide for the graph
  • It can be used for describing the data
  • It can be used to support Graphical Query
    Interfaces
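
A sketch of the determinization step behind a data guide: the subset construction applied to the data graph, with every state accepting. The function name and the small example graph are assumptions, and the result is deterministic but not minimized, so it is only the first half of the construction described above.

```python
def data_guide(edges, root):
    """Deterministic graph with the same root paths as the data graph.

    edges: set of (source, target, label) triples.
    Guide nodes are frozensets of data nodes.
    Returns (start, guide_edges).
    """
    start = frozenset([root])
    guide, todo, seen = set(), [start], {start}
    while todo:
        current = todo.pop()
        by_label = {}
        for (s, t, l) in edges:
            if s in current:
                by_label.setdefault(l, set()).add(t)
        for l, targets in by_label.items():
            target = frozenset(targets)
            guide.add((current, target, l))
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, guide

# Hypothetical fragment of an employee/company graph.
edges = {
    ("r", "p1", "employee"), ("r", "p2", "employee"), ("r", "c", "company"),
    ("p1", "c", "worksfor"), ("p2", "c", "worksfor"),
    ("p1", "p2", "manages"), ("p2", "p1", "managedby"),
}
start, guide = data_guide(edges, "r")
```

In the guide, each node has at most one outgoing edge per label, so every path from the root appears exactly once.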

107
Data guide
root
  • Gives all the paths from the root
  • Automata minimization

108
What you should remember
  • Tree automata: theoretical foundation for XML
  • Bottom-up tree automata are nice
  • Top-down and determinism together ⇒ limitations
  • XML documents do not have to be typed
  • Typing may be very useful for XML
  • In particular for software managing XML data
  • DTD: simple but limited
  • XML Schema: more expressive but still limited
  • Graph data: bisimulation is the answer

109
Merci
110
Bibliography
  • TATA, the book: Tree Automata Techniques and
    Applications, tata.gforge.inria.fr/
  • The book on the topic, and it is free
  • XML schema, see http://w3.org
  • http://www.w3schools.com/schema/