Typing semistructured data

- Slides courtesy of Serge Abiteboul
- Web Data Management

Organization

- Motivations
- Automata
- Automata on words
- Ranked tree automata
- Unranked tree automata
- Automata and monadic second-order logic
- Automata to compute
- XML typing: DTD, XML Schema
- Graphs and bisimulation

Motivation

XML typing

- Not compulsory
- Simplify writing software for XML
- Improve interoperability between programs
- Improve storage and performance
- Ease of querying: data guide
- Simplify data protection
- Reject illegal updates, like relational dependencies

Improve storage

- Lower-bound schema
- Store the rest in an overflow graph

Improve performance

- Query as written: select X.title from Bib._ X where X.*.zip = 12345
- Query rewritten using the type: select X.title from Bib.book X where X.address.zip = 12345

Type checking

- Who checks
- XML editor: checks that the data conforms to its type
- XML exchange, e.g., with a Web service
- Server when delivering the data
- Client/application when receiving it
- Dynamic verification: after the data is produced
- Static verification: verification of the program that generates the data

Static verification

- Input: input type T and code of function f
- f is XQuery, XPath, XSLT, etc.
- Verification of T'
- Is it true that ∀d ∈ T, f(d) ∈ T'?
- Type inference
- Find the smallest T' such that ∀d ∈ T, f(d) ∈ T'
- Rapidly undecidable because of joins

Example

- for $p in doc("parts.xml")//part[color="red"]
- return <part>
- <name>{$p/name/text()}</name>
- <desc>{$p/desc/node()}</desc>
- </part>
- Result type
- (part (name (string) desc (any)))*
- If the type of parts.xml//part/desc is string
- (part (name (string) desc (string)))*

Difficulty

- for X in Input, Y in Input do print(<b/>)
- Input: <a/> <a/>
- Result: <b/> <b/> <b/> <b/>
- Problem: {b^i | i = n^2 for n ≥ 0} cannot be described in XML Schema
- There is no best result: every regular approximation can be refined further
- b*
- ε + b + b^4 b*
- ε + b + b^4 + b^9 b*
- ε + b + b^4 + b^9 + b^16 b*

Why tree automata?

- XML documents are unranked trees
- No theory for XML
- Rich theory for strings: automata
- Extends to a rich theory for ranked trees: tree automata
- Nice algorithms
- Nice theorems
- Can this carry over to unranked trees and XML?
- Yes!

From strings to trees

[Figure: a word, a binary tree, and an unranked tree over the labels a and b.]

- Word: finite state automata
- Binary tree: ranked tree automata
- Unranked tree (no bound on the number of children): unranked tree automata

Why not then use unranked tree automata?

- Missing practical gadgets
- Complexity of verification
- Goal: typing at reasonable cost

Automata

- Automata on words

Finite state automata on words

- Alphabet
- States
- Initial state
- Accepting states
- Transitions

Nondeterministic automaton: example

[Figure: runs of a nondeterministic automaton with states q0, q1 and q2 on the inputs "a b a a b" and "a b a"; the runs are marked OK (accepting) or KO (rejecting).]

Reminder

- Deterministic
- No ε-transition
- No alternative transitions, i.e., no two transitions from the same state on the same letter
- Determinization
- It is possible to obtain an equivalent deterministic automaton
- State of the new automaton = set of states of the original one
- Possible exponential blow-up
- Minimization
- Limitations: cannot recognize context-free languages
- Essential tool, e.g., for lexical analysis
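
The determinization step recalled above can be sketched with the subset construction. This is a minimal Python sketch under stated assumptions: the example NFA (words over {a, b} ending in "ab") and the names `determinize`, `accepts`, `NFA_DELTA` and `DFA` are illustrative and do not come from the slides.

```python
from itertools import chain

def determinize(alphabet, delta, start, finals):
    """Subset construction for an NFA without epsilon-transitions.

    delta maps (state, letter) -> set of successor states.
    Each state of the resulting DFA is a frozenset of NFA states.
    """
    start_set = frozenset([start])
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for letter in alphabet:
            # Union of the successors of every NFA state in the current set.
            target = frozenset(chain.from_iterable(
                delta.get((q, letter), ()) for q in current))
            dfa_delta[(current, letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A DFA state is accepting if it contains some accepting NFA state.
    dfa_finals = {s for s in seen if s & finals}
    return dfa_delta, start_set, dfa_finals

def accepts(dfa, word):
    dfa_delta, state, dfa_finals = dfa
    for letter in word:
        state = dfa_delta[(state, letter)]
    return state in dfa_finals

# Hypothetical NFA: words over {a, b} that end with "ab".
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
DFA = determinize("ab", NFA_DELTA, "q0", {"q2"})
```

Only the reachable subsets are built, so the exponential blow-up shows up only when the NFA actually forces it.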

Reminder (2)

- L(A) = set of words accepted by automaton A
- Regular languages
- Can be described by regular expressions, e.g., a(b+c)*d
- Closed under complement
- Closed under union, intersection
- Product automaton with states (s, s')
- where s is from A and s' is from A'

Automata on words versus trees

[Figure: a word over a and b, read left to right or right to left (no difference), next to a tree over a and b, processed bottom-up or top-down (differences).]

Automata

- Automata on ranked trees

Binary tree automata

- Parallel evaluation
- For leaves
- For other nodes

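[Figure: bottom-up run on a binary tree with nodes labeled a and b; the nodes are annotated with the states (q, q1, q2) reached from the leaves up.]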

Bottom-up tree automata

- Bottom-up: if a node labeled a has its children in states q, q', then the node moves nondeterministically to state r or r'
- Accepts if the root is in some state in F
- Not deterministic if there are alternatives or ε-transitions

Example: deterministic bottom-up

Boolean circuit evaluation

Regular tree language = set of trees accepted by a bottom-up tree automaton
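
Boolean circuit evaluation can be written as a deterministic bottom-up automaton on binary trees. The sketch below assumes a tree encoding (nested tuples with leaves "0" and "1") and state names q0, q1 chosen for illustration; they are not given on the slide.

```python
# Trees are nested tuples ("and", left, right) or ("or", left, right);
# leaves are the strings "0" and "1". This encoding is an assumption.

LEAF_RULES = {"0": "q0", "1": "q1"}  # leaf label -> state

# (label, state of left child, state of right child) -> state of the node
NODE_RULES = {}
for left in ("q0", "q1"):
    for right in ("q0", "q1"):
        NODE_RULES[("and", left, right)] = (
            "q1" if (left, right) == ("q1", "q1") else "q0")
        NODE_RULES[("or", left, right)] = (
            "q0" if (left, right) == ("q0", "q0") else "q1")

FINAL = {"q1"}  # accept exactly the circuits that evaluate to 1

def run(tree):
    """Return the unique state reached at the root of the tree."""
    if isinstance(tree, str):
        return LEAF_RULES[tree]
    label, left, right = tree
    return NODE_RULES[(label, run(left), run(right))]

def accepts(tree):
    return run(tree) in FINAL
```

There is exactly one rule per combination of label and child states, which is what makes the automaton deterministic.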

Regular tree languages

- The following are equivalent
- L is a regular tree language
- L is accepted by a nondeterministic bottom-up automaton
- L is accepted by a deterministic bottom-up automaton
- L is accepted by a nondeterministic top-down automaton
- Deterministic top-down is weaker

Top-down tree automata

- Top-down: if a node labeled a is in state q, then its left child moves to state q' (right to q'')
- Accepts if all leaves are in states in F
- Not deterministic if some pair (a, q) has alternative transitions

Why is deterministic top-down weaker?

- Consider the language
- L = {f(a,b), f(b,a)}
- It can be accepted by a bottom-up TA
- Exercise: write a BUTA A such that L = L(A)
- Suppose that B is a deterministic top-down TA with L = L(B)
- Exercise: show that B also accepts f(a,a)
- A contradiction
- Fact: no deterministic top-down tree automaton accepts L
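
One possible answer to the first exercise, with a brute-force check of the second, can be sketched as follows. The state names and the tuple encoding of trees are assumptions. The top-down part abstracts a deterministic automaton on trees of the form f(x, y) by the set of leaf labels it accepts at each child position, which is a simplification made for this sketch.

```python
from itertools import product

# Bottom-up automaton for L = {f(a,b), f(b,a)}; state names are hypothetical.
LEAF = {"a": "qa", "b": "qb"}
NODE = {("f", "qa", "qb"): "ok", ("f", "qb", "qa"): "ok"}
FINAL = {"ok"}

def run(tree):
    """Return the state reached at the root, or None if no rule applies."""
    if isinstance(tree, str):
        return LEAF.get(tree)
    label, left, right = tree
    return NODE.get((label, run(left), run(right)))

def accepts(tree):
    return run(tree) in FINAL

def top_down_accepts(tree, left_ok, right_ok):
    """Deterministic top-down run on f(x, y): the root rule fixes one state per
    child, so each child position accepts a fixed set of leaf labels."""
    _, left, right = tree
    return left in left_ok and right in right_ok

def every_candidate_overaccepts():
    """Check that any choice accepting f(a,b) and f(b,a) also accepts f(a,a)."""
    subsets = [set(), {"a"}, {"b"}, {"a", "b"}]
    for left_ok, right_ok in product(subsets, repeat=2):
        if (top_down_accepts(("f", "a", "b"), left_ok, right_ok)
                and top_down_accepts(("f", "b", "a"), left_ok, right_ok)
                and not top_down_accepts(("f", "a", "a"), left_ok, right_ok)):
            return False
    return True
```

The bottom-up automaton can compare the two children at the root; the top-down one has to commit to each child independently, which is where f(a,a) slips through.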

Ranked tree automata: properties

- Like for words
- Determinization
- Minimization
- Closed under
- Complement
- Intersection
- Union

But

- XML documents are unranked
- book (intro,section,conclusion)

Automata

- Automata on unranked trees

Unranked tree automata

Issue: represent an infinite set of transitions

Solution: a regular language

Unranked tree automata (2)

- Rule: given for a label a, a regular expression Q over states, and target states r1, ..., rm
- Meaning: if the states of the children of some node labeled a form a word in L(Q), this node moves to some state in {r1, ..., rm}

Building on ranked trees

[Figure: an unranked tree over the labels a and b and its binary FirstChild-NextSibling encoding.]

- Ranked tree: FirstChild-NextSibling
- F: encoding into a ranked tree
- F is a bijection
- F^-1: decoding
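
The FirstChild-NextSibling encoding F and its inverse can be sketched in a few lines. The tuple representations below (unranked node = (label, [children]); binary node = (label, first_child, next_sibling), with None for a missing child) and the function names are assumptions made for illustration.

```python
def encode(tree, siblings=()):
    """F: unranked tree -> binary tree (label, first_child, next_sibling)."""
    label, children = tree
    # Left pointer: the first child, carrying its own right siblings along.
    first = encode(children[0], children[1:]) if children else None
    # Right pointer: the next sibling of this node, if any.
    nxt = encode(siblings[0], siblings[1:]) if siblings else None
    return (label, first, nxt)

def decode_forest(node):
    """Decode a binary node into the list of unranked trees it represents."""
    if node is None:
        return []
    label, first, nxt = node
    return [(label, decode_forest(first))] + decode_forest(nxt)

def decode(node):
    """F^-1: inverse of encode for a single tree (the root has no sibling)."""
    (tree,) = decode_forest(node)
    return tree

# Hypothetical example: a(b, a(b, b), b)
EXAMPLE = ("a", [("b", []), ("a", [("b", []), ("b", [])]), ("b", [])])
```

Decoding an encoded tree gives back the original tree, which is the bijection property stated above.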

Building on bottom-up ranked trees (2)

- For each unranked TA A, there is a ranked TA accepting F(L(A))
- For each ranked TA A, there is an unranked TA accepting F^-1(L(A))
- Both are easy to construct
- Consequence: unranked TA are closed under union, intersection, complement

Determinization

- Determinization is always possible for bottom-up
- Can we use the FirstChild-NextSibling encoding?
- No: it does not preserve determinism

Top-down?

- This is more delicate
- Transition: δ(a,q) = A(a,q)
- The state of the automaton A(a,q) when reading the labels of the children of a node labeled a determines the states of the children of that node
- Accepts if all the leaves are in an accepting state

Boolean circuit evaluation

A tree is accepted if, for some possible run, the states of all leaves are final

[Figure: Boolean circuit drawn as an unranked tree, with ∨ gates at inner nodes and 0 or 1 at the leaves.]

Automata

- Automata and monadic second-order logic

Monadic second-order logic

- Representation of a tree as a logical structure
- E(1,2), E(1,3), ..., E(3,9)
- S(2,3), S(3,4), S(4,5), ..., S(8,9)
- a(1), a(4), a(8)
- b(2), b(3), b(5), b(6), b(7), b(9)

Monadic second-order logic

- MSO syntax: first-order logic extended with set variables and quantification over set variables

Example of MSO

- Each a-node has a b-descendant
- This corresponds to the formula stating that
- for each node x labeled a and each set X that contains x and is closed under descendant, X contains some y labeled b
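
One way to write this property as an MSO formula over the structure introduced earlier is given below. It assumes that E is the child relation and that every node carries exactly one label, so the witness y cannot be x itself; the variable names u and v are introduced here.

```latex
\forall x\, \Bigl( a(x) \rightarrow
  \forall X\, \bigl(
    \bigl( x \in X \;\wedge\; \forall u\, \forall v\, ( (u \in X \wedge E(u,v)) \rightarrow v \in X ) \bigr)
    \rightarrow \exists y\, ( y \in X \wedge b(y) )
  \bigr)
\Bigr)
```

The smallest set X satisfying the premise is x together with all its descendants, so the formula holds exactly when some descendant of x is labeled b.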

Bridge

- Theorem: for a set L of trees, the following are equivalent
- L = L(A) for some bottom-up tree automaton A
- i.e., L is definable with bottom-up tree automata
- L = {T | T satisfies φ} for some MSO formula φ
- i.e., L is definable in MSO

XML typing

- DTDs

DTD

- Describe the children of a node labeled a by a regular expression
- Syntax
- <!ELEMENT populationdata (continent)>
- <!ELEMENT continent (name, country)>
- <!ELEMENT country (name, province)>
- <!ELEMENT province (name, city)>
- <!ELEMENT city (name, pop)>
- <!ELEMENT name (#PCDATA)>
- <!ELEMENT pop (#PCDATA)>

DTD and determinism

- Regular expressions in DTD should be deterministic
- Complicated definition
- Intuition: the corresponding automaton should be deterministic
- (a+b)*a is not
- When reading <a>, one cannot tell whether it is an a from (a+b)* or the a at the end
- (b*a)(b*a)* is an equivalent expression that is deterministic

Very efficient validation

- It suffices to verify, for each node labeled a, that the word formed by the labels of its children is accepted by the finite state automaton Aa
- Possible to type check the document while scanning it, e.g., with a SAX parser
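
A streaming check of this kind can be sketched with Python's standard `xml.sax` module. The content models below are assumptions chosen for illustration (one regular expression per element, written over space-terminated child labels), and `DTDValidator`, `validate` and `CONTENT_MODELS` are hypothetical names. The sketch ignores text and attributes and uses regular expression matching in place of explicit automata.

```python
import re
import xml.sax

# Hypothetical content models: element label -> regular expression over the
# word of child labels, each label followed by one space.
CONTENT_MODELS = {
    "a": r"b c ",
    "b": r"(d )+",
    "c": r"",
    "d": r"",
}

class DTDValidator(xml.sax.ContentHandler):
    """Checks each element's children against its content model while parsing."""

    def __init__(self, models):
        super().__init__()
        self.models = models
        self.stack = [[]]   # one list of child labels per open element
        self.valid = True

    def startElement(self, name, attrs):
        self.stack[-1].append(name)  # record this element as a child of its parent
        self.stack.append([])        # start collecting its own children

    def endElement(self, name):
        children = self.stack.pop()
        word = "".join(label + " " for label in children)
        model = self.models.get(name)
        if model is None or re.fullmatch(model, word) is None:
            self.valid = False

def validate(document, models=CONTENT_MODELS):
    handler = DTDValidator(models)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.valid
```

The handler keeps one entry per open element, so its memory use follows the depth of the document rather than its size.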

Very efficient validation (2)

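- <!ELEMENT a (b, c)>
- <!ELEMENT b (d+)>
- <a><b><d/><d/></b><c/></a>

[Figure: the document tree a(b(d, d), c), the automaton Aa with states s, t, u reading b then c, and the automaton Ab with states s, t reading d; the run ends in Accept.]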

Warning

- The previous example can be checked with a simple automaton on words
- But not the following one
- <!ELEMENT part (part*)>
- The stack is needed for accepting
- <a><a></a></a>
- n times <a> followed by n times </a>

Some bad news for DTD

- Not closed under union
- DTD1
- <!ELEMENT used (ad)>
- <!ELEMENT ad (year, brand)>
- DTD2
- <!ELEMENT new (ad)>
- <!ELEMENT ad (brand)>
- L(DTD1) ∪ L(DTD2) cannot be described by a DTD but can be described easily by a tree automaton
- Problem with the type of ad, which depends on its parent
- Also not closed under complement
- Limited expressive power

Car example continued

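- The best DTD we can choose does not distinguish between ads for used and new cars
- <!ELEMENT ad (year?, brand)>

[Figure: tree rooted at Car with children Used and New; the ad under Used has a Brand (Renault) and a Year (2008), the ad under New has only a Brand (BMW).]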

Decoupled types in XML schema

- Each type corresponds to a label, not conversely
- car: car (used | new)
- used: used (ad1)
- new: new (ad2)
- ad1: ad (year, brand)
- ad2: ad (brand)
- In each rule, the type name comes first and the tag second (blue and green on the slide)
- Nice closure properties
- Many other gadgets in XML Schema
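
The decoupled car schema can be checked top-down with a few lines of Python. This is a sketch under assumptions: trees are (tag, [children]) tuples, text content is ignored, the `year` and `brand` types are taken to be leaves, and the names `TYPES` and `conforms` are hypothetical. It relies on the restriction that, inside one content model, no two types share a tag, so a child's tag determines its type.

```python
import re

# type name -> (tag, regular expression over space-terminated child type names)
TYPES = {
    "car":   ("car",   r"(used |new )"),
    "used":  ("used",  r"ad1 "),
    "new":   ("new",   r"ad2 "),
    "ad1":   ("ad",    r"year brand "),
    "ad2":   ("ad",    r"brand "),
    "year":  ("year",  r""),
    "brand": ("brand", r""),
}

def conforms(node, type_name):
    """Check that node = (tag, [children]) has the given type."""
    tag, model = TYPES[type_name]
    node_tag, children = node
    if node_tag != tag:
        return False
    # Types that may occur below this element, indexed by their tag.
    by_tag = {TYPES[t][0]: t for t in re.findall(r"\w+", model)}
    child_types = [by_tag.get(child[0]) for child in children]
    if None in child_types:
        return False
    word = "".join(t + " " for t in child_types)
    return (re.fullmatch(model, word) is not None
            and all(conforms(c, t) for c, t in zip(children, child_types)))

USED_CAR = ("car", [("used", [("ad", [("year", []), ("brand", [])])])])
NEW_CAR = ("car", [("new", [("ad", [("brand", [])])])])
BAD_CAR = ("car", [("new", [("ad", [("year", []), ("brand", [])])])])
```

The same tag `ad` is checked against `ad1` under `used` and against `ad2` under `new`, which is exactly what a DTD cannot express.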

XML typing

- XML Schemas

XML Schema

- Often criticized: unnecessarily complicated
- Boosted by Web services
- Richer than DTD: decoupled types
- (Close to) deterministic top-down tree automata
- XML schemas are extensible
- Many other useful functionalities
- Namespaces
- Atomic types
- Integrity constraints, etc.

An XML schema is an XML document

- Since it uses XML syntax, it can use XML tools
- Editor
- Type checker
- Etc.
- The type of all XML schemas can be described with an XML schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.net-language.com">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Simple elements and atomic types

- Definition: <xs:element name="xxx" type="yyy"/>
- with common types
- xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time
- Examples
- <xs:element name="lastname" type="xs:string"/>
- <xs:element name="age" type="xs:integer"/>
- <xs:element name="dateborn" type="xs:date"/>
- Instances of such elements
- <lastname>Refsnes</lastname>
- <age>34</age>
- <dateborn>1968-03-27</dateborn>

Attributes

- Definition: <xs:attribute name="xxx" type="yyy"/>
- Example
- <xs:attribute name="lang" type="xs:string"/>
- Instance of such an attribute
- <lastname lang="EN">Smith</lastname>

Complex elements

- Empty element
- <product pid="1345"/>
- Contains only other elements
- <employee> <firstname>John</firstname> <lastname>Smith</lastname> </employee>
- Contains only text
- <food type="dessert">Ice cream</food>
- Contains both elements and text
- <description> It happened on <date lang="norwegian">03.03.99</date> .... </description>

Restriction of simple elements

- <xs:element name="age">
- <xs:simpleType>
- <xs:restriction base="xs:integer">
- <xs:minInclusive value="0"/>
- <xs:maxInclusive value="100"/>
- </xs:restriction>
- </xs:simpleType>
- </xs:element>
- Other restrictions: enumerated types, patterns, etc.

Restriction on complex elements

- <xs:element name="person">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="firstname" type="xs:string"/>
- <xs:element name="lastname" type="xs:string"/>
- </xs:sequence>
- </xs:complexType>
- </xs:element>

Possible to name a type

- <xs:element name="employee">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="firstname" type="xs:string"/>
- <xs:element name="lastname" type="xs:string"/>
- </xs:sequence>
- </xs:complexType>
- </xs:element>
- Only the "employee" element can use the specified complex type
- (<xs:sequence> indicates an order on child elements)
- Alternative
- <xs:element name="employee" type="personinfo"/>
- <xs:complexType name="personinfo">
- <xs:sequence>
- <xs:element name="firstname" type="xs:string"/>
- <xs:element name="lastname" type="xs:string"/>
- </xs:sequence>
- </xs:complexType>

Other gadgets

- Import of types associated with a namespace
- <import namespace="http://..." schemaLocation="http://..."/>
- Possible to include an existing schema
- <include schemaLocation="http://..."/>
- Possible to extend/redefine an existing schema
- <redefine schemaLocation="http://...">
- .... Extensions ...
- </redefine>

Example: a DTD

- <!ELEMENT EMAIL (TO, FROM, CC, BCC, SUBJECT?, BODY?)>
- <!ATTLIST EMAIL
- LANGUAGE (Western|Greek|Latin|Universal) "Western"
- ENCRYPTED CDATA #IMPLIED
- PRIORITY (NORMAL|LOW|HIGH) "NORMAL">
- <!ELEMENT TO (#PCDATA)>
- <!ELEMENT FROM (#PCDATA)>
- <!ELEMENT CC (#PCDATA)>
- <!ELEMENT BCC (#PCDATA)>
- <!ATTLIST BCC
- HIDDEN CDATA #FIXED "TRUE">
- <!ELEMENT SUBJECT (#PCDATA)>
- <!ELEMENT BODY (#PCDATA)>
- <!ENTITY SIGNATURE "Bill">

The same in a variant of XML schema (more verbose)

- <?xml version="1.0"?>
- <Schema name="email" xmlns="urn:schemas-microsoft-com:xml-data"
- xmlns:dt="urn:schemas-microsoft-com:datatypes">
- <AttributeType name="language" dt:type="enumeration" dt:values="Western Greek Latin Universal"/>
- <AttributeType name="encrypted"/>
- <AttributeType name="priority" dt:type="enumeration" dt:values="NORMAL LOW HIGH"/>
- <AttributeType name="hidden" default="true"/>
- <ElementType name="to" content="textOnly"/>
- <ElementType name="from" content="textOnly"/>
- <ElementType name="cc" content="textOnly"/>
- <ElementType name="bcc" content="mixed">
- <attribute type="hidden" required="yes"/>
- </ElementType>
- <ElementType name="subject" content="textOnly"/>
- <ElementType name="body" content="textOnly"/>
- <ElementType name="email" content="eltOnly">
- <attribute type="language" default="Western"/>
- <attribute type="encrypted"/>

Where to place XML schemas

[Figure: diagram placing DTD and XML Schema relative to deterministic top-down tree automata and tree automata.]

- Some bizarre restriction
- Inside an element, no two types with the same tag
- Closer to DTDs than to tree automata
- Efficient type validation

Exercise: coupled vs decoupled

- Write a realistic DTD1 for new cars
- With make, model, engine
- Write a realistic DTD2 for used cars
- Also year, miles, zipcode
- Write an XML schema for L(DTD1) ∪ L(DTD2)
- Using decoupled schema

Automata

- Automata to compute

Another use of automata: XPath, x in //a/b

[Figure, animated over successive slides titled "Example //a/b": the NFA and the DFA for //a/b next to a tree over the labels a and b. The tree is traversed top-down; each visited node is annotated with a DFA state among (0), (01) and (02), and the nodes selected by the query are marked x.]
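
The evaluation of //a/b by an automaton can be sketched as a top-down traversal that carries a set of NFA states, which amounts to running the DFA on the fly. The numbering of the NFA states (0 for the start state that loops on every label, 1 after reading a, 2 after reading a then b) matches the annotations (0), (01), (02); the tuple encoding of trees and the names `step` and `select` are assumptions made for this sketch.

```python
def step(states, label):
    """One move of the NFA for //a/b applied to a set of states."""
    nxt = {0}                          # state 0 loops on every label (the // axis)
    if 0 in states and label == "a":
        nxt.add(1)                     # just read an a
    if 1 in states and label == "b":
        nxt.add(2)                     # read a b directly below an a: match
    return frozenset(nxt)

def select(tree, states=frozenset({0}), position=()):
    """Return the positions (tuples of child indexes) of the nodes in //a/b.

    tree is (label, [children]); the root has position ().
    """
    label, children = tree
    here = step(states, label)
    matches = [position] if 2 in here else []
    for index, child in enumerate(children):
        matches += select(child, here, position + (index,))
    return matches

# Hypothetical document: b(a(b, a(b)), b)
DOC = ("b", [("a", [("b", []), ("a", [("b", [])])]), ("b", [])])
```

Each node is visited once, and the state sets that occur are {0}, {0,1} and {0,2}.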

Determinization: exponential blow-up

//a/*/*/b

Proposal: k-pebble transducers

[Figure labeled "stack"]

[Milo, Suciu, Vianu]

k-pebble transducers: result

Capture a core aspect of XQuery but not the data management part

Graphs and bisimulation

Graph

- Graph semistructured data
- Graph simulation
- Graph bisimulation
- Data guides

Semistructured data: labeled graph

- Possibly a root (in red on the slide)

[Figure: labeled graph with root r, person nodes p1 to p8 and a company node c; the edge labels are employee, company, manages, managedby and worksfor.]

Rooted graph

- OEM: Object Exchange Model
- With ID-IDREF, XML is a graph model as well
- Labeled (rooted) graph (E, r)
- Set N of nodes
- A finite ternary relation E ⊆ N × N × Label
- E(s,t,l): there is an edge from s to t labeled l
- r is a node in the graph

Equality revisited

- {1,2,2,1,5} = {1,2,5}
- Ignores the order
- For trees, if we ignore the order of siblings and use a set semantics

[Figure: two trees over the labels a, b, c and d, compared under set semantics.]

Simulation

- A simulation ρ of (E,r) with (E',r') is a relation between the nodes of E and E' such that
- ρ(r,r')
- if ρ(s,s') and E(s,t,l) for some l, then there exists t' with ρ(t,t') and E'(s',t',l)
- (we simulate a move in E by a move in E')

Bisimulation

- Given ρ, E, E'
- ρ is a bisimulation if
- ρ is a simulation of E with E' and
- ρ^-1 is a simulation of E' with E

Examples

[Figure: three graphs G, G' and G'' with edges labeled a and d; one pair is marked "bisimulation", another "not bisimulation".]

They all have the same paths from the root

A more complex example of graph bisimulation

[Figure, animated over successive slides titled "Graph bisimulation": a data graph and a smaller type graph. The data graph has a root, nodes c1, c2, e1 to e4 and p1 to p9, edges labeled programmer, statistician, employee, project, workson, leads and consults, and string values "exercise", "lecture", "finance", "adminstr.", "PR", "undergrad", "grad", "postgrad" and "web". The type graph has nodes t1, t2 and STRING, with edges labeled "programmer statistician", "_", employee and projects. Successive frames add pairs of the relation R between nodes of the two graphs.]

Computing bisimulation in ptime

- Start with ρ = N × N' (for N, N' the sets of nodes)
- While there exists (x,x') in ρ that violates the definition of simulation (in either direction), remove (x,x') from ρ
- This computes the maximal bisimulation in ptime
- (Note: this maximal bisimulation exists because ∅ is a bisimulation, and if ρ1, ρ2 are bisimulations, ρ1 ∪ ρ2 is also one)
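
The refinement loop above can be sketched directly. Edges follow the E(s,t,l) convention as (source, target, label) triples; the function names and the three small example graphs are assumptions made for illustration, and the sketch favors readability over the best known running time.

```python
def successors(edges, node):
    """Outgoing (label, target) pairs of a node."""
    return [(label, target) for (source, target, label) in edges
            if source == node]

def max_bisimulation(nodes1, edges1, nodes2, edges2):
    """Largest relation such that related nodes can mimic each other's moves."""
    rel = {(x, y) for x in nodes1 for y in nodes2}

    def ok(x, y):
        # Every move of x is matched by a move of y with the same label...
        forth = all(
            any(l2 == l1 and (t1, t2) in rel
                for l2, t2 in successors(edges2, y))
            for l1, t1 in successors(edges1, x))
        # ...and every move of y is matched by a move of x.
        back = all(
            any(l1 == l2 and (t1, t2) in rel
                for l1, t1 in successors(edges1, x))
            for l2, t2 in successors(edges2, y))
        return forth and back

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not ok(*pair):
                rel.discard(pair)
                changed = True
    return rel

# Hypothetical graphs: all three have the same label paths from their root.
G_NODES, G = {"r", "n1", "n2"}, {("r", "n1", "a"), ("n1", "n2", "d")}
H_NODES = {"s", "m1", "m2", "m3", "m4"}
H = {("s", "m1", "a"), ("s", "m2", "a"), ("m1", "m3", "d"), ("m2", "m4", "d")}
K_NODES, K = {"t", "u", "v", "w"}, {("t", "u", "a"), ("t", "v", "a"), ("u", "w", "d")}
```

Two rooted graphs are bisimilar when the pair of their roots survives in the result. Each pass removes at least one pair or stops, so the number of passes is bounded by the number of pairs.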

What does this have to do with typing?

- Take a very complex graph E
- How do you describe it?
- By a smaller graph T that is a bisimulation of E
- There may be several bisimulations, with more and more details

Rough bisimulation

[Figure: type graph with the nodes and edge labels listed below.]

- Root: r
- Bosses: p1, p4, p6
- Regulars: p2, p3, p5, p7, p8
- Company: c
- Edges: employee, company, manages, managedby, worksfor

More precise one

[Figure: type graph with the nodes and edge labels listed below.]

- Root: r
- Employees: p1, p2, p3, p4, p5, p6, p7, p8
- Bosses: p1, p4, p6
- Regulars: p2, p3, p5, p7, p8
- Company: c
- Edges: employee, company, manages, managedby, worksfor

Other typing data guide

- See the graph as an automaton with the root as the start state and only accepting states
- This graph accepts all the paths from the root
- Obtain an equivalent, minimal, deterministic automaton
- This is the data guide for the graph
- It can be used for describing the data
- It can be used to support graphical query interfaces
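
The determinization step behind the data guide can be sketched as follows. The sketch covers only the deterministic automaton, not its minimization; the small graph is a hypothetical fragment modeled on the employee example, and `data_guide` and `has_path` are names introduced here.

```python
def data_guide(edges, root):
    """Deterministic automaton of all label paths from root.

    edges is a set of (source, target, label) triples. Each state of the
    guide is the frozenset of graph nodes reached by the same label path.
    """
    start = frozenset([root])
    guide, todo, seen = {}, [start], {start}
    while todo:
        current = todo.pop()
        by_label = {}
        for source, target, label in edges:
            if source in current:
                by_label.setdefault(label, set()).add(target)
        for label, targets in by_label.items():
            target_set = frozenset(targets)
            guide[(current, label)] = target_set
            if target_set not in seen:
                seen.add(target_set)
                todo.append(target_set)
    return guide, start

def has_path(guide_and_start, labels):
    """Every state is accepting, so a path exists iff no transition is missing."""
    guide, state = guide_and_start
    for label in labels:
        state = guide.get((state, label))
        if state is None:
            return False
    return True

# Hypothetical fragment of the employee graph.
EDGES = {
    ("r", "p1", "employee"), ("r", "p2", "employee"), ("r", "c", "company"),
    ("p1", "p2", "manages"), ("p2", "p1", "managedby"),
    ("p1", "c", "worksfor"), ("p2", "c", "worksfor"),
}
GUIDE = data_guide(EDGES, "r")
```

Each label path from the root appears exactly once in the guide, which is what makes it usable as a summary for browsing and query formulation.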

Data guide

[Figure: data guide graph starting from the root.]

- Gives all the paths from the root
- Automata minimization

What you should remember

- Tree automata: theoretical foundation for XML
- Bottom-up tree automata are nice
- Top-down and determinism together imply limitations
- XML documents do not have to be typed
- Typing may be very useful for XML
- In particular for software managing XML data
- DTD: simple but limited
- XML Schema: more expressive but still limited
- Graph data: bisimulation is the answer

Thank you

Bibliography

- TATA: the book Tree Automata Techniques and Applications, tata.gforge.inria.fr/
- The book on the topic, and it is free
- XML Schema: see http://w3.org
- http://www.w3schools.com/schema/