Title: Slicing XML Documents
A Program Slicing Based Method to Filter XML/DTD Documents
Josep F. Silva Galiana
Contents
- Motivation
- Program Slicing
- XML
- DTD
- XSLT
- Slicing XML Documents
- Example
- Implementation
- Conclusions and Future Work
Program Slicing
- Definition: Program transformation to extract the program statements that (potentially) affect the values computed at some point of interest.
- Origin: Originally introduced by Weiser.
- Example:
(1) read(n);
(2) i := 1;
(3) sum := 0;
(4) product := 1;
(5) while (i <= n) do begin
(6)   sum := sum + i;
(7)   product := product * i;
(8)   i := i + 1;
    end;
(9) write(sum);
(10) write(product)
Slicing Criterion: (10, product)
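The slice for the criterion (10, product) can be approximated with a small sketch. This is not Weiser's algorithm and not a PDG-based slicer: it is a flow-insensitive approximation that follows "defines/uses" and "is controlled by" relations, which happens to be precise enough for this example. The table `PROGRAM` and the function `backward_slice` are hypothetical names introduced only for illustration.

```python
# Flow-insensitive backward slice of the example program (an approximation:
# a real slicer would use reaching definitions or a Program Dependence Graph).

# statement number -> (variables defined, variables used, controlling statement)
PROGRAM = {
    1: ({"n"}, set(), None),                  # read(n)
    2: ({"i"}, set(), None),                  # i := 1
    3: ({"sum"}, set(), None),                # sum := 0
    4: ({"product"}, set(), None),            # product := 1
    5: (set(), {"i", "n"}, None),             # while condition, tests i and n
    6: ({"sum"}, {"sum", "i"}, 5),            # sum := sum + i
    7: ({"product"}, {"product", "i"}, 5),    # product := product * i
    8: ({"i"}, {"i"}, 5),                     # i := i + 1
    9: (set(), {"sum"}, None),                # write(sum)
    10: (set(), {"product"}, None),           # write(product)
}


def backward_slice(program, statement, variable):
    """Return the statement numbers that may affect `variable` at `statement`."""
    in_slice = set()
    relevant = [variable]   # variables whose definitions still have to be visited
    visited = set()

    def include(number):
        if number in in_slice:
            return
        in_slice.add(number)
        _, uses, control = program[number]
        relevant.extend(uses)            # data dependences
        if control is not None:
            include(control)             # control dependence

    include(statement)
    while relevant:
        var = relevant.pop()
        if var in visited:
            continue
        visited.add(var)
        for number, (defs, _, _) in program.items():
            if var in defs:
                include(number)
    return in_slice


print(sorted(backward_slice(PROGRAM, 10, "product")))  # [1, 2, 4, 5, 7, 8, 10]
```

Statements (3), (6) and (9) only concern `sum`, so they are left out of the slice.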
Program Slicing
- Applications: debugging, code understanding, specialization, etc.
- All the applications are based on Program Dependence Graphs (PDGs) (structure and behaviour of programs)
What would happen if program slicing were applied to a data structure? Would it be interesting?
XML
XML (eXtensible Markup Language)
- Origin: XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996.
- Structure: Documents are trees composed of ELEMENTS, which contain attributes.
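To make the tree view concrete, the following sketch parses a small fragment and lists each element with its depth and attributes. It uses Python's standard `xml.etree.ElementTree`; the fragment `DOC` and the helper `outline` are illustrative assumptions, not part of the presented tool.

```python
import xml.etree.ElementTree as ET

# Small illustrative fragment: one element with a child that carries attributes.
DOC = """<Research>
  <Project name="SysLog" year="2003-2004" budget="16000" />
</Research>"""


def outline(elem, depth=0):
    """Return (depth, tag, attributes) for elem and its descendants, in document order."""
    rows = [(depth, elem.tag, dict(elem.attrib))]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(ET.fromstring(DOC))
for depth, tag, attrs in rows:
    print("  " * depth + tag, attrs)
```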
Example of XML document
DTD (Document Type Definition)
- Objective: The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
- Structure: Documents are graphs composed of ELEMENTS.
Example of DTD document
XML (eXtensible Markup Language)
<PersonalInfo>
  <Contact>
    <Status> Professor </Status>
    <Name> Ryan </Name>
    <Surname> Gibson </Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name> Logic </Name>
      <Sched> Mon/Wed 16-18 </Sched>
      <Course> 4-Mathematics </Course>
    </Subject>
    <Subject>
      <Name> Algebra </Name>
      <Sched> Mon/Tur 11-13 </Sched>
      <Course> 3-Mathematics </Course>
    </Subject>
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000" />
    ...
  </Research>
</PersonalInfo>
DTD (Document Type Definition)
<!ELEMENT PersonalInfo (Contact, Teaching, Research)>
<!ELEMENT Contact (Status, Name, Surname)>
<!ELEMENT Status ANY>
<!ELEMENT Name ANY>
<!ELEMENT Surname ANY>
<!ELEMENT Teaching (Subject)>
<!ELEMENT Subject (Name, Sched, Course)>
<!ELEMENT Sched ANY>
<!ELEMENT Course ANY>
<!ELEMENT Research (Project)>
<!ELEMENT Project ANY>
<!ATTLIST Project
  name   CDATA #REQUIRED
  year   CDATA #REQUIRED
  budget CDATA #IMPLIED
>
XSLT (eXtensible Stylesheet Language Transformations)
- Objective: XSLT is a language for transforming XML.
- Structure: An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.
- XSLT is a programming language
Example of XSLT document (Source Code)
Example of XSLT document (Result)
Slicing XML Documents
- We see XML documents and DTDs as trees.
[Figure: the example XML document (PersonalInfo)]
Slicing XML Documents
- The slicing criterion is composed of a set of nodes in the tree.
- For each node in the slicing criterion, we extract from the tree all those nodes that are in the path from the root to the node.
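The path-based extraction described above can be sketched as follows. This is a minimal illustration with Python's standard `xml.etree.ElementTree`, not the Haskell prototype; the function name `backward_slice` and the trimmed document `DOC` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<PersonalInfo>
  <Contact><Status>Professor</Status><Name>Ryan</Name><Surname>Gibson</Surname></Contact>
  <Teaching>
    <Subject><Name>Logic</Name><Sched>Mon/Wed 16-18</Sched></Subject>
    <Subject><Name>Algebra</Name><Sched>Mon/Tur 11-13</Sched></Subject>
  </Teaching>
</PersonalInfo>"""


def backward_slice(root, criterion):
    """Return a copy of the tree that keeps only nodes on a root-to-criterion path."""
    parent = {child: node for node in root.iter() for child in node}
    keep = set()
    for node in criterion:
        while node is not None:          # walk up from the criterion node to the root
            keep.add(node)
            node = parent.get(node)

    def rebuild(node):
        clone = ET.Element(node.tag, dict(node.attrib))
        clone.text = node.text
        for child in node:
            if child in keep:
                clone.append(rebuild(child))
        return clone

    return rebuild(root)


root = ET.fromstring(DOC)
criterion = [root.find("Teaching/Subject/Name")]   # the first Subject's Name
sliced = backward_slice(root, criterion)
print([elem.tag for elem in sliced.iter()])
```

With this criterion, only PersonalInfo, Teaching, the first Subject and its Name survive; Contact and the second Subject are dropped.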
[Figure: Web Page (Original) and Web Page (Slice); slicing options: XML / DTD, Forward / Backward]
Slicing XML Documents
- DTD backward slicing criterion.
[Figure: Web Page (Original) and Web Page (Slice), each shown with the example DTD]
Slicing XML Documents
- XML backward slicing criterion.
[Figure: Web Page (Original) and Web Page (Slice), each shown with the example XML document]
Slicing XML Documents
- We distinguish between DTD and XML slicing criteria.
- XML slicing criteria are more fine-grained than DTD slicing criteria.
- We distinguish between forward and backward slices (or a combination).
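The difference between the slice directions can be sketched as below. The slides do not spell out the forward case in text, so this sketch assumes that a forward slice keeps the descendants of each criterion node, while a backward slice keeps its ancestors. The function `slice_nodes`, its flags and the document `DOC` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<PersonalInfo>
  <Contact><Name>Ryan</Name></Contact>
  <Teaching>
    <Subject><Name>Logic</Name><Sched>Mon/Wed 16-18</Sched><Course>4-Mathematics</Course></Subject>
    <Subject><Name>Algebra</Name><Sched>Mon/Tur 11-13</Sched><Course>3-Mathematics</Course></Subject>
  </Teaching>
</PersonalInfo>"""


def slice_nodes(root, criterion, backward=True, forward=False):
    """Return the set of elements kept by a backward, forward or combined slice."""
    parent = {child: node for node in root.iter() for child in node}
    keep = set(criterion)
    for node in criterion:
        if backward:                     # ancestors up to the root
            ancestor = parent.get(node)
            while ancestor is not None:
                keep.add(ancestor)
                ancestor = parent.get(ancestor)
        if forward:                      # assumption: the whole subtree below the node
            keep.update(node.iter())
    return keep


def tags(root, keep):
    """Tags of the kept elements, in document order."""
    return [elem.tag for elem in root.iter() if elem in keep]


root = ET.fromstring(DOC)
subject = root.find("Teaching/Subject")            # the first Subject
print(tags(root, slice_nodes(root, [subject], backward=True, forward=False)))
print(tags(root, slice_nodes(root, [subject], backward=False, forward=True)))
print(tags(root, slice_nodes(root, [subject], backward=True, forward=True)))
```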
[Figure: Web Page (Original) and Web Page (Slice); slicing options: XML / DTD, Forward / Backward]
Slicing XML Documents
- DTD backward slicing criterion.
[Figure: Web Page (Original) and Web Page (Slice), each shown with the example DTD]
Slicing XML Documents
- XML forward slicing criterion.
[Figure: Web Page (Original) and Web Page (Slice), each shown with the example XML document]
Slicing XML Documents
- XML backward-forward slicing criterion.
[Figure: Web Page (Original) and Web Page (Slice), each shown with the example XML document]
Slicing XML Documents
- What happens with DTDs? Slices are well-formed, but are they valid?
- For each XML slice we produce a DTD slice, and vice versa.
- We guarantee that XML slices are valid with respect to DTD slices.
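One simple way to derive a DTD slice from an XML slice is sketched below. This is only a plausible reading of the idea, not the construction used by the presented slicer, and it does not by itself prove validity. It assumes a hypothetical in-memory DTD (`DTD`, a dictionary from element type to either a list of child types or the string "ANY") and plain sequence content models as in the running example; `dtd_slice` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the example DTD: type -> child types, or "ANY".
DTD = {
    "PersonalInfo": ["Contact", "Teaching", "Research"],
    "Contact": ["Status", "Name", "Surname"],
    "Status": "ANY",
    "Name": "ANY",
    "Surname": "ANY",
    "Teaching": ["Subject"],
    "Subject": ["Name", "Sched", "Course"],
    "Sched": "ANY",
    "Course": "ANY",
    "Research": ["Project"],
    "Project": "ANY",
}


def dtd_slice(dtd, xml_slice_root):
    """Keep only the declarations and child types that occur in the XML slice."""
    children_seen = {}   # element type -> child types observed under it in the slice
    for elem in xml_slice_root.iter():
        children_seen.setdefault(elem.tag, set()).update(child.tag for child in elem)

    sliced = {}
    for name, model in dtd.items():
        if name not in children_seen:
            continue                     # this type does not occur in the XML slice
        if model == "ANY":
            sliced[name] = "ANY"
        else:
            sliced[name] = [child for child in model if child in children_seen[name]]
    return sliced


xml_slice = ET.fromstring(
    "<PersonalInfo><Teaching><Subject><Name>Logic</Name></Subject></Teaching></PersonalInfo>"
)
result = dtd_slice(DTD, xml_slice)
print(result)
```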
[Figure: Slicer diagram with the labels DTD document, XML document, Slicing Criterion, DTD Slice document and XML Slice document]
Slicing XML Documents
- A simple slicing algorithm
[Figure: slicing algorithm]
Slicing XML Documents
- In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq Pos(D)$, the algorithm would be the same, except that the first loop would be:
- For each $v_1.v_2.\ldots.v_n \in C$ do
- $V := V \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.\ldots.v_n\}$
- $W := W \cup \{v_1^{i}.v_2^{j}.\ldots.v_n^{k}\}$
- where $v_1.v_2.\ldots.v_n \in V$ and $v_1^{i}.v_2^{j}.\ldots.v_n^{k} \in X$
- Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
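The loop above can be sketched in Python under some assumptions that the slide does not state explicitly: a DTD position is modelled as a tuple of element types from the root, an XML position as a tuple of (type, occurrence index) pairs, and W collects every XML position whose sequence of types matches a position in the criterion. The names `prefixes`, `matching_positions` and `dtd_criterion_loop` are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<PersonalInfo>
  <Contact><Name>Ryan</Name></Contact>
  <Teaching>
    <Subject><Name>Logic</Name><Sched>Mon/Wed 16-18</Sched></Subject>
    <Subject><Name>Algebra</Name><Sched>Mon/Tur 11-13</Sched></Subject>
  </Teaching>
</PersonalInfo>"""


def prefixes(path):
    """All non-empty prefixes v1, v1.v2, ..., v1.v2...vn of a DTD position."""
    return {path[:k] for k in range(1, len(path) + 1)}


def matching_positions(root, path):
    """XML positions, as tuples of (type, occurrence index), whose types equal `path`."""
    results = []

    def walk(elem, depth, position):
        if elem.tag != path[depth]:
            return
        if depth == len(path) - 1:
            results.append(position)
            return
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, depth + 1, position + ((child.tag, counts[child.tag]),))

    walk(root, 0, ((root.tag, 1),))
    return results


def dtd_criterion_loop(root, criterion):
    """First loop for a DTD criterion: V holds DTD positions, W holds XML positions."""
    V, W = set(), set()
    for path in criterion:
        V |= prefixes(path)
        W |= set(matching_positions(root, path))
    return V, W


root = ET.fromstring(DOC)
V, W = dtd_criterion_loop(root, {("PersonalInfo", "Teaching", "Subject", "Name")})
print(sorted(V, key=len))
print(sorted(W))
```

Here a single DTD position selects the Name of both Subject elements, which illustrates why a DTD criterion is coarser than an XML criterion.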
Slicing XML Documents
The following theorem states the correctness of the technique.
Theorem: Let $D$ be a well-formed DTD and $X$ a well-formed XML document valid with respect to $D$. Given a slice $D'$ of $D$ and a slice $X'$ of $X$ computed with an XML slicing criterion $C'$, and given a slice $D''$ of $D$ and a slice $X''$ of $X$ computed with a DTD slicing criterion $C''$, then:
a) $D'$ is well-formed and $X'$ is valid with respect to $D'$
b) $D''$ is well-formed and $X''$ is valid with respect to $D''$
If all the elements in $C'$ are of one of the types in $C''$, then:
c) $D' = D''$
d) $X'$ is a subtree of $X''$
Implementation
We have implemented a prototype in Haskell. Haskell provides us with a formal basis with many advantages for the manipulation of XML documents.
- The HaXml library. It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
  data Element   = Elem Name [Attribute] [Content]
  type Attribute = (Name, Value)
  data Content   = CElem Element | CText String
Implementation
From XML slices to Webpage slices
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated). Both the XML data and the presentation labels are generated together. This does not impose any restriction on the power of XSLT, since the same webpages can be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are put together.
Implementation
XSLT Implementation Guidelines
Implementation
The implementation, some examples and other material are publicly available at
www.dsic.upv.es/jsilva/xml
Conclusions and Future Work
Conclusions
- We proposed the application of program slicing techniques to XML data structures
- We defined an algorithm to slice XML and DTD documents
- XML and DTD slices are well-formed and valid
- Previous slicers can be used with a modest implementation effort
- Slicing Web Pages
- The slicer can use XSLT in order to slice webpages
- We proposed some guidelines to generate XSLT files
- Future Work
- Migration to XML Schema
- New implementation based on XQuery