
Title: Slicing XML Documents


1
A Program Slicing Based Method to Filter XML/DTD Documents
Josep F. Silva Galiana
2
Contents
  • Motivation
  • Program Slicing
  • XML
  • DTD
  • XSLT
  • Slicing XML Documents
  • Example
  • Implementation
  • Conclusions and Future Work

Program Slicing
3
Program Slicing
  • Definition: a program transformation that extracts
    the program statements that (potentially) affect
    the values computed at some point of interest.
  • Origin: originally introduced by Weiser.
  • Example:

(1) read(n)
(2) i := 1
(3) sum := 0
(4) product := 1
(5) while (i <= n) do begin
(6)   sum := sum + i
(7)   product := product * i
(8)   i := i + 1
    end
(9) write(sum)
(10) write(product)

Slicing Criterion: (10, product)
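A slice like this can be computed mechanically as backward reachability over the program's dependences. The Haskell sketch below is a minimal illustration under stated assumptions, not the slicer described in this presentation. The dependence lists in the hypothetical table `deps` (data and control dependences of each numbered statement) were written by hand for this example rather than derived from a PDG. Statement (10) only uses `product`, so the criterion (10, product) is reduced to statement 10.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A statement is identified by its label (1)..(10) in the example program.
type Stmt = Int

-- Hypothetical, hand-written dependences (a real slicer derives them from
-- the PDG). Each statement maps to the statements it depends on, through
-- data dependences (a variable it uses is defined there) or control
-- dependences (it executes inside the loop of statement 5).
deps :: Map.Map Stmt [Stmt]
deps = Map.fromList
  [ (1, []), (2, []), (3, []), (4, [])
  , (5, [1, 2, 8])          -- while (i <= n): uses i and n
  , (6, [5, 2, 3, 6, 8])    -- sum := sum + i
  , (7, [5, 2, 4, 7, 8])    -- product := product * i
  , (8, [5, 2, 8])          -- i := i + 1
  , (9, [3, 6])             -- write(sum)
  , (10, [4, 7])            -- write(product)
  ]

-- Backward slice: every statement reachable from the criterion statement
-- by following dependence edges backwards (worklist traversal).
backwardSlice :: Map.Map Stmt [Stmt] -> Stmt -> [Stmt]
backwardSlice g start = Set.toAscList (go Set.empty [start])
  where
    go seen [] = seen
    go seen (s : rest)
      | s `Set.member` seen = go seen rest
      | otherwise = go (Set.insert s seen) (Map.findWithDefault [] s g ++ rest)
```

Loaded into GHCi, `backwardSlice deps 10` evaluates to `[1,2,4,5,7,8,10]`. Statements (3), (6) and (9) only concern `sum`, so they are left out.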
4
Program Slicing
  • Slice with respect to the slicing criterion (10, product):
    only the statements that can affect the value of
    product at statement (10) are kept.

(1) read(n)
(2) i := 1
(4) product := 1
(5) while (i <= n) do begin
(7)   product := product * i
(8)   i := i + 1
    end
(10) write(product)
5
Program Slicing
  • Applications
  • Debugging
  • Code understanding
  • Specialization
  • etc.
  • All these applications are based on Program
    Dependence Graphs (PDGs), which represent the
    structure and behaviour of programs.

What would happen if program slicing were applied
to a data structure? Would it be interesting?
6
Contents

XML
7
XML
XML (eXtensible Markup Language)
  • Origin: XML was developed by an XML Working
    Group formed under the auspices of the World Wide
    Web Consortium (W3C) in 1996.
  • Structure: documents are trees composed of
    ELEMENTS, which contain attributes.

Example of XML document
8
XML
DTD (Document Type Definition)
  • Objective: the purpose of a DTD is to define the
    legal building blocks of an XML document. It
    defines the document structure with a list of
    legal elements.
  • Structure: DTDs are graphs composed of
    ELEMENTS.

Example of DTD document
9
XML (eXtensible Markup Language)

<PersonalInfo>
  <Contact>
    <Status>Professor</Status>
    <Name>Ryan</Name>
    <Surname>Gibson</Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name>Logic</Name>
      <Sched>Mon/Wed 16-18</Sched>
      <Course>4-Mathematics</Course>
    </Subject>
    <Subject>
      <Name>Algebra</Name>
      <Sched>Mon/Tur 11-13</Sched>
      <Course>3-Mathematics</Course>
    </Subject>
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000"/>
    ...
  </Research>
</PersonalInfo>

DTD (Document Type Definition)

<!ELEMENT PersonalInfo (Contact, Teaching, Research)>
<!ELEMENT Contact (Status, Name, Surname)>
<!ELEMENT Status ANY>
<!ELEMENT Name ANY>
<!ELEMENT Surname ANY>
<!ELEMENT Teaching (Subject+)>
<!ELEMENT Subject (Name, Sched, Course)>
<!ELEMENT Sched ANY>
<!ELEMENT Course ANY>
<!ELEMENT Research (Project+)>
<!ELEMENT Project ANY>
<!ATTLIST Project name   CDATA #REQUIRED
                  year   CDATA #REQUIRED
                  budget CDATA #IMPLIED>
10
XML
XSLT (eXtensible Stylesheet Language
Transformations)
  • Objective: XSLT is a language for transforming
    XML documents.
  • Structure: an XSLT stylesheet specifies the
    presentation of a class of XML documents by
    describing how an instance of the class is
    transformed into an XML document that uses a
    formatting vocabulary, such as (X)HTML or XSL-FO.
  • XSLT is a programming language.

Example of XSLT document (Source Code)
Example of XSLT document (Result)
11
Contents

Slicing XML Documents
12
Slicing XML Documents
  • We view XML documents and DTDs as trees.

PersonalInfo
  Contact
    Status: Professor
    Name: Ryan
    Surname: Gibson
  Teaching
    Subject
      Name: Logic
      Sched: Mon/Wed 16-18
      Course: 4-Mathematics
    Subject
      Name: Algebra
      Sched: Mon/Tur 11-13
      Course: 3-Mathematics
  Research
    Project (name="SysLog", year="2003-2004", budget="16000")
    ...
13
Slicing XML Documents
  • The slicing criterion consists of a set of
    nodes in the tree.
  • For each node in the slicing criterion, we
    extract from the tree all the nodes on the path
    from the root to that node.
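The rule above can be sketched in Haskell. This is a minimal illustration. It assumes a hypothetical simplified tree type `Node` (attributes omitted) and positions written as lists of child indices from the root (`[]` is the root). Neither is HaXml's representation nor the author's implementation. A node is kept exactly when its position is a prefix of some position in the criterion.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical simplified XML tree: elements with children, and text.
-- Attributes are omitted to keep the sketch short.
data Node = Elem String [Node] | Text String
  deriving (Eq, Show)

-- A position is the list of child indices from the root; [] is the root.
type Pos = [Int]

-- Backward slice: keep a node iff it lies on the path from the root to
-- some criterion node, i.e. iff its position is a prefix of a criterion
-- position. Returns Nothing when the criterion is empty.
backwardSlice :: [Pos] -> Node -> Maybe Node
backwardSlice crit = go []
  where
    onPath p = any (p `isPrefixOf`) crit
    go p n
      | not (onPath p) = Nothing
      | otherwise = case n of
          Text s -> Just (Text s)
          Elem t cs ->
            Just (Elem t [c' | (i, c) <- zip [0 ..] cs, Just c' <- [go (p ++ [i]) c]])

-- Abbreviated part of the PersonalInfo document.
sample :: Node
sample =
  Elem "PersonalInfo"
    [ Elem "Contact"
        [ Elem "Status" [Text "Professor"]
        , Elem "Name" [Text "Ryan"]
        , Elem "Surname" [Text "Gibson"]
        ]
    , Elem "Teaching"
        [ Elem "Subject"
            [ Elem "Name" [Text "Logic"]
            , Elem "Sched" [Text "Mon/Wed 16-18"]
            , Elem "Course" [Text "4-Mathematics"]
            ]
        ]
    ]
```

For example, `backwardSlice [[0,1,0]] sample` (the text node "Ryan") yields `Just (Elem "PersonalInfo" [Elem "Contact" [Elem "Name" [Text "Ryan"]]])`. Under this reading, the children of a criterion node are not kept. Selecting the element `Name` itself (`[0,1]`) keeps an empty `Name` element.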

[Figure: Web Page (Original) and Web Page (Slice); slicing criteria can be XML or DTD, forward or backward]
14
Slicing XML Documents
  • DTD backward slicing criterion.
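For a DTD criterion, the selected nodes are element types rather than individual elements. An element type such as `Name` can occur under several parents, so the DTD is a graph rather than a tree. The minimal Haskell sketch below assumes a hypothetical encoding of the example DTD as a map from each element name to the names in its content model (occurrence operators ignored). It keeps the criterion types and every type that can contain one of them. It only computes which declarations survive. It does not adjust the content models of the surviving declarations, which would also be needed for the XML slices to stay valid.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Name = String

-- Hypothetical encoding of a DTD as a graph: each element name maps to the
-- element names in its content model (occurrence operators ignored).
type DTDGraph = Map.Map Name [Name]

personalInfoDTD :: DTDGraph
personalInfoDTD = Map.fromList
  [ ("PersonalInfo", ["Contact", "Teaching", "Research"])
  , ("Contact",      ["Status", "Name", "Surname"])
  , ("Teaching",     ["Subject"])
  , ("Subject",      ["Name", "Sched", "Course"])
  , ("Research",     ["Project"])
  , ("Status", []), ("Name", []), ("Surname", [])
  , ("Sched", []), ("Course", []), ("Project", [])
  ]

-- Backward DTD slice: the criterion element types plus every type that can
-- contain one of them, directly or indirectly. It is a worklist fixpoint
-- over reversed edges, so it also terminates on recursive (cyclic) DTDs.
dtdBackwardSlice :: DTDGraph -> [Name] -> Set.Set Name
dtdBackwardSlice g = go Set.empty
  where
    parents n = [p | (p, cs) <- Map.toList g, n `elem` cs]
    go seen [] = seen
    go seen (n : ns)
      | n `Set.member` seen = go seen ns
      | otherwise = go (Set.insert n seen) (parents n ++ ns)
```

`dtdBackwardSlice personalInfoDTD ["Name"]` returns the set {Contact, Name, PersonalInfo, Subject, Teaching}. `Name` is reachable both through `Contact` and through `Teaching` and `Subject`.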

[Figure: Web Page (Original) and Web Page (Slice), with the DTD of slide 9 before and after slicing]
15
Slicing XML Documents
  • XML backward slicing criterion.

[Figure: Web Page (Original) and Web Page (Slice), with the XML document of slide 9 before and after slicing]
16
Slicing XML Documents
  • XML backward slicing criterion.

[Figure: Web Page (Original) and Web Page (Slice), with the XML document of slide 9]
17
Slicing XML Documents
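  • We distinguish between DTD and XML slicing
    criteria.
  • XML slicing criteria are more fine-grained
    than DTD slicing criteria: an XML criterion selects
    individual nodes of the document, whereas a DTD
    criterion selects element types (and thus all of
    their instances).
  • We distinguish between forward and backward
    slices (or a combination of both).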
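Forward and combined slices can be sketched in a similar hypothetical setting: a simplified `Node` tree without attributes, with positions as lists of child indices from the root. The slides do not spell out forward slicing in detail, so this sketch assumes the following. A forward slice collects the subtrees below the criterion nodes. A backward-forward slice keeps both the root-to-node paths and those subtrees, so the result is still a single document. All names are hypothetical.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical simplified XML tree (attributes omitted).
data Node = Elem String [Node] | Text String
  deriving (Eq, Show)

-- Child indices from the root; [] is the root.
type Pos = [Int]

-- The subtree at a position, if that position exists.
subtreeAt :: Pos -> Node -> Maybe Node
subtreeAt [] n = Just n
subtreeAt (i : is) (Elem _ cs)
  | i >= 0 && i < length cs = subtreeAt is (cs !! i)
subtreeAt _ _ = Nothing

-- Forward slice (assumed reading): the subtrees rooted at the criterion nodes.
forwardSlice :: [Pos] -> Node -> [Node]
forwardSlice crit root = [n | p <- crit, Just n <- [subtreeAt p root]]

-- Backward-forward slice: keep a node if it is an ancestor-or-self of a
-- criterion node (backward part) or a descendant of one (forward part).
-- The result is again a single tree rooted at the original root.
backwardForwardSlice :: [Pos] -> Node -> Maybe Node
backwardForwardSlice crit = go []
  where
    keep p = any (\c -> p `isPrefixOf` c || c `isPrefixOf` p) crit
    go p n
      | not (keep p) = Nothing
      | otherwise = case n of
          Text s -> Just (Text s)
          Elem t cs ->
            Just (Elem t [c' | (i, c) <- zip [0 ..] cs, Just c' <- [go (p ++ [i]) c]])

-- Abbreviated part of the PersonalInfo document.
sample :: Node
sample =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"], Elem "Surname" [Text "Gibson"]]
    , Elem "Research" [Elem "Project" []]
    ]
```

Consider the criterion `[[0]]` (the `Contact` element). `forwardSlice` returns just the `Contact` subtree, while `backwardForwardSlice` wraps it in the `PersonalInfo` root and drops `Research`.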

[Figure: Web Page (Original) and Web Page (Slice); slicing criteria can be XML or DTD, forward or backward]
18
Slicing XML Documents
  • DTD backward slicing criterion.

[Figure: Web Page (Original) and Web Page (Slice), with the DTD of slide 9 before and after slicing]
19
Slicing XML Documents
  • XML forward slicing criterion.

[Figure: Web Page (Original) and Web Page (Slice), with the XML document of slide 9 before and after slicing]
20
Slicing XML Documents
  • XML backward-forward slicing criterion.

[Figure: Web Page (Original) and Web Page (Slice), with the XML document of slide 9 before and after slicing]
21
Slicing XML Documents
  • What happens with DTDs? Slices are well-formed,
    but are they valid?
  • For each XML slice we produce a DTD slice, and
    vice versa.
  • We guarantee that XML slices are valid with
    respect to the corresponding DTD slices.

[Figure: the Slicer takes an XML document, its DTD document and a Slicing Criterion, and produces an XML Slice document and a DTD Slice document]
22
Slicing XML Documents
  • A simple slicing algorithm

23
Slicing XML Documents
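  • In the case of a DTD criterion composed of a set
    of positions $C = \{p_1, \ldots, p_n\} \subseteq \mathit{Pos}(D)$,
    the algorithm is the same, except that its first
    loop becomes:
  • For each $v_1.v_2.(\ldots).v_n \in C$ do
  • $V := V \cup \{v_1, v_1.v_2, \ldots, v_1.v_2.(\ldots).v_n\}$
  • $W := W \cup \{v_{1i}.v_{2j}.(\ldots).v_{nk}\}$
  • where each $v_{1i}.v_{2j}.(\ldots).v_{nk}$ is a position of $X$
    whose elements are instances of $v_1, v_2, \ldots, v_n$,
    respectively
  • Both algorithms produce valid XML and DTD slices
    with respect to the slicing criterion.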
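The first loop for a DTD criterion can be sketched in Haskell. The sketch assumes the following: a DTD position v1.v2.(...).vn is given as a list of element names, the XML document is a hypothetical simplified `Node` tree, and XML positions are lists of child indices from the root. Under those assumptions, `dtdPrefixes` computes the prefixes added to V, and `instances` computes the XML positions added to W. The names are illustrative, not the author's code.

```haskell
import Data.List (inits)

-- Hypothetical simplified XML tree (attributes omitted).
data Node = Elem String [Node] | Text String
  deriving (Eq, Show)

-- Child indices from the root; [] is the root.
type Pos = [Int]

-- V: every non-empty prefix v1, v1.v2, ..., v1.v2.(...).vn of a DTD position.
dtdPrefixes :: [String] -> [[String]]
dtdPrefixes = drop 1 . inits

-- W: every XML position whose elements, from the root down, are instances
-- of the element types v1, v2, ..., vn.
instances :: [String] -> Node -> [Pos]
instances [] _ = []
instances [v] (Elem t _) | t == v = [[]]
instances (v : vs) (Elem t cs)
  | t == v = [i : p | (i, c) <- zip [0 ..] cs, p <- instances vs c]
instances _ _ = []

-- Abbreviated part of the PersonalInfo document.
sample :: Node
sample =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"]]
    , Elem "Teaching"
        [ Elem "Subject" [Elem "Name" [Text "Logic"]]
        , Elem "Subject" [Elem "Name" [Text "Algebra"]]
        ]
    , Elem "Research" []
    ]
```

On this abbreviated `sample`, `instances ["PersonalInfo", "Teaching", "Subject"] sample` gives `[[1,0],[1,1]]`, one position per `Subject` element. A single DTD position therefore selects every instance of that element type.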

24
Slicing XML Documents
The following theorem states the correctness of
the technique.
Theorem. Let $D$ be a well-formed DTD and $X$ a
well-formed XML document valid with respect to $D$.
Given a slice $D'$ of $D$ and a slice $X'$ of $X$
computed with an XML slicing criterion $C$, and
given a slice $D''$ of $D$ and a slice $X''$ of $X$
computed with a DTD slicing criterion $C'$, then
  a) $D'$ is well-formed and $X'$ is valid with
     respect to $D'$;
  b) $D''$ is well-formed and $X''$ is valid with
     respect to $D''$.
If all the elements in $C$ are of one of the types
in $C'$, then
  c) $D' \subseteq D''$;
  d) $X'$ is a subtree of $X''$.
25
Contents
  • Motivation
  • Program Slicing
  • XML
  • DTD
  • XSLT
  • Slicing XML Documents
  • Example
  • Implementation
  • Conclusions Future Work

Implementation
26
Implementation
We have implemented a prototype in Haskell.
Haskell provides a formal basis with many
advantages for the manipulation of XML
documents.
- The HaXml library allows us to automatically
  translate XML or HTML documents into a Haskell
  representation. In particular, we use the
  following data structures, which can represent
  any XML/HTML document:

    data Element   = Elem Name [Attribute] [Content]
    type Attribute = (Name, Value)
    data Content   = CElem Element
                   | CText String
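These types are a simplified view; HaXml's own definitions have more constructors and differ between library versions. The following Haskell code is a minimal sketch of the reverse direction: turning a (sliced) value of these types back into XML text. It restates the types above and adds hypothetical printing helpers `renderElement`, `renderAttrs`, `renderContent` and `escape`. These are not HaXml functions.

```haskell
type Name  = String
type Value = String

-- The simplified types from the slide (not HaXml's full definitions).
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element | CText String

-- Escape the characters that are not allowed verbatim in text or
-- attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Print an element; elements without content use the empty-element tag.
renderElement :: Element -> String
renderElement (Elem n as cs)
  | null cs   = "<" ++ n ++ renderAttrs as ++ "/>"
  | otherwise = "<" ++ n ++ renderAttrs as ++ ">"
                  ++ concatMap renderContent cs ++ "</" ++ n ++ ">"

renderAttrs :: [Attribute] -> String
renderAttrs = concatMap (\(k, v) -> " " ++ k ++ "=\"" ++ escape v ++ "\"")

renderContent :: Content -> String
renderContent (CElem e) = renderElement e
renderContent (CText s) = escape s
```

For instance, rendering a `Research` element that contains the `Project` element of the example produces `<Research><Project name="SysLog" year="2003-2004" budget="16000"/></Research>`.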
27
Implementation
From XML slices to Webpage slices
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and
the presentation elements under the same conditions
(i.e., the former is generated if and only if the
latter is generated), so that the XML data and the
presentation labels are always generated together.
This does not impose any restriction on the power
of XSLT, since the same web pages can still be
generated. On the contrary, this style of
programming forces the programmer to build
transformations that are easy to reuse and
maintain, because the information and presentation
data that depend on the same condition are kept
together.
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples and other
material are publicly available at
www.dsic.upv.es/jsilva/xml
31
Contents

Conclusions and Future Work
32
Conclusions
  • We proposed the application of program slicing
    techniques to XML data structures
  • We defined an algorithm to slice XML and DTD
    documents
  • The XML and DTD slices produced are well-formed and valid
  • Previous slicers can be used with a modest
    implementation effort
  • Slicing Web Pages
  • The slicer can use XSLT in order to slice
    webpages
  • We proposed some guidelines to generate XSLT
    files
  • Future Work
  • Migration to XML Schema
  • New implementation based on XQuery