Title: Chapter 1 The Semantic Web Vision
1 Chapter 1: The Semantic Web Vision
- Grigoris Antoniou
- Frank van Harmelen
2 Lecture Outline
- Today's Web
- The Semantic Web Impact
- Semantic Web Technologies
- A Layered Approach
3 Today's Web
- Most of today's Web content is suitable for human consumption
  - Even Web content that is generated automatically from databases is usually presented without the original structural information found in databases
- Typical uses of the Web today: people seeking and making use of information, searching for and getting in touch with other people, reviewing catalogs of online stores, and ordering products by filling out forms
4 Keyword-Based Search Engines
- Current Web activities are not particularly well supported by software tools
  - Except for keyword-based search engines (e.g. Google, AltaVista, Yahoo)
- The Web would not have been the huge success it was, were it not for search engines
5 Problems of Keyword-Based Search Engines
- High recall, low precision
- Low or no recall
- Results are highly sensitive to vocabulary
- Results are single Web pages
- Human involvement is necessary to interpret and combine results
- Results of Web searches are not readily accessible by other software tools
6 The Key Problem of Today's Web
- The meaning of Web content is not machine-accessible: lack of semantics
- It is simply difficult to distinguish the meaning between these two sentences:
  - I am a professor of computer science.
  - I am a professor of computer science, you may think. Well, . . .
7 The Semantic Web Approach
- Represent Web content in a form that is more easily machine-processable.
- Use intelligent techniques to take advantage of these representations.
- The Semantic Web will gradually evolve out of the existing Web; it is not a competitor to the current WWW
8 Lecture Outline
- Today's Web
- The Semantic Web Impact
- Semantic Web Technologies
- A Layered Approach
9 The Semantic Web Impact: Knowledge Management
- Knowledge management concerns itself with acquiring, accessing, and maintaining knowledge within an organization
- Key activity of large businesses: internal knowledge as an intellectual asset
- It is particularly important for international, geographically dispersed organizations
- Most information is currently available in a weakly structured form (e.g. text, audio, video)
10 Limitations of Current Knowledge Management Technologies
- Searching information
  - Keyword-based search engines
- Extracting information
  - Human involvement necessary for browsing, retrieving, interpreting, combining
- Maintaining information
  - Inconsistencies in terminology, outdated information
- Viewing information
  - Impossible to define views on Web knowledge
11 Semantic Web Enabled Knowledge Management
- Knowledge will be organized in conceptual spaces according to its meaning.
- Automated tools for maintenance and knowledge discovery
- Semantic query answering
- Query answering over several documents
- Defining who may view certain parts of information (even parts of documents) will be possible.
12 The Semantic Web Impact: B2C Electronic Commerce
- A typical scenario: a user visits one or several online shops, browses their offers, selects and orders products.
- Ideally, humans would visit all, or all major, online stores, but this is too time-consuming
- Shopbots are a useful tool
13 Limitations of Shopbots
- They rely on wrappers: extensive programming required
- Wrappers need to be reprogrammed when an online store changes its outfit
- Wrappers extract information based on textual analysis
  - Error-prone
  - Limited information extracted
14 Semantic Web Enabled B2C Electronic Commerce
- Software agents that can interpret the product information and the terms of service.
  - Pricing and product information, delivery and privacy policies will be interpreted and compared to the user requirements.
- Information about the reputation of shops
- Sophisticated shopping agents will be able to conduct automated negotiations
15 The Semantic Web Impact: B2B Electronic Commerce
- Greatest economic promise
- Currently relies mostly on EDI
  - Isolated technology, understood only by experts
  - Difficult to program and maintain, error-prone
  - Each B2B communication requires separate programming
- The Web appears to be the perfect infrastructure
  - But B2B is not well supported by Web standards
16 Semantic Web Enabled B2B Electronic Commerce
- Businesses enter partnerships without much overhead
- Differences in terminology will be resolved using standard abstract domain models
- Data will be interchanged using translation services.
- Auctioning, negotiations, and drafting contracts will be carried out automatically (or semi-automatically) by software agents
17 Lecture Outline
- Today's Web
- The Semantic Web Impact
- Semantic Web Technologies
- A Layered Approach
18 Semantic Web Technologies
- Explicit Metadata
- Ontologies
- Logic and Inference
- Agents
19 On HTML
- Web content is currently formatted for human readers rather than programs
- HTML is the predominant language in which Web pages are written (directly or using tools)
- Its vocabulary describes presentation
20 An HTML Example
    <h1>Agilitas Physiotherapy Centre</h1>
    Welcome to the home page of the Agilitas Physiotherapy Centre. Do
    you feel pain? Have you had an injury? Let our staff Lisa Davenport,
    Kelly Townsend (our lovely secretary) and Steve Matthews take care
    of your body and soul.
    <h2>Consultation hours</h2>
    Mon 11am - 7pm<br>
    Tue 11am - 7pm<br>
    Wed 3pm - 7pm<br>
    Thu 11am - 7pm<br>
    Fri 11am - 3pm<p>
    But note that we do not offer consultation during the weeks of the
    <a href=". . .">State Of Origin</a> games.
21 Problems with HTML
- Humans have no problem with this
- Machines (software agents) do:
  - How to distinguish therapists from the secretary?
  - How to determine exact consultation hours?
  - They would have to follow the link to the State Of Origin games to find when they take place.
22 A Better Representation
    <company>
      <treatmentOffered>Physiotherapy</treatmentOffered>
      <companyName>Agilitas Physiotherapy Centre</companyName>
      <staff>
        <therapist>Lisa Davenport</therapist>
        <therapist>Steve Matthews</therapist>
        <secretary>Kelly Townsend</secretary>
      </staff>
    </company>
23 Explicit Metadata
- This representation is far more easily processable by machines
- Metadata: data about data
  - Metadata capture part of the meaning of data
- The Semantic Web does not rely on text-based manipulation, but rather on machine-processable metadata
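As a rough illustration of why the tagged representation is easier for software, the following sketch reads the company description with Python's standard `xml.etree.ElementTree` module and groups staff by the tag that encloses each name. The function name `staff_by_role` is an assumption made for this example, not part of the original material.

```python
import xml.etree.ElementTree as ET

# The company description from the "A Better Representation" slide.
COMPANY_XML = """
<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Agilitas Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>
"""

def staff_by_role(xml_text):
    """Group staff names by the tag (role) that encloses them."""
    root = ET.fromstring(xml_text)
    roles = {}
    for person in root.find("staff"):
        roles.setdefault(person.tag, []).append(person.text)
    return roles

roles = staff_by_role(COMPANY_XML)
print("Therapists:", roles["therapist"])
print("Secretary:", roles["secretary"])
```

No text analysis is needed here: the distinction between therapists and the secretary comes entirely from the metadata (the tag names).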
24 Ontologies
- The term ontology originates from philosophy
  - The study of the nature of existence
- Its meaning in computer science is different
  - An ontology is an explicit and formal specification of a conceptualization
25 Typical Components of Ontologies
- Terms denote important concepts (classes of objects) of the domain
  - e.g. professors, staff, students, courses, departments
- Relationships between these terms: typically class hierarchies
  - a class C is a subclass of another class C' if every object in C is also included in C'
  - e.g. all professors are staff members
26 Further Components of Ontologies
- Properties
  - e.g. X teaches Y
- Value restrictions
  - e.g. only faculty members can teach courses
- Disjointness statements
  - e.g. faculty and general staff are disjoint
- Logical relationships between objects
  - e.g. every department must include at least 10 faculty
27 Example of a Class Hierarchy
28 The Role of Ontologies on the Web
- Ontologies provide a shared understanding of a domain: semantic interoperability
  - overcome differences in terminology
  - mappings between ontologies
- Ontologies are useful for the organization and navigation of Web sites
29 The Role of Ontologies in Web Search
- Ontologies are useful for improving the accuracy of Web searches
  - search engines can look for pages that refer to a precise concept in an ontology
- Web searches can exploit generalization/specialization information
  - If a query fails to find any relevant documents, the search engine may suggest to the user a more general query.
  - If too many answers are retrieved, the search engine may suggest to the user some specializations.
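The generalization/specialization idea can be sketched in a few lines of Python. The class hierarchy, the function names, and the `too_many` threshold below are all hypothetical, chosen only to illustrate the behaviour described above.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "professor": "faculty",
    "lecturer": "faculty",
    "faculty": "staff",
    "administrator": "staff",
}

def generalizations(term):
    """Return the chain of superclasses of term, nearest first."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain

def specializations(term):
    """Return the direct subclasses of term, sorted by name."""
    return sorted(c for c, parent in SUBCLASS_OF.items() if parent == term)

def suggest(term, hit_count, too_many=1000):
    """Suggest a broader term when nothing is found, narrower ones when flooded."""
    if hit_count == 0:
        return generalizations(term)[:1]
    if hit_count > too_many:
        return specializations(term)
    return []

print(suggest("professor", 0))     # no hits: try the next more general concept
print(suggest("faculty", 5000))    # too many hits: offer specializations
```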
30 Web Ontology Languages
- RDF Schema
  - RDF is a data model for objects and relations between them
  - RDF Schema is a vocabulary description language
  - Describes properties and classes of RDF resources
  - Provides semantics for generalization hierarchies of properties and classes
31 Web Ontology Languages (2)
- OWL
  - A richer ontology language
  - relations between classes
    - e.g., disjointness
  - cardinality
    - e.g. exactly one
  - richer typing of properties
  - characteristics of properties (e.g., symmetry)
32 Logic and Inference
- Logic is the discipline that studies the principles of reasoning
- Formal languages for expressing knowledge
- Well-understood formal semantics
  - Declarative knowledge: we describe what holds without caring about how it can be deduced
- Automated reasoners can deduce (infer) conclusions from the given knowledge
33 An Inference Example
    prof(X) → faculty(X)
    faculty(X) → staff(X)
    prof(michael)
- We can deduce the following conclusions:
    faculty(michael)
    staff(michael)
    prof(X) → staff(X)
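The deduction above can be reproduced with a very small forward-chaining loop. This is only a sketch for rules with a single one-argument premise, as in the example; the representation of rules as `(premise, conclusion)` pairs and the name `forward_chain` are assumptions made for illustration.

```python
def forward_chain(facts, rules):
    """Apply unary rules (premise -> conclusion) until no new fact appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                new_fact = (conclusion, individual)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

# prof(X) -> faculty(X), faculty(X) -> staff(X), and the fact prof(michael)
RULES = [("prof", "faculty"), ("faculty", "staff")]
FACTS = {("prof", "michael")}

closure = forward_chain(FACTS, RULES)
for predicate, individual in sorted(closure):
    print(f"{predicate}({individual})")
```

The loop derives the two ground conclusions faculty(michael) and staff(michael); the rule-level conclusion that every prof is staff follows from chaining the two rules and is not computed by this sketch.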
34 Logic versus Ontologies
- The previous example involves knowledge typically found in ontologies
  - Logic can be used to uncover ontological knowledge that is implicitly given
  - It can also help uncover unexpected relationships and inconsistencies
- Logic is more general than ontologies
  - It can also be used by intelligent agents for making decisions and selecting courses of action
35 Tradeoff between Expressive Power and Computational Complexity
- The more expressive a logic is, the more computationally expensive it becomes to draw conclusions
  - Drawing certain conclusions may become impossible if non-computability barriers are encountered.
- Our previous examples involved rules of the form "If conditions, then conclusion", and only finitely many objects
  - This subset of logic is tractable and is supported by efficient reasoning tools
36 Inference and Explanations
- Explanations: the series of inference steps can be retraced
- They increase users' confidence in Semantic Web agents: the "Oh yeah?" button
- Activities between agents: create or validate proofs
37 Typical Explanation Procedure
- Facts will typically be traced to some Web addresses
  - The trust of the Web address will be verifiable by agents
- Rules may be a part of a shared commerce ontology or the policy of the online shop
38 Software Agents
- Software agents work autonomously and proactively
  - They evolved out of object-oriented and component-based programming
- A personal agent on the Semantic Web will
  - receive some tasks and preferences from the person
  - seek information from Web sources, communicate with other agents
  - compare information about user requirements and preferences, make certain choices
  - give answers to the user
39 Intelligent Personal Agents
40 Semantic Web Agent Technologies
- Metadata
  - Identify and extract information from Web sources
- Ontologies
  - Web searches, interpret retrieved information
  - Communicate with other agents
- Logic
  - Process retrieved information, draw conclusions
41 Semantic Web Agent Technologies (2)
- Further technologies (orthogonal to the Semantic Web technologies)
  - Agent communication languages
  - Formal representation of beliefs, desires, and intentions of agents
  - Creation and maintenance of user models
42 Lecture Outline
- Today's Web
- The Semantic Web Impact
- Semantic Web Technologies
- A Layered Approach
43 A Layered Approach
- The development of the Semantic Web proceeds in steps
  - Each step building a layer on top of another
- Principles:
  - Downward compatibility
  - Upward partial understanding
44 The Semantic Web Layer Tower
45 Semantic Web Layers
- XML layer
  - Syntactic basis
- RDF layer
  - RDF: basic data model for facts
  - RDF Schema: simple ontology language
- Ontology layer
  - More expressive languages than RDF Schema
  - Current Web standard: OWL
46 Semantic Web Layers (2)
- Logic layer
  - enhance ontology languages further
  - application-specific declarative knowledge
- Proof layer
  - Proof generation, exchange, validation
- Trust layer
  - Digital signatures
  - Recommendations, rating agencies
47 Book Outline
- Structured Web Documents in XML
- Describing Web Resources in RDF
- Web Ontology Language OWL
- Logic and Inference Rules
- Applications
- Ontology Engineering
- Conclusion and Outlook
48 Chapter 2: Structured Web Documents in XML
- Grigoris Antoniou
- Frank van Harmelen
49 An HTML Example
    <h2>Nonmonotonic Reasoning: Context-
    Dependent Reasoning</h2>
    <i>by <b>V. Marek</b> and
    <b>M. Truszczynski</b></i><br>
    Springer 1993<br>
    ISBN 0387976892
50 The Same Example in XML
    <book>
      <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
      <author>V. Marek</author>
      <author>M. Truszczynski</author>
      <publisher>Springer</publisher>
      <year>1993</year>
      <ISBN>0387976892</ISBN>
    </book>
51 HTML versus XML: Similarities
- Both use tags (e.g. <h2> and </year>)
- Tags may be nested (tags within tags)
- Human users can read and interpret both HTML and XML representations quite easily
- But how about machines?
52 Problems with Automated Interpretation of HTML Documents
- An intelligent agent trying to retrieve the names of the authors of the book:
  - Authors' names could appear immediately after the title
  - or immediately after the word "by"
- Are there two authors?
  - Or just one, called "V. Marek and M. Truszczynski"?
53 HTML vs XML: Structural Information
- HTML documents do not contain structural information: pieces of the document and their relationships.
- XML is more easily accessible to machines because
  - Every piece of information is described.
  - Relations are also defined through the nesting structure.
  - E.g., the <author> tags appear within the <book> tags, so they describe properties of the particular book.
54 HTML vs XML: Structural Information (2)
- A machine processing the XML document would be able to deduce that
  - the author element refers to the enclosing book element
  - from the nesting structure, rather than by proximity considerations
- XML allows the definition of constraints on values
  - E.g. a year must be a number of four digits
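A short Python sketch of both points, using the book example from the earlier slide: the authors are found through the nesting structure, and the four-digit constraint on the year is checked with a regular expression. The regular-expression check stands in for a schema constraint and is an assumption of this example, not something XML itself performs.

```python
import re
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""

book = ET.fromstring(BOOK_XML)

# Each <author> child of <book> is one author; no guessing from the word "by".
authors = [a.text for a in book.findall("author")]

# Value constraint: a year must be a number of exactly four digits.
year_ok = re.fullmatch(r"\d{4}", book.findtext("year")) is not None

print(authors)
print(year_ok)
```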
55 HTML vs XML: Formatting
- The HTML representation provides more than the XML representation:
  - The formatting of the document is also described
- The main use of an HTML document is to display information: it must define formatting
- XML: separation of content from display
  - same information can be displayed in different ways
56 HTML vs XML: Another Example
- In HTML
    <h2>Relationship matter-energy</h2>
    <i> E = M × c² </i>
- In XML
    <equation>
      <meaning>Relationship matter-energy</meaning>
      <leftside> E </leftside>
      <rightside> M × c² </rightside>
    </equation>
57 HTML vs XML: Different Use of Tags
- Both HTML documents use the same tags
- The two XML documents use completely different tags
- HTML tags define display: color, lists, etc.
- XML tags are not fixed: user-definable tags
- XML is a meta markup language: a language for defining markup languages
58 XML Vocabularies
- Web applications must agree on common vocabularies to communicate and collaborate
- Communities and business sectors are defining their specialized vocabularies
  - mathematics (MathML)
  - bioinformatics (BSML)
  - human resources (HRML)
59 Lecture Outline
- Introduction
- Detailed Description of XML
- Structuring
  - DTDs
  - XML Schema
- Namespaces
- Accessing, querying XML documents: XPath
- Transformations: XSLT
60 The XML Language
- An XML document consists of
- a prolog
- a number of elements
- an optional epilog (not discussed)
61 Prolog of an XML Document
- The prolog consists of
  - an XML declaration and
  - an optional reference to external structuring documents
    <?xml version="1.0" encoding="UTF-16"?>
    <!DOCTYPE book SYSTEM "book.dtd">
62 XML Elements
- The "things" the XML document talks about
  - E.g. books, authors, publishers
- An element consists of
  - an opening tag
  - the content
  - a closing tag
    <lecturer>David Billington</lecturer>
63 XML Elements (2)
- Tag names can be chosen almost freely.
- The first character must be a letter, an underscore, or a colon
- No name may begin with the string "xml" in any combination of cases
  - E.g. "Xml", "xML"
64 Content of XML Elements
- Content may be text, or other elements, or nothing
    <lecturer>
      <name>David Billington</name>
      <phone> 61 - 7 - 3875 507 </phone>
    </lecturer>
- If there is no content, then the element is called empty; it is abbreviated as follows:
    <lecturer/> for <lecturer></lecturer>
65 XML Attributes
- An empty element is not necessarily meaningless
  - It may have some properties in terms of attributes
- An attribute is a name-value pair inside the opening tag of an element
    <lecturer name="David Billington" phone="61 - 7 - 3875 507"/>
66 XML Attributes: An Example
    <order orderNo="23456" customer="John Smith"
           date="October 15, 2002">
      <item itemNo="a528" quantity="1"/>
      <item itemNo="c817" quantity="3"/>
    </order>
67 The Same Example without Attributes
    <order>
      <orderNo>23456</orderNo>
      <customer>John Smith</customer>
      <date>October 15, 2002</date>
      <item>
        <itemNo>a528</itemNo>
        <quantity>1</quantity>
      </item>
      <item>
        <itemNo>c817</itemNo>
        <quantity>3</quantity>
      </item>
    </order>
68 XML Elements vs Attributes
- Attributes can be replaced by elements
- When to use elements and when attributes is a matter of taste
- But attributes cannot be nested
69 Further Components of XML Docs
- Comments
  - A piece of text that is to be ignored by the parser
    <!-- This is a comment -->
- Processing Instructions (PIs)
  - Define procedural attachments
    <?stylesheet type="text/css" href="mystyle.css"?>
70 Well-Formed XML Documents
- Syntactically correct documents
- Some syntactic rules:
  - Only one outermost element (called root element)
  - Each element contains an opening and a corresponding closing tag
  - Tags may not overlap; the following is not allowed:
      <author><name>Lee Hong</author></name>
  - Attributes within an element have unique names
  - Element and tag names must be permissible
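Any XML parser enforces these rules. As a minimal sketch, the helper below (the name `is_well_formed` is an assumption for this example) asks Python's standard `xml.etree.ElementTree` parser to parse a string and reports whether it raised `ParseError`.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

properly_nested = "<author><name>Lee Hong</name></author>"
overlapping = "<author><name>Lee Hong</author></name>"
two_roots = "<a/><b/>"
duplicate_attribute = '<item itemNo="a528" itemNo="c817"/>'

for sample in (properly_nested, overlapping, two_roots, duplicate_attribute):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; the other three each break one of the rules listed above.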
71 The Tree Model of XML Documents: An Example
    <email>
      <head>
        <from name="Michael Maher"
              address="michaelmaher@cs.gu.edu.au"/>
        <to name="Grigoris Antoniou"
            address="grigoris@cs.unibremen.de"/>
        <subject>Where is your draft?</subject>
      </head>
      <body>
        Grigoris, where is the draft of the paper you promised me
        last week?
      </body>
    </email>
72 The Tree Model of XML Documents: An Example (2)
73 The Tree Model of XML Docs
- The tree representation of an XML document is an ordered labeled tree:
  - There is exactly one root
  - There are no cycles
  - Each non-root node has exactly one parent
  - Each node has a label.
  - The order of elements is important
  - but the order of attributes is not important
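To make the tree model concrete, the sketch below parses the email example and lists each element with its depth in the tree, in document order. The helper name `outline` is an assumption for illustration; attribute nodes are left out to keep the listing short.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """
<email>
  <head>
    <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
    <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>Grigoris, where is the draft of the paper you promised me last week?</body>
</email>
"""

def outline(node, depth=0):
    """Return (depth, label) pairs for node and its descendants, pre-order."""
    rows = [(depth, node.tag)]
    for child in node:  # children are visited in document order
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(EMAIL_XML)
for depth, label in outline(root):
    print("  " * depth + label)
```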
74 Lecture Outline
- Introduction
- Detailed Description of XML
- Structuring
  - DTDs
  - XML Schema
- Namespaces
- Accessing, querying XML documents: XPath
- Transformations: XSLT
75 Structuring XML Documents
- Define all the element and attribute names that may be used
- Define the structure
  - what values an attribute may take
  - which elements may or must occur within other elements, etc.
- If such structuring information exists, the document can be validated
76 Structuring XML Documents (2)
- An XML document is valid if
  - it is well-formed
  - it respects the structuring information it uses
- There are two ways of defining the structure of XML documents:
  - DTDs (the older and more restricted way)
  - XML Schema (offers extended possibilities)
77 DTD: Element Type Definition
    <lecturer>
      <name>David Billington</name>
      <phone> 61 - 7 - 3875 507 </phone>
    </lecturer>
- DTD for above element (and all lecturer elements):
    <!ELEMENT lecturer (name,phone)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
78 The Meaning of the DTD
- The element types lecturer, name, and phone may be used in the document
- A lecturer element contains a name element and a phone element, in that order (sequence)
- A name element and a phone element may have any content
- In DTDs, #PCDATA is the only atomic type for elements
79 DTD: Disjunction in Element Type Definitions
- We express that a lecturer element contains either a name element or a phone element as follows:
    <!ELEMENT lecturer (name|phone)>
- A lecturer element contains a name element and a phone element in any order:
    <!ELEMENT lecturer ((name,phone)|(phone,name))>
80 Example of an XML Element
    <order orderNo="23456"
           customer="John Smith"
           date="October 15, 2002">
      <item itemNo="a528" quantity="1"/>
      <item itemNo="c817" quantity="3"/>
    </order>
81 The Corresponding DTD
    <!ELEMENT order (item+)>
    <!ATTLIST order orderNo  ID    #REQUIRED
                    customer CDATA #REQUIRED
                    date     CDATA #REQUIRED>
    <!ELEMENT item EMPTY>
    <!ATTLIST item itemNo   ID    #REQUIRED
                   quantity CDATA #REQUIRED
                   comments CDATA #IMPLIED>
82 Comments on the DTD
- The item element type is defined to be empty
- + (after item) is a cardinality operator:
  - ?: appears zero times or once
  - *: appears zero or more times
  - +: appears one or more times
  - No cardinality operator means exactly once
83 Comments on the DTD (2)
- In addition to defining elements, we define attributes
- This is done in an attribute list containing:
  - Name of the element type to which the list applies
  - A list of triplets of attribute name, attribute type, and value type
- Attribute name: a name that may be used in an XML document using a DTD
84 DTD: Attribute Types
- Similar to predefined data types, but limited selection
- The most important types are
  - CDATA, a string (sequence of characters)
  - ID, a name that is unique across the entire XML document
  - IDREF, a reference to another element with an ID attribute carrying the same value as the IDREF attribute
  - IDREFS, a series of IDREFs
  - (v1| . . . |vn), an enumeration of all possible values
- Limitations: no dates, number ranges etc.
85 DTD: Attribute Value Types
- #REQUIRED
  - Attribute must appear in every occurrence of the element type in the XML document
- #IMPLIED
  - The appearance of the attribute is optional
- #FIXED "value"
  - Every element of this type has this attribute, always with the given value
- "value"
  - This specifies the default value for the attribute
86 Referencing with IDREF and IDREFS
    <!ELEMENT family (person*)>
    <!ELEMENT person (name)>
    <!ELEMENT name (#PCDATA)>
    <!ATTLIST person id       ID     #REQUIRED
                     mother   IDREF  #IMPLIED
                     father   IDREF  #IMPLIED
                     children IDREFS #IMPLIED>
87 An XML Document Respecting the DTD
    <family>
      <person id="bob" mother="mary" father="peter">
        <name>Bob Marley</name>
      </person>
      <person id="bridget" mother="mary">
        <name>Bridget Jones</name>
      </person>
      <person id="mary" children="bob bridget">
        <name>Mary Poppins</name>
      </person>
      <person id="peter" children="bob">
        <name>Peter Marley</name>
      </person>
    </family>
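A validating parser checks the ID and IDREF constraints automatically. Python's standard library parser does not validate against DTDs, so the sketch below performs the two checks by hand for the family document: ID values must be unique, and every IDREF or IDREFS value must match some ID. The function name `dangling_references` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """
<family>
  <person id="bob" mother="mary" father="peter"><name>Bob Marley</name></person>
  <person id="bridget" mother="mary"><name>Bridget Jones</name></person>
  <person id="mary" children="bob bridget"><name>Mary Poppins</name></person>
  <person id="peter" children="bob"><name>Peter Marley</name></person>
</family>
"""

def dangling_references(xml_text, idref_attrs=("mother", "father", "children")):
    """Return (person id, attribute, reference) triples that match no ID."""
    root = ET.fromstring(xml_text)
    ids = [p.get("id") for p in root.iter("person")]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique across the document")
    known = set(ids)
    missing = []
    for person in root.iter("person"):
        for attr in idref_attrs:
            # IDREFS is a whitespace-separated list; IDREF is a single name.
            for ref in person.get(attr, "").split():
                if ref not in known:
                    missing.append((person.get("id"), attr, ref))
    return missing

print(dangling_references(FAMILY_XML))
```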
88 A DTD for an Email Element
    <!ELEMENT email (head,body)>
    <!ELEMENT head (from,to+,cc*,subject)>
    <!ELEMENT from EMPTY>
    <!ATTLIST from name    CDATA #IMPLIED
                   address CDATA #REQUIRED>
    <!ELEMENT to EMPTY>
    <!ATTLIST to name    CDATA #IMPLIED
                 address CDATA #REQUIRED>
89 A DTD for an Email Element (2)
    <!ELEMENT cc EMPTY>
    <!ATTLIST cc name    CDATA #IMPLIED
                 address CDATA #REQUIRED>
    <!ELEMENT subject (#PCDATA)>
    <!ELEMENT body (text,attachment*)>
    <!ELEMENT text (#PCDATA)>
    <!ELEMENT attachment EMPTY>
    <!ATTLIST attachment
              encoding (mime|binhex) "mime"
              file     CDATA         #REQUIRED>
90 Interesting Parts of the DTD
- A head element contains (in that order):
  - a from element
  - at least one to element
  - zero or more cc elements
  - a subject element
- In from, to, and cc elements
  - the name attribute is not required
  - the address attribute is always required
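A DTD content model behaves like a regular expression over the names of the child elements. As a rough sketch, the content model of head, (from, to+, cc*, subject), can be mimicked with Python's `re` module by matching against the space-separated list of child tag names. The helper `matches_head_model` is hypothetical and checks only the order and number of children, not their attributes.

```python
import re

# (from, to+, cc*, subject) written as a regex over space-separated tag names.
HEAD_MODEL = re.compile(r"from (to )+(cc )*subject")

def matches_head_model(child_tags):
    """Check a list of child tag names against the head content model."""
    return HEAD_MODEL.fullmatch(" ".join(child_tags)) is not None

print(matches_head_model(["from", "to", "subject"]))
print(matches_head_model(["from", "to", "to", "cc", "subject"]))
print(matches_head_model(["from", "subject"]))        # no to element
print(matches_head_model(["to", "from", "subject"]))  # wrong order
```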
91 Interesting Parts of the DTD (2)
- A body element contains
  - a text element
  - possibly followed by a number of attachment elements
- The encoding attribute of an attachment element must have either the value "mime" or "binhex"
  - "mime" is the default value
92 Remarks on DTDs
- A DTD can be interpreted as an Extended Backus-Naur Form (EBNF)
    <!ELEMENT email (head,body)>
  - is equivalent to email ::= head body
- Recursive definitions are possible in DTDs
    <!ELEMENT bintree
      ((bintree,root,bintree)|emptytree)>
93 Lecture Outline
- Introduction
- Detailed Description of XML
- Structuring
  - DTDs
  - XML Schema
- Namespaces
- Accessing, querying XML documents: XPath
- Transformations: XSLT
94 XML Schema
- Significantly richer language for defining the structure of XML documents
- Its syntax is based on XML itself
  - not necessary to write separate tools
- Reuse and refinement of schemas
  - Expand or delete already existing schemas
- Sophisticated set of data types, compared to DTDs (which only support strings)
95 XML Schema (2)
- An XML schema is an element with an opening tag like
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
            version="1.0">
- Structure of schema elements
  - Element and attribute types using data types
96 Element Types
    <element name="email"/>
    <element name="head" minOccurs="1" maxOccurs="1"/>
    <element name="to" minOccurs="1"/>
- Cardinality constraints:
  - minOccurs="x" (default value 1)
  - maxOccurs="x" (default value 1)
  - Generalizations of *, ?, + offered by DTDs
97 Attribute Types
    <attribute name="id" type="ID" use="required"/>
    <attribute name="speaks" type="Language"
               use="default" value="en"/>
- Existence: use="x", where x may be optional or required
- Default value: use="x" value="...", where x may be default or fixed
98 Data Types
- There is a variety of built-in data types
  - Numerical data types: integer, short, etc.
  - String types: string, ID, IDREF, CDATA, etc.
  - Date and time data types: time, month, etc.
- There are also user-defined data types
  - simple data types, which cannot use elements or attributes
  - complex data types, which can use these
99 Data Types (2)
- Complex data types are defined from already existing data types by defining some attributes (if any) and using
  - sequence, a sequence of existing data type elements (order is important)
  - all, a collection of elements that must appear (order is not important)
  - choice, a collection of elements, of which one will be chosen
100 A Data Type Example
    <complexType name="lecturerType">
      <sequence>
        <element name="firstname" type="string"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="lastname" type="string"/>
      </sequence>
      <attribute name="title" type="string" use="optional"/>
    </complexType>
101 Data Type Extension
- Already existing data types can be extended by new elements or attributes. Example:
    <complexType name="extendedLecturerType">
      <extension base="lecturerType">
        <sequence>
          <element name="email" type="string"
                   minOccurs="0" maxOccurs="1"/>
        </sequence>
        <attribute name="rank" type="string" use="required"/>
      </extension>
    </complexType>
102 Resulting Data Type
    <complexType name="extendedLecturerType">
      <sequence>
        <element name="firstname" type="string"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="lastname" type="string"/>
        <element name="email" type="string"
                 minOccurs="0" maxOccurs="1"/>
      </sequence>
      <attribute name="title" type="string" use="optional"/>
      <attribute name="rank" type="string" use="required"/>
    </complexType>
103 Data Type Extension (2)
- A hierarchical relationship exists between the original and the extended type
  - Instances of the extended type are also instances of the original type
  - They may contain additional information, but neither less information, nor information of the wrong type
104 Data Type Restriction
- An existing data type may be restricted by adding constraints on certain values
- Restriction is not the opposite of extension
  - Restriction is not achieved by deleting elements or attributes
- The following hierarchical relationship still holds:
  - Instances of the restricted type are also instances of the original type
  - They satisfy at least the constraints of the original type
105 Example of Data Type Restriction
    <complexType name="restrictedLecturerType">
      <restriction base="lecturerType">
        <sequence>
          <element name="firstname" type="string"
                   minOccurs="1" maxOccurs="2"/>
        </sequence>
        <attribute name="title" type="string"
                   use="required"/>
      </restriction>
    </complexType>
106 Restriction of Simple Data Types
    <simpleType name="dayOfMonth">
      <restriction base="integer">
        <minInclusive value="1"/>
        <maxInclusive value="31"/>
      </restriction>
    </simpleType>
107 Data Type Restriction: Enumeration
    <simpleType name="dayOfWeek">
      <restriction base="string">
        <enumeration value="Mon"/>
        <enumeration value="Tue"/>
        <enumeration value="Wed"/>
        <enumeration value="Thu"/>
        <enumeration value="Fri"/>
        <enumeration value="Sat"/>
        <enumeration value="Sun"/>
      </restriction>
    </simpleType>
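The effect of the two simple-type restrictions can be mimicked in plain Python. This is only an analogy under the assumption that values arrive as strings; it is not how a schema processor is implemented, and the function names are made up for this example.

```python
DAYS_OF_WEEK = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def is_day_of_month(text):
    """Integer restricted by minInclusive=1 and maxInclusive=31."""
    try:
        value = int(text)
    except ValueError:
        return False
    return 1 <= value <= 31

def is_day_of_week(text):
    """String restricted to the enumerated values."""
    return text in DAYS_OF_WEEK

print(is_day_of_month("31"), is_day_of_month("32"))
print(is_day_of_week("Wed"), is_day_of_week("Wednesday"))
```

Every value accepted by a restricted type is also a value of its base type (an integer, a string), which is the hierarchical relationship described earlier.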
108 XML Schema: The Email Example
    <element name="email" type="emailType"/>
    <complexType name="emailType">
      <sequence>
        <element name="head" type="headType"/>
        <element name="body" type="bodyType"/>
      </sequence>
    </complexType>
109 XML Schema: The Email Example (2)
    <complexType name="headType">
      <sequence>
        <element name="from" type="nameAddress"/>
        <element name="to" type="nameAddress"
                 minOccurs="1" maxOccurs="unbounded"/>
        <element name="cc" type="nameAddress"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="subject" type="string"/>
      </sequence>
    </complexType>
110 XML Schema: The Email Example (3)
    <complexType name="nameAddress">
      <attribute name="name" type="string" use="optional"/>
      <attribute name="address" type="string" use="required"/>
    </complexType>
- Similar for bodyType
111 Lecture Outline
- Introduction
- Detailed Description of XML
- Structuring
  - DTDs
  - XML Schema
- Namespaces
- Accessing, querying XML documents: XPath
- Transformations: XSLT
112 Namespaces
- An XML document may use more than one DTD or schema
- Since each structuring document was developed independently, name clashes may appear
- The solution is to use a different prefix for each DTD or schema
  - prefix:name
113 An Example
    <vu:instructors xmlns:vu="http://www.vu.com/empDTD"
                    xmlns:gu="http://www.gu.au/empDTD"
                    xmlns:uky="http://www.uky.edu/empDTD">
      <uky:faculty uky:title="assistant professor"
                   uky:name="John Smith"
                   uky:department="Computer Science"/>
      <gu:academicStaff gu:title="lecturer"
                        gu:name="Mate Jones"
                        gu:school="Information Technology"/>
    </vu:instructors>
114 Namespace Declarations
- Namespaces are declared within an element and can be used in that element and any of its children (elements and attributes)
- A namespace declaration has the form:
  - xmlns:prefix="location"
  - location is the address of the DTD or schema
- If a prefix is not specified, as in xmlns="location", then the location is used by default
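The following sketch shows how a namespace-aware parser treats the prefixes in the instructors example, using Python's standard `xml.etree.ElementTree`. After parsing, the prefix is replaced by the location it was bound to, so names from different vocabularies cannot clash. The `NS` mapping is local to this example; its prefixes need not match those used in the document.

```python
import xml.etree.ElementTree as ET

INSTRUCTORS_XML = """
<vu:instructors xmlns:vu="http://www.vu.com/empDTD"
                xmlns:gu="http://www.gu.au/empDTD"
                xmlns:uky="http://www.uky.edu/empDTD">
  <uky:faculty uky:title="assistant professor"
               uky:name="John Smith"
               uky:department="Computer Science"/>
  <gu:academicStaff gu:title="lecturer"
                    gu:name="Mate Jones"
                    gu:school="Information Technology"/>
</vu:instructors>
"""

# Prefix-to-location mapping used only for the lookups below.
NS = {"gu": "http://www.gu.au/empDTD", "uky": "http://www.uky.edu/empDTD"}

root = ET.fromstring(INSTRUCTORS_XML)
faculty = root.find("uky:faculty", NS)
staff = root.find("gu:academicStaff", NS)

print(root.tag)  # the qualified name: {location}instructors
print(faculty.get("{http://www.uky.edu/empDTD}name"))
print(staff.get("{http://www.gu.au/empDTD}name"))
```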
115 Lecture Outline
- Introduction
- Detailed Description of XML
- Structuring
  - DTDs
  - XML Schema
- Namespaces
- Accessing, querying XML documents: XPath
- Transformations: XSLT
116 Addressing and Querying XML Documents
- In relational databases, parts of a database can be selected and retrieved using SQL
  - The same is necessary for XML documents
  - Query languages: XQuery, XQL, XML-QL
- The central concept of XML query languages is a path expression
  - Specifies how a node, or a set of nodes, in the tree representation of the XML document can be reached
117 XPath
- XPath is the core of XML query languages
- Language for addressing parts of an XML document.
  - It operates on the tree data model of XML
  - It has a non-XML syntax
118 Types of Path Expressions
- Absolute (starting at the root of the tree)
  - Syntactically they begin with the symbol /
  - It refers to the root of the document (situated one level above the root element of the document)
- Relative to a context node
119 An XML Example
    <library location="Bremen">
      <author name="Henry Wise">
        <book title="Artificial Intelligence"/>
        <book title="Modern Web Services"/>
        <book title="Theory of Computation"/>
      </author>
      <author name="William Smart">
        <book title="Artificial Intelligence"/>
      </author>
      <author name="Cynthia Singleton">
        <book title="The Semantic Web"/>
        <book title="Browser Technology Revised"/>
      </author>
    </library>
120Tree Representation
121Examples of Path Expressions in XPath
- Address all author elements
- /library/author
- Addresses all author elements that are children
of the library element node, which resides
immediately below the root
- /t1/.../tn, where each ti+1 is a child node of
ti, is a path through the tree representation
122Examples of Path Expressions in XPath (2)
- Address all author elements
- //author
- Here // says that we should consider all elements
in the document and check whether they are of
type author
- This path expression addresses all author
elements anywhere in the document
123Examples of Path Expressions in XPath (3)
- Address the location attribute nodes within
library element nodes
- /library/@location
- The symbol @ is used to denote attribute nodes
124Examples of Path Expressions in XPath (4)
- Address all title attribute nodes within book
elements anywhere in the document, which have the
value Artificial Intelligence
- //book/@title="Artificial Intelligence"
125Examples of Path Expressions in XPath (5)
- Address all books with title Artificial
Intelligence
- //book[@title="Artificial Intelligence"]
- The test within square brackets is a filter expression
- It restricts the set of addressed nodes.
- Difference with query 4:
- Query 5 addresses book elements, the title of
which satisfies a certain condition.
- Query 4 collects title attribute nodes of book
elements
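
Queries 1 to 5 can be tried out with a short Python sketch. It assumes the standard library module xml.etree.ElementTree, which implements only a subset of XPath: paths are written relative to the library element rather than starting with /, and attribute nodes cannot be returned directly, so attribute values are read from the selected elements. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The library document from the example slide.
LIBRARY_XML = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

library = ET.fromstring(LIBRARY_XML)  # the root element, i.e. /library

# Query 1, /library/author: author children of the library element.
authors_q1 = library.findall("author")

# Query 2, //author: author elements anywhere below the root.
authors_q2 = library.findall(".//author")

# Query 3, /library/@location: read the attribute from the element.
location = library.get("location")

# Query 5: book ELEMENTS whose title satisfies the filter expression.
ai_books = library.findall(".//book[@title='Artificial Intelligence']")

# Query 4: the title attribute VALUES of those books.
ai_titles = [book.get("title") for book in library.iter("book")
             if book.get("title") == "Artificial Intelligence"]

print([a.get("name") for a in authors_q1])
print(location, len(ai_books), ai_titles)
```

The last two results show the difference between queries 4 and 5: one is a list of book elements, the other a list of title values.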
126Tree Representation of Query 4
127Tree Representation of Query 5
128Examples of Path Expressions in XPath (6)
- Address the first author element node in the XML
document
- //author[1]
- Address the last book element within the first
author element node in the document
- //author[1]/book[last()]
- Address all book element nodes without a title
attribute
- //book[not(@title)]
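
The positional queries can be sketched the same way. This example again assumes Python's xml.etree.ElementTree, which supports [1] and [last()] but has no not() function, so the third query is emulated with an ordinary filter. The shortened document and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# A shortened version of the library document, enough for positional queries.
LIBRARY_XML = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
</library>"""

library = ET.fromstring(LIBRARY_XML)

# //author[1]: the first author element.
first_author = library.find(".//author[1]")

# //author[1]/book[last()]: the last book of the first author.
last_book = library.find(".//author[1]/book[last()]")

# //book[not(@title)]: emulated, since ElementTree has no not().
untitled_books = [book for book in library.iter("book")
                  if book.get("title") is None]

print(first_author.get("name"))
print(last_book.get("title"))
print(len(untitled_books))
```

Every book in the example carries a title, so the third query returns an empty list.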
129General Form of Path Expressions
- A path expression consists of a series of steps,
separated by slashes
- A step consists of
- An axis specifier,
- A node test, and
- An optional predicate
130General Form of Path Expressions (2)
- An axis specifier determines the tree
relationship between the nodes to be addressed
and the context node
- E.g. parent, ancestor, child (the default),
sibling, attribute node
- // is such an axis specifier: descendant or self
131General Form of Path Expressions (3)
- A node test specifies which nodes to address
- The most common node tests are element names
- E.g., * addresses all element nodes
- comment() addresses all comment nodes
132General Form of Path Expressions (4)
- Predicates (or filter expressions) are optional
and are used to refine the set of addressed nodes
- E.g., the expression [1] selects the first node
- [position()=last()] selects the last node
- [position() mod 2 = 0] selects the even nodes
- XPath has a more complicated full syntax.
- We have only presented the abbreviated syntax
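
The meaning of the three predicates can be illustrated without an XPath engine. The following Python sketch is an emulation, not XPath itself: select_positions is a hypothetical helper that keeps the nodes whose position satisfies a test, counting from 1 as XPath does.

```python
def select_positions(nodes, keep):
    """Keep nodes whose 1-based position satisfies keep(position, last)."""
    last = len(nodes)
    return [node for position, node in enumerate(nodes, start=1)
            if keep(position, last)]

# The three books of Henry Wise, in document order.
books = ["Artificial Intelligence", "Modern Web Services", "Theory of Computation"]

first = select_positions(books, lambda position, last: position == 1)          # [1]
final = select_positions(books, lambda position, last: position == last)       # [position()=last()]
even = select_positions(books, lambda position, last: position % 2 == 0)       # [position() mod 2 = 0]

print(first, final, even)
```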
133Lecture Outline
- Introduction
- Detailed Description of XML
- Structuring
- DTDs
- XML Schema
- Namespaces
- Accessing, querying XML documents: XPath
- Transformations: XSLT
134Displaying XML Documents
- <author>
- <name>Grigoris Antoniou</name>
- <affiliation>University of Bremen</affiliation>
- <email>ga@tzi.de</email>
- </author>
- may be displayed in different ways
- Grigoris Antoniou Grigoris Antoniou
- University of Bremen University of Bremen
- ga@tzi.de ga@tzi.de
135Style Sheets
- Style sheets can be written in various languages
- E.g. CSS2 (cascading style sheets level 2)
- XSL (extensible stylesheet language)
- XSL includes
- a transformation language (XSLT)
- a formatting language
- Both are XML applications
136XSL Transformations (XSLT)
- XSLT specifies rules with which an input XML
document is transformed to
- another XML document
- an HTML document
- plain text
- The output document may use the same DTD or
schema, or a completely different vocabulary
- XSLT can be used independently of the formatting
language
137XSLT (2)
- Move data and metadata from one XML
representation to another
- XSLT is chosen when applications that use
different DTDs or schemas need to communicate
- XSLT can be used for machine processing of
content without any regard to displaying the
information for people to read.
- In the following we use XSLT only to display XML
documents
138XSLT Transformation into HTML
- <xsl:template match="/author">
- <html>
- <head><title>An author</title></head>
- <body bgcolor="white">
- <b><xsl:value-of select="name"/></b><br/>
- <xsl:value-of select="affiliation"/><br/>
- <i><xsl:value-of select="email"/></i>
- </body>
- </html>
- </xsl:template>
139Style Sheet Output
- <html>
- <head><title>An author</title></head>
- <body bgcolor="white">
- <b>Grigoris Antoniou</b><br>
- University of Bremen<br>
- <i>ga@tzi.de</i>
- </body>
- </html>
140Observations About XSLT
- XSLT documents are XML documents
- XSLT resides on top of XML
- The XSLT document defines a template
- In this case an HTML document, with some
placeholders for content to be inserted
- xsl:value-of retrieves the value of an element
and copies it into the output document
- It places some content into the template
- <html>
- <head><title>An author</title></head>
- <body bgcolor="white">
- <b>...</b><br>
- ...<br>
- <i>...</i>
- </body>
- </html>
142Auxiliary Templates
- We have an XML document with details of several
authors
- It is a waste of effort to treat each author
element separately
- In such cases, a special template is defined for
author elements, which is used by the main
template
143Example of an Auxiliary Template
- <authors>
- <author>
- <name>Grigoris Antoniou</name>
- <affiliation>University of Bremen</affiliation>
- <email>ga@tzi.de</email>
- </author>
- <author>
- <name>David Billington</name>
- <affiliation>Griffith University</affiliation>
- <email>david@gu.edu.net</email>
- </author>
- </authors>
144Example of an Auxiliary Template (2)
- <xsl:template match="/">
- <html>
- <head><title>Authors</title></head>
- <body bgcolor="white">
- <xsl:apply-templates select="authors"/>
- <!-- Apply templates for AUTHORS children -->
- </body>
- </html>
- </xsl:template>
145Example of an Auxiliary Template (3)
- <xsl:template match="authors">
- <xsl:apply-templates select="author"/>
- </xsl:template>
- <xsl:template match="author">
- <h2><xsl:value-of select="name"/></h2>
- Affiliation: <xsl:value-of
- select="affiliation"/><br/>
- Email: <xsl:value-of select="email"/>
- <p/>
- </xsl:template>
146Multiple Authors Output
- <html>
- <head><title>Authors</title></head>
- <body bgcolor="white">
- <h2>Grigoris Antoniou</h2>
- Affiliation: University of Bremen<br>
- Email: ga@tzi.de
- <p>
- <h2>David Billington</h2>
- Affiliation: Griffith University<br>
- Email: david@gu.edu.net
- <p>
- </body>
- </html>
147Explanation of the Example
- The xsl:apply-templates element causes all children
of the context node to be matched against the
selected path expression
- E.g., if the current template applies to /, then
the element xsl:apply-templates applies to the
root element
- I.e. the authors element (/ is located above the
root element)
- If the current context node is the authors
element, then the element xsl:apply-templates
select="author" causes the template for the
author elements to be applied to all author
children of the authors element
148Explanation of the Example (2)
- It is good practice to define a template for each
element type in the document
- Even if no specific processing is applied to
certain elements, the xsl:apply-templates element
should be used
- E.g. authors
- In this way, we work from the root to the leaves
of the tree, and all templates are applied
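
The root-to-leaves control flow can be emulated in Python. The sketch below is not an XSLT processor: it assumes the standard library modules xml.etree.ElementTree and html, and the names TEMPLATES, apply_templates and the three template functions are made up for the illustration. Each element type gets one function, and apply_templates dispatches the selected children to the function registered for their tag.

```python
import html
import xml.etree.ElementTree as ET

AUTHORS_XML = """<authors>
  <author>
    <name>Grigoris Antoniou</name>
    <affiliation>University of Bremen</affiliation>
    <email>ga@tzi.de</email>
  </author>
  <author>
    <name>David Billington</name>
    <affiliation>Griffith University</affiliation>
    <email>david@gu.edu.net</email>
  </author>
</authors>"""

def apply_templates(context, select):
    """Render each child of `context` named `select` with its own template."""
    return "".join(TEMPLATES[child.tag](child)
                   for child in context.findall(select))

def authors_template(node):
    # No processing of its own: it only passes control down to the children.
    return apply_templates(node, "author")

def author_template(node):
    name = html.escape(node.findtext("name", default=""))
    affiliation = html.escape(node.findtext("affiliation", default=""))
    email = html.escape(node.findtext("email", default=""))
    return (f"<h2>{name}</h2>\n"
            f"Affiliation: {affiliation}<br>\n"
            f"Email: {email}\n<p>\n")

# One template per element type, looked up by tag name.
TEMPLATES = {"authors": authors_template, "author": author_template}

def root_template(root_element):
    """Plays the role of the template for /, which sits above the root element."""
    body = TEMPLATES[root_element.tag](root_element)
    return ("<html>\n<head><title>Authors</title></head>\n"
            '<body bgcolor="white">\n' + body + "</body>\n</html>")

page = root_template(ET.fromstring(AUTHORS_XML))
print(page)
```

Adding a new element type then means adding one function and one entry in TEMPLATES, without touching the existing ones.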
149Processing XML Attributes
- Suppose we wish to transform to itself the
element
- <person firstname="John" lastname="Woo"/>
- Wrong solution
- <xsl:template match="person">
- <person firstname="<xsl:value-of
select="@firstname">"
- lastname="<xsl:value-of select="@lastname">"/>
- </xsl:template>
150Processing XML Attributes (2)
- Not well-formed because tags are not allowed
within the values of attributes
- We wish to add attribute values into template
- <xsl:template match="person">
- <person firstname="{@firstname}"
- lastname="{@lastname}"/>
- </xsl:template>
151Transforming an XML Document to Another
152Transforming an XML Document to Another (2)
- <xsl:template match="/">
- <?xml version="1.0" encoding="UTF-16"?>
- <authors>
- <xsl:apply-templates select="authors"/>
- </authors>
- </xsl:template>
- <xsl:template match="authors">
- <author>
- <xsl:apply-templates select="author"/>
- </author>
- </xsl:template>
153Transforming an XML Document to Another (3)
- <xsl:template match="author">
- <name><xsl:value-of select="name"/></name>
- <contact>
- <institution>
- <xsl:value-of select="affiliation"/>
- </institution>
- <email><xsl:value-of select="email"/></email>
- </contact>
- </xsl:template>
154Summary
- XML is a metalanguage that allows users to define
markup
- XML separates content and structure from
formatting
- XML is the de facto standard for the
representation and exchange of structured
information on the Web
- XML is supported by query languages
155Points for Discussion in Subsequent Chapters
- The nesting of tags does not have standard
meaning
- The semantics of XML documents is not accessible
to machines, only to people
- Collaboration and exchange are supported if there
is underlying shared understanding of the
vocabulary
- XML is well-suited for close collaboration, where
domain- or community-based vocabularies are used
- It is not so well-suited for global communication.
156Chapter 3Describing Web Resources in RDF
- Grigoris Antoniou
- Frank van Harmelen
157Lecture Outline
- Basic Ideas of RDF
- XML-based Syntax of RDF
- Basic Concepts of RDF Schema
- The Language of RDF Schema
- The Namespaces of RDF and RDF Schema
- Axiomatic Semantics for RDF and RDFS
- Direct Semantics based on Inference Rules
- Querying of RDF/RDFS Documents using RQL
158Drawbacks of XML
- XML is a universal metalanguage for defining
markup
- It provides a uniform framework for interchange
of data and metadata between applications
- However, XML does not provide any means of
talking about the semantics (meaning) of data
- E.g., there is no intended meaning associated
with the nesting of tags
- It is up to each application to interpret the
nesting.
159Nesting of Tags in XML
- David Billington is a lecturer of Discrete Maths
- <course name="Discrete Maths">
- <lecturer>David Billington</lecturer>
- </course>
- <lecturer name="David Billington">
- <teaches>Discrete Maths</teaches>
- </lecturer>
- Opposite nesting, same information!
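
That each application must interpret the nesting itself can be made concrete with a small Python sketch. It assumes the standard library module xml.etree.ElementTree; the function names and the "lecturer of" label are made up for the illustration. Each nesting needs its own hand-written reader before the two documents yield the same fact.

```python
import xml.etree.ElementTree as ET

COURSE_FIRST = """<course name="Discrete Maths">
  <lecturer>David Billington</lecturer>
</course>"""

LECTURER_FIRST = """<lecturer name="David Billington">
  <teaches>Discrete Maths</teaches>
</lecturer>"""

def fact_from_course(xml_text):
    """Reader for the nesting course > lecturer."""
    course = ET.fromstring(xml_text)
    return (course.findtext("lecturer"), "lecturer of", course.get("name"))

def fact_from_lecturer(xml_text):
    """Reader for the nesting lecturer > teaches."""
    lecturer = ET.fromstring(xml_text)
    return (lecturer.get("name"), "lecturer of", lecturer.findtext("teaches"))

fact_a = fact_from_course(COURSE_FIRST)
fact_b = fact_from_lecturer(LECTURER_FIRST)

print(fact_a)
print(fact_a == fact_b)
```

Nothing in XML itself says that the two documents agree; the agreement exists only because both readers were written with the intended meaning in mind.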
160Basic Ideas of RDF
- Basic building block: object-attribute-value
triple
- It is called a statement
- The sentence about Billington is such a statement
- RDF has been given a syntax in XML
- This syntax inherits the benefits of XML
- Other syntactic representations of RDF are possible
161Basic Ideas of RDF (2)
- The fundamental concepts of RDF are
- resources
- properties
- statements
162Resources
- We can think of a resource as an object, a
thing we want to talk about
- E.g. authors, books, publishers, places, people,
hotels
- Every resource has a URI, a Universal Resource
Identifier
- A URI can be
- a URL (Web address) or
- some other kind of unique identifier
163Properties
- Properties are a special kind of resources
- They describe relations between resources
- E.g. written by, age, title, etc.
- Properties are also identified by URIs
- Advantages of using URIs:
- A global, worldwide, unique naming scheme
- Reduces the homonym problem of distributed data
representation
164Statements
- Statements assert the properties of resources
- A statement is an object-attribute-value triple
- It consists of a resource, a property, and a
value
- Values can be resources or literals
- Literals are atomic values (strings)
165Three Views of a Statement
- A triple
- A piece of a graph
- A piece of XML code
- Thus an RDF document can be viewed as
- A set of triples
- A graph (semantic net)
- An XML document
166Statements as Triples
- (David Billington,
- http://www.mydomain.org/site-owner,
- http://www.cit.gu.edu.au/db)
- The triple (x,P,y) can be considered as a logical
formula P(x,y)
- Binary predicate P relates object x to object y
- RDF offers only binary predicates (properties)
167XML Vocabularies
- A directed graph with labeled nodes and arcs
- from the resource (the subject of the statement)
- to the value (the object of the statement)
- Known in AI as a semantic net
- The value of a statement may be a resource
- It may be linked to other resources
168A Set of Triples as a Semantic Net
169Statements in XML Syntax
- Graphs are a powerful tool for human
understanding but
- The Semantic Web vision requires
machine-accessible and machine-processable
representations
- There is a 3rd representation based on XML
- But XML is not a part of the RDF data model
- E.g. serialisation of XML is irrelevant for RDF
170Statements in XML (2)
- <rdf:RDF
- xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
- xmlns:mydomain="http://www.mydomain.org/my-rdf-ns">
- <rdf:Description
- rdf:about="http://www.cit.gu.edu.au/db">
- <mydomain:site-owner>
- David Billington
- </mydomain:site-owner>
- </rdf:Description>
- </rdf:RDF>
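
The link between the XML view and the triple view can be sketched in Python. This is not a full RDF parser: it assumes the standard library module xml.etree.ElementTree and handles only rdf:Description elements with an rdf:about attribute and literal-valued property children, as in the example above. The function name extract_triples is made up, and properties are reported in ElementTree's {namespace}name notation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://www.mydomain.org/my-rdf-ns">
  <rdf:Description rdf:about="http://www.cit.gu.edu.au/db">
    <mydomain:site-owner>
      David Billington
    </mydomain:site-owner>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(rdf_xml):
    """Return (resource, property, value) triples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # The literal value is the element text, without layout whitespace.
            value = (prop.text or "").strip()
            triples.append((resource, prop.tag, value))
    return triples

triples = extract_triples(DOC)
for triple in triples:
    print(triple)
```

The single triple follows the resource, property, value order used on the Statements slide.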
171Statements in XML (3)
- An RDF document is represented by an XML element
with the tag rdf:RDF
- The content of this element is a number of
descriptions, which use rdf:Description tags.
- E