Chapter 1 The Semantic Web Vision - PowerPoint PPT Presentation

Grigoris Antoniou and Frank van Harmelen, A Semantic Web Primer (PowerPoint presentation)

Slides: 164
Provided by: ICS112



Chapter 1: The Semantic Web Vision
  • Grigoris Antoniou
  • Frank van Harmelen

Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

Today's Web
  • Most of today's Web content is suitable for human
    consumption
  • Even Web content that is generated automatically
    from databases is usually presented without the
    original structural information found in databases
  • Typical Web uses today involve people's
  • seeking and making use of information, searching
    for and getting in touch with other people,
    reviewing catalogs of online stores and ordering
    products by filling out forms

Keyword-Based Search Engines
  • Current Web activities are not particularly well
    supported by software tools
  • Except for keyword-based search engines (e.g.
    Google, AltaVista, Yahoo)
  • The Web would not have been the huge success it
    was, were it not for search engines

Problems of Keyword-Based Search Engines
  • High recall, low precision.
  • Low or no recall
  • Results are highly sensitive to vocabulary
  • Results are single Web pages
  • Human involvement is necessary to interpret and
    combine results
  • Results of Web searches are not readily
    accessible by other software tools

The Key Problem of Today's Web
  • The meaning of Web content is not
    machine-accessible: lack of semantics
  • It is simply difficult to distinguish the meaning
    between these two sentences:
  • "I am a professor of computer science."
  • "I am a professor of computer science,
    you may think. Well, . . ."

The Semantic Web Approach
  • Represent Web content in a form that is more
    easily machine-processable.
  • Use intelligent techniques to take advantage of
    these representations.
  • The Semantic Web will gradually evolve out of the
    existing Web; it is not a competitor to the
    current WWW

Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

The Semantic Web Impact: Knowledge Management
  • Knowledge management concerns itself with
    acquiring, accessing, and maintaining knowledge
    within an organization
  • Key activity of large businesses: internal
    knowledge as an intellectual asset
  • It is particularly important for international,
    geographically dispersed organizations
  • Most information is currently available in a
    weakly structured form (e.g. text, audio, video)

Limitations of Current Knowledge Management
  • Searching information
  • Keyword-based search engines
  • Extracting information
  • human involvement necessary for browsing,
    retrieving, interpreting, combining
  • Maintaining information
  • inconsistencies in terminology, outdated knowledge
  • Viewing information
  • Impossible to define views on Web knowledge

Semantic Web Enabled Knowledge Management
  • Knowledge will be organized in conceptual spaces
    according to its meaning.
  • Automated tools for maintenance and knowledge
    discovery
  • Semantic query answering
  • Query answering over several documents
  • Defining who may view certain parts of
    information (even parts of documents) will be
    possible

The Semantic Web Impact: B2C Electronic Commerce
  • A typical scenario: a user visits one or several
    online shops, browses their offers, selects and
    orders products.
  • Ideally, humans would visit all, or all major,
    online stores, but this is too time-consuming
  • Shopbots are a useful tool

Limitations of Shopbots
  • They rely on wrappers: extensive programming
    required
  • Wrappers need to be reprogrammed when an online
    store changes its outfit
  • Wrappers extract information based on textual
    analysis
  • Error-prone
  • Limited information extracted

Semantic Web Enabled B2C Electronic Commerce
  • Software agents that can interpret the product
    information and the terms of service.
  • Pricing and product information, delivery and
    privacy policies will be interpreted and compared
    to the user requirements.
  • Information about the reputation of shops
  • Sophisticated shopping agents will be able to
    conduct automated negotiations

The Semantic Web Impact: B2B Electronic Commerce
  • Greatest economic promise
  • Currently relies mostly on EDI
  • Isolated technology, understood only by experts
  • Difficult to program and maintain, error-prone
  • Each B2B communication requires separate
    programming
  • Web appears to be perfect infrastructure
  • But B2B not well supported by Web standards

Semantic Web Enabled B2B Electronic Commerce
  • Businesses enter partnerships without much
    overhead
  • Differences in terminology will be resolved using
    standard abstract domain models
  • Data will be interchanged using translation
    rules
  • Auctioning, negotiations, and drafting contracts
    will be carried out automatically (or
    semi-automatically) by software agents

Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

Semantic Web Technologies
  • Explicit Metadata
  • Ontologies
  • Logic and Inference
  • Agents

Explicit Metadata
  • Web content is currently formatted for human
    readers rather than programs
  • HTML is the predominant language in which Web
    pages are written (directly or using tools)
  • Vocabulary describes presentation

An HTML Example
  • <h1>Agilitas Physiotherapy Centre</h1>
  • Welcome to the home page of the Agilitas
    Physiotherapy Centre. Do
  • you feel pain? Have you had an injury? Let our
    staff Lisa Davenport,
  • Kelly Townsend (our lovely secretary) and Steve
    Matthews take care
  • of your body and soul.
  • <h2>Consultation hours</h2>
  • Mon 11am - 7pm<br>
  • Tue 11am - 7pm<br>
  • Wed 3pm - 7pm<br>
  • Thu 11am - 7pm<br>
  • Fri 11am - 3pm<p>
  • But note that we do not offer consultation during
    the weeks of the
  • <a href=". . .">State Of Origin</a> games.

Problems with HTML
  • Humans have no problem with this
  • Machines (software agents) do
  • How can they distinguish the therapists from the
    secretary?
  • How can they determine the exact consultation hours?
  • They would have to follow the link to the State
    Of Origin games to find when they take place.

A Better Representation
  • <company>
  • <treatmentOffered>Physiotherapy</treatmentOffered>
  • <companyName>Agilitas Physiotherapy Centre</companyName>
  • <staff>
  • <therapist>Lisa Davenport</therapist>
  • <therapist>Steve Matthews</therapist>
  • <secretary>Kelly Townsend</secretary>
  • </staff>
  • </company>

Explicit Metadata
  • This representation is far more easily
    processable by machines
  • Metadata: data about data
  • Metadata capture part of the meaning of data
  • Semantic Web does not rely on text-based
    manipulation, but rather on machine-processable
    metadata
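Here is a minimal Python sketch of why this helps. It uses only the standard library's `xml.etree.ElementTree` and assumes the XML representation is available as a string. `COMPANY_XML` and `staff_by_role` are hypothetical names. Because each role is an element name, the program never has to guess from phrases like "our lovely secretary".

```python
import xml.etree.ElementTree as ET

# The explicit-metadata version of the Agilitas page, as a string.
COMPANY_XML = """
<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Agilitas Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>
"""


def staff_by_role(xml_text: str) -> dict[str, list[str]]:
    """Group staff names by the element name that states their role."""
    root = ET.fromstring(xml_text)
    roles: dict[str, list[str]] = {}
    for member in root.find("staff"):  # iterate over the children of <staff>
        roles.setdefault(member.tag, []).append(member.text)
    return roles


if __name__ == "__main__":
    print(staff_by_role(COMPANY_XML))
```

The therapists and the secretary come back as separate lists. Answering the same question from the HTML page would require text analysis.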

Ontologies
  • The term ontology originates from philosophy
  • The study of the nature of existence
  • Different meaning in computer science
  • An ontology is an explicit and formal
    specification of a conceptualization

Typical Components of Ontologies
  • Terms denote important concepts (classes of
    objects) of the domain
  • e.g. professors, staff, students, courses, etc.
  • Relationships between these terms: typically
    class hierarchies
  • a class C is a subclass of another class C' if
    every object in C is also included in C'
  • e.g. all professors are staff members
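Read over a finite set of objects, the subclass definition is just set inclusion. The sketch below assumes each class is given by a made-up set of members; the names `anna` and `kelly` are hypothetical. Ontology languages normally state such relationships as axioms rather than computing them from data, so this only illustrates the definition.

```python
# Hypothetical extensions: the objects that belong to each class.
professors = {"michael", "anna"}
staff_members = {"michael", "anna", "kelly"}


def is_subclass(c: set[str], c_prime: set[str]) -> bool:
    """C is a subclass of C' if every object in C is also included in C'."""
    return c <= c_prime  # set inclusion


if __name__ == "__main__":
    print(is_subclass(professors, staff_members))  # all professors are staff members
    print(is_subclass(staff_members, professors))  # not every staff member is a professor
```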

Further Components of Ontologies
  • Properties
  • e.g. X teaches Y
  • Value restrictions
  • e.g. only faculty members can teach courses
  • Disjointness statements
  • e.g. faculty and general staff are disjoint
  • Logical relationships between objects
  • e.g. every department must include at least 10
    faculty members

Example of a Class Hierarchy

The Role of Ontologies on the Web
  • Ontologies provide a shared understanding of a
    domain: semantic interoperability
  • overcome differences in terminology
  • mappings between ontologies
  • Ontologies are useful for the organization and
    navigation of Web sites

The Role of Ontologies in Web Search
  • Ontologies are useful for improving the accuracy
    of Web searches
  • search engines can look for pages that refer to a
    precise concept in an ontology
  • Web searches can exploit generalization/specialization
    information
  • If a query fails to find any relevant documents,
    the search engine may suggest to the user a more
    general query.
  • If too many answers are retrieved, the search
    engine may suggest to the user some
    specializations.

Web Ontology Languages
  • RDF Schema
  • RDF is a data model for objects and relations
    between them
  • RDF Schema is a vocabulary description language
  • Describes properties and classes of RDF resources
  • Provides semantics for generalization hierarchies
    of properties and classes

Web Ontology Languages (2)
  • OWL
  • A richer ontology language
  • relations between classes
  • e.g., disjointness
  • cardinality
  • e.g. exactly one
  • richer typing of properties
  • characteristics of properties (e.g., symmetry)

Logic and Inference
  • Logic is the discipline that studies the
    principles of reasoning
  • Formal languages for expressing knowledge
  • Well-understood formal semantics
  • Declarative knowledge: we describe what holds
    without caring about how it can be deduced
  • Automated reasoners can deduce (infer)
    conclusions from the given knowledge

An Inference Example
  • prof(X) → faculty(X)
  • faculty(X) → staff(X)
  • prof(michael)
  • We can deduce the following conclusions:
  • faculty(michael)
  • staff(michael)
  • prof(X) → staff(X)
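The deduction can be reproduced with naive forward chaining, which applies the rules repeatedly until no new fact appears. This Python sketch assumes rules of the simple form p(X) → q(X) over unary predicates, encoded as hypothetical `(body, head)` pairs. `forward_chain` derives the facts about michael, and `compose_rules` derives the rule prof(X) → staff(X) by chaining the two given rules.

```python
# Rules body(X) -> head(X) over unary predicates, as (body, head) pairs.
RULES = {("prof", "faculty"), ("faculty", "staff")}
FACTS = {("prof", "michael")}


def forward_chain(facts, rules):
    """Apply the rules to the facts until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for predicate, obj in list(derived):
                if predicate == body and (head, obj) not in derived:
                    derived.add((head, obj))
                    changed = True
    return derived


def compose_rules(rules):
    """Chain a(X) -> b(X) and b(X) -> c(X) into a(X) -> c(X) until a fixpoint."""
    closure = set(rules)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for b2, c in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    return closure


if __name__ == "__main__":
    print(sorted(forward_chain(FACTS, RULES)))
    print(sorted(compose_rules(RULES)))
```

Both functions stop only when a full pass adds nothing. This is the simple fixpoint idea behind rule engines, not an efficient implementation.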

Logic versus Ontologies
  • The previous example involves knowledge typically
    found in ontologies
  • Logic can be used to uncover ontological
    knowledge that is implicitly given
  • It can also help uncover unexpected relationships
    and inconsistencies
  • Logic is more general than ontologies
  • It can also be used by intelligent agents for
    making decisions and selecting courses of action

Tradeoff between Expressive Power and Computational Complexity
  • The more expressive a logic is, the more
    computationally expensive it becomes to draw
    conclusions
  • Drawing certain conclusions may become impossible
    if non-computability barriers are encountered.
  • Our previous examples involved rules "If
    conditions, then conclusion," and only finitely
    many objects
  • This subset of logic is tractable and is
    supported by efficient reasoning tools

Inference and Explanations
  • Explanations: the series of inference steps can
    be retraced
  • They increase users' confidence in Semantic Web
    agents: the "Oh yeah?" button
  • Activities between agents: create or validate
    proofs

Typical Explanation Procedure
  • Facts will typically be traced to some Web
    addresses
  • The trust of the Web address will be verifiable
    by agents
  • Rules may be a part of a shared commerce ontology
    or the policy of the online shop

Software Agents
  • Software agents work autonomously and proactively
  • They evolved out of object-oriented and
    component-based programming
  • A personal agent on the Semantic Web will
  • receive some tasks and preferences from the
    person
  • seek information from Web sources, communicate
    with other agents
  • compare information about user requirements and
    preferences, make certain choices
  • give answers to the user

Intelligent Personal Agents

Semantic Web Agent Technologies
  • Metadata
  • Identify and extract information from Web sources
  • Ontologies
  • Web searches, interpret retrieved information
  • Communicate with other agents
  • Logic
  • Process retrieved information, draw conclusions

Semantic Web Agent Technologies (2)
  • Further technologies (orthogonal to the Semantic
    Web technologies)
  • Agent communication languages
  • Formal representation of beliefs, desires, and
    intentions of agents
  • Creation and maintenance of user models.

Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

A Layered Approach
  • The development of the Semantic Web proceeds in
    steps
  • Each step building a layer on top of another
  • Principles
  • Downward compatibility
  • Upward partial understanding

The Semantic Web Layer Tower

Semantic Web Layers
  • XML layer
  • Syntactic basis
  • RDF layer
  • RDF: basic data model for facts
  • RDF Schema: simple ontology language
  • Ontology layer
  • More expressive languages than RDF Schema
  • Current Web standard: OWL

Semantic Web Layers (2)
  • Logic layer
  • enhance ontology languages further
  • application-specific declarative knowledge
  • Proof layer
  • Proof generation, exchange, validation
  • Trust layer
  • Digital signatures
  • recommendations, rating agencies

Book Outline
  1. Structured Web Documents in XML
  2. Describing Web Resources in RDF
  3. Web Ontology Language OWL
  4. Logic and Inference Rules
  5. Applications
  6. Ontology Engineering
  7. Conclusion and Outlook

Chapter 2: Structured Web Documents in XML
  • Grigoris Antoniou
  • Frank van Harmelen

An HTML Example
  • <h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>
  • <i>by <b>V. Marek</b> and
  • <b>M. Truszczynski</b></i><br>
  • Springer 1993<br>
  • ISBN 0387976892

The Same Example in XML
  • <book>
  • <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  • <author>V. Marek</author>
  • <author>M. Truszczynski</author>
  • <publisher>Springer</publisher>
  • <year>1993</year>
  • <ISBN>0387976892</ISBN>
  • </book>

HTML versus XML: Similarities
  • Both use tags (e.g. <h2> and </year>)
  • Tags may be nested (tags within tags)
  • Human users can read and interpret both HTML and
    XML representations quite easily
  • But how about machines?

Problems with Automated Interpretation of HTML
  • An intelligent agent trying to retrieve the names
  • of the authors of the book
  • Authors' names could appear immediately after the
    title
  • or immediately after the word "by"
  • Are there two authors?
  • Or just one, called "V. Marek and M.
    Truszczynski"?

HTML vs XML: Structural Information
  • HTML documents do not contain structural
    information: pieces of the document and their
    relationships
  • XML is more easily accessible to machines because
  • Every piece of information is described.
  • Relations are also defined through the nesting
    structure
  • E.g., the <author> tags appear within the <book>
    tags, so they describe properties of the
    particular book.
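Because nesting carries the relationship, extracting the authors becomes a lookup rather than a guess. Here is a minimal Python sketch with `xml.etree.ElementTree`. It assumes the XML version of the book is given as a string; `BOOK_XML` and `authors_of` are hypothetical names.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""


def authors_of(xml_text: str) -> list[str]:
    """Return the text of every <author> child of the <book> root element."""
    book = ET.fromstring(xml_text)
    # Each <author> is nested in <book>, so it describes this particular book.
    return [author.text for author in book.findall("author")]


if __name__ == "__main__":
    print(authors_of(BOOK_XML))
```

The two separate `<author>` elements answer the "one author or two?" question directly.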

HTML vs XML: Structural Information (2)
  • A machine processing the XML document would be
    able to deduce that
  • the author element refers to the enclosing book
    element
  • rather than by proximity considerations
  • XML allows the definition of constraints on
    values
  • E.g. a year must be a number of four digits

HTML vs XML: Formatting
  • The HTML representation provides more than the
    XML representation
  • The formatting of the document is also described
  • The main use of an HTML document is to display
    information: it must define formatting
  • XML: separation of content from display
  • same information can be displayed in different
    ways

HTML vs XML: Another Example
  • In HTML
  • <h2>Relationship matter-energy</h2>
  • <i> E = M × c2 </i>
  • In XML
  • <equation>
  • <meaning>Relationship matter
  • energy</meaning>
  • <leftside> E </leftside>
  • <rightside> M × c2 </rightside>
  • </equation>

HTML vs XML: Different Use of Tags
  • In both HTML docs: same tags
  • In XML: completely different tags
  • HTML tags define display: color, lists
  • XML tags are not fixed: user-definable tags
  • XML is a meta markup language: a language for
    defining markup languages

XML Vocabularies
  • Web applications must agree on common
    vocabularies to communicate and collaborate
  • Communities and business sectors are defining
    their specialized vocabularies
  • mathematics (MathML)
  • bioinformatics (BSML)
  • human resources (HRML)

Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

The XML Language
  • An XML document consists of
  • a prolog
  • a number of elements
  • an optional epilog (not discussed)

Prolog of an XML Document
  • The prolog consists of
  • an XML declaration and
  • an optional reference to external structuring
    documents
  • <?xml version="1.0" encoding="UTF-16"?>
  • <!DOCTYPE book SYSTEM "book.dtd">

XML Elements
  • The things the XML document talks about
  • E.g. books, authors, publishers
  • An element consists of
  • an opening tag
  • the content
  • a closing tag
  • <lecturer>David Billington</lecturer>

XML Elements (2)
  • Tag names can be chosen almost freely.
  • The first character must be a letter, an
    underscore, or a colon
  • No name may begin with the string "xml" in any
    combination of cases
  • E.g. "Xml", "xML"

Content of XML Elements
  • Content may be text, or other elements, or
    nothing
  • <lecturer>
  • <name>David Billington</name>
  • <phone> +61 - 7 - 3875 507 </phone>
  • </lecturer>
  • If there is no content, then the element is
    called empty; it is abbreviated as follows:
  • <lecturer/> for <lecturer></lecturer>

XML Attributes
  • An empty element is not necessarily meaningless
  • It may have some properties in terms of
    attributes
  • An attribute is a name-value pair inside the
    opening tag of an element
  • <lecturer name="David Billington" phone="+61 - 7
    - 3875 507"/>

XML Attributes: An Example
  • <order orderNo="23456" customer="John Smith"
  • date="October 15, 2002">
  • <item itemNo="a528" quantity="1"/>
  • <item itemNo="c817" quantity="3"/>
  • </order>

The Same Example without Attributes
  • <order>
  • <orderNo>23456</orderNo>
  • <customer>John Smith</customer>
  • <date>October 15, 2002</date>
  • <item>
  • <itemNo>a528</itemNo>
  • <quantity>1</quantity>
  • </item>
  • <item>
  • <itemNo>c817</itemNo>
  • <quantity>3</quantity>
  • </item>
  • </order>

XML Elements vs Attributes
  • Attributes can be replaced by elements
  • When to use elements and when attributes is a
    matter of taste
  • But attributes cannot be nested

Further Components of XML Docs
  • Comments
  • A piece of text that is to be ignored by parser
  • <!-- This is a comment -->
  • Processing Instructions (PIs)
  • Define procedural attachments
  • <?stylesheet type="text/css" href="mystyle.css"?>

Well-Formed XML Documents
  • Syntactically correct documents
  • Some syntactic rules
  • Only one outermost element (called root element)
  • Each element contains an opening and a
    corresponding closing tag
  • Tags may not overlap
  • <author><name>Lee Hong</author></name>
  • Attributes within an element have unique names
  • Element and tag names must be permissible
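An XML parser enforces these rules. In this Python sketch, `is_well_formed` is a hypothetical helper that treats a document as well-formed when `xml.etree.ElementTree` parses it without raising `ParseError`. The parser covers the rules on this slide, such as a single root element, matching and non-overlapping tags, and unique attribute names. It checks syntax only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<author><name>Lee Hong</name></author>"))  # properly nested
    print(is_well_formed("<author><name>Lee Hong</author></name>"))  # overlapping tags
```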

The Tree Model of XML Documents: An Example
  • <email>
  • <head>
  • <from name="Michael Maher"
  • address=""/>
  • <to name="Grigoris Antoniou"
  • address=""/>
  • <subject>Where is your draft?</subject>
  • </head>
  • <body>
  • Grigoris, where is the draft of the paper you
    promised me
  • last week?
  • </body>
  • </email>

The Tree Model of XML Documents: An Example (2)

The Tree Model of XML Docs
  • The tree representation of an XML document is an
    ordered labeled tree
  • There is exactly one root
  • There are no cycles
  • Each non-root node has exactly one parent
  • Each node has a label.
  • The order of elements is important
  • but the order of attributes is not important
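The tree model maps directly onto `xml.etree.ElementTree`. Each element is a node labeled with its tag, children keep their document order, and attributes are stored in a dictionary, so their order plays no role. The Python sketch below uses the email example with the addresses left empty, as in the source. `labels_in_order` and `same_attributes` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# The email example; the addresses are empty, as in the source.
EMAIL_XML = """
<email>
  <head>
    <from name="Michael Maher" address=""/>
    <to name="Grigoris Antoniou" address=""/>
    <subject>Where is your draft?</subject>
  </head>
  <body>
    Grigoris, where is the draft of the paper you promised me last week?
  </body>
</email>
"""

ROOT = ET.fromstring(EMAIL_XML)  # the single root element <email>


def labels_in_order(node: ET.Element) -> list[str]:
    """Pre-order list of element labels; the order of siblings is preserved."""
    labels = [node.tag]
    for child in node:  # children are returned in document order
        labels.extend(labels_in_order(child))
    return labels


def same_attributes(a: str, b: str) -> bool:
    """Compare the attributes of two one-element snippets, ignoring their order."""
    return ET.fromstring(a).attrib == ET.fromstring(b).attrib


if __name__ == "__main__":
    print(labels_in_order(ROOT))
    print(same_attributes('<to name="G" address="x"/>', '<to address="x" name="G"/>'))
```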

Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

Structuring XML Documents
  • Define all the element and attribute names that
    may be used
  • Define the structure
  • what values an attribute may take
  • which elements may or must occur within other
    elements, etc.
  • If such structuring information exists, the
    document can be validated

Structuring XML Documents (2)
  • An XML document is valid if
  • it is well-formed
  • respects the structuring information it uses
  • There are two ways of defining the structure of
    XML documents
  • DTDs (the older and more restricted way)
  • XML Schema (offers extended possibilities)

DTD: Element Type Definition
  • <lecturer>
  • <name>David Billington</name>
  • <phone> +61 - 7 - 3875 507 </phone>
  • </lecturer>
  • DTD for above element (and all lecturer
    elements):
  • <!ELEMENT lecturer (name,phone)>
  • <!ELEMENT name (#PCDATA)>
  • <!ELEMENT phone (#PCDATA)>

The Meaning of the DTD
  • The element types lecturer, name, and phone may
    be used in the document
  • A lecturer element contains a name element and a
    phone element, in that order (sequence)
  • A name element and a phone element may have any
    content
  • In DTDs, #PCDATA is the only atomic type for
    elements

DTD: Disjunction in Element Type Definitions
  • We express that a lecturer element contains
    either a name element or a phone element as
    follows:
  • <!ELEMENT lecturer (name|phone)>
  • A lecturer element contains a name element and a
    phone element in any order.
  • <!ELEMENT lecturer ((name,phone)|(phone,name))>

Example of an XML Element
  • <order orderNo="23456"
  • customer="John Smith"
  • date="October 15, 2002">
  • <item itemNo="a528" quantity="1"/>
  • <item itemNo="c817" quantity="3"/>
  • </order>

The Corresponding DTD
  • <!ELEMENT order (item+)>
  • <!ATTLIST order orderNo ID #REQUIRED
  • customer CDATA #REQUIRED
  • date CDATA #REQUIRED>
  • <!ELEMENT item EMPTY>
  • <!ATTLIST item itemNo ID #REQUIRED
  • quantity CDATA #REQUIRED
  • comments CDATA #IMPLIED>

Comments on the DTD
  • The item element type is defined to be empty
  • + (after item) is a cardinality operator
  • ?: appears zero times or once
  • *: appears zero or more times
  • +: appears one or more times
  • No cardinality operator means exactly once

Comments on the DTD (2)
  • In addition to defining elements, we define
    attributes
  • This is done in an attribute list containing:
  • Name of the element type to which the list
    applies
  • A list of triplets of attribute name, attribute
    type, and value type
  • Attribute name: a name that may be used in an XML
    document using a DTD

DTD Attribute Types
  • Similar to predefined data types, but limited
  • The most important types are
  • CDATA, a string (sequence of characters)
  • ID, a name that is unique across the entire XML
    document
  • IDREF, a reference to another element with an ID
    attribute carrying the same value as the IDREF
    attribute
  • IDREFS, a series of IDREFs
  • (v1 | . . . | vn), an enumeration of all possible
    values
  • Limitations: no dates, number ranges etc.

DTD Attribute Value Types
  • #REQUIRED
  • Attribute must appear in every occurrence of the
    element type in the XML document
  • #IMPLIED
  • The appearance of the attribute is optional
  • #FIXED "value"
  • Every element must have this attribute
  • "value"
  • This specifies the default value for the
    attribute

Referencing with IDREF and IDREFS
  • <!ELEMENT family (person*)>
  • <!ELEMENT person (name)>
  • <!ELEMENT name (#PCDATA)>
  • <!ATTLIST person id ID #REQUIRED
  • mother IDREF #IMPLIED
  • father IDREF #IMPLIED
  • children IDREFS #IMPLIED>

An XML Document Respecting the DTD
  • <family>
  • <person id="bob" mother="mary" father="peter">
  • <name>Bob Marley</name>
  • </person>
  • <person id="bridget" mother="mary">
  • <name>Bridget Jones</name>
  • </person>
  • <person id="mary" children="bob bridget">
  • <name>Mary Poppins</name>
  • </person>
  • <person id="peter" children="bob">
  • <name>Peter Marley</name>
  • </person>
  • </family>
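A validating parser checks that every IDREF and IDREFS value matches some ID. Python's `xml.etree.ElementTree` does not read DTDs, so this sketch restates by hand which attributes are IDREF or IDREFS, mirroring the DTD above, and reports dangling references. `dangling_references` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """
<family>
  <person id="bob" mother="mary" father="peter"><name>Bob Marley</name></person>
  <person id="bridget" mother="mary"><name>Bridget Jones</name></person>
  <person id="mary" children="bob bridget"><name>Mary Poppins</name></person>
  <person id="peter" children="bob"><name>Peter Marley</name></person>
</family>
"""

# ElementTree does not read the DTD, so the attribute types are restated here.
IDREF_ATTRS = ("mother", "father")  # declared as IDREF
IDREFS_ATTRS = ("children",)        # declared as IDREFS (space-separated)


def dangling_references(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (person id, attribute, value) for each reference without a matching ID."""
    root = ET.fromstring(xml_text)
    ids = {person.get("id") for person in root.iter("person")}
    problems = []
    for person in root.iter("person"):
        refs = [(attr, person.get(attr)) for attr in IDREF_ATTRS if person.get(attr)]
        for attr in IDREFS_ATTRS:
            refs.extend((attr, value) for value in person.get(attr, "").split())
        problems.extend((person.get("id"), attr, value)
                        for attr, value in refs if value not in ids)
    return problems


if __name__ == "__main__":
    print(dangling_references(FAMILY_XML))
```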

A DTD for an Email Element
  • <!ELEMENT email (head,body)>
  • <!ELEMENT head (from,to+,cc*,subject)>
  • <!ELEMENT from EMPTY>
  • <!ATTLIST from name CDATA #IMPLIED
  • address CDATA #REQUIRED>
  • <!ELEMENT to EMPTY>
  • <!ATTLIST to name CDATA #IMPLIED
  • address CDATA #REQUIRED>

A DTD for an Email Element (2)
  • <!ELEMENT cc EMPTY>
  • <!ATTLIST cc name CDATA #IMPLIED
  • address CDATA #REQUIRED>
  • <!ELEMENT subject (#PCDATA)>
  • <!ELEMENT body (text,attachment*)>
  • <!ELEMENT text (#PCDATA)>
  • <!ELEMENT attachment EMPTY>
  • <!ATTLIST attachment
  • encoding (mime|binhex) "mime">

Interesting Parts of the DTD
  • A head element contains (in that order)
  • a from element
  • at least one to element
  • zero or more cc elements
  • a subject element
  • In from, to, and cc elements
  • the name attribute is not required
  • the address attribute is always required

Interesting Parts of the DTD (2)
  • A body element contains
  • a text element
  • possibly followed by a number of attachment
    elements
  • The encoding attribute of an attachment element
    must have either the value mime or binhex
  • mime is the default value

Remarks on DTDs
  • A DTD can be interpreted as an Extended
    Backus-Naur Form (EBNF)
  • lt!ELEMENT email (head,body)gt
  • is equivalent to email ::= head body
  • Recursive definitions possible in DTDs
  • <!ELEMENT bintree
  • ((bintree,root,bintree)|emptytree)>
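The EBNF reading works because a DTD content model is a regular expression over the names of an element's direct children. This Python sketch translates the `head` model `(from,to+,cc*,subject)` by hand into a pattern for the standard `re` module. The helper `matches_model` is hypothetical; a real validator would derive such patterns from the DTD itself.

```python
import re

# The content model of head in the email DTD, (from,to+,cc*,subject),
# rewritten over child element names. Each name is followed by one space
# so that adjacent names cannot run together.
HEAD_MODEL = re.compile(r"(from )(to )+(cc )*(subject )")


def matches_model(model: re.Pattern, child_tags: list[str]) -> bool:
    """True if the sequence of child element names fits the content model."""
    return model.fullmatch("".join(tag + " " for tag in child_tags)) is not None


if __name__ == "__main__":
    print(matches_model(HEAD_MODEL, ["from", "to", "to", "cc", "subject"]))
    print(matches_model(HEAD_MODEL, ["from", "subject"]))  # no "to" element
```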

Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

XML Schema
  • Significantly richer language for defining the
    structure of XML documents
  • Its syntax is based on XML itself
  • not necessary to write separate tools
  • Reuse and refinement of schemas
  • Expand or delete already existent schemas
  • Sophisticated set of data types, compared to DTDs
    (which only support strings)

XML Schema (2)
  • An XML schema is an element with an opening tag
    like
  • <schema xmlns="http//"
  • version="1.0">
  • Structure of schema elements
  • Element and attribute types using data types

Element Types
  • <element name="email"/>
  • <element name="head" minOccurs="1"
    maxOccurs="1"/>
  • <element name="to" minOccurs="1"/>
  • Cardinality constraints:
  • minOccurs="x" (default value 1)
  • maxOccurs="x" (default value 1)
  • Generalizations of *, ?, + offered by DTDs
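The DTD operators correspond to particular (minOccurs, maxOccurs) pairs, which is why they can be seen as special cases. Here is a small Python sketch in which `math.inf` stands for "unbounded"; `DTD_OPERATORS` and `occurrences_ok` are hypothetical names.

```python
import math

# DTD cardinality operators as (minOccurs, maxOccurs) pairs;
# math.inf stands in for maxOccurs="unbounded".
DTD_OPERATORS = {
    "": (1, 1),          # no operator: exactly once
    "?": (0, 1),         # zero times or once
    "*": (0, math.inf),  # zero or more times
    "+": (1, math.inf),  # one or more times
}


def occurrences_ok(count: int, min_occurs: float = 1, max_occurs: float = 1) -> bool:
    """Check how often a child element occurs against minOccurs/maxOccurs."""
    return min_occurs <= count <= max_occurs


if __name__ == "__main__":
    print(occurrences_ok(3, *DTD_OPERATORS["+"]))
    print(occurrences_ok(2, *DTD_OPERATORS["?"]))
```

A range such as minOccurs="2" maxOccurs="4" corresponds to no single DTD operator. With these helpers it is simply `occurrences_ok(count, 2, 4)`.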

Attribute Types
  • <attribute name="id" type="ID" use="required"/>
  • <attribute name="speaks" type="Language"
  • use="default" value="en"/>
  • Existence: use="x", where x may be optional or
    required
  • Default value: use="x" value="...", where x may
    be default or fixed

Data Types
  • There is a variety of built-in data types
  • Numerical data types: integer, short etc.
  • String types: string, ID, IDREF, CDATA etc.
  • Date and time data types: time, Month etc.
  • There are also user-defined data types
  • simple data types, which cannot use elements or
    attributes
  • complex data types, which can use these

Data Types (2)
  • Complex data types are defined from already
    existing data types by defining some attributes
    (if any) and using
  • sequence, a sequence of existing data type
    elements (order is important)
  • all, a collection of elements that must appear
    (order is not important)
  • choice, a collection of elements, of which one
    will be chosen

A Data Type Example
  • <complexType name="lecturerType">
  • <sequence>
  • <element name="firstname" type="string"
  • minOccurs="0" maxOccurs="unbounded"/>
  • <element name="lastname" type="string"/>
  • </sequence>
  • <attribute name="title" type="string"
    use="optional"/>
  • </complexType>

Data Type Extension
  • Already existing data types can be extended by
    new elements or attributes. Example
  • <complexType name="extendedLecturerType">
  • <extension base="lecturerType">
  • <sequence>
  • <element name="email" type="string"
  • minOccurs="0" maxOccurs="1"/>
  • </sequence>
  • <attribute name="rank" type="string"
    use="required"/>
  • </extension>
  • </complexType>

Resulting Data Type
  • <complexType name="extendedLecturerType">
  • <sequence>
  • <element name="firstname" type="string"
  • minOccurs="0" maxOccurs="unbounded"/>
  • <element name="lastname" type="string"/>
  • <element name="email" type="string"
  • minOccurs="0" maxOccurs="1"/>
  • </sequence>
  • <attribute name="title" type="string"
    use="optional"/>
  • <attribute name="rank" type="string"
    use="required"/>
  • </complexType>

Data Type Extension (2)
  • A hierarchical relationship exists between the
    original and the extended type
  • Instances of the extended type are also instances
    of the original type
  • They may contain additional information, but
    neither less information, nor information of the
    wrong type

Data Type Restriction
  • An existing data type may be restricted by adding
    constraints on certain values
  • Restriction is not the opposite of extension
  • Restriction is not achieved by deleting elements
    or attributes
  • The following hierarchical relationship still
    holds:
  • Instances of the restricted type are also
    instances of the original type
  • They satisfy at least the constraints of the
    original type

Example of Data Type Restriction
  • <complexType name="restrictedLecturerType">
  • <restriction base="lecturerType">
  • <sequence>
  • <element name="firstname" type="string"
  • minOccurs="1" maxOccurs="2"/>
  • </sequence>
  • <attribute name="title" type="string"
  • use="required"/>
  • </restriction>
  • </complexType>

Restriction of Simple Data Types
  • <simpleType name="dayOfMonth">
  • <restriction base="integer">
  • <minInclusive value="1"/>
  • <maxInclusive value="31"/>
  • </restriction>
  • </simpleType>

Data Type Restriction Enumeration
  • <simpleType name="dayOfWeek">
  • <restriction base="string">
  • <enumeration value="Mon"/>
  • <enumeration value="Tue"/>
  • <enumeration value="Wed"/>
  • <enumeration value="Thu"/>
  • <enumeration value="Fri"/>
  • <enumeration value="Sat"/>
  • <enumeration value="Sun"/>
  • </restriction>
  • </simpleType>
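The two simple-type restrictions (dayOfMonth and dayOfWeek) amount to plain value checks. This Python sketch mirrors the `minInclusive`/`maxInclusive` facets of `dayOfMonth` and the enumeration of `dayOfWeek`. The function names are hypothetical. The lexical check for integers (an optional sign followed by digits, with surrounding whitespace ignored) is a simplification of the XML Schema rules.

```python
import re

# Simplified lexical form for integers: optional sign, then digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")

DAYS_OF_WEEK = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}


def is_day_of_month(lexical: str) -> bool:
    """dayOfMonth: an integer restricted by minInclusive 1 and maxInclusive 31."""
    text = lexical.strip()
    if not _INTEGER.fullmatch(text):
        return False
    return 1 <= int(text) <= 31


def is_day_of_week(lexical: str) -> bool:
    """dayOfWeek: a string restricted to an enumeration of seven values."""
    return lexical in DAYS_OF_WEEK


if __name__ == "__main__":
    print([d for d in ("0", "15", "31", "32", "x") if is_day_of_month(d)])
    print(is_day_of_week("Wed"), is_day_of_week("wed"))
```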

XML Schema: The Email Example
  • <element name="email" type="emailType"/>
  • <complexType name="emailType">
  • <sequence>
  • <element name="head" type="headType"/>
  • <element name="body" type="bodyType"/>
  • </sequence>
  • </complexType>

XML Schema: The Email Example (2)
  • <complexType name="headType">
  • <sequence>
  • <element name="from" type="nameAddress"/>
  • <element name="to" type="nameAddress"
  • minOccurs="1" maxOccurs="unbounded"/>
  • <element name="cc" type="nameAddress"
  • minOccurs="0" maxOccurs="unbounded"/>
  • <element name="subject" type="string"/>
  • </sequence>
  • </complexType>

XML Schema: The Email Example (3)
  • <complexType name="nameAddress">
  • <attribute name="name" type="string"
    use="optional"/>
  • <attribute name="address" type="string"
    use="required"/>
  • </complexType>
  • Similar for bodyType

Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

Namespaces
  • An XML document may use more than one DTD or
    schema
  • Since each structuring document was developed
    independently, name clashes may appear
  • The solution is to use a different prefix for
    each DTD or schema
  • prefix:name

An Example
  • <vu:instructors xmlns:vu="http//"
  • xmlns:gu="http//"
  • xmlns:uky="http//">
  • <uky:faculty uky:title="assistant professor"
  • uky:name="John Smith"
  • uky:department="Computer Science"/>
  • <gu:academicStaff gu:title="lecturer"
  • gu:name="Mate Jones"
  • gu:school="Information Technology"/>
  • </vu:instructors>

Namespace Declarations
  • Namespaces are declared within an element and can
    be used in that element and any of its children
    (elements and attributes)
  • A namespace declaration has the form
  • xmlns:prefix="location"
  • location is a URI naming the namespace, typically
    the address of the DTD or schema
  • If a prefix is not specified (xmlns="location"),
    then location becomes the default namespace for
    unprefixed element names
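To see what a namespace-aware parser does with such declarations, here is a minimal sketch using Python's standard `xml.etree.ElementTree` on the instructors example. The URIs `http://example.org/...` are hypothetical placeholders, since the slide's own URIs are elided. The parser replaces every prefix with the URI bound to it, so names come back in the form `{uri}localname`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for the elided ones on the slide.
DOC = """
<vu:instructors xmlns:vu="http://example.org/vu"
                xmlns:gu="http://example.org/gu"
                xmlns:uky="http://example.org/uky">
  <uky:faculty uky:title="assistant professor"
               uky:name="John Smith"
               uky:department="Computer Science"/>
  <gu:academicStaff gu:title="lecturer"
                    gu:name="Mate Jones"
                    gu:school="Information Technology"/>
</vu:instructors>
"""

root = ET.fromstring(DOC)

# Prefixes are resolved to their URIs: names become "{uri}localname".
print(root.tag)  # {http://example.org/vu}instructors

# A prefix-to-URI mapping lets us query by prefix again.
ns = {"uky": "http://example.org/uky", "gu": "http://example.org/gu"}
faculty = root.find("uky:faculty", ns)
staff = root.find("gu:academicStaff", ns)

# Both elements carry a "title" attribute, but the names do not clash
# because each is qualified by a different namespace URI.
print(faculty.get("{http://example.org/uky}title"))  # assistant professor
print(staff.get("{http://example.org/gu}title"))     # lecturer
```

Because the two `title` attributes are stored under different expanded names, an application can tell them apart even though both vocabularies use the same local name.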

Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

Addressing and Querying XML Documents
  • In relational databases, parts of a database can
    be selected and retrieved using SQL
  • The same is necessary for XML documents
  • Query languages: XQuery, XQL, XML-QL
  • The central concept of XML query languages is a
    path expression
  • It specifies how a node, or a set of nodes, in the
    tree representation of the XML document can be
    reached

XPath
  • XPath is the core of XML query languages
  • A language for addressing parts of an XML document
  • It operates on the tree data model of XML
  • It has a non-XML syntax

Types of Path Expressions
  • Absolute (starting at the root of the tree)
  • Syntactically, they begin with the symbol /
  • / refers to the root of the document (situated
    one level above the root element of the document)
  • Relative to a context node

An XML Example
  • <library location="Bremen">
  • <author name="Henry Wise">
  • <book title="Artificial Intelligence"/>
  • <book title="Modern Web Services"/>
  • <book title="Theory of Computation"/>
  • </author>
  • <author name="William Smart">
  • <book title="Artificial Intelligence"/>
  • </author>
  • <author name="Cynthia Singleton">
  • <book title="The Semantic Web"/>
  • <book title="Browser Technology Revised"/>
  • </author>
  • </library>

Tree Representation
Examples of Path Expressions in XPath
  • Address all author elements
  • /library/author
  • Addresses all author elements that are children
    of the library element node, which resides
    immediately below the root
  • /t1/.../tn, where each t(i+1) is a child node of
    t(i), is a path through the tree representation

Examples of Path Expressions in XPath (2)
  • Address all author elements
  • //author
  • Here // says that we should consider all elements
    in the document and check whether they are of
    type author
  • This path expression addresses all author
    elements anywhere in the document
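Queries 1 and 2 can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath and evaluates paths relative to an element rather than to the document node. This is a sketch under that assumption: the absolute path /library/author becomes "check that the root element is library, then take its author children", and //author becomes `.//author`. The two coincide here because the root element is not itself an author.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY)

# /library/author: start from the root element (which must be "library")
# and take its author children.
assert root.tag == "library"
q1 = root.findall("author")

# //author: ".//" searches descendants at any depth.
q2 = root.findall(".//author")

print([a.get("name") for a in q1])
# ['Henry Wise', 'William Smart', 'Cynthia Singleton']
print(len(q2))  # 3
```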

Examples of Path Expressions in XPath (3)
  • Address the location attribute nodes within
    library element nodes
  • /library/@location
  • The symbol @ is used to denote attribute nodes

Examples of Path Expressions in XPath (4)
  • Address all title attribute nodes within book
    elements anywhere in the document that have the
    value "Artificial Intelligence"
  • //book/@title[.="Artificial Intelligence"]

Examples of Path Expressions in XPath (5)
  • Address all books with title "Artificial
    Intelligence"
  • //book[@title="Artificial Intelligence"]
  • The test within square brackets is a filter expression
  • It restricts the set of addressed nodes
  • Difference from query 4:
  • Query 5 addresses book elements whose title
    satisfies a certain condition
  • Query 4 collects title attribute nodes of book
    elements
Tree Representation of Query 4
Tree Representation of Query 5
Examples of Path Expressions in XPath (6)
  • Address the first author element node in the XML
    document
  • //author[1]
  • Address the last book element within the first
    author element node in the document
  • //author[1]/book[last()]
  • Address all book element nodes without a title
    attribute
  • //book[not(@title)]
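Python's `xml.etree.ElementTree` supports the positional predicates `[1]` and `[last()]`, with the same per-parent meaning as XPath: `//author[1]` selects each author that is the first author child of its parent, which in this document is also the first author overall. ElementTree has no `not()`, so the last query is emulated with a plain filter, an assumption of this sketch rather than XPath syntax.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY)

# //author[1]
first_author = root.findall(".//author[1]")

# //author[1]/book[last()]
last_book = root.findall(".//author[1]/book[last()]")

# //book[not(@title)]: no not() in ElementTree, so filter in Python.
untitled = [b for b in root.iter("book") if b.get("title") is None]

print([a.get("name") for a in first_author])  # ['Henry Wise']
print([b.get("title") for b in last_book])    # ['Theory of Computation']
print(untitled)                               # [] (every book has a title)
```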

General Form of Path Expressions
  • A path expression consists of a series of steps,
    separated by slashes
  • A step consists of
  • An axis specifier,
  • A node test, and
  • An optional predicate

General Form of Path Expressions (2)
  • An axis specifier determines the tree
    relationship between the nodes to be addressed
    and the context node
  • E.g. parent, ancestor, child (the default),
    sibling, attribute node
  • // is such an axis specifier: descendant-or-self

General Form of Path Expressions (3)
  • A node test specifies which nodes to address
  • The most common node tests are element names
  • E.g., * addresses all element nodes
  • comment() addresses all comment nodes

General Form of Path Expressions (4)
  • Predicates (or filter expressions) are optional
    and are used to refine the set of addressed nodes
  • E.g., the expression [1] selects the first node
  • [position()=last()] selects the last node
  • [position() mod 2 = 0] selects the even nodes
  • XPath has a more complicated full syntax.
  • We have only presented the abbreviated syntax
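A predicate is tested against each node's 1-based position in the current node list, with last() equal to the list's length. A minimal sketch of that rule, where the helper `filter_by_position` is hypothetical (not part of any XPath library) and Henry Wise's book titles stand in for a node list:

```python
def filter_by_position(nodes, predicate):
    """Keep the nodes whose 1-based position satisfies
    predicate(position, last), mimicking an XPath filter expression."""
    last = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if predicate(pos, last)]


books = ["Artificial Intelligence", "Modern Web Services", "Theory of Computation"]

first = filter_by_position(books, lambda pos, last: pos == 1)     # [1]
final = filter_by_position(books, lambda pos, last: pos == last)  # [position()=last()]
even = filter_by_position(books, lambda pos, last: pos % 2 == 0)  # [position() mod 2 = 0]

print(first)  # ['Artificial Intelligence']
print(final)  # ['Theory of Computation']
print(even)   # ['Modern Web Services']
```

Note that [1] is shorthand for [position()=1]; positions start at 1, not 0.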

Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

Displaying XML Documents
  • <author>
  • <name>Grigoris Antoniou</name>
  • <affiliation>University of Bremen</affiliation>
  • <email>ga@tzi.de</email>
  • </author>
  • may be displayed in different ways, e.g. in two
    different visual styles, each showing
  • Grigoris Antoniou
  • University of Bremen

Style Sheets
  • Style sheets can be written in various languages
  • E.g. CSS2 (cascading style sheets level 2)
  • XSL (extensible stylesheet language)
  • XSL includes
  • a transformation language (XSLT)
  • a formatting language
  • Both are XML applications

XSL Transformations (XSLT)
  • XSLT specifies rules with which an input XML
    document is transformed to
  • another XML document
  • an HTML document
  • plain text
  • The output document may use the same DTD or
    schema, or a completely different vocabulary
  • XSLT can be used independently of the formatting
    language

XSLT (2)
  • Move data and metadata from one XML
    representation to another
  • XSLT is chosen when applications that use
    different DTDs or schemas need to communicate
  • XSLT can be used for machine processing of
    content without any regard to displaying the
    information for people to read.
  • In the following, we use XSLT only to display XML
    documents

XSLT Transformation into HTML
  • <xsl:template match="/author">
  • <html>
  • <head><title>An author</title></head>
  • <body bgcolor="white">
  • <b><xsl:value-of select="name"/></b><br/>
  • <xsl:value-of select="affiliation"/><br/>
  • <i><xsl:value-of select="email"/></i>
  • </body>
  • </html>
  • </xsl:template>

Style Sheet Output
  • <html>
  • <head><title>An author</title></head>
  • <body bgcolor="white">
  • <b>Grigoris Antoniou</b><br>
  • University of Bremen<br>
  • <i>ga@tzi.de</i>
  • </body>
  • </html>

Observations About XSLT
  • XSLT documents are XML documents
  • XSLT resides on top of XML
  • The XSLT document defines a template
  • In this case an HTML document, with some
    placeholders for content to be inserted
  • xsl:value-of retrieves the value of an element
    and copies it into the output document
  • It places some content into the template

A Template
  • <html>
  • <head><title>An author</title></head>
  • <body bgcolor="white">
  • ...<br>
  • </body>
  • </html>

Auxiliary Templates
  • We have an XML document with details of several
    authors
  • It is a waste of effort to treat each author
    element separately
  • In such cases, a special template is defined for
    author elements, which is used by the main
    template

Example of an Auxiliary Template
  • <authors>
  • <author>
  • <name>Grigoris Antoniou</name>
  • <affiliation>University of Bremen</affiliation>
  • <email>ga@tzi.de</email>
  • </author>
  • <author>
  • <name>David Billington</name>
  • <affiliation>Griffith University</affiliation>
  • </author>
  • </authors>

Example of an Auxiliary Template (2)
  • <xsl:template match="/">
  • <html>
  • <head><title>Authors</title></head>
  • <body bgcolor="white">
  • <xsl:apply-templates select="authors"/>
  • <!-- Apply templates for AUTHORS children -->
  • </body>
  • </html>
  • </xsl:template>

Example of an Auxiliary Template (3)
  • <xsl:template match="authors">
  • <xsl:apply-templates select="author"/>
  • </xsl:template>
  • <xsl:template match="author">
  • <h2><xsl:value-of select="name"/></h2>
  • Affiliation: <xsl:value-of select="affiliation"/><br/>
  • Email: <xsl:value-of select="email"/>
  • <p/>
  • </xsl:template>

Multiple Authors Output
  • <html>
  • <head><title>Authors</title></head>
  • <body bgcolor="white">
  • <h2>Grigoris Antoniou</h2>
  • Affiliation: University of Bremen<br>
  • Email: ga@tzi.de
  • <p>
  • <h2>David Billington</h2>
  • Affiliation: Griffith University<br>
  • Email:
  • <p>
  • </body>
  • </html>

Explanation of the Example
  • The xsl:apply-templates element causes all children
    of the context node to be matched against the
    selected path expression
  • E.g., if the current template applies to /, then
    the element xsl:apply-templates applies to the
    root element
  • I.e., the authors element (/ is located above the
    root element)
  • If the current context node is the authors
    element, then the element xsl:apply-templates
    select="author" causes the template for the
    author elements to be applied to all author
    children of the authors element
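The recursive control flow of xsl:apply-templates can be mimicked in a few lines of Python. This is a sketch, not an XSLT processor: `TEMPLATES` is a hypothetical registry mapping tag names to functions, `value_of` and `apply_templates` are rough analogues of the XSLT elements, and a synthetic `#document` element plays the role of the node /, which sits above the root element.

```python
import xml.etree.ElementTree as ET

AUTHORS = """<authors>
  <author>
    <name>Grigoris Antoniou</name>
    <affiliation>University of Bremen</affiliation>
    <email>ga@tzi.de</email>
  </author>
  <author>
    <name>David Billington</name>
    <affiliation>Griffith University</affiliation>
  </author>
</authors>"""


def value_of(node, path):
    """Rough analogue of xsl:value-of: text of the first match, or ''."""
    found = node.find(path)
    return (found.text or "") if found is not None else ""


def apply_templates(node, select):
    """Rough analogue of xsl:apply-templates select="...": run the template
    registered for each selected child, in document order, and concatenate."""
    return "".join(TEMPLATES[child.tag](child) for child in node.findall(select))


# One "template" per element type, keyed by tag name (hypothetical registry).
TEMPLATES = {
    # authors: no output of its own, just hand over to the author children
    "authors": lambda n: apply_templates(n, "author"),
    "author": lambda n: (
        f"<h2>{value_of(n, 'name')}</h2>"
        f"Affiliation: {value_of(n, 'affiliation')}<br/>"
        f"Email: {value_of(n, 'email')}<p/>"
    ),
}


def transform(doc):
    """Emulate the template for "/": select the authors root element
    from a stand-in document node placed above it."""
    document = ET.Element("#document")
    document.append(ET.fromstring(doc))
    body = apply_templates(document, "authors")
    return ("<html><head><title>Authors</title></head>"
            f'<body bgcolor="white">{body}</body></html>')


print(transform(AUTHORS))
```

As in the stylesheet, the authors template produces nothing itself; it only passes control down, so processing flows from the root to the leaves.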

Explanation of the Example (2)
  • It is good practice to define a template for each
    element type in the document
  • Even if no specific processing is applied to
    certain elements, the xsl:apply-templates element
    should be used
  • E.g., for authors
  • In this way, we work from the root to the leaves
    of the tree, and all templates are applied

Processing XML Attributes
  • Suppose we wish to transform the following
    element to itself:
  • <person firstname="John" lastname="Woo"/>
  • Wrong solution:
  • <xsl:template match="person">
  • <person firstname="<xsl:value-of select="@firstname">"
  • lastname="<xsl:value-of select="@lastname">"/>
  • </xsl:template>

Processing XML Attributes (2)
  • Not well-formed, because tags are not allowed
    within the values of attributes
  • We wish to insert attribute values into the
    template; XSLT does this with attribute value
    templates, written in curly braces
  • <xsl:template match="person">
  • <person firstname="{@firstname}"
  • lastname="{@lastname}"/>
  • </xsl:template>
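The well-formedness argument can be checked mechanically: any XML parser must reject a `<` inside an attribute value. A minimal sketch using Python's standard `xml.etree.ElementTree`; the xsl prefix is bound to the XSLT namespace `http://www.w3.org/1999/XSL/Transform` so that the only possible error is the one under discussion.

```python
import xml.etree.ElementTree as ET

XSL = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'

# The wrong solution: a tag inside an attribute value.
WRONG = f"""<xsl:template {XSL} match="person">
  <person firstname="<xsl:value-of select="@firstname">"
          lastname="<xsl:value-of select="@lastname">"/>
</xsl:template>"""

# The corrected version: {...} marks an attribute value template.
# (The doubled braces are only Python f-string escaping for { and }.)
RIGHT = f"""<xsl:template {XSL} match="person">
  <person firstname="{{@firstname}}" lastname="{{@lastname}}"/>
</xsl:template>"""


def is_well_formed(text):
    """True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WRONG))  # False
print(is_well_formed(RIGHT))  # True
```

To the XML parser, `{@firstname}` is just ordinary attribute text; it is the XSLT processor that later replaces it with the value of the firstname attribute.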

Transforming an XML Document to Another
Transforming an XML Document to Another (2)
  • <xsl:output method="xml" version="1.0" encoding="UTF-16"/>
  • <xsl:template match="/">
  • <authors>
  • <xsl:apply-templates select="authors"/>
  • </authors>
  • </xsl:template>
  • <xsl:template match="authors">
  • <xsl:apply-templates select="author"/>
  • </xsl:template>

Transforming an XML Document to Another (3)
  • <xsl:template match="author">
  • <author>
  • <name><xsl:value-of select="name"/></name>
  • <contact>
  • <institution>
  • <xsl:value-of select="affiliation"/>
  • </institution>
  • <email><xsl:value-of select="email"/></email>
  • </contact>
  • </author>
  • </xsl:template>
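The effect of this XML-to-XML stylesheet can be sketched in plain Python by building the target tree directly with `xml.etree.ElementTree`. This is an emulation rather than an XSLT processor, and it assumes the target structure in which each source author becomes an author element holding name and contact (institution, email).

```python
import xml.etree.ElementTree as ET

SOURCE = """<authors>
  <author>
    <name>Grigoris Antoniou</name>
    <affiliation>University of Bremen</affiliation>
    <email>ga@tzi.de</email>
  </author>
  <author>
    <name>David Billington</name>
    <affiliation>Griffith University</affiliation>
  </author>
</authors>"""


def transform(source):
    """Map authors/author/{name, affiliation, email} onto
    authors/author/{name, contact/{institution, email}}."""
    src = ET.fromstring(source)
    out = ET.Element("authors")
    for a in src.findall("author"):
        author = ET.SubElement(out, "author")
        # findtext(..., default="") mirrors xsl:value-of yielding an
        # empty string when the selected element is missing.
        ET.SubElement(author, "name").text = a.findtext("name", default="")
        contact = ET.SubElement(author, "contact")
        ET.SubElement(contact, "institution").text = a.findtext("affiliation", default="")
        ET.SubElement(contact, "email").text = a.findtext("email", default="")
    return out


result = transform(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

Note that the affiliation element is renamed to institution and moved under contact: the vocabulary changes, which is exactly the situation in which XSLT is used to let applications with different DTDs or schemas communicate.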

Summary
  • XML is a metalanguage that allows users to define
    markup
  • XML separates content and structure from
    formatting
  • XML is the de facto standard for the
    representation and exchange of structured
    information on the Web
  • XML is supported by query languages

Points for Discussion in Subsequent Chapters
  • The nesting of tags does not have standard
    meaning
  • The semantics of XML documents is not accessible
    to machines, only to people
  • Collaboration and exchange are supported if there
    is an underlying shared understanding of the
    vocabulary
  • XML is well-suited for close collaboration, where
    domain- or community-based vocabularies are used
  • It is not so well-suited for global communication.

Chapter 3 Describing Web Resources in RDF
  • Grigoris Antoniou
  • Frank van Harmelen

Lecture Outline
  1. Basic Ideas of RDF
  2. XML-based Syntax of RDF
  3. Basic Concepts of RDF Schema
  4. The Language of RDF Schema
  5. The Namespaces of RDF and RDF Schema
  6. Axiomatic Semantics for RDF and RDFS
  7. Direct Semantics based on Inference Rules
  8. Querying of RDF/RDFS Documents using RQL

Drawbacks of XML
  • XML is a universal metalanguage for defining
    markup
  • It provides a uniform framework for interchange
    of data and metadata between applications
  • However, XML does not provide any means of
    talking about the semantics (meaning) of data
  • E.g., there is no intended meaning associated
    with the nesting of tags
  • It is up to each application to interpret the
    nesting

Nesting of Tags in XML
  • David Billington is a lecturer of Discrete Maths
  • <course name="Discrete Maths">
  • <lecturer>David Billington</lecturer>
  • </course>
  • <lecturer name="David Billington">
  • <teaches>Discrete Maths</teaches>
  • </lecturer>
  • Opposite nesting, same information!
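A small Python sketch makes the point concrete: the two encodings yield the same fact only because each reader function hard-codes what its nesting means. The reader functions and the ("teaches") statement shape are hypothetical illustrations; nothing in the XML itself says that a lecturer inside a course teaches it.

```python
import xml.etree.ElementTree as ET

COURSE_VIEW = """<course name="Discrete Maths">
  <lecturer>David Billington</lecturer>
</course>"""

LECTURER_VIEW = """<lecturer name="David Billington">
  <teaches>Discrete Maths</teaches>
</lecturer>"""


def read_course_view(text):
    # This application "knows" that a lecturer nested in a course teaches it.
    course = ET.fromstring(text)
    return (course.findtext("lecturer"), "teaches", course.get("name"))


def read_lecturer_view(text):
    # This one "knows" that a teaches child names the course taught.
    lecturer = ET.fromstring(text)
    return (lecturer.get("name"), "teaches", lecturer.findtext("teaches"))


print(read_course_view(COURSE_VIEW))
print(read_lecturer_view(LECTURER_VIEW))
# both: ('David Billington', 'teaches', 'Discrete Maths')
```

The meaning lives in the two functions, not in the documents; RDF addresses this by making the statement itself the basic unit.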

Basic Ideas of RDF
  • Basic building block: object-attribute-value
    triple
  • It is called a statement
  • The sentence about Billington is such a statement
  • RDF has been given a syntax in XML
  • This syntax inherits the benefits of XML
  • Other syntactic representations of RDF possible

Basic Ideas of RDF (2)
  • The fundamental concepts of RDF are
  • resources
  • properties
  • statements

Resources
  • We can think of a resource as an object, a
    "thing" we want to talk about
  • E.g. authors, books, publishers, places, people,
    etc.
  • Every resource has a URI, a Uniform Resource
    Identifier
  • A URI can be
  • a URL (Web address) or
  • some other kind of unique identifier

Properties
  • Properties are a special kind of resources
  • They describe relations between resources
  • E.g. "written by", "age", "title", etc.
  • Properties are also identified by URIs
  • Advantages of using URIs:
  • A global, worldwide, unique naming scheme
  • Reduces the homonym problem of distributed data

Statements
  • Statements assert the properties of resources
  • A statement is an object-attribute-value triple
  • It consists of a resource, a property, and a
    value
  • Values can be resources or literals
  • Literals are atomic values (strings)

Three Views of a Statement
  • A triple
  • A piece of a graph
  • A piece of XML code
  • Thus an RDF document can be viewed as
  • A set of triples
  • A graph (semantic net)
  • An XML document

Statements as Triples
  • (David Billington,
  • http://...,
  • http://...)
  • The triple (x,P,y) can be considered as a logical
    formula P(x,y)
  • Binary predicate P relates object x to object y
  • RDF offers only binary predicates (properties)
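A minimal Python sketch of the triple and graph views, using the site-owner statement from the XML example of this chapter with subject = the Web site, property = site-owner, value = "David Billington". The URIs `http://example.org/...` are hypothetical placeholders for the elided ones; `holds` and `as_graph` are illustrative helpers, not part of any RDF library.

```python
from collections import defaultdict

# Hypothetical URIs standing in for the elided ones on the slides.
SITE = "http://example.org/~db"
SITE_OWNER = "http://example.org/terms#site-owner"

# An RDF document viewed as a set of (subject, property, value) triples.
triples = {(SITE, SITE_OWNER, "David Billington")}


def holds(p, x, y):
    """Read the triple (x, P, y) as the logical formula P(x, y)."""
    return (x, p, y) in triples


def as_graph(ts):
    """Graph view: each triple is an arc labelled P from node x to node y."""
    graph = defaultdict(list)
    for x, p, y in ts:
        graph[x].append((p, y))
    return dict(graph)


print(holds(SITE_OWNER, SITE, "David Billington"))  # True
print(holds(SITE_OWNER, "David Billington", SITE))  # False: argument order matters
```

Because every property is binary, a set of triples is all that is needed; the graph view is just another arrangement of the same set.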

Statements as Graphs
  • A directed graph with labeled nodes and arcs
  • from the resource (the subject of the statement)
  • to the value (the object of the statement)
  • Known in AI as a semantic net
  • The value of a statement may be a resource
  • It may be linked to other resources

A Set of Triples as a Semantic Net
Statements in XML Syntax
  • Graphs are a powerful tool for human
    understanding but
  • The Semantic Web vision requires
    machine-accessible and machine-processable
    representations
  • There is a 3rd representation based on XML
  • But XML is not a part of the RDF data model
  • E.g., the particular XML serialisation chosen is
    irrelevant for RDF

Statements in XML (2)
  • <rdf:RDF
  • xmlns:rdf="http://..."
  • xmlns:mydomain="http://...">
  • <rdf:Description
  • rdf:about="http://...">
  • <mydomain:site-owner>
  • David Billington
  • </mydomain:site-owner>
  • </rdf:Description>
  • </rdf:RDF>
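To connect the XML syntax back to the triple view, here is a sketch that reads this very small subset of RDF/XML with Python's standard `xml.etree.ElementTree`: each rdf:Description with an rdf:about attribute, whose children are property elements with literal text. It is not a general RDF/XML parser. The rdf namespace URI is the standard W3C one; the mydomain and site URIs are hypothetical placeholders for the elided ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical stand-ins for the elided mydomain and site URIs.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}"
         xmlns:mydomain="http://example.org/mydomain#">
  <rdf:Description rdf:about="http://example.org/~db">
    <mydomain:site-owner>
      David Billington
    </mydomain:site-owner>
  </rdf:Description>
</rdf:RDF>"""


def triples_from(doc):
    """Extract (subject, property, value) triples from rdf:Description
    elements whose children are property elements with literal values."""
    root = ET.fromstring(doc)
    result = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # prop.tag looks like "{namespace}local"; RDF/XML forms the
            # property URI by concatenating the namespace URI and local name.
            ns, local = prop.tag[1:].split("}")
            result.append((subject, ns + local, prop.text.strip()))
    return result


print(triples_from(DOC))
```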

Statements in XML (3)
  • An RDF document is represented by an XML element
    with the tag rdf:RDF
  • The content of this element is a number of
    descriptions, which use rdf:Description tags.
  • E