1
Chapter 1: The Semantic Web Vision
  • Grigoris Antoniou
  • Frank van Harmelen

2
Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

3
Today's Web
  • Most of today's Web content is suitable for human
    consumption
  • Even Web content that is generated automatically
    from databases is usually presented without the
    original structural information found in
    databases
  • Typical uses of the Web today involve people
  • seeking and making use of information, searching
    for and getting in touch with other people,
    reviewing catalogs of online stores and ordering
    products by filling out forms

4
Keyword-Based Search Engines
  • Current Web activities are not particularly well
    supported by software tools
  • Except for keyword-based search engines (e.g.
    Google, AltaVista, Yahoo)
  • The Web would not have been the huge success it
    was, were it not for search engines

5
Problems of Keyword-Based Search Engines
  • High recall, low precision.
  • Low or no recall
  • Results are highly sensitive to vocabulary
  • Results are single Web pages
  • Human involvement is necessary to interpret and
    combine results
  • Results of Web searches are not readily
    accessible by other software tools

6
The Key Problem of Today's Web
  • The meaning of Web content is not
    machine-accessible: lack of semantics
  • It is simply difficult to distinguish the meaning
    of these two sentences:
  • I am a professor of computer science.
  • I am a professor of computer science,
    you may think. Well, . . .

7
The Semantic Web Approach
  • Represent Web content in a form that is more
    easily machine-processable.
  • Use intelligent techniques to take advantage of
    these representations.
  • The Semantic Web will gradually evolve out of the
    existing Web; it is not a competitor to the
    current WWW

8
Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

9
The Semantic Web Impact: Knowledge Management
  • Knowledge management concerns itself with
    acquiring, accessing, and maintaining knowledge
    within an organization
  • Key activity of large businesses: internal
    knowledge as an intellectual asset
  • It is particularly important for international,
    geographically dispersed organizations
  • Most information is currently available in a
    weakly structured form (e.g. text, audio, video)

10
Limitations of Current Knowledge Management
Technologies
  • Searching information
  • Keyword-based search engines
  • Extracting information
  • human involvement necessary for browsing,
    retrieving, interpreting, combining
  • Maintaining information
  • inconsistencies in terminology, outdated
    information.
  • Viewing information
  • Impossible to define views on Web knowledge

11
Semantic Web Enabled Knowledge Management
  • Knowledge will be organized in conceptual spaces
    according to its meaning.
  • Automated tools for maintenance and knowledge
    discovery
  • Semantic query answering
  • Query answering over several documents
  • Defining who may view certain parts of
    information (even parts of documents) will be
    possible.

12
The Semantic Web Impact: B2C Electronic
Commerce
  • A typical scenario: a user visits one or several
    online shops, browses their offers, selects and
    orders products.
  • Ideally, humans would visit all, or all major,
    online stores, but this is too time consuming
  • Shopbots are a useful tool

13
Limitations of Shopbots
  • They rely on wrappers: extensive programming
    required
  • Wrappers need to be reprogrammed when an online
    store changes its layout
  • Wrappers extract information based on textual
    analysis
  • Error-prone
  • Limited information extracted

14
Semantic Web Enabled B2C Electronic Commerce
  • Software agents that can interpret the product
    information and the terms of service.
  • Pricing and product information, delivery and
    privacy policies will be interpreted and compared
    to the user requirements.
  • Information about the reputation of shops
  • Sophisticated shopping agents will be able to
    conduct automated negotiations

15
The Semantic Web Impact: B2B Electronic Commerce
  • Greatest economic promise
  • Currently relies mostly on EDI
  • Isolated technology, understood only by experts
  • Difficult to program and maintain, error-prone
  • Each B2B communication requires separate
    programming
  • Web appears to be perfect infrastructure
  • But B2B not well supported by Web standards

16
Semantic Web Enabled B2B Electronic Commerce
  • Businesses enter partnerships without much
    overhead
  • Differences in terminology will be resolved using
    standard abstract domain models
  • Data will be interchanged using translation
    services.
  • Auctioning, negotiations, and drafting contracts
    will be carried out automatically (or
    semi-automatically) by software agents

17
Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

18
Semantic Web Technologies
  • Explicit Metadata
  • Ontologies
  • Logic and Inference
  • Agents

19
On HTML
  • Web content is currently formatted for human
    readers rather than programs
  • HTML is the predominant language in which Web
    pages are written (directly or using tools)
  • Vocabulary describes presentation

20
An HTML Example
      <h1>Agilitas Physiotherapy Centre</h1>
      Welcome to the home page of the Agilitas Physiotherapy Centre. Do
      you feel pain? Have you had an injury? Let our staff Lisa Davenport,
      Kelly Townsend (our lovely secretary) and Steve Matthews take care
      of your body and soul.
      <h2>Consultation hours</h2>
      Mon 11am - 7pm<br>
      Tue 11am - 7pm<br>
      Wed 3pm - 7pm<br>
      Thu 11am - 7pm<br>
      Fri 11am - 3pm<p>
      But note that we do not offer consultation during the weeks of the
      <a href=". . .">State Of Origin</a> games.

21
Problems with HTML
  • Humans have no problem with this
  • Machines (software agents) do
  • How to distinguish therapists from the secretary?
  • How to determine exact consultation hours?
  • They would have to follow the link to the State
    Of Origin games to find when they take place.

22
A Better Representation
      <company>
        <treatmentOffered>Physiotherapy</treatmentOffered>
        <companyName>Agilitas Physiotherapy Centre</companyName>
        <staff>
          <therapist>Lisa Davenport</therapist>
          <therapist>Steve Matthews</therapist>
          <secretary>Kelly Townsend</secretary>
        </staff>
      </company>

23
Explicit Metadata
  • This representation is far more easily
    processable by machines
  • Metadata: data about data
  • Metadata capture part of the meaning of data
  • Semantic Web does not rely on text-based
    manipulation, but rather on machine-processable
    metadata
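
To make the point about machine-processable metadata concrete, here is a minimal sketch in Python using the standard library module xml.etree.ElementTree. It assumes the company description from the "A Better Representation" slide is available as a string; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The tagged company description from the "A Better Representation" slide.
COMPANY_XML = """<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Agilitas Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>"""

root = ET.fromstring(COMPANY_XML)

# The tags say who is a therapist and who is the secretary,
# so no text analysis or guessing is needed.
therapists = [t.text for t in root.findall("./staff/therapist")]
secretaries = [s.text for s in root.findall("./staff/secretary")]

print("Therapists:", therapists)
print("Secretaries:", secretaries)
```

With the plain HTML page, the same question could only be answered by guessing from the wording of the welcome paragraph.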

24
Ontologies
  • The term ontology originates from philosophy
  • The study of the nature of existence
  • The term has a different meaning in computer science
  • An ontology is an explicit and formal
    specification of a conceptualization

25
Typical Components of Ontologies
  • Terms denote important concepts (classes of
    objects) of the domain
  • e.g. professors, staff, students, courses,
    departments
  • Relationships between these terms: typically
    class hierarchies
  • a class C is a subclass of another class C' if
    every object in C is also included in C'
  • e.g. all professors are staff members

26
Further Components of Ontologies
  • Properties
  • e.g. X teaches Y
  • Value restrictions
  • e.g. only faculty members can teach courses
  • Disjointness statements
  • e.g. faculty and general staff are disjoint
  • Logical relationships between objects
  • e.g. every department must include at least 10
    faculty

27
Example of a Class Hierarchy

28
The Role of Ontologies on the Web
  • Ontologies provide a shared understanding of a
    domain: semantic interoperability
  • overcome differences in terminology
  • mappings between ontologies
  • Ontologies are useful for the organization and
    navigation of Web sites

29
The Role of Ontologies in Web Search
  • Ontologies are useful for improving the accuracy
    of Web searches
  • search engines can look for pages that refer to a
    precise concept in an ontology
  • Web searches can exploit generalization/
    specialization information
  • If a query fails to find any relevant documents,
    the search engine may suggest to the user a more
    general query.
  • If too many answers are retrieved, the search
    engine may suggest to the user some
    specializations.

30
Web Ontology Languages
  • RDF Schema
  • RDF is a data model for objects and relations
    between them
  • RDF Schema is a vocabulary description language
  • Describes properties and classes of RDF resources
  • Provides semantics for generalization hierarchies
    of properties and classes

31
Web Ontology Languages (2)
  • OWL
  • A richer ontology language
  • relations between classes
  • e.g., disjointness
  • cardinality
  • e.g. exactly one
  • richer typing of properties
  • characteristics of properties (e.g., symmetry)

32
Logic and Inference
  • Logic is the discipline that studies the
    principles of reasoning
  • Formal languages for expressing knowledge
  • Well-understood formal semantics
  • Declarative knowledge: we describe what holds
    without caring about how it can be deduced
  • Automated reasoners can deduce (infer)
    conclusions from the given knowledge

33
An Inference Example
  • prof(X) → faculty(X)
  • faculty(X) → staff(X)
  • prof(michael)
  • We can deduce the following conclusions:
  • faculty(michael)
  • staff(michael)
  • prof(X) → staff(X)
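
The way an automated reasoner reaches the two facts about michael can be sketched with a few lines of Python. This is a simple forward-chaining loop over unary rules, written for illustration only; it is not the algorithm of any particular reasoner, and it derives facts only (it does not derive the third conclusion, which is itself a rule).

```python
# Rules of the form "if premise(X) then conclusion(X)" over unary predicates.
RULES = [("prof", "faculty"), ("faculty", "staff")]

# Facts are (predicate, individual) pairs.
FACTS = {("prof", "michael")}


def forward_chain(rules, facts):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                new_fact = (conclusion, individual)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


closure = forward_chain(RULES, FACTS)
print(sorted(closure))
```

Because there are finitely many rules and individuals, the loop is guaranteed to stop.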

34
Logic versus Ontologies
  • The previous example involves knowledge typically
    found in ontologies
  • Logic can be used to uncover ontological
    knowledge that is implicitly given
  • It can also help uncover unexpected relationships
    and inconsistencies
  • Logic is more general than ontologies
  • It can also be used by intelligent agents for
    making decisions and selecting courses of action

35
Tradeoff between Expressive Power and
Computational Complexity
  • The more expressive a logic is, the more
    computationally expensive it becomes to draw
    conclusions
  • Drawing certain conclusions may become impossible
    if non-computability barriers are encountered.
  • Our previous examples involved rules of the form
    "If conditions, then conclusion," and only finitely
    many objects
  • This subset of logic is tractable and is
    supported by efficient reasoning tools

36
Inference and Explanations
  • Explanations: the series of inference steps can
    be retraced
  • They increase users' confidence in Semantic Web
    agents: the "Oh yeah?" button
  • Activities between agents create or validate
    proofs

37
Typical Explanation Procedure
  • Facts will typically be traced to some Web
    addresses
  • The trust of the Web address will be verifiable
    by agents
  • Rules may be a part of a shared commerce ontology
    or the policy of the online shop

38
Software Agents
  • Software agents work autonomously and proactively
  • They evolved out of object-oriented and
    component-based programming
  • A personal agent on the Semantic Web will:
  • receive some tasks and preferences from the
    person
  • seek information from Web sources, communicate
    with other agents
  • compare information about user requirements and
    preferences, make certain choices
  • give answers to the user

39
Intelligent Personal Agents
40
Semantic Web Agent Technologies
  • Metadata
  • Identify and extract information from Web sources
  • Ontologies
  • Web searches, interpret retrieved information
  • Communicate with other agents
  • Logic
  • Process retrieved information, draw conclusions

41
Semantic Web Agent Technologies (2)
  • Further technologies (orthogonal to the Semantic
    Web technologies)
  • Agent communication languages
  • Formal representation of beliefs, desires, and
    intentions of agents
  • Creation and maintenance of user models.

42
Lecture Outline
  1. Today's Web
  2. The Semantic Web Impact
  3. Semantic Web Technologies
  4. A Layered Approach

43
A Layered Approach
  • The development of the Semantic Web proceeds in
    steps
  • Each step building a layer on top of another
  • Principles
  • Downward compatibility
  • Upward partial understanding

44
The Semantic Web Layer Tower
45
Semantic Web Layers
  • XML layer
  • Syntactic basis
  • RDF layer
  • RDF: basic data model for facts
  • RDF Schema: simple ontology language
  • Ontology layer
  • More expressive languages than RDF Schema
  • Current Web standard: OWL

46
Semantic Web Layers (2)
  • Logic layer
  • enhance ontology languages further
  • application-specific declarative knowledge
  • Proof layer
  • Proof generation, exchange, validation
  • Trust layer
  • Digital signatures
  • recommendations, rating agencies .

47
Book Outline
  1. Structured Web Documents in XML
  2. Describing Web Resources in RDF
  3. Web Ontology Language OWL
  4. Logic and Inference Rules
  5. Applications
  6. Ontology Engineering
  7. Conclusion and Outlook

48
Chapter 2: Structured Web Documents in XML
  • Grigoris Antoniou
  • Frank van Harmelen

49
An HTML Example
      <h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>
      <i>by <b>V. Marek</b> and
      <b>M. Truszczynski</b></i><br>
      Springer 1993<br>
      ISBN 0387976892

50
The Same Example in XML
      <book>
        <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
        <author>V. Marek</author>
        <author>M. Truszczynski</author>
        <publisher>Springer</publisher>
        <year>1993</year>
        <ISBN>0387976892</ISBN>
      </book>

51
HTML versus XML: Similarities
  • Both use tags (e.g. <h2> and </year>)
  • Tags may be nested (tags within tags)
  • Human users can read and interpret both HTML and
    XML representations quite easily
  • But how about machines?

52
Problems with Automated Interpretation of HTML
Documents
  • An intelligent agent trying to retrieve the names
    of the authors of the book
  • Authors' names could appear immediately after the
    title
  • or immediately after the word "by"
  • Are there two authors?
  • Or just one, called "V. Marek and M.
    Truszczynski"?

53
HTML vs XML: Structural Information
  • HTML documents do not contain structural
    information: pieces of the document and their
    relationships.
  • XML is more easily accessible to machines because
  • Every piece of information is described.
  • Relations are also defined through the nesting
    structure.
  • E.g., the <author> tags appear within the <book>
    tags, so they describe properties of the
    particular book.

54
HTML vs XML: Structural Information (2)
  • A machine processing the XML document would be
    able to deduce that
  • the author element refers to the enclosing book
    element
  • without relying on proximity considerations
  • XML allows the definition of constraints on
    values
  • E.g. a year must be a number of four digits

55
HTML vs XML: Formatting
  • The HTML representation provides more than the
    XML representation
  • The formatting of the document is also described
  • The main use of an HTML document is to display
    information: it must define formatting
  • XML: separation of content from display
  • same information can be displayed in different
    ways

56
HTML vs XML: Another Example
  • In HTML
      <h2>Relationship matter-energy</h2>
      <i> E = M × c² </i>
  • In XML
      <equation>
        <meaning>Relationship matter energy</meaning>
        <leftside> E </leftside>
        <rightside> M × c² </rightside>
      </equation>

57
HTML vs XML: Different Use of Tags
  • Both HTML documents use the same tags
  • The XML documents use completely different tags
  • HTML tags define display: color, lists
  • XML tags are not fixed: user-definable tags
  • XML is a meta markup language: a language for
    defining markup languages

58
XML Vocabularies
  • Web applications must agree on common
    vocabularies to communicate and collaborate
  • Communities and business sectors are defining
    their specialized vocabularies
  • mathematics (MathML)
  • bioinformatics (BSML)
  • human resources (HRML)

59
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

60
The XML Language
  • An XML document consists of
  • a prolog
  • a number of elements
  • an optional epilog (not discussed)

61
Prolog of an XML Document
  • The prolog consists of
  • an XML declaration and
  • an optional reference to external structuring
    documents
      <?xml version="1.0" encoding="UTF-16"?>
      <!DOCTYPE book SYSTEM "book.dtd">

62
XML Elements
  • The things the XML document talks about
  • E.g. books, authors, publishers
  • An element consists of
  • an opening tag
  • the content
  • a closing tag
      <lecturer>David Billington</lecturer>

63
XML Elements (2)
  • Tag names can be chosen almost freely.
  • The first character must be a letter, an
    underscore, or a colon
  • No name may begin with the string "xml" in any
    combination of cases
  • E.g. "Xml", "xML"

64
Content of XML Elements
  • Content may be text, or other elements, or
    nothing
      <lecturer>
        <name>David Billington</name>
        <phone> 61 - 7 - 3875 507 </phone>
      </lecturer>
  • If there is no content, then the element is
    called empty; it is abbreviated as follows:
      <lecturer/> for <lecturer></lecturer>

65
XML Attributes
  • An empty element is not necessarily meaningless
  • It may have some properties in terms of
    attributes
  • An attribute is a name-value pair inside the
    opening tag of an element
      <lecturer name="David Billington" phone="61 - 7 - 3875 507"/>

66
XML Attributes: An Example
      <order orderNo="23456" customer="John Smith"
             date="October 15, 2002">
        <item itemNo="a528" quantity="1"/>
        <item itemNo="c817" quantity="3"/>
      </order>

67
The Same Example without Attributes
      <order>
        <orderNo>23456</orderNo>
        <customer>John Smith</customer>
        <date>October 15, 2002</date>
        <item>
          <itemNo>a528</itemNo>
          <quantity>1</quantity>
        </item>
        <item>
          <itemNo>c817</itemNo>
          <quantity>3</quantity>
        </item>
      </order>

68
XML Elements vs Attributes
  • Attributes can be replaced by elements
  • When to use elements and when attributes is a
    matter of taste
  • But attributes cannot be nested

69
Further Components of XML Docs
  • Comments
  • A piece of text that is to be ignored by parser
      <!-- This is a comment -->
  • Processing Instructions (PIs)
  • Define procedural attachments
      <?stylesheet type="text/css" href="mystyle.css"?>

70
Well-Formed XML Documents
  • Syntactically correct documents
  • Some syntactic rules
  • Only one outermost element (called root element)
  • Each element contains an opening and a
    corresponding closing tag
  • Tags may not overlap
      <author><name>Lee Hong</author></name>
  • Attributes within an element have unique names
  • Element and tag names must be permissible
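
Any XML parser enforces these rules. As a rough illustration, the Python sketch below uses the standard library parser (xml.etree.ElementTree), which raises ParseError for input that is not well-formed. The helper name is_well_formed and the sample strings other than the overlapping-tags one are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


overlapping = "<author><name>Lee Hong</author></name>"   # tags overlap
nested = "<author><name>Lee Hong</name></author>"        # properly nested
two_roots = "<a/><b/>"                                   # more than one root
duplicate_attr = '<item itemNo="a528" itemNo="c817"/>'   # attribute repeated

for sample in (overlapping, nested, two_roots, duplicate_attr):
    print(is_well_formed(sample), sample)
```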

71
The Tree Model of XML Documents: An Example
      <email>
        <head>
          <from name="Michael Maher"
                address="michaelmaher@cs.gu.edu.au"/>
          <to name="Grigoris Antoniou"
              address="grigoris@cs.unibremen.de"/>
          <subject>Where is your draft?</subject>
        </head>
        <body>
          Grigoris, where is the draft of the paper you promised me
          last week?
        </body>
      </email>

72
The Tree Model of XML Documents: An Example (2)
73
The Tree Model of XML Docs
  • The tree representation of an XML document is an
    ordered labeled tree
  • There is exactly one root
  • There are no cycles
  • Each non-root node has exactly one parent
  • Each node has a label.
  • The order of elements is important
  • but the order of attributes is not important
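
The tree view can be made visible with a small Python sketch that walks the email example in document order and records each node's depth and label. It uses the standard library module xml.etree.ElementTree; the function name outline is an assumption for this illustration, and text nodes and attributes are left out to keep it short.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """<email>
  <head>
    <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
    <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>Grigoris, where is the draft of the paper you promised me last week?</body>
</email>"""


def outline(element, depth=0):
    """Return (depth, label) pairs for the element nodes, in document order."""
    rows = [(depth, element.tag)]
    for child in element:  # children are visited in document order
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(EMAIL_XML)
tree_rows = outline(root)
for depth, label in tree_rows:
    print("  " * depth + label)
```

There is one root (email), every other node has exactly one parent, and the children of head appear in the order in which they were written.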

74
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

75
Structuring XML Documents
  • Define all the element and attribute names that
    may be used
  • Define the structure
  • what values an attribute may take
  • which elements may or must occur within other
    elements, etc.
  • If such structuring information exists, the
    document can be validated

76
Structuring XML Documents (2)
  • An XML document is valid if
  • it is well-formed
  • it respects the structuring information it uses
  • There are two ways of defining the structure of
    XML documents
  • DTDs (the older and more restricted way)
  • XML Schema (offers extended possibilities)

77
DTD: Element Type Definition
      <lecturer>
        <name>David Billington</name>
        <phone> 61 - 7 - 3875 507 </phone>
      </lecturer>
  • DTD for above element (and all lecturer
    elements)
      <!ELEMENT lecturer (name,phone)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT phone (#PCDATA)>

78
The Meaning of the DTD
  • The element types lecturer, name, and phone may
    be used in the document
  • A lecturer element contains a name element and a
    phone element, in that order (sequence)
  • A name element and a phone element may have any
    content
  • In DTDs, #PCDATA is the only atomic type for
    elements

79
DTD: Disjunction in Element Type Definitions
  • We express that a lecturer element contains
    either a name element or a phone element as
    follows
      <!ELEMENT lecturer (name|phone)>
  • A lecturer element contains a name element and a
    phone element in any order.
      <!ELEMENT lecturer ((name,phone)|(phone,name))>

80
Example of an XML Element
      <order orderNo="23456"
             customer="John Smith"
             date="October 15, 2002">
        <item itemNo="a528" quantity="1"/>
        <item itemNo="c817" quantity="3"/>
      </order>

81
The Corresponding DTD
      <!ELEMENT order (item+)>
      <!ATTLIST order orderNo  ID    #REQUIRED
                      customer CDATA #REQUIRED
                      date     CDATA #REQUIRED>
      <!ELEMENT item EMPTY>
      <!ATTLIST item itemNo   ID    #REQUIRED
                     quantity CDATA #REQUIRED
                     comments CDATA #IMPLIED>

82
Comments on the DTD
  • The item element type is defined to be empty
  • + (after item) is a cardinality operator
  • ? appears zero times or once
  • * appears zero or more times
  • + appears one or more times
  • No cardinality operator means exactly once
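
The cardinality operators ?, * and +, the comma for sequence and the bar for alternatives behave much like the corresponding operators of regular expressions. The Python sketch below exploits this: it translates a DTD content model into a regular expression and checks a list of child element names against it. This is only an illustration of the idea, not how a validating parser works; the function names are made up, and #PCDATA, EMPTY and ANY are not handled.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD content model of element names into a regex.

    Each child name becomes the token "name " (name plus one space), so a
    whole sequence of children can be matched as a single string. The
    operators ? * + | and the parentheses keep their meaning; commas
    (sequence) become plain concatenation.
    """
    compact = re.sub(r"\s+", "", model)
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    return re.compile(pattern.replace(",", ""))


def children_match(model, child_names):
    """Check whether the child names, in order, satisfy the content model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(item+)", ["item", "item"]))    # one or more items
print(children_match("(item+)", []))                  # no item at all
print(children_match("(name,phone)", ["phone", "name"]))
print(children_match("((name,phone)|(phone,name))", ["phone", "name"]))
```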

83
Comments on the DTD (2)
  • In addition to defining elements, we define
    attributes
  • This is done in an attribute list containing
  • Name of the element type to which the list
    applies
  • A list of triplets of attribute name, attribute
    type, and value type
  • Attribute name: a name that may be used in an XML
    document using a DTD

84
DTD Attribute Types
  • Similar to predefined data types, but limited
    selection
  • The most important types are
  • CDATA, a string (sequence of characters)
  • ID, a name that is unique across the entire XML
    document
  • IDREF, a reference to another element with an ID
    attribute carrying the same value as the IDREF
    attribute
  • IDREFS, a series of IDREFs
  • (v1 . . . vn), an enumeration of all possible
    values
  • Limitations: no dates, number ranges etc.

85
DTD Attribute Value Types
  • #REQUIRED
  • Attribute must appear in every occurrence of the
    element type in the XML document
  • #IMPLIED
  • The appearance of the attribute is optional
  • #FIXED "value"
  • Every element must have this attribute, always
    with the given value
  • "value"
  • This specifies the default value for the
    attribute

86
Referencing with IDREF and IDREFS
      <!ELEMENT family (person*)>
      <!ELEMENT person (name)>
      <!ELEMENT name (#PCDATA)>
      <!ATTLIST person id       ID     #REQUIRED
                       mother   IDREF  #IMPLIED
                       father   IDREF  #IMPLIED
                       children IDREFS #IMPLIED>

87
An XML Document Respecting the DTD
      <family>
        <person id="bob" mother="mary" father="peter">
          <name>Bob Marley</name>
        </person>
        <person id="bridget" mother="mary">
          <name>Bridget Jones</name>
        </person>
        <person id="mary" children="bob bridget">
          <name>Mary Poppins</name>
        </person>
        <person id="peter" children="bob">
          <name>Peter Marley</name>
        </person>
      </family>
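
What a validating parser checks for ID, IDREF and IDREFS attributes can be imitated by hand. The Python sketch below is such an imitation for the family document only: it verifies that every id value is unique and that every mother, father and children reference points to an existing id. The function name check_references and the wording of the problem messages are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """<family>
  <person id="bob" mother="mary" father="peter">
    <name>Bob Marley</name>
  </person>
  <person id="bridget" mother="mary">
    <name>Bridget Jones</name>
  </person>
  <person id="mary" children="bob bridget">
    <name>Mary Poppins</name>
  </person>
  <person id="peter" children="bob">
    <name>Peter Marley</name>
  </person>
</family>"""


def check_references(xml_text):
    """Return a list of problems: duplicate IDs or references to unknown IDs."""
    family = ET.fromstring(xml_text)
    problems = []
    seen_ids = set()
    for person in family.findall("person"):
        person_id = person.get("id")
        if person_id in seen_ids:
            problems.append("duplicate id: " + person_id)
        seen_ids.add(person_id)
    for person in family.findall("person"):
        # mother and father are IDREF (one value each);
        # children is IDREFS (a space-separated list of values).
        references = []
        for attribute in ("mother", "father"):
            if person.get(attribute) is not None:
                references.append(person.get(attribute))
        references.extend(person.get("children", "").split())
        for reference in references:
            if reference not in seen_ids:
                problems.append(person.get("id") + " refers to unknown id: " + reference)
    return problems


problems = check_references(FAMILY_XML)

# A deliberately broken copy: bob's father now points to a missing person.
broken = check_references(
    FAMILY_XML.replace('mother="mary" father="peter"',
                       'mother="mary" father="paul"'))
print(problems, broken)
```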

88
A DTD for an Email Element
      <!ELEMENT email (head,body)>
      <!ELEMENT head (from,to+,cc*,subject)>
      <!ELEMENT from EMPTY>
      <!ATTLIST from name    CDATA #IMPLIED
                     address CDATA #REQUIRED>
      <!ELEMENT to EMPTY>
      <!ATTLIST to name    CDATA #IMPLIED
                   address CDATA #REQUIRED>

89
A DTD for an Email Element (2)
      <!ELEMENT cc EMPTY>
      <!ATTLIST cc name    CDATA #IMPLIED
                   address CDATA #REQUIRED>
      <!ELEMENT subject (#PCDATA)>
      <!ELEMENT body (text,attachment*)>
      <!ELEMENT text (#PCDATA)>
      <!ELEMENT attachment EMPTY>
      <!ATTLIST attachment
                encoding (mime|binhex) "mime"
                file     CDATA         #REQUIRED>

90
Interesting Parts of the DTD
  • A head element contains (in that order)
  • a from element
  • at least one to element
  • zero or more cc elements
  • a subject element
  • In from, to, and cc elements
  • the name attribute is not required
  • the address attribute is always required

91
Interesting Parts of the DTD (2)
  • A body element contains
  • a text element
  • possibly followed by a number of attachment
    elements
  • The encoding attribute of an attachment element
    must have either the value mime or binhex
  • mime is the default value

92
Remarks on DTDs
  • A DTD can be interpreted as an Extended
    Backus-Naur Form (EBNF)
      <!ELEMENT email (head,body)>
  • is equivalent to email ::= head body
  • Recursive definitions are possible in DTDs
      <!ELEMENT bintree
                ((bintree,root,bintree)|emptytree)>

93
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

94
XML Schema
  • Significantly richer language for defining the
    structure of XML documents
  • Its syntax is based on XML itself
  • not necessary to write separate tools
  • Reuse and refinement of schemas
  • Expand or delete already existing schemas
  • Sophisticated set of data types, compared to DTDs
    (which only support strings)

95
XML Schema (2)
  • An XML schema is an element with an opening tag
    like
      <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
              version="1.0">
  • Structure of schema elements
  • Element and attribute types using data types

96
Element Types
      <element name="email"/>
      <element name="head" minOccurs="1" maxOccurs="1"/>
      <element name="to" minOccurs="1"/>
  • Cardinality constraints
  • minOccurs="x" (default value 1)
  • maxOccurs="x" (default value 1)
  • Generalizations of *, ?, + offered by DTDs

97
Attribute Types
      <attribute name="id" type="ID" use="required"/>
      <attribute name="speaks" type="Language"
                 use="default" value="en"/>
  • Existence: use="x", where x may be optional or
    required
  • Default value: use="x" value="...", where x may
    be default or fixed

98
Data Types
  • There is a variety of built-in data types
  • Numerical data types: integer, short, etc.
  • String types: string, ID, IDREF, CDATA, etc.
  • Date and time data types: time, month, etc.
  • There are also user-defined data types
  • simple data types, which cannot use elements or
    attributes
  • complex data types, which can use these

99
Data Types (2)
  • Complex data types are defined from already
    existing data types by defining some attributes
    (if any) and using
  • sequence, a sequence of existing data type
    elements (order is important)
  • all, a collection of elements that must appear
    (order is not important)
  • choice, a collection of elements, of which one
    will be chosen

100
A Data Type Example
      <complexType name="lecturerType">
        <sequence>
          <element name="firstname" type="string"
                   minOccurs="0" maxOccurs="unbounded"/>
          <element name="lastname" type="string"/>
        </sequence>
        <attribute name="title" type="string" use="optional"/>
      </complexType>

101
Data Type Extension
  • Already existing data types can be extended by
    new elements or attributes. Example:
      <complexType name="extendedLecturerType">
        <extension base="lecturerType">
          <sequence>
            <element name="email" type="string"
                     minOccurs="0" maxOccurs="1"/>
          </sequence>
          <attribute name="rank" type="string" use="required"/>
        </extension>
      </complexType>

102
Resulting Data Type
      <complexType name="extendedLecturerType">
        <sequence>
          <element name="firstname" type="string"
                   minOccurs="0" maxOccurs="unbounded"/>
          <element name="lastname" type="string"/>
          <element name="email" type="string"
                   minOccurs="0" maxOccurs="1"/>
        </sequence>
        <attribute name="title" type="string" use="optional"/>
        <attribute name="rank" type="string" use="required"/>
      </complexType>

103
Data Type Extension (2)
  • A hierarchical relationship exists between the
    original and the extended type
  • Instances of the extended type are also instances
    of the original type
  • They may contain additional information, but
    neither less information, nor information of the
    wrong type

104
Data Type Restriction
  • An existing data type may be restricted by adding
    constraints on certain values
  • Restriction is not the opposite of extension
  • Restriction is not achieved by deleting elements
    or attributes
  • The following hierarchical relationship still
    holds
  • Instances of the restricted type are also
    instances of the original type
  • They satisfy at least the constraints of the
    original type

105
Example of Data Type Restriction
      <complexType name="restrictedLecturerType">
        <restriction base="lecturerType">
          <sequence>
            <element name="firstname" type="string"
                     minOccurs="1" maxOccurs="2"/>
          </sequence>
          <attribute name="title" type="string"
                     use="required"/>
        </restriction>
      </complexType>

106
Restriction of Simple Data Types
      <simpleType name="dayOfMonth">
        <restriction base="integer">
          <minInclusive value="1"/>
          <maxInclusive value="31"/>
        </restriction>
      </simpleType>

107
Data Type Restriction: Enumeration
      <simpleType name="dayOfWeek">
        <restriction base="string">
          <enumeration value="Mon"/>
          <enumeration value="Tue"/>
          <enumeration value="Wed"/>
          <enumeration value="Thu"/>
          <enumeration value="Fri"/>
          <enumeration value="Sat"/>
          <enumeration value="Sun"/>
        </restriction>
      </simpleType>
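
The two simple-type restrictions, dayOfMonth and dayOfWeek, amount to membership tests on values. The Python sketch below restates them as two small functions. It is a hand-written imitation for illustration, not a schema validator; the function names are made up.

```python
import re

# The values listed in the dayOfWeek enumeration.
DAYS_OF_WEEK = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def is_day_of_month(text):
    """Imitate dayOfMonth: an integer between 1 and 31 inclusive."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False  # not written as an integer at all
    return 1 <= int(text) <= 31


def is_day_of_week(text):
    """Imitate dayOfWeek: one of the enumerated strings."""
    return text in DAYS_OF_WEEK


print(is_day_of_month("31"), is_day_of_month("32"))
print(is_day_of_week("Mon"), is_day_of_week("Monday"))
```

Every value accepted by is_day_of_month is still an integer, and every value accepted by is_day_of_week is still a string, which mirrors the rule that instances of a restricted type are also instances of the original type.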

108
XML Schema: The Email Example
      <element name="email" type="emailType"/>
      <complexType name="emailType">
        <sequence>
          <element name="head" type="headType"/>
          <element name="body" type="bodyType"/>
        </sequence>
      </complexType>

109
XML Schema: The Email Example (2)
      <complexType name="headType">
        <sequence>
          <element name="from" type="nameAddress"/>
          <element name="to" type="nameAddress"
                   minOccurs="1" maxOccurs="unbounded"/>
          <element name="cc" type="nameAddress"
                   minOccurs="0" maxOccurs="unbounded"/>
          <element name="subject" type="string"/>
        </sequence>
      </complexType>

110
XML Schema: The Email Example (3)
  • <complexType name="nameAddress">
  • <attribute name="name" type="string"
    use="optional"/>
  • <attribute name="address" type="string"
    use="required"/>
  • </complexType>
  • Similar for bodyType

111
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

112
Namespaces
  • An XML document may use more than one DTD or
    schema
  • Since each structuring document was developed
    independently, name clashes may appear
  • The solution is to use a different prefix for
    each DTD or schema
  • prefix:name

113
An Example
  • <vu:instructors xmlns:vu="http://www.vu.com/empDTD"
    xmlns:gu="http://www.gu.au/empDTD"
    xmlns:uky="http://www.uky.edu/empDTD">
  • <uky:faculty uky:title="assistant professor"
    uky:name="John Smith"
    uky:department="Computer Science"/>
  • <gu:academicStaff gu:title="lecturer"
    gu:name="Mate Jones"
    gu:school="Information Technology"/>
  • </vu:instructors>

114
Namespace Declarations
  • Namespaces are declared within an element and can
    be used in that element and any of its children
    (elements and attributes)
  • A namespace declaration has the form
  • xmlns:prefix="location"
  • location is the address of the DTD or schema
  • If a prefix is not specified, as in xmlns="location",
    then the location is used by default
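
To see how a parser resolves prefixes, the instructors example can be loaded with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch; the variable names are illustrative, and the namespace addresses are simply the ones used on the slide.

```python
import xml.etree.ElementTree as ET

DOC = """
<vu:instructors xmlns:vu="http://www.vu.com/empDTD"
                xmlns:gu="http://www.gu.au/empDTD"
                xmlns:uky="http://www.uky.edu/empDTD">
  <uky:faculty uky:title="assistant professor"
               uky:name="John Smith"
               uky:department="Computer Science"/>
  <gu:academicStaff gu:title="lecturer"
                    gu:name="Mate Jones"
                    gu:school="Information Technology"/>
</vu:instructors>
"""

root = ET.fromstring(DOC)

# ElementTree replaces each prefix by its location: "{location}localname".
child_tags = [child.tag for child in root]

# Prefixed attributes are looked up with the same expanded form.
uky_name = root[0].get("{http://www.uky.edu/empDTD}name")
```

The prefixes themselves disappear after parsing; only the locations they stand for distinguish the two title attributes from each other.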

115
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

116
Addressing and Querying XML Documents
  • In relational databases, parts of a database can
    be selected and retrieved using SQL
  • The same is necessary for XML documents
  • Query languages: XQuery, XQL, XML-QL
  • The central concept of XML query languages is a
    path expression
  • Specifies how a node or a set of nodes, in the
    tree representation of the XML document can be
    reached

117
XPath
  • XPath is core for XML query languages
  • Language for addressing parts of an XML document.
  • It operates on the tree data model of XML
  • It has a non-XML syntax

118
Types of Path Expressions
  • Absolute (starting at the root of the tree)
  • Syntactically they begin with the symbol /
  • It refers to the root of the document (situated
    one level above the root element of the document)
  • Relative to a context node

119
An XML Example
  • <library location="Bremen">
  • <author name="Henry Wise">
  • <book title="Artificial Intelligence"/>
  • <book title="Modern Web Services"/>
  • <book title="Theory of Computation"/>
  • </author>
  • <author name="William Smart">
  • <book title="Artificial Intelligence"/>
  • </author>
  • <author name="Cynthia Singleton">
  • <book title="The Semantic Web"/>
  • <book title="Browser Technology Revised"/>
  • </author>
  • </library>

120
Tree Representation
121
Examples of Path Expressions in XPath
  • Address all author elements
  • /library/author
  • Addresses all author elements that are children
    of the library element node, which resides
    immediately below the root
  • /t1/.../tn, where each ti+1 is a child node of
    ti, is a path through the tree representation

122
Examples of Path Expressions in XPath (2)
  • Address all author elements
  • //author
  • Here // says that we should consider all elements
    in the document and check whether they are of
    type author
  • This path expression addresses all author
    elements anywhere in the document

123
Examples of Path Expressions in XPath (3)
  • Address the location attribute nodes within
    library element nodes
  • /library/@location
  • The symbol @ is used to denote attribute nodes

124
Examples of Path Expressions in XPath (4)
  • Address all title attribute nodes within book
    elements anywhere in the document, which have the
    value Artificial Intelligence
  • //book/@title[.="Artificial Intelligence"]

125
Examples of Path Expressions in XPath (5)
  • Address all books with title Artificial
    Intelligence
  • //book[@title="Artificial Intelligence"]
  • The test within square brackets is a filter
    expression
  • It restricts the set of addressed nodes.
  • Difference from query 4:
  • Query 5 addresses book elements, the title of
    which satisfies a certain condition.
  • Query 4 collects title attribute nodes of book
    elements
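
The difference between queries 4 and 5 can be tried out with Python's standard `xml.etree.ElementTree` module, which supports only a subset of XPath. The sketch below rests on that limitation: paths are written relative to the library element (`.//` in place of `//`), and since ElementTree cannot return attribute nodes, query 4 is approximated by collecting attribute values.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY)  # root is the library element

# Query 5: book elements whose title attribute passes the filter.
books = root.findall(".//book[@title='Artificial Intelligence']")

# Query 4 (approximation): the matching title values, not the elements.
titles = [book.get("title") for book in books]
```

Here `books` holds element nodes, which still carry their position in the tree, while `titles` holds only the attribute values.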

126
Tree Representation of Query 4
127
Tree Representation of Query 5
128
Examples of Path Expressions in XPath (6)
  • Address the first author element node in the XML
    document
  • //author[1]
  • Address the last book element within the first
    author element node in the document
  • //author[1]/book[last()]
  • Address all book element nodes without a title
    attribute
  • //book[not(@title)]
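
The positional queries can be approximated with Python's standard `xml.etree.ElementTree` module. This is a sketch under two assumptions: paths are written relative to the library element, and because ElementTree's XPath subset has no not() function, the last query is emulated with an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY)

# //author[1]: the first author element.
first_author = root.find(".//author[1]")

# //author[1]/book[last()]: the last book of the first author.
last_book = root.find("author[1]/book[last()]")

# //book[not(@title)]: emulated in Python, since not() is unsupported here.
untitled = [book for book in root.iter("book") if "title" not in book.attrib]
```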

129
General Form of Path Expressions
  • A path expression consists of a series of steps,
    separated by slashes
  • A step consists of
  • An axis specifier,
  • A node test, and
  • An optional predicate

130
General Form of Path Expressions (2)
  • An axis specifier determines the tree
    relationship between the nodes to be addressed
    and the context node
  • E.g. parent, ancestor, child (the default),
    sibling, attribute node
  • // is such an axis specifier: descendant or self

131
General Form of Path Expressions (3)
  • A node test specifies which nodes to address
  • The most common node tests are element names
  • E.g., * addresses all element nodes
  • comment() addresses all comment nodes

132
General Form of Path Expressions (4)
  • Predicates (or filter expressions) are optional
    and are used to refine the set of addressed nodes
  • E.g., the expression [1] selects the first node
  • [position()=last()] selects the last node
  • [position() mod 2 = 0] selects the even nodes
  • XPath has a more complicated full syntax.
  • We have only presented the abbreviated syntax
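
The three predicates can be read as filters over a node list in which positions start at 1. The Python sketch below emulates them with `enumerate`; it illustrates the semantics only and is not how an XPath engine is implemented.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring(
    '<author name="Henry Wise">'
    '<book title="Artificial Intelligence"/>'
    '<book title="Modern Web Services"/>'
    '<book title="Theory of Computation"/>'
    "</author>"
)
books = author.findall("book")

# XPath positions are 1-based, hence start=1.
numbered = list(enumerate(books, start=1))

first = [b.get("title") for pos, b in numbered if pos == 1]
last = [b.get("title") for pos, b in numbered if pos == len(books)]
even = [b.get("title") for pos, b in numbered if pos % 2 == 0]
```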

133
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

134
Displaying XML Documents
  • <author>
  • <name>Grigoris Antoniou</name>
  • <affiliation>University of Bremen</affiliation>
  • <email>ga@tzi.de</email>
  • </author>
  • may be displayed in different ways
  • Grigoris Antoniou Grigoris Antoniou
  • University of Bremen University of Bremen
  • ga@tzi.de ga@tzi.de

135
Style Sheets
  • Style sheets can be written in various languages
  • E.g. CSS2 (cascading style sheets level 2)
  • XSL (extensible stylesheet language)
  • XSL includes
  • a transformation language (XSLT)
  • a formatting language
  • Both are XML applications

136
XSL Transformations (XSLT)
  • XSLT specifies rules with which an input XML
    document is transformed to
  • another XML document
  • an HTML document
  • plain text
  • The output document may use the same DTD or
    schema, or a completely different vocabulary
  • XSLT can be used independently of the formatting
    language

137
XSLT (2)
  • Move data and metadata from one XML
    representation to another
  • XSLT is chosen when applications that use
    different DTDs or schemas need to communicate
  • XSLT can be used for machine processing of
    content without any regard to displaying the
    information for people to read.
  • In the following we use XSLT only to display XML
    documents

138
XSLT Transformation into HTML
  • <xsl:template match="/author">
  • <html>
  • <head><title>An author</title></head>
  • <body bgcolor="white">
  • <b><xsl:value-of select="name"/></b><br/>
  • <xsl:value-of select="affiliation"/><br/>
  • <i><xsl:value-of select="email"/></i>
  • </body>
  • </html>
  • </xsl:template>

139
Style Sheet Output
  • <html>
  • <head><title>An author</title></head>
  • <body bgcolor="white">
  • <b>Grigoris Antoniou</b><br>
  • University of Bremen<br>
  • <i>ga@tzi.de</i>
  • </body>
  • </html>

140
Observations About XSLT
  • XSLT documents are XML documents
  • XSLT resides on top of XML
  • The XSLT document defines a template
  • In this case an HTML document, with some
    placeholders for content to be inserted
  • xsl:value-of retrieves the value of an element
    and copies it into the output document
  • It places some content into the template
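
The idea of a template with placeholders can be imitated in Python. The sketch below is not XSLT (Python's standard library has no XSLT processor); it is a rough analogue in which the hypothetical helper `value_of` stands in for xsl:value-of and `render_author` stands in for the template. No HTML escaping is performed.

```python
import xml.etree.ElementTree as ET

AUTHOR = """
<author>
  <name>Grigoris Antoniou</name>
  <affiliation>University of Bremen</affiliation>
  <email>ga@tzi.de</email>
</author>
"""


def value_of(context, path):
    """Rough stand-in for xsl:value-of: text of the first match, or ''."""
    return context.findtext(path, default="")


def render_author(author):
    """Fill the HTML template with values taken from the author element."""
    return (
        "<html>\n"
        "<head><title>An author</title></head>\n"
        '<body bgcolor="white">\n'
        f"<b>{value_of(author, 'name')}</b><br>\n"
        f"{value_of(author, 'affiliation')}<br>\n"
        f"<i>{value_of(author, 'email')}</i>\n"
        "</body>\n"
        "</html>"
    )


html = render_author(ET.fromstring(AUTHOR))
```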

141
A Template
  • <html>
  • <head><title>An author</title></head>
  • <body bgcolor="white">
  • <b>...</b><br>
  • ...<br>
  • <i>...</i>
  • </body>
  • </html>

142
Auxiliary Templates
  • We have an XML document with details of several
    authors
  • It is a waste of effort to treat each author
    element separately
  • In such cases, a special template is defined for
    author elements, which is used by the main
    template

143
Example of an Auxiliary Template
  • <authors>
  • <author>
  • <name>Grigoris Antoniou</name>
  • <affiliation>University of Bremen</affiliation>
  • <email>ga@tzi.de</email>
  • </author>
  • <author>
  • <name>David Billington</name>
  • <affiliation>Griffith University</affiliation>
  • <email>david@gu.edu.net</email>
  • </author>
  • </authors>

144
Example of an Auxiliary Template (2)
  • <xsl:template match="/">
  • <html>
  • <head><title>Authors</title></head>
  • <body bgcolor="white">
  • <xsl:apply-templates select="authors"/>
  • <!-- Apply templates for AUTHORS children -->
  • </body>
  • </html>
  • </xsl:template>

145
Example of an Auxiliary Template (3)
  • <xsl:template match="authors">
  • <xsl:apply-templates select="author"/>
  • </xsl:template>
  • <xsl:template match="author">
  • <h2><xsl:value-of select="name"/></h2>
  • Affiliation: <xsl:value-of
    select="affiliation"/><br/>
  • Email: <xsl:value-of select="email"/>
  • <p/>
  • </xsl:template>

146
Multiple Authors Output
  • <html>
  • <head><title>Authors</title></head>
  • <body bgcolor="white">
  • <h2>Grigoris Antoniou</h2>
  • Affiliation: University of Bremen<br>
  • Email: ga@tzi.de
  • <p>
  • <h2>David Billington</h2>
  • Affiliation: Griffith University<br>
  • Email: david@gu.edu.net
  • <p>
  • </body>
  • </html>

147
Explanation of the Example
  • The xsl:apply-templates element causes all
    children of the context node to be matched
    against the selected path expression
  • E.g., if the current template applies to /, then
    the element xsl:apply-templates applies to the
    root element
  • I.e. the authors element (/ is located above the
    root element)
  • If the current context node is the authors
    element, then the element xsl:apply-templates
    select="author" causes the template for the
    author elements to be applied to all author
    children of the authors element
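
The way templates hand work down the tree can be imitated in Python with a table that maps element names to functions. This is a rough analogue rather than XSLT: `apply_templates`, `TEMPLATES` and the template functions are hypothetical names, and the dispatch rule (look up the child's element name) is a simplification of XSLT's pattern matching.

```python
import xml.etree.ElementTree as ET

AUTHORS = """
<authors>
  <author>
    <name>Grigoris Antoniou</name>
    <affiliation>University of Bremen</affiliation>
    <email>ga@tzi.de</email>
  </author>
  <author>
    <name>David Billington</name>
    <affiliation>Griffith University</affiliation>
    <email>david@gu.edu.net</email>
  </author>
</authors>
"""


def apply_templates(context, select):
    """Run the registered template for each selected child, in document order."""
    return "".join(TEMPLATES[node.tag](node) for node in context.findall(select))


def authors_template(node):
    # No processing of its own: just pass control on to the author children.
    return apply_templates(node, "author")


def author_template(node):
    return (
        f"<h2>{node.findtext('name')}</h2>\n"
        f"Affiliation: {node.findtext('affiliation')}<br>\n"
        f"Email: {node.findtext('email')}\n"
        "<p>\n"
    )


TEMPLATES = {"authors": authors_template, "author": author_template}


def root_template(root_element):
    """Plays the role of the template that matches the document root."""
    body = TEMPLATES[root_element.tag](root_element)
    return (
        "<html>\n<head><title>Authors</title></head>\n"
        f'<body bgcolor="white">\n{body}</body>\n</html>'
    )


html = root_template(ET.fromstring(AUTHORS))
```

The author template is written once and applied to every author child, which is the saving that auxiliary templates are meant to provide.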

148
Explanation of the Example (2)
  • It is good practice to define a template for each
    element type in the document
  • Even if no specific processing is applied to
    certain elements, the xsl:apply-templates element
    should be used
  • E.g. authors
  • In this way, we work from the root to the leaves
    of the tree, and all templates are applied

149
Processing XML Attributes
  • Suppose we wish to transform to itself the
    element
  • <person firstname="John" lastname="Woo"/>
  • Wrong solution:
  • <xsl:template match="person">
  • <person firstname="<xsl:value-of
    select="@firstname">"
    lastname="<xsl:value-of select="@lastname">"/>
  • </xsl:template>

150
Processing XML Attributes (2)
  • Not well-formed because tags are not allowed
    within the values of attributes
  • We wish to add attribute values into the template
  • <xsl:template match="person">
  • <person firstname="{@firstname}"
    lastname="{@lastname}"/>
  • </xsl:template>
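
The well-formedness claim can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` is a hypothetical name, the wrong solution is shortened to one attribute, and the xsl prefix is bound to the XSLT namespace so that the parser accepts it.

```python
import xml.etree.ElementTree as ET

# A tag inside an attribute value, as in the wrong solution.
WRONG = """
<xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="person">
  <person firstname="<xsl:value-of select="@firstname"/>"/>
</xsl:template>
"""

# Curly braces inside attribute values, as in the corrected solution.
RIGHT = """
<xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="person">
  <person firstname="{@firstname}" lastname="{@lastname}"/>
</xsl:template>
"""


def is_well_formed(text):
    """True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

The parser only confirms that the second version is well-formed XML; expanding the expressions in curly braces is the job of an XSLT processor.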

151
Transforming an XML Document to Another
152
Transforming an XML Document to Another (2)
  • <xsl:template match="/">
  • <?xml version="1.0" encoding="UTF-16"?>
  • <authors>
  • <xsl:apply-templates select="authors"/>
  • </authors>
  • </xsl:template>
  • <xsl:template match="authors">
  • <author>
  • <xsl:apply-templates select="author"/>
  • </author>
  • </xsl:template>

153
Transforming an XML Document to Another (3)
  • <xsl:template match="author">
  • <name><xsl:value-of select="name"/></name>
  • <contact>
  • <institution>
  • <xsl:value-of select="affiliation"/>
  • </institution>
  • <email><xsl:value-of select="email"/></email>
  • </contact>
  • </xsl:template>

154
Summary
  • XML is a metalanguage that allows users to define
    markup
  • XML separates content and structure from
    formatting
  • XML is the de facto standard for the
    representation and exchange of structured
    information on the Web
  • XML is supported by query languages

155
Points for Discussion in Subsequent Chapters
  • The nesting of tags does not have a standard
    meaning
  • The semantics of XML documents is not accessible
    to machines, only to people
  • Collaboration and exchange are supported if there
    is an underlying shared understanding of the
    vocabulary
  • XML is well-suited for close collaboration, where
    domain- or community-based vocabularies are used
  • It is not so well-suited for global communication.

156
Chapter 3 Describing Web Resources in RDF
  • Grigoris Antoniou
  • Frank van Harmelen

157
Lecture Outline
  1. Basic Ideas of RDF
  2. XML-based Syntax of RDF
  3. Basic Concepts of RDF Schema
  4. The Language of RDF Schema
  5. The Namespaces of RDF and RDF Schema
  6. Axiomatic Semantics for RDF and RDFS
  7. Direct Semantics based on Inference Rules
  8. Querying of RDF/RDFS Documents using RQL

158
Drawbacks of XML
  • XML is a universal metalanguage for defining
    markup
  • It provides a uniform framework for interchange
    of data and metadata between applications
  • However, XML does not provide any means of
    talking about the semantics (meaning) of data
  • E.g., there is no intended meaning associated
    with the nesting of tags
  • It is up to each application to interpret the
    nesting.

159
Nesting of Tags in XML
  • David Billington is a lecturer of Discrete Maths
  • <course name="Discrete Maths">
  • <lecturer>David Billington</lecturer>
  • </course>
  • <lecturer name="David Billington">
  • <teaches>Discrete Maths</teaches>
  • </lecturer>
  • Opposite nesting, same information!
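
That each application must interpret the nesting itself can be made concrete in Python. In the sketch below, the two hypothetical functions each hard-code one reading of one document shape; nothing in the XML tells a program that both shapes state the same fact.

```python
import xml.etree.ElementTree as ET

BY_COURSE = """
<course name="Discrete Maths">
  <lecturer>David Billington</lecturer>
</course>
"""

BY_LECTURER = """
<lecturer name="David Billington">
  <teaches>Discrete Maths</teaches>
</lecturer>
"""


def fact_from_course(doc):
    """Read (lecturer, course) from the course-centred nesting."""
    course = ET.fromstring(doc)
    return (course.findtext("lecturer"), course.get("name"))


def fact_from_lecturer(doc):
    """Read (lecturer, course) from the lecturer-centred nesting."""
    lecturer = ET.fromstring(doc)
    return (lecturer.get("name"), lecturer.findtext("teaches"))
```

Both functions return the same pair, but only because a programmer who understood the intended meaning wrote a separate extractor for each nesting.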

160
Basic Ideas of RDF
  • Basic building block: object-attribute-value
    triple
  • It is called a statement
  • The sentence about Billington is such a statement
  • RDF has been given a syntax in XML
  • This syntax inherits the benefits of XML
  • Other syntactic representations of RDF are possible

161
Basic Ideas of RDF (2)
  • The fundamental concepts of RDF are
  • resources
  • properties
  • statements

162
Resources
  • We can think of a resource as an object, a
    thing we want to talk about
  • E.g. authors, books, publishers, places, people,
    hotels
  • Every resource has a URI, a Uniform Resource
    Identifier
  • A URI can be
  • a URL (Web address) or
  • some other kind of unique identifier

163
Properties
  • Properties are a special kind of resource
  • They describe relations between resources
  • E.g. written by, age, title, etc.
  • Properties are also identified by URIs
  • Advantages of using URIs
  • A global, worldwide, unique naming scheme
  • Reduces the homonym problem of distributed data
    representation

164
Statements
  • Statements assert the properties of resources
  • A statement is an object-attribute-value triple
  • It consists of a resource, a property, and a
    value
  • Values can be resources or literals
  • Literals are atomic values (strings)

165
Three Views of a Statement
  • A triple
  • A piece of a graph
  • A piece of XML code
  • Thus an RDF document can be viewed as
  • A set of triples
  • A graph (semantic net)
  • An XML document

166
Statements as Triples
  • (David Billington,
    http://www.mydomain.org/site-owner,
    http://www.cit.gu.edu.au/db)
  • The triple (x,P,y) can be considered as a logical
    formula P(x,y)
  • Binary predicate P relates object x to object y
  • RDF offers only binary predicates (properties)
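
Reading a triple (x,P,y) as the formula P(x,y) can be sketched in a few lines of Python. The `Triple` class and the `holds` function are hypothetical names chosen for illustration; the single statement is the one given on the slide, with its components in the same order.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    x: str  # first argument of the predicate
    P: str  # the property, itself identified by a URI
    y: str  # second argument of the predicate


statements = {
    Triple(
        "David Billington",
        "http://www.mydomain.org/site-owner",
        "http://www.cit.gu.edu.au/db",
    )
}


def holds(P, x, y):
    """True if the binary predicate P(x, y) has been asserted."""
    return Triple(x, P, y) in statements
```

Because every property is binary, a set of three-component tuples is enough to hold any collection of statements.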

167
XML Vocabularies
  • A directed graph with labeled nodes and arcs
  • from the resource (the subject of the statement)
  • to the value (the object of the statement)
  • Known in AI as a semantic net
  • The value of a statement may be a resource
  • It may be linked to other resources

168
A Set of Triples as a Semantic Net
169
Statements in XML Syntax
  • Graphs are a powerful tool for human
    understanding but
  • The Semantic Web vision requires
    machine-accessible and machine-processable
    representations
  • There is a 3rd representation based on XML
  • But XML is not a part of the RDF data model
  • E.g. serialisation of XML is irrelevant for RDF

170
Statements in XML (2)
  • <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://www.mydomain.org/my-rdf-ns">
  • <rdf:Description
    rdf:about="http://www.cit.gu.edu.au/db">
  • <mydomain:site-owner>
  • David Billington
  • </mydomain:site-owner>
  • </rdf:Description>
  • </rdf:RDF>
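
To connect the XML view back to the triple view, the document above can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch that handles only this simple shape (rdf:Description elements with literal-valued properties); it is not a general RDF/XML parser. Each triple is listed as resource, property, value, and the property is kept in ElementTree's "{namespace}localname" form.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://www.mydomain.org/my-rdf-ns">
  <rdf:Description rdf:about="http://www.cit.gu.edu.au/db">
    <mydomain:site-owner>
      David Billington
    </mydomain:site-owner>
  </rdf:Description>
</rdf:RDF>
"""

triples = []
for description in ET.fromstring(DOC).findall(f"{{{RDF_NS}}}Description"):
    # rdf:about names the resource that the description talks about.
    resource = description.get(f"{{{RDF_NS}}}about")
    for prop in description:
        # Each child element is a property; its text is the literal value.
        triples.append((resource, prop.tag, prop.text.strip()))
```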

171
Statements in XML (3)
  • An RDF document is represented by an XML element
    with the tag rdf:RDF
  • The content of this element is a number of
    descriptions, which use rdf:Description tags.
  • E