COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser, Spring 2012, 21 February 2012

1
COMS E6125 Web-enHanced Information Management
(WHIM)
  • Prof. Gail Kaiser
  • Spring 2012

2
Today's Topic
  • Introduction to the Semantic Web
  • RDF
  • Ontologies

3
Simplicity is Good
  • The World Wide Web contains huge amounts of
    information created by many different
    organizations, communities and individuals for
    many different reasons
  • Web users can easily access this information by
    specifying a known URL or using a search engine,
    and following links to find other related
    resources
  • This simplicity is a key aspect that made the Web
    so popular

4
Simplicity is Bad
  • The simplicity of the current Web has a price
  • It is very easy to get lost, or discover
    irrelevant or unrelated information
  • For instance, if we search for courses taught by
    a person named Gail Kaiser, we might find all
    kinds of other information
  • https://www.google.com/search?q=courses+taught+by+gail+kaiser&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
  • The problem is that the search engine does not know
    what "courses" or "taught" means

5
Machine-accessible meaning (What it's like
to be a machine)
6
So what does this mean?
  • What's a CV?
  • What's a name?
  • Etc.
  • Need semantics

7
What to do?
  • Develop enabling standards and technologies
  • to help machines understand more information on
    the Web
  • so that they can support richer discovery, data
    integration, navigation and automation of tasks

8
Add Metadata
  • Associate semantically rich, descriptive
    information with any resource
  • For instance, add metadata about teaching, so we
    can search for documents that have metadata
    specifying Gail Kaiser as a teacher (or
    instructor)

9
The Semantic Web
  • Provides a common framework that allows data to
    be shared and reused across application,
    enterprise and community boundaries
  • Provides URLs not only for documents, but also
    for people, concepts and relationships
  • By giving unique identifiers to the person, the
    role "teacher" and the concept of "course", we
    make very clear who the person is and the
    corresponding relation between this person and a
    particular document

10
What's the difference?
  • Most Web content today is designed for humans to
    read, not for computer programs to manipulate
    meaningfully
  • Computers can adeptly parse Web pages for layout
    and routine processing (here a header, there a
    link to another page), but in general, computers
    have no reliable way to process the semantics
  • The Semantic Web brings structure to the
    meaningful content of Web pages, creating an
    environment where software agents roaming from
    page to page can carry out sophisticated tasks
    for users

11
What's the difference?
"The Semantic Web is not a separate web but an
extension of the current web, in which
information is given well-defined meaning, better
enabling computers and people to work in
co-operation." (Berners-Lee et al., 2001)
12
Wasn't that what XML was supposed to do?
  • Yes and no
  • For the Semantic Web to function, computers must
    have access to structured collections of
    information and to sets of inference rules that
    they can use to conduct automated reasoning

13
Isn't that just Knowledge Representation?
  • Traditional knowledge representation systems
    typically have been centralized, requiring
    everyone to share exactly the same definition of
    common concepts such as "parent" or "vehicle"
  • But central control is stifling, and doesn't
    scale
  • Which is why centralized hypertext link servers
    were abandoned for the WWW

14
What about Web Services?
  • Web services are computational programs accessed
    using Web technologies
  • They may or may not operate on Web pages as data
  • But when they do, the semantics are implied by
    WSDL descriptions but basically hidden inside the
    code
  • There is no way for an arbitrary Web service or
    other program to understand the semantics of
    Web pages

15
Semantic Web Layers (T. Berners-Lee)
16
Start with XML, not HTML
HTML:
<H1>WHIM</H1>
<UL>
  <LI>Instructor: Gail Kaiser
  <LI>Students: Donald Duck
</UL>
XML:
<course date="Spring 2012">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>
17
XML: document = labeled tree
  • node = label + attr/values + contents

<course date="...">
  <title>...</title>
  <instructor>...</instructor>
  <name>...</name>
  <http>...</http>
  <students>...</students>
</course>
  • XML Schema: grammars for describing legal
    trees and datatypes
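
The "labeled tree" view can be made concrete with a short sketch. It assumes Python's standard xml.etree.ElementTree module and reuses the course document from the previous slide as sample input; the helper name describe is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample input: the course document from the "Start with XML" slide.
COURSE_XML = """
<course date="Spring 2012">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>
"""

def describe(node, depth=0):
    """Return one line per node: label, attribute/value pairs, text contents."""
    text = (node.text or "").strip()
    lines = [f"{'  ' * depth}{node.tag} {dict(node.attrib)} {text!r}"]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(COURSE_XML)
for line in describe(root):
    print(line)
```

The parser recovers the tree structure, but nothing in it says what "instructor" or "title" means; that gap is the subject of the next slides.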

18
Why not use XML Tags to represent Semantics?
  • Syntax: the structure of your data
  • Semantics: the meaning of your data
  • Two conditions necessary for interoperability:
  • Adopt a common syntax: enables applications to
    parse the data
  • Adopt a means for understanding the semantics:
    enables applications to use the data
19
XML and Semantics?
  • <title> <title>
  • But what does "title" mean?
  • If we ask Google, we get (on the 1st page)
  • Sports equipment and competitions
  • Prefix or suffix added to a person's name
  • Laws regarding rights to a piece of property
  • HTML tag
  • Women's underwear
  • Library search for books
  • A research paper on ENVIRONMENTAL AND ECONOMIC
    COSTS ASSOCIATED WITH NON-INDIGENOUS SPECIES IN
    THE UNITED STATES

20
XML Limitations for Semantic Markup
  • XML makes no commitment on:
  • (1) Domain-specific vocabulary
  • (2) Modeling primitives
  • Requires pre-arranged agreement on (1) and (2)
  • Only feasible for closed collaboration
  • agents in a small stable community
  • pages on a small stable intranet
  • Not suited for sharing Web resources

21
XML ≠ machine-accessible meaning
22
Beyond XML
  • XML lets everyone create their own tags
  • Scripts, or programs, can make use of these tags
    in sophisticated ways - but the programmer has to
    know what the page writer uses each tag for
  • XML allows users to add structure to their
    documents but says nothing about what the
    structures mean

23
Semantic Web Layers
24
Add RDF: Resource Description Framework
  • Encodes meaning in sets of triples - subject,
    predicate and object - analogous to the subject,
    verb and object of an elementary sentence
  • Makes assertions that particular things (people,
    Web pages or whatever) have properties (such as
    "is a sister of", "is the author of") with
    certain values (another person, another Web page)
  • This structure can describe much of the data
    processed by machines
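
As a minimal sketch of the triple idea, each statement can be held as a plain (subject, predicate, object) tuple. The people, pages and the match helper below are made-up illustrations, not part of RDF itself.

```python
# Each statement is a (subject, predicate, object) triple.
# The names and example.org pages are hypothetical.
triples = [
    ("Alice", "is a sister of", "Bob"),
    ("Alice", "is the author of", "http://example.org/page.html"),
    ("Bob", "is the author of", "http://example.org/other.html"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Every authorship statement, whoever the author is.
authored = match(triples, predicate="is the author of")
print(authored)
```

Leaving a field as None turns it into a wildcard, which is enough to ask "who wrote what?" without any knowledge of the particular application that produced the data.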

25
Example
  • Imagine that we want to state the fact that
    someone named Gail Kaiser wrote a particular Web
    page
  • A straightforward way to state this in English
    would be in the form of a simple statement such
    as:
  • http://www.cs.columbia.edu/kaiser/index.html has
    an author whose value is Gail Kaiser

26
Making Statements about Resources
  • We need a way to identify the thing we want to
    describe (the Web page)
  • We need a way to identify a specific property
    (author) of the thing that we want to describe
  • We need a way to identify the thing we want to
    assign as the value of this property (who the
    author is), for the thing we want to describe

27
Making Statements about Resources
  • In the example, we used the Web page's URL
    (Uniform Resource Locator) to identify it -
    subject
  • We used the word "author" to identify the
    property we want to talk about - predicate
  • And the phrase "Gail Kaiser" to identify the
    thing (a person) we want to say is the value of
    this property - object

28
Many Statements can be made
  • We could state other properties of this Web page
    by writing additional English statements of the
    same general form
  • http://www.cs.columbia.edu/kaiser/index.html has
    a modification-date whose value is February 01,
    2012
  • http://www.cs.columbia.edu/kaiser/index.html has
    a size whose value is 20,860 bytes

29
But what do these Statements actually mean?
  • Subject and object can each be identified by a
    URL, just as used in a link on a Web page
  • The verbs (predicates) can also be identified
    by URLs, which enables anyone to define a new
    concept, a new predicate, just by defining a URL
    for it somewhere on the Web (a Web resource)
  • The URLs ensure that concepts are not just words
    in a document, but are tied to a unique
    definition that everyone can find on the Web
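
A small sketch of why identifying predicates by URL matters: two vocabularies can both use the word "title" without colliding. The Dublin Core namespace below is the real one; the sports vocabulary, the team URL and the property values are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
SPORTS = "http://example.org/sports#"     # hypothetical vocabulary

PAGE = "http://www.cs.columbia.edu/kaiser/index.html"

statements = [
    (PAGE, DC + "creator", "Gail Kaiser"),
    (PAGE, DC + "title", "Home page"),  # made-up value
    ("http://example.org/teams/lions", SPORTS + "title", "League champion"),
]

def values(store, subject, predicate_uri):
    """Return the objects of all statements with this subject and predicate."""
    return [o for (s, p, o) in store if s == subject and p == predicate_uri]

print(values(statements, PAGE, DC + "title"))
print(values(statements, PAGE, SPORTS + "title"))
```

The two "title" predicates are different strings, so a program never confuses a document title with a sports title.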

30
Web Resources
  • RDF is a language for representing information
    about resources on the World Wide Web
  • It is particularly intended for representing
    metadata about Web resources, such as the title,
    author, modification date and size of a Web page

31
Generalized Resources
  • By generalizing the concept of a Web resource,
    RDF can be used to represent information about
    things that can be identified on the Web, even
    when they can't be directly retrieved on the Web
  • Examples include the author of the web page

32
Reconsider Example
  • http://www.cs.columbia.edu/kaiser/index.html has
    an author whose value is Gail Kaiser
  • Neither the notion of an "author" nor "Gail Kaiser"
    can be retrieved from the Web
  • Thus we need URIs in addition to URLs

33
Concept Graphs
  • RDF is based on the idea of identifying things
    using URIs
  • And describing resources (subjects) in terms of
    simple properties (verbs or predicates) and
    property values (objects)
  • This enables RDF to represent related concepts as
    a graph of nodes and arcs representing the
    resources, their properties and values
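
The graph view can be sketched by indexing triples by subject and then following arcs from node to node. All URIs, property names and the e-mail address below are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical triples; the object of the first is the subject of the others.
triples = [
    ("http://example.org/course/whim", "instructor", "http://example.org/people/kaiser"),
    ("http://example.org/people/kaiser", "name", "Gail Kaiser"),
    ("http://example.org/people/kaiser", "email", "kaiser@example.org"),
]

def build_graph(store):
    """Index the triples as node -> list of (arc label, target node)."""
    graph = defaultdict(list)
    for s, p, o in store:
        graph[s].append((p, o))
    return graph

def follow(graph, start, path):
    """Follow a chain of predicates from start; return every node reached."""
    frontier = [start]
    for predicate in path:
        frontier = [
            o for node in frontier
            for (p, o) in graph.get(node, [])
            if p == predicate
        ]
    return frontier

graph = build_graph(triples)
print(follow(graph, "http://example.org/course/whim", ["instructor", "email"]))
```

Because the value of one property is itself a node, statements chain together, and a query can walk several arcs in a row.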

34
Concept Graph Example
  • XML syntax
  • Chained triples form a graph

[Figure: the node "Kaiser" linked by an "email" arc to "kaiser6125_at_..."]
35
Information Exchange
  • RDF provides a common framework for expressing
    this information so it can be exchanged between
    applications without loss of meaning
  • The ability to exchange information between
    different applications means that the information
    may be made available to applications other than
    those for which it was originally created
  • Application designers can leverage the
    availability of common RDF parsers and processing
    tools
  • RDF can be written in XML format, further leveraging
    XML tools and experience

36
What is RDF (again) ?
  • RDF is a data model
  • the model is domain-neutral and
    application-neutral
  • the model can be viewed as directed, labeled
    graphs or as an object-oriented model
    (object/attribute/value)
  • RDF data model is an abstract, conceptual layer
    independent of XML
  • consequently, XML is a transfer syntax for RDF,
    not a component of RDF
  • RDF data might never occur in XML form

37
RDF Model
  • RDF statements consist of
  • resources (= nodes)
  • which have properties
  • which have values (= nodes, strings)

[Figure: a statement drawn as subject, predicate, object]
38
RDF Model
http://www.w3.org/TR/REC-rdf-syntax/ has the
editor Dave Beckett
39
RDF Model Example
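[Figure: graph of statements about one resource]
  • http://www.w3.org/TR/REC-rdf-syntax/, dc:Publisher, W3C
  • http://www.w3.org/TR/REC-rdf-syntax/, dc:Creator, Dave Beckett
  • http://www.w3.org/TR/REC-rdf-syntax/, dc:Date, 2004-02-10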
40
Complex Values
  • So far, values of properties have been strings
  • A graph node (corresponding to a resource) also
    can be the value of a property
  • arbitrarily complex tree and graph structures are
    possible
  • syntactically, values can be embedded (i.e.,
    lexically in-line) or referenced (linked)

41
Complex Values
42
Complex Values
  • Corresponding triples
  • http://www.w3.org/TR/REC-rdf-syntax/,
    dc:Creator, x
  • x, p:Name, Dave Beckett
  • x, p:EMail, dave_at_dajobe.org

43
Containers
  • Containers are collections - allow grouping of
    resources (or literal values)
  • It is possible to make statements about the
    container (as a whole) or about its members
    individually
  • Different types of containers:
  • bag - unordered collection
  • seq - ordered collection (= sequence)
  • alt - represents alternatives
  • It is possible to create collections based on URI
    patterns, e.g., all files in a particular web
    site
  • Duplicate values are permitted - no mechanism to
    enforce unique value constraints
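
A container can itself be written as triples: one rdf:type statement naming the container type, plus numbered membership properties (rdf:_1, rdf:_2, ...). The sketch below uses the real RDF namespace; the container URI, the helper name and the second student name are made up.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def container(node, kind, members):
    """Return triples describing `node` as an rdf:Bag, rdf:Seq or rdf:Alt."""
    if kind not in ("Bag", "Seq", "Alt"):
        raise ValueError(f"unknown container type: {kind}")
    triples = [(node, RDF + "type", RDF + kind)]
    # Members are attached with the numbered properties rdf:_1, rdf:_2, ...
    for i, member in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{i}", member))
    return triples

# Hypothetical container; note the repeated member is accepted as-is.
students = container(
    "http://example.org/whim/students",
    "Seq",
    ["Donald Duck", "Daisy Duck", "Donald Duck"],
)
for triple in students:
    print(triple)
```

Nothing rejects the duplicate entry, which matches the last bullet above: uniqueness has to be enforced by the application if it matters.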

44
Containers
45
Higher-order Statements
  • One can make RDF statements about other RDF
    statements
  • Example: The Library of Congress names Dave
    Beckett as the author of the RDF Syntax spec
  • Allow us to express beliefs (and other
    modalities)
  • Important for trust models, digital signatures,
    etc.
  • Constitute metadata about metadata
  • Represented by modeling RDF in RDF itself

46
Reification
  • The dotted box corresponds to the following
    statements
  • x, rdf:predicate, dc:creator
  • x, rdf:subject, http://www.w3.org/TR/REC-rdf-syntax
  • x, rdf:object, Dave Beckett
  • x, rdf:type, rdf:Statement
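
The four reification statements can be generated mechanically from any triple. This is a sketch: the rdf: and dc: namespaces are the real ones, while the node label "_:x" and the assertedBy property are assumptions made for the illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

def reify(statement_id, triple):
    """Describe a triple as a resource of type rdf:Statement."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]

claim = ("http://www.w3.org/TR/REC-rdf-syntax", DC + "creator", "Dave Beckett")
graph = reify("_:x", claim)

# A statement about the statement; "assertedBy" is a hypothetical property.
graph.append(("_:x", "http://example.org/terms#assertedBy", "The Library of Congress"))

for triple in graph:
    print(triple)
```

The reified graph talks about the claim without asserting it: the original triple is not itself a member of the graph, which is what lets RDF record who said something without endorsing it.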

47
Reification
  • Reification allows a computer to process an
    abstraction as if it were any other datum
  • RDF is not really second-order
  • But it does provide a built-in predicate
    vocabulary for reification

48
Reification
  • Any statement can be an object
  • (graphs can be nested)

[Figure: the node "NYT" linked by a "claims" arc to a nested statement]
49
RDF Schema
  • Defines small vocabulary for RDF
  • Class, subClassOf, type
  • Property, subPropertyOf
  • domain, range
  • Organizes this vocabulary in a typed hierarchy
  • Vocabulary can be used to define other
    vocabularies for your application domain

[Figure: class hierarchy with Person, Student and Researcher (subClassOf), the property hasSuperVisor (domain, range), and the instances Swap and Gail (type)]
50
RDF Schema syntax in XML
<rdf:Description ID="MotorVehicle">
  <rdf:type resource="http://www.w3.org/...Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/...Resource"/>
</rdf:Description>
<rdf:Description ID="Truck">
  <rdf:type resource="http://www.w3.org/...Class"/>
  <rdfs:subClassOf rdf:resource="MotorVehicle"/>
</rdf:Description>
<rdf:Description ID="registeredTo">
  <rdf:type resource="http://www.w3.org/...Property"/>
  <rdfs:domain rdf:resource="MotorVehicle"/>
  <rdfs:range rdf:resource="Person"/>
</rdf:Description>
<rdf:Description ID="ownedBy">
  <rdf:type resource="http://www.w3.org/...Property"/>
  <rdfs:subPropertyOf rdf:resource="registeredTo"/>
</rdf:Description>
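
What a schema like this licenses can be sketched with plain dictionaries. The class and property names come from the slide; the dictionary layout, the function names and the instance "truck42" are assumptions made for the illustration, not an RDF Schema API.

```python
# Schema from the slide, flattened into lookup tables.
SUBCLASS = {"Truck": "MotorVehicle", "MotorVehicle": "Resource"}
SUBPROPERTY = {"ownedBy": "registeredTo"}
DOMAIN = {"registeredTo": "MotorVehicle"}
RANGE = {"registeredTo": "Person"}

def superclasses(cls):
    """Follow subClassOf upward and return every ancestor class."""
    result = []
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        result.append(cls)
    return result

def infer_types(subject, prop, obj):
    """Types implied for subject and object by one (subject, prop, obj) statement."""
    types = {subject: set(), obj: set()}
    while prop is not None:
        if prop in DOMAIN:
            cls = DOMAIN[prop]
            types[subject].update([cls, *superclasses(cls)])
        if prop in RANGE:
            cls = RANGE[prop]
            types[obj].update([cls, *superclasses(cls)])
        # A sub-property inherits the domain and range of its parent.
        prop = SUBPROPERTY.get(prop)
    return types

print(superclasses("Truck"))
print(infer_types("truck42", "ownedBy", "Gail Kaiser"))
```

From the single statement "truck42 ownedBy Gail Kaiser", the schema implies that truck42 is a MotorVehicle and that Gail Kaiser is a Person, even though neither fact was stated.
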
51
Conclusions about RDF
  • Next step up from plain XML
  • modeling primitives
  • possible to define vocabulary
  • However
  • no precisely described meaning
  • no inference model
  • Problematic examples
  • Columbus believed that the world is flat
  • Gloria believes that the Web should be delivered
    on CD-ROM

52
Where do we get the precisely defined meaning?
  • Two databases may use different identifiers for
    the same concept, such as zip code vs. postal
    code
  • A program that wants to compare or combine
    information across the two databases has to know
    that these two terms mean the same thing
  • The program must have a way to discover such
    common meanings for whatever databases it
    encounters
  • A solution to this problem is provided by
    collections of information called ontologies

53
Semantic Web Layers
54
What is an Ontology?
  • In philosophy, an ontology is a theory about the
    nature of existence, of what types of things
    exist; ontology as a discipline studies such
    theories
  • Semantic Web researchers (and various other
    communities) have co-opted the term for their own
    jargon
  • For Semantic Web researchers, an ontology is a
    document or file that formally defines the
    relationships among terms
  • The most typical kind of ontology for the Web has
    a taxonomy and a set of inference rules

55
What is a Taxonomy?
Taxonomy: segmentation, classification and
ordering of elements into a classification system
according to the relationships between each other
56
Taxonomies
  • A taxonomy defines classes of objects and
    relations among them
  • For example, an address may be defined as a type
    of location, and city codes may be defined to
    apply only to locations
  • If city codes must be of type city and cities
    generally have Web sites, we can discuss the Web
    site associated with a city code even if no
    database links a city code directly to a Web site

57
An Ontology also provides a form of Thesaurus
[Figure: graph of terms (Object, Person, Topic, Document, Student, Researcher, Semantics, PhD Student, Doctoral Student, Ontology, F-Logic) connected by "similar" and "synonym" relationships]
  • Terminology for specific domain
  • Graph with primitives, fixed relationships
    (similar, synonym)

58
An Ontology also provides a Topic Map
  • Topics (nodes), relationships and occurrences
    (to documents)
  • Useful for navigation and visualization

59
The Taxonomy is Augmented by Inference Rules
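[Figure: taxonomy of Object, Person, Topic, Document, Researcher and Student, with the relations described_in and writes, and the instance Swapneel Sheth (attribute Tel) attached by instance_of. Rules over the variables D, T and P relate described_in to is_about, and derive knows from writes and is_about]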
60
Inference Rules
  • An ontology may express the rule "If a city code
    is associated with a state code, and an address
    uses that city code, then that address has the
    associated state code"
  • A program could then deduce, for instance, that a
    Columbia University address, being in New York
    City, must be in New York State, which is in the
    U.S., and therefore should be formatted to U.S.
    standards
  • The computer doesn't truly understand any of
    this information
  • But it can now manipulate the terms much more
    effectively in ways that are useful and
    meaningful to the human user
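
The city-code rule can be sketched as a small derivation step. The codes, the lookup tables and the function name are made up for the illustration; a real ontology would express the rule declaratively rather than in program code.

```python
# Hypothetical facts an ontology might supply.
CITY_STATE = {"NYC": "NY"}      # city code -> state code
STATE_COUNTRY = {"NY": "US"}    # state code -> country

addresses = {"Columbia University": {"city_code": "NYC"}}

def infer(address):
    """Apply the rules once each and return the address with derived fields."""
    derived = dict(address)  # leave the stored record untouched
    city = derived.get("city_code")
    if city in CITY_STATE:
        # Rule: an address using a city code has the associated state code.
        derived["state_code"] = CITY_STATE[city]
    state = derived.get("state_code")
    if state in STATE_COUNTRY:
        derived["country"] = STATE_COUNTRY[state]
    return derived

print(infer(addresses["Columbia University"]))
```

The program only manipulates symbols, yet the derived country field is exactly what a formatter needs to choose U.S. address conventions.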

61
Solution to Terminology Problems
  • The meaning of terms or XML tags used on a Web
    page can be defined by pointers from the page to
    an ontology
  • The same problems as before now arise if I point
    to an ontology that defines addresses as
    containing a "zip code" and you point to one that
    uses "postal code"
  • This can be resolved if ontologies (or other Web
    services) provide equivalence relations: one or
    both of our ontologies may contain the
    information that my "zip code" is equivalent to
    your "postal code"
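
An equivalence relation between two vocabularies can be sketched as a term mapping applied before records are combined. Both vocabulary URIs, the sample records and the function names are hypothetical.

```python
MINE_ZIP = "http://example.org/mine#zipCode"
YOURS_POSTAL = "http://example.org/yours#postalCode"

# Equivalence published by one of the ontologies: my term maps to yours.
EQUIVALENT = {MINE_ZIP: YOURS_POSTAL}

def canonical(term):
    """Replace a term by its declared equivalent, if there is one."""
    return EQUIVALENT.get(term, term)

def merge(*databases):
    """Combine records from several databases under one set of terms."""
    merged = []
    for records in databases:
        for record in records:
            merged.append({canonical(k): v for k, v in record.items()})
    return merged

mine = [{"name": "Alice", MINE_ZIP: "10027"}]
yours = [{"name": "Bob", YOURS_POSTAL: "10025"}]

combined = merge(mine, yours)
print([r[YOURS_POSTAL] for r in combined])
```

After the merge both records answer to the same property, so a program can compare them without knowing that the sources used different words.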

62
Using Ontologies
  • Ontologies can be used in a simple fashion to
    improve the accuracy of Web searches
  • The search program can look for only those pages
    that refer to a precise concept instead of all
    the ones using ambiguous keywords
  • More advanced applications could use ontologies
    to relate the information on a page to the
    associated knowledge structures and inference
    rules

63
Example
  • Suppose you wish to find the Ms. Cook you met at
    a trade conference last year
  • You don't remember her first name, but you
    remember that she worked for one of your clients
    and that her brother was a student at your alma
    mater

64
Example
  • An intelligent search program can sift through
    all the pages of people whose name is Cook
  • Sidestep all the pages relating to cooks,
    cooking, the Cook Islands and so forth
  • Find the person named Cook who works for a
    company that's on your client list
  • And follow links to Web pages of their relatives
    to track down if any are in school at the right
    place
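
The search for Ms. Cook can be sketched as a query over triples. Every identifier, company, property name and fact below is invented for the illustration, and the relative is narrowed to a brother as in the previous slide.

```python
# Made-up facts an agent might have collected from Web pages.
FACTS = [
    ("person:1", "surname", "Cook"),
    ("person:1", "worksFor", "Acme Corp"),
    ("person:1", "hasBrother", "person:3"),
    ("person:2", "surname", "Cook"),
    ("person:2", "worksFor", "Other Inc"),
    ("person:3", "studiesAt", "Columbia University"),
    ("page:9", "topic", "cooking"),
]

def objects(subject, predicate):
    """All values of one property of one subject."""
    return {o for (s, p, o) in FACTS if s == subject and p == predicate}

def find_cook(clients, alma_mater):
    """People named Cook who work for a client and have a brother at the school."""
    matches = []
    for (person, predicate, value) in FACTS:
        if predicate != "surname" or value != "Cook":
            continue  # skips pages about cooking, which have no such surname
        works_for_client = bool(objects(person, "worksFor") & set(clients))
        brother_at_school = any(
            alma_mater in objects(brother, "studiesAt")
            for brother in objects(person, "hasBrother")
        )
        if works_for_client and brother_at_school:
            matches.append(person)
    return matches

print(find_cook(["Acme Corp"], "Columbia University"))
```

Matching on the surname property rather than on the keyword "cook" is what lets the query sidestep the cooking page entirely.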

65
Agents
  • The real power of the Semantic Web will be
    realized when people create (many) programs that
    collect Web content from diverse sources, process
    the information and exchange the results with
    other programs
  • The effectiveness of such software agents will
    increase exponentially as more machine-readable
    Web content and automated services (including
    other agents) become available

66
Proofs
  • The Semantic Web promotes this synergy: even
    agents that were not expressly designed to work
    together can transfer data among themselves when
    the data comes with semantics
  • An important facet of agents' functioning will be
    the exchange of proofs

67
Example
  • Suppose Ms. Cook's contact information has been
    located by an online service, which places her in
    Baghdad
  • You want to check this, so your computer asks the
    service for a proof of its answer
  • An inference engine on your computer verifies
    this proof, i.e., that this Ms. Cook indeed
    matches the one you were seeking, and it can show
    you the relevant Web pages if you still have
    doubts

68
Service Discovery
  • Many automated Web-based services already exist
    without semantics
  • But current service discovery initiatives attack
    the problem at a structural or syntactic level,
    and rely heavily on standardization of a
    predetermined set of functionality descriptions

69
Service Discovery
  • Other programs such as agents have no way to
    locate a service that will perform a specific
    function
  • This process can happen only when there is a
    common language to describe a service in a way
    that lets other agents understand both the
    function offered and how to take advantage of it
  • The consumer and producer agents can reach a
    shared understanding by exchanging ontologies,
    which provide the vocabulary needed for
    discussion
  • Semantics also makes it easier to take advantage
    of a service that only partially matches a
    request

70
Non-Web Applications
  • The Semantic Web can extend into our physical
    world
  • URIs can point to anything, including physical
    entities, which means we can use RDF to describe
    devices such as cell phones and TVs
  • Such devices can advertise their functionality
    (what they can do and how they are controlled),
    much like software agents
  • Semantic descriptions of device capabilities and
    functionality will let us achieve home
    automation with minimal human intervention

71
Examples
  • When you answer your phone, other sound is
    automatically turned down
  • Instead of having to program each specific
    appliance, you could program such a function once
    and for all to cover every local device that
    advertises having a volume control: the TV, the
    DVD player, the media players on the laptop,
    and so on
  • Your Web-enabled microwave oven consults the
    frozen-food manufacturer's Web site for optimal
    cooking parameters
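
The "program it once" idea can be sketched as a rule that selects devices by advertised capability rather than by name. The device records and the function are made up; real devices would advertise their capabilities through semantic descriptions rather than Python dictionaries.

```python
# Hypothetical capability advertisements from local devices.
devices = [
    {"name": "TV", "capabilities": {"power", "volume"}, "volume": 7},
    {"name": "DVD player", "capabilities": {"power", "volume"}, "volume": 4},
    {"name": "Microwave oven", "capabilities": {"power", "timer"}},
]

def turn_down_sound(devices, level=1):
    """Lower every device that advertises a volume control; return their names."""
    lowered = []
    for device in devices:
        if "volume" in device["capabilities"]:
            device["volume"] = min(device["volume"], level)
            lowered.append(device["name"])
    return lowered
```

Calling turn_down_sound(devices) when the phone is answered affects the TV and the DVD player and ignores the microwave oven; a newly bought device is covered as soon as it advertises a volume control, with no change to the rule.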

72
OWL Delivers Ontologies that Work on the Web
  • What's needed next is a way to develop
    domain-specific vocabularies
  • An ontology defines the terms used to describe
    and represent an area of knowledge
  • Ontologies include computer-usable definitions of
    basic concepts in the domain and the
    relationships among them, making that knowledge
    reusable

73
OWL: Web Ontology Language
  • For defining structured, Web-based ontologies
    enabling richer integration and interoperability
    of data among descriptive communities
  • Uses URIs for naming
  • Uses RDF and RDF Schema for description
  • Adds vocabulary for describing relations between
    classes (e.g. disjointness), cardinality (e.g.
    "exactly one"), characteristics of properties
    (e.g. symmetry)
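
Two of these additions, symmetry and disjointness, can be sketched without an OWL reasoner. The class names, the siblingOf property and the individuals are hypothetical, and the functions only imitate what a reasoner would conclude; cardinality is left out because it needs more careful treatment than a short sketch allows.

```python
DISJOINT = [("Person", "MotorVehicle")]  # classes declared to share no members
SYMMETRIC = {"siblingOf"}                # properties declared symmetric

def symmetric_closure(triples):
    """Add the reverse of every statement that uses a symmetric property."""
    closed = set(triples)
    for (s, p, o) in triples:
        if p in SYMMETRIC:
            closed.add((o, p, s))
    return closed

def disjointness_conflicts(types):
    """Report individuals typed with two classes declared disjoint."""
    conflicts = []
    for individual, classes in sorted(types.items()):
        for a, b in DISJOINT:
            if a in classes and b in classes:
                conflicts.append((individual, a, b))
    return conflicts

facts = [("Ann", "siblingOf", "Ben"), ("Ann", "worksFor", "Acme Corp")]
types = {"Ann": {"Person"}, "truck42": {"MotorVehicle", "Person"}}

print(sorted(symmetric_closure(facts)))
print(disjointness_conflicts(types))
```

Symmetry adds a fact that was never stated (Ben siblingOf Ann), while disjointness flags data that cannot be consistent; plain RDF Schema offers neither.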

74
Semantic Web Layers
75
Semantic Web Layers
  • The Unicode and URI layers make sure that we use
    international character sets and provide means
    for identifying the objects in the Semantic Web
  • The XML layer with namespaces and schema
    definitions makes sure we can integrate the
    Semantic Web definitions with other XML-based
    standards

76
Semantic Web Layers
  • RDF and RDF Schema make it possible to make
    statements about objects with URIs and define
    vocabularies that can be referred to by URIs
  • RDF Schema defines the XML vocabulary for defining
    classes, subclasses, properties and subproperties
  • The Ontology layer (OWL) supports the evolution
    of vocabularies as it can define relations
    between the different concepts

77
Semantic Web Layers
  • The top layers, Logic, Proof and Trust, are
    under development
  • The Logic layer will enable the writing of rules
  • The Proof layer will execute the rules
  • The Trust layer together with the Digital
    Signature layer will provide mechanisms for
    applications to determine whether to trust the
    given proof or not

78
Semantic Web Layers
[Figure: Semantic Web layer stack, with layers marked "Work in Progress" or "RFC"]
79
Next Assignment: Midterm Paper
  • Each paper must have a title, an author (with
    contact information), a brief abstract (about 100
    words), an introductory section, some number of
    body sections (3-5 is typical), a concluding
    section, and a bibliographic list of references
    most of which are cited somewhere in the paper
  • Do not simply survey some topic. Instead, compare
    this to that, argue a position in favor of or
    against something, evaluate something according
    to some meaningful criteria, etc.
  • Pretend your reader will be another member of the
    class, who has heard all the same lectures you
    have/will, but may not know anything at all about
    the specifics of your particular topic

80
Midterm Paper: Academic Honesty
  • All copied material must be short and must be
    explicitly quoted and cited
  • Non-copied material based conceptually on
    references must also be cited; do not
    paraphrase, write in your own words
  • Example:
  • "If you don't like the Android phones on the
    market, just wait a minute." [1]
  • [1] David Pogue, "Android Phones Take a Power
    Trip," The New York Times, online edition, February
    8, 2012, http://www.nytimes.com/2012/02/09/technology/personaltech/android-phones-go-on-a-power-trip-state-of-the-art.html

81
Midterm Paper: Logistics
  • Due Tuesday February 28th by 10am
  • Approximately 15 pages (not including figures and
    reference list)
  • Submit by posting in Full Papers folder on
    CourseWorks
  • Must be in a format I can read, and the filename
    must adhere to the required naming convention
    (e.g., Full_Paper_Jane_Doe.pdf).

82
Upcoming Assignments
  • Full paper due Tuesday February 28th
  • Project proposal due Tuesday March 6th
  • Presentation proposal also due Tuesday March 6th
