What Next for The Semantic Web? Lessons From XML

Miranker 3/8/07

1
What Next for The Semantic Web? Lessons From XML.
  • Professor Daniel P. Miranker
  • Department of Computer Sciences
  • University of Texas at Austin
  • Austin, Texas, USA
  • http://cs.utexas.edu/miranker

2
In the beginning, the Semantic Web
  • Very attractive
  • What would a smart Internet be like?
  • Very confusing
  • Very large

thanks: Berners-Lee
  • Very immature system
  • Not clear what problems it is trying to solve
  • Speakers/authors rarely motivate
  • what (sub)problem is being solved
  • why it is important

3
Goals for this talk
  • Give explanations of things that (I think) lead
    to confusion.
  • Define specific open issues.

4
Ontology: First Lesson for the Semantic Net
  • This is what makes it attractive
  • Ontology is an idea from Artificial Intelligence
  • A way to make smart systems
  • (knowledge-based system)
  • Ontology, by itself
  • Large
  • Broad
  • Confusing

5
Breadth of Ontology Definitions
  • Controlled vocabulary list
  • Example: Intel Pentium 4 also appears as
    P4, Pentium 4, Pent 4
  • Computers are very confused by synonyms
  • Simply controlling vocabulary solves many
    problems
  • Taxonomy
  • Hierarchical representation of a vocabulary
  • Basis of impressive A.I. programs
  • Capture the knowledge of a college freshman
    biology class
  • A computer program passes the final exam
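
A minimal sketch of the controlled-vocabulary idea, assuming a hand-written synonym table. The names `CANONICAL_TERMS` and `normalize` are illustrative only and do not come from the talk. Each spelling from the Pentium example maps to one preferred term, so a program sees a single value:

```python
# Hypothetical controlled vocabulary: every known synonym maps to one preferred term.
CANONICAL_TERMS = {
    "intel pentium 4": "Intel Pentium 4",
    "pentium 4": "Intel Pentium 4",
    "pent 4": "Intel Pentium 4",
    "p4": "Intel Pentium 4",
}


def normalize(term: str) -> str:
    """Return the preferred term for a known synonym, or the trimmed input if unknown."""
    key = " ".join(term.lower().split())  # ignore case and repeated whitespace
    return CANONICAL_TERMS.get(key, term.strip())


if __name__ == "__main__":
    for raw in ["P4", "Pentium  4", "pent 4", "Athlon 64"]:
        print(f"{raw!r} -> {normalize(raw)!r}")
```

Unknown terms pass through unchanged, which is one possible design choice; a stricter vocabulary could reject them instead.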

6
Ontology is a Broad Term
  • Can mean simple taxonomy
  • The Gene Ontology

The ontologies are structured vocabularies in the
form of directed acyclic graphs (DAGs) that
represent a network in which each term may be a
child of one or more parents.
part-of: the child is a component of the parent,
e.g. biological process is part of the Gene Ontology
is-a: the child is a specialized instance of the
parent, e.g. calcium adhesion is a special kind of
adhesion
Thanks: http://neo.bu.edu/be768/2004Class
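
The DAG structure described above can be sketched as a list of labeled edges. This is an illustration under stated assumptions: only the "calcium adhesion is-a adhesion" and "biological process part-of Gene Ontology" relations come from the slide; the other terms and the helper names are hypothetical.

```python
from collections import defaultdict

# Edges are (child, relation, parent). Terms marked "hypothetical" are not from the slide.
EDGES = [
    ("calcium adhesion", "is-a", "adhesion"),
    ("adhesion", "is-a", "biological process"),                    # hypothetical
    ("calcium adhesion", "is-a", "calcium-dependent process"),     # hypothetical second parent
    ("calcium-dependent process", "is-a", "biological process"),   # hypothetical
    ("biological process", "part-of", "Gene Ontology"),
]


def build_parents(edges):
    """Index the edge list as child -> [(relation, parent), ...]."""
    parents = defaultdict(list)
    for child, relation, parent in edges:
        parents[child].append((relation, parent))
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following is-a / part-of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("calcium adhesion", build_parents(EDGES))))
```

Because a term may have more than one parent, the traversal keeps a `seen` set so that shared ancestors are visited once.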
7
Part of GO, as a Graph, Corresponding to the Last
Example
8
Ontologies in A.I. Knowledge Base Systems
  • Stand alone methods for sophisticated problem
    solving. E.g.
  • Representing a freshman course in biology.
  • Solving problems on the final exam

thanks: Porter et al.
9
As Knowledge is Gained, the Program Learns
  • Learning: the graph is reorganized, not just
    extended with new things.

10
Lesson 1: Ontology can mean many things
  • All the definitions are useful.
  • Powerful
  • Important
  • All the definitions are in common use.
  • Uncommon to find people who think all the
    definitions are correct.
  • Most people insist their favorite (narrow)
    definition is the only definition that should
    exist,
  • and
  • without asking, they assume you agree with them.

11
Corollary to Lesson 1
  • Do not even try to understand a presentation
    until you first figure out
  • The author's meaning of ontology
  • The larger problem context they are trying to
    solve.

12
Define the Problem (1)
  • How do we use the Internet?

13
Define the Problem (2)
  • The Internet is a great library
  • To use the Internet one must
  • Locate things of interest
  • (and move them to your machine)
  • Read, analyze, integrate the information

14
Define the Problem (3)
  • Many kinds of things on the Internet,
  • For this lecture we consider just two kinds
  • Text files
  • .html
  • .pdf
  • Data in databases

15
We have (at least) four problems, four issues
  • Which problem(s) is the Semantic Web supposed to
    help solve? Answer: all of them.
  • How many of these problems does it solve today?
    Answer:
  • Can the Semantic Web contribute to improvements to
    all these problems? Answer: Yes
  • How often are authors careful to say which of these
    four problems they are trying to address? Answer:
    Not very often
16
Data: Consider an Analysis Problem
  • Impact of Rainfall on
  • the Elevation of Central Texas Lakes
  • http://www.lcra.org/water/XML_rainfall.html

17
Browser: Text or Data?
  • Always confusing

18
Clint Eastwood Movie Review --> text
19
To You, it Looks Like Data
20
In your computer it looks like
Is this data? How does this file help analyze
rainfall and lake levels?
21
Data comes from Excel, Text Files, Databases
Same information in Excel
Same information in Text File
This is data
22
Third kind of content: Data whose structure has
been lost
  • Most Internet Content
  • Starts in databases
  • Structure gets lost when turned into HTML

(Diagram: Database -> Dynamic Web Page Generator)
23
Three kinds of data, two kinds of users
  • Three kinds of data
  • 1. Text files (collections of text files on the Internet)
  • 2. Generated pages: DB content (database behind a dynamic web page generator)
  • 3. Pure data (database)
  • Two kinds of users
  • a) people
  • b) programs
24
Six problems, more issues
25
Locating Data
  • Text
  • Use a search engine
  • Casual use
  • Direct Data Feeds
  • Often know where to look,
  • Government agencies
  • Business partner
  • Intensive use
  • If not, search

(Diagram labels: Collections of text files; Database)
26
How do the Semantic Web and XML fit?
27
Where does XML fit?
  • XML
  • 4) When applicable, XML solution works well.
  • 5, 6) Possibly even eliminating 5 and 6 as problems

28
Where does the Semantic Web fit best?
  • Ontologies (the Semantic Web): what most papers deal
    with
  • Ontologies can be exploited to support better
    search and retrieval of documents.
  • 4) Ontologies can be exploited to help data
    integration.

29
Lesson 2: Determine What is Motivating the Author
  • Each time you read something about the Semantic
    Web,
  • see which problem the author is actually most
    worried about.
  • Which problem am I most worried about?
  • 4) data integration

30
A look at XML
  • The Complete XML Infrastructure,
  • also very large
  • 1. XML data files (fn.xml)
  • 2. Object definition languages (optional, 2 kinds)
  • DTD: document type definition, first generation
  • XSD: XML Schema, second generation
  • 3. Style sheets (XSLT)
  • 4. Data transformation (XSL)
  • 5. Query languages
  • XSL, XQuery, XPath
  • 6. Web Services
  • WSDL/SOAP

31
Things that help make XML a success
  • Human readable
  • Flexible compliance
  • Data standards

32
Human Readable
  • Just like HTML, except the tags say what the
    data is, so both people and programs can interpret a
    document
  • <?xml version="1.0" encoding="ISO-8859-1"?>
  • <note>
  • <to>Tove</to>
  • <from>Jani</from>
  • <heading>Reminder</heading>
  • <body>Don't forget me this weekend!</body>
  • </note>
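
To illustrate the "programs can interpret it too" half of the claim, here is a small sketch that reads the note with Python's standard `xml.etree.ElementTree` module. The function name `note_fields` is an illustrative choice, not something from the talk.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def note_fields(xml_text: str) -> dict:
    """Return {tag: text} for each child of the root element."""
    # Encode to bytes so the parser honors the declared ISO-8859-1 encoding.
    root = ET.fromstring(xml_text.encode("ISO-8859-1"))
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    for tag, text in note_fields(NOTE_XML).items():
        print(f"{tag}: {text}")
```

The program never needs to guess which string is the sender or the body; the tag names carry that information.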

33
For Data In a Relational Database
An obvious XML tag structure for table data
  • <employee>
  • <record>
  • <name> Miranker </name>
  • <title>Professor </title>
  • <age> 49 </age>
  • </record>
  • <record>
  • <name> Mao </name>
  • <title>Teaching Assistant </title>
  • <age> 29 </age>
  • </record>
  • </employee>
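
A sketch of how mechanically this tag structure follows from a table, assuming an in-memory SQLite database that stands in for the relational source. The helper names `demo_connection` and `table_to_xml` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory table holding the two rows from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, title TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("Miranker", "Professor", 49), ("Mao", "Teaching Assistant", 29)],
    )
    return conn


def table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Serialize each row as <record> with one child element per column."""
    # The table name is interpolated directly; acceptable only for a trusted sketch.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(table_to_xml(demo_connection(), "employee"))
```

The table name becomes the outer tag, each row becomes a `record`, and each column name becomes a tag inside it.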

34
2. Flexible compliance
  • Well-formed XML: properly nested/balanced tag
    structure.
  • No explicit definition of tags.
  • Explicit declaration of legal tags
  • (two different ways)
  • Original: DTD, document type definition
  • Second generation: XML Schema
  • much better typing of the data (not just tag
    definition)
  • Built-in simple data types
  • Complex data types
  • Correctness constraints on legal data

35
Example of Well Formed XML
  • <employee>
  • <record>
  • <name> Miranker </name>
  • <title>Prof. </title>
  • <age> old </age>
  • </record>
  • <record>
  • <name> Mao </name>
  • <title>Teaching Assistant </title>
  • <age> young </age>
  • </record>
  • </employee>

No XML schema was defined, so values such as "old" for age are accepted.
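
A short sketch of what "well-formed" does and does not check, using the standard library parser. `is_well_formed` and the two sample strings are illustrative names, not part of the talk.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses, i.e. tags are properly nested and balanced."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Balanced tags, but "old" is not a number: still well-formed.
NESTED_OK = "<employee><record><name>Miranker</name><age>old</age></record></employee>"

# Closing tags are crossed: not well-formed.
NESTED_BAD = "<employee><record><name>Miranker</record></name></employee>"

if __name__ == "__main__":
    print(is_well_formed(NESTED_OK))
    print(is_well_formed(NESTED_BAD))
```

Well-formedness says nothing about whether the content of a tag makes sense; that is the job of a DTD or schema.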
36
A schema could be defined
  • Partial Example
  • Tag name must contain a string
  • <xs:element name="name" type="xs:string"/>
  • Tag age must contain an integer
  • <xs:element name="age" type="xs:integer"/>

37
XML-Schema allows specification of legal data
values (1)
  • Restrict age to between 0 and 120
  • <xs:element name="age">
  • <xs:simpleType>
  • <xs:restriction base="xs:integer">
  • <xs:minInclusive value="0"/>
  • <xs:maxInclusive value="120"/>
  • </xs:restriction>
  • </xs:simpleType>
  • </xs:element>

38
Specify legal data values (2)
  • Restrict titles to an enumerated list
  • <xs:element name="title">
  • <xs:simpleType>
  • <xs:restriction base="xs:string">
  • <xs:enumeration value="Professor"/>
  • <xs:enumeration value="Associate Professor"/>
  • <xs:enumeration value="Assistant Professor"/>
  • <xs:enumeration value="Teaching Assistant"/>
  • </xs:restriction>
  • </xs:simpleType>
  • </xs:element>
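
The effect of the age and title restrictions can be imitated with a hand-rolled check. This is a sketch, not a real XSD validator: it understands only the `minInclusive`, `maxInclusive`, and `enumeration` facets shown in the slides, and it assumes the fragments are wrapped in an `xs:schema` root that declares the `xs` namespace. The names `load_rules` and `_make_check` are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="title">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Professor"/>
        <xs:enumeration value="Associate Professor"/>
        <xs:enumeration value="Assistant Professor"/>
        <xs:enumeration value="Teaching Assistant"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def _make_check(base, low, high, allowed):
    """Build a predicate for one element from its restriction facets."""
    def check(text: str) -> bool:
        value = text.strip()
        if base == "xs:integer":
            try:
                number = int(value)
            except ValueError:
                return False
            if low is not None and number < low:
                return False
            if high is not None and number > high:
                return False
        if allowed and value not in allowed:
            return False
        return True
    return check


def load_rules(schema_text: str) -> dict:
    """Map element name -> predicate, reading only the facets used in the slides."""
    rules = {}
    for element in ET.fromstring(schema_text).iter(XS + "element"):
        restriction = element.find(f"{XS}simpleType/{XS}restriction")
        if restriction is None:
            continue
        low = restriction.find(XS + "minInclusive")
        high = restriction.find(XS + "maxInclusive")
        allowed = {e.get("value") for e in restriction.findall(XS + "enumeration")}
        rules[element.get("name")] = _make_check(
            restriction.get("base"),
            int(low.get("value")) if low is not None else None,
            int(high.get("value")) if high is not None else None,
            allowed,
        )
    return rules


if __name__ == "__main__":
    rules = load_rules(SCHEMA)
    for tag, text in [("age", " 49 "), ("age", "old"), ("title", "Prof.")]:
        print(tag, repr(text), rules[tag](text))
```

With these rules in place, the well-formed record containing "old" and "Prof." is rejected, while the record with 49 and "Professor" passes.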

39
3. Data Standards
  • An individual may publish an XML schema or DTD
  • A group of organizations agree on (and publish) an
    XML schema or DTD
  • --> A proper data standard

40
Data integration for individual schema definitions
Each consumer of data must integrate each
source: O(n^2)
41
Data integration if there is a standard
Standard
  • Each organization builds to the standard
  • O(n) solution
  • can, and often does, include a controlled vocabulary
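
A small worked count behind the O(n^2) versus O(n) contrast, assuming n organizations that each publish one schema and each need every other organization's data. The function names are illustrative.

```python
def pairwise_mappings(n: int) -> int:
    """Without a standard: each organization maps every other organization's schema."""
    return n * (n - 1)


def standard_mappings(n: int) -> int:
    """With a standard: each organization maps its own schema once, to the standard."""
    return n


if __name__ == "__main__":
    for n in (2, 5, 10, 100):
        print(f"n={n:>3}  pairwise={pairwise_mappings(n):>5}  standard={standard_mappings(n):>3}")
```

At n = 10 the difference is already 90 mappings against 10, and the gap widens quadratically from there.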

42
Things that help make XML a success
  • Human readable
  • Flexible compliance
  • Three levels of schema (tag) definition,
    including none
  • Well-formed
  • Public meta-data
  • Agreed upon standard
  • Given an agreed upon standard
  • O(n) solution to data integration
  • Agreed on vocabulary
  • Complete with correctness constraints

43
Semantic Web
  • Authors regularly refer to three big parts of the
    Semantic Web
  • Ontology
  • RDF
  • OWL
  • Authors also oversimplify and say
  • RDF is like XML data files
  • Has some truth
  • OWL is the ontology language
  • wrong

44
The Semantic Web
  • Is a very large layered system

Tim Berners-Lee and Hendler, conceptual stack
45
This Picture Is Misleading
  • pretty
  • but not good for understanding
  • Ontology appears only here
  • rdfschema,
  • grouped with RDF
  • lower case

46
It's all ontology
  • It's all Ontology
  • more powerful definitions of ontology, stacked
    on weaker definitions

47
The Power of Each Layer is Important
  • The layers were defined by
  • very smart,
  • very experienced people
  • Exactly what happens in each layer,
  • What can be expressed
  • How it is expressed

The result of many years of experience. The
detailed organization is new. All the other
ideas are old.
48
RDF, a representation of labeled graphs
  • Many syntactic conveniences
  • Triples
  • XML

49
RDF is always a labeled graph
(Figure: a labeled graph, shown as a unified view. Node labels: atagccgtacctgcgagtctagaagct, humanhemoglobin, oxygentransportprotein. Edge labels: derives from, is a, has 3D structure.)
thanks: Neuman and Quan
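
A labeled graph of this kind can be written down as subject, predicate, object triples. The sketch below uses the labels from the figure, but the edge directions and the object of "has 3D structure" are assumptions, since the transcript does not preserve them. The `http://example.org/` base and the function names are also hypothetical.

```python
from urllib.parse import quote

# (subject, predicate, object). Directions are assumed, not taken from the figure.
TRIPLES = [
    ("humanhemoglobin", "derives from", "atagccgtacctgcgagtctagaagct"),
    ("humanhemoglobin", "is a", "oxygentransportprotein"),
    ("humanhemoglobin", "has 3D structure", "structure-record"),  # hypothetical object
]


def objects_of(subject, predicate, triples):
    """Follow one labeled edge type out of a node."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def to_uri_lines(triples, base="http://example.org/"):
    """Render each triple with URI-style identifiers, one statement per line."""
    return [
        f"<{base}{quote(s)}> <{base}{quote(p)}> <{base}{quote(o)}> ."
        for s, p, o in triples
    ]


if __name__ == "__main__":
    print(objects_of("humanhemoglobin", "is a", TRIPLES))
    print("\n".join(to_uri_lines(TRIPLES)))
```

Every node and every edge label ends up as an identifier, which is what lets graphs from different sources be merged into one unified view.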
50
Physical Aspects (computer sense)
2. Representation of Labeled Graphs
Established Internet Technology
  • Global/International Standard
  • Representation of strings
  • Unique identifiers across the Internet

51
RDF Schema (RDFS)
  • The most important
  • but least discussed layer
  • Given RDF is a labeled graph,
  • RDFS defines
  • a vocabulary and
  • the structure
  • of the labels

52
Relationship between RDF and RDF Schema Layers
  • RDF Schema defines object properties, such as
  • Classes and Properties
  • Class Hierarchies and Inheritance
  • Property Hierarchies

(Figure: RDFS layer and RDF layer)
  • Thanks:
  • Grigoris Antoniou and Frank van Harmelen,
  • A Semantic Web Primer, MIT Press
53
In object terminology
  • RDFS defines objects
  • RDF defines instances
  • The graph edge connecting instances and objects:
    rdf:type

54
What else do you notice about the RDFS example?
  • Need to define member
  • Children of a node in the graph.
  • Because of the hierarchy children can be
  • Subclasses or
  • Instances
  • In OWL syntax, class element
  • It's a labeled graph!
  • it has a representation in RDF
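
A sketch of the schema/instance split in plain triples, assuming a small hypothetical class hierarchy (`Employee`, `AcademicStaff`, and so on are illustrative and not taken from the slide's figure). `rdfs:subClassOf` edges form the schema, and `rdf:type` edges attach instances to it.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Schema edges (RDFS) and instance edges (RDF) live in the same labeled graph.
TRIPLES = {
    ("Professor", SUBCLASS_OF, "AcademicStaff"),
    ("TeachingAssistant", SUBCLASS_OF, "AcademicStaff"),
    ("AcademicStaff", SUBCLASS_OF, "Employee"),
    ("Miranker", RDF_TYPE, "Professor"),
    ("Mao", RDF_TYPE, "TeachingAssistant"),
}


def subclasses(cls, triples):
    """All direct and indirect subclasses of cls."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances(cls, triples):
    """Instances typed with cls or with any of its subclasses (inheritance)."""
    classes = subclasses(cls, triples) | {cls}
    return {s for s, p, o in triples if p == RDF_TYPE and o in classes}


if __name__ == "__main__":
    print(sorted(subclasses("Employee", TRIPLES)))
    print(sorted(instances("Employee", TRIPLES)))
```

The children of a node are therefore of two kinds, subclasses and instances, and both are reached by walking edges in one graph.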

55
Physical Aspects (computer sense)
3. Each of these layers has a representation as a
labeled graph
2. Representation of Labeled Graphs
  • Global/International
  • Representation of strings
  • Unique identifiers across the Internet

56
OWL as the Ontology Layer
  • Express additional properties about classes
  • How members may relate
  • e.g. must be disjoint
  • A member can't be both an Assistant Professor and
    an Associate Professor.
  • Data type properties
  • e.g.
  • <owl:DatatypeProperty rdf:ID="age">
  • <rdfs:range rdf:resource="…nonNegativeInteger"/>
  • </owl:DatatypeProperty>
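
The disjointness constraint can be sketched as a check over class memberships. This is a plain-Python illustration, not an OWL reasoner; the individuals `alice` and `bob` and the function name are hypothetical.

```python
from itertools import combinations

# Pairs of classes declared disjoint, as in the Assistant/Associate Professor example.
DISJOINT_PAIRS = {frozenset({"AssistantProfessor", "AssociateProfessor"})}


def disjointness_violations(types_by_individual, disjoint_pairs):
    """Return (individual, class_a, class_b) for every pair of disjoint classes held at once."""
    violations = []
    for individual, classes in sorted(types_by_individual.items()):
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint_pairs:
                violations.append((individual, a, b))
    return violations


if __name__ == "__main__":
    memberships = {
        "alice": {"AssistantProfessor", "AssociateProfessor"},  # violates the constraint
        "bob": {"AssistantProfessor", "Employee"},              # fine
    }
    print(disjointness_violations(memberships, DISJOINT_PAIRS))
```

A constraint of this kind restricts which combinations of data are legal, which is the same role the schema restrictions played on the XML side.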

57
Another Property Restriction: Enumerations
  • <owl:oneOf rdf:parseType="Collection">
  • <owl:Thing rdf:about="Monday"/>
  • <owl:Thing rdf:about="Tuesday"/>
  • <owl:Thing rdf:about="Sunday"/>
  • </owl:oneOf>
  • Is this starting to look familiar?

58
Three layers
  • Data
  • Schema
  • Restrictions/Constraints on legal values
  • Lesson 3
  • The precise layering in the semantic web is the
    deep contribution.
  • The standard pretty picture is a source of
    confusion

59
So how do we use ontologies to help data
integration?
  • The single best paper to explain this
  • Query Reformulation for Dynamic Information
    Integration (1996)
  • Yigal Arens, Craig A. Knoblock, Wei-Min Shen
  • Journal of Intelligent Information Systems,
    Special Issue on Intelligent Information
    Integration
  • (I'm not claiming this is the best system, only
    that if you want to know how ontologies may be used
    for data integration, and you want to read exactly 1
    paper, this is the 1 paper to read.)

60
The Semantic Web Vision
(Diagram: a Semantic Query Engine over three local ontologies, each with its own DB)
  • How is this different than XML without standards?

61
The Semantic Web Vision
(Diagram: a Semantic Query Engine over three local ontologies, each with its own DB)
  • How is this different than XML without standards?
  • Replace XML-schema matching with ontology
    matching

62
Another alternative
  • A group effort on a common, shared ontology.
  • A kind of ontology standard
  • This is happening
  • De facto - certain ontologies are simply leaders
  • e.g. The Gene Ontology
  • Explicit, large, focussed, ontology efforts

63
Finally: Can the Semantic Web Work?
  • What I see as a big problem:
  • Who builds this?
  • Who generates this mapping?
  • Why will they put in the time and effort?
  • Unless this becomes very cheap and easy, only
  • Scientists
  • Military

64
Conclusion
  • The success of the Semantic Web will depend on
    the ability to easily (even trivially) either
  • generate local ontologies as part of a database
    interface, or
  • generate mappings of local databases to global
    ontologies
  • The competition (XML)
  • a) trivial to generate an XML schema for a
    database.
  • b) tools like .NET make it easy to map local
    databases to a standard XML schema
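
To make point a) concrete, here is a sketch that derives an XML schema from a table definition, assuming an SQLite database and a deliberately tiny type mapping. `XSD_TYPES` and `table_to_xsd` are hypothetical names, and the generated schema is only one of many reasonable layouts.

```python
import sqlite3

# Assumed mapping from SQLite declared types to XML Schema built-in types.
XSD_TYPES = {"INTEGER": "xs:integer", "REAL": "xs:decimal", "TEXT": "xs:string"}


def table_to_xsd(conn: sqlite3.Connection, table: str) -> str:
    """Emit an xs:schema with one <record> element per row and one element per column."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">',
        f'  <xs:element name="{table}">',
        "    <xs:complexType><xs:sequence>",
        '      <xs:element name="record" minOccurs="0" maxOccurs="unbounded">',
        "        <xs:complexType><xs:sequence>",
    ]
    for _cid, name, declared_type, *_rest in columns:
        xsd_type = XSD_TYPES.get(declared_type.upper(), "xs:string")
        lines.append(f'          <xs:element name="{name}" type="{xsd_type}"/>')
    lines += [
        "        </xs:sequence></xs:complexType>",
        "      </xs:element>",
        "    </xs:sequence></xs:complexType>",
        "  </xs:element>",
        "</xs:schema>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, title TEXT, age INTEGER)")
    print(table_to_xsd(conn, "employee"))
```

Nothing here requires human judgment, which is the bar the talk sets for generating local ontologies or mappings.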