Title: What Next for the Semantic Web? Lessons From XML
1What Next for The Semantic Web? Lessons From XML.
- Professor Daniel P. Miranker
- Department of Computer Sciences
- University of Texas at Austin
- Austin, Texas, USA
- http://cs.utexas.edu/miranker
2In the beginning, the Semantic Web
- Very attractive
- What would a smart Internet be like
- Very confusing
- Very large
thanks Berners-Lee
- Very immature system
- Not clear what problems it is trying to solve
- Speakers/authors rarely motivate
- what (sub)problem is being solved
- why it is important
3Goals for this talk
- Give explanations of things that (I think) lead to confusion.
- Define specific open issues.
4Ontology: First Lesson for the Semantic Net
- This is what makes it attractive
- Ontology is an idea from Artificial Intelligence
- A way to make smart systems
- (knowledge-based system)
- Ontology, by itself
- Large
- Broad
- Confusing
5Breadth of Ontology Definitions
- Controlled vocabulary list
- Example
- Intel Pentium 4: P4, Pentium 4, Pent 4
- Computers are very confused by synonyms
- Simply controlling vocabulary solves many problems
- Taxonomy
- Hierarchical representation of a vocabulary
- Basis of impressive A.I. programs
- Capture the knowledge of a college freshman biology class
- A computer program passes the final exam
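A controlled vocabulary can be sketched as a lookup from every known variant to one canonical term. The variants below come from the slide; the table layout and the function name `canonical` are assumptions made for illustration only.

```python
# Minimal sketch of a controlled vocabulary: each variant spelling maps to a
# single canonical term. The dictionary layout and the function name are
# illustrative assumptions, not something defined in the talk.
CANONICAL_TERMS = {
    "intel pentium 4": "Intel Pentium 4",
    "p4": "Intel Pentium 4",
    "pentium 4": "Intel Pentium 4",
    "pent 4": "Intel Pentium 4",
}


def canonical(term: str) -> str:
    """Return the canonical form of a term, or the term itself if unknown."""
    key = " ".join(term.lower().split())  # normalize case and whitespace
    return CANONICAL_TERMS.get(key, term)


variants = ["P4", "Pentium 4", "Pent  4"]
print({canonical(v) for v in variants})
```

Once every variant resolves to the same string, a program can compare or join records without being confused by synonyms.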
6Ontology is a Broad Term
- Can mean simple taxonomy
- The Gene Ontology
The ontologies are structured vocabularies in the form of directed acyclic graphs (DAGs) that represent a network in which each term may be a child of one or more than one parent.
- part-of: the child is a component of the parent
- e.g. biological process is part of the Gene Ontology
- is-a: the child is a specialized instance of the parent
- e.g. calcium adhesion is a special kind of adhesion
Thanks: http://neo.bu.edu/be768/2004Class
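The DAG structure described above can be sketched as a map from each term to its (relation, parent) edges. Only "calcium adhesion is-a adhesion" and "biological process part-of Gene Ontology" come from the slide; the term "calcium-dependent process" and the remaining edges are hypothetical, added only to show a term with two parents.

```python
# Sketch of an ontology stored as a directed acyclic graph (DAG).
# Each term maps to a list of (relation, parent) edges, so a term may have
# more than one parent. Edges marked "hypothetical" are not from the talk.
PARENTS = {
    "calcium adhesion": [
        ("is-a", "adhesion"),                    # from the slide
        ("is-a", "calcium-dependent process"),   # hypothetical second parent
    ],
    "adhesion": [("is-a", "biological process")],                   # hypothetical
    "calcium-dependent process": [("is-a", "biological process")],  # hypothetical
    "biological process": [("part-of", "Gene Ontology")],           # from the slide
    "Gene Ontology": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("calcium adhesion")))
```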
7Part of GO, as a Graph, Corresponding to Last Example
8Ontologies in A.I. Knowledge Base Systems
- Stand-alone methods for sophisticated problem solving. E.g.
- Representing a freshman course in biology.
- Solving problems on the final exam
Thanks: Porter et al.
9As Knowledge is Gained, the Program Learns
- Learning: the graph is reorganized, not just new things added.
10Lesson 1: Ontology can mean many things
- All the definitions are useful.
- Powerful
- Important
- All the definitions are in common use.
- Uncommon to find people who think all the definitions are correct.
- Most people insist their favorite (narrow) definition is the only definition that should exist,
- and, without asking, they assume you agree with them.
11Corollary to Lesson 1
- Do not even try to understand a presentation until you first figure out
- The author's meaning of ontology
- The larger problem context they are trying to solve.
12Define the Problem (1)
- How do we use the Internet?
13Define the Problem (2)
- The Internet is a great library
- To use the Internet one must
- Locate things of interest
- (and move them to your machine)
- Read, analyze, integrate the information
14Define the Problem (3)
- Many kinds of things on the Internet,
- For this lecture we consider just two kinds
- Text files
- .html
- .pdf
- Data in databases
15We have (at least) four problems, four issues
Which problem(s) is the Semantic Web supposed to help solve? Answer: All of them.
How many of these problems does it solve today? Answer:
Can the Semantic Web contribute to improvements to all these problems? Answer: Yes.
How often are authors careful to say which of these four problems they are trying to address? Answer: Not very often.
16Data: Consider an Analysis Problem
- Impact of Rainfall on the Elevation of Central Texas Lakes
- http://www.lcra.org/water/XML_rainfall.html
17Browser: Text or Data?
18Clint Eastwood Movie Review --> text
19To You, it Looks Like Data
20In your computer it looks like
Is this data? How does this file help analyze
rainfall, and lake levels?
21Data comes from Excel, Text Files, Databases
Same information in Excel
Same information in Text File
This is data
22Third kind of content: Data whose structure has been lost
- Most Internet Content
- Starts in Databases
- Structure gets lost when turned into HTML
Database
Dynamic Web Page Generator
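23Three kinds of data, two kinds of users
[Figure: kinds of data reached over the Internet. Surviving labels: collections of text files; 2. generated pages (DB content from a database, passed through a dynamic web page generator); 3. pure data. Surviving user label: b) programs.]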
24Six problems, more issues
25Locating Data
- Text
- Use a search engine
- Casual use
- Direct Data Feeds
- Often know where to look,
- Government agencies
- Business partner
- Intensive use
- If not
- Search
Collections of text files
Database
26How do the Semantic Web and XML fit?
27Where does XML fit?
- XML
- 4) When applicable, XML solution works well.
- 5, 6) Possibly even eliminating 5 and 6 as problems
28Where does the Semantic Web fit best?
- Ontologies (the Semantic Web): most papers deal with
- Ontologies can be exploited to support better search and retrieval of documents.
- 4) Ontologies can be exploited to help data integration.
29Lesson 2: Determine What is Motivating the Author
- Each time you read something about the Semantic Web,
- See which problem the author is actually most worried about.
- Which problem am I most worried about?
- 4) data integration
30A look at XML
- The Complete XML Infrastructure,
- also very large
- 1. XML data files (fn.xml)
- 2. Object definition languages (optional, 2 kinds)
- DTD: document type definition, first generation.
- XSD: XML Schema, second generation
- 3. Style sheets (XSLT)
- 4. Data transformation (XSL)
- Query languages,
- XSL, XQuery, XPath
- Web Services,
- WSDL/SOAP
31Things that help make XML a success
- Human readable
- Flexible compliance
- Data standards
32Human Readable
- Just like HTML, except the tags say what the data is; both people and programs can interpret a document
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
33For Data In a Relational Database
An obvious XML tag structure for table data
    <employee>
      <record>
        <name> Miranker </name>
        <title>Professor </title>
        <age> 49 </age>
      </record>
      <record>
        <name> Mao </name>
        <title>Teaching Assistant </title>
        <age> 29 </age>
      </record>
    </employee>
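The table-like tag structure above maps directly back to rows. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `records_to_rows` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The employee document from the slide, one <record> per table row.
EMPLOYEE_XML = """
<employee>
  <record>
    <name> Miranker </name>
    <title>Professor </title>
    <age> 49 </age>
  </record>
  <record>
    <name> Mao </name>
    <title>Teaching Assistant </title>
    <age> 29 </age>
  </record>
</employee>
"""


def records_to_rows(xml_text):
    """Turn each <record> element into a dict keyed by its child tag names."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall("record"):
        # Strip the padding whitespace that surrounds the values in the slide.
        rows.append({field.tag: (field.text or "").strip() for field in record})
    return rows


rows = records_to_rows(EMPLOYEE_XML)
for row in rows:
    print(row)
```

Because the tags name the columns, the program needs no separate description of the layout to recover the table.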
342. Flexible compliance
- Well-formed XML: properly nested/balanced tag structure.
- No explicit definition of tags.
- Explicit declaration of legal tags
- (two different ways)
- Original: DTD, document type definition
- Second generation: XML Schema
- Much better typing of the data (not just tag definition)
- Built-in simple data types
- Complex data types
- Correctness constraints on legal data
35Example of Well Formed XML
    <employee>
      <record>
        <name> Miranker </name>
        <title>Prof. </title>
        <age> old </age>
      </record>
      <record>
        <name> Mao </name>
        <title>Teaching Assistant </title>
        <age> young </age>
      </record>
    </employee>
No XML schema was defined.
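Well-formedness is only about balanced, properly nested tags. A minimal sketch of that check with the standard-library parser follows; the function name `is_well_formed` and the two sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Balanced tags, even though "old" is not a sensible age value.
WELL_FORMED = "<record><name> Miranker </name><age> old </age></record>"
# Closing tags in the wrong order: not properly nested.
NOT_WELL_FORMED = "<record><name> Miranker </record></name>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML, i.e. the tags nest and balance."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The first document is accepted because no schema says what an age may contain; only the tag structure is checked.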
36A schema could be defined
- Partial Example
- Tag name, must contain a string
    <xs:element name="name" type="xs:string"/>
- Tag age, must contain an integer
    <xs:element name="age" type="xs:integer"/>
37XML-Schema allows specification of legal data values (1)
- Restrict age, between 0 and 120
    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="120"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
38Specify legal data values (2)
- Restrict titles to an enumerated list
    <xs:element name="title">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="Professor"/>
          <xs:enumeration value="Associate Professor"/>
          <xs:enumeration value="Assistant Professor"/>
          <xs:enumeration value="Teaching Assistant"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
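To make concrete what the two restrictions rule out, here is a hand-written sketch that mirrors them in plain Python. It is not an XML Schema validator (the Python standard library does not include one); the function name `record_errors` and the sample records are assumptions, and stripping surrounding whitespace is a convenience of this sketch.

```python
import xml.etree.ElementTree as ET

# Mirrors the enumeration restriction on <title> from the slide.
ALLOWED_TITLES = {
    "Professor",
    "Associate Professor",
    "Assistant Professor",
    "Teaching Assistant",
}


def record_errors(record):
    """Return a list of constraint violations for one <record> element."""
    errors = []

    # Mirrors xs:integer with minInclusive 0 and maxInclusive 120.
    age_text = (record.findtext("age") or "").strip()
    try:
        age = int(age_text)
    except ValueError:
        errors.append(f"age {age_text!r} is not an integer")
    else:
        if not 0 <= age <= 120:
            errors.append(f"age {age} is outside 0..120")

    title = (record.findtext("title") or "").strip()
    if title not in ALLOWED_TITLES:
        errors.append(f"title {title!r} is not in the enumerated list")
    return errors


good = ET.fromstring("<record><title>Professor</title><age>49</age></record>")
bad = ET.fromstring("<record><title>Prof.</title><age>old</age></record>")
print(record_errors(good))
print(record_errors(bad))
```

In practice a schema-aware validator would apply the published schema itself; the point here is only which records the constraints accept and reject.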
393. Data Standards
- An individual may publish an XML schema or DTD
- A group of organizations agree on (and publish) an XML schema or DTD
- --> A proper data standard
40Data integration for individual schema definitions
Each consumer of data must integrate each source: O(n^2)
41Data integration if there is a standard
Standard
- Each organization builds to the standard
- O(n) solution
- Can, and often does, include a controlled vocabulary
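The difference between the two cases can be counted directly. The sketch below assumes every one of the n organizations both publishes and consumes data, and counts one mapping per (consumer, source) pair; the function names are illustrative.

```python
def pairwise_mappings(n):
    """Mappings needed when every consumer integrates every other source."""
    return n * (n - 1)  # grows as O(n^2)


def standard_mappings(n):
    """Mappings needed when every organization builds to one shared standard."""
    return n  # grows as O(n)


for n in (2, 10, 100):
    print(n, pairwise_mappings(n), standard_mappings(n))
```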
42Things that help make XML a success
- Human readable
- Flexible compliance
- Three levels of schema (tag) definition, including none
- Well-formed
- Public meta-data
- Agreed upon standard
- Given an agreed upon standard
- O(n) solution to data integration
- Agreed on vocabulary
- Complete with correctness
43Semantic Web
- Authors regularly refer to three big parts of the Semantic Web
- Ontology
- RDF
- OWL
- Authors also oversimplify and say
- RDF is like XML data files
- Has some truth
- OWL is the ontology language
- wrong
44The Semantic Web
- Is a very large layered system
Thanks: Tim Berners-Lee, Hendler (conceptual stack)
45This Picture Is Misleading
- pretty
- but not good for understanding
- Ontology only here
- Rdfschema,
- grouped with RDF
- lower case
46It's all ontology
- It's all ontology
- more powerful definitions of ontology, stacked
on weaker definitions
47The Power of Each Layer is Important
- The layers were defined by
- very smart,
- very experienced people
- Exactly what happens in each layer,
- What can be expressed
- How it is expressed
The result of many years of experience. The
detailed organization is new. All the other
ideas are old.
48RDF, a representation of labeled graphs
- Many syntactic conveniences
- Triples
-
- XML
49RDF is always a labeled graph
[Figure: a labeled graph. Node labels: atagccgtacctgcgagtctagaagct, humanhemoglobin, oxygentransportprotein. Edge labels: derives from, is a, has 3D structure. Caption: Unified view.]
Thanks: Neuman and Quan
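A labeled graph can be written down as a list of (subject, label, object) triples. In the sketch below the node and edge labels come from the slide, but which nodes each edge connects is an assumption, since the figure itself did not survive; the helper name `objects` is also illustrative.

```python
# Each triple is one labeled edge: (subject node, edge label, object node).
# The endpoints chosen for each edge are an assumed reading of the figure.
TRIPLES = [
    ("humanhemoglobin", "derives from", "atagccgtacctgcgagtctagaagct"),
    ("humanhemoglobin", "is a", "oxygentransportprotein"),
]


def objects(subject, label, triples=TRIPLES):
    """Return every node reached from `subject` along edges with `label`."""
    return [o for s, p, o in triples if s == subject and p == label]


print(objects("humanhemoglobin", "is a"))
print(objects("humanhemoglobin", "derives from"))
```

The same triples could equally be serialized as XML; the graph is the constant, the syntax is a convenience.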
50Physical Aspects (computer sense)
2. Representation of Labeled Graphs
Established Internet Technology
- Global/International Standard
- Representation of strings
- Unique identifiers across the Internet
51RDF Schema (RDFS)
- The most important
- but least discussed layer
- Given RDF is a labeled graph,
- RDFS defines
- a vocabulary and
- the structure
- of the labels
52Relationship between RDF and RDF Schema Layers
- RDF Schema defines object properties, such as
- Classes and Properties
- Class Hierarchies and Inheritance
- Property Hierarchies
RDFS
- Thanks
- Grigoris Antoniou
- Frank van Harmelen,
- A Semantic Web Primer, MIT Press
RDF
53In object terminology
- RDFS defines objects
- RDF defines instances
- The graph edge connecting instances and objects: rdf:type
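The split between schema and instances can be sketched with plain triples: `rdfs:subClassOf` edges connect classes, and an `rdf:type` edge connects an instance to its class. The class names and the hierarchy below are hypothetical, as are the helper names.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Schema-level triples (classes) and one instance-level triple.
# The class hierarchy is a hypothetical example, not taken from the talk.
TRIPLES = {
    ("Professor", SUBCLASS_OF, "Faculty"),
    ("Faculty", SUBCLASS_OF, "Employee"),
    ("Miranker", RDF_TYPE, "Professor"),
}


def superclasses(cls, triples):
    """Follow rdfs:subClassOf edges upward and collect every superclass."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(instance, triples):
    """Direct rdf:type classes of an instance plus all inherited classes."""
    direct = {o for s, p, o in triples if s == instance and p == RDF_TYPE}
    inherited = set()
    for cls in direct:
        inherited |= superclasses(cls, triples)
    return direct | inherited


print(sorted(types_of("Miranker", TRIPLES)))
```

Note that the schema and the instance live in the same set of triples, so the schema itself is a labeled graph.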
54What else do you notice about the RDFS example?
- Need to define member
- Children of a node in the graph.
- Because of the hierarchy children can be
- Subclasses or
- Instances
- In OWL syntax, class element
- It's a labeled graph!
- it has a representation in RDF
55Physical Aspects (computer sense)
3. Each of these layers has a representation as a
labeled graph
2. Representation of Labeled Graphs
- Global/International
- Representation of strings
- Unique identifiers across the Internet
56OWL as the Ontology Layer
- Express additional properties about classes
- How members may relate
- e.g. Must be disjoint
- A member can't be both an Assistant Professor and an Associate Professor.
- Data type properties
- e.g.
    <owl:DatatypeProperty rdf:ID="age">
      <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#nonNegativeInteger"/>
    </owl:DatatypeProperty>
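The disjointness constraint can be illustrated as a simple membership check. This is a hand-written sketch, not how an OWL reasoner works internally; the member names, the data layout, and the function name are hypothetical.

```python
# Pairs of classes declared disjoint, as in the Assistant/Associate example.
DISJOINT_PAIRS = [("AssistantProfessor", "AssociateProfessor")]

# Hypothetical class memberships for two members.
MEMBERSHIPS = {
    "member_1": {"AssistantProfessor"},
    "member_2": {"AssistantProfessor", "AssociateProfessor"},
}


def disjointness_violations(memberships, disjoint_pairs=DISJOINT_PAIRS):
    """List (member, class_a, class_b) where a member is in both classes."""
    violations = []
    for member, classes in memberships.items():
        for class_a, class_b in disjoint_pairs:
            if class_a in classes and class_b in classes:
                violations.append((member, class_a, class_b))
    return violations


print(disjointness_violations(MEMBERSHIPS))
```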
57Another Property Restriction: Enumerations
    <owl:oneOf rdf:parseType="Collection">
      <owl:Thing rdf:about="Monday"/>
      <owl:Thing rdf:about="Tuesday"/>
      ...
      <owl:Thing rdf:about="Sunday"/>
    </owl:oneOf>
- Is this starting to look familiar?
58Three layers
- Data
- Schema
- Restrictions/Constraints on legal values
- Lesson 3
- The precise layering in the Semantic Web is the deep contribution.
- The standard pretty picture is a source of confusion.
59So how do we use ontologies to help data integration?
- The single best paper to explain this
- Query Reformulation for Dynamic Information Integration (1996)
- Yigal Arens, Craig A. Knoblock, Wei-Min Shen
- Journal of Intelligent Information Systems, Special Issue on Intelligent Information Integration
- (I'm not claiming this is the best system, only that if you want to learn how ontologies may be used for data integration, and you want to read exactly 1 paper, this is the 1 paper to read.)
60The Semantic Web Vision
Semantic Query Engine
Local ontology
Local ontology
Local ontology
DB
DB
DB
- How is this different from XML without standards?
61The Semantic Web Vision
Semantic Query Engine
Local ontology
Local ontology
Local ontology
DB
DB
DB
- How is this different from XML without standards?
- Replace XML-schema matching with ontology matching
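One way to picture the vision is a query engine that asks for shared concepts and lets each local ontology translate them into that database's own field names. Everything in this sketch is hypothetical: the source names, field names, data, and the `query` function are illustrations, not a description of any actual system.

```python
# Each local ontology maps shared concepts to one database's field names.
LOCAL_ONTOLOGIES = {
    "db_a": {"name": "emp_name", "age": "emp_age"},
    "db_b": {"name": "full_name", "age": "years"},
}

# Stand-ins for the databases themselves.
SOURCES = {
    "db_a": [{"emp_name": "Miranker", "emp_age": 49}],
    "db_b": [{"full_name": "Mao", "years": 29}],
}


def query(concepts):
    """Answer a query over shared concepts by rewriting it for each source."""
    results = []
    for source, rows in SOURCES.items():
        mapping = LOCAL_ONTOLOGIES[source]
        for row in rows:
            # Translate each shared concept into the source's own field name.
            results.append({concept: row[mapping[concept]] for concept in concepts})
    return results


print(query(["name", "age"]))
```

The hard part in practice is producing the mappings, which this sketch simply assumes are already written.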
62Another alternative
- A group effort on a common, shared ontology.
- A kind of ontology standard
- This is happening
- De facto - certain ontologies are simply leaders
- e.g. The Gene Ontology
- Explicit, large, focussed, ontology efforts
63Finally: Can the Semantic Web Work?
- I see this as a big problem
- Who builds this?
- Who generates this mapping?
- Why will they put in the time and effort?
- Unless this becomes very cheap and easy, only
- Scientists
- Military
64Conclusion
- The success of the Semantic Web will depend on the ability to easily (even trivially) either
- generate local ontologies as part of a database interface
- generate mappings of local databases to global ontologies
- The competition (XML)
- a) trivial to generate an XML-schema for a database.
- b) tools like .NET make it easy to map local databases to standard XML-schema
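To show how little work point a) involves, here is a sketch that reads a table definition from an in-memory SQLite database and emits one element declaration per column. It produces only the element lines, not a complete schema document; the type map, the table, and the function name are assumptions made for illustration.

```python
import sqlite3

# Assumed mapping from SQLite declared types to XML Schema built-in types.
SQL_TO_XSD = {"INTEGER": "xs:integer", "TEXT": "xs:string", "REAL": "xs:double"}


def table_to_xsd_elements(connection, table):
    """Emit one xs:element declaration per column of the given table."""
    # The table name is interpolated directly, so pass trusted names only.
    columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
    lines = []
    for _cid, name, sql_type, *_rest in columns:
        xsd_type = SQL_TO_XSD.get(sql_type.upper(), "xs:string")
        lines.append(f'<xs:element name="{name}" type="{xsd_type}"/>')
    return lines


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE employee (name TEXT, title TEXT, age INTEGER)")
for line in table_to_xsd_elements(connection, "employee"):
    print(line)
```

Generating a local ontology or an ontology mapping has no equally mechanical recipe, which is the gap the conclusion points at.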