Title: On Community Web Portals and the Semantic Web: A Database Perspective
1On Community Web Portals and the Semantic Web
A Database Perspective
Vassilis Christophides
Computer Science Department, University of Crete
Institute for Computer Science - FORTH
Heraklion, Crete
2Portalmania!
3Portals Classification
Existing Communities
On-line Communities
4Elements of Comparison
Criterion | Horizontal Portals | Vertical or Thematic Portals | E-marketplaces
Scope | Internet-oriented | Subject-oriented | Industry-oriented
Mission | reference points for the general user | promote access to information | promote economic activity
Methods | voluntary registration; human or robot-driven resource collection | expert selection of resources | voluntary participation by companies
- Gateways to WWW resources with the aim of making
information/service research simpler and more
effective
5Portals Classification
6Common Objectives/Goals
- Community Knowledge Management
  - Ranging from simple vocabularies to formal ontologies
- Aggregation/Integration of Community Content
  - Ranging from unstructured (documents) to semi-structured (web sites) and structured information (data)
- Collaboration and Messaging
  - Ranging from simple to advanced task management (synchronous/asynchronous)
- System Integration and Security
  - Front end to application servers/workflow systems
- User Personalization (pull) and Syndicated Content Subscription (push)
  - role-based access control
  - information filtering (contexts/viewpoints)
  - customizable information rendering
  - location/time specific information
7Community Web Portals Knowledge Management
8Knowledge Processes in Corporate Communities
Generating new knowledge
Accessing knowledge from external sources
Representing knowledge in documents and databases
Embedding knowledge in services and processes
Dissemination of knowledge within organisation
Using knowledge in decision making
9Knowledge Practices in Corporate Communities
10Corporate Communities Web Portals
11Community Web Portals Resource Descriptions
Collection Static Dynamic
Individual
Metadata Type Resources Nature
Automatic Query Mediation
Manual
Content Descriptive (unstructured,
semistructured, structured)
Semantics
Manual Manual
Manual
Structure Descriptive (semistructured,
structured)
Syntax
Automatic Automatic
Automatic
Management (all kinds)
System
12Community Web Portals A Broader Functional View
Presentation Services
Multiple Style Sheets Virtual Documents
Access and Integration Services
Content Syndication
Task Management
System Management
Classification Metadata
Security Network
Application Integration
Information Services
Personalization Services
Collaboration Services
Description, Search Docs Repositories
Messaging Workflow
Annotations, Recommendations
13Corporate Communities Web Portals
14Development Process of a Community Web Portal
Evaluation
15Some Portals Market Facts
16Some Portals Market Facts
17On the Semantic Web
- Main infrastructure for supporting Community Webs
  - groups of people sharing a domain of discourse and a set of information resources (e.g., data, documents, services) and having some common interests/objectives
- Higher Quality Web Information Services
  - having data and programs described in a way that facilitates their reuse and integration by machines across applications
Workplace
Education
Semantic Web
Commerce
Health
18Metadata exists for Almost Anything/Everywhere
- Physical Objects, Places,
- People,
- Devices, Networks,
- Infrastructure,
- Digital Documents, Data,
- Programs,
- User Profiles, Preferences,
<tag1> <tag2> <tag3> </tag1>
19RDF Objectives
- Enables communities to define their own semantics of resource descriptions
  - we can disagree about semantics, but share the same infrastructure (syntax, editors, query languages, databases, etc.)
- Imposes structural constraints on the expression of metadata in various application contexts
  - for consistent encoding, exchange and processing of metadata on the Web
- Facilitates development of metadata vocabularies without central coordination
  - mechanisms for reusing descriptions of resources, concepts, etc.
- Focus on DBMS technology for RDF metadata
  - Related W3C efforts on XML data management
20Looking at existing RDF Applications
- Publishing/News
  - Biblink
  - Scholarly Link Specification (Slinks)
  - Rich Site Summary (RSS)
- Education/Academic
  - Common European Research Information Format (CERIF)
  - Mathematics International
  - Universal
  - IMS Global Learning Consortium
- Cultural Heritage/Archives/Libraries
  - International Committee for Documentation Reference Schema (CIDOC)
  - Research Support Libraries Collection Level Description (RSLP-CLD)
  - EUropean Libraries Electronic Resources in Mathematical Sciences (Euler)
- Audio-visual
  - Internet Movie DataBase (IMDB)
- Ubiquitous/Mobile/Grid Computing
  - Composite Capability/Preference Profile (CC/PP)
  - RDF Calendar Task Force
  - Scheduler Allocation Ontology (SAO)
- E-commerce
  - Basic Semantic Registry (BSR)
  - Real Estate Data Consortium
  - Universal Standard Products and Services Classification (UNSPSC)
- Geospatial/Environmental
  - Geography Markup Language (GML)
  - Coastal Zone Management Ontology
- Biology/Medicine
  - Gene Ontology
- Cross-domain
21Semantic Depth of Resource Descriptions
- Dictionaries and Vocabularies
  - the schemas developed at this level define simple lists of concepts and their definitions
- Taxonomies
  - their characteristic is that the main relation they define between concepts is that of specialization
- Thesauri
  - besides defining relations among broader/narrower terms through the definition of hierarchies, a thesaurus also declares relations of equivalence, association and synonymy
- Reference Models
  - comprise a representation vocabulary for referring to the concepts in the subject area and the logical statements that describe the nature of the terms, the relations among the terms and the way the terms can or cannot be related to each other
22Ontologies - What Are They?
Thesauri narrower term relation
Frames (properties)
Formal is-a
General Logical constraints
Catalog/ ID
Informal is-a
Formal instance
Disjointness, Inverse, part-of
Terms/ glossary
Value Restrs.
23A First Classification of RDF Schemas
Application Domain Dictionary/ Vocabulary Taxonomy Thesaurus Reference Model
Cultural Heritage/Archives/Libraries Euler RSLP-CLD CIDOC
Educational/ Academic IMS Universal Mathematics International CERIF
Publishing/ News BibLink SLinkS RSS
Audio-Visual IMDB
Geospatial/ Environmental CZM GML
Biology/ Medicine Gene
E-Commerce BSR UNSPSC RED
Ubiquitous/ Mobile/Grid Computing CC/PP RDF Calendar SAO
Cross-Domain CERES/NBII Dublin Core WordNet Metanet Limber Thesaurus Top Level Ontology
24Outline
- Database issues for RDF metadata management
  - The Data Independence Issue
  - The Query Language Issue
  - The Model Issue
- RDF Query Language RQL
  - Querying Large RDF Schemas
  - Filtering/Navigating Complex RDF descriptions
- Storing Voluminous RDF descriptions
  - Alternative DB representations
  - Performance Figures
- The ICS-FORTH RDFSuite
- Conclusions and remaining issues
25The Data Independence Issue
- Conceptual Level: describing resources using one or several RDF schemas
- Logical Level: how RDF descriptions and schemas are physically stored
  - Logical-schema: data organization using tables, objects, etc.
  - Physical-schema: data organization using files, records, indices, etc.
- RDF data independence is crucial for ensuring scalability of real-scale Semantic Web applications
26The Query Language Issue
Querying the Semantics (RQL)
Querying the Structure (Squish)
Querying the Syntax (XQuery)
27Why a Data Model for RDF ?
- As support for physical/logical independence
  - RDF can be stored in files, a native repository, a relational database
  - RDF can be virtual, as a view of a repository or of integrated sources
  - RDF can be in memory, using data structures in C, C++, Java, etc.
  - RDF can be streamed between processes
- To describe information content of RDF Statements
  - to agree and reason about information content, preservation
- To define semantics of a data manipulation language
  - A query language describes, in a declarative fashion, the mapping from an input instance of the data model to an output instance of the data model
28But RDF has specifics Serialization syntax
- XML attributes vs. elements for RDF properties
  - fname, lname
- XML flat vs. nested structures of RDF statements
  - Description vs. Painter elements
- RDF properties are unordered, optional, and multivalued
  - 2 paints and 0 creates
- One more motivation for a data model
  - isolate the user from syntactic aspects of RDF/XML

Flat serialization (properties as XML attributes):

  <rdf:Description rdf:ID="picasso132" fname="Pablo" lname="Picasso">
    <paints rdf:resource="http://museoreinasofia.mcu.es/guernica.gif"/>
    <paints rdf:resource="http://www.artchive.com/woman.jpg"/>
    <rdf:type>Painter</rdf:type>
  </rdf:Description>
  <rdf:Description rdf:about="http://museoreinasofia.mcu.es/guernica.gif">
    <rdf:type>Painting</rdf:type>
    <created>1937</created>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.artchive.com/woman.jpg">
    <rdf:type>Painting</rdf:type>
    <created>1904</created>
  </rdf:Description>

Nested serialization (properties as XML elements):

  <Painter rdf:ID="picasso132">
    <fname>Pablo</fname>
    <lname>Picasso</lname>
    <paints>
      <Painting rdf:about="http://www.artchive.com/woman.jpg">
        <created>1904</created>
      </Painting>
    </paints>
    <paints>
      <Painting rdf:about="http://museoreinasofia.mcu.es/guernica.gif">
        <created>1937</created>
      </Painting>
    </paints>
  </Painter>
29But RDF has specifics Schema Semantics
  <rdfs:Class rdf:ID="Artist"/>
  <rdfs:Class rdf:ID="Artifact"/>
  <rdfs:Class rdf:ID="Painter">
    <rdfs:subClassOf rdf:resource="#Artist"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Painting">
    <rdfs:subClassOf rdf:resource="#Artifact"/>
  </rdfs:Class>
  <rdf:Property rdf:ID="fname">
    <rdfs:domain rdf:resource="#Painting"/>
    <rdfs:range rdf:resource="http://www.w3.org/rdf-datatypes.xsd#String"/>
  </rdf:Property>
  <rdf:Property rdf:ID="creates">
    <rdfs:domain rdf:resource="#Artist"/>
    <rdfs:range rdf:resource="#Artifact"/>
  </rdf:Property>
  <rdf:Property rdf:ID="paints">
    <rdfs:domain rdf:resource="#Painter"/>
    <rdfs:range rdf:resource="#Painting"/>
    <rdfs:subPropertyOf rdf:resource="#creates"/>
  </rdf:Property>
  <rdf:Property rdf:ID="created">
    <rdfs:domain rdf:resource="#Painting"/>
    <rdfs:range rdf:resource="http://www.w3.org/rdf-datatypes.xsd#Date"/>
  </rdf:Property>

- Distinguish between labels of nodes and edges
  - Painter vs. paints
- Classes and properties are organized in subsumption hierarchies
  - Painter < Artist
- Properties are inherited
  - r6 may also have a creates property
- References are typed
  - r2 should be of class < Painting
- Literal values are typed
  - 1937 is not a string but a date value!
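The subsumption and typing rules listed on this slide can be illustrated with a small sketch. The dictionaries below are a hypothetical in-memory encoding of the example schema (single inheritance only), and the function names are assumptions made for illustration; none of this is RDFSuite code.

```python
# Hypothetical in-memory encoding of the slide's schema (not RDFSuite code).
SUBCLASS_OF = {"Painter": "Artist", "Painting": "Artifact"}
DOMAIN = {"creates": "Artist", "paints": "Painter"}
RANGE = {"creates": "Artifact", "paints": "Painting"}


def ancestors(name, parent_of):
    """Return name followed by all of its ancestors, nearest first."""
    chain = [name]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def is_subclass(sub, sup):
    """True when sub equals sup or is a (transitive) subclass of it."""
    return sup in ancestors(sub, SUBCLASS_OF)


def applicable_properties(cls):
    """Properties inherited by cls: their domain is cls or a superclass."""
    return sorted(p for p, d in DOMAIN.items() if is_subclass(cls, d))


def statement_is_valid(subject_cls, prop, object_cls):
    """Typed references: subject within the domain, object within the range."""
    return (is_subclass(subject_cls, DOMAIN[prop])
            and is_subclass(object_cls, RANGE[prop]))
```

Under these assumptions a Painter inherits creates from Artist, while a paints statement whose target is only known to be an Artifact is rejected because Artifact is not below Painting.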
30But RDF has specifics Superimposed Descriptions
[Figure: resource descriptions carrying several rdf:type edges]
- Resources may belong to multiple classes (unrelated through isa)
  - r2 is both a Painting and an ExtResource
- Heterogeneous descriptions reminiscent of SGML exceptions
  - What is the structure of Painting resources?
31RDF/S vs. Well-Known Formalisms
- Relational or Object Database Models (ODMG, SQL)
  - Classes don't define table or object types
  - Instances may have quite different associated properties
  - Collections with heterogeneous members
- Semistructured or XML Data Models (OEM, UnQL, YAT, XML Schema)
  - Labels only on nodes or edges
  - Class and property subsumption is not captured
  - Heterogeneous structures reminiscent of SGML exceptions
- Knowledge Representation Languages (Telos, DL, F-Logic)
  - Absence of complex values and n-ary relationships (bags, sequences)
32A Semistructured Data Model for RDF
- Graph based, unordered, edge/node-labeled (in the style of OEM)
- But what about sequences (ordered)?
33Towards a Formal Data Model for RDF
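- An RDF schema is a 5-tuple $RS = (V_S, E_S, H, \psi, \lambda)$
  - $V_S$: a set of nodes
  - $E_S$: a set of edges
  - $H = (N, <)$: a well-formed hierarchy of names
  - $\psi$: an incidence function $E_S \to V_S \times V_S$
  - $\lambda$: a labeling function $V_S \cup E_S \to N \cup T$
- An RDF description base, instance of a schema $RS$, is a 5-tuple $RD = (V_D, E_D, \psi, \nu, \lambda)$
  - $V_D$: a set of nodes
  - $E_D$: a set of edges
  - $\psi$: an incidence function $E_D \to V_D \times V_D$
  - $\nu$: a valuation function $V_D \to V$
  - $\lambda$: a labeling function $V_D \cup E_D \to 2^{N \cup T}$
  - $\forall u \in V_D,\ \forall n \in \lambda(u):\ n \in C \cup T \wedge \nu(u) \in [\![n]\!]$
  - $\forall e \in E_D$ with $\psi(e) = [u, u'],\ \forall p \in \lambda(e):\ p \in P \wedge [\nu(u), \nu(u')] \in [\![p]\!]$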
34Why a Type System for RDF ?
- For error detection and safety
  - to verify that statements comply with what the application expects
  - to make sure that the application accesses valid statements
  - to enforce safe operations (e.g., don't do float arithmetic on classes!)
  - to check that compositions of operations make sense
- For performance
  - to design storage (saving space, improving clustering, etc.)
  - to process queries (algebraic laws, rewriting path expressions, etc.)
- We need a full-fledged Data Definition Language for RDF!
  - RDF Schema is viewed more as an ontology modeling tool
35Towards a Type System for RDF
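- Type System
  - $\tau = \tau_L \mid \tau_U \mid \{\tau\} \mid [\tau] \mid (1{:}\tau_1 + 2{:}\tau_2 + \dots + n{:}\tau_n)$
- Interpretation Function
  - Literal types: $[\![\tau_L]\!] = dom(\tau_L)$
  - Bag types: $[\![\{\tau\}]\!] = \{\{v_1, v_2, \dots, v_n\} \mid v_1, v_2, \dots, v_n \in V \text{ are values of type } \tau\}$
  - Seq types: $[\![[\tau]]\!] = \{[v_1, v_2, \dots, v_n] \mid v_1, v_2, \dots, v_n \in V \text{ are values of type } \tau\}$
  - Alt types: $[\![(1{:}\tau_1 + 2{:}\tau_2 + \dots + n{:}\tau_n)]\!] = \{v_i \mid v_i \in V,\ 1 \le i \le n, \text{ is a value of type } \tau_i\}$
  - $\forall c \in C:\ [\![c]\!] = \pi(c) \cup \bigcup_{c' < c} \pi(c')$
  - $\forall p \in P:\ [\![p]\!] = \{[v_1, v_2] \mid v_1 \in [\![domain(p)]\!],\ v_2 \in [\![range(p)]\!]\} \cup \bigcup_{p' < p} \pi(p')$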
36A Formal Data Model for RDF/S
[Figure: the formal data model for RDF/S. Recoverable labels: the name hierarchy H with Class, Property and <; the names N and types T; the values V with the URIs U; resources, containers and literals.]
37Schema Constraints
- Class, Property and Type names are mutually exclusive
  - $C \cap P \cap T = \emptyset$
- Literal, Resource and Container values are mutually exclusive
  - $L \cap U \cap (V \setminus (U \cup L)) = \emptyset$
- $\forall c, c', c'' \in C$
  - Class is the root of the class hierarchy
    - $c < Class$
  - the subClassOf relation is transitive
    - $c < c',\ c' < c'' \Rightarrow c < c''$
  - the subClassOf relation is antisymmetric
    - $c < c' \Rightarrow c \neq c'$
- Domain and range of properties should be defined and they should be unique
  - $\forall p \in P,\ \exists! c_1 \in C\ (c_1 = domain(p)) \wedge \exists! c_2 \in C \cup T_L\ (c_2 = range(p))$
- $\forall p, p', p'' \in P$
  - Property is the root of the property hierarchy
    - $p < Property$
  - the subPropertyOf relation is transitive
    - $p < p',\ p' < p'' \Rightarrow p < p''$
  - the subPropertyOf relation is antisymmetric
    - $p < p' \Rightarrow p \neq p'$
  - If p is a subPropertyOf p' then the domain of p is a subset of the domain of p' and the range of p is a subset of the range of p'
    - $p < p' \Rightarrow domain(p) \subseteq domain(p') \wedge range(p) \subseteq range(p')$
- A reified statement should have exactly one rdf:predicate, rdf:subject and rdf:object property
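Several of the schema constraints above lend themselves to a mechanical check. The following is a minimal sketch, assuming a schema given as sets of (sub, super) pairs and domain/range dictionaries; the function names and the error message wording are hypothetical and do not reflect how VRP implements validation.

```python
def transitive_closure(pairs):
    """All (sub, super) pairs implied by transitivity of the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def check_schema(sub_class, sub_property, domain, range_):
    """Return the violated constraints (an empty list for a valid schema)."""
    errors = []
    classes_lt = transitive_closure(sub_class)
    props_lt = transitive_closure(sub_property)

    # Antisymmetry: after closing under transitivity, no name is below itself.
    for a, b in classes_lt | props_lt:
        if a == b:
            errors.append(f"cycle through {a}")

    # Every property must have a domain and a range
    # (uniqueness is implied by using dictionaries).
    for p in {name for pair in sub_property for name in pair}:
        if p not in domain or p not in range_:
            errors.append(f"{p} lacks domain or range")

    def leq(c1, c2):
        return c1 == c2 or (c1, c2) in classes_lt

    # p < p' requires domain(p) within domain(p') and range(p) within range(p').
    for p, q in props_lt:
        if p in domain and q in domain and not leq(domain[p], domain[q]):
            errors.append(f"domain of {p} not within domain of {q}")
        if p in range_ and q in range_ and not leq(range_[p], range_[q]):
            errors.append(f"range of {p} not within range of {q}")
    return errors
```

With the example schema (paints below creates, Painter below Artist, Painting below Artifact) the check returns no errors; declaring Artifact as the domain of paints would be reported, since Artifact is not below Artist.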
38Data Constraints
- For all values $u \in V$
  - If u is a URI then it is an instance of one or more Classes
    - $u \in U \Rightarrow \lambda(u) \subseteq C$
  - If u is a literal then it is an instance of one and only one Literal type
    - $u \in L \Rightarrow \lambda(u) \in T_L$
  - If u is a container then it is an instance of one and only one Container type
    - $u \in V \setminus (U \cup L) \Rightarrow \lambda(u) \in T_{B|S|A}$
- For all properties $p \in P$, $[u_1, u_2] \in [\![p]\!]$
  - if p belongs to the set {1, 2, 3, ...} then u1 is an instance of either rdf:Bag or rdf:Seq or rdf:Alt
    - $p \in \{1, 2, 3, \dots\} \Rightarrow \lambda(u_1) \in T_{B|S|A}$
  - if p doesn't belong to {1, 2, 3, ...}, then u1 belongs to the domain of p and u2 belongs to the range of p
    - $p \in P \setminus \{1, 2, 3, \dots\} \Rightarrow \lambda(u_1) \in domain(p) \wedge \lambda(u_2) \in range(p)$
39Querying RDF Descriptions An Introduction to RQL
40The RDF Query Language (RQL)
- Declarative query language for RDF description bases
  - relies on a typed data model (literal and container types, plus union types)
  - follows a functional approach (basic queries and filters)
  - adapts the functionality of XML query languages to RDF, but also
    - treats properties as self-existent individuals
    - exploits taxonomies of node and edge labels
    - allows querying of schemas as semistructured data
- Relational interpretation of schemas and resource descriptions
  - Classes (unary relations)
  - Properties (binary relations)
  - Containers (n-ary relations)
41A Cultural Community Resource Description Example
Portal Schema
Portal Resource Descriptions
r1: www.rodin.fr/thinker.gif
r2: museoreinasofia.mcu.es/guernica.jpg
r3: www.artchive.com/woman.jpg
r4: museoreinasofia.mcu.es
Web Resources
42Querying Large RDF Schemas with RQL
- Basic Class Queries
  - topclass
  - subclassof(Artist)
  - subclassof^(Artist)
  - superclassof(Painter)
  - superclassof^(Painter)
- Basic Property Queries
  - topproperty
  - subpropertyof(creates)
  - subpropertyof^(creates)
  - superpropertyof(paints)
  - superpropertyof^(paints)
  - domain(creates)
  - range(creates)
- Querying the RDF/S meta-schema
  - Class
  - Property
  - Literal
43Class Property Querying
- Find the domain and range of the property creates
  - seq ( domain(creates), range(creates) )
- while thanks to functional composition we can express
  - subclassof ( seq ( domain(creates), range(creates) )[0] ) or
  - select X from subclassof( seq( domain(creates), range(creates) )[0] ){X}
- Which classes can appear as domain and range of property creates
  - select $X, $Y from {$X}creates{$Y} or
  - select X, Y from Class{X}, Class{Y}, {;X}creates{;Y}
- Find all properties defined on class Painting and its superclasses
  - select @P, range(@P) from {;Painting}@P or
  - select P, range(P) from Property{P} where domain(P) >= Painting
44Schema Navigation using RQL
- Iterate over the subclasses of class Artist
  - select $X from Artist{$X} or
  - select X from subclassof(Artist){X}
- Find the ranges of the property exhibited which can be reached from a class in the range of property creates
  - select $Y, $Z from creates{$Y}.exhibited{$Z}
- Find the properties that can be reached from a range class of property creates, as well as their respective ranges
  - select * from creates{$Y}.@P{$Z} or
  - select * from Class{Y}, (Class union Literal){Z}, creates{;Y}.@P{;Z}
45Exporting Schemas using RQL Queries
- Find Leaf Classes (i.e., classes without subclasses)
  - select C1
    from Class{C1}
    where not ( C1 in (select C1
                       from Class{C2}
                       where C2 < C1) )
- Find all schema information (i.e., group related superclasses and properties for each class)
  - select C, superclassof(C), (select P, range(P)
                                from Property{P}
                                where domain(P) = C)
    from Class{C}
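The leaf-class query can be read as an anti-join between the set of classes and the superclass side of the subClassOf relation. A minimal sketch, assuming a hypothetical class hierarchy encoded as Python sets (the class names extend the cultural example; this is not how the RQL interpreter evaluates the query):

```python
# Hypothetical class hierarchy: (subclass, superclass) pairs.
SUBCLASS = {
    ("Painter", "Artist"),
    ("Sculptor", "Artist"),
    ("Painting", "Artifact"),
    ("Sculpture", "Artifact"),
}
CLASSES = {name for pair in SUBCLASS for name in pair}


def leaf_classes():
    """Classes that never appear as the superclass of another class."""
    has_subclass = {sup for _, sup in SUBCLASS}
    return sorted(CLASSES - has_subclass)


def schema_info():
    """For each class, group its direct superclasses (one row per class)."""
    return {c: sorted(sup for sub, sup in SUBCLASS if sub == c)
            for c in sorted(CLASSES)}
```

Here `leaf_classes()` keeps the four specialised classes and drops Artist and Artifact, mirroring the `not ... in` filter of the query above.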
46Querying Complex RDF Descriptions with RQL
- Find all resources
  - Resource
- Find the resources in the extent of the property creates
  - creates or
  - select * from {X}creates{Y}
- Find the resources of type ExtResource and Sculpture
  - ExtResource intersect Sculpture
  - ExtResource minus Sculpture
  - ExtResource union Sculpture
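Under the relational interpretation, class extents behave as unary relations and property extents as binary relations, so the set operators above map directly onto set algebra. The sketch below uses hypothetical extents (the assignment of r1 to r6 to classes is an assumption for illustration only):

```python
# Hypothetical class extents (unary relations).
EXTENT = {
    "ExtResource": {"r1", "r2", "r3", "r4"},
    "Sculpture": {"r1"},
    "Painting": {"r2", "r3"},
}
# Hypothetical property extent (binary relation): (source, target) pairs.
CREATES = {("r5", "r1"), ("r6", "r2"), ("r6", "r3")}

both = EXTENT["ExtResource"] & EXTENT["Sculpture"]      # intersect
only_ext = EXTENT["ExtResource"] - EXTENT["Sculpture"]  # minus
either = EXTENT["ExtResource"] | EXTENT["Sculpture"]    # union

# Projecting the binary relation on its first column gives the creators.
creators = {source for source, _ in CREATES}
```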
47Navigating in Description Graphs using RQL
- Find the Museum resources that have been modified after year 2000 (i.e., data path with node and edge labels)
  - select X
    from Museum{X}.last_modified{Y}
    where Y >= 2000-01-01
- Find the resources that have been created and their respective titles (i.e., data path using only edge labels)
  - select Y, Z from creates{Y}.title{Z}
- Find the titles of exhibited resources that have been created by a Sculptor (i.e., multiple data paths)
  - select Z, W
    from Sculptor.creates{Y}.exhibited{Z}, {Z}title{W}
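A path expression of this kind can be understood as a chain of joins over property extents, with node labels acting as filters on the joined resources. The following is a minimal sketch over a hypothetical triple list; the dates, the title and the resource r7 are invented for illustration, and the helper names are assumptions rather than RQL internals.

```python
from datetime import date

# Hypothetical description base as (subject, property, object) triples.
TRIPLES = [
    ("r4", "type", "Museum"),
    ("r4", "last_modified", date(2000, 6, 9)),
    ("r7", "type", "Museum"),
    ("r7", "last_modified", date(1999, 3, 1)),
    ("r5", "creates", "r1"),
    ("r1", "title", "The Thinker"),
]


def extent(prop):
    """All (source, target) pairs of a property."""
    return [(s, o) for s, p, o in TRIPLES if p == prop]


def instances(cls):
    """Resources directly typed with the given class."""
    return {s for s, p, o in TRIPLES if p == "type" and o == cls}


def museums_modified_since(day):
    """Node label (Museum) plus edge label (last_modified), then a filter."""
    museums = instances("Museum")
    return sorted(x for x, y in extent("last_modified")
                  if x in museums and y >= day)


def created_with_titles():
    """Two-step path creates.title: join on the intermediate node."""
    return [(y, z) for _, y in extent("creates")
            for y2, z in extent("title") if y == y2]
```

The first function corresponds to a data path with node and edge labels, the second to a path using edge labels only.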
48Using Schema to Filter Resource Descriptions
- Find the Painting resources that have been exhibited as well as the related target resources of type ExtResource (i.e., restrict multiply classified property target values using node labels)
  - select X, Y from {X;Painting}exhibited{Y}.ExtResource
- Note the difference with the following path expression
  - select X, Y from {X;Painting}exhibited{Y;ExtResource}
- Find modified resources which can be reached by a property applied to the class Painting and its subclasses (i.e., restrict property source values using edge labels)
  - select @P, Y, Z
    from {$X}@P.{Y}last_modified{Z}
    where $X <= Painting
49Discover the Schema of RDF Descriptions
- Find the description of a resource with URI http://www.museum.es
  - select X, (select @P, Y
               from {Z;$Z}@P{Y}
               where X = Z and $X = $Z)
    from $X{X}
    where X = &http://www.museum.es
- Find the descriptions of resources whose URI matches www.museum.es
  - select X, (select $W, (select @P, Y
                           from {Z;$Z}@P{Y}
                           where W = Z and $W = $Z)
               from $W{W}
               where W = X)
    from Resource{X}
    where X like "www.museum.es"
50And if you still like triples
- Find the description of resources which are not of type ExtResource
  (
    (select X, @P, Y from {X}@P{Y})
    union
    (select X, type, $X from $X{X})
  )
  minus
  (
    (select X, @P, Y from {X;ExtResource}@P{Y})
    union
    (select X, type, ExtResource from ExtResource{X})
  )
51Comparing RQL to W3C XQuery
- Find the names of those who have created artifacts which are exhibited in Museums, along with the Museum titles
- RQL
  - select Y, Z, V, R
    from {X}creates.exhibited{Y}.title{Z},
         {X}first_name{V}, {X}last_name{R}
52Comparing RQL to W3C XQuery
53Comparing RQL to W3C XQuery
- XQuery
    LET $t := document("sirpac-culture-merged.rdf")//description
    FOR $artist IN rdf:instance-of-class($t, rdf:predicate-domain($t, "creates"))
    LET $artifact := rdf:join-on-property($t, $artist, "creates"),
        $museum := rdf:join-on-property($t, $artifact, "exhibited")
    RETURN
      <result>
        filter($artist | $artist/last_name | $artist/first_name),
        filter($museum | $museum/title)
      </result>
54Comparing RQL to W3C XQuery
55Comparing RQL to W3C XQuery
- XML syntactic and schematic discrepancies of semantically equivalent RDF statements
  - normalized representation under the form of merged descriptions
- XQuery has no built-in knowledge of the RDF schema information
  - function library that exploits the RDF schema if the assertions of the schema are also present in the normalized representation
- Data model mismatches between XML and RDF impact type safety of functions and queries
  - bag( range(Artist) ) union subclassof(Artifact)
    - In RQL: Type Error
    - In XQuery: All the subclasses of Artifact!
56Storing RDF Descriptions: RSSDB Preliminary Performance Results
57Modeling the ODP Catalog with RDF/S
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
related
Class
ns1: http://www.dmoz.org/topic.rdf
Recreation
Regional
Paris
Lodging
Travel
Vacation- Rentals
Hotel
Ile-de-France
related
Hotel Directories
r2
r4
r1
r3
r1: http://www.sunscale.com/france/paris/index.htm
58ODP Statistics
- ODP Version 16-01-2001
- 170 Mbytes of class hierarchies
- 700 Mbytes of resource descriptions
- 337,085 topics
- 16 hierarchies with
- max depth 13 ( 6.86 on average)
- max subclasses 314 ( 4.02 on average)
- 2,342,978 URIs
59Generic Representation
Resources
id (int) | uri (text)
1 | http://www.dmoz.org/topics.rdfs#Hotel
2 | http://www.dmoz.org/topics.rdfs#Hotel Directories
3 | http://www.oclc.org/dublincore.rdfs#title
4 | http://www.dmoz.org/schema.rdf#Ext.Resource
5 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6 | http://www.w3.org/2000/01/rdf-schema#subClassOf
7 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8 | http://www.w3.org/2000/01/rdf-schema#Class
9 | r1

Triples
predid (int) | subid (int) | objid (int) | objvalue (text)
6 | 2 | 1 |
5 | 3 | 7 |
5 | 1 | 8 |
5 | 9 | 2 |
3 | 9 | | SunScale
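To make the cost of the generic representation concrete, the sketch below loads a few of the rows shown on this slide into an in-memory SQLite database and answers a simple question: which resources are typed with a given class, and what are their titles? SQLite stands in for the ORDBMS purely for illustration, the `#` separators in the URIs are an assumption about the original slide, and `titles_of_instances` is a hypothetical helper name.

```python
import sqlite3

TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TITLE = "http://www.oclc.org/dublincore.rdfs#title"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resources (id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE Triples (predid INT, subid INT, objid INT, objvalue TEXT);
""")
conn.executemany("INSERT INTO Resources VALUES (?, ?)", [
    (1, "http://www.dmoz.org/topics.rdfs#Hotel"),
    (2, "http://www.dmoz.org/topics.rdfs#Hotel Directories"),
    (3, TITLE),
    (5, TYPE),
    (6, "http://www.w3.org/2000/01/rdf-schema#subClassOf"),
    (9, "r1"),
])
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)", [
    (6, 2, 1, None),           # Hotel Directories subClassOf Hotel
    (5, 9, 2, None),           # r1 type Hotel Directories
    (3, 9, None, "SunScale"),  # r1 title "SunScale"
])


def titles_of_instances(class_uri):
    """Resources directly typed with class_uri, together with their titles."""
    sql = """
    SELECT s.uri, title.objvalue
    FROM Triples AS typing
    JOIN Resources AS tp ON tp.id = typing.predid
    JOIN Resources AS c  ON c.id  = typing.objid
    JOIN Resources AS s  ON s.id  = typing.subid
    JOIN Triples   AS title ON title.subid = typing.subid
    JOIN Resources AS lp ON lp.id = title.predid
    WHERE tp.uri = ? AND c.uri = ? AND lp.uri = ?
    """
    return conn.execute(sql, (TYPE, class_uri, TITLE)).fetchall()
```

Even this small query touches Triples twice and Resources four times, which gives a feel for why queries over large Triples and Resources tables become join-heavy in this layout.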
60Specific Representation
Namespace
id (int) | uri (text)
1 | http://www.w3.org/2000/01/rdf-schema
2 | http://www.w3.org/1999/02/22-rdf-syntax-ns
3 | http://www.oclc.org/dublincore.rdfs
4 | http://www.dmoz.org/topics.rdfs

Type
id (int) | nsid (int) | lpart (text)
1 | 1 | Resource
2 | 2 | Bag
3 | 2 | Seq
4 | | String

Class
id (int) | nsid (int) | lpart (text)
11 | 5 | Ext.Resource
12 | 4 | Hotel
13 | 4 | Hotel Directories

Property
id (int) | nsid (int) | lpart (text) | domainid (int) | rangeid (int)
14 | 3 | title | 1 | 4
15 | 3 | description | 1 | 4
16 | 5 | title | 11 | 4

SubClass
subid (int) | superid (int)
11 | 1
12 | 1
13 | 12

SubProperty
subid (int) | superid (int)
16 | 14
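In a schema-specific layout the class hierarchy is stored in its own small table, so a query over a class and all of its subclasses can traverse SubClass instead of self-joining a large triple table. The sketch below illustrates the idea with SQLite and a recursive common table expression; the `Instances` table, the resource r9 and the function name are assumptions for illustration, and RSSDB's actual tables and SQL3 functions may differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Class (id INTEGER PRIMARY KEY, lpart TEXT);
CREATE TABLE SubClass (subid INT, superid INT);
-- Hypothetical instance table: which resource belongs to which class.
CREATE TABLE Instances (classid INT, uri TEXT);
""")
db.executemany("INSERT INTO Class VALUES (?, ?)",
               [(12, "Hotel"), (13, "Hotel Directories")])
db.execute("INSERT INTO SubClass VALUES (13, 12)")
db.executemany("INSERT INTO Instances VALUES (?, ?)",
               [(13, "r1"), (12, "r9")])


def instances_of(class_name):
    """Instances of a class and, transitively, of all its subclasses."""
    sql = """
    WITH RECURSIVE sub(id) AS (
        SELECT id FROM Class WHERE lpart = ?
        UNION
        SELECT SubClass.subid
        FROM SubClass JOIN sub ON SubClass.superid = sub.id
    )
    SELECT uri FROM Instances
    WHERE classid IN (SELECT id FROM sub)
    ORDER BY uri
    """
    return [row[0] for row in db.execute(sql, (class_name,))]
```

Asking for Hotel returns both its own instance and the instance of Hotel Directories, while asking for Hotel Directories returns only the latter.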
61DBMS Size vs. Schema Triples
- DBMS size scales linearly with the number of schema triples
62DBMS Size vs. Data Triples
- DBMS size scales linearly with the number of data triples
63Query Templates for RDF description bases
64Execution Time of RDF Benchmark Queries
65Comparison
- Specific Representation permits the customization of the database representation of RDF metadata
- Specific Representation outperforms the Generic Representation for all types of queries
  - Q1, Q2, Q5, Q7, Q10, Q11 by a factor up to 3.73
  - Q3, Q4, Q6 by a factor up to 2.8
  - Q8, Q9 by a factor up to 95,538
- Generic representation pays a severe penalty for maintaining large tables (Triples, Resources)
  - e.g., queries Q8, Q9 require (self-)joins of Triples, Resources
66Other Issues
- RDF Metadata Generation from Legacy Repositories
  - need to capture schemas from heterogeneous resources
- RDF Schema Evolution and Metadata Revision
  - to support the dynamics of resource descriptions
- RDF Repositories Distribution
  - for integration with WebDAV or LDAP-like architectures
- RDF Query Languages Optimization
  - for real-scale Semantic Web applications
67The ICS-FORTH R&D Activities on the Semantic Web
68The C-Web Project
- EC IST Project (13479) 1999-2000
- Overall Aim: Set up methodologies and infrastructure for fast deployment and easy management of Web Portals for communities requiring
  - effective knowledge assimilation, elicitation
  - efficient query answering
- Partners: INRIA (FR), FORTH (GR), EDW (IT)
- Running Application Scenario: Learning Portals for intranets or the Internet
  - Corporate Knowledge Servers (e.g., automobile, telecommunications)
  - Memory Organizations (e.g., museums, libraries, archives)
69Project MESMUSES
- Programme (IST): KAIII.1.4. Multimedia Content and Tools (Access to digital collections of cultural and scientific content)
- Contract: IST-2000-26074 (02/2001 - 07/2003)
- Partners:
  - INRIA (France)
  - FINSIEL - Multimedia Services (Italy)
  - ICS-FORTH (Hellas)
  - ENSTB - Ecole Nationale Supérieure des Télécommunications - Bretagne (France)
  - VALORIS - Group, Paris (France)
  - IMSS - Istituto e Museo di Storia della Scienza, Firenze (Italy)
  - CSI - Cité des Sciences et de l'Industrie, Paris (France)
  - EDW International, Milano (Italy)
  - DET-UNIFI - University of Florence (Italy)
70C-Web Architecture
Artist
Artist
Painting
Client Tier
Museum
URL
Query Browsing Interface
Painter
Resource Description Interface
Schema Generator
http
XML/XSL
RQL
RDF/XML Schema
RDF/XML Descriptions
CWEB/Application Server
Middleware APIs
Session Manager
Logical Middle Tier
Metadata Store
RDF/XML Loader
XML/XSL Processor
Query Engine
URL Resolver
XML
XML
XML
Resources
http
XML Wrapper
Well-formed
XML enabled DBMS
Other docs
XML docs
on the Intranet
on the Web
e.g. mails,
news, reports
71The C-Web Metadata Middleware
PARENT PROCESS
APACHE
CHILD PROCESS ID
PHP
TOMCAT
VRP
COCOON
DB CONNECT
LOADER
RQL server
PostgreSQL Server
72The ICS-FORTH RDFSuite: High-level and Scalable Tools for the Semantic Web
http://139.91.183.30:9090/RDF/
74The RDFSuite Main Components
- The Validating RDF Parser (VRP): Karsten Tolle, Diploma Thesis
  - The first RDF parser supporting semantic validation of both resource descriptions and schemas
- The RDF Schema Specific DataBase (RSSDB): Sophia Alexaki, MSc. Thesis
  - The first RDF store using schema knowledge to automatically generate an Object-Relational (SQL3) representation of RDF metadata and load resource descriptions
- The RDF Query Language (RQL): Greg Karvounarakis, MSc. Thesis
  - The first declarative language for uniformly querying RDF schemas and resource descriptions
75The RDFSuite Architecture
ICS-RSSDB
ICS-VRP
ICS-RQL Interpreter
Class
Property
Typing
p_name
domain
range
c_name
LIB C
Graph Constructor
Loading RDF Java APIs
DBMS RDF query API
JDBC
RDF Loader
VRP
Internal
SQL3 SPI functions
SubClass
RDF Model
SubProperty
SQL3
SQL3
Evaluation
Parser
class1
property
URI
creates
76Validating RDF Parser (VRP)
- Syntactic Validation
  - RDF/XML syntax described in the RDF M&S Specification
- Semantic Validation
  - Semantic constraints derived from the RDF Schema Specification
- Implementation
  - Standard compiler generator tools for Java: CUP (0.1) and JFLEX (1.3.2)
  - 100% Java(TM) development (Java 1.2.2)
77VRP Interface
78VRP Features
- Understands embedded RDF in HTML or XML
- Full XML Schema Data Types support
- Full Unicode support
- Statement validation across several RDF/XML namespaces
- Persistent namespaces (for consistency, optimization)
- Various Output Options
  - Debugging
  - Serialization in files under the form of triples and graphs
  - Statistics for schema characteristics (class/property hierarchies) and resource distribution (class population)
- Easy to use as a standalone application
  - No other software needs to be installed (e.g., XML Parsers)
- Easy to integrate with other applications, e.g., visualization tools
  - RDF Model Construction and Validation Java APIs
79RDF Schema Specific DataBase (RSSDB)
- Persistent RDF Store using standard database technology
  - Separates schema from data information
  - Distinguishes between classes and properties
- Preserves the flexibility of RDF in
  - Refining schemas
  - Enriching descriptions
  - Using multiple schemas
- Implementation
  - On top of an object-relational DBMS (SQL3) like PostgreSQL
  - Using JDBC Interface (2.0)
80The RDF to DBMS Loader
Extended VRP Validator
RDF Model
P1
Persistent Namespace (DBMS)
Additional Constraints
C2
C1
store()
P1
r1
r2
RDF Querying APIs
- RDF_Statement
- rdfpredicate
- rdfsubject
- rdfobject
-
- RDF_Property
- rdfsdomain
- rdfsrange
- rdfssubPropertyOf
- link_list
-
store()
store()
RDF Loading APIs
DBMS
r1
81RSSDB Interface
82RSSDB Features
- Customization of the database representation according to
  - Employed meta-schemas (RDF/S, DAML-OIL)
  - RDF schemas and description bases peculiarities (number of classes vs. properties, resource distribution per class)
  - Query functionality of applications
- Scalability
  - size of DBMS scales linearly with the number of loaded triples (tested with the Open Directory Portal comprising about 6 million triples)
  - incremental loading of voluminous description bases
- Easy to use as a standalone application
  - Requires only a JDBC-compliant ORDBMS
- Easy to integrate with other applications, e.g., metadata servers
  - RDF Model Loading and Update Java APIs
83RDF Query Language (RQL)
- Declarative language (like ODMG OQL) for conceptual browsing and querying of voluminous RDF Description Bases
  - Easy navigation and resource discovery (using few query terms)
  - Task-specific personalization of RDF description bases (views)
  - Seamless querying of RDF schemas and resource descriptions
  - Flexible export facilities of RDF metadata (restructuring)
- RQL fully supports
  - XML Schema data types (for filtering literal values)
  - grouping primitives (for constructing complex XML results)
  - aggregate functions (for extracting statistics)
  - recursive traversal of class and property hierarchies (for matchmaking)
- Implementation
  - C development (GCC 2.95.1) on top of an ORDBMS (Unix, Linux)
  - Client/Server architecture (XDR-based)
84The RDF Query Interpreter (RQL)
Syntax analysis
Query string
- Syntactical analysis (lex/yacc)
- CNF transformation
(1)
Type inference
Syntax tree under CNF
- Checks type compatibility
- Sets appropriate evaluation functions
(2)
Query string
(3)
Main
Graph construction
Query result
(4)
- Evaluation of dependencies
- Factorization functions
Query graph
Typing
DBMS RDF Query APIs
(5)
Query graph
Evaluator
Evaluation
(6)
- Defines evaluation functions
- Query Processing
Result
DBMS
85RQL Web Interface
86RQL Features
- Pushes as much query evaluation as possible to the underlying DBMS
  - Benefits from robust SQL3 query engines
  - Extensive use of DB indices
- Generic RDF/XML result form (Containers)
  - Standard XML/XSL processing for customized rendering
- Easy to couple with commercial ORDBMSs (Oracle, DB2)
  - RDF querying APIs (SQL3/C functions)
- Easy to integrate with different Application Servers (Zope, JetSpeed)
  - C or Java drivers to RQL servers
- Easy to learn and use
  - One day training
87RDFSuite Summary
- RDFSuite addresses the needs of effective and efficient RDF metadata management by providing tools for validation, storage and querying
  - validation follows a formal data model and constraints enforcing consistency of RDF schemas
  - scalability
  - declarative query language for schema and data querying
- Ongoing efforts
  - RQL query optimization
  - RQL update and transactional aspects
88Thank you
Hvala
Danke
Merci
Gracias
Grazie
89University Portals
- Most of the Corporate Portal features apply to higher education
- uPortal is bridging the gap between corporate portals and the needs of Higher Education Institutions
- One of the most complex portal applications is instruction. Several information channels have to be synchronized together to
  - present learning materials and assessments
  - monitor the learner's progress and adapt the presentation to the learner's knowledge
  - audit the progression through content
  - and perhaps even simulate a process
90The Evolving Campus
91The 21st Century Campus in the .com World
92The Higher Education Web World
93uPortal Hierarchy
94uPortal Interfaces
- Authentication
- Proving your identity
- Authorization
- Deciding what you can access
- Directory services
- Such as populating EduPerson
- User preferences
- Profiles, structure, themes, skins
- Channel information
- Availability and configuration