OKKAM Enabling the Web of Entities A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC AND GLOBAL IDE - PowerPoint PPT Presentation

Slides: 20

Provided by: claudian2

1
OKKAM Enabling the Web of Entities
A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC
AND GLOBAL IDENTIFIER REUSE IN DECENTRALIZED
INFORMATION ENVIRONMENTS

KnowDive Seminar
April 11, 2007
Trento, Italy

2
Background: KR goes Global

Knowledge representation is a field which
currently seems to have the reputation of being
initially interesting, but which did not seem to
shake the world to the extent that some of its
proponents hoped.
It made sense but was of limited use on a small
scale, but never made it to the large scale. This
is exactly the state which the hypertext field
was in before the Web.
Each field had made certain centralist
assumptions -- if not in the philosophy, then in
the implementations, which prevented them from
spreading globally.
But each field was based on fundamentally sound
ideas about the representation of knowledge.
The Semantic Web is what we will get if we
perform the same globalization process to
Knowledge Representation that the Web initially
did to Hypertext. We remove the centralized
concepts of absolute truth, total knowledge, and
total provability, and see what we can do with
limited knowledge.
Tim Berners-Lee, What the Semantic Web can
represent, 1998

3
In practice

[Figure: two layers. Upper layer, "Web of Meanings": the entities Niederee, Bouquet, UniTN, L3S and VIKEF, connected by the relations Knows, Works-for, Coordinates and Is_involved_in, and tied to the lower layer by Web_page relations. Lower layer, "Web of Links": the pages www.unitn.it, www.trento.it, www.ryanair.com, www.l3s.org, www.paolobouquet.net, www.google.com and ockham.org, connected by href links.]

4
What went wrong (personal view)

The Web of Meanings (the Semantic Web) is not
happening, at least not the way the WWW happened
during the 1990s
Enabling factors for the Web of Links (the WWW):
Any available resource has a global URL, which
allows Web clients to address it
The same identifier can be resolved to retrieve
the resource through the HTTP protocol (running
on top of TCP/IP)
Creating href links is easy on top of this
infrastructure
What about the Web of Meanings?
Non-addressable resources do not have an
infrastructure for supporting the use of global
identifiers (more on this below)
Non-addressable resources cannot be retrieved
Creating global links between non-addressable
resources is difficult
Outcome: we lack the preconditions for the Web of
Meanings to happen!

5
Further (strategic) errors

On top of these infrastructural issues, a big
strategic error was made (personal opinion!)
The AI people came in, and tried to recycle
their logical know-how on the Semantic Web
The plan was to build the Semantic Web starting
from representations (theories, currently known
as ontologies) and not from resources (entities)
This led to a scalability issue: reasoning is
hard for local theories, forget about going
global! Heard about semantic heterogeneity,
ontology mapping, alignment, distributed
reasoning, ...?

6
My vision

Back to the building blocks: entities!
First, create the infrastructure that makes a
global space of identifiers (e.g. URIs) possible
in practice
Second, show how we can create value simply from
linking globally identified entities
Third, specify vocabularies and ontologies for
(subsets of) globally identified entities
Fourth, link ontologies to each other on top of
the already integrated domain of globally
identified entities
Hopefully, this will lead to the Web of Entities,
namely a global digital space in which any
knowledge expressed in any local web of entities
can be seamlessly integrated and reasoned about

7
OKKAM overall goal

The goal of the OKKAM project is to implement the
first part of this plan.
Establishing a scalable and sustainable
infrastructure for the storage and reuse of
global identifiers for non-addressable entities
in decentralized information environments
Enabling different forms of OKKAMization of old
and new content
Creating a primitive index which links global
identifiers to OKKAMized content
Building applications which can showcase the
potential value of this approach

8
But why OKKAM?

Ockham's Razor (14th century):
'entities should not be multiplied beyond
necessity'
OKKAM's Razor (21st century):
'entity identifiers should not be multiplied
beyond necessity'

9
1. Infrastructure

Cornerstone: large-scale Entity Repository (ER)
Architecture: distributed, supports federation of
local ERs, replicated (no single point of
failure)
ER vs. Entity Base (or Knowledge Base):
supporting reuse vs. collecting and providing
knowledge about entities
Basic schema: set of attribute/value pairs
(called labels) with no predefined semantics
Features:
Size: unbelievable (billions of identifiers and
profiles stored)
Network traffic: massive (up to millions of
requests per minute)
Quality: hard to ensure
Update: grows monotonically (no deletion). Aging
mechanism?
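
The basic schema and the no-deletion update policy above can be sketched in a few lines of Python. This is only an illustrative, single-process toy under stated assumptions, not the OKKAM implementation: the class names, the find_or_mint helper, the exact-match reuse rule and the URI prefix http://okkam.example/entity/ are all hypothetical, and distribution, federation and replication are left out.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Label = Tuple[str, str]  # (attribute, value) with no predefined semantics


@dataclass
class EntityProfile:
    """A global identifier plus its free-form labels."""
    okkam_id: str
    labels: List[Label] = field(default_factory=list)


class EntityRepository:
    """Toy ER: profiles can be added and extended, never deleted."""

    def __init__(self, namespace: str = "http://okkam.example/entity/"):
        self._namespace = namespace  # hypothetical URI prefix
        self._profiles: Dict[str, EntityProfile] = {}

    def mint(self, labels: List[Label]) -> str:
        """Create a new identifier with an initial set of labels."""
        okkam_id = self._namespace + uuid.uuid4().hex
        self._profiles[okkam_id] = EntityProfile(okkam_id, list(labels))
        return okkam_id

    def add_label(self, okkam_id: str, attribute: str, value: str) -> None:
        """Extend an existing profile (the repository only grows)."""
        self._profiles[okkam_id].labels.append((attribute, value))

    def lookup(self, attribute: str, value: str) -> List[str]:
        """Return the ids of all profiles carrying this exact label."""
        return [p.okkam_id for p in self._profiles.values()
                if (attribute, value) in p.labels]

    def find_or_mint(self, labels: List[Label]) -> str:
        """Reuse an id whose profile contains every given label,
        otherwise mint a new one (assumed reuse rule, for illustration)."""
        for profile in self._profiles.values():
            if labels and all(label in profile.labels for label in labels):
                return profile.okkam_id
        return self.mint(labels)

    def __len__(self) -> int:
        return len(self._profiles)
```

Calling find_or_mint twice with the same labels returns the same identifier, which is the razor in miniature: a new identifier is created only when no existing profile fits. A real repository would need a far more tolerant matching step than exact label equality.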

10
2. OKKAMization

Enabling the runtime or ex post OKKAMization of
data in various formats (from unstructured to
structured)
Examples
Office tools (named entity recognition and
annotation)
Databases (annotating records with OKKAM ids)
Ontologies (replacing local URIs with global
URIs)
HTML pages
Objective: creating a critical mass of
OKKAMized content
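
Two of the examples above (annotating database records, replacing local URIs in ontologies) can be sketched as plain functions. This is a minimal illustration under assumptions: the function names, the okkam_id field name and the lookup tables are hypothetical, and in practice the mapping from local keys to global identifiers would come from the entity repository rather than from a hand-written dictionary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def okkamize_records(records: List[dict], key_field: str,
                     id_map: Dict[str, str]) -> List[dict]:
    """Return copies of the records, annotated with an 'okkam_id'
    field wherever the record's key is known in id_map."""
    annotated = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        okkam_id = id_map.get(record.get(key_field))
        if okkam_id is not None:
            copy["okkam_id"] = okkam_id
        annotated.append(copy)
    return annotated


def okkamize_triples(triples: List[Triple],
                     uri_map: Dict[str, str]) -> List[Triple]:
    """Replace local URIs with global ones in every position of each
    triple; terms without a known global URI are kept as they are."""
    return [(uri_map.get(s, s), uri_map.get(p, p), uri_map.get(o, o))
            for (s, p, o) in triples]
```

Both functions are ex post: they take existing content and return an OKKAMized copy. Runtime OKKAMization would instead ask the repository for an identifier at the moment the content is created.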

11
3. Indexing

The model of knowledge devolution
The ER stores only IDs and simple labels
Knowledge about entities must be developed
outside
Idea: use OKKAM to store and index pointers to
external resources which mention an OKKAM id
Different types of pointers:
Informal: pointing to a document which contains
an OKKAM id as a simple annotation for a piece of
text
Formal: pointing to formal resources (e.g.
ontologies) in which an OKKAM id is used as a URI
of an instance
This index can also be used for entity resolution
/ matching
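
The pointer index described above can be pictured as an inverted index from OKKAM ids to external resources. The sketch below is an assumption-laden toy: the names Pointer, PointerIndex and co_mentioned are hypothetical, and "ids that share a resource" is only one simple signal that such an index could contribute to entity resolution.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

KINDS = ("informal", "formal")


@dataclass(frozen=True)
class Pointer:
    resource: str  # URL of an external document or ontology
    kind: str      # "informal" (text annotation) or "formal" (instance URI)


class PointerIndex:
    """Maps each OKKAM id to the external resources that mention it."""

    def __init__(self) -> None:
        self._by_id = defaultdict(set)

    def add(self, okkam_id: str, resource: str, kind: str) -> None:
        if kind not in KINDS:
            raise ValueError("kind must be 'informal' or 'formal'")
        self._by_id[okkam_id].add(Pointer(resource, kind))

    def pointers(self, okkam_id: str,
                 kind: Optional[str] = None) -> List[str]:
        """Resources mentioning okkam_id, optionally filtered by kind."""
        found = self._by_id.get(okkam_id, set())
        return sorted(p.resource for p in found
                      if kind is None or p.kind == kind)

    def co_mentioned(self, okkam_id: str) -> List[str]:
        """Other ids that appear in at least one of the same resources."""
        mine = {p.resource for p in self._by_id.get(okkam_id, set())}
        return sorted(other for other, ptrs in self._by_id.items()
                      if other != okkam_id
                      and mine & {p.resource for p in ptrs})
```

The index holds pointers only. The knowledge itself stays in the external resources, in line with the devolution model.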

12
Okkam Architecture

13
Okkam Applications

Three exemplary applications on top of the OKKAM
infrastructure
Entity-centric search engine
Entity-centric organizational knowledge
management
Multimedia authoring based on OKKAMized content
Purpose
Show benefits of entity-centric approach
Trigger the development of further applications
Contribute to building a community around the
OKKAM approach

14
Entity-centric search engine

Starting point: different types of OKKAMized
content collections, e.g. knowledge bases,
document collections, metadata repositories,
image collections, etc.
Goal
enabling completely new methods for browsing and
searching large collections of data and documents
(including the Web itself)
enabling new forms of intelligent entity-centric
search that exploit the OKKAMization of content
RTD Challenges
Retrieval indexing that takes into account the
OKKAM IDs
Combination of entity-centric and semantic search
Combined ranking
Adequate combination and visualization of the
results from different kinds of resources (e.g.
knowledge base and document collection)
...

15
Entity-centric organizational knowledge management

Idea
Exploit OKKAM benefits in an organizational context:
Managing and structuring corporate knowledge
using entity identifiers as pivots for
aggregating information not only from structured
sources, but also from poorly or non-structured
sources, like electronic documents, email
messages, slide presentations, video and audio
files, etc.
Using and interlinking a local organizational
entity repository

16
Multimedia Content Authoring

Idea
Creation of an authoring environment, which makes
use of the OKKAMization of content
Variants
An authoring environment that helps the scientific
author by providing targeted additional
information during the writing process
Support for the creation of value-added artefacts
on the basis of OKKAMized content (text, video)
Creation template for (task-specific / selective)
enrichment with information about the entities
found in the content object (semantic infusion)
Tool for publishers, broadcasters

17
Example: Semantic Infusion

18
RTD Challenges

Building a scalable entity repository in which a
massive and growing number of entity IDs and
profiles can be collected, stored and indexed
Guaranteeing security and privacy for the data
stored in the repository
Making the repository efficiently searchable and
usable by Web users as well as through APIs
Supporting effective and reliable methods for
entity matching and for ranking results
Enabling several channels through which the
repository can be populated, either manually or
automatically (import filters, crawling,
harvesting, ...)
Supporting the integration of OKKAM with a
variety of content creation applications (e.g.
text editors, office applications, HTML and XML
editors, ontology editors, DBMS, etc.)
Ensuring the quality of data in the repository
Enabling a virtuous circle of trust and
collaboration with users
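
To make the entity matching and ranking challenge concrete, here is a deliberately naive baseline. The slides do not say which matching method OKKAM uses; Jaccard overlap between label sets is swapped in purely for illustration, and the function names, the threshold parameter and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Label = Tuple[str, str]  # (attribute, value)


def jaccard(a: Iterable[Label], b: Iterable[Label]) -> float:
    """Size of the intersection over size of the union of two label sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_candidates(query: List[Label],
                    profiles: Dict[str, List[Label]],
                    threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Score every stored profile against the query labels and return
    (okkam_id, score) pairs above the threshold, best match first.
    Ties are broken by identifier so the order is deterministic."""
    scored = [(okkam_id, jaccard(query, labels))
              for okkam_id, labels in profiles.items()]
    kept = [(okkam_id, score) for okkam_id, score in scored
            if score > threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Illustrative data only
profiles = {
    "id:1": [("name", "Bouquet"), ("affiliation", "UniTN")],
    "id:2": [("name", "Niederee"), ("affiliation", "L3S")],
}
query = [("name", "Bouquet"), ("affiliation", "UniTN"), ("city", "Trento")]
ranking = rank_candidates(query, profiles)
```

Exact label equality breaks down with spelling variants and missing attributes, and scanning every profile does not scale to billions of identifiers, which is why matching and ranking are listed as research challenges rather than solved problems.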