OKKAM Enabling the Web of Entities A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC AND GLOBAL IDE - PowerPoint PPT Presentation

1
OKKAM: Enabling the Web of Entities
A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC
AND GLOBAL IDENTIFIER REUSE IN DECENTRALIZED
INFORMATION ENVIRONMENTS
  • KnowDive Seminar
  • April 11, 2007
  • Trento, Italy

2
Background: KR goes Global
  • Knowledge representation is a field which
    currently seems to have the reputation of being
    initially interesting, but which did not seem to
    shake the world to the extent that some of its
    proponents hoped.
  • It made sense but was of limited use on a small
    scale, but never made it to the large scale. This
    is exactly the state which the hypertext field
    was in before the Web.
  • Each field had made certain centralist
    assumptions -- if not in the philosophy, then in
    the implementations, which prevented them from
    spreading globally.
  • But each field was based on fundamentally sound
    ideas about the representation of knowledge.
  • The Semantic Web is what we will get if we
    perform the same globalization process to
    Knowledge Representation that the Web initially
    did to Hypertext. We remove the centralized
    concepts of absolute truth, total knowledge, and
    total provability, and see what we can do with
    limited knowledge.
  • Tim Berners-Lee, What the Semantic Web can
    represent, 1998

3
In practice
[Figure: two layers. Web of Meanings: the entities
Niederee, Bouquet, UniTN, L3S and VIKEF, connected by
the relations Knows, Is_involved_in, Works-for and
Coordinates. Web of Links: the pages www.unitn.it,
www.trento.it, www.ryanair.com, www.l3s.org,
www.paolobouquet.net, www.google.com and ockham.org,
connected by href links. Web_page arcs connect
entities to pages.]

4
What went wrong (personal view)
  • The Web of Meanings (the Semantic Web) is not
    happening, at least not in the way the WWW
    happened during the 1990s
  • Enabling factors for the Web of Links (the WWW):
  • Any available resource has a global URL, which
    allows Web clients to address it
  • The same identifier can be resolved to retrieve
    the resource through the HTTP protocol (running
    on top of TCP/IP)
  • Creating href links is easy on top of this
    infrastructure
  • What about the Web of Meanings?
  • Non-addressable resources do not have an
    infrastructure for supporting the use of global
    identifiers (more about this later)
  • Non-addressable resources cannot be retrieved
  • Creating global links between non-addressable
    resources is difficult
  • Outcome: we lack the preconditions for the Web of
    Meanings to happen!

5
Further (strategic) errors
  • On top of these infrastructural issues, a big
    strategic error was made (personal opinion!)
  • The AI people came in, and tried to recycle
    their logical know-how on the Semantic Web
  • The plan was to build the Semantic Web starting
    from representations (theories, currently known
    as ontologies) and not from resources (entities)
  • This led to a scalability issue: reasoning is
    hard for local theories, forget about going
    global! Heard about semantic heterogeneity,
    ontology mapping, alignment, distributed
    reasoning, ...?

6
My vision
  • Back to the building blocks: entities!
  • First, create the infrastructure for enabling in
    practice a global space of identifiers (e.g.
    URIs)
  • Second, show how we can create value simply from
    linking globally identified entities
  • Third, specify vocabularies and ontologies for
    (subsets of) globally identified entities
  • Fourth, link ontologies to each other on top of
    the already integrated domain of globally
    identified entities
  • Hopefully, this will lead to the Web of Entities,
    namely a global digital space in which any
    knowledge expressed in any local web of entities
    can be seamlessly integrated and reasoned about

7
OKKAM overall goal
  • The goal of the OKKAM project is to implement the
    first part of this plan.
  • Establishing a scalable and sustainable
    infrastructure for the storage and reuse of
    global identifiers for non-addressable entities
    in decentralized information environments
  • Enabling different forms of OKKAMization of old
    and new content
  • Creating a primitive index which links global
    identifiers to OKKAMized content
  • Building applications which can showcase the
    potential value of this approach

8
But why OKKAM?
  • Ockham's Razor (14th century):
  • 'entities should not be multiplied beyond
    necessity'
  • OKKAM's Razor (21st century):
  • 'entity identifiers should not be multiplied
    beyond necessity'

9
1. Infrastructure
  • Cornerstone: large-scale Entity Repository (ER)
  • Architecture: distributed, supports federation of
    local ERs, replicated (no single point of
    failure)
  • ER vs. Entity Base (or Knowledge Base):
    supporting reuse vs. collecting and providing
    knowledge about entities
  • Basic schema: set of attribute/value pairs
    (called labels) with no predefined semantics
  • Features:
  • Size: unbelievable (billions of identifiers and
    profiles stored)
  • Network traffic: massive (up to millions of
    requests per minute)
  • Quality: hard to ensure
  • Update: grows monotonically (no deletion). Aging
    mechanism?
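
To make the basic schema concrete, here is a minimal in-memory sketch of a single local ER, assuming nothing beyond what the slide states: identifiers, free-form attribute/value labels, and growth without deletion. The class names, method names and the example.org namespace are hypothetical and are not part of any OKKAM API; distribution, federation and replication are left out.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class EntityProfile:
    """An identifier plus free-form labels (attribute/value pairs)."""
    okkam_id: str
    labels: list = field(default_factory=list)


class EntityRepository:
    """Append-only, in-memory stand-in for one local ER (illustrative only)."""

    def __init__(self, namespace="http://example.org/okkam/"):
        # Placeholder namespace: an assumption, not a real OKKAM URI scheme.
        self._namespace = namespace
        self._profiles = {}

    def create(self, labels):
        """Mint a new identifier and store its initial labels."""
        okkam_id = self._namespace + uuid.uuid4().hex
        self._profiles[okkam_id] = EntityProfile(okkam_id, list(labels))
        return okkam_id

    def add_label(self, okkam_id, attribute, value):
        """Labels can be added but never removed (monotonic growth)."""
        self._profiles[okkam_id].labels.append((attribute, value))

    def find(self, attribute, value):
        """Return existing ids carrying this label, so that they can be reused."""
        return [p.okkam_id for p in self._profiles.values()
                if (attribute, value) in p.labels]

    def __len__(self):
        return len(self._profiles)
```

A client would call find() before create(), so that an existing identifier is reused rather than a new one minted, which is the point of OKKAM's Razor.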

10
2. OKKAMization
  • Enabling the runtime or ex post OKKAMization of
    data in various formats (from unstructured to
    structured)
  • Examples:
  • Office tools (named entity recognition and
    annotation)
  • Databases (annotating records with OKKAM ids)
  • Ontologies (replacing local URIs with global
    URIs)
  • HTML pages
  • Objective: creating the critical mass of
    OKKAMized content
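
Two of the examples above (databases and ontologies) can be sketched in a few lines. This is only an illustration under simple assumptions: an ontology is reduced to a list of (subject, predicate, object) triples, a database record is a dictionary, and the mapping from local names to global ids is already known. The function names and the "okkam_id" field are hypothetical.

```python
def okkamize_triples(triples, local_to_global):
    """Replace local URIs with global ids wherever a mapping is known.

    Predicates are left untouched; unknown terms are kept as they are.
    """
    def swap(term):
        return local_to_global.get(term, term)

    return [(swap(s), p, swap(o)) for s, p, o in triples]


def okkamize_record(record, key_field, lookup):
    """Return a copy of a record annotated with an OKKAM id, if one is found."""
    annotated = dict(record)
    okkam_id = lookup.get(record.get(key_field))
    if okkam_id is not None:
        annotated["okkam_id"] = okkam_id
    return annotated


# Hypothetical ids, for illustration only.
mapping = {"local:bouquet": "http://example.org/okkam/e1"}
triples = [("local:bouquet", "Works-for", "local:unitn")]
okkamized = okkamize_triples(triples, mapping)
```

In both cases the original content stays usable when no global id is known yet, which fits the idea of ex post OKKAMization.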

11
3. Indexing
  • The model of knowledge devolution:
  • The ER stores only IDs and simple labels
  • Knowledge about entities must be developed
    outside
  • Idea: use OKKAM to store and index pointers to
    external resources which mention an OKKAM id
  • Different types of pointers:
  • Informal: pointing to a document which contains
    an OKKAM id as a simple annotation for a piece of
    text
  • Formal: pointing to formal resources (e.g.
    ontologies) in which an OKKAM id is used as a URI
    of an instance
  • Using this index also for entity resolution /
    matching
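
The pointer index described above can be pictured as a map from an OKKAM id to the external resources that mention it, with each pointer tagged as informal or formal. The sketch below assumes exactly that and nothing more; the class names, field names and the two kind strings are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    resource_url: str  # external resource that mentions the OKKAM id
    kind: str          # "informal" (text annotation) or "formal" (URI of an instance)


class PointerIndex:
    """Maps OKKAM ids to pointers; the knowledge itself stays outside."""

    KINDS = ("informal", "formal")

    def __init__(self):
        self._by_id = defaultdict(set)

    def add(self, okkam_id, resource_url, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown pointer kind: {kind}")
        self._by_id[okkam_id].add(Pointer(resource_url, kind))

    def lookup(self, okkam_id, kind=None):
        """Return the sorted URLs of resources mentioning the id, optionally by kind."""
        pointers = self._by_id.get(okkam_id, set())
        return sorted(p.resource_url for p in pointers
                      if kind is None or p.kind == kind)
```

Storing only pointers keeps the repository small relative to the content it indexes, in line with the model of knowledge devolution.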

12
Okkam Architecture
13
Okkam Applications
  • Three exemplary applications on top of the OKKAM
    infrastructure:
  • Entity-centric search engine
  • Entity-centric organizational knowledge
    management
  • Multimedia authoring based
  • Purpose:
  • Show benefits of entity-centric approach
  • Trigger the development of further applications
  • Contribute to building a community around the
    OKKAM approach

14
Entity-centric search engine
  • Starting point: different types of OKKAMized
    content collections, e.g. knowledge bases,
    document collections, metadata repositories,
    image collections, etc.
  • Goal:
  • Enabling completely new methods for browsing and
    searching large collections of data and documents
    (including the Web itself)
  • Enabling new forms of intelligent entity-centric
    search that exploit the OKKAMization of content
  • RTD Challenges:
  • Retrieval indexing that takes into account the
    OKKAM IDs
  • Combination of entity-centric and semantic search
  • Combined ranking
  • Adequate combination and visualization of the
    results from different kinds of resources (e.g.
    knowledge base and document collection)
  • ...
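
Combined ranking is listed above as an open challenge, so the following is only one naive possibility and not the project's method: a linear mix of entity overlap and keyword overlap. The document layout (a dictionary with "url", "text" and "okkam_ids"), the function names and the default weight are all assumptions made for this sketch.

```python
def combined_score(doc, query_terms, query_entity_ids, entity_weight=0.7):
    """Linear mix of entity overlap and keyword overlap, each in [0, 1]."""
    words = set(doc["text"].lower().split())
    if query_terms:
        keyword_score = sum(t.lower() in words for t in query_terms) / len(query_terms)
    else:
        keyword_score = 0.0
    if query_entity_ids:
        shared = set(doc["okkam_ids"]) & set(query_entity_ids)
        entity_score = len(shared) / len(set(query_entity_ids))
    else:
        entity_score = 0.0
    return entity_weight * entity_score + (1 - entity_weight) * keyword_score


def search(docs, query_terms, query_entity_ids, entity_weight=0.7):
    """Return URLs of documents with a non-zero score, best first."""
    scored = [(combined_score(d, query_terms, query_entity_ids, entity_weight), d["url"])
              for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [url for score, url in ranked if score > 0]


# Hypothetical OKKAMized collection: only the first page mentions entity "e1".
docs = [
    {"url": "a", "text": "Paolo Bouquet homepage", "okkam_ids": ["e1"]},
    {"url": "b", "text": "a bouquet of flowers", "okkam_ids": []},
    {"url": "c", "text": "flight timetable", "okkam_ids": []},
]
results = search(docs, ["bouquet"], ["e1"])
```

Here the page about the person outranks the page that merely contains the same word, which is the kind of disambiguation an entity-centric search engine is meant to provide.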

15
Entity-centric organizational knowledge management
  • Idea:
  • Exploit OKKAM benefits in an organizational context
  • Managing and structuring corporate knowledge
    using entity identifiers as pivots for
    aggregating information not only from structured
    sources, but also from poorly or non-structured
    sources, like electronic documents, email
    messages, slide presentations, video and audio
    files, etc.
  • Using and interlinking a local organizational
    entity repository

16
Multimedia Content Authoring
  • Idea:
  • Creation of an authoring environment which makes
    use of the OKKAMization of content
  • Variants:
  • Authoring environment which helps the scientific
    author by providing targeted additional
    information during the writing process
  • Support for the creation of value-added artefacts
    on the basis of OKKAMized content (text, video)
  • Creation template for (task-specific/selective)
    enrichment with information about the entity
    found in the content object (semantic infusion)
  • Tool for publishers, broadcasters

17
Example: Semantic Infusion
18
RTD Challenges
  • Building a scalable entity repository in which a
    massive and growing number of entity IDs and
    profiles can be collected, stored and indexed
  • Guaranteeing security and privacy for the data
    stored in the repository
  • Making the repository efficiently searchable and
    usable by Web users as well as through APIs
  • Supporting effective and reliable methods for
    entity matching and for ranking results
  • Enabling several channels through which the
    repository can be populated, either manually or
    automatically (import filters, crawling,
    harvesting, ...)
  • Supporting the integration of OKKAM with a
    variety of content creation applications (e.g.
    text editors, office applications, HTML and XML
    editors, ontology editors, DBMS, etc.)
  • Ensuring the quality of data in the repository
  • Enabling a virtuous circle of trust and
    collaboration with users

19
Conclusions
  • There are many critical issues:
  • Size and Performance
  • Quality of entity search and matching
  • Critical mass of data and applications
  • Trust and community building
  • Sustainability and exploitation
  • ... but it's fun and I want to give it a try!!