Digital Object Storage and Retrieval (DOSR) Vision - PowerPoint PPT Presentation

Provided by: JoshAls2
Learn more at: http://www.fdis.org

Transcript and Presenter's Notes

1
Digital Object Storage and Retrieval (DOSR) Vision
  • Josh Alspector

2
Disclaimer
This presentation discusses areas of technology
investigation and interest. It does not relate to
any existing DARPA program, nor should it be
inferred to anticipate a future DARPA program.
3
The Mundaneum
  • In 1910 Belgians Paul Otlet and future Nobel
    Peace Prize laureate Henri La Fontaine opened the
    Palais Mondial, later renamed the Mundaneum.
  • The Mundaneum's mission was to collect metadata
    on every book, journal, and periodical ever
    published and record it in a card file system
    that embodied what we would call a faceted
    classification scheme. By 1934 it contained over
    15 million entries.
  • Unique identifiers included embedded links to
    related documents.
  • Staff responded to search requests received by
    post and telegraph and returned hand-copied cards
    by post.
  • In 1934 Otlet conceived a global network of
    "electric telescopes" that would allow people to
    search and browse through interlinked documents,
    images, audio and motion picture recordings. He
    wrote that, from his armchair, "everyone will
    hear, see, participate, will even be able to
    applaud, give ovations, sing in the chorus, add
    his cries of participation to those of all the
    others."

4
DOSR Vision
  • Create a resilient, distributed, scalable, and
    secure network of information that does not
    require a completely trusted or stable network of
    processing nodes: employ network overlays and
    advanced cryptographic techniques
  • Advance the state of the art in automated
    metadata generation and interoperability: apply
    machine learning techniques
  • Automatically get information where it is needed,
    or may be needed, using less bandwidth and
    processing: integrate user models, compact
    information retrieval encodings, and distributed
    content delivery
  • Reliably track where information goes, and where
    it came from: encapsulate provenance and audit
    information in network-maintained virtual
    objects
  • Enable secure, resilient information storage,
    characterization, retrieval, and collaboration
    across barriers of time, geography, community of
    interest, technology, and administrative domain
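
The slide does not say how provenance and audit information would be encapsulated in an object. One common approach, shown here only as a minimal sketch, is an append-only log in which each entry carries the hash of the previous one, so that later tampering is detectable. The class name `ProvenanceLog`, its fields, and the use of SHA-256 over canonical JSON are all assumptions made for illustration, not part of the presentation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Append-only audit trail carried with an object (hypothetical design)."""

    def __init__(self, object_id: str):
        self.object_id = object_id
        self.entries = []

    def append(self, actor: str, action: str) -> dict:
        """Record who did what, linked to the hash of the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "object_id": self.object_id,
            "seq": len(self.entries),
            "actor": actor,
            "action": action,
            "prev": prev,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True only if every entry matches its hash and its link."""
        prev = GENESIS
        for seq, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["seq"] != seq:
                return False
            if _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

For example, after `log = ProvenanceLog("obj-1")`, `log.append("alice", "create")` and `log.append("bob", "read")`, `log.verify()` holds; editing any earlier entry in place makes it fail. A hash chain only detects changes; it does not by itself prove who wrote an entry, which would need signatures.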

[Figure text: User and Data Models; Automated Metadata Generation; What we can find defines what we can do. Photos courtesy of U.S. Army, U.S. Navy.]
5
Hard Problems
  • Automated metadata extraction and generation
      • DoD has many stovepipe systems with limited
        metadata
      • Automatic extraction of metadata, especially from
        non-textual information, is an unsolved problem
        requiring some form of artificial intelligence
      • Email, papers, presentations, forms, and databases
        do not possess a community-maintained mesh of
        reciprocal references, so Google-like search,
        relevance, and ranking algorithms do not work
  • Scalable security for sharable objects
      • Decentralized (for scalability) key distribution
        systems present security challenges
      • Protection from known cryptographic and
        corruption attacks is hard; protection from
        unknown attacks is harder
      • Usable secure sharing (as convenient as email) is
        needed, or the system won't be used
      • Scalable, revocable group access to synchronized,
        encrypted, versioned documents is essential
  • Scalable replicated storage and parallel data
    distribution
      • Globally unique identifiers (GUIDs) for retrieval
        and update are essential, and must be
        unbreakable, verifiable, and afford scalable
        resolution of a retrievable, trackable object
      • How to track fragmented and replicated objects
        for persistence and provenance
      • Object replication for secure, scalable,
        high-bandwidth distribution (secure
        BitTorrent-style)
      • Enhance resiliency and service in network-poor
        areas
      • Respond adaptively to service degradation for
        high-demand data and large-scale disruptions
  • Personalization, intelligent agents and user
    models
      • Intelligent agents needed to locate content near
        likely users, based on user models
      • User models based on authorization, active input
        and passive tracking
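
The requirement that GUIDs be "verifiable" is not specified further in the slides. One possible reading, sketched below under that assumption, is a content-derived identifier: the GUID is a cryptographic hash of the object's bytes, so anyone holding the object can recompute the identifier and check it. The function names `make_guid` and `verify_guid`, the `dosr-example` namespace prefix, and the choice of SHA-256 are hypothetical.

```python
import hashlib


def make_guid(content: bytes, namespace: str = "dosr-example") -> str:
    """Derive an identifier from the object's bytes (assumed scheme).

    The namespace is mixed into the hash so that identical bytes in
    different namespaces receive different identifiers.
    """
    h = hashlib.sha256()
    h.update(namespace.encode("utf-8"))
    h.update(b"\x00")  # separator between namespace and content
    h.update(content)
    return f"{namespace}:{h.hexdigest()}"


def verify_guid(guid: str, content: bytes) -> bool:
    """Check that the content really is the object the GUID names."""
    namespace, sep, _ = guid.partition(":")
    if not sep:
        return False
    return make_guid(content, namespace) == guid
```

With this scheme, `verify_guid(make_guid(b"report v1"), b"report v1")` succeeds, while any altered copy fails the check. The trade-off is that a content-derived identifier changes whenever the object changes, so the "retrieval and update" requirement would still need a separate, stable name that maps to the current version; that mapping is outside this sketch.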

6
Key Capabilities
[Diagram labels: Object 1 Version 1; Replicas and fragments; Retrieve latest version from closest fragments or replica]
  • Architecture and protocols
      • Protocols for exchanging objects, metadata, and
        security controls
      • Mobile agents and federated requests for
        information
  • Persistence of digital objects
      • Distribute replicas and coded fragments
      • Global, persistent, verifiable, unique
        identifiers (GUIDs)
      • Version-controlled, collaborative updates
  • Trust, security and provenance
      • Authorized, authenticated access
      • Decentralized encryption for scalability
      • Verifiable provenance and tracking of all objects
      • Resilience to attacks
  • Scalability
      • Scale-free architecture
      • Decentralized, peer-to-peer techniques
      • Manage latency, consistency and security as scale
        grows
  • Metadata and search
      • Extract metadata from video, maps, images
      • Relevance feedback
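
The slides mention "coded fragments" without naming a coding scheme. As a minimal illustration of the idea, the sketch below uses the simplest possible erasure code, a single XOR parity fragment, which tolerates the loss of any one fragment. This is a stand-in chosen for brevity; a real system would more likely use a stronger code such as Reed-Solomon. The function names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")  # zero-pad the last fragment
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(_xor, fragments)
    return fragments + [parity]


def recover(fragments: list, length: int) -> bytes:
    """Rebuild the original data when at most one fragment is None.

    `length` is the original data length, needed to strip the padding.
    """
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all k+1 fragments is zero, so the lost one is the XOR of the rest.
        fragments[missing[0]] = reduce(_xor, present)
    return b"".join(fragments[:-1])[:length]
```

For example, `encode(data, 4)` yields five fragments that could be stored on five nodes; setting any one of them to `None` and calling `recover(fragments, len(data))` returns the original bytes. Compared with keeping full replicas, this stores roughly (k+1)/k times the data rather than a whole extra copy, at the cost of tolerating only one loss.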

[Diagram labels: Object 1 Version 2 update; Decentralized, scalable key distribution; Scalable resources, storage and participant networks; Needed objects migrate to local server for user]
7
Interesting Research Ongoing in
  • Automated metadata extraction
  • Decentralized, self-configuring, location and
    routing
  • Federated search
  • Information retrieval
  • Personalization and user models
  • Proxy re-encryption
  • Scalable security and PKI
  • Search over encrypted indexes
  • Securing resilient peer-to-peer networks

DOSR Workshop will address these areas
8
Preliminary Schedule
  • July 15 Posters
      • 4:20 pm Break
      • 4:40 pm Poster Session 1
      • 5:20 pm Poster Session 2
      • 6:00 pm Adjourn
  • July 16 Breakouts
      • 9:00 am Dr. Josh Alspector - DOSR vision and
        breakout group instructions
      • 9:30 am Breakout group discussions
      • Noon Lunch
      • 1:30 pm Brief out Group 1
      • 2:00 pm Brief out Group 2
      • 2:30 pm Break
      • 2:50 pm Brief out Group 3
      • 3:20 pm Brief out Group 4
      • 3:45 pm Plenary Session
      • 4:15 pm Adjourn
  • July 15 Talks
  • 8:30 am Opening remarks - DARPA
  • Architecture
  • 8:45 am Dr. Robert Kahn - keynote address
  • 9:15 am Dr. Peter Lucas - MAYA
  • 9:35 am Dr. Daniel Crichton - NASA
  • 9:55 am Break
  • Metadata
  • 10:15 am Dr. Ajay Divakaran - Sarnoff Corp.
  • 10:35 am Dr. Randal Burns - JHU
  • 10:55 am Dr. Shmuel Peleg - HU-J
  • 11:15 am Mr. Jason Byassee - Northrop Grumman
  • Security
  • 11:35 am Dr. James Allan - U. Mass-Amherst
  • 11:55 am Dr. Rafail Ostrovsky - UCLA
  • 12:15 pm Lunch
  • 1:40 pm Dr. Urs Muller - Net-Scale Tech.
  • 2:00 pm Dr. Matt Staker - IBM Research
  • 2:20 pm Dr. Angelos Stavrou - Global InfoTek Inc.

9
Levels of Success
  • DoD adopts system internally
  • Portions of system are made available for
    open-source uses by Apache
  • Legal, medical, and financial records management
    firms adopt GUIDs, protocols, and system
    components
  • ISPs and media companies adopt GUIDs, protocols,
    and system components for subscription services
  • Amazon, Google and iTunes use GUIDs and
    protocols

10
Prior Art
  • Coda (CMU)
  • Cooperative File System (MIT)
  • FARSITE (Microsoft)
  • Grid (Argonne National Laboratory)
  • Lustre (now owned by Sun Microsystems)
  • OceanStore (UC Berkeley)
  • PASIS (CMU)
  • Universal Database (Maya Design)