Title: Digital Object Architecture: an Advanced Architecture for Managing Digital Information
1. Digital Object Architecture: an Advanced Architecture for Managing Digital Information
WSIS Forum 2011 May 19, 2011
- Presentation by
- Robert E. Kahn
- President & CEO
- Corporation for National Research Initiatives
2. Origins of the Internet
- Multiple Different Packet Networks
- Open Architecture
- Implemented via the TCP/IP Protocols
- Standards Processes
- Sustained Research Support
- Eventually resulting in
- Commercialization
- Widespread Dissemination
- Global Acceptance
3. Three Initial Networks
- DARPA originally funded three seminal packet networks: ARPANET, Packet Radio, Packet Satellite
- The Internet came about from a desire to enable users and their computers to communicate efficiently, independent of the network they were using
- Initial challenges were in areas such as
- Addressing
- Routing
- Congestion Control
- Host Protocols
- Addressing (16 bits to the wire, 32-bit IPv4 addresses later -- 128-bit IPv6 addresses, URLs)
4. Key Initial Decisions
- Global Addresses (IP) freed us from ARPANET addressing of the wires
- Gateways introduced for IP routing and for Network Impedance Matching (now called routers)
- TCP dealt with network-related concerns
- different packet sizes, duplicates, error detection, losses due to tunnels, mountains, jamming, etc.
- Enabled separate network administration
- Global information system based on an open architecture
5. From Packet Communication to Information Management
- The Internet did not start out with a primary goal of assisting users in managing information.
- Fast, efficient, reliable, global connectivity was the main goal
- Information management was limited to ensuring proper information flows in the Internet
- The World Wide Web was an important step in simplifying user access to information
- Other alternatives are now emerging.
- We now present an open architecture approach to information management that
- Makes use of existing Internet capabilities
- Allows different types of information management systems to be developed and interoperate.
6. Digital Object Architecture
- To reformulate the Internet architecture to focus more specifically on managing information rather than just communicating bits
- Making use of its world-wide connectivity, but independent of current technology choices
- Enabling existing and new types of information to be reliably managed and accessed in the Internet environment, including over very long periods of time
- Providing mechanisms to stimulate dynamic new forms of expression and to manifest older forms
- Support for multi-lingual identifier names in most native/local scripts
- While supporting privacy, security, intellectual property protection, managed access and well-formed business practices
7. Digital Object Architecture
- Technical Components
- Digital Objects (DOs)
- Structured data with a unique persistent identifier
- Resolution of the Unique Identifiers
- To state information about the DOs
- Repositories
- To deposit DOs
- To access DOs with security
- Registries
- To create and store metadata
- For secure searching
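As a rough illustration of how these components relate, the following minimal sketch models a digital object, a repository and a registry in memory. The class names, the per-user access model and the metadata fields are assumptions made for illustration only; they are not taken from any CNRI software, and identifier resolution is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Structured data with a unique persistent identifier."""
    identifier: str                      # e.g. "10.1234/Conf_Summary"
    data: bytes
    metadata: dict = field(default_factory=dict)


class Repository:
    """Deposits DOs and returns them only to permitted users (hypothetical access model)."""

    def __init__(self):
        self._objects = {}
        self._readers = {}               # identifier -> set of user names allowed to read

    def deposit(self, obj, readers):
        self._objects[obj.identifier] = obj
        self._readers[obj.identifier] = set(readers)

    def access(self, identifier, user):
        if user not in self._readers.get(identifier, set()):
            raise PermissionError(f"{user} may not access {identifier}")
        return self._objects[identifier]


class Registry:
    """Stores metadata about DOs and supports searching it."""

    def __init__(self):
        self._records = {}               # identifier -> copy of the object's metadata

    def register(self, obj):
        self._records[obj.identifier] = dict(obj.metadata)

    def search(self, **criteria):
        # Return identifiers whose metadata matches every given key/value pair.
        return sorted(
            ident for ident, meta in self._records.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )


repo, registry = Repository(), Registry()
do = DigitalObject("10.1234/Conf_Summary", b"summary text", {"type": "document"})
repo.deposit(do, readers={"alice"})
registry.register(do)
found = registry.search(type="document")
```

The registry holds only metadata and identifiers, while the repository holds the objects themselves, so a search result is a list of identifiers that must then be presented to a repository for access.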
8. Digital Object Architecture
[Figure label: User]
9. Selected Digital Object Types
- Documents, Books, Music, Videos, Spreadsheets
- Personal data (coordinates, financial, medical)
- Observational data (climate, radio astronomy)
- Networking Information (operations, provisioning, forecasting)
- Commerce and Business Information (contracts, bills of lading, letters of credit, etc.)
- Software (programs, running processes, distributed systems)
- Information about Things
10. Repositories
Store and Access Digital Objects on the Net
Logical External Interface
Any Hardware & Software Configuration
Digital Object Protocol
11. Digital Object Protocol
- Uniform interface for accessing repositories and their digital objects
- Based on the use of identifiers
- Provides authentication of both users and servers upon request or where required
- Uses identity management based on the use of public keys
- Key means of implementing interoperability
12. The Digital Object Protocol is a Meta-Level, Extensible Interface
<input sequence>: <H1> <H2> <Params>
<output sequence>
H1 is a handle for the operation applied to the Target DO H2. Similarly, both A and B are known by their Handles HA and HB. The steps of the protocol are:
- Establish a connection from A to B
- Optionally, A asks B to authenticate himself
- If successful, A provides an input string to B
- Optionally, B asks A to authenticate herself
- B provides the results of the operation
- Either party may choose to continue or close
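The protocol steps above can be walked through in a small in-memory simulation. This is only a sketch under stated assumptions: the `Party` class, the `run_exchange` function, the operation handle `10.1234/op.read` and the party handles are all hypothetical, and authentication is reduced to a boolean placeholder where a real system would verify public keys.

```python
class Party:
    """One end of the exchange, known by its own handle (HA or HB)."""

    def __init__(self, handle, operations=None, trusted=True):
        self.handle = handle
        self.operations = operations or {}  # operation handle (H1) -> callable
        self.trusted = trusted

    def authenticate(self):
        # Placeholder only: a real check would verify a public-key signature.
        return self.trusted

    def perform(self, h1, h2, params):
        # Apply the operation named by H1 to the target DO named by H2.
        return self.operations[h1](h2, params)


def run_exchange(a, b, h1, h2, params, a_checks_b=True, b_checks_a=True):
    """Run one request from A to B and return (output, log of steps)."""
    log = [f"connect {a.handle} -> {b.handle}"]
    if a_checks_b and not b.authenticate():
        return None, log + ["close: B not authenticated"]
    log.append(f"input: {h1} {h2} {params}")
    if b_checks_a and not a.authenticate():
        return None, log + ["close: A not authenticated"]
    output = b.perform(h1, h2, params)
    log.append("output returned")
    return output, log


store = {"10.1234/Conf_Summary": "summary text"}
READ = "10.1234/op.read"  # hypothetical operation handle
a = Party("10.1234/HA")
b = Party("10.1234/HB", operations={READ: lambda target, params: store[target]})
output, log = run_exchange(a, b, READ, "10.1234/Conf_Summary", params={})
```

Because the operation is itself named by a handle, new operations can be added by registering another entry in `operations` without changing the exchange logic, which is one way to read "meta-level, extensible".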
13. Metadata Registry
- Registers the existence and access conditions for Digital Objects
- Enables collections to be defined with appropriate access controls
- Provides a user interface to browse and search the registry, and an API for other programs to search the registry
- Integrates existing technologies
- Handle System for identification and access
- Digital Object Repository for metadata object storage and access
- XML for object description and submission
- Specification of Metadata Schemas
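To make the XML submission and search steps concrete, here is a minimal sketch of a registry that accepts XML metadata records keyed by handle and searches their titles. The element names (`metadata`, `title`, `access`) and the `MetadataRegistry` class are invented for illustration; the slides do not specify a schema.

```python
import xml.etree.ElementTree as ET


def make_record(handle, title, access):
    """Build an XML metadata record (hypothetical schema) as a string."""
    record = ET.Element("metadata", attrib={"handle": handle})
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "access").text = access
    return ET.tostring(record, encoding="unicode")


class MetadataRegistry:
    """Keeps parsed metadata records indexed by handle."""

    def __init__(self):
        self._records = {}

    def submit(self, xml_text):
        record = ET.fromstring(xml_text)
        self._records[record.get("handle")] = record

    def search(self, term):
        # Case-insensitive substring match against each record's title.
        term = term.lower()
        return sorted(
            handle for handle, record in self._records.items()
            if term in record.findtext("title", default="").lower()
        )


registry = MetadataRegistry()
registry.submit(make_record("10.1234/Conf_Summary", "Conference Summary", "public"))
hits = registry.search("summary")
```

A program would call `search` in the same way a user interface would, which mirrors the point above that the registry serves both people and other programs.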
14. CORDRA
Federation Level Metadata
CORDRA Registry Community
Content Repositories
15. What are Handles? Why Resolution Systems?
- CNRI uses the name Handles to denote digital object identifiers
- Others may prefer to use their own descriptors
- Existing identifier schemes are accommodated
- Identifiers provide a way to identify data structures independent of their physical form or location, if any
- Identifiers can be of many forms, and may contain randomly generated strings, date-time stamps as well as semantics
- The identifier itself will not usually contain useful information about the digital object
- The resolution system is intended to make available the useful information
16. Why are Identifiers Important?
- For global addressing
- and possibly routing
- For long-term information preservation
- For building linkages
- In lieu of attachments
- To create virtual structures
- For accessing related metadata
- To convey search results
- To authenticate/validate
- Connectivity
- Individual Digital Objects
- Identity
17. Structure of the Identifiers
- Digital Object Identifiers are structured as prefix/suffix
- They may be conveyed in various forms, such as
- 10.1234/Conf_Summary
- HDL:10.1234/Conf_Summary
- hdl.handle.net/10.1234/Conf_Summary
- Each prefix has its own administrator with PKI access to the system for creation, change and deletion.
- Resolution of an identifier results in a returned resolution record, generally within a fraction of a second
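The prefix/suffix structure can be illustrated with a short parser that accepts the forms listed above and returns the two parts. This is a sketch, not part of any Handle System library: the function name `parse_handle` is hypothetical, and the `hdl:` scheme spelling and optional `http://` or `https://` lead are assumptions about how the longer forms are written.

```python
def parse_handle(text):
    """Split a handle written in any of the listed forms into (prefix, suffix)."""
    value = text.strip()
    # Drop an optional URL scheme in front of the proxy host (assumed form).
    for lead in ("http://", "https://"):
        if value.lower().startswith(lead):
            value = value[len(lead):]
    # Drop the proxy host or the "hdl:" scheme marker, whichever is present.
    if value.lower().startswith("hdl.handle.net/"):
        value = value[len("hdl.handle.net/"):]
    elif value.lower().startswith("hdl:"):
        value = value[len("hdl:"):]
    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, sep, suffix = value.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {text!r}")
    return prefix, suffix


forms = [
    "10.1234/Conf_Summary",
    "HDL:10.1234/Conf_Summary",
    "hdl.handle.net/10.1234/Conf_Summary",
]
parsed = [parse_handle(form) for form in forms]
```

All three spellings reduce to the same pair, which is the property that lets one identifier be conveyed in several forms without ambiguity.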
18. Resolution Mechanism
[Figure labels: Multiple Workstations Distributed Globally; DO Identifier; Resolution Record; Handle System <www.handle.net>]
System is non-nodal, scalable, distributed; supports global (and local) resolution
19. Handle System Features
- Supports both Resolution and Administration
- Internationalized character sets
- Secured resolution service
- Provides for Unique Persistent Identifiers
- Current Users include
- DOI System, Open Archives Initiative, Library of
Congress, CNNIC, Office of European Publications,
DataCite, EIDR, DSpace Community and others
20. Handle Resolution
[Figure label: GHR (Global Handle Registry)]
21. Mirroring the Global Handle Registry
[Figure: Global Handle Registry with one primary (P) and four mirrors (M); labels: Administration, user]
Contains System Handle Records
Non-System Handle Records are in lots of Local Handle Services
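The division of labor between the Global Handle Registry and the Local Handle Services can be sketched as a two-step lookup: the global registry is consulted for the prefix, and the local service it points to returns the resolution record. The class names, the service name `lhs-1`, the record layout and the `example.org` URL below are illustrative assumptions, not the actual Handle System wire protocol.

```python
class LocalHandleService:
    """Holds the non-system handle records for the prefixes it serves."""

    def __init__(self, name):
        self.name = name
        self.records = {}                # handle -> resolution record (list of typed values)

    def resolve(self, handle):
        return self.records[handle]


class GlobalHandleRegistry:
    """Holds system records: which local service is responsible for each prefix."""

    def __init__(self):
        self._services = {}              # prefix -> LocalHandleService

    def register_prefix(self, prefix, service):
        self._services[prefix] = service

    def locate(self, handle):
        prefix = handle.partition("/")[0]
        return self._services[prefix]


def resolve(ghr, handle):
    """Step 1: ask the global registry for the service. Step 2: ask that service."""
    service = ghr.locate(handle)
    return service.resolve(handle)


lhs = LocalHandleService("lhs-1")
lhs.records["10.1234/Conf_Summary"] = [
    {"type": "URL", "value": "https://example.org/conf-summary"},  # hypothetical location
]
ghr = GlobalHandleRegistry()
ghr.register_prefix("10.1234", lhs)
record = resolve(ghr, "10.1234/Conf_Summary")
```

Since the global registry only needs to know about prefixes, it stays small and is easy to mirror, while the bulk of the records are spread across many local services.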
22. Planned Deployment of a Multi-Primary Global Registry
[Figure: five primaries (P), plus mirrors; label: user]
A limited number of primaries, each administered separately
Contains System Handle Records
Non-System Handle Records are in lots of Local Handle Services
23. Observations
- Identifiers provide the glue that holds complex distributed systems together
- Security can be provided at a very fine level of granularity in the system
- Repositories enable reliable long-term access to digital objects over generations of technology change
- Registries enable digital objects to be made known and findable using multiple metadata schemas
- The Multi-primary Global Registry enables distributed administration on a collaborative basis by multiple parties around the world.
- Finally, DONA will provide a framework for the management of the DO Architecture in the future.