Title: Enabling the Integrated Information Network with Fedora
1 Enabling the Integrated Information Network with Fedora
Daniel Davis, John Faure
2 Goals for enabling users
- Creation and publication of new forms of information units
- Services to better enable business processes
- Knowledge environments that capture semantic and factual relationships among information units
- Promote information re-use and contextualization
- Facilitate collaborative activity and capture information that is created as a byproduct of it
Source: Sandy Payette - http://www.vala.org/vala2006/prog2006.htm
3 Content Trends
Documents -> Integrated Information Networks
Source: Sandy Payette - http://www.vala.org/vala2006/prog2006.htm
4 System Goals
For unstructured or semi-structured data
- Evolvable (Enduring Data)
- Information Lifecycle Management
- Capture, Create, Ingest
- Store
- Manage
- Preserve
- Deliver
- Share
- Reuse
- Fuse
- Repurpose
- Incremental integration
- Decouple data from applications
- Enable agile enterprises
- Enable technology refresh
- Reduce system cost
- Guarantee enterprise continuity
- Enable collaboration
- Ensure legal and regulatory compliance
5 Build an Integrated Information Network
- It is not a packaged application but a system with many parts
- A monolithic system will likely fail to support the enterprise's mission in the long run
- It is best built as the integration of a set of re-usable components
6 Concept of Operations
7 Driving Requirements
- Highly Scalable System: Both Small and Large
- 1 Exabyte, 10 Teraobjects
- 3 Petabyte Ingest, 1.5 Petabyte Access Per Year
- Roughly Equals 100M 100-Page Documents Per Day
- Very high degree of automation required, especially for ingest
- Support hardware and software independence of digital assets
- Evolvable to use new technologies gracefully with minimal vendor lock-in
- Extensible to new types (formats) without redesign
- Support customized uses without redesign: policy-driven system
- No single point of failure, trusted storage of content
- Customizable trust models
- Protect digital rights, sensitive information, and classified digital assets
- Provide a disciplined, collaborative work system
- Preserve digital assets for as long as needed
- Low cost and enterprise grade solutions needed
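The sizing figures above can be cross-checked with simple arithmetic. The sketch below assumes decimal (SI) units, which the slide does not state, and works out what the numbers imply for average object size and for bytes per document and per page.

```python
# Back-of-envelope check of the sizing figures on this slide.
# Assumption: decimal (SI) units; the slide does not say which convention it uses.
EXABYTE = 10**18
PETABYTE = 10**15
TERA = 10**12

total_bytes = 1 * EXABYTE            # "1 Exabyte"
total_objects = 10 * TERA            # "10 Teraobjects"
ingest_per_year = 3 * PETABYTE       # "3 Petabyte Ingest ... Per Year"
docs_per_day = 100_000_000           # "100M ... Documents Per Day"
pages_per_doc = 100                  # "100 Page Documents"

avg_object_bytes = total_bytes / total_objects
ingest_per_day = ingest_per_year / 365
bytes_per_doc = ingest_per_day / docs_per_day
bytes_per_page = bytes_per_doc / pages_per_doc

print(f"average object size: {avg_object_bytes / 1e3:.0f} KB")
print(f"ingest per day:      {ingest_per_day / 1e12:.1f} TB")
print(f"implied per document: {bytes_per_doc / 1e3:.0f} KB")
print(f"implied per page:     {bytes_per_page:.0f} bytes")
```

Under these assumptions the ingest rate works out to roughly 8 TB per day, and the document equivalence implies pages of under 1 KB each, so it reads as a text-only page count rather than scanned images.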
8 Notional Software Architecture
Users
Develop Customer Specific Apps and Workflows
Develop Customer Specific Services
Select ECM Components
COTS Workflow App
Search App
COTS System Mgmt App
COTS System Admin App
Domain App A1
Domain App A2
COTS Web App Servers
Connect To Middleware
Standard Middleware Elements: Web Services, JMS, HTTP, EJB, ESB, others
Middleware M1, M2
Legacy System Interface L1
Legacy System Interface L2
Integrate Customer Legacy Systems
Search Services
Digital Asset Services
COTS App Servers
Domain Service S1
Domain Service S2
Select Support Components
Standard Middleware Elements: Web Services, JMS, HTTP, EJB, ESB, others
Security Services
COTS Search Engine Services
COTS System Admin Services
Logging Services
COTS System Mgmt Services
COTS Workflow Services
COTS App Servers
COTS Service Registry
Legacy System L1
Legacy System L2
9 Notional Component View
Records Management Application
Web Browser
SIP Builder
Submission Information Package
Email System
File System
HTML over HTTP
FTP / Media
Ingest Manager
Forms Editor
Content Model Editor
Forms Wizard
User Workbenches
Collaboration Apps
Archives Browser
Preservation Manager
Workflow Apps
System Admin Apps
Presentation Layer
Search Services
Fedora
Ingest Services
Preservation Services
Archive Services Layer
Format Translation Services
Workflow Services
System Admin Services
System Services Layer
10 Typical Components
- Repository/Content Manager (JSR-170 JCP)
- Message Oriented Middleware - Enterprise Service Bus
- Workflow/Business Process Execution
- Enterprise Search and Access
- Portal Access (JSR-168)
- Content Creation (Authoring) and/or Capture Tools
- Forms Creation Tools (often with XML Output)
- Security Elements and Secure Infrastructure (LDAP, WS-Security, JAAS)
- Multi-vendor Relational Database Support (usually cannot replace current enterprise database)
- Also Traditional Enterprise Information System Applications
- Turnkey Enterprise Systems Management Infrastructure and Tools (e.g. NOCC, System Console, Monitoring, Helpdesk, etc.)
- Linux and Windows (and legacy OS), J2EE and .NET
11 Relevant Technology Trends
- Peer-to-Peer Networks
- Message-oriented Middleware/Enterprise Service Bus
- Service-oriented architecture
- Web 2.0
- Semantic Web
Diagram labels: P2P, ESB, MOM, SOA, Web 2.0, RDF, OWL-S, OWL
Source: Sandy Payette - http://www.vala.org/vala2006/prog2006.htm
12 Challenges to Address
- How can I link all my existing assets, create new business logic, and tie it all together in a way that automates my mission?
- and drive flexibility and efficiency at the same time?
People Partners Institutions Employees
Systems Partners Institutions
Existing Applications
Existing Databases
13 MOM/ESB
- Message-oriented middleware is a rock-solid technology
- Supported by all J2EE (JEE) containers
- Well supported in open source (JBoss MQ/Messaging/ESB, ActiveMQ, OpenJMS, UberMQ, MantaRay, Presumo, JORAM)
- Commercial (IBM MQ/ESB, Sonic, BEA Aqualogic, Iona)
- Primary technology for Java is the Java Message Service (JMS), but there are others
- Can be used in a large number of architectures and patterns
- e.g. P2P, Publish/Subscribe (Hub and Spoke, Multicast)
- Used with App Servers, standalone, and with heavy-weight GUI and Ajax clients
- Extends to the boundaries of the enterprise or beyond with adapters
- Manages Quality of Service (QoS)
- Enterprise Service Bus
- New Product vs. Design Pattern
- Includes Web services over JMS
- Designed to extend beyond the enterprise (someday; no QoS now)
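The publish/subscribe pattern named above can be illustrated with a toy in-process hub. This is a minimal sketch, not JMS: the `Broker` class, the topic name `object.ingested`, and the message fields are all hypothetical, and a real deployment would use a JMS provider or an ESB with durable delivery and QoS settings.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """Toy hub-and-spoke broker (hypothetical; stands in for a real MOM product)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Each topic fans out to every handler registered for it.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # The publisher does not know who consumes the message.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two independent consumers react to the same event.
indexed: List[str] = []
audited: List[str] = []

broker = Broker()
broker.subscribe("object.ingested", lambda m: indexed.append(m["pid"]))
broker.subscribe("object.ingested", lambda m: audited.append(m["pid"]))

delivered = broker.publish("object.ingested", {"pid": "demo:1"})
print(delivered, indexed, audited)
```

The point of the pattern is that a new consumer (say, a preservation service) can be added by subscribing to the topic, with no change to the publisher.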
14 SOA Definition
- SOA is an architectural approach to designing, implementing and deploying information systems from discrete business functions, components, and services.
- SOA is targeted at maximizing
- application interoperability (i14y)
- reuse and
- sharing in a distributed environment
15 SOA Definition (cont.)
- An SOA is a collection of services with well-defined interfaces and a shared communications model.
- A service is a coarse-grained, discoverable, and self-contained software entity that interacts with applications and other services through a loosely coupled, often asynchronous, message-based communication model.
- Coarse-grained refers to the tendency of services to provide significant business process capability, as opposed to low-level business functions.
- Discoverable refers to the fact that services can somehow be located and their interfaces understood (not a core SOA capability).
- Self-contained refers to capabilities that do not require context or state information of other services, nor do they maintain state from one request to another.
- Loose coupling refers to a design principle whereby modules have few, well-known dependencies, and interfaces to the module are defined to be as independent as possible from the implementation of the module.
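The self-contained and loosely coupled properties defined above can be sketched in a few lines. Everything here is illustrative: `RenderService`, `RenderRequest`, and the `/renditions/` path are hypothetical names, and a real service would sit behind a message or Web service interface rather than a direct call.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RenderRequest:
    # The request carries everything the service needs (self-contained).
    object_id: str
    target_format: str


@dataclass(frozen=True)
class RenderResponse:
    object_id: str
    location: str


class RenderService(Protocol):
    """The interface callers depend on; it says nothing about the implementation."""

    def render(self, request: RenderRequest) -> RenderResponse: ...


class LocalRenderService:
    """One possible implementation; it keeps no state between requests."""

    def render(self, request: RenderRequest) -> RenderResponse:
        location = f"/renditions/{request.object_id}.{request.target_format}"
        return RenderResponse(request.object_id, location)


def deliver(service: RenderService, object_id: str) -> str:
    # The caller is coupled only to the interface, so the implementation can be swapped.
    return service.render(RenderRequest(object_id, "pdf")).location


print(deliver(LocalRenderService(), "demo-1"))
```

Because `deliver` depends only on the `RenderService` interface, a remote or vendor-supplied implementation could replace `LocalRenderService` without touching the caller.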
16 IBM's SOA Reference Architecture
- Business Innovation and Optimization Services: Facilitate better decision-making with real-time business information
- Interaction Services: Enable collaboration between people, processes, and information
- Process Services: Orchestrate and automate business processes
- Information Services: Manage diverse data and content in a unified manner
- Development Services: Integrated environment for design and creation of solution assets
- IT Service Management: Manage and secure services, applications, and resources
- Partner Services: Connect with trading partners
- Business App Services: Build on a robust, scaleable, and secure services environment
- Access Services: Facilitate interactions with existing information and application assets
- Infrastructure Services: Optimize throughput, availability and performance
Source: IBM
17 The SOA Roadmap
Source: IBM
18 Typical Technology Set
- J2EE-compliant Application Server
- Server Side Messaging
- JMS, JMX, SOAP, RMI
- Enterprise Service Bus
- Client Side
- Web Services
- JSP
- Java, C .Net Thick clients
- Thin Clients
- HTML/AJAX Lightweight Clients
- Database Access
- JDBC, EJB
- Workflow and Management
- BPEL, WS-, BAM
- Geospatial Standard Interfaces
- WMS, WFS
- Data Representation, Translation
- XML, XSD, XSLT, XPath, XQuery
- Registration and Discovery
- UDDI with LDAP, etc.
- Specifying Service contracts
- WSDL
- Security
- WS-Security, SAML, XACML
19 Workflow/Business Process Execution
- Workflow/BPE is key to policy-driven agile integration
- Includes both Human-in-the-loop and Process-to-Process models
- Long-term and short-term transactions with compensation (rollback)
- BPEL (Business Process Execution Language) appears to be the winner as a vendor-independent standard
- Does not currently support Human-in-the-loop as an interoperable standard
- Process-to-Process works well
- Promises to go beyond the enterprise boundaries
- Not well supported in open source (ActiveBPEL, Agila)
- Very difficult to use
- Look at MOM/ESB first!
- Other Workflow, e.g. jBPM
- Fedora Workflow Working Group, RepoMMan
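The idea of transactions with compensation mentioned above can be sketched without any workflow engine. This is a hand-rolled illustration, not BPEL or jBPM: the step names, the `run_with_compensation` helper, and the simulated failure are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

# A step is (name, action, compensation). The compensation undoes the action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append(step)
    return True


store: Set[str] = set()


def index_object() -> None:
    # Simulated failure of a downstream service.
    raise RuntimeError("index unavailable")


log: List[str] = []
ok = run_with_compensation(
    [
        ("store", lambda: store.add("demo:1"), lambda: store.discard("demo:1")),
        ("index", index_object, lambda: None),
    ],
    log,
)
print(ok, log, store)
```

Unlike a database rollback, each compensation is an ordinary business action, which is what makes the approach usable for long-running processes that cross service boundaries.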
20 Portal User Interface
Aggregates presentation in a distributed architecture
- Consistency and standards
- Visual Hierarchy
- Spatial Separation
- Grouping
- Colors
- Fonts
- Icons
- Visibility of system status
- Indicates current position and progress through task
- Help and documentation
- Informational instruction at point of need
- Match between system and the real world
- User's terminology
- Recognition rather than recall
- Show which document is being used from screen to screen
- User control and freedom
- User is not locked in; can cancel or go back to previous question
21 Preservation
- Preservation is moving the content forward through time as formats and the software that uses them change
- Very similar to transformation, which is needed for agile system integration
- Dependent on precise format definitions and enabled by content models
- Replication is critical to preservation of digital assets; backup is not practical in the long run
- Needs to flexibly bind and manage software to perform preservation of the SOA and the digital assets
- Integral to the Information Life Cycle
- Notion of Level of Service
- Fedora plus a resolver is well suited for preservation in a distributed architecture with replication
- Distributed architectures require an event-driven architecture
- Fedora Preservation Working Group
- Global Digital Format Registry and PRONOM
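How a resolver and replication might work together can be sketched as follows. This is an assumption-laden illustration, not Fedora's API: the `Resolver` class, the use of plain dictionaries as storage sites, and the SHA-256 integrity check are all hypothetical choices made for the example.

```python
import hashlib
from typing import Dict, List, Optional

Store = Dict[str, bytes]  # stand-in for a storage site holding replicas


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Resolver:
    """Maps a persistent identifier to replicas and returns an intact copy."""

    def __init__(self) -> None:
        self._expected: Dict[str, str] = {}
        self._replicas: Dict[str, List[Store]] = {}

    def register(self, pid: str, content: bytes, stores: List[Store]) -> None:
        # Record the expected checksum and write one replica per site.
        self._expected[pid] = digest(content)
        for store in stores:
            store[pid] = content
        self._replicas[pid] = list(stores)

    def resolve(self, pid: str) -> Optional[bytes]:
        # Return the first replica whose checksum still matches.
        for store in self._replicas.get(pid, []):
            data = store.get(pid)
            if data is not None and digest(data) == self._expected[pid]:
                return data
        return None


site_a: Store = {}
site_b: Store = {}
resolver = Resolver()
resolver.register("demo:1", b"hello", [site_a, site_b])

site_a["demo:1"] = b"corrupted"       # simulate damage at one site
recovered = resolver.resolve("demo:1")
print(recovered)
```

The identifier stays stable while the physical copies can move or be repaired, which is the property that makes replication usable as a preservation strategy.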
22 Security
- Security features are needed at the system, data, and architecture level
- System is dependent on many COTS products and standards
- Starts with identity/authorization management within the enterprise, primarily LDAP
- Security Assertion Markup Language (SAML) and WS-Security for SOA
- Shibboleth for external users
- Fine-grained access control needed for data security: XACML (Sun, IA)
- Multiple Security Level (MSL) Architectures for distributed systems (vs. MLS)
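Fine-grained access control of the kind listed above can be illustrated with a small rule evaluator. This sketch is not XACML syntax or any product's API: the `Rule` fields and the `decide` function are hypothetical, and the only idea borrowed from XACML is a deny-overrides way of combining matching rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    role: str
    action: str
    resource_prefix: str
    effect: str  # "Permit" or "Deny"

    def matches(self, role: str, action: str, resource: str) -> bool:
        return (
            role == self.role
            and action == self.action
            and resource.startswith(self.resource_prefix)
        )


def decide(rules: List[Rule], role: str, action: str, resource: str) -> str:
    """Deny-overrides: any matching Deny wins; no matching rule also means Deny."""
    effects = {r.effect for r in rules if r.matches(role, action, resource)}
    if "Deny" in effects:
        return "Deny"
    if "Permit" in effects:
        return "Permit"
    return "Deny"


rules = [
    Rule("staff", "read", "demo:", "Permit"),
    Rule("staff", "read", "demo:restricted/", "Deny"),
]

print(decide(rules, "staff", "read", "demo:report/1"))
print(decide(rules, "staff", "read", "demo:restricted/7"))
print(decide(rules, "guest", "read", "demo:report/1"))
```

Keeping the rules as data, separate from application code, is what allows the policy to change without redeploying the services that enforce it.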
23 Federation
A Federation
24 Network Design Element
To Edge Router for the Site
All Federations Public and Private
- Ethernet Switching
- Intrusion Detection
- VPN/Data Encryption
- Secure Socket Layer
- Stateful Firewall
- Server Load Balancing
WLAN/LAN Data Networking Network Security Component
Building Block Fiber-Channel Disk Storage System
User Workstations
Removable Media Kiosks
Web Interaction Servers
Ingest Servers
Removable Media Distribution
Storage Area Network (Fabric)
DMZ
- Ethernet Switching
- Intrusion Detection
- Stateless Firewall
User Security Servers
Data Integrity Servers
User Presentation Servers
Application Workflow Server
Management Admin Servers
App Mgmt Security Zone
- Ethernet Switching
- Intrusion Detection
- Stateless Firewall
Database Repository Security Zone
Fedora Repository Servers
Hierarchical Storage Management
Automated Tape Library
25 Processing HWCI - Blade Center
Service Processor Module
Chassis
BIT/Diagnostic Module
Optical Passthru Module
Media Module
26 Network HWCI - WAN/LAN Connectivity Ethernet Switch (Cont)
- Key characteristics (Cont)
- Configured chassis provides approximately five 9s inherent availability (.99999)
- Support for three generations of blades
- Other considerations
- Chassis selected and configured based on loads
- Two and four chassis configured for HA
- Two hardware firewalls and SLB/SSLs maintain sessions to effect stateful failover
- No interruption in service when a hardware failure occurs
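The "five 9s" figure above translates into a concrete downtime budget, and the redundant (HA) configuration changes it. The sketch below is a standard availability calculation; treating the two chassis as failing independently is an assumption, not something the slides state.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(availability: float, copies: int) -> float:
    """Availability of redundant copies, assuming independent failures."""
    return 1.0 - (1.0 - availability) ** copies


def series_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up."""
    result = 1.0
    for part in parts:
        result *= part
    return result


single = 0.99999
print(f"single chassis: {downtime_minutes_per_year(single):.2f} min/year")

pair = parallel_availability(single, 2)
print(f"redundant pair: {downtime_minutes_per_year(pair) * 60:.4f} s/year")

# Three five-nines components in series are weaker than any one of them.
chain = series_availability(single, single, single)
print(f"three in series: {downtime_minutes_per_year(chain):.2f} min/year")
```

A single five-nines component allows a little over five minutes of downtime per year; putting several such components in series erodes that budget, which is one reason the design pairs components for HA.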
27 Storage HWCI - Director-Class SAN Switch
- Brocade director-class SAN switch
- Key characteristics
- Blade architecture
- 16 ports per card
- 2 Gb/sec fiber channel ports currently supported
- Architecture supports 4 and 10 Gb/sec ports
- Mixed blade card configuration supported
- 128 port switch available currently with 256 port switch available before end of CY 05
- Hardware enforced zoning
- Approximately five 9s inherent availability (.99999)
- Other considerations
- Dual switch configuration in each system instance
- Data center best practice
- Ports configured based on load and added as needed
28 Storage HWCI - Building Block Fiber Channel/SATA Disk Storage System
- IBM DS6800 Key characteristics
- Building block architecture
- 3U base and expansion chassis
- Fully redundant power and cooling
- Support for 13 expansion chassis
- 224 drive maximum
- Ideal for pay-as-you-grow environments
- 2 controllers per system
- IBM Power microprocessors utilized
- Undoubtedly will be used for migrating additional intelligence into SAN
- Currently 4 GB mirrored cache per controller
- 4 fiber channel interfaces to SAN per controller
- 4 switched back-end connections to drives per controller rather than redundant fiber channel arbitrated loops (FC AL)
- No single point of failure
- Global spares
- Approximately five 9s inherent availability (.99999)
29 Storage HWCI - Automated Tape Library
Streamline SL8500 (exterior and interior views)
30 Digital Object Storage
31 Conclusions
- Key themes
- Services (not a packaged application)
- Think integration
- SOA with MOM/ESB enables an evolvable, agile architecture
- Start with what you have and build an integrated information network
- Emergent Aspects
- Web/Content/Record/Storage technologies are all converging into an integrated information network
- It will be collaborative with varying kinds of disciplined processes and trust models
- People will use it in ways you have not anticipated