Enabling the Integrated Information Network with Fedora - PowerPoint PPT Presentation

Slides: 32
Provided by: CBA73

1
Enabling the Integrated Information Network with
Fedora
Daniel Davis, John Faure
2
Goals for enabling users
  • Creation and publication of new forms of
    information units
  • Services to better enable business processes
  • Knowledge environments that capture semantic and
    factual relationships among information units
  • Promote information re-use and contextualization
  • Facilitate collaborative activity and capture
    information that is created as a byproduct of it

Source: Sandy Payette - http://www.vala.org/vala2006/prog2006.htm
3
Content Trends
Documents → Integrated Information Networks
Source: Sandy Payette - http://www.vala.org/vala2006/prog2006.htm
4
System Goals
For unstructured or semi-structured data
  • Evolvable (Enduring Data)
  • Information Lifecycle Management
  • Capture, Create, Ingest
  • Store
  • Manage
  • Preserve
  • Deliver
  • Share
  • Reuse
  • Fuse
  • Repurpose
  • Incremental integration
  • Decouple data from applications
  • Enable agile enterprises
  • Enable technology refresh
  • Reduce system cost
  • Guarantee enterprise continuity
  • Enable collaboration
  • Ensure legal and regulatory compliance

5
Build an Integrated Information Network
  • It is not a packaged application but a system
    with many parts
  • A monolithic system will likely fail to support
    the enterprise's mission in the long run
  • Is best built as the integration of a set of
    re-usable components

6
Concept of Operations
7
Driving Requirements
  • Highly Scalable System, Both Small and Large
  • 1 Exabyte, 10 Teraobjects
  • 3 Petabyte Ingest, 1.5 Petabyte Access Per Year
  • Roughly Equals 100M 100-Page Documents Per Day
  • Very high degree of automation required
    especially for ingest
  • Support hardware and software independence of
    digital assets
  • Evolvable to use new technologies gracefully with
    minimal vendor lock-in
  • Extensible to new types (formats) without
    redesign
  • Support customized uses without redesign
    (policy-driven system)
  • No single point of failure, trusted storage of
    content
  • Customizable trust models
  • Protect digital rights, sensitive information,
    and classified digital assets
  • Provide a disciplined, collaborative work system
  • Preserve digital assets for as long as needed
  • Low cost and enterprise grade solutions needed
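
The sizing figures above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope reading of the slide's numbers. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year, neither of which the slide states.

```python
# Back-of-the-envelope check of the sizing figures on the slide.
# Assumptions (not from the slide): decimal units and a 365-day year.
PB = 10**15
EB = 10**18

ingest_per_year = 3 * PB          # "3 Petabyte Ingest ... Per Year"
docs_per_day = 100_000_000        # "100M ... Documents Per Day"
pages_per_doc = 100               # "100 Page Documents"

ingest_per_day = ingest_per_year / 365
bytes_per_doc = ingest_per_day / docs_per_day
bytes_per_page = bytes_per_doc / pages_per_doc

total_capacity = 1 * EB           # "1 Exabyte"
total_objects = 10 * 10**12       # "10 Teraobjects"
avg_object_size = total_capacity / total_objects

print(f"ingest per day:  {ingest_per_day / 10**12:.1f} TB")
print(f"bytes per doc:   {bytes_per_doc / 1000:.0f} kB")
print(f"bytes per page:  {bytes_per_page:.0f} B")
print(f"avg object size: {avg_object_size / 1000:.0f} kB")
```

Under these assumptions the ingest rate works out to roughly 8.2 TB per day, or about 82 kB per 100-page document, which is in the same range as the 100 kB average object size implied by 1 exabyte spread over 10 teraobjects.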

8
Notional Software Architecture
Users
Develop Customer Specific Apps and Workflows
Develop Customer Specific Services
Select ECM Components
COTS Workflow App
Search App
COTS System Mgmt App
COTS System Admin App
Domain App A1
Domain App A2
COTS Web App Servers
Connect To Middleware
Standard Middleware Elements: Web Services, JMS, HTTP, EJB, ESB, others
Middleware M1, M2
Legacy System Interface L1
Legacy System Interface L2
Integrate Customer Legacy Systems
Search Services
Digital Asset Services
COTS App Servers
Domain Service S1
Domain Service S2
Select Support Components
Standard Middleware Elements: Web Services, JMS, HTTP, EJB, ESB, others
Security Services
COTS Search Engine Services
COTS System Admin Services
Logging Services
COTS System Mgmt Services
COTS Workflow Services
COTS App Servers
COTS Service Registry
Legacy System L1
Legacy System L2
9
Notional Component View
Records Management Application
Web Browser
SIP Builder
Submission Information Package
Email System
File System
HTML over HTTP
FTP / Media
Ingest Manager
Forms Editor
Content Model Editor
Forms Wizard
User Workbenches
Collaboration Apps
Archives Browser
Preservation Manager
Workflow Apps
System Admin Apps
Presentation Layer
Search Services
Fedora
Ingest Services
Preservation Services
Archive Services Layer
Format Translation Services
Workflow Services
System Admin Services
System Services Layer
10
Typical Components
  • Repository/Content Manager (JSR-170 JCP)
  • Message Oriented Middleware - Enterprise Service
    Bus
  • Workflow & Business Process Execution
  • Enterprise Search and Access
  • Portal Access (JSR-168)
  • Content Creation (Authoring) and/or Capture Tools
  • Forms Creation Tools (often with XML Output)
  • Security Elements and Secure Infrastructure
    (LDAP, WS-Security, JAAS)
  • Multi-vendor Relational Database Support (usually
    cannot replace current enterprise database)
  • Also Traditional Enterprise Information System
    Applications
  • Turnkey Enterprise Systems Management
    Infrastructure and Tools (e.g. NOCC, System
    Console, Monitoring, Helpdesk, etc.)
  • Linux and Windows (and legacy OS), J2EE and .NET

11
Relevant Technology Trends
  • Peer-to-Peer Networks
  • Message-oriented Middleware & Enterprise Service
    Bus
  • Service-oriented architecture
  • Web 2.0
  • Semantic Web

P2P
ESB
MOM
SOA
Web 2.0
RDF
OWL-S
OWL
Source: Sandy Payette - http://www.vala.org/vala2006/prog2006.htm
12
Challenges to Address
  • How can I link all my existing assets, create new
    business logic, and tie it all together in a way
    that automates my mission?
  • and drive flexibility and efficiency at the
    same time?

People: Partners, Institutions, Employees
Systems: Partners, Institutions
Existing Applications
Existing Databases
13
MOM & ESB
  • Message-oriented middleware is a rock-solid
    technology
  • Supported by all J2EE (JEE) containers
  • Well supported in open source (JBoss
    MQ/Messaging/ESB, ActiveMQ, OpenJMS, UberMQ,
    MantaRay, Presumo, JORAM )
  • Commercial (IBM MQ/ESB, Sonic, BEA Aqualogic,
    Iona )
  • Primary technology for Java is the Java Message
    Service (JMS) but there are others
  • Can be used in a large number of architectures
    and patterns
  • e.g. P2P, Publish/Subscribe (Hub and Spoke,
    Multicast)
  • Used with App Servers, standalone, and with
    Heavy-weight GUI & Ajax
  • Extends to the boundaries of the enterprise or
    beyond with adapters
  • Manages Quality of Service (QoS)
  • Enterprise Service Bus
  • New Product vs. Design Pattern
  • Includes Web services over JMS
  • Designed to extend beyond the enterprise (someday;
    no QoS now)
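
The publish/subscribe pattern named above can be illustrated with a toy in-process broker. This is a minimal sketch, not JMS and not any of the products listed: the `MessageBroker` class, the `object.ingested` topic, and the `demo:1` identifier are all hypothetical, and delivery is synchronous with no persistence or QoS.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageBroker:
    """Toy in-process broker: producers and consumers share only topic names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = MessageBroker()
indexed: List[str] = []   # stands in for a search-indexing consumer
audited: List[str] = []   # stands in for an audit-logging consumer

broker.subscribe("object.ingested", lambda m: indexed.append(m["pid"]))
broker.subscribe("object.ingested", lambda m: audited.append(m["pid"]))

# The publisher does not know who, if anyone, is listening.
delivered = broker.publish("object.ingested", {"pid": "demo:1"})
print(delivered, indexed, audited)
```

The point of the sketch is the decoupling: a new consumer can be added with another `subscribe` call and no change to the publisher, which is the property that makes MOM useful for incremental integration.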

14
SOA Definition
  • SOA is an architectural approach to designing,
    implementing and deploying information systems
    from discrete business functions, components, and
    services.
  • SOA is targeted at maximizing:
  • application interoperability (i14y)
  • reuse and
  • sharing in a distributed environment

15
SOA Definition (cont.)
  • An SOA is a collection of services with
    well-defined interfaces and a shared
    communications model.
  • A service is a coarse-grained, discoverable, and
    self-contained software entity that interacts
    with applications and other services through a
    loosely coupled, often asynchronous,
    message-based communication model.
  • Coarse-grained refers to the tendency of services
    to provide significant business process
    capability, as opposed to low-level business
    functions.
  • Discoverable refers to the fact that services can
    somehow be located and their interfaces
    understood (not a core SOA capability).
  • Self-contained refers to capabilities that do not
    require context or state information of other
    services, nor do they maintain state from one
    request to another.
  • Loose coupling refers to a design principle
    whereby modules have few, well-known
    dependencies, and interfaces to the module are
    defined to be as independent as possible from the
    implementation of the module.
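
The "self-contained" and "loose coupling" properties above can be sketched in a few lines. This is an illustration under assumptions, not part of the presentation: `RenderRequest`, `FormatService`, and `UppercaseFormatService` are hypothetical names, and the "rendering" is a placeholder string operation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RenderRequest:
    """Self-contained request: carries everything the service needs."""
    object_id: str
    target_format: str


class FormatService(Protocol):
    """The interface callers depend on; implementations stay hidden."""

    def render(self, request: RenderRequest) -> str: ...


class UppercaseFormatService:
    """One hypothetical implementation; keeps no state between requests."""

    def render(self, request: RenderRequest) -> str:
        return f"{request.object_id}.{request.target_format}".upper()


def deliver(service: FormatService, request: RenderRequest) -> str:
    # The caller knows only the FormatService interface, not the class behind it.
    return service.render(request)


result = deliver(UppercaseFormatService(), RenderRequest("demo:1", "pdf"))
print(result)
```

Because `deliver` depends only on the interface, a different implementation can be substituted without touching the caller, and because each request carries its own context, any instance of the service can answer it.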

16
IBM's SOA Reference Architecture
Business Innovation Optimization Services
Facilitate better decision-making with real-time
business information
Interaction Services
Process Services
Information Services
Development Services
IT Service Management
Enable collaboration between people, processes &
information
Orchestrate and automate business processes
Manage diverse data and content in a unified
manner
Integrated environment for design and creation of
solution assets
Manage and secure services, applications &
resources
Partner Services
Business App Services
Access Services
Connect with trading partners
Build on a robust, scalable, and secure services
environment
Facilitate interactions with existing information
and application assets
Infrastructure Services
Optimize throughput, availability and performance
Source: IBM
17
The SOA Roadmap
Source: IBM
18
Typical Technology Set
  • J2EE-compliant Application Server
  • Server Side Messaging
  • JMS, JMX, SOAP, RMI
  • Enterprise Service Bus
  • Client Side
  • Web Services
  • JSP
  • Java, C#/.NET Thick clients
  • Thin Clients
  • HTML/AJAX Lightweight Clients
  • Database Access
  • JDBC, EJB
  • Workflow and Management
  • BPEL, WS-*, BAM
  • Geospatial Standard Interfaces
  • WMS, WFS
  • Data Representation, Translation
  • XML, XSD, XSLT, XPath, XQuery
  • Registration and Discovery
  • UDDI with LDAP, etc.
  • Specifying Service contracts
  • WSDL
  • Security
  • WS-Security, SAML, XACML

19
Workflow & Business Process Execution
  • Workflow/BPE is key to policy-driven agile
    integration
  • Includes both Human-in-the-loop and
    Process-to-Process models
  • Long-term and short-term transactions with
    compensation (rollback)
  • BPEL (Business Process Execution Language)
    appears to be the winner as a vendor-independent
    standard
  • Does not currently support Human-in-the-loop as
    an interoperable standard
  • Process-to-Process works well
  • Promises to go beyond the enterprise boundaries
  • Not well supported in open source (ActiveBPEL,
    Agila)
  • Very difficult to use
  • Look at MOM/ESB first!
  • Other Workflow, e.g. jBPM
  • Fedora Workflow Working Group, RepoMMan
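
The idea of "transactions with compensation (rollback)" mentioned above can be shown without a BPEL engine. The sketch below is plain Python, not BPEL. The step names and the `run_with_compensation` helper are hypothetical, and it assumes each step can be undone by a separate compensating action.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return log
        completed.append(step)
        log.append(f"done:{name}")
    return log


def fail() -> None:
    raise RuntimeError("virus scan rejected the package")


steps: List[Step] = [
    ("reserve-storage", lambda: None, lambda: None),
    ("register-object", lambda: None, lambda: None),
    ("virus-scan", fail, lambda: None),
]
log = run_with_compensation(steps)
print(log)
```

Unlike a database rollback, nothing here is held locked while the process runs. Each completed step is undone by its own compensating action, which is why the approach suits long-running ingest or review processes.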

20
Portal User Interface
Aggregates presentation in a distributed
architecture
  • Consistency and standards
  • Visual Hierarchy
  • Spatial Separation
  • Grouping
  • Colors
  • Fonts
  • Icons
  • Visibility of system status
  • Indicates current position and progress through
    task
  • Help and documentation
  • Informational instruction at point of need
  • Match between system and the real world
  • User's terminology
  • Recognition rather than recall
  • Show which document is being used from screen to
    screen
  • User control and freedom
  • User is not locked in; can cancel or go back to
    previous question

21
Preservation
  • Preservation is moving the content forward
    through time as formats and the software that
    uses them change
  • Very similar to transformation which is needed
    for agile system integration
  • Dependent on precise format definitions and
    enabled by content models
  • Replication is critical to preservation of
    digital assets; backup is not practical in the
    long run
  • Needs to flexibly bind and manage software to
    perform preservation of the SOA and the digital
    assets
  • Integral to the Information Life Cycle
  • Notion of Level of Service
  • Fedora plus a resolver is well suited for
    preservation in a distributed architecture with
    replication
  • Distributed architectures require an event driven
    architecture
  • Fedora Preservation Working group
  • Global Data Format Registry and PRONOM
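
The combination of a resolver and replication mentioned above can be sketched as follows. This is an illustration only and does not use Fedora's API: the `Resolver` class, the site names, and the use of SHA-256 digests to detect a damaged replica are all assumptions, and in-memory bytes stand in for real storage.

```python
import hashlib
from typing import Dict, List, Optional


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Resolver:
    """Maps a persistent identifier to replicas held at several sites."""

    def __init__(self) -> None:
        self.checksums: Dict[str, str] = {}
        self.replicas: Dict[str, Dict[str, bytes]] = {}

    def register(self, pid: str, content: bytes, sites: List[str]) -> None:
        """Record the digest once and place a copy at each site."""
        self.checksums[pid] = digest(content)
        self.replicas[pid] = {site: content for site in sites}

    def resolve(self, pid: str) -> Optional[str]:
        """Return the first site whose copy still matches the recorded digest."""
        expected = self.checksums.get(pid)
        for site, content in self.replicas.get(pid, {}).items():
            if digest(content) == expected:
                return site
        return None


resolver = Resolver()
resolver.register("demo:1", b"page image", ["site-a", "site-b"])

# Simulate silent corruption of the first replica.
resolver.replicas["demo:1"]["site-a"] = b"page imagf"

good_site = resolver.resolve("demo:1")
print(good_site)
```

Because clients ask the resolver rather than a fixed location, a damaged or retired replica can be bypassed without changing the identifier that users and other systems hold.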

22
Security
  • Security features are needed at the system, data,
    and architecture level
  • System is dependent on many COTS products and
    standards
  • Starts with identity/authorization management
    within the enterprise, primarily LDAP
  • Security Assertion Markup Language (SAML) and
    WS-Security for SOA
  • Shibboleth for external users
  • Fine grain access control needed for data
    security --- XACML (Sun, IA)
  • Multiple Security Level (MSL) Architectures for
    distributed systems (vs. MLS)
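
Fine-grained, attribute-based access control of the kind XACML standardizes can be approximated in a few lines. The sketch below is not XACML syntax and not any product's API. The attribute names and rules are hypothetical, and it implements only a simplified deny-overrides combination with three of XACML's decision values.

```python
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, str]
# A rule pairs an effect ("Permit" or "Deny") with a predicate over request attributes.
Rule = Tuple[str, Callable[[Attributes], bool]]


def decide(rules: List[Rule], request: Attributes) -> str:
    """Deny-overrides: any matching Deny wins, else any Permit, else NotApplicable."""
    matched = [effect for effect, applies in rules if applies(request)]
    if "Deny" in matched:
        return "Deny"
    if "Permit" in matched:
        return "Permit"
    return "NotApplicable"


rules: List[Rule] = [
    # Archivists may read.
    ("Permit", lambda r: r.get("role") == "archivist" and r.get("action") == "read"),
    # Restricted objects require a matching clearance, whatever the role.
    ("Deny", lambda r: r.get("classification") == "restricted"
                       and r.get("clearance") != "restricted"),
]

print(decide(rules, {"role": "archivist", "action": "read", "classification": "public"}))
print(decide(rules, {"role": "archivist", "action": "read", "classification": "restricted"}))
print(decide(rules, {"role": "guest", "action": "read", "classification": "public"}))
```

The decision depends on attributes of the user, the action, and the object together, which is what distinguishes this style from a simple per-role allow list.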

23
Federation
A Federation
24
Network Design Element
To Edge Router for the Site
All Federations Public and Private
  • Ethernet Switching
  • Intrusion Detection
  • VPN/Data Encryption
  • Secure Socket Layer
  • Stateful Firewall
  • Server Load Balancing
  • Anti-Virus

WLAN/LAN Data Networking Network Security
Component
Building Block Fiber-Channel Disk Storage
System
User Workstations
Removable Media Kiosks
Web Interaction Servers
Ingest Servers
Removable Media Distribution
Storage Area Network (Fabric)
DMZ
  • Ethernet Switching
  • Intrusion Detection
  • Stateless Firewall

User Security Servers
Data Integrity Servers
User Presentation Servers
Application Workflow Server
Management & Admin Servers
App & Mgmt Security Zone
  • Ethernet Switching
  • Intrusion Detection
  • Stateless Firewall

Database Repository Security Zone
Fedora Repository Servers
Hierarchical Storage Management
Automated Tape Library
25
Processing HWCI - Blade Center
Service Processor Module
Chassis
BIT/Diagnostic Module
Optical Passthru Module
Media Module
26
Network HWCI - WAN/LAN Connectivity Ethernet
Switch (Cont)
  • Key characteristics (Cont)
  • Configured chassis provides approximately five 9s
    inherent availability (.99999)
  • Support for three generations of blades
  • Other considerations
  • Chassis selected and configured based on loads
  • Two and four chassis configured for HA
  • Two hardware firewalls and SLB/SSLs maintain
    sessions to effect stateful failover
  • No interruption in service when a hardware
    failure occurs
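
What "five 9s" means in practice, and why the chassis are deployed in redundant pairs, can be worked out directly. The sketch below assumes a 365-day year and, for the redundant case, that the two units fail independently. Neither assumption comes from the slide, and real failures are often correlated.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(availability: float, copies: int) -> float:
    """Availability of redundant units, assuming independent failures."""
    return 1.0 - (1.0 - availability) ** copies


single = 0.99999
pair = parallel_availability(single, 2)

print(f"single chassis: {downtime_minutes_per_year(single):.2f} min/year")
print(f"redundant pair: {downtime_minutes_per_year(pair) * 60:.4f} s/year")
```

A single unit at 0.99999 is expected to be unavailable for a little over five minutes a year. Under the independence assumption a redundant pair reduces that to a small fraction of a second, which is the arithmetic behind configuring two or four chassis for HA.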

27
Storage HWCI - Director-Class SAN Switch
  • Brocade director-class SAN switch
  • Key characteristics
  • Blade architecture
  • 16 ports per card
  • 2 Gb/sec fiber channel ports currently supported
  • Architecture supports 4 and 10 Gb/sec ports
  • Mixed blade card configuration supported
  • 128 port switch available currently with 256 port
    switch available before end of CY 05
  • Hardware enforced zoning
  • Approximately five 9s inherent availability
    (.99999)
  • Other considerations
  • Dual switch configuration in each system instance
  • Data center best practice
  • Ports configured based on load and added as
    needed

28
Storage HWCI - Building Block Fiber Channel/SATA
Disk Storage System
  • IBM DS6800 Key characteristics
  • Building block architecture
  • 3U base and expansion chassis
  • Fully redundant power and cooling
  • Support for 13 expansion chassis
  • 224 drive maximum
  • Ideal for pay as you grow environments
  • 2 controllers per system
  • IBM Power microprocessors utilized
  • Undoubtedly will be used for migrating additional
    intelligence into SAN
  • Currently 4 GB mirrored cache per controller
  • 4 fiber channel interfaces to SAN per controller
  • 4 switched back-end connections to drives per
    controller rather than redundant fiber channel
    arbitrated loops (FC AL)
  • No single point of failure
  • Global spares
  • Approximately five 9s inherent availability
    (.99999)

29
Storage HWCI - Automated Tape Library
StreamLine SL8500
Exterior
Interior
30
Digital Object Storage
31
Conclusions
  • Key themes
  • Services (not a packaged application)
  • Think integration
  • SOA with MOM/ESB enables an evolvable, agile
    architecture
  • Start with what you have and build an integrated
    information network
  • Emergent Aspects
  • Web/Content/Record/Storage technologies are all
    converging into an integrated information network
  • It will be collaborative with varying kinds of
    disciplined processes and trust models
  • People will use it in ways you have not
    anticipated