Title: Case Study Distributed Data Integration Framework Roger Ruttimann Lead Engineer Enterprise Systems, GroundWork Opensource Inc.
1 Case Study: Distributed Data Integration Framework
Roger Ruttimann, Lead Engineer
Enterprise Systems, GroundWork Opensource Inc.
4th International Conference on Computer Science
and its Applications (ICCSA-2006)
2 Objective
- Overview of integrating Open Source projects into the development process
- Design, risk assessment, and implementation of a new product, leveraging OSS as much as possible
- Discussion of the problems with this approach
3 Agenda Details
- Case study of the development process for a Data Integration Framework for Monitoring
- Project requirements overview
- Design
- Risk assessment
- Implementation
- Encountered problems / issues
- Project life cycle and project maintenance
- Lessons learned
- Q & A
4 Project overview
Overview
- The company offers support and installation assistance for an Open Source monitoring system. One component is the Open Source project Nagios.
  - Runs on Linux/Unix only
  - Data storage in text files
  - UI built from compiled C programs that parse the text files
  - Hard to scale, with limited possibilities to improve the User Interface
- The limits on scaling out and the User Interface are the two major issues hindering adoption in larger installations
5 Project Requirements
Overview
- The goal was to come up with a framework that
  - leverages the core features of Nagios such as the monitor plugins, scheduler and notification engine
  - extends the UI and the back end so that it can be deployed into larger data centers
  - has a generic data model so that other monitoring data can be integrated
  - uses an enterprise-type back end, including fail-over, load-balancing and high throughput
6 Mission: The CTO said...
Overview
- Enable integration with multiple open source and commercial monitoring tools
- Provide a platform for a unified enterprise-class solution
- Provide real open source flexibility and extensibility
- Publish the Monitor Data Integration Framework as an Open Source project so that outside developers can contribute
7 Development Constraints
Design Phase
We are a startup company with limited development
resources and an aggressive schedule - we had to
use existing components. As a new company we
didn't have legacy libraries for re-use. The best
alternative was to leverage Open Source
components as much as possible.
8 Final Feature set
Design Phase
- Cross-platform application written in Java
- Data exchange with XML feeder framework
- Pluggable data normalization components
- Java, Perl and PHP APIs for accessing data
- Property-driven data structure for great
flexibility
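A property-driven structure can be pictured as a small fixed core plus an open set of named properties. The sketch below is only an illustration of that idea: `MonitoredEntity`, its methods, and the property names are hypothetical and are not taken from the framework.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: a fixed core (type, name) plus open-ended properties.
class MonitoredEntity {
    private final String type;
    private final String name;
    private final Map<String, Object> properties = new LinkedHashMap<>();

    MonitoredEntity(String type, String name) {
        this.type = type;
        this.name = name;
    }

    String getType() { return type; }
    String getName() { return name; }

    // New attributes are added as data, without changing the class.
    MonitoredEntity set(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    // Returns the property value, or the given default when the key is absent.
    Object get(String key, Object defaultValue) {
        return properties.containsKey(key) ? properties.get(key) : defaultValue;
    }

    // Read-only view of all properties in insertion order.
    Map<String, Object> getProperties() {
        return Collections.unmodifiableMap(properties);
    }
}

class PropertyModelDemo {
    public static void main(String[] args) {
        MonitoredEntity host = new MonitoredEntity("HOST", "db01")
                .set("MonitorStatus", "UP")
                .set("LastCheckMillis", 1200L);
        System.out.println(host.getName() + " " + host.getProperties());
    }
}
```

The trade-off of this style is that property names are checked at run time rather than by the compiler, which is the price paid for the flexibility.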
10 API Layer
Design Phase
Diagram labels:
- Java API, PHP API, Perl API
- Lightweight Object Container
- Data Access Objects (DAO)
- Object Relational Bridge
- Data Model
11 Data Feeder / data normalization layer
Design Phase
Diagram labels:
- Data Model
- Lightweight Object Container
- Object Relational Bridge
- Data Access Objects (DAO)
- Adapter Normalizer (shown four times)
- Listener / Message dispatcher
- XML Message
- Feeders: Perl script, PHP script, JMS, C/C++, VB
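The flow in this layer (feeder sends an XML message, the dispatcher hands it to a matching adapter normalizer) can be sketched as follows. This is a minimal sketch under stated assumptions: the interface and class names, the `adapter` attribute used for routing, and the message layout are all hypothetical, since the slides do not show the real message format.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical contract: turns one feeder message into normalized key/value data.
interface AdapterNormalizer {
    Map<String, String> normalize(Element message);
}

// Routes each XML message to the normalizer registered for its adapter type.
class MessageDispatcher {
    private final Map<String, AdapterNormalizer> normalizers = new HashMap<>();

    void register(String adapterType, AdapterNormalizer normalizer) {
        normalizers.put(adapterType, normalizer);
    }

    Map<String, String> dispatch(String xml) {
        Element root = parseRoot(xml);
        String type = root.getAttribute("adapter"); // assumed routing attribute
        AdapterNormalizer normalizer = normalizers.get(type);
        if (normalizer == null) {
            throw new IllegalArgumentException("No normalizer for adapter: " + type);
        }
        return normalizer.normalize(root);
    }

    // Parses the message with the JDK's built-in XML parser.
    private static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed XML message", e);
        }
    }
}

class DispatcherDemo {
    // Example normalizer for an assumed "NAGIOS" message layout.
    static AdapterNormalizer nagiosNormalizer() {
        return message -> {
            Map<String, String> out = new HashMap<>();
            out.put("host", message.getAttribute("host"));
            out.put("status", message.getAttribute("state").toUpperCase(Locale.ROOT));
            return out;
        };
    }

    public static void main(String[] args) {
        MessageDispatcher dispatcher = new MessageDispatcher();
        dispatcher.register("NAGIOS", nagiosNormalizer());
        System.out.println(dispatcher.dispatch(
                "<message adapter=\"NAGIOS\" host=\"db01\" state=\"ok\"/>"));
    }
}
```

Because feeders only need to produce XML, they can be written in any language; supporting a new monitoring source then means registering one more normalizer.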
12 Common Data model
Design Phase
Diagram labels:
- Application Programming Interfaces
- Common Data Model
  - Event Data (Properties)
  - Log Data (Properties)
  - State Data (Properties)
- Collector Normalizers
13 How to choose the components?
Evaluation / Risk assessment
- Choose point solutions with minimal dependencies
  - Business layer should be database agnostic
  - Persistence layer should not depend on specific transaction managers or connection pools
- Multiple projects with the same functionality available
  - Easier to replace a component if problems occur
- License compatibility
14 Choosing the Business Logic to database bridge
Evaluation / Risk assessment
- Requirements
  - Database agnostic; no stored procedures
  - The property-based data model requires many cross-table joins to insert and retrieve data. Developers are used to manipulating objects rather than record sets.
  - A cache is required for performance reasons
  - Data consistency requires transaction support
- Hibernate -- www.hibernate.org
  - High-performance object/relational persistence and query service
  - Most popular and stable O/R persistence tool
  - Online documentation and books available
  - Active mailing lists and forums
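The requirements above (objects instead of record sets, no database-specific code, a cache for performance) can be illustrated with a small data access object sketch. It deliberately does not use Hibernate's API: all names are hypothetical, and `InMemoryHostStatusDao` only stands in for a persistence-backed implementation so that the example runs with the JDK alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object; business code sees objects, not record sets.
class HostStatus {
    final String host;
    final String status;

    HostStatus(String host, String status) {
        this.host = host;
        this.status = status;
    }
}

// Database-agnostic contract: no SQL, driver, or vendor types leak through.
interface HostStatusDao {
    void save(HostStatus status);
    Optional<HostStatus> findByHost(String host);
}

// Stand-in for a persistence-backed implementation; counts reads so the
// effect of the cache is visible.
class InMemoryHostStatusDao implements HostStatusDao {
    private final Map<String, HostStatus> rows = new HashMap<>();
    int reads = 0;

    public void save(HostStatus status) {
        rows.put(status.host, status);
    }

    public Optional<HostStatus> findByHost(String host) {
        reads++;
        return Optional.ofNullable(rows.get(host));
    }
}

// Read-through cache placed in front of any DAO implementation.
class CachingHostStatusDao implements HostStatusDao {
    private final HostStatusDao delegate;
    private final Map<String, HostStatus> cache = new HashMap<>();

    CachingHostStatusDao(HostStatusDao delegate) {
        this.delegate = delegate;
    }

    public void save(HostStatus status) {
        delegate.save(status);
        cache.put(status.host, status); // keep the cache consistent with the store
    }

    public Optional<HostStatus> findByHost(String host) {
        HostStatus cached = cache.get(host);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<HostStatus> loaded = delegate.findByHost(host);
        loaded.ifPresent(s -> cache.put(host, s));
        return loaded;
    }
}
```

In the stack described on this slide, the O/R tool would take over the mapping, caching, and transaction handling that this sketch only mimics by hand.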
15 Choosing the Database
Evaluation / Risk assessment
- Requirements
  - Easy to install
  - Popular and accepted
  - Multi-platform support
- MySQL -- dev.mysql.com
  - Most popular Open Source database
  - Easy to install and to maintain
  - Download, install, and up and running in 15 minutes
  - Online documentation and books available
  - Active mailing lists and forums
16 Choosing the Lightweight object container
Evaluation / Risk assessment
- Requirements
  - Framework to manage the creation and maintenance of Java Bean objects
  - Minimal configuration at run time
  - Flexible enough to support aspect-oriented programming (AOP) and transaction management
- Spring -- www.springframework.org/
  - Lightweight container with a far smaller footprint than any available J2EE container
  - Configuration through XML-format assemblies that can be injected at any time
  - Seamless integration of Hibernate for transaction management
  - Online documentation and books available
  - Active mailing lists and forums
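What a lightweight container contributes can be shown with plain constructor injection: classes depend on interfaces, and a single assembly step decides which implementation they receive. The sketch below does not use Spring itself; `Assembly` merely plays the role that an XML assembly would play in the container, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator contract.
interface Notifier {
    void send(String message);
}

// Depends only on the interface; the concrete Notifier is supplied from outside.
class AlertService {
    private final Notifier notifier;

    AlertService(Notifier notifier) {
        this.notifier = notifier;
    }

    void hostDown(String host) {
        notifier.send("Host down: " + host);
    }
}

// Alternative implementation that records messages instead of delivering them.
class RecordingNotifier implements Notifier {
    final List<String> sent = new ArrayList<>();

    public void send(String message) {
        sent.add(message);
    }
}

// The single place where objects are wired together.
class Assembly {
    static AlertService alertService(Notifier notifier) {
        return new AlertService(notifier);
    }

    public static void main(String[] args) {
        // Swap the notifier here without touching AlertService.
        AlertService service = alertService(message -> System.out.println(message));
        service.hostDown("db01");
    }
}
```

Keeping the wiring in one place is what makes a component replaceable when a chosen package turns out to be a problem.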
17 Risk assessment
Evaluation / Risk assessment
- Choose popular and well-documented projects
- Monitor forums to observe common user issues
- Heavy traffic alone doesn't indicate a successful project
- Consider only stable and documented features
- Evaluate core components extensively, but not tool/utility components
Even following these rules doesn't protect you from surprises. Unstable, fast-changing projects can negatively affect your overall schedule.
19 Encountered issues / problems
Implementation
- Java version: clients were still running Java 1.3 or Java 1.4.x, so we couldn't leverage the improvements Java 5 offers
- By design, all components are loosely coupled and therefore replaceable. This requires more upfront work to design the communication interfaces.
- Documentation needs to be written!
- Training of the staff installing and supporting the framework
- Overhead of following Open Source projects to stay informed about updates and problems that might affect the project
20 Project Lifecycle
Project Lifecycle
- Feedback from the field needs to be integrated
- Improvements and bug fixes from the various Open Source packages need to be evaluated and integrated
- Constant risk evaluation when integrating third-party packages
- Evaluate new feature requests
  - How do they fit into the framework?
  - Is there an Open Source package available?
  - What's the license?
  - Can we integrate it easily? How much custom code?
21 Release
Project Lifecycle
- The Data Integration Framework was released to Open Source as GroundWork Foundation
  - http://gwfoundation.sf.net
- Used as part of GroundWork Monitor Professional
- Customized by other users to store state and event information not directly related to infrastructure monitoring
- Development goes on; milestone releases are available
- Since the project is public, developers have a responsibility to support users and guarantee stability
22 Did the chosen approach work out?
Project Lifecycle
- Can we extend the current design based on Open Source components?
- Is the maintenance manageable, given that we integrated so many Open Source packages with their own lifecycles?
- Is the built-in flexibility really needed?
23 First design challenge: Adding new features
Project Lifecycle
- Integration of new features
  - Remote API (WebService)
  - Higher throughput: feed 500-1000 messages/sec
  - Integration of other monitoring systems such as JMX
25 Second design challenge: Open Source package upgrades
Project Lifecycle
- Upgrade of core components
  - Hibernate update to version 3.1 (EJB 3.0 compliant)
  - Spring Framework update to 2.0 (JMX support / enhanced AOP)
  - Upgrade to Java 5
- Open Source packages have dependencies
  - Log4j, commons, XML parsers, ...
- Have unit tests in place to catch any differences and incompatibilities early
- Even if the upgrade is a drop-in update, you should leverage any new features and improvements
- Once again, check the forums and the mailing lists!
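One way to have such tests in place is to pin down, in a small check, the library behavior your own code relies on, and to rerun it after every upgrade. The sketch below uses the JDK's built-in XML parser as a stand-in for an upgraded dependency and plain Java instead of a test framework; the class name and the message layout are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Pins down parser behavior that feeder code is assumed to rely on, so that an
// upgrade which changes it fails fast instead of surfacing in production.
class XmlParserContractCheck {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parser rejected a message it used to accept", e);
        }
    }

    // Returns true when attribute lookup still behaves as the calling code assumes.
    static boolean attributeLookupBehavesAsExpected() {
        Document doc = parse("<message host=\"db01\"/>");
        String present = doc.getDocumentElement().getAttribute("host");
        String missing = doc.getDocumentElement().getAttribute("state");
        // DOM contract: a missing attribute comes back as an empty string, not null.
        return "db01".equals(present) && "".equals(missing);
    }

    public static void main(String[] args) {
        if (!attributeLookupBehavesAsExpected()) {
            throw new AssertionError("XML parser behavior changed after upgrade");
        }
        System.out.println("parser contract holds");
    }
}
```

Checks like this are cheap to write during the initial evaluation of a package and pay off each time one of its transitive dependencies changes.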
26 Conclusion
Lessons learned
- Without the available Open Source components we wouldn't have been able to meet the aggressive release schedule
- Open Source project evaluation and project monitoring need to be built into the development schedule
- Mailing lists are a great help
- Constant learning: projects change fast
- Cleaner code, since the code is public: developer pride!
27 More Info
- Foundation Project
  - http://gwfoundation.sf.net
- GroundWork Monitor Open Source
  - http://www.groundworkopensource.com/downloads
- Contact
  - Roger Ruttimann
  - GroundWork Open Source, Inc.
  - 139 Townsend Street, Suite 100
  - San Francisco, CA 94107
  - rruttimann_at_groundworkopensource.com