Design Guidelines for Large Message-based EAI Systems (A Case Study)

1
Design Guidelines for Large Message-based EAI
Systems (A Case Study)

Jim White
Director of Training
Intertech, Inc.
St. Paul, MN
jwhite_at_intertech.com

3
This Talk

Presents an EAI case study
A very large EAI system for a retail chain.
Identifies issues and challenges encountered in the
project
Identifies lessons learned and recommendations
for your EAI projects.
Lets you know others do have it as bad as you.
The story does have a happy ending
Maybe providing hope to the hopeless.

4
How many of you

Are actively working on an EAI project?
Have been on an EAI project in the past?
Plan on being on an EAI project in the near
future?
Have no idea what EAI is but it sounded like a
good topic to help put me to sleep after lunch??

5
How many of you

Are project architects?
Are technical project leads?
Are developers/designers of systems?
Are managers?
Are testers/QA/support folks?
Can't remember after 4 days of the conference?

6
First Off: What's EAI?

Wikipedia: the integration of data
between applications in a company.
wMUsers: technology that connects
enterprise-wide systems; evolved to refer to
technologies used to connect systems anywhere
Hohpe/Woolf: enterprise integration using
messaging

7
EAI Messaging

Enables data or commands to be sent across the
network
using a send and forget approach

8
Messaging How?

Message-oriented middleware (MOM) like that
offered by
IBM WebSphere MQ
Microsoft BizTalk
TIBCO
WebMethods
SeeBeyond (now Sun owned)
Vitria
and others
Using
Java Message Service (JMS)
Microsoft's Message Queuing (MSMQ) and/or
Messaging libraries in Microsoft .NET
Web services standards that support asynchronous
Web services
WS-ReliableMessaging
Sun's Java API for XML Messaging (JAXM)
Microsoft's Web Services Extensions (WSE).

9
Why is large EAI different?

Messaging/EAI development ≠ Web or other
distributed app development
Especially when very large
Many new or significantly altered considerations
Requirement differences
Time and space needs
Process control/orchestration
Failure handling
Monitoring
Proprietary nature of vendor solutions
Support turnover
Staffing needs

10
The Case Study

A major retail chain has dozens of distribution
centers
Each distribution center or warehouse services
hundreds of stores (>1,200 total stores).
Each distribution center is moving thousands of
cartons (i.e. boxes) around the warehouse each
day
Receiving them from trucks through dock doors.
Moving them with fork lifts to storage areas in
the warehouse
Conveying them to break down areas for
distribution to stores.
Conveying them down belts to storage areas or
outbound trucks.
Moving them onto trucks that depart the warehouse.

11
The Case Study

A box is tracked via labels and bar code readers.
Some reads are manual and some are automated.
Generating literally hundreds of events per
second per warehouse.
RFID was about to create more events.
More reads from more points in the warehouse.
Potentially adding store reads to the event
list.

12
The System

Part I
The retail chain wanted all the data on events
regarding the movement of cartons sent to HQ
Providing them with unparalleled real time
information on inventory levels and product
status.
Providing more accurate information for
merchandise analysts and productivity monitoring
for warehouse managers.
Part II (Not germane to the discussion today)
Providing a Java Web application to nearly 10,000
users to access the data company wide.
Reports galore.
Some limited ad hoc query reporting.

13
Let's do the math

>25 warehouses
Each generating 15-20 carton events per second
Averaging 400 messages a second incoming at HQ
Peak around 1,300 messages a second incoming at HQ
Data around an event: 200 bytes/msg
24x7x52 (31,449,600 seconds for those not
counting)
4-7 GB a day
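
A quick sanity check of the arithmetic on this slide, as a minimal sketch. The rates and message size are the slide's own figures; the variable names are illustrative, and the calculation assumes payload bytes only (no envelope, XML, or protocol overhead).

```python
# Back-of-the-envelope volume check using the figures from the slide.
AVG_MSGS_PER_SEC = 400      # average incoming rate at HQ
PEAK_MSGS_PER_SEC = 1300    # peak incoming rate at HQ
BYTES_PER_MSG = 200         # data around one carton event
SECONDS_PER_DAY = 24 * 60 * 60

avg_bytes_per_day = AVG_MSGS_PER_SEC * BYTES_PER_MSG * SECONDS_PER_DAY
peak_bytes_per_sec = PEAK_MSGS_PER_SEC * BYTES_PER_MSG
seconds_per_year = 24 * 7 * 52 * 60 * 60  # the slide's "24x7x52"

print(f"average volume: {avg_bytes_per_day / 1e9:.1f} GB/day")
print(f"peak inbound:   {peak_bytes_per_sec / 1e3:.0f} KB/s")
print(f"seconds/year:   {seconds_per_year:,}")
```

At the average rate the daily volume comes out at about 6.9 GB, the top of the slide's 4-7 GB range.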

14
...and the math isn't getting better

During Christmas time things were worse, much
worse.
The organization wants to double its current size
by 2010!
Oh yeah, did I mention RFID was coming?
Tripling or quadrupling the number of events

15
My Challenge

Design and implement a system to get the data
from the warehouses to HQ
In near real time to support the reporting needs
Use whatever makes sense (to some degree; more
later)
With a good size team (20-25 people in various
roles)

16
My Background

15-year grizzled veteran of software
development.
6 years of Java experience.
Author of a Java book.
Experienced architect, manager, mentor, trainer.
Eager to take on any software system challenge.
No experience in EAI!
An organization with limited EAI experience.

17
The Perfect Storm
The size of the EAI project vs. the abilities of the
development team
18
The Solution

Significant company resources and investment in
SeeBeyond EAI product.
Put SeeBeyond at all the endpoints (warehouses
and HQ).
All data would move through SeeBeyond.
SeeBeyond is Java based (also a company
technology direction).
Write routing/minor processing code in Java in
SeeBeyond.
Significant company resources and investment in
Oracle RDBMS.
Oracle already at the warehouses
Obtain a honking big Oracle DB at HQ.
Use Oracle stored procedures for heavy lifting
(data processing report data preparation).

19
Solution Diagram
Bundling 20-40 event messages at a time
Ex: move this carton there, but have we gotten
the receive-carton msg yet? We have rec'd a
carton; do we have the reference data for the
product yet?
20
Problem 1: We weren't ready

As an architect, I was not aware how different an
EAI messaging system is.
Asynchronous-everywhere nature
Had no patterns to follow (No, I had not read the
Hohpe/Woolf EAI book)
Did not have an awareness of the vendor landscape
Was easily talked into solutions by others.
My organization didn't see how big it was
Had only implemented smaller EAI solutions
Finding good help was hard and a critical step
Internally: lots of support but no experience
Contractors: lots of desire, but little
implementation experience at this scale/level of
effort

21
Getting Yourself Ready

Get yourself ready
Understand your options: all the three-letter
E's (EAI, ETL, EII, EDR, etc.)
Read EAI patterns
Know the products (WBI, Vitria, Tibco,
WebMethods, SeeBeyond, etc.)
Find people with real EAI experience
Experienced with systems matching the size of
your app
Find people with product expertise
Find people with design/pattern expertise

22
EAI Patterns

Enterprise Integration Patterns - Hohpe/Woolf
Next Generation Application Integration -
Linthicum
IT Architectures and Middleware - Britton

23
EAI Component Basics

A typical messaging system is comprised of the
following parts.
Endpoints
Messages
Channels
Routers
Translators
Monitors

24
EAI Component Analogy
25
EAI Patterns

As the GOF pointed out in generic software, there
are common behaviors in software systems.
They are powerful tools for communicating
behavior.
They represent naturally occurring processes.
Are generally repetitive in nature, and lend
themselves to reuse.
Each of the message components also has several
patterns that represent common behaviors in a
messaging system and encourage reuse.

26
Getting Resources Ready

Let the network engineers know of your plans
You are going to be using a significant amount of
pipe.
Have you considered failover/load balancing?
(comm lines around warehouses get cut on
occasion)
Let the database engineers know of your plans
Terabytes of data to be stored and processed
where will it go?
Consider backup/recovery systems
Database logs/archiving
Performance tuning

27
Getting Support Ready

Support staffs will be lost at turnover
How many of your support shops really know
How to manage application servers?
How to manage web applications effectively?
Can you expect them to be able to operate,
maintain and support component based messaging
systems?
Do they know what a message server or bus is?
Across a very distributed environment?
Get them trained early (in messaging
infrastructure).
Have them help you design the monitoring tools
and alert systems.
Work together to develop proactive systems checks
and troubleshooting procedures.

28
Getting Others Ready

If your development team isn't ready, what about
Testing/QA teams?
Analysts?
Managers?
For example, finding experienced testers for
asynchronous messaging systems is difficult.
They usually need intricate knowledge of the
messaging subsystem monitors and admin
capabilities.

29
Problem 2: Proprietary EAI

EAI Products/Solutions are many.
EAI Standards are few.
EAI/ETL/EII marketplace is tumultuous
Sun has purchased SeeBeyond
IBM bought Ascential
Everyone calling their product an ESB (example on
next page)
Products/Solutions have scale limits
Some they know about
Others they do not
Java alone does not make you platform
independent.

30
Can you identify this product??

provides an award-winning messaging backbone
for deploying your enterprise service bus (ESB)
today as the connectivity layer of a
service-orientated architecture (SOA).

31
Examine Your Solution Options

See if what you already have would work.
There is a reason MQ has been around a long time.
Where possible consider tried, true and already
deployed platforms
But again do the math and see if they can support
the extra load.
In house support is probably better equipped
(more in a bit)
Not everything has to travel by message.
Consider multiple/alternate technologies for
parts of your solution.
ETL is great for certain parts of a large
solution
There is a reason why products like Oracle are
expensive (technologies like Oracle Replication
more in a bit).
Does, however, create more issues of timing.

32
What Travels by Message?

Consider multiple/alternate technologies for
parts of your solution.
Replication of reference data
Bulk/batch transfers
Non-real time needs
ETL is great for certain parts of a large
solution
Examine features in your DB/App Servers
There is a reason why products like Oracle are
expensive (technologies like Oracle Replication
more in a bit).
How about those Message Beans in the app server?
This can, however, create more issues of timing.

33
Reference Data

In many applications, you need reference data on
both ends of the messaging systems.
You can build a replicating message engine to
treat this like other message data (not
recommended).
Referential integrity becomes a real problem.
Consider issues of message timing (PR becomes the
51st state but messages with PR references start
to arrive before the new state data does)
Use simple replication technologies where
possible
ETL tools - if reference data changes only happen
at certain times.
Technologies like Oracle Replication for real
time (it can operate over a WAN).

34
Interoperability

We used Java, but
Even when you use Java, how is it being applied?
Java running inside of proprietary components
(like SeeBeyond eWays) does not make you
portable.
Write component code that can be used by or
incorporated by proprietary systems.
Under the covers, is the vendor using
JMS
JMX/SNMP
Web services/WS-Reliable Messaging/JAX-RPC
Etc

35
Process outside the bus

Process outside the message bus/subsystem if you
can
Let the bus focus on delivering the goods.
Too much processing time in the bus will create
Scalability problems
Monitoring problems
Possibly interoperability problems (especially
when using proprietary technology/components)
Process with components that are
Flexible
easy to get at (and change)
interoperable (if possible)
and contain reusable business logic (if possible)

36
Problem 3: Math we didn't do

We didn't do enough math up front.
We didn't plan for failure/growth.
The messages moved slower than anticipated.
The message processing took more time than
expected.
The amount of data was larger than expected.

37
Do the math and ask the tough ?s

How much time is it going to take to get a message
from A to B?
Test that estimate early.
Work with the business analysts to figure out how
many messages need to be moved.
Make volume estimates part of the non-functional
requirements gathering process.
Check that against the existing databases if
possible.
How much data needs to be packaged, shipped,
processed, stored?
Design the messages and calculate the size of the
overall message (XML and all).
Calculate the rate and add up the total volume.

38
and pad your answer!

Do you have room to spare??
Can the messaging system handle that (on both
ends)?
Can the consuming database handle that?
Can the hardware and network handle that?
Anticipate failure
What happens if something/anything goes down for
an hour?
What happens if you go down for a day?
What happens if you have unexpected growth?
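
The outage questions above can be turned into numbers. This is a rough sketch: the 400 msg/s and 200 bytes/msg figures come from the earlier math slide, while the function names and the 500 msg/s drain rate are assumptions chosen only for illustration.

```python
def backlog(msgs_per_sec, bytes_per_msg, outage_seconds):
    """Return (message count, bytes) queued while a consumer is down."""
    count = msgs_per_sec * outage_seconds
    return count, count * bytes_per_msg


def drain_seconds(backlog_msgs, arrival_rate, drain_rate):
    """Time to clear a backlog while new messages keep arriving."""
    spare = drain_rate - arrival_rate
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog never drains")
    return backlog_msgs / spare


hour_msgs, hour_bytes = backlog(400, 200, 60 * 60)
day_msgs, day_bytes = backlog(400, 200, 24 * 60 * 60)
# Hypothetical consumer that can process 500 msg/s once it is back up.
catch_up = drain_seconds(hour_msgs, arrival_rate=400, drain_rate=500)

print(f"1 hour down: {hour_msgs:,} msgs, {hour_bytes / 1e6:.0f} MB queued")
print(f"1 day down:  {day_msgs:,} msgs, {day_bytes / 1e9:.1f} GB queued")
print(f"catch-up after 1 hour down: {catch_up / 3600:.0f} hours")
```

With only 25% headroom, a one-hour outage takes four hours to work off, which is why the padding matters.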

39
Problem 4: Exception handling wasn't

More considerations for failover and redundancy
Versus Web application
We did not plan on downtime
Unplanned system issues
Planned outages
We didn't build in enough redundancy
Load balancing and
Failover were both afterthoughts
All messages always correct all the time (NOT)
At first, we had no proper dead letter queuing
No proper exception processing
No means to properly see and react to issues
Many more points of failure and potential issues
More widely distributed

40
Design load balancing & failover up front

Load balancing and failover must be accommodated
Like security, you need a multi-layered approach
Hardware (like Big IP)
Redundant message bus/message servers
Processing components
Database
EAI system throttling
How are you going to kick over to the failover
systems (and return to regular systems)?
Without losing messages
Without causing timing problems in message
delivery/receipt

41
Throttling

Throttling limits ("throttles") the number of
requests a component will respond to within a
specified period of time.
Limits congestion.
Built into most good EAI solutions today.
Often overlooked and not used.
Used in messaging systems to ensure that no one
part of the system is driven beyond its capacity
or its efficient performance range.
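
One common way to implement a throttle is a token bucket. The sketch below is illustrative only: it is not how SeeBeyond or any other product does it, and the class name, parameters, and injectable clock are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow at most `rate` messages per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self):
        """Return True if a message may pass now, False if it should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a hand-advanced clock so the behavior is easy to follow.
now = [0.0]
bucket = TokenBucket(rate=10, capacity=5, clock=lambda: now[0])
burst = [bucket.try_acquire() for _ in range(6)]  # five pass, the sixth is held back
now[0] += 0.5                                     # half a second later the bucket has refilled
later = bucket.try_acquire()
```

A sender that gets False would leave the message queued and try again shortly, so the downstream congestion point never sees more than the configured rate.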

42
Throttling at the potential congestion point
Throttle points. Potentially lots of messages,
especially if the WAN goes down
Congestion point
43
Space, space and more space

Plan on extra space for failure
A place for queued messages to sit if something
goes down
Space in the DB or space in the message channels
or both
Consider the time lags for getting additional
hardware bought, installed, and up and running
Plan on extra space for logs
You are going to want to keep log files around
for a while.
Some problems take time to manifest to a point of
awareness.
Devise an automated archive/clean up for logs.
No, not all EAI systems provide log clean up
utilities.

44
Anticipate bad messages

Build a Dead Letter Queue (see EAI Patterns
book).
Unless you have a simple system, you will have
messages the system can't handle
Improper format, wrong data, etc.
Build a means to capture and handle these
Lest they clog your process.
Where do you put them? DB, other queue?
Who checks them (do you have a one-off issue or a
systemic problem)?

45
Message Repair

If possible, build a message triage mechanism to
inspect, fix, resend DLQed messages
This can be built/improved over time
More manual at first
Automated as you learn more.
Considerations
How are you going to clean up the error
droppings (messages that are truly dead)?
Consider a retry queue with varied strategies
to retry messages that have failed.
Failure may be due to row locks or reference
updates that are just microseconds away from
completion.
Be cautious of when/why messages end up in the
dead letter queue.
You don't want it flooded because the DB is down.
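
A retry queue in front of the dead letter queue might look roughly like this. It is a sketch under stated assumptions: `TransientError`, `process_with_retry`, and the three-attempt limit are hypothetical names and values, and a real system would wait between attempts rather than retry immediately.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (row lock, late reference data)."""


def process_with_retry(messages, handler, max_attempts=3):
    """Run handler on each message; return the (message, reason) dead letters."""
    pending = deque((msg, 1) for msg in messages)
    dead_letters = []
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)
        except TransientError as err:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))  # back of the line: retry later
            else:
                dead_letters.append((msg, f"gave up after {attempt} attempts: {err}"))
        except ValueError as err:
            # Improper format or wrong data: retrying will not help.
            dead_letters.append((msg, f"bad message: {err}"))
    return dead_letters
```

Only messages that are malformed or that exhaust their retries reach the dead letter list. An outage such as the DB being down is better handled by pausing consumption altogether, which this sketch does not show.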

46
Dead Letter Queue
47
Tools to Manage It/Monitor It

The multiple points of failures and issues of
your systems make them complicated to manage and
support.
Build in automated monitoring facilities and
system health dashboards.
You need a one stop shop for
what's up
what's down
what's queuing properly
what's queuing too much, etc.
Consider the use of JMX or SNMP
It is probably already built into some of your
infrastructure components.
Consider environment management for all phases,
not just production.
Environment management for large dev teams across
dev/test/stage was very laborious.
Compounded when other projects need to leverage
the same systems.
Calculate system thresholds.
Provide automated alerts to the dashboard and
email/page/etc. systems when they start to get
close (not once they have been achieved).
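
The "alert when close, not once achieved" idea can be sketched as a two-level check. The queue names, limits, and the 80% warning ratio below are assumptions for illustration; real values would come from the threshold calculations above.

```python
def queue_status(depth, limit, warn_ratio=0.8):
    """Classify a queue depth against its calculated threshold."""
    if depth >= limit:
        return "CRITICAL"
    if depth >= limit * warn_ratio:
        return "WARNING"  # getting close: alert before the limit is hit
    return "OK"


def check_queues(depths, limits):
    """Return {queue name: status} for every queue that is not OK."""
    statuses = {name: queue_status(depth, limits[name]) for name, depth in depths.items()}
    return {name: status for name, status in statuses.items() if status != "OK"}


# Hypothetical snapshot of queue depths polled from the message servers.
depths = {"wh01.in": 100, "wh02.in": 850, "hq.dlq": 1000}
limits = {"wh01.in": 1000, "wh02.in": 1000, "hq.dlq": 1000}
alerts = check_queues(depths, limits)
```

In practice the depths would be polled through whatever the infrastructure exposes (JMX, SNMP, or a vendor admin API), and the non-OK entries pushed to the dashboard and paging systems.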

48
Problem 5 Change is Inevitable

The size and shape of our messages changed over
time.
We had no way to deal effectively with change.
Consequently, new system versions/updates caused
Shutdown
Replace (sometimes transforming data to a new
structure)
Restart
The real world was the only time we saw some
situations
We had no effective test harness
Typically leading to ugly back outs

49
Version Strategy

EAI system stability/life span depends on the
message structure.
Message structure is the hardest part to get
exactly right up front.
When message formats need to change, this creates
a real problem. The entire system must be down, queues
emptied, etc.
Consider version information in the message and
routing/processing instructions in the bus.
More complicated system
Can also affect performance
Allows for dual operation (old and new systems)
without failure and major down time.
It's going to happen, especially early; plan for
it.
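
Version information in the message plus routing in the bus could look like the sketch below. The `version:body` envelope, the pipe-delimited layouts, and the handler names are all hypothetical; the talk does not describe the actual message format.

```python
def handle_v1(body):
    # Hypothetical v1 layout: "carton_id|location"
    carton_id, location = body.split("|")
    return {"carton_id": carton_id, "location": location, "rfid": None}


def handle_v2(body):
    # Hypothetical v2 layout adds an RFID tag: "carton_id|location|rfid"
    carton_id, location, rfid = body.split("|")
    return {"carton_id": carton_id, "location": location, "rfid": rfid}


HANDLERS = {"1": handle_v1, "2": handle_v2}


def route(message):
    """Dispatch on the version header; unknown versions are rejected."""
    version, _, body = message.partition(":")
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported message version: {version!r}")
    return handler(body)


old_style = route("1:C000123|DOCK-04")
new_style = route("2:C000123|DOCK-04|E200")
```

Because both handlers stay registered, warehouses still sending version 1 keep working while others move to version 2, at the cost of an extra lookup per message.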

50
Version Routing
51
Testing is a !

Your test environment should be as close to
production as it can be in all respects
Consider collecting days' worth of messages or
message generating data and using it for replay
scenarios.
Problem - even if you have all the data, you
don't have the same timing issues you will see in
the real world.
Testing all the potential message scenarios is
impossible with any significant sized system.
Consider developing a message replicator
subsystem.
Send replicated messages to a test harness.
A live test faucet of messages ready whenever
you need them.
Critical to be able to test new/updated processes,
performance, etc.
Requires a fair amount of hardware and some
switch to turn it on/off.
This is not cheap!
Will impact performance
Consider putting the faucet on just one of the
servers in a farm
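
The faucet is essentially a switchable copy of the live stream. A minimal sketch, assuming a hypothetical `Faucet` class that sits in the delivery path; the names and the callback style are illustrative, not a vendor API.

```python
class Faucet:
    """Copies live messages to a test harness when switched on."""

    def __init__(self, deliver, test_sink):
        self.deliver = deliver      # production delivery callback
        self.test_sink = test_sink  # test harness callback
        self.enabled = False        # the on/off switch

    def on_message(self, msg):
        self.deliver(msg)  # the production path always runs first
        if self.enabled:
            try:
                self.test_sink(msg)
            except Exception:
                pass  # a broken test harness must never affect production


delivered, harness = [], []
faucet = Faucet(deliver=delivered.append, test_sink=harness.append)
faucet.on_message("evt-1")  # switch off: production only
faucet.enabled = True
faucet.on_message("evt-2")  # switch on: production and harness
```

The copy still costs time on the production server, which is the performance impact the slide warns about and the reason to run it on only one server in the farm.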

52
Test Faucet
53
Some Misc. Considerations

Payload type - XML vs. Text
Synch vs. Asynch

54
Payload

How should you format the message data? Payloads
can be in whatever format is reasonable for the
recipient
Text
Binary
Objects
XML
Payload format can be a burden if it is not
standard across all recipients, resulting in
transformation, which
Can be expensive
May not always yield the desired result, depending
on the payload content's adherence to a standard

55
Text vs. XML vs. Other

XML, while very descriptive and sender/receiver
agnostic
Will increase the size of the message
Will require parsing: more cost, performance
overhead, memory, and message size
Text is very simple and straight forward,
however
It is difficult to represent complex commands or
events in this way
Usually requires a roll-your-own parser to
extract data
Binary, can be most efficient for transport, but
Every receiver will need custom code to marshal
the content back (RPC déjà vu)
Time consuming and brittle work
Objects can be handy, making it easy to
aggregate information into a single
representative entity, but
Every receiver needs to have the EXACT same
definition of that object.
Non-object endpoints have difficulty
participating.
Vendor solutions may and often do influence this
decision.
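
To make the size and parsing trade-off concrete, here is the same event encoded as delimited text and as XML. The field names and values are invented for the example; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical carton event; all values are strings for simplicity.
event = {
    "carton": "C000123",
    "action": "MOVE",
    "location": "DOCK-04",
    "time": "2005-11-01T08:15:30",
}

# Delimited text: compact, but field order is an unwritten contract.
text_payload = "|".join(event.values()).encode("utf-8")

# XML: self-describing, at the cost of tag overhead.
root = ET.Element("cartonEvent")
for name, value in event.items():
    ET.SubElement(root, name).text = value
xml_payload = ET.tostring(root, encoding="utf-8")

print(len(text_payload), "bytes as text")
print(len(xml_payload), "bytes as XML")

# Parsing back: text needs a roll-your-own split; XML uses a standard parser.
from_text = dict(zip(event.keys(), text_payload.decode("utf-8").split("|")))
from_xml = {child.tag: child.text for child in ET.fromstring(xml_payload)}
```

Both round-trip to the same data, but the text receiver has to know the field order in advance, and the XML version is several times larger on the wire.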

56
Synchronous vs. Asynchronous

EAI ≠ Web service.
If you are going to invoke a service via the
message system be sure to minimize the number of
calls needed to initiate that behavior
When needed use synchronous messaging for
transactional needs.
Asynchronous messaging will work well for
non-transactional or long lived processes
This does not mean that you can't do transactions;
it will just require a little more effort.

57
Wrap Up

Despite the issues, the system is up and running
today.
Extremely useful to the business providing
unparalleled distribution information.
Like most things in software system development,
the lessons learned are more about
Organization
Architecture
and Design rather than implementation.
Thank you for your time and attention.

58
More Info or Questions

Jim White jwhite_at_intertech.com
Intertech Training: specializing in real world
developer training.
The shortest distance between learning and
doing.
Intertech Software: a leading Twin Cities-based
e-business and e-commerce consulting services
company.
www.intertech.com
