Design Guidelines for Large Message-based EAI Systems (A Case Study)

1
Design Guidelines for Large Message-based EAI
Systems (A Case Study)
  • Jim White
  • Director of Training
  • Intertech, Inc.
  • St. Paul, MN
  • jwhite_at_intertech.com

3
This Talk
  • Presents an EAI case study
  • A very large EAI system for a retail chain.
  • Identifies issues and challenges encountered in the
    project
  • Identifies lessons learned and recommendations
    for your EAI projects.
  • Lets you know others do have it as bad as you.
  • The story does have a happy ending
  • Maybe providing hope to the hopeless.

4
How many of you
  • Are actively working on an EAI project?
  • Have been on an EAI project in the past?
  • Plan on being on an EAI project in the near
    future?
  • Have no idea what EAI is but it sounded like a
    good topic to help put me to sleep after lunch??

5
How many of you
  • Are project architects?
  • Are technical project leads?
  • Are developers/designers of systems?
  • Are managers?
  • Are testers/QA/support folks?
  • Can't remember after 4 days of the conference?

6
First Off: What's EAI?
  • From Wikipedia: the integration of data
    between applications in a company.
  • wMUsers: technology that connects
    enterprise-wide systems; evolved to refer to
    technologies used to connect systems anywhere
  • Hohpe/Woolf: enterprise integration using
    messaging

7
EAI Messaging
  • Enables data or commands to be sent across the
    network
  • using a send and forget approach

8
Messaging: How?
  • Message-oriented middleware (MOM) like that
    offered by
  • IBM WebSphere MQ
  • Microsoft BizTalk
  • TIBCO
  • WebMethods
  • SeeBeyond (now Sun owned)
  • Vitria
  • and others
  • Using
  • Java Message Service (JMS)
  • Microsoft's Message Queuing (MSMQ) and/or
    Messaging libraries in Microsoft .NET
  • Web services standards that support asynchronous
    Web services
  • WS-ReliableMessaging
  • Sun's Java API for XML Messaging (JAXM)
  • Microsoft's Web Services Enhancements (WSE).

9
Why is large EAI different?
  • Messaging/EAI development ≠ Web or other
    distributed app development
  • Especially when very large
  • Many new or significantly altered considerations
  • Requirement differences
  • Time and space needs
  • Process control/orchestration
  • Failure handling
  • Monitoring
  • Proprietary nature of vendor solutions
  • Support turnover
  • Staffing needs

10
The Case Study
  • A major retail chain has dozens of distribution
    centers
  • Each distribution center or warehouse services
    hundreds of stores (>1,200 total stores).
  • Each distribution center is moving thousands of
    cartons (i.e. boxes) around the warehouse each
    day
  • Receiving them from trucks through dock doors.
  • Moving them with fork lifts to storage areas in
    the warehouse
  • Conveying them to break down areas for
    distribution to stores.
  • Conveying them down belts to storage areas or
    outbound trucks.
  • Moving them onto trucks that depart the warehouse.

11
The Case Study
  • A box is tracked via labels and bar code readers.
  • Some reads are manual and some are automated.
  • Generating literally hundreds of events per
    second per warehouse.
  • RFID was about to create more events.
  • More reads from more points in the warehouse.
  • Potentially adding store reads to the event
    list.

12
The System
  • Part I
  • The retail chain wanted all the data on events
    regarding the movement of cartons sent to HQ
  • Providing them with unparalleled real time
    information on inventory levels and product
    status.
  • Providing more accurate information for
    merchandise analysts and productivity monitoring
    for warehouse managers.
  • Part II (Not germane to the discussion today)
  • Providing a Java Web application to nearly 10,000
    users to access the data company wide.
  • Reports galore.
  • Some limited ad hoc query reporting.

13
Let's do the math
  • >25 warehouses
  • Each generating 15-20 carton events per second
  • Averaging 400 messages a second incoming at HQ
  • Peak around 1300 messages a second incoming at HQ
  • Data around an event: 200 bytes/msg
  • 24x7x52 (31,449,600 seconds a year, for those not
    counting)
  • 4-7GB a day
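
The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the function name is illustrative, and decimal gigabytes (1e9 bytes) are an assumption, since the slide does not say which unit it means.

```python
# Back-of-the-envelope volume math using the figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60               # 86,400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 7 * 52  # the slide's "24x7x52"

BYTES_PER_MESSAGE = 200    # from the slide
AVG_MSGS_PER_SEC = 400     # from the slide (average incoming at HQ)
PEAK_MSGS_PER_SEC = 1300   # from the slide (peak incoming at HQ)


def bytes_per_day(msgs_per_sec, bytes_per_msg=BYTES_PER_MESSAGE):
    """Raw payload bytes arriving per day at a sustained message rate."""
    return msgs_per_sec * bytes_per_msg * SECONDS_PER_DAY


# Assumption: "GB" means decimal gigabytes (1e9 bytes).
avg_daily_gb = bytes_per_day(AVG_MSGS_PER_SEC) / 1e9
peak_bytes_per_sec = PEAK_MSGS_PER_SEC * BYTES_PER_MESSAGE
```

At the 400 messages a second average this comes to roughly 6.9 GB of raw payload a day, which sits at the upper end of the 4-7GB range quoted above, before any envelope or XML overhead.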

14
...and the math isn't getting better
  • During Christmas time things were worse, much
    worse.
  • The organization wants to double its current size
    by 2010!
  • Oh yeah, did I mention RFID was coming?
  • Tripling or quadrupling the number of events

15
My Challenge
  • Design and implement a system to get the data
    from the warehouses to HQ
  • In near real time to support the reporting needs
  • Use whatever makes sense (to some degree; more
    later)
  • With a good size team (20-25 people in various
    roles)

16
My Background
  • 15-year grizzled veteran of software
    development.
  • 6 years of Java experience.
  • Author of a Java book.
  • Experienced architect, manager, mentor, trainer.
  • Eager to take on any software system challenge.
  • No experience in EAI!
  • An organization with limited EAI experience.

17
The Perfect Storm
The size of the EAI project vs. the abilities of the
development team

18
The Solution
  • Significant company resources and investment in
    SeeBeyond EAI product.
  • Put SeeBeyond at all the endpoints (warehouses
    and HQ).
  • All data would move through SeeBeyond.
  • SeeBeyond is Java based (also a company
    technology direction).
  • Write routing/minor processing code in Java in
    SeeBeyond.
  • Significant company resources and investment in
    Oracle RDBMS.
  • Oracle already at the warehouses
  • Obtain a honking big Oracle DB at HQ.
  • Use Oracle stored procedures for heavy lifting
    (data processing, report data preparation).

19
Solution Diagram
Bundling 20-40 event messages at a time
Ex: "move this carton there", but have we gotten
the "receive carton" msg yet? We have received a
carton; do we have the reference data for the
product yet?

20
Problem 1: We weren't ready
  • As an architect, I was not aware how different an
    EAI messaging system is.
  • Asynchronous-everywhere nature
  • Had no patterns to follow (No, I had not read the
    Hohpe/Woolf EAI book)
  • Did not have an awareness of the vendor landscape
  • Was easily talked into solutions by others.
  • My organization didn't see how big it was
  • Had only implemented smaller EAI solutions
  • Finding good help was hard and a critical step
  • Internally: lots of support but no experience
  • Contractors: lots of desire, but little
    implementation experience at this scale/level of
    effort

21
Getting Yourself Ready
  • Get yourself ready
  • Understand your options: all the three-letter
    E's (EAI, ETL, EII, EDR, etc.)
  • Read EAI patterns
  • Know the products (WBI, Vitria, Tibco,
    WebMethods, SeeBeyond, etc.)
  • Find people with real EAI experience
  • Experienced with systems matching the size of
    your app
  • Find people with product expertise
  • Find people with design/pattern expertise

22
EAI Patterns
  • Enterprise Integration Patterns (Hohpe/Woolf)
  • Next Generation Application Integration
    (Linthicum)
  • IT Architectures and Middleware (Britton)

23
EAI Component Basics
  • A typical messaging system is composed of the
    following parts.
  • Endpoints
  • Messages
  • Channels
  • Routers
  • Translators
  • Monitors

24
EAI Component Analogy

25
EAI Patterns
  • As the GoF pointed out for software in general,
    there are common behaviors in software systems.
  • Patterns are powerful tools for communicating
    behavior.
  • They represent naturally occurring processes.
  • Are generally repetitive in nature, and lend
    themselves to reuse.
  • Each of the message components also has several
    patterns that represent common behaviors in a
    messaging system and encourage reuse.

26
Getting Resources Ready
  • Let the network engineers know of your plans
  • You are going to be using a significant amount of
    pipe.
  • Have you considered failover/load balancing?
    (comm lines around warehouses get cut on
    occasion)
  • Let the database engineers know of your plans
  • Terabytes of data to be stored and processed:
    where will it go?
  • Consider backup/recovery systems
  • Database logs/archiving
  • Performance tuning

27
Getting Support Ready
  • Support staffs will be lost at turnover
  • How many of your support shops really know
  • How to manage application servers?
  • How to manage web applications effectively?
  • Can you expect them to be able to operate,
    maintain and support component based messaging
    systems?
  • Do they know what a message server or bus is?
  • Across a very distributed environment?
  • Get them trained early (in messaging
    infrastructure).
  • Have them help you design the monitoring tools
    and alert systems.
  • Work together to develop proactive systems checks
    and troubleshooting procedures.

28
Getting Others Ready
  • If your development team isn't ready, what about
  • Testing/QA teams?
  • Analysts?
  • Managers?
  • For example, finding experienced testers for
    asynchronous messaging systems is difficult.
  • They usually need intricate knowledge of the
    messaging subsystem monitors and admin
    capabilities.

29
Problem 2: Proprietary EAI
  • EAI Products/Solutions are many.
  • EAI Standards are few.
  • EAI/ETL/EII marketplace is tumultuous
  • Sun has purchased SeeBeyond
  • IBM bought Ascential
  • Everyone calling their product an ESB (example on
    next page)
  • Products/Solutions have scale limits
  • Some they know about
  • Others they do not
  • Java alone does not make you platform
    independent.

30
Can you identify this product??
  • provides an award-winning messaging backbone
    for deploying your enterprise service bus (ESB)
    today as the connectivity layer of a
    service-orientated architecture (SOA).

31
Examine Your Solution Options
  • See if what you already have would work.
  • There is a reason MQ has been around a long time.
  • Where possible consider tried, true and already
    deployed platforms
  • But again do the math and see if they can support
    the extra load.
  • In-house support is probably better equipped
    (more in a bit)
  • Not everything has to travel by message.
  • Consider multiple/alternate technologies for
    parts of your solution.
  • ETL is great for certain parts of a large
    solution
  • There is a reason why products like Oracle are
    expensive (technologies like Oracle Replication;
    more in a bit).
  • Does, however, create more issues of timing.

32
What Travels by Message?
  • Consider multiple/alternate technologies for
    parts of your solution.
  • Replication of reference data
  • Bulk/batch transfers
  • Non-real time needs
  • ETL is great for certain parts of a large
    solution
  • Examine features in your DB/App Servers
  • There is a reason why products like Oracle are
    expensive (technologies like Oracle Replication;
    more in a bit).
  • How about those Message Beans in the app server?
  • This can, however, create more issues of timing.

33
Reference Data
  • In many applications, you need reference data on
    both ends of the messaging systems.
  • You can build a replicating message engine to
    treat this like other message data (not
    recommended).
  • Referential integrity becomes a real problem.
  • Consider issues of message timing (PR becomes the
    51st state but messages with PR references start
    to arrive before the new state data does)
  • Use simple replication technologies where
    possible
  • ETL tools - if reference data changes only happen
    at certain times.
  • Technologies like Oracle Replication for real
    time (it can operate over a WAN).

34
Interoperability
  • We used Java, but...
  • Even when you use Java, how is it being applied?
  • Java running inside of proprietary components
    (like SeeBeyond eWays) does not make you
    portable.
  • Write component code that can be used by or
    incorporated by proprietary systems.
  • Under the covers, is the vendor using:
  • JMS
  • JMX/SNMP
  • Web services/WS-ReliableMessaging/JAX-RPC
  • Etc

35
Process outside the bus
  • Process outside the message bus/subsystem if you
    can
  • Let the bus focus on delivering the goods.
  • Too much processing time in the bus will create
  • Scalability problems
  • Monitoring problems
  • Possibly interoperability problems (especially
    when using proprietary technology/components)
  • Process with components that are
  • Flexible
  • easy to get at (and change)
  • interoperable (if possible)
  • and contain reusable business logic (if possible)

36
Problem 3: Math we didn't do
  • We didn't do enough math up front.
  • We didn't plan for failure/growth.
  • The messages moved slower than anticipated.
  • The message processing took more time than
    expected.
  • The amount of data was larger than expected.

37
Do the math and ask the tough questions
  • How much time is it going to take to get a message
    from A to B?
  • Test that estimate early.
  • Work with the business analysts to figure out how
    many messages need to be moved.
  • Make volume estimates part of the non-functional
    requirements gathering process.
  • Check that against the existing databases if
    possible.
  • How much data needs to be packaged, shipped,
    processed, stored?
  • Design the messages and calculate the size of the
    overall message (XML and all).
  • Calculate the rate and add up the total volume.

38
and pad your answer!
  • Do you have room to spare??
  • Can the messaging system handle that (on both
    ends)?
  • Can the consuming database handle that?
  • Can the hardware and network handle that?
  • Anticipate failure
  • What happens if something/anything goes down for
    an hour?
  • What happens if you go down for a day?
  • What happens if you have unexpected growth?
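
The outage questions above can be turned into numbers before the system is built. The sketch below is a minimal estimate, assuming a steady incoming rate; the 400 messages a second and 200 bytes per message come from the earlier math slide, while the function names and the 600 messages a second consumer capacity are hypothetical.

```python
def backlog_after_outage(msgs_per_sec, bytes_per_msg, outage_seconds):
    """Messages and bytes that pile up while the consumer is down."""
    messages = msgs_per_sec * outage_seconds
    return messages, messages * bytes_per_msg


def drain_seconds(backlog_messages, incoming_per_sec, capacity_per_sec):
    """Seconds needed to work off a backlog while new messages keep arriving.

    Returns None when there is no spare capacity, i.e. the backlog never drains.
    """
    spare = capacity_per_sec - incoming_per_sec
    if spare <= 0:
        return None
    return backlog_messages / spare


# One-hour outage at the average rate from the math slide.
hour_backlog_msgs, hour_backlog_bytes = backlog_after_outage(400, 200, 3600)

# Hypothetical consumer capacity of 600 msgs/sec: 200 msgs/sec of headroom.
catch_up = drain_seconds(hour_backlog_msgs, 400, 600)

# A consumer sized exactly for the average rate never catches up.
no_headroom = drain_seconds(hour_backlog_msgs, 400, 400)
```

Under these assumptions a one-hour outage leaves 1.44 million messages queued, and even with 50% headroom the consumer needs two hours to recover, which is one way to quantify how much padding is enough.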

39
Problem 4: Exception handling wasn't
  • More considerations for failover and redundancy
  • Versus Web application
  • We did not plan on downtime
  • Unplanned system issues
  • Planned outages
  • We didn't build in enough redundancy
  • Load balancing and failover were both
    afterthoughts
  • All messages always correct all the time (NOT)
  • At first, we had no proper dead letter queuing
  • No proper exception processing
  • No means to properly see and react to issues
  • Many more points of failure and potential issues
  • More widely distributed

40
Design load balancing and failover up front
  • Load balancing and failover must be accommodated
  • Like security, you need a multi-layered approach
  • Hardware (like Big IP)
  • Redundant message bus/message servers
  • Processing components
  • Database
  • EAI system throttling
  • How are you going to kick over to the failover
    systems (and return to regular systems)?
  • Without losing messages
  • Without causing timing problems in message
    delivery/receipt

41
Throttling
  • Throttling limits ("throttles") the number of
    requests a component will respond to within a
    specified period of time.
  • Limits congestion.
  • Built into most good EAI solutions today.
  • Often overlooked and not used.
  • Used in messaging systems to ensure that no one
    part of the system is driven beyond its capacity
    or the point where it performs efficiently.
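
One common way to implement the kind of throttling described above is a token bucket. The sketch below is an illustration only, not how SeeBeyond or any other EAI product implements it; the class name, parameters, and the injected clock are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allows at most `rate_per_sec` messages a second, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock           # injectable so the demo is deterministic
        self.last = clock()

    def try_acquire(self):
        """Return True if one more message may pass right now."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Demo with a fake clock: a burst of 3 is allowed, the 4th message is held back.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2, burst=3, clock=lambda: fake_now[0])
first_burst = [bucket.try_acquire() for _ in range(4)]

# One second later, two more tokens have accumulated.
fake_now[0] = 1.0
after_one_second = [bucket.try_acquire() for _ in range(3)]
```

A message that is refused is not lost; the sender leaves it on its queue and tries again later, which is why throttling and spare queue space go together.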

42
Throttling at the potential congestion point
Throttle points. Potentially lots of messages,
especially if the WAN goes down
Congestion point

43
Space, space and more space
  • Plan on extra space for failure
  • A place for queued messages to sit if something
    goes down
  • Space in the DB or space in the message channels
    or both
  • Consider the time lags for getting additional
    hardware bought, installed, and up and running
  • Plan on extra space for logs
  • You are going to want to keep log files around
    for a while.
  • Some problems take time to manifest to a point of
    awareness.
  • Devise an automated archive/clean up for logs.
  • No, not all EAI systems provide log cleanup
    utilities.

44
Anticipate bad messages
  • Build a Dead Letter Queue (see EAI Patterns
    book).
  • Unless you have a simple system, you will have
    messages the system can't handle
  • Improper format, wrong data, etc.
  • Build a means to capture and handle these
  • Lest they clog your process.
  • Where do you put them? DB, other queue?
  • Who checks them (do you have a one-off issue or a
    systemic problem?)

45
Message Repair
  • If possible, build a message triage mechanism to
    inspect, fix, resend DLQ'ed messages
  • This can be built/improved over time
  • More manual at first
  • Automated as you learn more.
  • Considerations
  • How are you going to clean up the error
    droppings (messages that are truly dead)?
  • Consider a retry queue with varied strategies
    to retry messages that have failed.
  • Failure may be due to row locks or reference
    updates that are just microseconds away from
    completion.
  • Be cautious of when/why messages end up in the
    dead letter queue.
  • You don't want it flooded because the DB is down.
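
The considerations above separate three kinds of failure: a bad message, a transient problem worth retrying, and a downstream outage. The sketch below shows one way to keep them apart; it is a minimal in-memory illustration, and every name in it (the exception classes, `drain`, `max_attempts`) is hypothetical rather than taken from any product.

```python
from collections import deque


class BadMessage(Exception):
    """Permanent failure: the message itself is malformed or wrong."""


class TransientFailure(Exception):
    """Temporary failure, e.g. a row lock or reference data not yet loaded."""


class ConsumerUnavailable(Exception):
    """The downstream system (e.g. the database) is down."""


def drain(inbound, handler, max_attempts=3):
    """Process messages; return (dead_letters, still_pending)."""
    work = deque((msg, 1) for msg in inbound)
    dead_letters = []
    while work:
        msg, attempt = work.popleft()
        try:
            handler(msg)
        except BadMessage as exc:
            dead_letters.append((msg, str(exc)))      # straight to the DLQ
        except TransientFailure as exc:
            if attempt < max_attempts:
                work.append((msg, attempt + 1))       # retry later
            else:
                dead_letters.append((msg, f"gave up after {attempt} attempts: {exc}"))
        except ConsumerUnavailable:
            work.appendleft((msg, attempt))           # keep it queued, stop consuming
            break
    return dead_letters, [msg for msg, _ in work]


# Demo handler: "garbled" is bad, "locked" succeeds on its second attempt.
attempts, stored = {}, []


def demo_handler(msg):
    attempts[msg] = attempts.get(msg, 0) + 1
    if msg == "garbled":
        raise BadMessage("improper format")
    if msg == "locked" and attempts[msg] < 2:
        raise TransientFailure("row lock")
    stored.append(msg)


def db_down_handler(msg):
    raise ConsumerUnavailable()


dead, pending = drain(["ok", "garbled", "locked"], demo_handler)
dead_when_down, pending_when_down = drain(["a", "b"], db_down_handler)
```

When the downstream system is unavailable the messages stay pending instead of being dead-lettered, so the dead letter queue only ever holds messages that someone actually needs to look at.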

46
Dead Letter Queue

47
Tools to Manage It/Monitor It
  • The multiple points of failures and issues of
    your systems make them complicated to manage and
    support.
  • Build in automated monitoring facilities and
    system health dashboards.
  • You need a one stop shop for
  • what's up
  • what's down
  • what's queuing properly
  • what's queuing too much, etc.
  • Consider the use of JMX or SNMP
  • It is probably already built into some of your
    infrastructure components.
  • Consider environment management for all phases,
    not just production.
  • Environment management for large dev teams across
    dev/test/stage was very laborious.
  • Compounded when other projects need to leverage
    the same systems.
  • Calculate system thresholds.
  • Provide automated alerts to the dashboard and
    email/page/etc. systems when they start to get
    close (not once they have been achieved).
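
The last point, alerting before a threshold is reached rather than after, can be sketched as a simple check over queue depths. The queue names, capacities, and the 80% warning fraction below are hypothetical; in practice the depths would come from whatever the infrastructure exposes over JMX or SNMP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueThreshold:
    capacity: int                 # calculated hard limit, in messages
    warn_fraction: float = 0.8    # hypothetical: warn at 80% of capacity


def queue_status(depth, threshold):
    """Classify a queue depth against its calculated threshold."""
    if depth >= threshold.capacity:
        return "CRITICAL"
    if depth >= threshold.capacity * threshold.warn_fraction:
        return "WARNING"
    return "OK"


def alerts(depths, thresholds):
    """Return only the queues that need attention, keyed by queue name."""
    result = {}
    for name, depth in depths.items():
        status = queue_status(depth, thresholds[name])
        if status != "OK":
            result[name] = status
    return result


# Hypothetical thresholds and a snapshot of current depths.
thresholds = {
    "hq.inbound": QueueThreshold(capacity=10000),
    "hq.dead_letter": QueueThreshold(capacity=100, warn_fraction=0.5),
}
snapshot = {"hq.inbound": 8500, "hq.dead_letter": 10}
current_alerts = alerts(snapshot, thresholds)
```

The same result can feed both the dashboard and the email/page system, so support sees the warning while there is still room to react.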

48
Problem 5: Change is Inevitable
  • The size and shape of our messages changed over
    time.
  • We had no way to deal effectively with change.
  • Consequently, new system versions/updates caused
  • Shutdown
  • Replace (sometimes transforming data to a new
    structure)
  • Restart
  • The real world was the only time we saw some
    situations
  • We had no effective test harness
  • Typically leading to ugly back outs

49
Version Strategy
  • EAI system stability/life span depends on the
    message structure.
  • Message structure is the hardest part to get
    exactly right up front.
  • When message formats need to change, this creates
    a real problem. The entire system must be down,
    queues emptied, etc.
  • Consider version information in the message and
    routing/processing instructions in the bus.
  • More complicated system
  • Can also affect performance
  • Allows for dual operation (old and new systems)
    without failure and major down time.
  • It's going to happen, especially early; plan for
    it.
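
Carrying a version in the message and routing on it can be sketched as follows. The message layout (a `version` field plus a pipe-delimited `body`), the field names, and the RFID tag value are all invented for illustration; the talk does not describe the actual message format.

```python
def handle_v1(body):
    """Hypothetical v1 layout: carton_id|location"""
    carton_id, location = body.split("|")
    return {"carton_id": carton_id, "location": location, "rfid_tag": None}


def handle_v2(body):
    """Hypothetical v2 layout adds an RFID tag: carton_id|location|rfid_tag"""
    carton_id, location, rfid_tag = body.split("|")
    return {"carton_id": carton_id, "location": location, "rfid_tag": rfid_tag}


# Routing table: old and new formats are processed side by side.
HANDLERS = {1: handle_v1, 2: handle_v2}


def route(message):
    """Dispatch a message to the handler registered for its version."""
    version = message.get("version")
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported message version: {version!r}")
    return handler(message["body"])


old_event = route({"version": 1, "body": "C000123|DOCK-07"})
new_event = route({"version": 2, "body": "C000123|DOCK-07|E200-3411"})
```

With a table like this, warehouses can be upgraded one at a time: messages already in flight in the old format keep flowing while the new format is rolled out, at the cost of the extra lookup on every message.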

50
Version Routing

51
Testing is a !
  • Your test environment should be as close to
    production as it can be in all respects
  • Consider collecting days' worth of messages or
    message generating data and using it for replay
    scenarios.
  • Problem - even if you have all the data, you
    don't have the same timing issues you will see in
    the real world.
  • Testing all the potential message scenarios is
    impossible with any significant sized system.
  • Consider developing a message replicator
    subsystem.
  • Send replicated messages to a test harness.
  • A live "test faucet" of messages ready whenever
    you need them.
  • Critical to be able to test new/updated processes,
    performance, etc.
  • Requires a fair amount of hardware and some
    switch to turn it on/off.
  • This is not cheap!
  • Will impact performance
  • Consider putting the faucet on just one of the
    servers in a farm
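
The replicator idea above amounts to a tee with an on/off switch. The sketch below is a minimal in-process illustration with hypothetical names; a real faucet would sit in the messaging layer and forward copies over the network to the test environment.

```python
import copy


class MessageFaucet:
    """Delivers every message to production; copies it to a test harness when on."""

    def __init__(self, production_handler, test_handler):
        self.production_handler = production_handler
        self.test_handler = test_handler
        self.enabled = False   # the on/off switch

    def on_message(self, message):
        # Production delivery always happens first and is never skipped.
        self.production_handler(message)
        if not self.enabled:
            return
        try:
            # Send a copy so the test side cannot alter the production message.
            self.test_handler(copy.deepcopy(message))
        except Exception:
            # A failing test harness must not affect the production flow.
            pass


production, harness = [], []
faucet = MessageFaucet(production.append, harness.append)

faucet.on_message({"carton_id": "C1"})   # faucet off: production only
faucet.enabled = True
faucet.on_message({"carton_id": "C2"})   # faucet on: production and test
```

The copy on every message is where the performance cost comes from, which is one reason to run the faucet on a single server in the farm.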

52
Test Faucet

53
Some Misc. Considerations
  • Payload type - XML vs. Text
  • Synch vs. Asynch

54
Payload
  • How should you format the message data? Payloads
    can be in whatever format is reasonable for the
    recipient
  • Text
  • Binary
  • Objects
  • XML
  • Payload format can be a burden if it is not
    standard across all recipients, resulting in
    transformation, which
  • Can be expensive
  • May not always yield the desired result, depending
    on the payload contents' adherence to a standard

55
Text vs. XML vs. Other
  • XML, while very descriptive and sender/receiver
    agnostic
  • Will increase the size of the message
  • Will require parsing: an increase in cost,
    performance, memory, message size
  • Text is very simple and straightforward,
    however
  • It is difficult to represent complex commands or
    events in this way
  • Usually requires a roll-your-own parser to
    extract data
  • Binary can be most efficient for transport, but
  • Every receiver will need custom code to marshal
    the content back (RPC déjà vu)
  • Time consuming and brittle work
  • Objects can be handy, making it easy to
    aggregate information into a single
    representative entity, but
  • Every receiver needs to have the EXACT same
    definition of that object.
  • Non-object endpoints have difficulty
    participating.
  • Vendor solutions may and often do influence this
    decision.
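
The size trade-off between text and XML is easy to measure for a candidate message design. The sketch below encodes one made-up carton event both ways; the field names, values, and the pipe-delimited layout are assumptions for illustration, not the formats used in the case study.

```python
import xml.etree.ElementTree as ET

# Hypothetical carton event; all field names and values are invented.
event = {
    "carton_id": "C000123",
    "event": "RECEIVE",
    "location": "DOCK-07",
    "timestamp": "2006-03-01T14:05:09",
}


def as_text(evt):
    """Pipe-delimited text: compact, but field order is an unwritten contract."""
    return "|".join(evt.values()).encode("utf-8")


def as_xml(evt):
    """Self-describing XML: every field carries its own name, twice."""
    root = ET.Element("cartonEvent")
    for name, value in evt.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root)   # bytes


text_bytes = as_text(event)
xml_bytes = as_xml(event)
overhead_ratio = len(xml_bytes) / len(text_bytes)

# The XML form can be read back by field name without a custom parser.
parsed_carton_id = ET.fromstring(xml_bytes).find("carton_id").text
```

Multiplying the measured size of each format by the expected message rate shows what the extra descriptiveness costs in bandwidth and storage.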

56
Synchronous vs. Asynchronous
  • EAI ≠ Web service.
  • If you are going to invoke a service via the
    message system be sure to minimize the number of
    calls needed to initiate that behavior
  • When needed use synchronous messaging for
    transactional needs.
  • Asynchronous messaging will work well for
    non-transactional or long lived processes
  • This does not mean that you can't do transactions;
    it will just require a little more effort.

57
Wrap Up
  • Despite the issues, the system is up and running
    today.
  • Extremely useful to the business providing
    unparalleled distribution information.
  • Like most things in software system development,
    the lessons learned are more about
  • Organization
  • Architecture
  • and Design rather than implementation.
  • Thank you for your time and attention.

58
More Info or Questions
  • Jim White jwhite_at_intertech.com
  • Intertech Training: specializing in real world
    developer training.
  • The shortest distance between learning and
    doing.
  • Intertech Software: a leading Twin Cities-based
    e-business and e-commerce consulting services
    company.
  • www.intertech.com