1
Building Fault-Tolerant Enterprise Applications
  • Erin Mulder
  • Chariot Solutions
  • chariotsolutions.com

  • Brian McCallister
  • Fort Hill Company
  • forthillcompany.com
2
Agenda
  • Goals of Fault Tolerance
  • User Recoverable Errors
  • Expected Application Errors
  • System Failure
  • Useful Strategies
  • Discussion

3
Goals of Fault Tolerance
What are we really worried about?
  • Availability
  • Integrity
  • Confidentiality
  • Usability
  • Cost

4
Goals of Fault Tolerance
What can go wrong?
  • User Error
  • Concurrent Changes
  • Bugs
  • Resource Failure/Downtime
  • System Overload
  • Misconfiguration
  • Sabotage

5
Goals of Fault Tolerance
Themes we'll keep visiting
  • Prevention
  • Code Guidelines & Reviews
  • Automated Validation & Testing
  • Performance / Stress Testing
  • Detection
  • Logging and Auditing
  • Validation Patterns
  • Monitoring
  • Recovery
  • Exception handling patterns
  • Error feedback loop
  • Redundancy

6
Agenda
  • Goals of Fault Tolerance
  • User Recoverable Errors
  • Expected Application Errors
  • System Failure
  • Useful Strategies
  • Discussion

7
User Recoverable Errors
Simple validation error
  • What do you do when the user:
  • Leaves a required field blank
  • Enters a value too big for the database field
  • Types letters in a numeric field
  • Selects inconsistent options
  • Tries to do things in the wrong order

8
User Recoverable Errors
Simple validation error
  • Fault tolerance is more than detection
  • Prevent the user from making errors
  • Set maxlengths on input fields
  • Use character masks
  • Specify units
  • Show example input
  • Don't allow the selection of inconsistent options
  • Don't present navigation options that aren't
    meant to be followed

9
User Recoverable Errors
Simple validation error
  • Help the user recover quickly
  • Highlight all errors clearly
  • Show help text and examples for invalid fields
  • If some other action is required first, launch it
    instead of interrupting the flow with frustrating
    errors
  • Perception is everything!
  • Log the error for later analysis
  • Save enough information to recreate
  • Start automatically handling common mistakes

10
User Recoverable Errors
Optimistic concurrency clash
  • Everything looks good until the save
  • Then:
  • Item has just gone out of stock
  • Another user has just updated the same document
  • Time has passed and action is no longer allowed

11
User Recoverable Errors
Optimistic concurrency clash
  • Increase save points
  • Alert user to potential risk
  • Low stock
  • Another user just accessed this record
  • Another user has soft lock on record
  • Offer useful options for resolving collision
  • Merge changes
  • Backorder
  • Automatically retry later
  • Email me when it is available
  • Give tips for avoiding future collisions
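
Every option on slides 10 and 11 depends on the server noticing, at save time, that the record changed after the user loaded it. One common way to do this is to keep a version number that must still match when the save happens. The sketch below is a minimal in-memory version of that check. The class names `VersionedStore`, `VersionedRecord` and `ConcurrentUpdateException` are hypothetical. In a real application the same check would normally live in SQL (`UPDATE ... WHERE id = ? AND version = ?`) or in JPA's `@Version` optimistic locking.

```java
import java.util.HashMap;
import java.util.Map;

/** Thrown when a save is based on a stale copy of the record (hypothetical name). */
class ConcurrentUpdateException extends RuntimeException {
    ConcurrentUpdateException(String message) {
        super(message);
    }
}

/** Immutable snapshot of a record plus the version number it was read at. */
final class VersionedRecord {
    final String id;
    final String body;
    final int version;

    VersionedRecord(String id, String body, int version) {
        this.id = id;
        this.body = body;
        this.version = version;
    }
}

/** In-memory stand-in for a table with a version column (hypothetical). */
class VersionedStore {
    private final Map<String, VersionedRecord> rows = new HashMap<>();

    synchronized void insert(String id, String body) {
        rows.put(id, new VersionedRecord(id, body, 1));
    }

    synchronized VersionedRecord load(String id) {
        return rows.get(id);
    }

    /**
     * Saves only if nobody else has saved since 'expectedVersion' was read, like
     * UPDATE ... SET body = ?, version = version + 1 WHERE id = ? AND version = ?
     */
    synchronized VersionedRecord save(String id, String newBody, int expectedVersion) {
        VersionedRecord current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            // Surface the clash so the UI can offer merge, retry or reload options.
            throw new ConcurrentUpdateException(
                    "Record " + id + " was changed by another user since you loaded it");
        }
        VersionedRecord updated = new VersionedRecord(id, newBody, expectedVersion + 1);
        rows.put(id, updated);
        return updated;
    }
}
```

Suppose two users load version 1 of the same record. The first save succeeds and creates version 2. The second save throws, and the UI can then catch the exception and offer the "merge changes" or "retry later" paths.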

12
User Recoverable Errors
Bookmarks, back buttons and browsers
  • User escapes normal page flow
  • Bookmarks login page or internal page
  • Uses back button
  • Opens a new window within same session
  • Session times out
  • Missing context from previous requests
  • Next click is like bookmark to internal page
  • Other browser oddities
  • Double-clicking submit buttons
  • Pressing stop button in the middle of a request

13
User Recoverable Errors
Bookmarks, back buttons and sessions
  • Prevention is difficult: the user is in control
  • Javascript can sometimes help
  • Javascript can sometimes hurt
  • Plan for and test each of these scenarios
  • Plan for handling out-of-sequence requests

14
User Recoverable Errors
Bookmarks, back buttons and sessions
  • To seamlessly handle session timeouts and
    out-of-sequence requests, consider
  • Persistent sessions (saved to database)
  • Passing state in every request (form fields or
    URL rewriting)
  • Storing state in custom cookies
  • Adding custom logic to recover from timed-out
    sequences
  • To simply detect and alert, consider
  • Using a listener (e.g. an HttpSessionListener) to catch session expiration
  • Using state validation to catch out-of-sequence
    requests
  • Redirecting user to session expiration page
  • To improve process
  • Log session losses (requests within expired
    session)
  • Consider increasing session timeout
  • Consider using prevention techniques described
    above
  • Increase save points

15
User Recoverable Errors
Bookmarks, back buttons and sessions
  • To minimize impact of back button, consider
  • Techniques described for out-of-sequence requests
  • Redirecting to GETs instead of returning
    responses to POSTs
  • To work around double submissions, consider
  • Disabling submit button after first click
  • Susceptible to Stop button or request timeout
  • Minimizing response times!
  • Detecting on server side using request id
  • Difficult to return correct response to second
    request
  • Immediately forwarding to intermediate page which
    can forward on when response is ready
  • To handle multiple windows, consider
  • Passing state in every request
  • Adapting web frameworks to map state (e.g. Struts
    form beans) by primary key or request ID instead
    of a static name
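
Slide 15 suggests detecting double submissions on the server with a request ID. It also notes that it is hard to return the correct response to the second request. The sketch below shows one way to handle both, under the assumption that a one-time token is placed in a hidden form field when the page is rendered. The first submit carrying that token runs the action. Any later submit with the same token waits for the first one and receives the same result. `SubmissionGuard` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical duplicate-submission guard keyed by a per-form request ID. */
class SubmissionGuard {
    private final Set<String> issued = ConcurrentHashMap.newKeySet();
    // One shared future per token; a real version would expire entries with the session.
    private final Map<String, CompletableFuture<String>> results = new ConcurrentHashMap<>();

    /** Called when rendering the form; the value goes into a hidden field. */
    String issueToken() {
        String token = UUID.randomUUID().toString();
        issued.add(token);
        return token;
    }

    /** Runs 'action' at most once per token and returns the shared result. */
    String submit(String token, Supplier<String> action) {
        if (!issued.contains(token)) {
            throw new IllegalArgumentException("Unknown request ID: " + token);
        }
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> first = results.putIfAbsent(token, mine);
        if (first != null) {
            // Duplicate click: block until the original request finishes, then reuse its answer.
            return first.join();
        }
        try {
            String result = action.get();
            mine.complete(result);
            return result;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // duplicates see the same failure
            throw e;
        }
    }
}
```

Because duplicates reuse the first response, a double click cannot place two orders. The user still sees a normal confirmation page instead of an error.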

16
Agenda
  • Goals of Fault Tolerance
  • User Recoverable Errors
  • Expected Application Errors
  • System Failure
  • Useful Strategies
  • Discussion

17
Expected Application Errors
Resource is unavailable
  • Database is down for maintenance
  • No connection to integrated partner service
  • Resource is overloaded
  • Out of DB connections
  • JMS Queue full

18
Expected Application Errors
Resource is unavailable
  • To prevent, consider
  • Coordinating maintenance schedules
  • Planning for failover at the resource level
  • Increasing hardware budget
  • Increasing transaction timeout seconds (caution:
    last resort)
  • To handle, analyze transactional requirements
  • Is immediate user response necessary?
  • Can the resource access be handled asynchronously
    with an extended, logical transaction?
  • Plan rollbacks carefully to allow for retries
    (consider idempotence, sub-transactions)
  • Alert operator/admin if out of SLA
  • Log all outages (study for patterns)
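
Slide 18 says to plan rollbacks so that retries are possible, with idempotence in mind. The sketch below retries a call against an unavailable resource a bounded number of times and doubles the pause between attempts. Retrying is only safe when repeating the call cannot apply a change twice. If every attempt fails, the last failure is rethrown so the caller can log the outage and alert an operator. `Retry.withRetries` is a hypothetical helper, and the attempt count and delays are illustrative.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an idempotent call against a flaky resource. */
final class Retry {
    private Retry() {}

    /**
     * Calls 'op' up to 'maxAttempts' times, doubling the pause between attempts.
     * Only safe when 'op' is idempotent (repeating it cannot double-apply a change).
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Log every failure: outage patterns are worth studying later.
                System.err.println("Attempt " + attempt + "/" + maxAttempts + " failed: " + e);
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last; // out of attempts: let the caller alert the operator
    }
}
```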

19
Expected Application Errors
Application is overloaded
  • Mentioned on CNBC
  • Linked from Slashdot
  • Denial of Service

20
Expected Application Errors
Application is overloaded
  • Test under heavy load
  • Tune hot spots
  • Run with excess capacity
  • Throttle at network level
  • Use JMS and other asynchronous technologies to
    throttle on backend
  • Tune application server to degrade gracefully
  • Monitor carefully
  • Be prepared to scale out, not just up
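
The slide recommends throttling on the backend, using JMS and other asynchronous technologies, and degrading gracefully. The sketch below swaps in a simpler in-process technique to show the same idea: a `java.util.concurrent.Semaphore` caps how many requests do real work at the same time. Requests beyond that limit wait briefly and then get a fast "busy" answer, rather than piling up until the server stalls. `Throttle` is a hypothetical class, and the limits are illustrative.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical in-process throttle that sheds load instead of stalling. */
class Throttle {
    private final Semaphore permits;
    private final long maxWaitMillis;

    Throttle(int maxConcurrent, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: waiters served in order
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs 'work' if a permit frees up in time; otherwise returns 'busyResponse'. */
    String handle(Supplier<String> work, String busyResponse) throws InterruptedException {
        if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            return busyResponse; // degrade gracefully: "please try again shortly"
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always give the permit back, even on failure
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A JMS queue goes further than this. It decouples the front end from the backend entirely, so accepted work survives a restart instead of being turned away.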

21
Expected Application Errors
Bugs and other undocumented features
  • Friendly bug
  • Triggers invalid state
  • Causes VM or app server to throw exception
  • Greedy bug
  • Monopolizes resources
  • Leaks connections
  • Silent and deadly bug
  • Corrupts data

22
Expected Application Errors
Bugs and other undocumented features
  • To handle friendly bugs
  • Bulletproof your transactions & rollbacks
  • Write up strict guidelines
  • Conduct code reviews
  • If developers are junior and/or integrity is a
    lot more important than performance, consider
    using unchecked application exceptions
  • Catch Throwable somewhere in the UI
  • Display sanitized errors to user
  • Log carefully to allow easy debugging (use
    timestamps, thread IDs, transaction IDs)
  • Alert operator/administrator
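
The last four bullets on slide 22 fit together in one place: catch Throwable at the top of the UI tier, show the user a sanitized message, and log the full details with a timestamp, thread and transaction ID. The sketch below does this and adds a short, hypothetical "incident ID". The user can quote that ID to support, and it matches the log line. `SafeRequest.run` is a hypothetical wrapper, and `java.util.logging` stands in for whatever logging framework the application uses.

```java
import java.time.Instant;
import java.util.UUID;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical last-resort handler for the top of the UI tier. */
final class SafeRequest {
    private static final Logger LOG = Logger.getLogger(SafeRequest.class.getName());

    private SafeRequest() {}

    static String run(String transactionId, Supplier<String> page) {
        try {
            return page.get();
        } catch (Throwable t) { // deliberately broad: nothing escapes to the user as a stack trace
            String incidentId = UUID.randomUUID().toString().substring(0, 8);
            // Full detail, with correlation IDs, goes to the log for debugging and alerting.
            LOG.log(Level.SEVERE, String.format("incident=%s time=%s thread=%s tx=%s",
                    incidentId, Instant.now(), Thread.currentThread().getName(), transactionId), t);
            // The user gets an apology and a reference, with no internal details.
            return "Sorry, something went wrong. Please try again or contact support"
                    + " (reference " + incidentId + ").";
        }
    }
}
```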

23
Expected Application Errors
Bugs and other undocumented features
  • To handle greedy bugs
  • Reduce transaction timeout seconds
  • Handle timeouts in the same way as friendly bugs
  • Monitor carefully
  • Log statistics (number of transaction timeouts, CPU
    usage, memory usage, GC, network traffic, stuck
    threads)
  • Automate log analysis
  • Trigger a thread dump (kill -3) during hot spots
  • Alert operator/administrator to hot spots
  • Use clustering to contain damage
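
Slide 23's main defence against greedy bugs is a shorter transaction timeout, which is configured in the application server. The sketch below is an application-level analogue, not the container mechanism. It bounds a single unit of work with `Future.get(timeout, unit)`, cancels the work if it overruns, and rethrows so the caller can handle the timeout like any other "friendly" failure. `Deadline.callWithin` is a hypothetical helper. To keep the example self-contained it creates one executor per call, whereas a real server would reuse a shared pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical guard that stops a runaway task instead of letting it hold resources. */
final class Deadline {
    private Deadline() {}

    static <T> T callWithin(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = worker.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the task; cooperative code will stop
                throw e;             // caller logs it and rolls back like any other failure
            } catch (ExecutionException e) {
                // Unwrap so callers see the task's own exception.
                Throwable cause = e.getCause();
                throw (cause instanceof Exception) ? (Exception) cause : e;
            }
        } finally {
            worker.shutdownNow();
        }
    }
}
```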

24
Expected Application Errors
Bugs and other undocumented features
  • To handle silent and deadly bugs
  • Bulletproof transaction settings
  • Validate on multiple levels, use referential
    integrity
  • Audit everything
  • Unless performance/cost prohibits, keep a
    complete audit trail on every table (easy with
    triggers, aspects or code generators), try to
    include transaction ID
  • Flush caches regularly
  • After a save, load the record from the database
    and display back to the user
  • Run periodic audits with human review
  • Plan for how to use audit trail to recover from
    data corruption
  • Early detection is key: escalate user concerns!

25
Agenda
  • Goals of Fault Tolerance
  • User Recoverable Errors
  • Expected Application Errors
  • System Failure
  • Useful Strategies
  • Discussion

26
System Failure
Never have an unplanned outage
  • Determine acceptable downtime
  • Plan clustering / failover accordingly
  • Monitor carefully so outages are detected
    immediately
  • Be ready with a tiny planned outage page and
    server in advance
  • Consider offsite host
  • Build this functionality into non-Web clients at
    development time
  • Plan for transaction recovery
  • Plan for JMS recovery
  • Use quiescing load balancing to bring servers
    offline for maintenance
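
Quiescing a server means it stops taking new work, finishes the requests already in progress, and only then goes offline. The load balancer steers new traffic elsewhere in the meantime. The sketch below shows that gate for a single node. `QuiescableNode` and its health-check method are hypothetical, and a real deployment would wire `acceptingTraffic()` into whatever health probe the load balancer polls.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical per-server gate used to drain a node before maintenance. */
class QuiescableNode {
    private final AtomicBoolean quiescing = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    /** What a load-balancer health check would poll. */
    boolean acceptingTraffic() {
        return !quiescing.get();
    }

    /** Returns null while draining, so the caller can retry on another node. */
    String handle(Supplier<String> work) {
        // Count the request before checking the flag, so a drain can never miss it.
        inFlight.incrementAndGet();
        try {
            if (quiescing.get()) {
                return null;
            }
            return work.get();
        } finally {
            inFlight.decrementAndGet();
        }
    }

    void quiesce() {
        quiescing.set(true);
    }

    /** Waits up to maxWaitMillis for in-flight requests to finish; true if drained. */
    boolean awaitDrained(long maxWaitMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (inFlight.get() > 0) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }
}
```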

27
Agenda
  • Goals of Fault Tolerance
  • User Recoverable Errors
  • Expected Application Errors
  • System Failure
  • Useful Strategies
  • Discussion

28
Useful Strategies
Be sure that you develop guidelines for
  • Error Messages
  • Validation (format, business rules, size,
    cleansing)
  • Logging (when, where, what)
  • Auditing
  • Monitoring (level of automation, alerts)
  • Transactions (who rolls back, checked vs.
    unchecked)
  • Sessions & Caching (request vs. session,
    flushing)
  • Clustering

29
Useful Strategies
Error Messages
  • For validation errors, be sure to
  • Include format and size hints
  • Show examples
  • Give more information than the basic field label
  • Mention the error at the top of the screen
  • Highlight the actual field
  • Wherever possible, catch all errors at the same
    time!
  • For other user-recoverable errors
  • Let the user know what to do next
  • If the user can't recover
  • Apologize
  • Give no details
  • Suggest workarounds
  • (Silently log and alert!)

30
Useful Strategies
Validation
  • If possible, validate at all levels
  • Common strategies
  • Externalize validation rules and use AOP or
    code-generation (CG) techniques to build them into each layer
  • Clearly define which layers are responsible for
    which types of validation. For example
  • All format errors handled in web tier
  • All business rule violations handled in
    application tier
  • All field lengths enforced at data tier
  • If business logic isn't dependent on fields,
    define them dynamically, along with their
    validation rules
  • Use a rules engine

31
Useful Strategies
Logging
  • Log in all tiers
  • Helps diagnose problems
  • More secure
  • Define logging levels (debug, info, error, etc.)
  • Create consistent guidelines for those levels
  • Include
  • Timestamp
  • Current User
  • Some sort of thread ID, transaction ID, etc.
  • Don't make logs a source of failure (watch disk
    space, JMS load, etc.)
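
The "Include" bullets on slide 31 are easiest to enforce once, in a formatter, rather than in every log call. The sketch below assumes that a request filter stores the current user and transaction ID in thread-local variables. A `java.util.logging.Formatter` then stamps every record with the timestamp, user, thread and transaction ID. `LogContext` and `ContextFormatter` are hypothetical. Libraries such as SLF4J provide the same idea through their MDC. The formatter reads the thread-locals when a record is formatted, which works with synchronous handlers such as `ConsoleHandler`.

```java
import java.time.Instant;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

/** Per-thread request context, set at the start of each request (hypothetical). */
final class LogContext {
    private static final ThreadLocal<String> USER = ThreadLocal.withInitial(() -> "-");
    private static final ThreadLocal<String> TX = ThreadLocal.withInitial(() -> "-");

    private LogContext() {}

    static void set(String user, String txId) {
        USER.set(user);
        TX.set(txId);
    }

    /** Call in a finally block: server threads are pooled and reused. */
    static void clear() {
        USER.remove();
        TX.remove();
    }

    static String user() { return USER.get(); }

    static String tx() { return TX.get(); }
}

/** Prefixes every record with timestamp, level, user, thread and transaction ID. */
class ContextFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        return String.format("%s %-7s user=%s thread=%s tx=%s %s%n",
                Instant.ofEpochMilli(record.getMillis()),
                record.getLevel(),
                LogContext.user(),
                Thread.currentThread().getName(),
                LogContext.tx(),
                formatMessage(record));
    }
}
```

To install it, call `handler.setFormatter(new ContextFormatter())` on each handler.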

32
Useful Strategies
Auditing
  • Unless you have a performance or space problem,
    audit all changes
  • Provides accountability
  • Easier to support users
  • Easier to debug
  • Easier to recover from disaster
  • Easier to detect attacks
  • Include
  • Timestamp
  • Current User
  • Some sort of thread ID, transaction ID, etc.
  • Complete data record or diff
  • Decide whether a failure to audit should sink the
    transaction
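
Slide 32 lists what an audit entry should include, including either the complete record or a diff. The sketch below builds the diff variant from "before" and "after" field maps and records only the fields that changed, along with the timestamp, user and transaction ID. `AuditEntry` and `Auditor` are hypothetical names. Whether a failure to write the entry should roll back the business transaction is still the design decision the slide raises.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** One row of a hypothetical audit trail. */
final class AuditEntry {
    final Instant timestamp;
    final String user;
    final String transactionId;
    final String table;
    final String recordId;
    final List<String> changes; // "field: old -> new"

    AuditEntry(Instant timestamp, String user, String transactionId,
               String table, String recordId, List<String> changes) {
        this.timestamp = timestamp;
        this.user = user;
        this.transactionId = transactionId;
        this.table = table;
        this.recordId = recordId;
        this.changes = changes;
    }
}

final class Auditor {
    private Auditor() {}

    /** Builds an audit entry listing only the fields whose values changed. */
    static AuditEntry diff(String user, String txId, String table, String recordId,
                           Map<String, String> before, Map<String, String> after) {
        Set<String> fields = new TreeSet<>(before.keySet()); // sorted for stable output
        fields.addAll(after.keySet());
        List<String> changes = new ArrayList<>();
        for (String field : fields) {
            String oldValue = before.get(field);
            String newValue = after.get(field);
            if (!Objects.equals(oldValue, newValue)) {
                changes.add(field + ": " + oldValue + " -> " + newValue);
            }
        }
        return new AuditEntry(Instant.now(), user, txId, table, recordId, List.copyOf(changes));
    }
}
```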

33
Useful Strategies
Monitoring
  • Who is watching the logs?
  • Common strategies include
  • 24/7 operations center
  • Business hours operation center
  • Overworked admin whenever she happens to think
    about it (not recommended!)
  • Automated, redundant processes that analyze logs
    and raise alerts to on-call administrators
  • Logs show more than critical errors
  • Ideally, mine them for clues on usability,
    performance problems and attacks
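
The "automated, redundant processes that analyze logs" option can begin as something very small. The sketch below receives the timestamp of each ERROR line and reports when the number of errors in a trailing time window exceeds a threshold, which is the point to page the on-call administrator. `ErrorRateMonitor` is hypothetical. It assumes timestamps arrive in non-decreasing order, and the window and threshold values are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window error-rate alarm fed from the log stream. */
class ErrorRateMonitor {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> recent = new ArrayDeque<>();

    ErrorRateMonitor(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Records one error; returns true if errors in the window now exceed the threshold. */
    synchronized boolean recordError(long timestampMillis) {
        recent.addLast(timestampMillis);
        // Drop errors that have slid out of the trailing window.
        while (!recent.isEmpty() && recent.peekFirst() <= timestampMillis - windowMillis) {
            recent.removeFirst();
        }
        return recent.size() > threshold;
    }
}
```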

34
Useful Strategies
Transactions
  • What runs in a transaction?
  • Who is responsible for rolling it back?
  • Common strategies
  • Top server-side tier creates a user transaction,
    catches all errors and then determines its fate
  • Container-managed transactions with session
    façade
  • Top level methods responsible for rollbacks
  • Unchecked (runtime) exceptions, so rollbacks are
    automatic
  • Mostly unchecked exceptions with a few special
    cases
  • How to pick a transaction timeout?
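
The first strategy on slide 34 is that the top server-side tier owns the transaction, catches all errors, and decides its fate. The sketch below shows that shape with a minimal, hypothetical `Tx` interface, which in practice might wrap a JDBC `Connection` or a JTA `UserTransaction`. Business code simply throws, and the single wrapper commits on success or rolls back on any exception or error before the failure propagates. `TxTemplate` is also a hypothetical name.

```java
import java.util.function.Function;

/** Minimal transaction handle (hypothetical). */
interface Tx {
    void commit();

    void rollback();
}

/** The top tier decides the transaction's fate in exactly one place. */
final class TxTemplate {
    private TxTemplate() {}

    static <T> T inTransaction(Tx tx, Function<Tx, T> work) {
        boolean committed = false;
        try {
            T result = work.apply(tx);
            tx.commit();
            committed = true;
            return result;
        } finally {
            if (!committed) {
                tx.rollback(); // runs for runtime exceptions and errors; the failure still propagates
            }
        }
    }
}
```

This sketch matches the "unchecked exceptions, so rollbacks are automatic" strategy. `Function` cannot throw checked exceptions, so business code signals failure with unchecked ones.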

35
Useful Strategies
Sessions and Caching
  • Session or request?
  • Common strategies
  • Everything in HTTP session
  • Stateful session beans
  • Hidden form fields
  • URL rewriting
  • Encrypted cookies
  • Unencrypted cookies
  • When to flush cache?
  • Trade-off between integrity and performance
  • Caches can mask data problems
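
One way to answer "when to flush cache?" is to give every entry a time-to-live and also allow an explicit flush, for example after a data fix. The sketch below is a read-through cache built that way. A shorter TTL means fresher data and more database load, which is the integrity versus performance trade-off the slide mentions. `ExpiringCache` is hypothetical. The clock is injected so expiry can be tested, and two threads missing at the same moment may both load the value, which this sketch accepts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache with per-entry expiry and explicit flush. */
class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;

        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database lookup
    private final long ttlMillis;
    private final LongSupplier clock;    // e.g. System::currentTimeMillis

    ExpiringCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAt >= ttlMillis) {
            // Missing or stale: reload so a bad value cannot hide in memory forever.
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        return e.value;
    }

    /** Drops everything, e.g. after correcting data directly in the database. */
    void flush() {
        entries.clear();
    }
}
```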

36
Useful Strategies
Clustering
  • Why use clusters?
  • Availability
  • Scalability
  • Will this application need a cluster?
  • Can you take it offline for maintenance?
  • Can you take it offline to scale it up?
  • Are you sure you won't need to scale it out?
  • Useful option, but not the answer to everything
  • Usually requires work on existing applications
  • Build applications to be clusterable from the
    start
  • Make shared state serializable
  • Obey J2EE restrictions
  • Consider how much should be stored in session
  • Test on a cluster so you know how close you are

37
Discussion
Get the slides online at http://www.chariotsolutions.com/slides