Efficient Online Monitoring of Web-Service SLAs

1
Efficient Online Monitoring of Web-Service SLAs
  • By Franco Raimondi, James Skene, and Wolfgang
    Emmerich
  • Presented by Nathan Heminger

2
Overview
  • Authors
  • Paper Background
  • Introduction
  • Background
  • Method
  • Implementation
  • Evaluation
  • Related Work
  • Critique

3
Authors
  • Franco Raimondi, James Skene, Wolfgang Emmerich
  • University College London (UCL)
  • London, UK.

4
Paper Background
  • Presented at ACM SIGSOFT/FSE 2008
  • Winner of an ACM SIGSOFT Distinguished Paper
    Award
  • Also presented at ISEC 2009
  • Continuation of "The Monitorability of
    Service-Level Agreements for Application-Service
    Provision"
  • J. Skene, A. Skene, J. Crampton, and W. Emmerich.
  • Proc. of the 6th International Workshop on
    Software and Performance (WOSP)
  • Citations
  • Domenico Bianculli, Lifelong Verification of
    Dynamic Service Compositions

5
Introduction
  • Increase in integration of service across
    organizational boundaries.
  • Web services most popular
  • Amazon's storefront web services allow small
    sellers to integrate with Amazon's architecture.
  • Problem
  • When a service is down, lost revenue or other
    repercussions follow.
  • Businesses need a guarantee of service quality
    from external providers/organizations.

6
Introduction
  • Providers also want guarantees from clients
  • Prevent abuse of service
  • Outages are costly!
  • On June 6, 2008, Amazon experienced a 2-hour outage.
  • Estimated loss of $31,000 per minute.
  • 2 hours: $3,720,000
  • 24 hours: $44,640,000

Source: http://news.cnet.com/8301-10784_3-9962010-7.html
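The outage figures on this slide follow from straight multiplication. A minimal sketch, assuming the per-minute estimate applies as a flat rate for the whole outage (the function name is illustrative, not from the paper):

```python
LOSS_PER_MINUTE = 31_000  # estimated loss per minute of outage, as quoted on the slide


def outage_loss(hours, per_minute=LOSS_PER_MINUTE):
    """Projected loss for an outage lasting `hours`, assuming a flat per-minute rate."""
    return hours * 60 * per_minute


two_hour_loss = outage_loss(2)    # 120 minutes at the flat rate
full_day_loss = outage_loss(24)   # 1,440 minutes at the flat rate
print(two_hour_loss, full_day_loss)
```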
7
Introduction
  • Solution: Service Level Agreements (SLAs)
  • A bilateral agreement to specify requirements of
    Quality of Service (QoS) that has a penalty
    payment for violations.
  • Monitoring is necessary for
  • The Client, to verify the level of service
    quality
  • The Provider, to protect against misuse and false
    claims
  • Systems of SLAs quickly become complex.

8
Introduction
  • Monitoring
  • Offline
  • Collect data about the service delivery and
    analyze at time intervals
  • Disadvantages
  • Storage - large volumes of data necessary to know
    about service quality violations
  • Does not generate alerts instantaneously.
    Critical systems need alerts immediately.
  • Online
  • Delivery is analyzed while the service is
    provided so that service quality violations can
    be detected and acted upon immediately

9
Introduction
  • The authors' approach
  • Specify an SLA in SLAng (the authors' SLA language).
  • Translate into a timed automaton/monitor
    (online).
  • Apply handlers that invoke monitors to validate.

10
Background
  • Requirements for SLAs
  • Protectability: original intent.
  • Not exploitable: only breaches force payments.
  • Monitorable: trustworthy information on status.
  • Understandable: intent can be recovered.
  • Precise: intent is unambiguous.
  • Safety
  • Safe: the provider can guarantee that the requirement
    is met, or possesses an SLA from another provider
    that will ensure it is met.
  • Unsafe: control is outside the provider's system
    of SLAs and actions.

11
Background
  • Client has a timeliness requirement of w - x < t
  • Conditions: w - x < t, y - x < t1, z - y < t2,
    w - z < t3, z - x < t1 + t2, w - y < t3 + t2, such
    that t1 + t2 + t3 < t.
  • 2^(3·2·6) = 2^36 ≈ 6.9 × 10^10 possible systems of
    SLAs
  • Depth First Search (DFS) reveals only one
    solution
  • I ensures w - x < t for C, and S ensures
    z - y < t2 for I.
  • Events occur at a single interface with the network, so
    monitoring should be placed there.
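The reasoning on this slide can be checked numerically: if every hop meets its own bound and t1 + t2 + t3 < t, the end-to-end bound w - x < t follows. A minimal sketch, assuming x, y, z, w are the timestamps of request sent, request received, response sent, and response received (that reading of the symbols, and the function names, are assumptions):

```python
def hop_bounds_hold(x, y, z, w, t1, t2, t3):
    """True if each hop meets its own bound: y - x < t1, z - y < t2, w - z < t3."""
    return (y - x < t1) and (z - y < t2) and (w - z < t3)


def client_bound_holds(x, w, t):
    """True if the end-to-end requirement w - x < t is met."""
    return w - x < t


# Hypothetical timestamps and bounds, chosen so that t1 + t2 + t3 < t.
x, y, z, w = 0.0, 0.2, 0.7, 0.9
t1, t2, t3, t = 0.3, 0.6, 0.3, 1.5

if t1 + t2 + t3 < t and hop_bounds_hold(x, y, z, w, t1, t2, t3):
    # Summing the three hop inequalities gives w - x < t1 + t2 + t3 < t.
    assert client_bound_holds(x, w, t)

# Size of the search space quoted on the slide.
possible_systems = 2 ** 36
print(possible_systems)
```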

12
Background
  • Timed Words and Timed Automata
  • Timed word
  • Simply a word and an associated time sequence.
  • Example: (aab...), (0.1, 0.3, 1.2, ...).
  • Timed Automata
  • Extend automata to include timed clocks
  • Time constraints
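A timed word pairs each symbol with a timestamp. One possible in-memory representation, as a sketch only (the `TimedEvent` type and `timed_word` helper are hypothetical names, and the non-decreasing time check is an assumption about well-formed input):

```python
from typing import List, NamedTuple, Sequence


class TimedEvent(NamedTuple):
    symbol: str   # one letter of the word, e.g. "a" or "b"
    time: float   # when that letter occurred


def timed_word(symbols: Sequence[str], times: Sequence[float]) -> List[TimedEvent]:
    """Zip a word and its time sequence, rejecting mismatched or decreasing times."""
    if len(symbols) != len(times):
        raise ValueError("word and time sequence must have the same length")
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("timestamps must be non-decreasing")
    return [TimedEvent(s, t) for s, t in zip(symbols, times)]


# The finite prefix of the example on the slide.
word = timed_word("aab", [0.1, 0.3, 1.2])
```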

13
Method
  • Key Idea
  • Encode the specification patterns for an SLA as
    timed automata to verify correctness/violation.
  • A timed automaton that accepts a timed word
    signifies a violation.
  • Patterns of SLA timeliness constraints.

14
Method
  • Three typical web service requirements patterns
  • Latency
  • The response of the service must follow the
    request within t seconds
  • Reliability
  • The number of errors in a given time window does
    not exceed X
  • Throughput
  • The number of client requests in a given time
    window does not exceed X
  • Note: each can be translated into a timed
    automaton such that the language accepted
    corresponds to the timed words characterized by a
    violation.
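As an illustration of the latency pattern, the sketch below mimics a two-event automaton with one clock: the clock is reset on a request, and a response arriving after the limit leads to the accepting (violation) state. It assumes at most one outstanding request at a time and event kinds named "request" and "response"; both are simplifications made here, not details from the paper.

```python
def latency_violated(events, limit):
    """Return True if some response follows its request by more than `limit`.

    `events` is an iterable of (kind, time) pairs, in time order.
    """
    clock_start = None  # time of the pending request, i.e. the clock reset point
    for kind, time in events:
        if kind == "request":
            clock_start = time              # reset the clock
        elif kind == "response" and clock_start is not None:
            if time - clock_start > limit:
                return True                 # accepting state: violation
            clock_start = None              # request answered in time
    return False


on_time = [("request", 0.0), ("response", 0.4)]
too_slow = [("request", 0.0), ("response", 1.5)]
print(latency_violated(on_time, 1.0), latency_violated(too_slow, 1.0))
```

A response that never arrives is not caught by this sketch, since it only reacts to observed events; a real monitor would need a timeout for that case.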

15
Method
  • Encode as an automaton, then pass all events to it to
    verify whether an accepting state is reached.
  • Automaton evolves with the execution.
  • If no transition exists for an event the
    automaton may reset to the initial state.
  • Cannot discard all the events that lead to a
    rejection (i.e. no violation occurred).

16
Method
  • Example of a timed automaton
  • Reliability: 3 failures in t units of time

17
Method
  • Example: Throughput
  • No more than 2 requests can be submitted in a
    given minute
  • Requests at t = 0, 0.9, 1.1, 1.2
  • Three requests between t = 0.9 and t = 1.2.

18
Method
  • Problem
  • The automaton resets at t = 1.1, so the violation
    at t = 1.2 is not detected.
  • Solution
  • When a state without a successor occurs, discard
    only the very first event (t = 0) and re-run the
    automaton
  • Implication
  • Must store state
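The difference between resetting and re-running can be reproduced on the slide's throughput example. This is a minimal sketch under stated assumptions: times are in minutes, the window is 1 minute, `limit` is at least 1, and the automaton is modelled as a counter with a single clock started at the first buffered request. The function names are illustrative.

```python
def run(times, limit, window):
    """Run the counting automaton over buffered request times.

    Returns "accept" (violation), "stuck" (no transition), or "running".
    """
    if not times:
        return "running"
    start = times[0]                      # clock is reset on the first request
    for count, t in enumerate(times[1:], start=2):
        if t - start >= window:
            return "stuck"                # guard fails: state without successor
        if count > limit:
            return "accept"               # limit + 1 requests inside the window
    return "running"


def naive_monitor(times, limit, window):
    """Reset to the initial state whenever the automaton gets stuck."""
    buffer = []
    for t in times:
        buffer.append(t)
        status = run(buffer, limit, window)
        if status == "accept":
            return True
        if status == "stuck":
            buffer = []                   # forgets every earlier request
    return False


def rerun_monitor(times, limit, window):
    """Discard only the oldest event when stuck, then re-run on the rest."""
    buffer = []
    for t in times:
        buffer.append(t)
        status = run(buffer, limit, window)
        while status == "stuck":
            buffer.pop(0)                 # drop just the first stored event
            status = run(buffer, limit, window)
        if status == "accept":
            return True
    return False


requests = [0, 0.9, 1.1, 1.2]
print(naive_monitor(requests, 2, 1.0), rerun_monitor(requests, 2, 1.0))
```

In this sketch the buffer never holds more than `limit + 1` timestamps, which is the storage cost of the re-run strategy.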

19
Method
  • To detect a violation, must maintain a number of events
    equal to the number of states in the automaton minus 1.
  • Diameter of the witnesses
  • Limited storage (e.g. mobile phones that monitor
    SLAs).
  • Asymptotic Analysis
  • Theorem 1: On-line monitoring for the patterns
    has a worst-case complexity of O(n^2), where n is the
    number of states of the automaton.

20
Implementation
  • SLAng Eclipse Plugin
  • Creating, editing, and verifying SLAs written in
    SLAng

21
Implementation
  • Monitor and Handler Eclipse Plugin
  • Automatically generates automata that encode
    SLAng SLA violations for latency, reliability,
    and throughput.
  • Checkers: standalone Java checkers of the
    automata
  • Also produces handlers for Apache Axis for
    intercepting messages and dispatching to
    checkers.
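The handler-to-checker split can be pictured as a pass-through interceptor that timestamps each message and hands it to every registered checker. The sketch below is a language-neutral illustration in Python, not the generated Java code and not the Apache Axis handler API; `MonitoringHandler`, `on_message`, and the checker signature are all hypothetical.

```python
import time


class MonitoringHandler:
    """Pass-through interceptor that forwards timestamped events to checkers."""

    def __init__(self, checkers, clock=time.monotonic):
        self.checkers = list(checkers)   # callables: (kind, timestamp) -> bool
        self.clock = clock               # injectable so tests can fix the time
        self.violations = []

    def on_message(self, kind, payload):
        now = self.clock()
        for checker in self.checkers:
            if checker(kind, now):       # True means the checker saw a violation
                name = getattr(checker, "__name__", repr(checker))
                self.violations.append((name, kind, now))
        return payload                   # the message itself is never modified


def fault_checker(kind, now):
    """Toy checker: flags every fault message."""
    return kind == "fault"


handler = MonitoringHandler([fault_checker], clock=lambda: 1.0)
handler.on_message("request", "<soap/>")
handler.on_message("fault", "<soap/>")
print(handler.violations)
```

Keeping the checkers independent of the interception layer is what lets the same checker run on either the client or the provider side.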

22
Implementation
  • Handlers deployed to client and provider

23
Evaluation
  • Evaluated on a web service computational grid.

24
Evaluation
  • Evaluation SLAs
  • Throughput: the total number of searches for a
    given client is limited to 3 per 24-hour period.
  • Throughput: no more than 2 submissions
    per second for the GridSAM and Plotting services.
  • Client latency for job submission is less than
    1000 milliseconds.
  • Reliability: no more than one failure in 10
    for the job submission service and no more than
    one failure in 1000 for the plotting service.

25
Evaluation
  • Test conducted on grid
  • Linux servers, hyper-threaded CPUs, 2GB RAM.
  • All violations correctly identified.
  • 230,000 SOAP messages over 14,500 seconds (≈ 4
    hours).
  • Average validation time of 0.4 milliseconds.
  • 72% of validations under the measurement precision of
    1 millisecond.
  • Total time spent in validation was 87.7 seconds
    (0.6% overhead)
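The reported figures can be cross-checked against each other. A small sketch, assuming exactly one validation per SOAP message (an assumption made here so that the mean can be derived from the totals):

```python
messages = 230_000          # SOAP messages in the test run
run_seconds = 14_500        # duration of the test run
validation_seconds = 87.7   # total time spent validating

run_hours = run_seconds / 3600
overhead_pct = 100 * validation_seconds / run_seconds
mean_validation_ms = 1000 * validation_seconds / messages
messages_per_second = messages / run_seconds

print(f"{run_hours:.2f} h, {overhead_pct:.2f}% overhead, "
      f"{mean_validation_ms:.2f} ms per validation, "
      f"{messages_per_second:.1f} messages/s")
```

Under that assumption the derived mean rounds to the average validation time quoted on the slide, and the overhead is the share of the run spent validating.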

26
Evaluation
  • Benefits
  • No modification of services
  • Very little overhead
  • Correctly identifies all violations

27
Evaluation
  • Performance

28
Related Work
  • Havelund and Rosu
  • Automatic generation of monitors
  • Lacks ability to express time properties
  • Fickas and Feather
  • Requirements monitoring.
  • Relies on triggers in the AP5 active database,
    which is written in LISP.
  • Authors claim their approach is more lightweight
    and significantly more efficient.
  • Robinson
  • Temporal logic and KAOS to define timeliness
    constraints.
  • No discussion on the efficient monitoring of
    temporal logic formulae.

29
Related Work
  • Baresi et al
  • Techniques for monitoring BPEL web service
    compositions.
  • Uses hand-coded monitors.
  • The authors simplify monitor construction by
    generating monitors automatically from SLAng
    timeliness constraints.
  • Mahbub and Spanoudakis
  • Framework for monitoring web service
    compositions.
  • Requires knowledge of events not observable by a
    BPEL engine.
  • No statement on efficiency of the monitors.
  • The authors demonstrate a couple of milliseconds
    overhead.
  • Song Dong et al.
  • Use timed automata.
  • However, they perform analysis prior to
    deployment instead of run-time.

30
Conclusion
  • Concluding Benefits
  • Non-intrusive.
  • Easily deployable with no knowledge of
    application.
  • Solution implemented in less than a day.
  • Small code footprint (10Kb per checker).
  • On-the-fly handling to allow hundreds of events
    per second with under a millisecond of
    verification time on average.

31
Critique
  • Overall an excellent paper
  • Clearly written with good use of diagrams.
  • Explains concepts clearly or refers to other
    works.
  • Builds on previous research in a logical manner.
  • Criticisms
  • Only demonstrated for web services. Applicable
    for other services?
  • Does not allow for the dynamic update of SLAs.
    Recompilation necessary when a client or provider
    wants to adjust contract?
  • Admittedly does not allow for SLAs that change
    over a time period.
  • Only discussed in the context of Apache Axis.
    What about other web service platforms?
  • I would like to see this combined with some type
    of AOP approach, such as used in "Non-Intrusive
    Monitoring and Service Adaptation for WS-BPEL".
  • Large standard deviation caused by distortion in
    results
  • 300 data points above 5 milliseconds (largest
    under 500 milliseconds).
  • Justified by stating that the time measurement
    of the validation overhead is not the only load
    on the machines

32
Questions
  • Questions?