Challenge Problem Session: Detection and Reaction to Unplanned Operational Events in Large Scale Distributed Real-Time Embedded Systems

1
Challenge Problem Session: Detection and Reaction
to Unplanned Operational Events in Large Scale
Distributed Real-Time Embedded Systems
Workshop on Parallel and Distributed Real-Time
Systems 2005, April 4th and 5th, 2005, Denver,
Colorado
2
Challenge Problem Context
  • More real-time and embedded systems are becoming
    Quality of Service (QoS) enabled, which allows
    resources to be managed in a more dynamic,
    policy-based manner
  • The mechanisms for defining and operating on this
    policy are still maturing
  • These systems are also moving toward a more
    peer-to-peer implementation of resource
    allocation for managing large-scale distributed
    networks of mixed hard and soft real-time
    subsystems
  • The computing devices, consisting of multiple
    blade processors, number in the hundreds and are
    connected via a combination of LANs, WANs, and
    wireless communications

3
Challenge Problem
  • One of the challenges in the management of
    resources (e.g., processors, memory, networks,
    communications, power) is the detection of and
    reaction to operational events that were
    unplanned or unanticipated but should not cause
    failures (unexpected behavior).
  • An example is the receipt of more requests for
    service than the requirements specify or the
    system designers anticipated, for a capability
    whose failure would have a significant impact
    (e.g., cause the loss of a great deal of money).
  • What approaches, methods, architectural features,
    and mechanisms exist, are under development, or
    are the subject of research to deal with these
    sorts of situations?
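
As a minimal sketch of the detection half of this example, the
following Python monitor counts request arrivals in a sliding
window and flags the moment the count exceeds the load the
requirements specify. The class name, its parameters, and the
choice of a sliding window are assumptions made for illustration;
the slides do not prescribe a mechanism.

```python
from collections import deque


class RequestRateMonitor:
    """Hypothetical detector: flags load above the specified maximum."""

    def __init__(self, specified_max_requests, window_seconds):
        # Both limits are assumed to come from the system requirements.
        self.specified_max_requests = specified_max_requests
        self.window_seconds = window_seconds
        self._arrivals = deque()

    def record(self, timestamp):
        """Record one request arrival.

        Returns True when the number of requests seen in the last
        window_seconds exceeds specified_max_requests.
        """
        self._arrivals.append(timestamp)
        # Drop arrivals that have aged out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._arrivals and self._arrivals[0] <= cutoff:
            self._arrivals.popleft()
        return len(self._arrivals) > self.specified_max_requests


# Usage: at most 3 requests per 1.0 s window are specified.
monitor = RequestRateMonitor(specified_max_requests=3, window_seconds=1.0)
flags = [monitor.record(t) for t in (0.0, 0.1, 0.2, 0.3)]
```

A detector like this only reports the unplanned event. What the
system does next (shed load, reallocate resources, degrade a less
critical capability) is a separate policy decision.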

4
Discussion Points (1 of 3)
  • In many large-scale real-time systems there are
    both periodic and aperiodic processes driven by
    data exchanges (messages) that affect the system
    performance.
  • In QoS-enabled systems, end-to-end deadlines may
    be specified for a set of applications that make
    up an operational capability. The policy for
    responding to certain events may also be
    specified.
  • The occurrence of unplanned operational events
    may or may not cause resource exhaustion.
  • The detection of and remediation action for
    unanticipated operational events may be specified
    by a function that defines a set of thresholds
    (e.g., upper and lower bounds) and the action(s)
    to be taken when these thresholds are exceeded.
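
The threshold function described in the last bullet can be
sketched as follows. This is one possible reading under stated
assumptions: the ThresholdPolicy class, the metric name, the bound
values, and the action names are all hypothetical and are not
taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ThresholdPolicy:
    """Hypothetical policy: bounds on one metric plus actions to take."""

    metric: str
    lower: float
    upper: float
    on_below: Callable[[float], str]
    on_above: Callable[[float], str]

    def evaluate(self, value: float) -> Optional[str]:
        """Return the name of the action to take, or None if within bounds."""
        if value > self.upper:
            return self.on_above(value)
        if value < self.lower:
            return self.on_below(value)
        return None


# Usage with illustrative bounds on processor utilization.
policy = ThresholdPolicy(
    metric="cpu_utilization",
    lower=0.10,
    upper=0.85,
    on_below=lambda v: "release_reserved_capacity",
    on_above=lambda v: "shed_low_priority_load",
)
action = policy.evaluate(0.92)
```

Keeping the bounds and the actions together in one object means
the detection rule and the remediation rule can be changed as a
unit when the policy is updated at run time.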

5
Discussion Points (2 of 3)
  • Is it better to have separate detection/reaction
    models for fault detection and handling and for
    unplanned operational events, or does this make
    for a more complicated solution?
  • Given the nature of distributed systems, what
    might be the issues with implementing
    peer-to-peer mechanisms for event detection and
    correlation, policy management, and policy
    enactment?
  • Some existing standards (e.g., the IETF SNMP and
    the Distributed Management Task Force (DMTF)
    Common Information Model (CIM)) have been used
    by some enterprise-level system management
    products (e.g., CA Unicenter, IBM Tivoli), but
    these don't really address real-time, QoS-based
    resource management. How can these be extended
    to support the DRE space for this type of
    problem?

6
Discussion Points (3 of 3)
  • What issues within the systems and software
    engineering disciplines affect the development
    of solutions to these challenge problems? For
    example, what changes in process and culture
    within these disciplines are necessary to
    support the development of robust solutions that
    can exceed specified requirements but don't
    break the budget during the project development
    life cycle?