Title: Dynamic Resource Management Architectures and Algorithms for Distributed Real-Time Applications
1 Dynamic Resource Management Architectures and Algorithms for Distributed Real-Time Applications
- Frank Drews
- Center for Intelligent, Distributed, and Dependable Systems
- School of Electrical Engineering and Computer Science
- Ohio University, Athens, Ohio 45701
- drews_at_ohio.edu
2 Goals
- Enhance real-time and QoS capabilities in grid
middleware to meet the demands of scientific
applications
3 Grid Middleware Challenges
- Quality-of-Service (QoS) adaptation under QoS constraints
- Need for coordinated end-to-end real-time enforcement
4 Outline
- Illustrative example
- Grid middleware requirements
- A generic Grid architecture
- Adaptive resource management models and algorithms
5 Illustrative Applications
- Grid-based real-time medical image retrieval
- NASA on-board satellite systems HART
6 Grid-based Real-Time Medical Image Retrieval
- Medical data sets, including X-ray images, CAT scans, etc., are distributed across different domains (medical research centers, hospitals, etc.)
- Medical researchers submit queries via the internet to a Medical Image Retrieval System (MIRS) server
- MIRS employs static and dynamic image retrieval operations on a subset of database objects that allow medical researchers to establish structural similarities between query objects and database objects
- The MIRS sends back the best hits
- A query contains a sample object along with a variety of requested QoS parameters and real-time timing constraints
- Image quality, image size, various dynamic image retrieval parameters, similarity metrics, locations of image databases, real-time timing deadlines, etc.
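The query contents listed above could be captured in a small record type. The following is only a sketch: the class name `ImageQuery`, all field names, the units, and the value ranges are assumptions made for illustration and are not taken from MIRS.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ImageQuery:
    """Hypothetical query record; names and units are illustrative only."""
    sample_object: bytes                   # the sample image to match against
    image_quality: float                   # requested quality, assumed in [0, 1]
    max_image_size_kb: int                 # largest acceptable result image
    similarity_metric: str                 # name of the similarity metric
    database_locations: Tuple[str, ...]    # image databases to search
    deadline_s: float                      # deadline, seconds after submission
    retrieval_params: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self):
        # Reject obviously inconsistent requests before they reach a scheduler.
        if not 0.0 <= self.image_quality <= 1.0:
            raise ValueError("image_quality must lie in [0, 1]")
        if self.deadline_s <= 0:
            raise ValueError("deadline_s must be positive")
```

Keeping the QoS parameters and the deadline in one immutable record makes it straightforward to pass a query between a resource manager and a scheduler without either side modifying it.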
7 Grid-based Real-Time Medical Image Retrieval: Problems
- Dynamic (content-based) image retrieval is highly complex
- We may need multiple high-performance computing facilities to distribute the load
- Processing of multiple queries
- User queries may have different priorities
- It would be desirable if users could formulate their individual QoS trade-offs
- Timing is difficult to predict
- Retrieval operators can run at various levels of QoS, resulting in different time and space complexities, and thus different running times on the (heterogeneous) computing nodes
- The QoS parameters may need to be changed at run-time
- Data is highly distributed; data sets vary in size; network transfer times are difficult to predict
The Grid Forum Applications Working Group scenario on parallel tomography is another example of the need for real-time capabilities in Grid middleware.
8 General Utility Model
- Real-time and QoS requirements for Grid middleware are based on a general utility model that involves timing and QoS factors
- For example, a scientist may be willing to tolerate a delay in getting results in return for increased accuracy of the results
- Or, the scientist may value timeliness above all else and be willing to sacrifice the quality of the computation to achieve results in a timely fashion
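The trade-off between the two scientists above can be made concrete with a toy utility function. This is a minimal sketch under stated assumptions: the weighted-sum form, the linear timeliness term, the 10-second horizon, and all function names are invented here for illustration and are not the utility model used in the talk.

```python
def timeliness(delay_s, max_delay_s):
    """Map a delay to [0, 1]: 1 for an instant answer, 0 at or beyond max_delay_s."""
    return max(0.0, 1.0 - delay_s / max_delay_s)


def utility(accuracy, delay_s, w_accuracy, w_time, max_delay_s=10.0):
    """Assumed utility: a weighted sum of result accuracy and timeliness."""
    return w_accuracy * accuracy + w_time * timeliness(delay_s, max_delay_s)


def best_option(options, w_accuracy, w_time):
    """options: iterable of (accuracy, delay_s); return the highest-utility one."""
    return max(options, key=lambda o: utility(o[0], o[1], w_accuracy, w_time))


# Two ways to answer the same query: slow and accurate, or fast and coarse.
options = [(0.9, 8.0), (0.6, 2.0)]
accuracy_first = best_option(options, w_accuracy=0.8, w_time=0.2)
timeliness_first = best_option(options, w_accuracy=0.2, w_time=0.8)
```

With these weights, the accuracy-oriented scientist is served best by the slow, accurate option, while the timeliness-oriented scientist is served best by the fast, coarse one.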
9 Real-time and QoS Requirements for Grid Middleware
- Support optimization of utility
- Support end-to-end timing constraints: Grid services must include the capability to reserve and deliver server resources and communication bandwidth when required
- Support varying levels of QoS: Grid services must include mechanisms that permit the dynamic adjustment of QoS parameters
- Coordinate support in all middleware components
10 Real-time and QoS Requirements for Grid Middleware
- Transparent utility optimization support
- Support utility optimization with basic infrastructure provided by the middleware and by the end-systems that are involved
- Optimization of utility, including timing constraints and QoS adjustments, must fit within the architecture and existing interfaces of the middleware.
11 Hard vs. Soft Real-Time Constraints
- These criteria assume that the applications have timing constraints that are somewhere between the classical definitions of hard and soft
- i.e., that there is typically high value to the system in meeting timing constraints, but that it is not absolutely mandatory.
- This in turn implies that the middleware should embody sound real-time resource allocation and scheduling techniques.
12 Required Grid Middleware Capabilities
- The notions of real-time and QoS that are supported by existing Grid middleware are basic: they do not support the optimization of utility
- The existing approaches do not support real-time constraint enforcement and QoS adjustment that is coordinated both end-to-end and among the necessary middleware components of resource allocation, scheduling, and bandwidth management
13 Required Grid Middleware Capabilities
- In particular, these important real-time and QoS capabilities are required from Grid middleware:
- Support for adaptive, distributed resource management
- Support for distributed end-to-end scheduling
- Support for network bandwidth management
14 Generic Real-Time Middleware Architecture
15 Example
16 Resource Management Algorithm Development
- Feedback control-based QoS optimization
- Host controller / 1 local resource (DQRAM 1-d)
- Host controller / k shared resources (DQRAM k-d)
- Hierarchical controller architecture
- Robust resource allocation for real-time
applications which process data at rates that
vary unpredictably over time
17 Feedback Control-based Resource Allocation
- Problem: Given an amount of available resources, provide on-line control of the QoS settings of the tasks so as to optimize the overall system utility
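One way to write this problem down for a single resource is the following. The notation is an assumption introduced here, not taken from the slides: n tasks, q_i the QoS setting of task i chosen from a finite set Q_i, u_i(q_i) the utility of that setting, r_i(q_i) its resource demand, and R the amount of resource currently available.

```latex
\begin{aligned}
\max_{q_1,\dots,q_n} \quad & \sum_{i=1}^{n} u_i(q_i) \\
\text{subject to} \quad & \sum_{i=1}^{n} r_i(q_i) \le R, \\
& q_i \in Q_i, \qquad i = 1,\dots,n .
\end{aligned}
```

The on-line aspect is that R and the true demands r_i may change while the system runs, so the settings q_i have to be revisited repeatedly rather than computed once.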
18 Related Work
- Burns et al., 2000: The meaning and role of value in scheduling flexible real-time systems
- Humphrey et al., 1997: DQM architecture
- QuO, 2001: Quality Objects
- QRAM, 1997: Quality-of-Service based Resource Allocation Model
19 QRAM
- QRAM: Quality-of-Service based Resource Allocation Model
- Uses resource profiles and utility profiles
- Tasks can run at various levels of resource usage, yielding various levels of quality of service
- Determines an optimal resource allocation that maximizes the total system benefit
- Does not consider run-time variations such as dynamic task arrivals and completions or changes in resource availability
- Is not tolerant of misspecifications of the resource profiles
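To illustrate how resource and utility profiles can drive a static allocation, here is a simplified greedy sketch for a single resource. It is not the published QRAM algorithm and makes no optimality claim; the function name, the profile layout (a list of `(resource, utility)` pairs per task, sorted by increasing resource), and the example numbers are all assumptions.

```python
def greedy_allocate(profiles, capacity):
    """Pick one QoS level per task by repeatedly taking the best upgrade.

    profiles: {task: [(resource, utility), ...]} sorted by increasing resource.
    Returns {task: level_index}.
    """
    level = {t: 0 for t in profiles}            # start every task at its lowest level
    used = sum(p[0][0] for p in profiles.values())
    if used > capacity:
        raise ValueError("minimum QoS levels do not fit into the capacity")
    while True:
        best, best_ratio = None, 0.0
        for t, p in profiles.items():
            i = level[t]
            if i + 1 >= len(p):
                continue                          # already at the highest level
            extra = p[i + 1][0] - p[i][0]         # additional resource needed
            gain = p[i + 1][1] - p[i][1]          # additional utility obtained
            if extra > 0 and used + extra <= capacity and gain / extra > best_ratio:
                best, best_ratio = t, gain / extra
        if best is None:
            return level                          # no affordable, useful upgrade left
        i = level[best]
        used += profiles[best][i + 1][0] - profiles[best][i][0]
        level[best] += 1


profiles = {
    "a": [(1, 1.0), (2, 3.0), (4, 4.0)],
    "b": [(1, 1.0), (3, 2.5)],
}
allocation = greedy_allocate(profiles, capacity=5)
```

Because the allocation is computed once from the declared profiles, a sketch like this shares the limitations listed above: it does not react to arrivals, terminations, or tasks that use more than their profile states.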
20 DQRAM 1-d
- Dynamic Quality-of-Service based Resource Allocation Model (DQRAM)
- We assume a single resource and a system (potentially) consisting of hard, soft, and non-real-time applications.
- A controller provides on-line control of the soft real-time tasks' QoS settings so as to optimize the overall system benefit
- The approach is based on discrete control theory: we close the loop by feeding back the actual resource utilization to the controller
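The idea of closing the loop can be sketched with a generic discrete integral controller. This is not the DQRAM control law, which the slides do not give; the update rule, the gain of 0.5, the target utilization of 0.8, and the simulated 25% overrun of the tasks relative to their profiles are all assumptions chosen for illustration.

```python
def control_step(budget, measured, target, gain=0.5, lo=0.0, hi=1.0):
    """One discrete integral-control update of the soft-task resource budget."""
    error = target - measured                     # positive when there is slack
    return min(hi, max(lo, budget + gain * error))


def simulate(steps=30, target=0.8, overrun=1.25):
    """Hypothetical plant: tasks consume `overrun` times what their profiles claim."""
    budget = target                               # initial guess trusts the profiles
    measured = overrun * budget                   # actual utilization fed back
    for _ in range(steps):
        budget = control_step(budget, measured, target)
        measured = overrun * budget
    return budget, measured


final_budget, final_utilization = simulate()
```

In this toy setting the budget shrinks until the measured utilization settles at the target, even though the profiles understate the real demand. That is the kind of tolerance towards misspecified profiles that feedback is meant to provide.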
21 Goals
- Desired properties of the QoS controller
- Low time complexity
- Analytical performance guarantees
- Stability
- Robust against misspecification of resource
profiles and utility profiles
22 DQRAM 1-d Feedback Controller
23 DQRAM 1-d Feedback Controller
- The controller aims to always run the tasks in states for which the total utility is maximized
- The controller also monitors the actual current availability of the resource
- Disturbances in the resource utilization will generally lead to an error in the predicted resource usage of the tasks
- Controller activations
- - Task arrival
- - Task termination
- - End of each task period
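The three activation points can be pictured as an event-driven skeleton. The sketch below is hypothetical: the names `Activation` and `ControllerSkeleton`, the bookkeeping fields, and the placeholder re-optimization step are assumptions and do not describe the actual DQRAM implementation.

```python
from enum import Enum, auto


class Activation(Enum):
    TASK_ARRIVAL = auto()
    TASK_TERMINATION = auto()
    PERIOD_END = auto()


class ControllerSkeleton:
    """Hypothetical skeleton: tracks active tasks and counts activations."""

    def __init__(self):
        self.active = set()
        self.last_usage = {}       # task -> most recently measured resource usage
        self.reoptimizations = 0

    def activate(self, event, task, measured_usage=None):
        if event is Activation.TASK_ARRIVAL:
            self.active.add(task)
        elif event is Activation.TASK_TERMINATION:
            self.active.discard(task)
            self.last_usage.pop(task, None)
        elif event is Activation.PERIOD_END:
            # The end of a period is where measured usage is fed back.
            if task in self.active and measured_usage is not None:
                self.last_usage[task] = measured_usage
        else:
            raise ValueError(f"unknown activation: {event!r}")
        self._reoptimize()

    def _reoptimize(self):
        # Placeholder: a real controller would adjust QoS settings here.
        self.reoptimizations += 1
```

Every activation ends in the same re-optimization hook, so arrivals, terminations, and period boundaries are all handled by one code path.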
24 Properties of DQRAM 1-d
- State-preserving, incremental, tolerant towards misspecifications of resource profiles
- Task arrivals and task terminations can be accommodated easily and efficiently
- Misspecification of resource profiles and instantaneous peaks in resource usage require only incremental changes to the current allocation
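What "incremental" could mean in practice is sketched below: starting from the current allocation, the controller lowers one QoS level at a time until the measured usage fits again, instead of recomputing everything. The function name `shed_load`, the per-task correction factor, and the cheapest-utility-loss rule are assumptions for illustration, not the DQRAM algorithm itself.

```python
def shed_load(profiles, level, measured, capacity):
    """Lower QoS levels one step at a time until the estimated usage fits.

    profiles: {task: [(resource, utility), ...]} sorted by increasing resource.
    level:    {task: current level index}; modified in place.
    measured: {task: measured usage at the current level}.
    Returns the list of tasks that were downgraded, in order.
    """
    # Correction factor: how far real usage deviates from the declared profile.
    factor = {t: measured[t] / profiles[t][level[t]][0] for t in level}

    def usage(t):
        return factor[t] * profiles[t][level[t]][0]

    changes = []
    while sum(usage(t) for t in level) > capacity:
        best, best_cost = None, None
        for t in level:
            i = level[t]
            if i == 0:
                continue                          # cannot go below the lowest level
            freed = factor[t] * (profiles[t][i][0] - profiles[t][i - 1][0])
            lost = profiles[t][i][1] - profiles[t][i - 1][1]
            if freed > 0 and (best is None or lost / freed < best_cost):
                best, best_cost = t, lost / freed
        if best is None:
            break                                 # nothing left to downgrade
        level[best] -= 1
        changes.append(best)
    return changes


profiles = {
    "a": [(1, 1.0), (2, 3.0), (4, 4.0)],
    "b": [(1, 1.0), (3, 2.5)],
}
level = {"a": 2, "b": 1}                          # current allocation is kept as the start
changes = shed_load(profiles, level, measured={"a": 5.0, "b": 3.0}, capacity=7.5)
```

In this example task "a" uses 25% more than its profile declares, and a single downgrade of "a" is enough to bring the system back under capacity while "b" keeps its level.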
25 Properties of DQRAM 1-d
[Figure: Version 1 and Version 2]
26 Properties of DQRAM 1-d
[Figure: utility versus amount of resource, showing the optimal utility curve with its optimality points and the approximation algorithm]
27 Properties of DQRAM 1-d
- Analytical performance guarantees: dynamic and static
[Figure: utility versus amount of resource, showing the optimal utility curve, the approximation algorithm, the dynamic lower bound, and the static lower bound]
28 DQRAM m-d
- Extension to multiple (k) shared resources
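With k shared resources, the utility-maximization problem would carry one capacity constraint per resource. The notation is an assumption introduced here: n tasks, q_i the QoS setting of task i from a finite set Q_i, u_i(q_i) its utility, r_{ij}(q_i) the demand of task i on resource j, and R_j the available amount of resource j.

```latex
\begin{aligned}
\max_{q_1,\dots,q_n} \quad & \sum_{i=1}^{n} u_i(q_i) \\
\text{subject to} \quad & \sum_{i=1}^{n} r_{ij}(q_i) \le R_j, \qquad j = 1,\dots,k, \\
& q_i \in Q_i, \qquad i = 1,\dots,n .
\end{aligned}
```

For k = 1 this reduces to a single capacity constraint, which is the single-resource case.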
29 DQRAM Implementation
- The DQRAM controller has been integrated into the
QARMA resource manager
30 DQRAM Implementation
- In addition, we have integrated the DQRAM controller into the Linux 2.6 kernel
31 DQRAM - Hierarchical Control: Example 1
32 DQRAM - Hierarchical Control: Example 2
33 Summary
- Requirements for a real-time grid middleware
- Presentation of a generic real-time grid architecture
- Overview of our recent progress in algorithms for adaptive resource management
34 Future Work
- Finish up the basic research on algorithms
- Develop real-time grid services