Title: A Complex Adaptive System Approach to QoS Assurance and Stateful Resource Management for Dependable Information Infrastructure (CIP Project)
1 A Complex Adaptive System Approach to QoS Assurance and Stateful Resource Management for Dependable Information Infrastructure (CIP Project)
- Nong Ye (PI)
  - Professor of Industrial Engineering, Affiliated Professor of Computer Science and Engineering
- Ying-Cheng Lai (co-PI)
  - Professor of Electrical Engineering and Mathematics
- Partha Dasgupta (co-PI)
  - Associate Professor of Computer Science and Engineering
- Collaborators: AFRL (John Faust and Pat Hurley)
- October 18, 2002
2 Presentation Outline
- Project overview
- Year 1 work
  - QoS requirements (Nong Ye)
  - Local-level QoS models (router and web server) (Nong Ye)
  - Simulation model of Internet (Nong Ye)
  - Mathematical theories on networks and attacks (Ying-Cheng Lai)
  - Trust and security models of networks (Partha Dasgupta)
- Year 2 work and plan
  - Regional-level QoS models (Nong Ye)
  - Detection of emergent network states (Nong Ye)
  - Mathematical theory on phase transition in networks (Ying-Cheng Lai)
  - Trust and security model of networks (Partha Dasgupta)
3 Project Overview
- Goal
  - Develop the bottom-up self-synchronization of QoS-centric stateful resource management, according to a Complex Adaptive Systems approach, for a dependable information infrastructure that will be used to host network-centric information operations
- Objectives
  - Investigate, implement and test two enabling elements of the dependable information infrastructure
    - Control strategies to enable the bottom-up self-synchronization of QoS-centric stateful resource management
    - Control and communication protocols to embed the control strategies of self-synchronization into the existing information infrastructure, making it dependable at affordable cost
- Year 1 research: local-level QoS and security
- Years 2-3 research: regional-level QoS and security
- Years 4-5 research: global-level QoS and security
4 QoS Requirements
- Without QoS requirements, any QoS level is acceptable
- Sensitivity of various traffic data on computer networks
- QoS attributes
  - Timeliness
  - Precision
  - Accuracy
5 QoS Requirements
- Traffic data classification
  - Technology properties
    - Time dependency
      - Real time (RT): hard constraints on delay and jitter
      - Non-real time (NRT): soft constraints, mostly on delay
    - Symmetry of interaction
      - Symmetric: requests and responses consume comparable amounts of resources
      - Asymmetric: requests are less resource-consuming than responses
  - Human factor properties
    - Data on delay
      - Conventional text and data: < 2-5 sec tolerable, > 5 sec unacceptable
      - Audio: < 0.1-0.5 sec for a real-time impression in virtual reality (VR)
      - Video: less sensitive than audio, < 100 ms for audio and video synchronization
    - Data on jitter
      - Audio: < 20-30 ms for VR, < 100 ms for CD sound, < 400 ms for telephone speech
      - Video: < 50 ms for HDTV, < 100 ms for broadcast TV, < 400 ms for videoconference
    - Data on bit error rate
      - Audio: < 10^-2 for telephone, < 10^-3 for uncompressed CD, < 10^-4 for compressed CD
      - Video: 10^-6 for HDTV, 10^-5 for broadcast TV, 10^-4 for videoconference
6 QoS Requirements
- Traffic data classification
7 QoS Requirements
- Standards of QoS requirements for each traffic class
  - Voice over IP
8 QoS Requirements
- Standards of QoS requirements for each traffic class
  - Video on demand
9 Local-Level QoS Models
- Existing models
  - Best effort (BE): current Internet, FIFO, no resource reservation, no service differentiation, no service guarantee
  - Differentiated services (DS): DiffServ, RFC 2475, per-hop service control, coarse granularity of service differentiation through traffic classification, conditioning, priority queuing, bandwidth allocation by service class, weak service guarantee, stateless
  - Integrated services (IS): IntServ, RFC 1633, end-to-end bandwidth reservation through RSVP, queuing to enforce bandwidth allocation, firm end-to-end per-flow service guarantee, problems in scalability and flexibility
- Goals
  - Minimize execution time
  - Maximize resource utilization
  - Maximize throughput
10 Local-Level QoS Models
- QoS principles
  - Resource agents cannot provide an end-to-end service guarantee to user agents
  - Process agents need to be proactive in seeking the right resource agents to meet their end-to-end QoS requirements
- QoS goal of local-level resource agents
  - Performance stability, and thus predictability, through bounded or least variable performance
  - Service differentiation
  - Guaranteed if admitted
11 Local-Level QoS Models
- QoS model of router
  - QoS model based on feedback control (FB) versus a DS model
    - Goal: bounded delay of high-priority packets
    - State monitored: high-priority queue length
    - PID feedback control of high-priority admission rate (r)
    - Root locus method for optimal control parameters
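The feedback-control idea on this slide can be sketched as a discrete-time PID loop that reads the high-priority queue length and nudges the admission rate r. This is a minimal illustration, not the project's OPNET model: the class name, the incremental update form, the clamping, and the numeric values in the usage lines are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PIDAdmissionController:
    """Hypothetical discrete-time PID loop for the high-priority admission rate."""
    kp: float
    ki: float
    kd: float
    setpoint_bits: float   # target high-priority queue length (assumed set point)
    max_rate: float        # upper bound on the admission rate, bits/sec
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, queue_bits: float, rate: float, dt: float) -> float:
        """Return the new admission rate given the monitored queue length."""
        error = self.setpoint_bits - queue_bits   # positive means room in the queue
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        adjustment = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        # Clamp so the rate never goes negative or above the link capacity.
        return min(self.max_rate, max(0.0, rate + adjustment))


# Illustrative values only: queue above the set point, so the rate is reduced.
controller = PIDAdmissionController(kp=1.0, ki=0.2, kd=0.2,
                                    setpoint_bits=80_000.0, max_rate=640_000.0)
new_rate = controller.update(queue_bits=100_000.0, rate=400_000.0, dt=1.0)
```

In practice the gains would come from a tuning procedure such as the root locus method named above; the sketch only shows where they enter the loop.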
12 Local-Level QoS Models
- QoS model of router
  - QoS model based on adjusted WSPT (A-WSPT) versus a best-effort model
    - Goal: minimize and stabilize delay of high-priority packets
    - A-WSPT scheduling rule
    - Markov decision process for optimal scheduling and admission control
13 Local-Level QoS Models
- QoS models of router
  - OPNET simulation experiments
    - Parameters of router models
      - BE: FIFO queuing, no admission control
      - WSPT and A-WSPT: WSPT and A-WSPT queuing, no admission control, W = 5 for high-priority packets, W = 2 for low-priority packets
      - DS: token rate = 400,000 bits/sec, bucket depth = 100,000 bits, high-priority queue = 100,000 bits, low-priority queue = 450,000 bits
      - FB: Kp = 1.0, Ki = 0.2, Kd = 0.2, control bound value = 80,000 bits; other configurations are the same as those for DS
    - Experiment set-up
      - Each source generates either high-priority packets or low-priority packets, NOT both
      - Inter-arrival time: exponential distribution
      - Packet size: normal distribution, mean = 10,000 bits, standard deviation = 2,000 bits
      - One output interface, service rate = 640,000 bits/sec
      - Total output queue space: 550,000 bits
      - Two types of packet: high priority (ToS value = 7) and low priority (ToS value = 0)
      - Simulation duration: 180 seconds
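The DS configuration on this slide is given as a token rate and a bucket depth. How the OPNET model meters traffic is not stated here, so the following is only a generic token-bucket sketch using those two numbers; the class and method names are hypothetical.

```python
class TokenBucket:
    """Generic token-bucket meter: tokens accrue at `rate` bits/sec up to `depth` bits."""

    def __init__(self, rate: float, depth: float) -> None:
        self.rate = rate
        self.depth = depth
        self.tokens = depth     # assumption: the bucket starts full
        self.last_time = 0.0

    def conforms(self, now: float, packet_bits: float) -> bool:
        """Refill for the elapsed time, then spend tokens if the packet fits."""
        elapsed = now - self.last_time
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last_time = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False   # out of profile; dropping or remarking is a policy choice


# Token rate and bucket depth from the DS parameters; packets of the mean size.
bucket = TokenBucket(rate=400_000.0, depth=100_000.0)
burst = [bucket.conforms(now=0.0, packet_bits=10_000.0) for _ in range(11)]
after_refill = bucket.conforms(now=0.05, packet_bits=10_000.0)
```

With a full bucket, a burst of ten mean-size packets conforms and the eleventh does not until tokens have had time to refill.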
14 Local-Level QoS Models
- QoS models of router
  - OPNET simulation experiments
    - Experimental set-up
15 Local-Level QoS Models
- QoS models of router
  - Simulation results for high-priority packets in the heavy traffic condition
16 Local-Level QoS Models
- QoS models of router
  - Simulation results for high-priority packets in the heavy traffic condition
17 Local-Level QoS Models
- QoS models of router
  - Overall simulation results
    - For the heavy traffic condition
      - Feedback control
        - Shortest time-in-system for high-priority packets, with low variation
        - Lowest packet loss for high-priority packets
        - High throughput for high-priority packets
      - DiffServ
        - Generally similar performance to FB
        - Higher loss of high-priority packets at the output queue
        - Slightly better throughput of high-priority packets
      - WSPT
        - Highest throughput for high-priority traffic
        - Variable time-in-system, because WSPT allows newly arriving packets to push back lower-priority packets
      - A-WSPT
        - Comparable to WSPT but with more stable time-in-system
      - Best effort
        - Similar performance for high- and low-priority packets
    - For the light traffic condition
18 Local-Level QoS Models
- QoS models of web server
  - Web requests with due time
  - Admission control: if completion time > due time, reject
  - QoS models based on production planning for a single machine, parallel machines (cluster of web servers) and serial machines (multiple steps)
    - WSPT: schedule by Wj/Pj
    - ATC: combine WSPT with minimum slack time
    - EDD: schedule by the earliest due date
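The three dispatching rules and the admission test above can be sketched for a single server as follows. The ATC index uses the standard textbook form, (Wj/Pj) * exp(-max(dj - Pj - t, 0) / (K * average P)), which is an assumption about the variant used in the project; the `Request` type, function names, and sample requests are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    weight: float      # Wj
    proc_time: float   # Pj, estimated processing time
    due: float         # due time


def wspt_order(requests):
    """Weighted shortest processing time: largest Wj/Pj first."""
    return sorted(requests, key=lambda r: -r.weight / r.proc_time)


def edd_order(requests):
    """Earliest due date first."""
    return sorted(requests, key=lambda r: r.due)


def atc_index(r, now, k, avg_proc):
    """Textbook apparent-tardiness-cost index: WSPT ratio discounted by slack."""
    slack = max(r.due - r.proc_time - now, 0.0)
    return (r.weight / r.proc_time) * math.exp(-slack / (k * avg_proc))


def atc_order(requests, now, k):
    avg_proc = sum(r.proc_time for r in requests) / len(requests)
    return sorted(requests, key=lambda r: -atc_index(r, now, k, avg_proc))


def admit(r, now, backlog_time):
    """Reject when the projected completion time exceeds the due time."""
    return now + backlog_time + r.proc_time <= r.due


queue = [Request("a", 1, 2.0, 10.0), Request("b", 6, 3.0, 4.0), Request("c", 3, 1.0, 6.0)]
wspt_names = [r.name for r in wspt_order(queue)]
edd_names = [r.name for r in edd_order(queue)]
atc_names = [r.name for r in atc_order(queue, now=0.0, k=0.5)]
```

With a very large K the exponential term approaches 1 and the ATC order collapses to the WSPT order; with a small K the slack term dominates and the order moves toward minimum slack.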
19 Local-Level QoS Models
- QoS models of web server
  - OPNET simulation experiments
    - Five models: BE, DS, WSPT, ATC, EDD
    - Three scenarios
      - Heavy traffic
        - Traffic generation
          - Weight: 1, 2, 3, 6
          - Packet inter-arrival time distribution: exponential(0.04) for W1, W2, W3, and exponential(0.2) for W6
          - Packet size distribution: Normal(6000, 1000) bits
          - Traffic generated: 480,000 bits per second on average
          - Due date distribution: Normal(0.8, 0.08)
        - Queue
          - Service rate: 240,000 bits per second
          - Capacity: 512,000 bits. For DS, capacity of the high-priority queue is 32,000 and capacity of the low-priority queue is 480,000
          - K value for ATC: 1000
      - Longer due time
        - Traffic generation: due date distribution of Normal(2, 0.2)
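The stated average of 480,000 bits per second follows from the traffic parameters if the exponential parameter is read as the mean inter-arrival time in seconds, which is an assumption here. A short check of that arithmetic, with a hypothetical helper name:

```python
def offered_load_bits_per_sec(mean_interarrivals_sec, mean_packet_bits):
    """Sum of per-source loads: mean packet size divided by mean inter-arrival time."""
    return sum(mean_packet_bits / gap for gap in mean_interarrivals_sec)


# Three sources at a 0.04 s mean gap (W1, W2, W3) and one at 0.2 s (W6),
# all sending packets with a mean size of 6,000 bits.
load = offered_load_bits_per_sec([0.04, 0.04, 0.04, 0.2], 6000.0)
utilization = load / 240_000.0   # offered load relative to the service rate
```

Under that reading the offered load is twice the service rate, which is what makes this the heavy-traffic scenario.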
20 Local-Level QoS Models
- QoS models of web server
  - Simulation results for the heavy traffic condition
21 Local-Level QoS Models
- QoS models of web server
  - Overall simulation results
    - Effects of due time and admission control: fewer drops at the queue
    - Effects of longer due time: longer queue length
    - Effects of less queue capacity
      - Smaller lateness of all traffic (W6, W3 and W1) for all five models, because of a smaller queue
    - DS drops more W6
    - Production planning admission control keeps the lateness of all requests < 0
    - For W6 requests, WSPT/ATC is similar to DS in producing the best performance
    - For W3 and W1 requests, WSPT/ATC is better than DS
22 Simulation Model of Internet
- Goals
  - Build a simulation model of the Internet using the scale-free model of the Internet
  - Discover data collection points, metrics and analytical techniques to detect emergent network states
- Research stages
23 Simulation Model of Internet
- Research stages
  - Stage 1
    - Write a program that implements the scale-free algorithm to build up the Internet topology
      - Maximum number of nodes: n = 5,000
      - Number of connections: m = 1
      - Initial number of nodes: n0 = m
  - Stage 2
    - Classify devices as follows
      - For all nodes with connectivity = 1, assign the workstation model to 70% of nodes and the server model to 30%
      - Within server nodes, assign types: 40% HTTP, 40% e-mail, 10% FTP, 10% Telnet
      - For all nodes with connectivity > 16, assume an ISP and assign the ISP router model (black-box ISP)
      - For all remaining nodes, assign the switch model
    - For each ISP router, recursively define a sub-network of all nodes connected to this router and its children, etc.
    - Define the top network as all sub-networks and the links connecting them (these are router-to-router links)
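Stages 1 and 2 can be sketched with a preferential-attachment generator and a degree-based classifier. This is an illustration under assumptions, not the project's program: the function names are hypothetical, the seed graph is two linked nodes (equivalent to n0 = 1 with the first newcomer attaching to the only existing node), and the split of servers into HTTP, e-mail, FTP and Telnet is omitted for brevity.

```python
import random
from collections import Counter


def scale_free_links(n, seed=0):
    """Preferential attachment with m = 1: each new node links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    links = [(0, 1)]
    # A node appears in `endpoints` once per incident link, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        links.append((new, target))
        endpoints.extend((new, target))
    return links


def classify(links, n, isp_threshold=16, seed=0):
    """Assign a device model to each node from its connectivity (degree)."""
    rng = random.Random(seed)
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    roles = {}
    for node in range(n):
        if degree[node] == 1:
            roles[node] = "workstation" if rng.random() < 0.7 else "server"
        elif degree[node] > isp_threshold:
            roles[node] = "isp_router"
        else:
            roles[node] = "switch"
    return roles, degree


links = scale_free_links(5000)
roles, degree = classify(links, 5000)
role_counts = Counter(roles.values())
```

The exact counts per role depend on the random seed, so `role_counts` is best inspected per run and not treated as a fixed result.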
24 Simulation Model of Internet
- Research stages
  - Stage 3
    - Generate Java classes of the Modeler Document Data Type using Oracle's XML Class Generator for Java
    - Use the classes to generate an XML document of the Internet topology
    - Import the XML document into OPNET and verify links
  - Stage 4
    - Create probe models to collect metrics
    - Collect baseline system metrics
  - Stage 5
    - Create scenarios with random failure
    - Create scenarios with planned attack
    - Collect metrics
  - Stage 6
    - Detect emergent network states using analytical techniques
25 Simulation Model of Internet
- Topology
  - 5,000 devices
  - 32 ISP routers
  - 1,006 servers (30%)
  - Min subnet: 38 devices
  - Max subnet: 441 devices
26 Simulation Model of Internet
27 Simulation Model of Internet
- Simulation set-up
  - Simulations run for 6 minutes each
  - All workstations initialize between 30 seconds and 4.5 minutes
  - ISP routers
    - Each ISP router has a number of interfaces, each of which represents a point of access into the ISP
    - Min (max) number of interfaces on a router: 17 (77)
    - Total number of interfaces on the network: 1,027
    - The RIP routing protocol is implemented on each interface
    - RIP creates dynamic routing tables with all routes to destination
    - Routing uses a FIFO queuing scheme
    - Buffer size: 1 KB, reduced for attack/failure
    - Packets are dropped when the buffer is full
28 Simulation Model of Internet
- Simulation set-up
  - Workstations
29 Simulation Model of Internet
- Simulation set-up
  - Servers
30 Simulation Model of Internet
- Experimental conditions
  - Independent variables
    - Under attack, a device operates at a reduced service rate
    - Under failure, a device ceases to process traffic
31 Simulation Model of Internet
- Experimental conditions
  - Dependent variables
32 Simulation Model of Internet
33 Simulation Model of Internet
34 Simulation Model of Internet
- Some traffic data collected
  - Baseline traffic
35 Simulation Model of Internet
- Some traffic data collected
  - Global metric: IP packets dropped
36 Simulation Model of Internet
- Some traffic data collected
  - Regional metric: IP packets received at ISP
37 Simulation Model of Internet
- Some traffic data collected
  - Local metric: traffic received at interface
38 Simulation Model of Internet
- Some traffic data collected
  - Local metric: traffic dropped at interface
39 Simulation Model of Internet
- Some traffic data collected
  - Regional metric: processing delay at ISP
40 Detection of Emergent Network States
- Multivariate statistical process control techniques to detect anomalies
  - Chi-square distance test
  - MEWMA
- Multivariate factor analysis to identify significant factors
  - ANOVA
- Nonlinear time-series analysis techniques to detect emergent behavior
  - Embedded coordinate technique: find correlation dimension, identify system dimensionality; requires a deterministic system present in the model
  - Multivariate autoregressive (MVAR) models: determine coupling strengths between regions
  - Synchronization technique: "spike synchronization detection" or "unitary events detection", tells whether there is synchronization between two time series that consist of spikes at random times
  - Hilbert space technique: works for stochastic models
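As one illustration of the first technique listed, a chi-square distance test can be sketched as follows. The slide does not give the formula, so this assumes a common form: the distance is the sum over variables of (x - mean)^2 / mean against an attack-free baseline, and an observation is flagged when it exceeds the baseline mean distance plus three standard deviations. The function names and sample data are hypothetical.

```python
import statistics


def fit_baseline(samples):
    """Per-variable mean of attack-free observations (each a tuple of metrics)."""
    return [statistics.mean(column) for column in zip(*samples)]


def chi_square_distance(observation, baseline_mean):
    """Sum over variables of (x - mean)^2 / mean; assumes positive baseline means."""
    return sum((x - m) ** 2 / m for x, m in zip(observation, baseline_mean))


def upper_control_limit(samples, baseline_mean, k=3.0):
    """Mean plus k standard deviations of the baseline distances (k = 3 is assumed)."""
    scores = [chi_square_distance(s, baseline_mean) for s in samples]
    return statistics.mean(scores) + k * statistics.pstdev(scores)


# Made-up (packets received, packets dropped) pairs for illustration only.
baseline = [(100, 5), (110, 4), (90, 6), (105, 5), (95, 5)]
mean = fit_baseline(baseline)
limit = upper_control_limit(baseline, mean)
normal_score = chi_square_distance((102, 5), mean)
attack_score = chi_square_distance((100, 30), mean)
```

An observation is treated as anomalous when its distance exceeds `limit`; in this toy data the sharp rise in dropped packets does, while the near-baseline observation does not.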
41 Regional-Level QoS Models
- Regional-level systems
  - Local area networks
  - Administrative domains
- Existing work
  - Centralized optimization, e.g., computational grids
  - Allocation and scheduling are fundamental to performance
    - Allocation of data and computation in space
      - Select available resources for processes
      - Assign processes to resources
      - Distribute processes and data
    - Scheduling data and computation over time
      - Order processes on resources
      - Order communications between processes
  - Objectives
    - Promote the performance of the SYSTEM
      - Job schedulers: maximize throughput, minimize communication cost
      - Resource schedulers: maximize resource utilization
    - Promote the performance of the INDIVIDUAL APPLICATIONS
      - Application schedulers: optimize performance, e.g., execution time, resolution, speed, cost, etc.
42 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - MPP (massively parallel processor) schedulers produce poor performance for computational grids
43 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
44 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Program model
        - Represent programs in terms of their resource requirements
        - Build a program dependency graph of phased tasks
      - Performance model
        - Use the program dependency graph, parameterized during execution, as a performance model to predict execution time
        - Use a generic model, e.g., execution time = computation + communication
        - Input the data-flow program graph to an expert system
      - Scheduling policy
        - Choose the best among candidate schedules based on performance criteria
        - Centralized, FCFS
        - Load balancing
45 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Example: AppLeS
        - Framework and a testbed
46 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Example: AppLeS
        - Strategy to develop a schedule
47 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Example: AppLeS
        - Cost model to evaluate strip decomposition
48 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Example: AppLeS
        - Methods of strip decomposition
49 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Example: AppLeS
        - Performance results
50 Regional-Level QoS Models
- Existing work
  - High performance schedulers
    - Grid schedulers
      - Challenges
        - Complexity of the scheduling problem
        - Variations in deliverable resource performance due to resource sharing
        - Prediction of programs' resource requirements
        - Hardware and software heterogeneity
51 Regional-Level QoS Models
- Principles for our regional-level QoS models
  - Simplify the scheduling problem through resource standardization, i.e., stabilizing performance of resources to make them standard parts
  - Develop new scheduling and control strategies to achieve the objective of performance stability
  - Call on reserved, redundant resources to achieve performance stability under failure/attack
  - Make dynamic resource state available to process agents
  - Process agents plan ahead to achieve performance objectives: a distributed decomposition of the scheduling problem complexity
  - Make network policies accordingly