Title: Self-healing networks
1 Self-healing networks
- When the going gets tough, the tough get going
2001 IPA Spring Days on Security
L. Spaanenburg, Groningen University, Department of Computing Science, P.O. Box 800, 9700 AV Groningen. Mail: ben_at_cs.rug.nl, http://www.cs.rug.nl/ben
2 Motivation
What is security?
- Security involves guaranteed access to all resources at all times, with top quality
- Threats come from outside and from inside
- Here: internal diseases only
3 Agenda
What we need and what we can't
- The nature of the net
- Disasters with central control
- The nature of self-healing
- In-line monitoring
- A hardware / software perspective
- Research view
4 The weak spot
It is the small dog that bites!
- A network is billions of tightly connected, distributed, heterogeneous components
- Things happen on a wide time/spatial scale with massive interaction
- A local disturbance can spread widely in zero time
- Relationships and interdependencies are too complex for mathematical theories
5 User's perspective on networks
An integrated Power, Information and Communication technology
6 Telephone network
A network can be a tree with central control
[Figure labels: long distance, 1st-order exchange; medium distance, 2nd-order exchange; short distance, local exchange; connection]
7 Data Network
Connectionless communication by broadcast
[Figure labels: Host, Router, Subnet, LAN]
8 Means of Communication
Sigh, there are so many ways to communicate
- Synchronous
  - PDH: Plesiochronous Digital Hierarchy
  - SDH: Synchronous Digital Hierarchy
  - ISDN: Integrated Services Digital Network
- Asynchronous
  - FDDI: Fiber Distributed Data Interface
  - FR: Frame Relay
  - ATM: Asynchronous Transfer Mode
9 Sources of Abnormality
What goes wrong, will go wrong
- Attacks from the outside world (service attacks)
- Hiccups in the network communication
- Failures on the network nodes
- It's a detection problem!
10 The Keeler-Allston disaster
The network is vulnerable to local abnormalities
- On 10 August 1996, the Keeler-Allston 500 kV power line tripped, creating a voltage depression, and the McNary Dam went to maximum
- The Ross-Lexington 230 kV line also tripped and pushed the McNary Dam over the edge
- The McNary Dam set off oscillations that went to 500 MW within 1.5 minutes
- The North-South Pacific Intertie isolated 11 US states and 2 Canadian provinces
11 The 1998 Galactic page out
The weak belly of the Earth
- In May 1998, the Galaxy-IV satellite was disabled by unknown causes
- US National Public Radio and 40M pagers went out, airline flights were delayed and data networks had to be manually reconfigured
- Many geo-stationary satellites are at 800-1400 km: 13 (60-), 35 (70-), 69 (80-) and 250 (90-)
- 10 million pieces of debris > 1 mm
12 Other fault cascades
Cause/effect relations occur frequently
- Finagle's Law: anything that can go wrong, will
- Antibiotics cause resistance (DDT)
- Code replication also works for errors
13 Self-healing in history
The name has been used before
- 1993: AT&T announced the self-healing wireless network
- 1998: SUN bought the RedCape Policy Framework for self-healing software
- 1998: HP released the self-healing version of OpenView Network Node Manager
- 2001: Concord Com. announced self-healing for the home
14 Self-Healing ingredients
Self-healing = Detection + Diagnosis + Self-Repair
- Application: handling the communication
- Presentation: message formatting
- Session: controls traffic between parties
- Transport: converts packets into frames and vice versa
- Network: controls frame routing
- Data Link: frames of bit sequences
- Physical: relays physical quantities
[Figure labels: Network Test, Node Test, Reconfigure]
15 An Initiative in Self-Healing
The Complex Interactive Networks/Systems Initiative
- The CIN/SI is funded by the Electric Power Research Institute and the US Dept. of Defense as part of the Government-Industry Collaborative University Research program
- 28 universities in 6 consortia started in Spring 1999 to spend $30 M in 5 years
- The approach is multi-agent technology
16 CIN/SI consortia
The different aspects of self-healing
- CalTech: CIN Mathematical Foundation
- CMU: Context-dependent Agents
- Cornell: Failure Minimization
- Harvard: Modeling and Diagnosis
- Purdue: Intelligent Management
- Washington: Defense to Attacks
17 Key issues
Central control comes too late by definition
- Pre-programming misses the target for lack of context dependence
- No damage would have occurred if the load on the McNary Dam had decreased by 0.4% during the next 30 minutes
- Local agents making real-time decisions would have prevented the Keeler-Allston disaster
18 Basic agent types
What are agents?
- Agents are called cognitive or rational when equipped with clear rules and algorithms
- Agents are called reactive when their functioning depends on the interrogation of the environment
- Both types of agents are required on the decision-making layers, which handle reaction, coordination and deliberation respectively
19 CIN/SI architecture (1)
Operational control of the power plant
[Figure labels: Triggering events; Plans/Decisions; Events/Alarm Filtering Agents; Model Update Agents; Command Agents; Controls; Events/alarms; Fault Isolation Agents; Frequency Stability Agents; Protection Agents; Generation Agents; Power System]
20 CIN/SI architecture (2)
Strategic management of the power grid
[Figure labels: Hidden Failure Monitoring Agents; Reconfiguration Agents; Restoration Agents; Vulnerability Assessment Agents; Events Identification Agents; Planning Agents; Triggering events; Plans/Decisions; Events/Alarm Filtering Agents; Model Update Agents; Command Agents]
21 Monitoring the process
Strategic decisions on tactical control
[Figure labels: Monitor, Process, Control, Actuator, Sensor]
22 The network emphasis
The network glues the agents together
[Figure label: Network]
23 Defect loses all
Majority voting is a centralized consensus scheme
- But what we need is
  - Mutual observation between nodes
  - Group decision of testing agents
  - Implied reconfiguration of the network
- How can we facilitate testing with agent properties?
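The three ingredients above can be sketched in a few lines. This is only an illustration under stated assumptions: the node names, the `observations` table and the helper functions `group_decision` and `reconfigure` are hypothetical, and the tally is written as one function for brevity, whereas in a real network each group of neighbours would run it locally.

```python
from collections import defaultdict

def group_decision(observations):
    """observations maps (observer, target) -> True if the target passed the test.
    A target is declared faulty when more observers saw it fail than pass."""
    passed = defaultdict(int)
    failed = defaultdict(int)
    for (_observer, target), ok in observations.items():
        if ok:
            passed[target] += 1
        else:
            failed[target] += 1
    return {t for t in failed if failed[t] > passed[t]}

def reconfigure(links, faulty):
    """Implied reconfiguration: drop every link that touches a faulty node."""
    return {(a, b) for (a, b) in links if a not in faulty and b not in faulty}

# Hypothetical four-node network in which node C misbehaves.
links = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")}
observations = {
    ("A", "B"): True, ("C", "B"): True,
    ("B", "C"): False, ("D", "C"): False, ("A", "C"): False,
    ("C", "D"): True, ("A", "D"): True,
    ("B", "A"): True, ("D", "A"): True, ("C", "A"): True,
}

faulty = group_decision(observations)
healthy_links = reconfigure(links, faulty)
```

Here the neighbours of C agree that it fails, so C is isolated and only the links A-B and D-A remain.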
24 Agent characteristics
What is security?
- Sensors: mouse, messages, ..., other agents
- Effectors: messages, move, change appearance, speak
- Behaviour: Independent, Reactive, Proactive, Social
25 Built-in Block Observation
Testing complex systems requires autonomy
[Figure labels: generator, process, verifier]
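The generator / process / verifier arrangement can be sketched as a small self-test loop. Everything concrete here is an assumption made for illustration: the block under test is a hypothetical 4-bit incrementer, the generator is a plain counter, and the verifier compacts the responses with a simple rotate-and-XOR signature rather than any specific scheme from the talk.

```python
def generator(n_patterns, width=4):
    """Yield test patterns; a plain counter stands in for a real pattern source."""
    for i in range(n_patterns):
        yield i % (1 << width)

def process(x):
    """Hypothetical block under test: a 4-bit incrementer."""
    return (x + 1) & 0xF

def faulty_process(x):
    """The same block with output bit 0 stuck at zero."""
    return process(x) & 0xE

def verifier(responses, width=16):
    """Compact all responses into one signature (rotate left by one, then XOR)."""
    mask = (1 << width) - 1
    signature = 0
    for r in responses:
        signature = ((signature << 1) | (signature >> (width - 1))) & mask
        signature ^= r
    return signature

def self_test(block, golden, n_patterns=16):
    """Run the block over all patterns and compare its signature with the golden one."""
    responses = (block(p) for p in generator(n_patterns))
    return verifier(responses) == golden

# The golden signature is recorded once from a known-good block.
golden = verifier(process(p) for p in generator(16))
```

With these definitions `self_test(process, golden)` passes and `self_test(faulty_process, golden)` fails. A compactor this simple can alias (different response streams giving the same signature), which is one reason shift registers with feedback are normally preferred.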
26 Linear Feedback Shift-register
Generation of ordered bit strings by EXORs
- When data flows over identical nodes, the typical function can be characterized by the feedback polynomial
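A minimal sketch of such a register, assuming a 4-bit Fibonacci-style LFSR whose feedback is the EXOR of stages 4 and 3 (a primitive degree-4 polynomial such as x^4 + x^3 + 1; conventions for writing the polynomial vary). The function name `lfsr` and its parameters are illustrative.

```python
from itertools import islice

def lfsr(seed, taps, width):
    """Yield successive states of a shift register with EXOR feedback.
    taps are 1-based stage numbers whose bits are EXORed into the new bit."""
    mask = (1 << width) - 1
    state = seed & mask
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask

# First 16 states for seed 1: the register visits all 15 non-zero
# states once and then returns to the seed.
states = list(islice(lfsr(seed=1, taps=(4, 3), width=4), 16))
```

The ordering of the states is fixed entirely by the choice of taps, which is why the feedback polynomial is enough to characterize the register.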
27 Friedmann model
The aim is for a locally compacted set of patterns
[Figure labels: Process, I, O, Q]
28 A basic function
Prototypical software on a small PIC controller
- A simple low-pass filter
- Takes a data sampling routine, a multiplying adder and a final function 1/N
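The three parts named above can be sketched as follows. This is Python rather than PIC code, and the window length `N`, the ring buffer and the function names are assumptions for illustration only.

```python
from collections import deque

N = 4  # window length (assumption)
samples = deque([0.0] * N, maxlen=N)  # ring buffer holding the last N samples

def sample(value):
    """Data sampling routine: store the newest sample, dropping the oldest."""
    samples.append(value)

def multiply_add(values, weights):
    """Multiplying adder: accumulate value * weight over the window."""
    acc = 0.0
    for v, w in zip(values, weights):
        acc += v * w
    return acc

def low_pass():
    """Final function 1/N: with unit weights this is a moving average."""
    return multiply_add(samples, [1.0] * N) / N

for v in (4.0, 8.0, 12.0, 16.0):
    sample(v)
first = low_pass()   # average of 4, 8, 12, 16
sample(20.0)
second = low_pass()  # average of 8, 12, 16, 20
```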
29 A neuron
Intelligence can be built from filtering
- A simple neuron
- Is similar to the low-pass filter except for the incoming data
- Operates from the same input data ring-buffer
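One way to read the similarity: the neuron reuses the multiply-add over a ring buffer, but with individual weights, a bias, and a squashing function in place of 1/N. The sketch below assumes a logistic activation and hand-picked weights; neither is taken from the talk.

```python
import math
from collections import deque

def neuron(inputs, weights, bias=0.0):
    """Weighted sum over the input buffer followed by a logistic squashing function."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w
    return 1.0 / (1.0 + math.exp(-acc))

# Ring buffer with the last three samples (assumed window length).
buffer = deque([0.0] * 3, maxlen=3)
for v in (1.0, 2.0, 3.0):
    buffer.append(v)

# Illustrative weights: 0.5*1 - 1.0*2 + 0.5*3 = 0, so the output sits at mid-scale.
y = neuron(buffer, [0.5, -1.0, 0.5])
```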
30 A neural network
Where there is one neuron, there can be more
- A feed-forward network
- Differs only in the layer-by-layer switching of the I/O-blocks
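The layer-by-layer switching can be sketched by letting the output block of one layer become the input block of the next. The logistic neuron and the hand-picked weights (which happen to realize an exclusive-or) are assumptions chosen for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum plus bias, squashed by a logistic function."""
    acc = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-acc))

def feed_forward(inputs, layers):
    """layers is a list of layers; each layer is a list of (weights, bias) pairs."""
    block = list(inputs)
    for layer in layers:
        # Switch blocks: this layer's outputs are the next layer's inputs.
        block = [neuron(block, weights, bias) for weights, bias in layer]
    return block

# Hand-picked weights: hidden units approximate OR and AND,
# the output unit approximates "OR and not AND".
xor_layers = [
    [([20.0, 20.0], -10.0), ([20.0, 20.0], -30.0)],
    [([20.0, -20.0], -10.0)],
]

outputs = [round(feed_forward(pair, xor_layers)[0])
           for pair in ((0, 0), (0, 1), (1, 0), (1, 1))]
```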
31 Non-Linear Feedback SR
Generation of ordered patterns by Correlators
- When data flows over identical nodes, the typical function can be characterized by the globally recurrent neural network
32 Neural Observation
Analog correlation looks like digital EXOR
- Analog correlation is about finding the functional similarity
- Digital correlation is the same except for the effect of crisping
- Random access storage is always larger than storage of an ordered function
- The neurally approximated function allows for a dense storage of ordered I/O-pairs
33 Data-Flow Architecture
Data discrepancy is low-level abnormal behavior
- When data flows over identical nodes, the typical function can be characterized
- Built-In Logic Block Observation
- The BIFBO can also be shared with neighboring nodes
- Built-In Function Block Observation
- The local test does not differentiate between hardware and software
34 Question 1
Is there an abstractional test?
- If you cannot test it, then it's not worth designing it.
- Hierarchical design needs a hierarchical test.
- Abstraction gives a condensed view of reality.
- Abstraction provides for scalability.
35 Question 2
Is feature interaction really a static problem?
- Interaction is good, conflicts less so
- If resources have a state, access should be bounded by state
- Conflicting services basically pose a scheduling problem
- It's hard to schedule over an arbitrary network
36 Question 3
Do neural networks provide for a built-in test?
- Design should be scalable; test is no exception.
- Detection can do without diagnosis. Diagnosis cannot go without detection.
- Testing can be based on area (coverage) or on frontier (sensitivity)
- The boundary between software and hardware is still moving