Title: Seamless Detection of Link and Node Failures for Local Protection in MPLS
1Seamless Detection of Link and Node Failures for
Local Protection in MPLS
- Zartash Afzal Uzmi
- Computer Science and Engineering
- Lahore University of Management Sciences (LUMS)
- Visiting Professor, Chonbuk National University
2Outline
- Background
- Forwarding and Routing in IP and MPLS Networks
- Network Service Requirements
- Protection Routing in MPLS
- Terminology: Types of Backup Paths
- Backup Bandwidth Sharing
- Activation sets
- Failures and Backup Path Activation
- Distinguishable Failure Events: Ideal Case
- Actual Failures
- Control Plane Mechanism
- Outline of Proof
4Forwarding and Routing
- Forwarding
  - Passing a packet to the next-hop router
- Routing
  - Computing the best path to the destination
- IP routing includes routing and forwarding
  - Each router makes the routing decision
  - Each router makes the forwarding decision
  - IP routing is hop-by-hop
- MPLS routing
  - Only one router (the source) makes the routing decision
  - Intermediate routers make the forwarding decision
  - An MPLS path, or virtual circuit, from source to destination is created and is called an LSP (label switched path)
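The contrast between hop-by-hop IP forwarding and label switching along an LSP can be illustrated with a minimal sketch. The node names, table layouts, and label values below are hypothetical and chosen only for illustration; real routers use longest-prefix matching and signaling protocols that are not modeled here.

```python
def forward_ip(routing_tables, src, dst):
    """Hop-by-hop: every router consults its own table for the destination."""
    path, node = [src], src
    while node != dst:
        node = routing_tables[node][dst]   # per-router routing decision
        path.append(node)
    return path


def forward_mpls(label_tables, ingress, next_hop, label):
    """Label switching: the ingress picks the LSP; the rest only swap labels."""
    path, node = [ingress, next_hop], next_hop
    while label is not None:               # None marks the end of the LSP
        node, label = label_tables[node][label]
        path.append(node)
    return path


# Hypothetical tables for a path S -> 1 -> 2 -> D.
routing_tables = {"S": {"D": "1"}, "1": {"D": "2"}, "2": {"D": "D"}}
label_tables = {"1": {20: ("2", 30)}, "2": {30: ("D", None)}}

ip_path = forward_ip(routing_tables, "S", "D")
lsp_path = forward_mpls(label_tables, "S", "1", 20)
```

Both calls traverse the same nodes, but in the MPLS case only the source chose the path; the intermediate routers looked up a label rather than the destination.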
5Network Service Requirements
- Bandwidth-Guaranteed Primary Paths
  - MPLS can establish bandwidth-guaranteed paths
- Bandwidth-Guaranteed Backup Paths
  - BW remains provisioned in case of network failure
  - Two options for recovery from network failure
    - Compute backup paths AFTER failures occur
    - Compute and install PRESET backup paths
- Minimal Recovery Latency
  - Recovery latency is the time that elapses between
    - the occurrence of a failure, and
    - the diversion of network traffic onto a new path
Preset backup paths are needed for minimal latency
7Protection in MPLS: Preset Backup Paths
[Figure: network with nodes S, 1, 2, 3, D showing a primary path and a backup path, contrasting Local Protection with Path Protection]
This type of path protection takes 100s of ms.
We need local protection to switch quickly onto backup paths!
8nhop and nnhop paths
LOCAL PROTECTION (showing one LSP only)
All links and all nodes are protected!
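[Figure: one LSP over nodes A, B, C, D, E, with an nhop backup path and an nnhop backup path originating at the PLR]
PLR: Point of Local Repair
- nhop protects a link only, e.g., (D,E)
- nnhop protects a link and the node after it, e.g., link (C,D) and node D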
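What each backup type protects can be sketched in a few lines. This is an illustrative sketch only: it assumes the primary LSP visits A, B, C, D, E in that order (consistent with the links named on the slide), and the function name and dictionary keys are hypothetical.

```python
def local_backups(primary, plr):
    """List what the nhop and nnhop backups at a given PLR would protect.

    primary: ordered list of nodes on the primary LSP.
    plr:     the point of local repair (a node on the primary path).
    """
    i = primary.index(plr)
    backups = {}
    if i + 1 < len(primary):
        # nhop: bypasses the downstream link and rejoins at the next hop.
        backups["nhop"] = {
            "protects_link": (plr, primary[i + 1]),
            "protects_node": None,
            "merge_point": primary[i + 1],
        }
    if i + 2 < len(primary):
        # nnhop: bypasses the downstream link AND the next node,
        # rejoining the primary path at the next-next hop.
        backups["nnhop"] = {
            "protects_link": (plr, primary[i + 1]),
            "protects_node": primary[i + 1],
            "merge_point": primary[i + 2],
        }
    return backups


primary = ["A", "B", "C", "D", "E"]   # assumed node order
at_c = local_backups(primary, "C")
at_d = local_backups(primary, "D")
```

With this ordering, the PLR at C has an nnhop backup covering link (C,D) and node D, while the PLR at D can only have an nhop backup covering link (D,E), since E is the last node.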
9Opportunity cost of backup paths
- Protection requires that backup paths are set up in advance
- Upon failure, traffic is promptly switched onto preset backup paths
- Bandwidth must be reserved for all backup paths
  - This reduces the number of primary LSPs that can otherwise be placed on the network
- Can we reduce the amount of backup bandwidth but still provide guaranteed backups?
  - YES: try to share the bandwidth along backup paths
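10BW Sharing in Backup Paths
[Figure: LSP1 (BW X) uses link (A,B) and LSP2 (BW Y) uses link (C,D); their backup paths run through nodes E, F, G. Links carrying a single backup path reserve X or Y; the link common to both backup paths reserves max(X, Y) instead of X+Y]
Sharing is possible IF links (A,B) and (C,D) do not fail simultaneously!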
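The max(X, Y) rule can be sketched as a per-link computation: sum the backup bandwidth per failure, then take the maximum over failures. The sketch below assumes at most one protected element fails at a time; the backup routes and bandwidth values are hypothetical, since the figure's exact topology is not recoverable.

```python
from collections import defaultdict


def backup_reservations(backups, share=True):
    """Compute backup bandwidth to reserve on each link.

    backups: list of (protected_element, bandwidth, links_on_backup_path).
    share:   if True, backups for different failures share bandwidth.
    """
    # load[link][element] = bandwidth landing on `link` if `element` fails
    load = defaultdict(lambda: defaultdict(float))
    for element, bw, links in backups:
        for link in links:
            load[link][element] += bw
    combine = max if share else sum
    return {link: combine(per_failure.values())
            for link, per_failure in load.items()}


X, Y = 10.0, 6.0
backups = [
    # LSP1's backup for link (A,B); the route is a hypothetical example
    (("A", "B"), X, [("A", "E"), ("E", "F"), ("F", "B")]),
    # LSP2's backup for link (C,D); the route is a hypothetical example
    (("C", "D"), Y, [("C", "E"), ("E", "F"), ("F", "D")]),
]
shared = backup_reservations(backups)
unshared = backup_reservations(backups, share=False)
```

On the common link (E,F), sharing reserves max(X, Y) while the unshared variant reserves X + Y; links used by only one backup path are unaffected.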
11Activation Sets
Can backup paths always share the bandwidth?
[Figure: two copies of a topology with nodes A, B, C, D, E; one highlights the activation set for node B, the other the activation set for link (A,B)]
Backup paths in the same activation set MUST NOT share bandwidth!
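A minimal sketch of how activation sets could be built and used as a sharing test. It assumes each backup path is activated only by the failure of an element it protects; the backup names and the elements they protect are hypothetical.

```python
from collections import defaultdict


def activation_sets(backups):
    """Group backup paths by the failure that activates them.

    backups: dict mapping backup name -> set of protected elements.
    """
    sets = defaultdict(set)
    for name, elements in backups.items():
        for element in elements:
            sets[element].add(name)
    return dict(sets)


def may_share(backups, a, b):
    """Two backups may share bandwidth only if no single failure activates both."""
    return backups[a].isdisjoint(backups[b])


# Hypothetical backups around node B.
backups = {
    "nnhop_A": {("link", "A", "B"), ("node", "B")},
    "nhop_A": {("link", "A", "B")},
    "nnhop_E": {("link", "E", "B"), ("node", "B")},
}
sets = activation_sets(backups)
```

Here `nnhop_A` and `nnhop_E` both belong to the activation set for node B and so cannot share, while `nhop_A` and `nnhop_E` appear in no common activation set and may share.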
13Distinguishable Failure Events
Point of local repair (PLR) somehow knows the type of failure!
Focus on link (I,J) and node J, and recall:
- nhop protects a link only, i.e., (I,J)
- nnhop protects link (I,J) and node J
[Figure: nodes A, I, J, K, L; node I is the PLR (Point of Local Repair); p1 is the nnhop path, p2 is the nhop path, and p3 is a third backup path]
- If node I finds that link (I,J) has failed: p1 and p2 are activated
- If node I finds that node J has failed: ONLY p1 is activated
p2 may share bandwidth with other nnhops that protect node J
14Actual Failures
- Consider the failure of link (I,J)
  - Both p1 and p2 need to be activated anyway!
  - Knowing that this is a link failure will not save anything
- Consider the failure of node J
  - Only p1 needs to be activated (if the failure type is known!)
- What if node I doesn't know the type of failure?
  - Two options
    - Wait to discover whether it was a link or a node failure
      - High recovery latency (BAD!)
    - Activate both p1 and p2 instantaneously
      - Now p2 will not be able to share with p3 (BAD!)
15Control Plane Mechanism
- Routing strategy
  - Do not oversubscribe
  - Use sharing as if adjacent nodes can distinguish node failures from link failures
  - That is, provide sharing between p2 and p3
- In reality
  - PLRs will not be able to disambiguate link/node failures
  - Activate p1 and p2 (assuming a link failure: the worst case!)
  - If the link had failed
    - p1 and p2 really needed to be activated: we are okay!
  - If the node had failed
    - p2 (nhop) has been activated by mistake
    - You may notice a reservation violation at some nodes (where the backup paths p2 and p3 were sharing)
    - Abort all nhop paths that are violating the reservations
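The behaviour at a single shared link can be sketched as follows. This is an illustrative model, not the actual signaling: it assumes p1, p2, and p3 all traverse the same link, that p3 is an nnhop path, and the bandwidth values, class, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str
    kind: str    # "nhop" or "nnhop"
    bw: float


def settle_link(reserved, active):
    """Return (kept, aborted) backups on one link after activation.

    If the activated load exceeds the reservation, every nhop path on the
    link is aborted, since it must have been activated by mistake.
    """
    load = sum(b.bw for b in active)
    if load <= reserved:
        return list(active), []
    kept = [b for b in active if b.kind != "nhop"]
    aborted = [b for b in active if b.kind == "nhop"]
    return kept, aborted


p1 = Backup("p1", "nnhop", 4.0)
p2 = Backup("p2", "nhop", 3.0)
p3 = Backup("p3", "nnhop", 5.0)   # assumed nnhop protecting node J

# Reservation computed as if failures were distinguishable (p2 shares with p3).
reserved = max(p1.bw + p2.bw, p1.bw + p3.bw)

link_case = settle_link(reserved, [p1, p2])        # link (I,J) failed
node_case = settle_link(reserved, [p1, p2, p3])    # node J failed
```

In the link-failure case nothing is aborted; in the node-failure case the violation is detected and p2 is dropped, leaving p1 and p3 within the reservation.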
16Outline of Proof
- Define
  - $G_{uv}$: bandwidth reserved on link (u, v) for all backup LSPs
  - $I_{uv}$: actual backup bandwidth that falls on link (u, v) after the occurrence of a failure
  - A reservation violation happens if $I_{uv} > G_{uv}$
- No oversubscription: sharing between p2 and p3
  - $G_{uv} = \max(bw(p_1) + bw(p_2),\ bw(p_1) + bw(p_3))$ (worst case)
- When a failure occurs, activate p1 and p2
  - If it was link (I, J) that had failed, we are okay
  - If it was node J that had failed, p3 also gets activated
    - Worst case: $I_{uv}$ would have been $bw(p_1) + bw(p_2) + bw(p_3)$
    - Our control plane mechanism ensures $I_{uv} \le bw(p_1) + bw(p_3)$
    - This implies that $G_{uv} \ge I_{uv}$ in the worst case
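The argument can be written out as a two-case check. This is a sketch under the assumption that p1, p2, and p3 all traverse link (u, v); the shorthand $b_k = bw(p_k)$ is introduced here and is not in the slides.

```latex
\begin{align*}
G_{uv} &= \max(b_1 + b_2,\; b_1 + b_3) \\
\text{link } (I,J) \text{ fails:} \quad
  I_{uv} &= b_1 + b_2 \;\le\; G_{uv} \\
\text{node } J \text{ fails, } p_2 \text{ aborted:} \quad
  I_{uv} &= b_1 + b_3 \;\le\; G_{uv}
\end{align*}
```

In both cases the load is one of the two arguments of the maximum, so it cannot exceed the reservation once the mistakenly activated nhop path is aborted.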