1
A Survey on High Availability Mechanisms for IP Services
11 October 2005
  • N. Ayari, FT R&D, D. Barbaron, FT R&D
  • L. Lefevre, INRIA, P. Primet, INRIA

2005 High Availability and Performance Computing
Workshop (HAPCW'2005), Santa Fe, USA
2
Introduction: Different types of clusters
  • MPP and SMP clusters,
  • Scalability via CPU and Memory interconnects
  • Using special purpose hardware and/or software,
  • High availability through
  • Job scheduling and migration,
  • Fault detection and checkpointing.
  • Clusters of independently working nodes
  • An attractive alternative based on commodity hardware
    and/or general-purpose operating systems
  • Scalability achieved by efficient distribution
    of the incoming requests on the available nodes
  • High availability?
  • Service non-interruption and service integrity

3
Introduction: Scalability issues in clusters of commodity hw/sw nodes
  • The request distribution should
  • Increase performance by
  • Improving the system responsiveness
  • Number of concurrent connections supported per
    unit of time,
  • Keeping reasonable response times
  • When is the bottleneck observed?
  • Support upper layer session integrity
  • Integrity depends on the switching granularity
  • On a per datagram, connection or session
    distribution basis.

4
Switch designs
  • Can be
  • Stateless or stateful
  • Applies to
  • Layer 4 switching
  • Uses layer 2-4 packet information (TCP/IP model)
  • Layer 5 switching
  • Uses layer 2-5 packet information (TCP/IP model)

5
Stateless vs. stateful switch designs: Stateless switch design
  • Stateless switch design
  • Achieves a better latency by
  • Processing each datagram independently from its
    predecessors
  • Does not maintain any state information
  • Implements service integrity
  • On a per connection basis in Layer 4 Switching
  • Uses hashing to compute the same cluster node
    for all datagrams originating from the same client,
    identified by <IP address, port number, protocol> (sketch below).
  • On a per session basis in layer 5 Switching
  • Depends on the IP data application
  • - Cookie-based persistence for web traffic
  • - Cookie switching
  • - Cookie-based hashing
  • What about other data applications?
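
As a rough illustration of the hashing bullet above, here is a minimal user-space sketch in C that maps a <client IP, port, protocol> tuple to one of a fixed set of nodes; the node count, field names, and hash constant are illustrative assumptions, not taken from the presentation.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_NODES 4  /* fixed node count assumed by the stateless design */

    /* Flow key: all datagrams from the same <IP, port, protocol> hash alike. */
    struct flow_key {
        uint32_t client_ip;   /* IPv4 address */
        uint16_t client_port;
        uint8_t  protocol;    /* e.g. 6 = TCP, 17 = UDP */
    };

    /* Simple multiplicative hash; any robust hash over the 3-tuple works. */
    unsigned pick_node(const struct flow_key *k)
    {
        uint32_t h = k->client_ip;
        h = h * 2654435761u ^ k->client_port;
        h = h * 2654435761u ^ k->protocol;
        return h % NUM_NODES;  /* the same client always maps to the same node */
    }

    int main(void)
    {
        struct flow_key k = { 0xC0A80001u /* 192.168.0.1 */, 5060, 17 };
        printf("datagram goes to node %u\n", pick_node(&k));
        return 0;
    }

Because the mapping is a pure function of the tuple, no per-connection state is needed; the flip side, as the next slide notes, is that changing the node count remaps most flows.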

6
Stateless vs. stateful switch designs: Stateless switch design limitations
  • Upper layer session integrity
  • A request belonging to one session may go to the
    wrong server
  • Hash collisions: robust hash functions are needed
  • Failed node handling
  • When the hash function depends on the number of
    active nodes
  • Replaying all sessions when one or more nodes
    crash
  • Fair load distribution
  • The stateless nature imposes static load balancing
  • Source hashing,
  • While requests have varying service times and
    resource demands
  • Long SIP sessions, bandwidth-consuming FTP
    transfers, etc.

7
Stateless vs. stateful switch designs: Stateful switch designs
  • It aims to improve both
  • Upper layer session integrity
  • By maintaining connection/session state
  • Source and destination IP addresses, port numbers,
    transport protocol
  • - No semantics to delimit a UDP connection
  • Maintains multi-purpose timers
  • To avoid keeping inactive sessions/connections
  • - DDoS countermeasure
  • Computes statistics on the average client session
    duration
  • Needs to speed up the lookup for each datagram
  • Uses index hashing (sketch below)
  • Load distribution fairness
  • Using service-state-aware load distribution
    policies
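
A minimal sketch of the state table such a stateful switch keeps, with index hashing for fast per-datagram lookup and an idle-timer sweep; the layout and constants are illustrative assumptions, not the actual IPVS structures.

    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    #define TABLE_SIZE 65536       /* power of two: index by a hash of the flow */
    #define IDLE_TIMEOUT_SECS 300  /* expire inactive entries (DDoS countermeasure) */

    /* Per-connection state kept by a stateful layer 4 switch (illustrative). */
    struct conn_entry {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  protocol;          /* UDP has no FIN/RST, so only the timer ends it */
        unsigned chosen_node;       /* back end selected on the first datagram */
        time_t   last_seen;         /* refreshed on every datagram */
        struct conn_entry *next;    /* hash-collision chain */
    };

    static struct conn_entry *table[TABLE_SIZE];

    /* Index hashing: locate the entry for a datagram quickly. */
    unsigned conn_hash(uint32_t sip, uint16_t sport, uint8_t proto)
    {
        return ((sip * 2654435761u) ^ (sport << 8) ^ proto) & (TABLE_SIZE - 1);
    }

    /* Timer sweep: drop entries idle for longer than IDLE_TIMEOUT_SECS. */
    void expire_idle(time_t now)
    {
        for (unsigned i = 0; i < TABLE_SIZE; i++) {
            struct conn_entry **pp = &table[i];
            while (*pp) {
                if (now - (*pp)->last_seen > IDLE_TIMEOUT_SECS) {
                    struct conn_entry *dead = *pp;
                    *pp = dead->next;
                    free(dead);
                } else {
                    pp = &(*pp)->next;
                }
            }
        }
    }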

8
Stateless vs. stateful switch designs: Stateful design limitations
  • Cost effectiveness
  • Server state distribution overhead
  • Efficiency depends on the granularity of the
    switching operation
  • Layer 4 or Layer 5?
  • Does layer 4 scale for all IP services?
  • Load distribution fairness?
  • Decision taken on the first datagram in a
    session/connection
  • Need new mechanisms

9
Fair Scheduling
  • How to measure load?
  • Using a robust, simple, quickly adaptable summary
    metric
  • CPU, Memory and Disk I/O utilization,
  • Number of active application processes and
    connections,
  • The availability of network protocol buffers,
  • Number of active users.
  • Policies?
  • Static
  • Randomization, (Weighted) Round Robin,
    Source/Destination Hashing (see the round-robin sketch below).
  • Dynamic (server/client state aware)
  • (Weighted) Least Connections, Shortest Expected
    Delay, Minimum Misses,
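
To make one of the static policies concrete, here is a minimal weighted round-robin selector in C; the node names and weights are made up, and the real LVS scheduler uses a more elaborate interleaved variant.

    #include <stdio.h>

    /* Static weighted round robin: each node is picked in proportion to its weight. */
    struct node { const char *name; int weight; };

    static struct node nodes[] = { {"web1", 3}, {"web2", 2}, {"web3", 1} };
    #define NUM_NODES (int)(sizeof(nodes) / sizeof(nodes[0]))

    /* Returns the next node; 'credit' counts how many more picks the current node gets. */
    static int next_node(void)
    {
        static int current = 0, credit = 0;
        if (credit == 0) {
            current = (current + 1) % NUM_NODES;
            credit = nodes[current].weight;
        }
        credit--;
        return current;
    }

    int main(void)
    {
        for (int i = 0; i < 12; i++)   /* 12 requests -> 6 web1, 4 web2, 2 web3 */
            printf("request %2d -> %s\n", i, nodes[next_node()].name);
        return 0;
    }

Dynamic policies replace the fixed weight with a live metric such as the current connection count (least connections) or an estimated delay.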

10
Fair Scheduling
  • Policies?
  • Dynamic (Server/Client state aware) (cont.)
  • Cache affinity,
  • The file is partitioned among the nodes
  • SITA-E (Size Interval Task Assignment with Equal
    load),
  • The node is determined based on the 'size' of
    the request
  • CAP (Client Aware Policy)
  • Consecutive connections from the same client
    assigned to the same node
  • Admission Control Policies
  • Locality-Based Least-Connection, Locality-Based
    Least-Connection with Replication.

11
Fair Scheduling
  • Policies?
  • Network traffic based balancing
  • Focus on predicting the volume of incoming
    traffic from a source based upon past history
  • Priority based balancing
  • Assigns higher priority to some data traffic
  • Topology based Redirection
  • Redirect traffic to the cluster nearest the
    client in terms of
  • Hop count (static),
  • Network latency (dynamic).
  • Application specific Redirection
  • Layer 5 load balancing lets back-end servers
    specialize in particular content or services
  • Etc.

12
Layer 4 Switching: How?
  • Works at the TCP/IP level
  • Content blind switching

[Figure: Layer 4 switches]
13
Layer 4 Switching: A kernel implementation
  • The IP Virtual Server implementation
  • Supports NAT, DR, and Tunnelling
  • As add-on modules in the networking layer of the
    kernel
  • Based on the Linux packet filtering and routing
    capabilities
  • The Linux Virtual Server
  • A cluster of independently working nodes,
  • Using the IPVS load balancer.
  • Some recommendations [WZ]

14
Layer 4 Switching: Performance on single-CPU Linux 2.2, LVS-NAT vs. LVS-DR scaling
  • Performance results from [Rou2001].

15
Layer 4 Switching: Some Layer 4 switching products
Approaches: Two Way (packet double rewriting), One Way (packet single rewriting), One Way (packet tunneling), One Way (packet forwarding)
Products: Cisco's Local Director (commercial), Magic Router (Berkeley), LSNAT, F5 Networks' BIG-IP 5100, LVS, Foundry Networks' ServerIron, CyberIQ's HyperFlow, Coyote Point's Equalizer, TCP Router, LVS, IBM Network Dispatcher (component of IBM WebSphere Edge Server), OneIP (Bell Labs), LSMAC, Intel NetStructure Traffic Director, Nortel Networks' Alteon 780 series, Foundry Networks' ServerIron, Radware WSD Pro, LVS, VA Balance (VA Linux Systems Japan)
16
Layer 4 Switching: The Netfilter capabilities and return codes
Return Code Meaning
NF_DROP Discard the packet
NF_ACCEPT Keep the packet
NF_STOLEN Forget about the packet
NF_QUEUE Queue packet for user space
NF_REPEAT Call this hook function again
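
For context, a skeleton of how a kernel module registers a Netfilter hook and returns one of the verdicts above, written against the 2.4/2.6-era API that IPVS targeted (the hook signature and registration calls changed in later kernels); the module is illustrative, not part of IPVS.

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_ipv4.h>
    #include <linux/skbuff.h>
    #include <linux/ip.h>

    /* Hook body: inspect each packet and return a Netfilter verdict. */
    static unsigned int inspect_hook(unsigned int hooknum, struct sk_buff **pskb,
                                     const struct net_device *in,
                                     const struct net_device *out,
                                     int (*okfn)(struct sk_buff *))
    {
        struct iphdr *iph = (*pskb)->nh.iph;

        if (iph->protocol == IPPROTO_UDP)
            printk(KERN_DEBUG "UDP datagram seen at LOCAL_IN\n");

        return NF_ACCEPT;   /* keep the packet; a balancer could return NF_STOLEN */
    }

    static struct nf_hook_ops inspect_ops = {
        .hook     = inspect_hook,
        .pf       = PF_INET,
        .hooknum  = NF_IP_LOCAL_IN,   /* after routing, before local delivery */
        .priority = NF_IP_PRI_FIRST,
    };

    static int __init inspect_init(void)  { return nf_register_hook(&inspect_ops); }
    static void __exit inspect_exit(void) { nf_unregister_hook(&inspect_ops); }

    module_init(inspect_init);
    module_exit(inspect_exit);
    MODULE_LICENSE("GPL");
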
17
Layer 4 Switching: The IPVS architecture
18
Layer 4 Switching: Persistence handling
19
Layer 4 Switching: Issues
  • The persistence template for layer 4 switching
    may not scale
  • Example: VoIP data exchange using SIP
  • Different transport connections for different
    transactions within the same SIP session.
  • Session corruption implies datagram losses
  • More latency (TCP AIMD)

20
Layer 5 Switching: The solution?
  • The switch is also the single view of the cluster
  • The request distribution is done on the basis of
  • the load estimation on the cluster's nodes
  • the connection identifiers of the request
  • <source and destination IP addresses, source and
    destination port numbers, protocol>
  • the session identifiers of the request and the
    content type
  • Layer 5 header information
  • Additional delay
    Need to complete the connection to parse the data (sketch below)
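
To illustrate why the connection must be completed first, here is a small user-space sketch of content-based node selection on an HTTP request line; the URL prefixes and node numbers are invented for the example.

    #include <stdio.h>
    #include <string.h>

    /* Content-aware dispatch: route by URL prefix once the request has been read.
     * This only works after completing the TCP handshake and receiving data,
     * which is exactly the extra latency mentioned above. */
    static int pick_node_by_content(const char *request_line)
    {
        if (strncmp(request_line, "GET /images/", 12) == 0)
            return 1;                  /* node specialised for static images */
        if (strncmp(request_line, "GET /cgi-bin/", 13) == 0)
            return 2;                  /* node specialised for dynamic content */
        return 0;                      /* default pool */
    }

    int main(void)
    {
        const char *req = "GET /images/logo.png HTTP/1.1";
        printf("route to back-end node %d\n", pick_node_by_content(req));
        return 0;
    }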

21
Layer 5 Switching: The solution?
[Figure: Layer 5 switches]
22
Layer 5 Switching: TCP Gateway, the problems
  • Cost effectiveness
  • Multiple data copies and context switches
  • The proxy rapidly becomes the bottleneck because
    it is a two-way architecture.

23
Layer 5 Switching: TCP Splicing, the packet mapping operations
  • Modifications also affect
  • The IP pseudo-header
  • Socket options
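
A simplified view of the per-segment rewriting a TCP splice performs once the two connections are joined: addresses, ports, and sequence/ack numbers are shifted by offsets recorded at splice time, and because the TCP checksum covers the IP pseudo-header, both checksums must be recomputed. The structures and the checksum stub below are illustrative, not a real kernel API.

    #include <stdint.h>

    /* Offsets recorded when the client-side and server-side connections are spliced. */
    struct splice_state {
        uint32_t new_src_ip, new_dst_ip;
        uint16_t new_src_port, new_dst_port;
        uint32_t seq_delta;   /* difference between the two initial sequence numbers */
        uint32_t ack_delta;
    };

    /* Simplified header views (a kernel would use struct iphdr / struct tcphdr). */
    struct ip_hdr  { uint32_t saddr, daddr; uint16_t check; };
    struct tcp_hdr { uint16_t source, dest; uint32_t seq, ack_seq; uint16_t check; };

    /* Stub: a real implementation folds the pseudo-header, headers and payload. */
    static void recompute_checksums(struct ip_hdr *ip, struct tcp_hdr *tcp)
    {
        ip->check = 0;
        tcp->check = 0;
    }

    /* Map one client-to-server segment onto the spliced server-side connection. */
    void splice_rewrite(const struct splice_state *st,
                        struct ip_hdr *ip, struct tcp_hdr *tcp)
    {
        ip->saddr   = st->new_src_ip;
        ip->daddr   = st->new_dst_ip;
        tcp->source = st->new_src_port;
        tcp->dest   = st->new_dst_port;
        tcp->seq     += st->seq_delta;   /* shift into the server's sequence space */
        tcp->ack_seq += st->ack_delta;
        recompute_checksums(ip, tcp);    /* TCP checksum covers the IP pseudo-header */
    }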

24
Layer 5 Switching: TCP Splicing message timeline, the delayed binding


25
Layer 5 Switching: TCP Splicing, the issues
  • Delayed binding
  • Double processing overhead
  • Two way switch mechanism
  • Buffer size for large scale forwarders
  • The transition between the control mode and the
    forwarder mode
  • Delaying the activation of the spliced connection
    until the buffers are drained, or
  • Forwarding data concurrently with draining the
    buffers.
  • End-to-end flow control
  • From a small/big advertised window to a big/small
    advertised window

26
Layer 5 Switching: TCP Splice improvements
  • Pre-forking TCP splice
  • Reduces the three-way handshake cost
  • Pre-allocated server scheme
  • Guesses the real server on receipt of the TCP SYN
  • Etc.

27
Layer 5 Switching: TCP Handoff
  • One-way mechanism
  • Migrates the TCP connection from the front end to
    the back-end servers using the handoff protocol
    messages/acks
  • Message fields: magic number, handoff protocol
    identifier, connection magic, next sequence number;
    an Ack message reports the handoff result (see the sketch below)
  • The connection is established on the back end without
    going through the three-way handshake procedure.
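
The field names above suggest a handoff request roughly like the following, carrying enough state for the back end to recreate the accepted connection without a new three-way handshake; the exact layout is an assumption for illustration, not the TCPHA wire format.

    #include <stdint.h>

    /* Illustrative handoff request: the front end freezes the accepted connection
     * and ships its state to the chosen back-end server. */
    struct handoff_request {
        uint32_t magic;        /* magic number identifying the handoff protocol */
        uint8_t  hd_proto_id;  /* handoff protocol identifier / version */
        uint32_t conn_magic;   /* identifies the migrated connection */
        uint32_t client_ip;    /* client endpoint of the accepted connection */
        uint16_t client_port;
        uint32_t next_seq;     /* next sequence number expected from the client */
        uint32_t next_ack;     /* next ack number to send to the client */
    };

    /* Illustrative handoff reply: the Ack message reports the handoff result. */
    struct handoff_ack {
        uint32_t magic;
        uint32_t conn_magic;
        uint8_t  status;       /* 0 = connection recreated, non-zero = failure */
    };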

28
Layer 5 Switching: TCP Handoff message timeline
29
TCP Handoff vs TCP Splice
  • Based on LVS TCPSP and TCPHA 2.4 kernel
    implementations
  • Throughput (13 KB file)
  • Overhead due to L7 processing: the front end becomes
    the bottleneck, hence low scalability

[Chart: Apache throughput (conn/sec) vs. number of back-end nodes in the cluster]
30
Layer 5 Switching: The limitations
  • Highly available connections?
  • Connection failover
  • One-way vs. two-way architectures
  • Improvements on TCP Handoff
  • Current implementations do not cover all data
    traffic

31
Layer 5 Switching: Some layer 5 switching products
Approaches: Two-way architectures (TCP Gateway, TCP Splicing), one-way architectures (TCP Handoff, TCP Connection Hop)
Products: IBM Network Dispatcher CBR, CAP (Client Aware Proxy), Vovida's load balancer proxy, Foundry Networks' ServerIron, Radware WSD Pro, Hydra WS Hydra2500, Nortel's Alteon application switching series, Sharp Corporation Super Proxy, Resonate's Central Dispatcher (with redirection capabilities), Cisco's CSS 11500 (Content Services Switch), OpenFusion load balancing service for CORBA-based applications and services from PrismTech, Kemp Technologies LoadMaster series (2460, 2860, etc.), Sun Fire B10n Content Load Balancing Blade switch (tunneling based), Procera MLXP layer 5 switch, OctaGate Smart Web switch, Extreme Networks layer 5 CA switch device, ScalaServer, TCPHA, Resonate's Central Dispatch
32
High Availability
  • How to detect that a member has failed?
  • Pings, timeouts,
  • Heartbeat message exchange
  • Status, cluster transition and retransmission
    messages
  • TCPHA includes state message exchange
  • The accuracy of the failure detection
  • Timeouts with multiple retries detect failures
    accurately with high probability (sketch below)
  • How to recover through failover?
  • Load balancer failover
  • State synchronization
  • Subsystem failover
  • IP Takeover through channel bonding
  • Application Failover
  • The Linux watchdog timer interface, etc.
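
A minimal user-space sketch of timeout-plus-retries failure detection over UDP heartbeats, as referenced above; the port, retry count, timeout, and peer address are illustrative assumptions.

    #include <arpa/inet.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define HEARTBEAT_PORT 9999   /* assumed peer heartbeat responder */
    #define RETRIES        3      /* several missed replies => declare failure */
    #define TIMEOUT_SECS   2

    /* Returns 1 if the peer answered a heartbeat, 0 if it is considered dead. */
    int peer_alive(const char *peer_ip)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct timeval tv = { TIMEOUT_SECS, 0 };
        struct sockaddr_in peer = { 0 };
        char buf[16];

        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(HEARTBEAT_PORT);
        inet_pton(AF_INET, peer_ip, &peer.sin_addr);

        for (int i = 0; i < RETRIES; i++) {
            sendto(sock, "ping", 4, 0, (struct sockaddr *)&peer, sizeof(peer));
            if (recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL) > 0) {
                close(sock);
                return 1;             /* got a heartbeat reply */
            }
            /* timeout expired: retry before declaring the member failed */
        }
        close(sock);
        return 0;
    }

    int main(void)
    {
        printf("peer %s\n", peer_alive("192.0.2.10") ? "alive" : "failed");
        return 0;
    }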

33
High Availability
  • More on connection failover
  • Through connection migration and reliable sockets
  • Different from TCP Handoff
  • Approaches include
  • Migratory TCP
  • Fault-tolerant TCP
  • Connection passing

34
High Availability: The accuracy in distributed architectures
  • DNS scalability through site redundancy
  • DNS SRV RRs used in service location (lookup sketch below)
  • Locating available SIP proxies
  • The effectiveness of DNS-based scalability and
    failover is undermined by the DNS cache update
    frequency.
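
For reference, an SRV lookup such as the one used to locate SIP proxies can be done with the libresolv API (link with -lresolv on glibc systems); the service name and domain below are examples, and error handling is minimal.

    #include <arpa/nameser.h>
    #include <netinet/in.h>
    #include <resolv.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned char answer[NS_PACKETSZ];
        ns_msg msg;
        ns_rr  rr;

        /* Query the SRV records advertising SIP-over-UDP proxies for a domain. */
        int len = res_query("_sip._udp.example.com", ns_c_in, ns_t_srv,
                            answer, sizeof(answer));
        if (len < 0) {
            fprintf(stderr, "SRV lookup failed\n");
            return 1;
        }

        ns_initparse(answer, len, &msg);
        for (int i = 0; i < ns_msg_count(msg, ns_s_an); i++) {
            ns_parserr(&msg, ns_s_an, i, &rr);
            const unsigned char *rd = ns_rr_rdata(rr);
            /* RDATA layout for SRV: priority, weight, port, then the target name. */
            printf("priority=%u weight=%u port=%u\n",
                   ns_get16(rd), ns_get16(rd + 2), ns_get16(rd + 4));
        }
        return 0;
    }

Because resolvers cache these records, a failed proxy may keep being returned until the TTL expires, which is the accuracy limit noted above.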

35
High Availability: The accuracy in distributed architectures
  • RSerPool

36
High Availability: Other tips for distributed architectures
  • Multicast
  • Needs explicit support from all routers on the
    client-server path
  • IP Anycast: route redundancy
  • Different servers running the same service can
    all have the same anycast address on one of their
    interfaces
  • If a server fails, the router will update its route
    to the nearest available node
  • Depends on the router's update frequency

37
Conclusion and Future directions
  • Further work will address
  • Kernel implementation of layer 5 switching to
    handle session-oriented data transfers.
  • Improvements on the forwarder kernel component
  • Fair load distribution in session-oriented data
    transfers.
  • IPv6 compliance?
  • Security concerns in connection failover

38
  • THANKS