Edward Chow Content Switch 1

1
Introduction to Content Switch
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
chow@cs.uccs.edu
This tutorial is available at
http://cs.uccs.edu/~chow/pub/agere/contentswitch.ppt
  • With agere as login and ag2003ere as password

2
Outline of the Talk
  • Overview of Content Delivery Network and Linux
    Virtual Server Technologies.
  • Overview of Content Switching Concepts
  • TCP Delayed Binding and Its Improvements
  • Conflict Detection in Content Switching Rule Sets
  • Persistence Issues
  • Problems Encountered in Content Processing and
    their Solutions
  • Specific Implementations and Their Performance
  • Achieving High Availability with Content Switch.

3
Content Delivery Network (CDN)
[Figure: clients on the @Home, PSINet, and MindSpring networks send a huge number of requests to one server, leading to slow response and a server crash.]
4
Content Delivery Problems
http://www.akamai.com
5
Use Client Cache/Client Side Cache Server
[Figure: client caches and client-side cache servers on the access networks (@Home, PSINet, Sprint, UUnet, Gloobix, MindSpring) give clients fast response and send fewer requests to the server.]
6
Use Mirror Sites
Needs improvement: guide the selection of mirror
servers with server load/network bandwidth
measurements.
[Figure: clients on the @Home, PSINet, and MindSpring networks get fast response from a nearby mirror site.]
7
Edge Network Cache Servers
[Figure: clients on the @Home, PSINet, and MindSpring networks get fast response from client caches, client-side cache servers, and edge network cache servers.]
8
Content Delivery Problem
  • Cache Location Problem: Where to put cache
    servers?
  • How many are needed?
  • When/where/how to push/deliver the content?
  • How about dynamic content?

9
Akamai Edge Delivery Service
Date       # of Edge Servers   # of Networks   # of Countries
11/2000    6000                335             54
6/2001     9700                650             56
  • Peering Bottleneck Problem: Access traffic is
    evenly spread over 7400 networks (no one over
    5%, most << 1%), so edge servers need to be put
    in many networks.
  • 11/2000, 4 billion bits/day for 2800 sites.
  • Source: http://www.akamai.com

10
Caching Dynamic Content at Web Proxies
  • Active Cache Project [Pei Cao 98], Univ.
    Wisconsin
  • Cache Java applets to be executed at proxies
  • Choice of passing to the server, delivering the
    cached copy, or generating dynamically.
  • Edge Side Include (ESI)
  • XML tag to specify ESI fragment in a web page.
  • Each ESI fragment can have different cache/

11
Edge Side Include Example (http://www.esi.org/)
<table><tr><td colspan="2">
<esi:try>
  <esi:attempt>
    <esi:include src="http://www.myxyz.com/news/top.html" onerror="continue" />
  </esi:attempt>
  <esi:except>
    <!--esi This spot is reserved for your company's advertising.
    For more info <a href="www.myxyz.com">click here</a> -->
  </esi:except>
</esi:try>
</td></tr></table>
12
Solution to First Mile Problem
  • First Mile Problem: Huge number of requests at
    the web site of a CDN
  • High Bandwidth Connection
  • Caching
  • End System Cache
  • Client Cache
  • Client Site Proxy Cache Server
  • Mirror Site Caches
  • Cache Servers in Internet
  • Hierarchical Cache Servers, e.g.,
    Squid/Harvest/Adaptive Web
  • Edge Servers of Akamai
  • Faster Server/Server Farm (Server-Side
    Caching + Cluster)
  • Layer 4 Load Balancer + Real Servers
  • Content Switch + Real Servers
  • Distributed Packet Rewrite

13
Web Server Cluster
  • Load balancer can run at
  • Application level: Reverse Proxy
  • Kernel level: Linux Virtual Server
  • Load balancer can distribute requests based on
  • Layer 3-4 info: fixed fields/fast hash
  • Layer 7 info: variable length/slow parsing

14
Comparison of Load Balancers
  • Reverse Proxy runs as an application process and
    requires more memory/packet copying.
  • Linux Virtual Server runs in the kernel: no memory
    copying

Name                                  Type  Level          Layer Info
Reverse Proxy/Apache/Tomcat/Servlet   SW    Application    3-7
Linux Virtual Server                  SW    Kernel         3-4
Linux Content Switch                  SW    Kernel/Appl.   3-7
Layer4 Switch (narrow def.)           HW    Embedded OS    3-4
Content/Web Switch                    HW    Embedded OS    3-7
15
Linux Virtual Server (LVS)
  • Virtual server is a highly scalable and highly
    available server built on a cluster of real
    servers. The architecture of the cluster is
    transparent to end users, and the users see only
    a single virtual server with Virtual IP address
    (VIP).
  • http://www.linuxvirtualserver.org/

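[Figure: a client (CIP) reaches the VIP of the Load Balancer/Director (a Linux box) over the Internet; the Director forwards requests over a WAN/LAN to Real Server1 (RIP1), Real Server2 (RIP2), and Real Server3 (RIP3).]
CIP: Client IP Address. VIP: Virtual IP Address.
RIP: Real Server IP Address.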
16
LVS-NAT Configuration (Network Address
Translation)
  • All return traffic goes through the Director: slow
  • Modify IP addr/port/checksum at Director
  • Director and real servers on the same LAN
  • No modification needed on real servers
  • Port remapping: the real web server can run on 8080

[Figure: the client (CIP) reaches the Director's VIP over the Internet; the Director connects through a switch to the real servers RIP1, RIP2, and RIP3.]
17
LVS-NAT Configuration Step 2. Director routes Pkt
  • Based on CIP, source port, VIP and dst port,
    director selects one of the real servers
  • Change the dst IP addr or port of pkt.

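[Figure: 1. The client (CIP) sends a request over the Internet to the VIP. 2. The Director schedules and rewrites the packet, using the LVS routing/scheduling rules entered with the ipvsadm command, and sends it through the switch to one of RIP1, RIP2, RIP3.]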
18
LVS-NAT Configuration Step 3. Real Server Replies
  • Real server retrieves response.
  • All real servers set default gateway to Director
    like any other NAT or IP masquerade setup
  • Packet will be sent back to Director.

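[Figure: 1. Request from the client (CIP) to the VIP. 2. The Director schedules/rewrites the packet. 3. Real server RIP1 processes the request; its reply is addressed from RIP1 to CIP and goes back through the switch to the Director.]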
19
LVS-NAT Configuration Step 4. Director rewrites
reply
  • Director changes the src IP addr. (RIP1) of the
    reply pkt to VIP
  • Modify port if needed.
  • Modify the checksum and send back the pkt.

[Figure: 1. Request. 2. Scheduling/rewrite packet at the Director. 3. RIP1 processes the request and replies (RIP1 to CIP). 4. The Director rewrites the reply before it returns to the client (CIP).]
20
LVS-NAT Configuration (Network Address
Translation)
  • All return traffic goes through the Director: slow
  • Modify IP addr/port/checksum at Director.
  • Director and real servers on the same LAN

[Figure: complete LVS-NAT flow. 1. Request from the client (CIP) to the VIP. 2. Scheduling/rewrite packet at the Director. 3. RIP1 processes the request and replies (RIP1 to CIP). 4. The Director rewrites the reply. 5. The client receives the reply.]
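
The five LVS-NAT steps above can be sketched as a small address-rewriting model. This is only an illustration, not the IPVS kernel code: the `Packet` and `NatDirector` names are hypothetical, scheduling is plain round robin, and checksum recomputation is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: tuple  # (ip, port)
    dst: tuple  # (ip, port)


class NatDirector:
    """Toy model of the Director in LVS-NAT (hypothetical class)."""

    def __init__(self, vip, real_servers):
        self.vip = vip
        self.real_servers = real_servers
        self.next_index = 0
        self.connections = {}  # client (ip, port) -> real server (ip, port)

    def forward_request(self, pkt):
        # Step 2: choose a real server (round robin) and rewrite the
        # destination from the VIP to the chosen RIP and port.
        server = self.connections.get(pkt.src)
        if server is None:
            server = self.real_servers[self.next_index % len(self.real_servers)]
            self.next_index += 1
            self.connections[pkt.src] = server
        return replace(pkt, dst=server)

    def rewrite_reply(self, pkt):
        # Step 4: the reply leaves the real server with src = RIP; the
        # Director replaces it with the VIP so the client sees one server.
        return replace(pkt, src=self.vip)
```

Because every reply has to pass through `rewrite_reply`, the Director sits on both halves of each connection, which is the bottleneck the slide calls out.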
21
LVS-NAT Setup Commands
  • Make the director forward the masquerading
    packets
  • echo 1 > /proc/sys/net/ipv4/ip_forward
  • ipchains -A forward -j MASQ -s 172.16.0.0/24 -d
    0.0.0.0/0
  • Add virtual service and link a scheduler to it
  • ipvsadm -A -t 202.103.106.5:80 -s wlc (Weighted
    Least-Connection scheduling)
  • ipvsadm -A -t 202.103.106.5:21 -s wrr (Weighted
    Round Robin scheduling)
  • Add real servers and select forwarding method
    and weight
  • ipvsadm -a -t 202.103.106.5:80 -r 172.16.0.2:80
    -m
  • ipvsadm -a -t 202.103.106.5:80 -r 172.16.0.3:8000
    -m -w 2
  • ipvsadm -a -t 202.103.106.5:21 -r 172.16.0.2:21
    -m

22
LVS-Tunnel Configuration(IP Tunneling)
  • Real Servers need to handle IP over IP packets.
  • Real Servers can be geographically separated, and
    return traffic goes through different routes.
  • Security implication!

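[Figure: 1. The client (CIP) sends a request over the Internet to the VIP of the Load Balancer (a Linux box, RIP0). 2. The Load Balancer schedules the request and puts the packet in an IP tunnel (RIP0 to RIP2) to one of RIP1, RIP2, RIP3. 3. The real server processes the request. 4. The client receives the reply.]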
23
LVS-Tunnel Setup Commands
  • The load balancer (LinuxDirector), kernel 2.2.14
  • echo 1 > /proc/sys/net/ipv4/ip_forward
  • ipvsadm -A -t 172.26.20.110:23 -s wlc
  • ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -i
  • The real server 1, kernel 2.2.14
  • echo 1 > /proc/sys/net/ipv4/ip_forward
  • insmod ipip (insert the module if ipip is
    compiled as a module)
  • ifconfig tunl0 172.26.20.110 netmask
    255.255.255.255 broadcast 172.26.20.110 up
  • route add -host 172.26.20.110 dev tunl0
  • echo 1 > /proc/sys/net/ipv4/conf/all/hidden
  • echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden

24
LVS-DR Configuration (Direct Routing)
  • Real servers need to configure a non-arp alias
    interface with virtual IP address and that
    interface must share same physical segment with
    load balancer.
  • Only the Director's interface replies to VIP ARP
    requests.
  • Director only rewrites the server MAC address; the
    IP packet is not changed: fast!

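[Figure: 1. The client (CIP) sends a request over the Internet through the router/switch to the Director (VMAC). 2. The Director schedules the request and rewrites the destination MAC address to one of RMAC1, RMAC2, RMAC3.]
GMAC: Gateway MAC address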
25
LVS-DR Configuration Step 3. Process Request
  • Real server returns the reply.
  • Reply goes directly through the switch/router,
    not the Director.

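[Figure: 1. Request from the client (CIP) through the switch to the LinuxDirector (VMAC). 2. Scheduling/rewrite packet to one of RMAC1, RMAC2, RMAC3. 3. The real server processes the request. 4. The client receives the reply directly through the switch.]
GMAC: Gateway MAC address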
26
LVS-DR Setup Commands
  • The load balancer (LinuxDirector), kernel 2.2.14
    or later
  • echo 1 > /proc/sys/net/ipv4/ip_forward
  • ipvsadm -A -t 172.26.20.110:23 -s wlc
  • ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g
  • The real server 1, 172.26.20.112, kernel 2.2.14
    or later
  • echo 1 > /proc/sys/net/ipv4/ip_forward
  • ifconfig lo:0 172.26.20.110 netmask
    255.255.255.255 broadcast 172.26.20.110 up
  • route add -host 172.26.20.110 dev lo:0
  • echo 1 > /proc/sys/net/ipv4/conf/all/hidden
  • echo 1 > /proc/sys/net/ipv4/conf/lo/hidden

27
Performance of LVS-based Systems
  • "We ran a very simple LVS-DR arrangement with one
    PII-400 (2.2.14 kernel) directing about 20,000
    HTTP requests/second to a bank of about 20 Web
    servers answering with tiny identical dummy
    responses for a few minutes. Worked just fine."
    Jerry Glomph Black, Director, Internet
    Technical Operations, RealNetworks.
  • "I had basically (1024) four class-Cs of virtual
    servers which were loadbalanced through a
    LinuxDirector (two, actually -- I used redundant
    directors) onto four real servers which each had
    the four different class-Cs aliased on them."
    Ted Pavlic <tpavlic@netwalk.com>

28
LVS Usage Survey 2/15/2001 Lorn Key
Clusters                  20                     1             2               2               2
Directors Per Cluster     2                      2             2               2               2
Total Real Servers        170                    12            4               15              6
Routing Methods           DR/NAT                 DR            NAT             DR              NAT
Schedule Methods          RR/WLC                 WRR           LC              WLC             WLC
Types of Real Servers     RH6.2                  Linux         Win+Linux       Linux+Solaris   RH
Service Offered           WWW                    WWW/other     WWW+DB          WWW+SMTP        WWW
File System Replication   rsync                  rsync         Coda+NFS        Custom          rsync+custom
Monitoring Software       Heartbeat+ldirectord   Nanny/Pulse   Heartbeat+Mon   Nanny+Pulse     Heartbeat
29
  • C. Edward Chow, Department of Computer
    Science, University of Colorado at Colorado
    Springs
  • Sponsored by Computer Comm. Lab/ITRI

30
Content Switch Topics
  • What is a Content Switch?
  • What Services it Can Provide
  • Content Switch Example
  • Related Technologies
  • Content Switch Architecture and Basic Operations
  • TCP Delay Binding and Related Improvement
  • Content Switch Rule and Conflict Detection
  • Conclusion

31
Content Switch (CS)
  • Route packets based on high layer (Layer 5/7)
    headers and content.
  • Examples
  • Direct Web traffic based on pattern of
  • URLs, cookies: URL Switching
  • XML Tag Value: Web Switching
  • Can route incoming email based on email
    address
  • Connect POP/IMAP based on login
  • Web switches and Intel XML Director/accelerator
    are special cases of content switch.

32
What Services It Can Provide
  • Enabling premium services for e-commerce, ISP,
    and Web hosting providers
  • Load Balancing and Highly Available Server
    Clusters: Web, E-commerce, Email, Computing,
    File, SAN
  • Policy-based networking, differential/QoS
    services.
  • Firewall, strengthening DoS protection,
    cache/firewall load-balancing
  • 'Flash-crowd' management
  • Email Spam Protection, Virus Detection/Removal
  • Applet Authentication/Filtering

33
F5 VRM Solution
34
ServerIron 100 Web Switch
  • Integrated Layer 2 through Layer 7 switching
  • Support for up to 7,000,000 concurrent sessions,
    and 20 Gbps of throughput
  • High-availability server load balancing with
    active/active configuration and stateful
    fail-over
  • Industry's most powerful content switching
    capabilities, including URL, Cookie and SSL
    Session ID based switching
  • Content-aware cache switching
  • High performance VPN/Firewall load balancing
  • Robust protection against Denial of Service (DoS)
    attacks
  • Most comprehensive global server load balancing
    with DNS Proxy and client proximity measurements

35
Cisco CSS11000 Content Service Switch
comprises four high-speed RISC processors, with
512 MB of memory, and 20.0 Gbps of throughput.
Distributed flow forwarding engines feature up to
16 port-level network processors with up to 128
MB of memory for wire-speed delivery of Web
content. Support for "sticky" connections based
on IP address, Secure Socket Layer (SSL) session
ID, and cookies ensures reliability and security
for e-commerce transactions. The unique Cisco
content replication technology enables dynamic
expansion of site capacity in response to sudden
"flash crowds" for "hot" content or seasonal
peaks in traffic that can overwhelm servers.
36
Nortel Alteon Web Switch
  • Provides wire-speed Layer 2/3 Ethernet switching,
    plus high-speed processing based on Layer 4
    through 7 information (TCP ports, URLs, HTTP
    headers and cookies, SSL session ID, etc.)
  • Processes hundreds of thousands of concurrent
    sessions each second on eight multi-rate Ethernet
    ports, (rate selectable per port), with one
    Gigabit or 100/1000 Mbps Ethernet uplink port
  • Performs local and global server load balancing,
    application redirection, content filtering,
    streaming media load balancing, wireless Internet
    load balancing and content-aware Layer 7
    switching
  • Filters packets based on up to 2048 filtering
    rules (224 filtering rules for Alteon AD3/180e
    Web Switches), uniquely definable per switch and
    per port
  • Meters, controls, and accounts for bandwidth
    use (by client, server farm, virtual service,
    application, user class, content type and other
    traffic classes) and supports guaranteed minimum,
    metered available, and maximum burst bandwidth
    rates

37
Intel Netstructure XML Director 7280
  • Example of Rule: Server1 create /order.asp
    //AmountValue > 10000

38
Phobos In-Switch
  • Only load balancing switch in a PCI card form
    factor
  • Plugs directly into any server PCI slot
  • Supports up to 8,192 servers, ensuring
    availability and maximum performance
  • Six different algorithms are available for
    optimum performance Round Robin, Weighted
    Percentage, Least Connections, Fastest Response
    Time, Adaptive and Fixed.
  • Provides failover to other servers for
    high-availability of the web site
  • U.S. Retail: $1995.00

39
E-Commerce Example 1. Client
  • Client submits via HTTP/Post (or SOAP) the
    following purchase in XML
  • <purchase>
  • <customerName>CCL</customerName>
  • <customerID>111222333</customerID>
  • <item><productID>309121544</productID>
  • <productName>IBM Thinkpad T21</productName>
  • <unitPrice>5000</unitPrice>
  • <noOfUnits>10</noOfUnits>
  • <subTotal>50000</subTotal>
  • </item>
  • <item><productID>309121538</productID>
  • <productName>Intel wireless LAN PC
    Card</productName>
  • <unitPrice>200</unitPrice>
  • <noOfUnits>10</noOfUnits>
  • <subTotal>2000</subTotal>
  • </item>
  • <totalAmount>52000</totalAmount>
  • </purchase>

40
E-Commerce Example 2. Content Switch
  • Content switch receives the packet.
  • Recognizes that it is an HTTP POST request from
    the HTTP request line: POST /purchase.cgi HTTP/1.1
  • Recognizes that it is an XML document from the
    meta header: content-type: TEXT/XML
  • Parses the XML content
  • Extracts values of tag sequences:
    purchase/totalAmount = 52000
    purchase/customerName = CCL
  • Rule 1 is matched and the packet is routed to one
    of highSpeedServers.
    Rule 1: if (xml.purchase/totalAmount > 5000)
    routeTo(highSpeedServers)
    Rule 2: if (xml.purchase/customerName == "CCL")
    routeTo(specialCustomerServers)
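
The content switch steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the switch's actual code: the `extract` and `route` helpers and the default server group name are hypothetical, rules are tried in strict priority order, and the standard `xml.etree.ElementTree` parser stands in for the switch's own XML extraction.

```python
import xml.etree.ElementTree as ET


def extract(xml_text, tag_sequences):
    """Return {tag sequence: text value} for sequences like 'purchase/totalAmount'."""
    root = ET.fromstring(xml_text)
    values = {}
    for seq in tag_sequences:
        head, _, rest = seq.partition("/")
        if head != root.tag:
            continue  # sequence does not start at this document's root
        node = root.find(rest) if rest else root
        if node is not None and node.text is not None:
            values[seq] = node.text.strip()
    return values


# Rule 1 and Rule 2 from the slide, as (condition, server group) pairs.
RULES = [
    (lambda v: float(v.get("purchase/totalAmount", 0)) > 5000, "highSpeedServers"),
    (lambda v: v.get("purchase/customerName") == "CCL", "specialCustomerServers"),
]


def route(xml_text, default="defaultServers"):
    """Return the server group of the first rule whose condition holds."""
    values = extract(xml_text, ["purchase/totalAmount", "purchase/customerName"])
    for condition, server_group in RULES:
        if condition(values):
            return server_group
    return default  # hypothetical fallback group, not in the slides
```

For the purchase document in the example, both conditions hold, and the first rule wins because rules are checked in order.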

41
No Free Lunch: Penalty of Having Content Switch
  • Increased packet processing time.
  • For XML Director/Accelerator, it needs to parse
    the XML document and match tag sequences: 1-3
    orders of magnitude more processing time

42
Related Technologies
  • Application level solution: Proxy server,
    Apache/Tomcat/Servlet, Microsoft NLB
  • Kernel level layer 4 load balancing solution:
    http://www.linuxvirtualserver.org/
  • Joseph Marks presentation
  • LVS-NAT(Network Address Translation) web page
  • LVS-IP Tunnel web page
  • LVS-DR (Direct Routing) web page
  • Hardware solution: Cisco 11000, F5 (Big IP),
    Alteon Web Systems, Foundry Networks
    (ServerIron). Excellent information in the Foundry
    ServerIron Installation and Configuration Guide,
    May 2000.
    http://www.foundrynet.com/services/documentation/siug/

43
Basic Operations of Content Switching
CS: Content Switching
[Figure: incoming packets pass through packet classification, header/content extraction, the CS rule matching algorithm, and packet routing (load balancing), and are then forwarded to servers. The CS rule editor supplies the CS rules; network path info and server load status feed packet routing.]
44
Content Switch Architecture
  • [Apostolopoulos, Infocom 2000]

45
Content Switch Architecture
Case A: The controller finds an entry in its hash
table and routes the request to the sticky
connection's outgoing port.
[Figure: switch architecture with the controller's hash table]
46
Content Switch Architecture
Case B, Step 1. The controller finds no entry in
the hash table and routes the request to the
content switch processor.
[Figure: switch architecture with the controller's hash table]
47
Content Switch Architecture
Case B, Step 1. The controller finds no entry in
the hash table and routes the request to the
content switch processor.
Step 2. CS processor:
a. Extract content/match CS rules
b. Route request
c. Set up sequence modification on server side port
[Figure: switch architecture with the controller's hash table]
48
Content Switch Architecture
Case B, Step 1. The controller finds no entry in
the hash table and routes the request to the
content switch processor.
Step 2. CS processor:
a. Extract content/match CS rules
b. Route request
c. Set up sequence modification on server side port
Step 3. At the server side port, return pkts are
modified (sequence/IP addr/checksum) and routed
back to the client.
[Figure: switch architecture with the controller's hash table]
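
Cases A and B amount to a fast path and a slow path. The sketch below shows that split only; the `Controller` class, the flow tuple layout, and the `match_rules` callback are hypothetical stand-ins for the port controller and the content switch processor.

```python
class Controller:
    """Toy port controller with a hash table of known flows (hypothetical)."""

    def __init__(self, match_rules):
        self.hash_table = {}            # flow 4-tuple -> outgoing port
        self.match_rules = match_rules  # stands in for the CS processor
        self.slow_path_count = 0

    def handle(self, flow, request):
        port = self.hash_table.get(flow)
        if port is not None:
            # Case A: entry found, use the sticky connection's port.
            return port
        # Case B: no entry, hand the request to the CS processor, which
        # extracts content and matches rules, then remember its decision.
        self.slow_path_count += 1
        port = self.match_rules(request)
        self.hash_table[flow] = port
        return port
```

Only the first packet of a flow pays for content extraction and rule matching; later packets are switched with one hash lookup.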
49
Efficient Content Switching Architecture
  • Tasks: Millions of packets with thousands of rules
    to match and load balancing algorithms to run.
  • How to assign tasks to the (network) processors
    and threads?
  • Packet Extraction (Understand header formats,
    XML parsing)
  • Content Switching Rule Matching
  • Packet Routing (Load Balancing, Bandwidth
    Control)
  • How Much Packet Processing Should Controllers
    Do?
  • What can a controller do?
  • A Typical Parallel Processing Problem?

50
TCP Delay Binding (Splicing)
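[Figure: TCP delayed binding message sequence among client, content switch, and server (steps 1-11).
Client and content switch: SYN(CSEQ) from the client is answered with SYN(DSEQ)/ACK(CSEQ+1); the client then sends DATA(CSEQ+1).
Content switch and server: the switch sends SYN(CSEQ), the server answers SYN(SSEQ)/ACK(CSEQ+1), and the switch sends ACK(SSEQ+1) followed by DATA(CSEQ+1)/ACK(SSEQ+1).
Return data: the server's DATA(SSEQ+1)/ACK(CSEQ+lenR+1) reaches the client as DATA(DSEQ+1)/ACK(CSEQ+lenR+1).
Acknowledgement: the client's ACK(DSEQ+lenD+1) reaches the server as ACK(SSEQ+lenD+1).]
lenR: size of HTTP request.
lenD: size of return document.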
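
The sequence conversion in the figure is a constant offset between the switch's own initial sequence number (DSEQ) and the server's (SSEQ). A minimal sketch, assuming 32-bit wraparound arithmetic; the `Splice` class and its method names are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


class Splice:
    """Sequence translation for one spliced connection (hypothetical helper)."""

    def __init__(self, dseq, sseq):
        # DSEQ is what the client saw from the switch; SSEQ is what the
        # server chose. Their difference is fixed for the connection.
        self.offset = (dseq - sseq) % MOD

    def server_to_client_seq(self, seq):
        # DATA(SSEQ+1) from the server becomes DATA(DSEQ+1) for the client.
        return (seq + self.offset) % MOD

    def client_to_server_ack(self, ack):
        # ACK(DSEQ+lenD+1) from the client becomes ACK(SSEQ+lenD+1).
        return (ack - self.offset) % MOD
```

Each translated packet also needs its TCP checksum updated, which is the per-packet cost the improvement schemes try to reduce.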
51
Improve Content Switching
  • Set up CS-Real Server connections ahead of time
    (Persistent HTTP Connections). NetScale: reduces
    TCP 3-way handshake time
  • Pre-allocate Server Scheme (guess the real server
    based on the TCP SYN)
  • Sequence modification on every return pkt: need
    to recompute the checksum also.
  • Filter Scheme (offload sequence
    modification/rule matching to real servers).
  • Buffering/Pipeline (aggregate) Requests

52
Pre-Allocate Server Scheme
[Figure: pre-allocate message sequence among client, content switch, and pre-allocated server (steps 1-4). The content switch relays each message unchanged: SYN(CSEQ), SYN(SSEQ), ACK(CSEQ+1), DATA(CSEQ+1).]
  • Guess routing decision based on IP/Port/History
  • Advantages
  • Faster than TCP delay binding.
  • Possible direct route between client and server
  • Reduces session processing overhead: no need to
    convert server sequence numbers

53
Degenerated to TCP Delayed Binding If Guess is
Wrong
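[Figure: message sequence among client, content switch, pre-allocated server, and right server when the guess is wrong (steps 1-12).
Steps 1-4: SYN(CSEQ), SYN(SSEQ)/ACK(CSEQ+1), and DATA(CSEQ+1)/ACK(SSEQ+1) are relayed between the client and the pre-allocated server.
Step 6: the pre-allocated server sent HTTP 404; its connection is closed with FIN(CSEQ+lenR+1).
Steps 7-9: the content switch connects to the right server: SYN(CSEQ), SYN(RSEQ)/ACK(CSEQ+1), ACK(RSEQ+1).
Steps 10-12: sequence conversion is now needed for the right server: DATA(RSEQ+1)/ACK(CSEQ+lenR+1) reaches the client as DATA(SSEQ+1)/ACK(CSEQ+lenR+1), and the client's ACK(SSEQ+lenD+1) reaches the right server as ACK(RSEQ+lenD+1).]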
54
Filter Process Scheme
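[Figure: filter process scheme message sequence among client, content switch, and a filter process run on the server (steps 1-10).
Steps 1-4: the client's SYN(CSEQ) and DATA(CSEQ+1)/ACK(DSEQ+1) are handled by the content switch.
Step 5: the switch sends Migrate(Data, CSEQ, DSEQ) to the filter process (5a), which sends SYN(CSEQ) to the server (5b).
Steps 6-8: SYN(SSEQ)/ACK(CSEQ+1) and DATA(CSEQ+1)/ACK(SSEQ+1) are exchanged on the server.
Step 10: the client's ACK(DSEQ+lenD+1) is converted to ACK(SSEQ+lenD+1).]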
55
Pre-allocate performance plot
Series 1 - Basic scheme with no rule matching
module inserted, i.e., using default IPVS.
Series 2 - Basic scheme with the rule matching
module inserted.
Series 3 - Pre-allocate scheme with all hits,
i.e., where all pre-allocate guesses were correct.
Series 4 - Pre-allocate scheme with all misses,
i.e., where all pre-allocate guesses were wrong.
56
Handling multiple requests in a Keep-Alive
connection
  • Determine when new request arrives
  • Verify that previous request has been completely
    received
  • Request data size is > 0
  • Key assumption is only one outstanding request is
    sent at a time by client, i.e., requests are not
    pipelined
  • Reuse connections
  • Store each connection control information in a
    hash table keyed by real server address, once it
    is established.

57
Quiz
  • Web server keeps the TCP connection alive,
    expecting the browser to return for images and
    in-line media files.
  • How many keep-alive connections are setup on IE5
    and Netscape 4.7 for web page with many .jpg/.gif
    images?
  • Can these image requests be pipelined from client
    browser to web server?

58
Multiple HTTP Requests from One TCP Connection
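NAT approach
[Figure: one keep-alive connection from the client to the Content Switch carries requests for Index.htm, uccs.gif, cs.jpg, and rocky.mid, which the switch distributes among server1, server2, ..., server9.]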
  • A keep alive TCP connection may include multiple
    HTTP GET requests.
  • Content Switch examines each GET request and
    makes new routing decision.
  • Content Switch establishes another connection
    with a different server based on the
    routing decision.
  • Those HTTP responses from different servers need
    to be interleaved and seen by the user as if
    from the same server.
  • Solutions: In-order delivery (buffer
    requirement)? Out-of-order delivery (seq
    tracking)?
  • Problems: Should we throw away earlier HTML
    requests if we receive later requests?

59
Multiple HTTP Requests from One TCP Connection
[Figure: the client's requests for uccs.jpg, cs.gif, and rocky.mid pass through the Content Switch to server1, server2, ..., server9.]
  • Can servers return documents directly to client
    in keep-alive session case?
  • Can equivalent VS-Tunnel or VS-DR be implemented
    using Content Switch?

60
Content Switch Rule Survey
  • Survey shows that existing switches support
  • rules in basic (condition action) or (action
    condition) form
  • some define condition as class, then specify the
    action in separate statement or command
  • simple single conditional term
  • command line interface (to facilitate incremental
    update?)
  • Actions can include reject, forward, put in queue
    (for bandwidth control, scheduling)

61
Content Switch Rule Design
  • Rule syntax is generic, to support all intended
    features.
  • Use simple C if-statement syntax. Rule: if
    (condition) action
  • Easy to read
  • Allows optimization using a C compiler
  • Condition consists of multiple terms of
  • variable relational_operator value, e.g.
    xml.purchase/totalAmount > 50000
    smtp.to == "chow@cs.uccs.edu"
    cookie.name == "servlet1"
    bitmatch(64, 8, 0xff) == 64
    (the line above means TTL == 64; idea from the
    netfilter universal filter)
  • suffix(variable, string), e.g. suffix(url,
    "gif")
  • regex(variable, pattern), e.g. regex(url,
    "/purchase")
  • Action consists of reject, forward(server
    queue), loadBalance(serverGroup,
    loadBalancingAlgorithm)
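
The condition helpers named above could look like the following Python sketch. The signatures are assumptions read off the slide's examples: `bitmatch` is taken to mean "the `bit_length` bits starting at `bit_offset` of the packet, ANDed with `mask`", so that offset 64 and length 8 select the TTL byte of an IPv4 header; here the packet is passed explicitly as the first argument.

```python
import re


def bitmatch(packet: bytes, bit_offset: int, bit_length: int, mask: int) -> int:
    """Return bit_length bits starting at bit_offset, ANDed with mask."""
    as_int = int.from_bytes(packet, "big")
    shift = len(packet) * 8 - bit_offset - bit_length
    return (as_int >> shift) & ((1 << bit_length) - 1) & mask


def suffix(value: str, ending: str) -> bool:
    """True if value ends with the given string, e.g. suffix(url, 'gif')."""
    return value.endswith(ending)


def regex(value: str, pattern: str) -> bool:
    """True if the pattern matches anywhere in value."""
    return re.search(pattern, value) is not None
```

A multi-term condition is then an ordinary boolean expression, for example `suffix(url, "gif") and bitmatch(header, 64, 8, 0xFF) == 64`.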

62
Efficient CS Rule Matching
  • Brute force, strict priority Rules are executed
    in sequential manner.
  • Efficient Rule Matching Method
  • Organize Rules so that rules can be skipped
    based on existing content types.
  • Utilize compiler optimization technique.

63
Simple CS Rule Editor GUI
64
Conflict Detection on Content Switching Rules
  • Detect conflicts among rules or rule sets.
  • Absolute conflict type:
    r1: if (xml.purchase/customerName == "CCL")
    routeTo(r1)
    r2: if (xml.purchase/customerName == "CCL")
    routeTo(r2)
  • Potential conflict type:
    r1: if (xml.purchase/totalAmount > 5000)
    routeTo(quickServers)
    r2: if (xml.purchase/totalAmount > 20000)
    routeTo(superServers)
  • Algorithm: Build a tree for rules with the same
    variable, check operator and value to see if they
    are the same or lead to potential conflict, and
    compare actions to decide conflict type or
    duplication.
  • Developed conflict detection algorithm for rules
    with multiple term condition. Can be applied to
    policy-based rules conflict detection.
  • Editor can build these trees while a user enters
    rules and warns about conflict right away.
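
The two conflict types can be illustrated for single-term rules. Note the simplification: this sketch compares rules pairwise instead of building the per-variable tree the slide describes, and it only supports the operators `>`, `<`, and `==`. Rules are hypothetical `(variable, operator, value, action)` tuples.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def can_both_hold(r1, r2):
    """True if some value of the shared variable satisfies both conditions."""
    (_, op1, v1, _), (_, op2, v2, _) = r1, r2
    if isinstance(v1, str) or isinstance(v2, str):
        return op1 == op2 == "==" and v1 == v2
    # For >, <, == on numbers it is enough to probe these candidate points.
    candidates = [v1, v2, (v1 + v2) / 2, min(v1, v2) - 1, max(v1, v2) + 1]
    return any(OPS[op1](x, v1) and OPS[op2](x, v2) for x in candidates)


def classify(r1, r2):
    """Return 'duplicate', 'absolute conflict', 'potential conflict', or 'none'."""
    if r1[0] != r2[0]:
        return "none"  # different variables: not handled by this sketch
    same_condition = r1[1:3] == r2[1:3]
    same_action = r1[3] == r2[3]
    if same_condition:
        return "duplicate" if same_action else "absolute conflict"
    if can_both_hold(r1, r2) and not same_action:
        return "potential conflict"
    return "none"
```

A rule editor could call `classify` against every stored rule as each new rule is entered and warn the user immediately.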

65
XML Tag Value Extraction
  • An xmlContentExtract() function is built to extract
    the tag values of a list of unique tag sequences.
  • It is based on Clark Cooper's expat 1.0
    XML parser.
  • Its arguments include the pointer to an XML
    document, the pointer to the array of strings
    (unique XML tag sequences; we follow the XSL
    selector syntax), and the number of sequences.
  • It returns a list of node structures, each with
    the tag sequence, its attribute, and its value.
  • Currently, it supports one attribute, and each tag
    sequence needs to be unique.

66
Persistence Handling in LVS
  • Some network applications require packets from
    same users/sessions be routed to same real
    servers.
  • For consistent treatment?
  • For fast performance, e.g. servers maintain
    persistent data/info for sessions
  • Tomcat web server returns cookie value so that
    return client requests can be routed to the same
    Tomcat web server.
  • But cookie value is in HTTP header, a Layer 7
    info. Layer 4 switch cannot access it.
  • This is the so-called persistence handling problem.
  • One solution: sticky connections. The same IP
    address is served by the same server.

67
Persistent handling Problems
  • FTP Case
  • Normally FTP uses port 21 for control, port 20
    for data.
  • But for passive FTP, the server tells the clients
    the port that it listens to. The client
    initiates the data connection connecting to that
    port.
  • For the LVS/TUN and LVS/DR, LinuxDirector is
    only on the client-to-server half of the
    connection, so it is impossible for LinuxDirector
    to get the data port from the packet that goes to
    the client directly.
  • SSL Session Case
  • port 443 for secure Web servers and port 465 for
    secure mail server,
  • key for connection must be chosen/exchanged and
    only the initial real server has the key.
  • Persistent or sticky connection is needed.

68
Persistent Connection Solution
  • When the client first accesses the service,
    LinuxDirector creates a template between the
    given client and the selected server, then create
    an entry for the connection in the hash table.
  • Connections for any port from the client will be
    sent to the same server until the template expires.
  • The template expires in a configurable time, and
    the template won't expire until all its
    connections expire.
  • The timeout of persistent templates can be
    configured by users, and the default is 300
    seconds
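
A minimal sketch of the template idea, assuming one template per client IP, round-robin selection for new clients, and a timer refreshed on every new connection. That is a simplification of the expiry rule above, which waits for all of the template's connections to expire. The `PersistenceTable` class is hypothetical; 300 seconds is the default from the slide.

```python
class PersistenceTable:
    """Sticky client-to-server templates with a timeout (hypothetical)."""

    def __init__(self, servers, timeout=300):
        self.servers = servers
        self.timeout = timeout
        self.templates = {}  # client ip -> (server, expires_at)
        self.next_index = 0

    def select(self, client_ip, now):
        entry = self.templates.get(client_ip)
        if entry is not None and now < entry[1]:
            server = entry[0]  # template still valid: same server, any port
        else:
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
        # Creating or using a template restarts its timer.
        self.templates[client_ip] = (server, now + self.timeout)
        return server
```

Because the key is the client IP only, the FTP data connection and the SSL connections discussed earlier land on the server that already holds the session state.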

69
Problems Encountered in The Design of Linux-based
Content Switch
  • Handle a Request Contained in Multiple Packets
  • Handle Different Data Encoding Methods
  • Allow Referencing Specific XML Tags
  • Handle Long Transactions in SSL and Email network
    services

70
Handle a Request Contained in Multiple Packets
  • For a long request, its headers and content will
    be carried by the multiple packets due to packet
    size limitation.
  • We have observed Netscape 4.7 splitting a short
    request (<1000) into two packets
  • Due to interleaving with other sessions, packets
    of the same session may not be allocated
    consecutive memory.
  • Even if packets of the same session arrive without
    being interleaved with packets of other sessions,
    application level data will be fragmented in
    kernel packet buffers such as sk_buff.
  • Matching application data pattern in the kernel
    is tricky.

71
Example Determine Content Length
  • TCP Segment n contains
  • POST /cgi-bin/cs622/purchase.pl HTTP/1.0\r\n
  • Referer: http://archie.uccs.edu/acsd/lcs/xmldemo.html\r\n
  • Connection: Keep-Alive\r\n
  • User-Agent: Mozilla/4.75 [en] (X11; U; Linux
    2.2.16-22enterprise i686)\r\n
  • Host: viva.uccs.edu\r\n
  • Accept: image/gif, image/x-xbitmap, image/jpeg,
    image/pjpeg, image/png, */*\r\n
  • Accept-Encoding: gzip\r\n
  • Accept-Language: en\r\n
  • Accept-Charset: iso-8859-1,*,utf-8\r\n
  • Content-type: application/x-www-form-urlencoded\r\n
  • Content-length: 7
  • TCP Segment n+1 contains
  • 53\r\n
  • data (753 bytes)
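
The example shows why a header value cannot be read from a single segment: "Content-length: 753" is split after the "7". One way to cope, sketched here at application level, is to buffer segments until the header block and the announced body are complete. The `RequestAssembler` class is hypothetical and handles a single request.

```python
class RequestAssembler:
    """Buffers TCP segments until one full HTTP request is available."""

    def __init__(self):
        self.buffer = b""

    def feed(self, segment: bytes):
        """Return (headers, body) once complete, otherwise None."""
        self.buffer += segment
        head, sep, rest = self.buffer.partition(b"\r\n\r\n")
        if not sep:
            return None  # header block not finished yet
        headers = {}
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get(b"content-length", b"0"))
        if len(rest) < length:
            return None  # body still incomplete
        return headers, rest[:length]
```

Doing the same inside the kernel is harder because the bytes sit in separate packet buffers, which is the problem the next slide addresses.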

72
Potential Solutions
  • Allocate application data of a session in
    consecutive memory: requires major rework of most
    kernel packet buffer allocation schemes.
  • Use carry lookahead memory hardware.
  • Write more complicated pattern matching code that
    can match patterns over fragmented data.
  • Use application level content switching: bears the
    overhead of data copying from kernel to
    application level.

73
Handle Different Data Encoding Methods
  • XML data can be passed as text/plain.
  • When submitted with a form, the XML request
    data are encoded using the
    application/x-www-form-urlencoded method
  • When extracting XML data for rule matching,
    different data encoding methods need to be
    detected through the content-type header.
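
A short sketch of that detection step, using Python's standard `urllib.parse`. The form field name `xml` and the `extract_xml` helper are assumptions for illustration; the slides do not say which field carries the document.

```python
from urllib.parse import parse_qs


def extract_xml(content_type: str, body: bytes, field: str = "xml") -> str:
    """Return the XML text carried in a request body, decoded by content type."""
    media_type = content_type.split(";")[0].strip().lower()
    text = body.decode("utf-8")
    if media_type == "application/x-www-form-urlencoded":
        # Form submission: the XML is percent-encoded inside a form field.
        return parse_qs(text)[field][0]
    if media_type in ("text/xml", "text/plain"):
        return text  # already plain XML
    raise ValueError("unsupported content type: " + content_type)
```

Rule matching can then run on the decoded text regardless of how the client submitted it.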

74
An E-Commerce XML Example
  • Client submits via HTTP/Post (or SOAP) the
    following purchase in XML
  • <purchase>
  • <customerName>CCL</customerName>
  • <customerID>111222333</customerID>
  • <item><productID>309121544</productID>
  • <productName>IBM Thinkpad T21</productName>
  • <unitPrice>5000</unitPrice>
  • <noOfUnits>10</noOfUnits>
  • <subTotal>50000</subTotal>
  • </item>
  • <item><productID>309121538</productID>
  • <productName>Intel wireless LAN PC
    Card</productName>
  • <unitPrice>200</unitPrice>
  • <noOfUnits>10</noOfUnits>
  • <subTotal>2000</subTotal>
  • </item>
  • <totalAmount>52000</totalAmount>
  • </purchase>

75
Allow Referencing Specific XML Tags
  • An ambiguous XML tag sequence specification can
    match multiple instances.
  • To avoid that and to speed up the matching, we
    propose the use of XML tag sequence specification
    that enables us to specify the specific XML tag
    sequence.
  • For example, to specify a rule based on the
    subTotal value present in the second item tag
    within the first purchase tag, the condition of
    the rule will be specified as
    purchase[1].item[2].subTotal > 5000.
  • As another example, purchase[2].totalAmount <
    15000 specifies the condition of a rule based on
    the totalAmount tag present within the second
    purchase tag.
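
Indexed tag references map naturally onto positional path steps. The sketch below assumes the specification is written with bracketed 1-based indices, as in purchase[1].item[2].subTotal (an assumed spelling), and that the purchases sit under a hypothetical wrapper element so that a second purchase can exist. It relies on the positional predicates supported by `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def lookup(doc_root, spec):
    """Resolve a spec such as 'purchase[1].item[2].subTotal' below doc_root.

    A step without an index is treated as [1] (an assumption of this sketch).
    """
    steps = [s if s.endswith("]") else s + "[1]" for s in spec.split(".")]
    node = doc_root.find("/".join(steps))
    return None if node is None else node.text
```

Because each step names exactly one element, the match stops at the first hit instead of scanning every item for a subTotal tag.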

76
Handle Long Transactions in SSL and Email network
services
  • Some of the packet processing functions are
    better handled at the application level.
  • For example, there are many packages for
    detecting and removing email viruses, including
    McAfee's uvscan, AMaViS scanmail, and mutt
    (recombines email components), but almost all of
    them are implemented at application level and
    interact with the sendmail program. It would
    require significant effort to rewrite them as
    kernel modules.
  • The same observations were made for SSL processing.

77
Web Switching/SSL Processing Overhead and
Performance Differences between Prefork and Dynamic
Fork
  • Significant SSL processing overhead: 240 req/sec
    vs. 38 req/sec.
  • Content switching processing overhead may reduce
    performance below that of a single web server.
    What do we gain here? How can we improve it?

78
IXP1200-based Content Switch
  • We have ported OpenSSL and our Linux Secure Web
    System to run on the IXP12EB with VxWorks,
  • using Wind River's Tornado II IDE.
  • The preliminary version runs purely on the
    StrongARM core.
  • Currently working on offloading the header
    extraction and rule matching code to run as
    hardware threads on the microengines.

79
Intel IXP1200 NP and IXP12EB
  • The IXP 1200 Network Processor
  • The IXP12EB Evaluation Board
  • PCI form factor board based on IXP1200 Network
    Processor
  • eight 10/100 Mbps ports
  • two Gigabit Ethernet ports
  • PCI back-plane and an Ethernet Network Interface
    Card (NIC)

80
IXP 1200 Network Processor
81
Packet Receiving and Transmitting
82
Agere Network Processor
The following figures are from Douglas Comer's
new text, Network Systems Design Using Network
Processors.
83
Agere's FPP
84
Agere's RSP
85
Alchemy's Au1000
86
Applied Micro Circuits Corp. nP7510
87
Cisco Parallel eXpress Forwarding (PXF)
88
Cognigine's Reconfigurable Communication Unit
(RCU)
89
EZChip NP-1
90
IBM PowerNP
91
IBM NP Embedded Processor Complex
92
Motorola's C-Port
93
Motorola Single CP
94
Packet Flow and IXP2400
95
Intel IXP2400
96
HA-LVS Configuration: High Availability
[Figure labels: CIP, Linux Director, Backup
Director, Heart Beat, Real Server 3]
  • 1. When the Backup Director detects a Linux
    Director failure through the heartbeat protocol,
    it gracefully negotiates the takeover of the VIP,
    which provides fault tolerance.
  • 2. Monitor the server processes running on the
    real servers: route requests to server processes
    that are alive, and initiate restart/repair.
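
The heartbeat-based takeover in step 1 can be sketched as a small state machine. This is a hedged illustration only: the struct, the function names, and the dead_interval parameter are assumptions made for the example, and the actual VIP takeover is reduced to a role change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum director_role { ROLE_BACKUP, ROLE_ACTIVE };

/* State kept by the backup director (hypothetical layout). */
struct backup_director {
    enum director_role role;
    long last_heartbeat;   /* time in seconds of the last heartbeat seen */
    long dead_interval;    /* silence after which the primary is presumed failed */
};

/* Called whenever a heartbeat from the primary director arrives. */
void on_heartbeat(struct backup_director *d, long now)
{
    assert(d != NULL);
    d->last_heartbeat = now;
}

/* Called periodically. Returns true only on the call that triggers the
 * takeover; a real system would claim the VIP at that point. */
bool on_timer(struct backup_director *d, long now)
{
    assert(d != NULL);
    if (d->role == ROLE_ACTIVE)
        return false;                       /* already serving the VIP */
    if (now - d->last_heartbeat < d->dead_interval)
        return false;                       /* primary still looks alive */
    d->role = ROLE_ACTIVE;
    return true;
}
```

The sketch does not cover what happens when the failed director returns; handing the VIP back is a separate policy decision.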
97
Highly Available Web Server Cluster
[Figure labels: CIP, Web Switch 1, Web Switch 2,
Heart Beat, Real Server 3]
  • 1. A web switch detects the failure of the other
    web switch and takes over the processing of
    routing requests.
  • 2. The web switch monitors the server processes
    running on the real servers. When they die, it
  • routes requests to server processes that are
    alive,
  • rewrites the web switching rules, and initiates
    restart/repair.
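
Routing only to server processes that are alive can be sketched as a round-robin pick that skips servers marked dead by the monitor. The names real_server and next_alive are hypothetical, and how the alive flag is maintained (for example by periodic probes) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Health record for one real server (hypothetical layout). */
struct real_server {
    const char *name;
    int alive;             /* set to 0 by the monitor when the process dies */
};

/* Picks the next alive server after *cursor in round-robin order.
 * Returns its index and advances *cursor, or returns -1 if none is alive. */
int next_alive(const struct real_server *servers, size_t n, size_t *cursor)
{
    assert(servers != NULL && cursor != NULL);
    for (size_t tried = 0; tried < n; tried++) {
        size_t i = (*cursor + 1 + tried) % n;
        if (servers[i].alive) {
            *cursor = i;
            return (int)i;
        }
    }
    return -1;
}
```

When next_alive() returns -1 the switch has no healthy target, which is the point at which restart/repair would have to be initiated before requests can be served again.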

98
Status of UCCS ACSD Project
  • Two versions of the Linux kernel-based content
    switch (LCS), LCS01 and LCS02, were developed.
  • A Linux application-level secure web switch
    (LSWS) was developed using the OpenSSL package.
  • LSWS was ported to run on the Intel IXP12EB and
    IXP1200 network processor with Wind River VxWorks.
  • Part of the above research projects is sponsored
    by CCL/ITRI.
  • The current release, LCS02, is based on
    Linux-2.2.16-3.
  • It is being ported to Linux-2.4.18 and integrated
    with KTCPVS.
  • ip_forward.c, ip_masq.c, and ip_vs.c were modified
    to implement basic TCP delayed binding.
  • ip_cs.c was added for most of the content
    switching functions, including HTTP header
    extraction and XML content extraction.
  • A simple Java-based ruleEdit program was created
    for rule editing and conflict detection. A
    C-based program can detect conflicts among rules
    with regular expressions in their condition
    expressions.
  • A rule translation program converts the rule set
    into a Linux kernel module and allows dynamic
    replacement of rules without restarting the
    system.
  • Currently working on integrating KTCPVS and
    providing a unified configuration/monitoring
    command.
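
To give a feel for rule conflict detection, the sketch below handles only the simplest case: rules whose condition is a numeric range on one field. It is a simplified stand-in, not the Java ruleEdit or the regular-expression checker mentioned above; the range_rule struct and both function names are assumptions. Two rules are reported as conflicting when some value satisfies both conditions but the rules route to different servers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define RANGE_OPEN_END LONG_MAX   /* marks a range with no upper bound */

/* A rule that fires when lo <= value <= hi and routes to server. */
struct range_rule {
    const char *name;
    long lo, hi;           /* inclusive bounds */
    const char *server;
};

/* Returns 1 if some value matches both rules and they disagree on the target. */
int rules_conflict(const struct range_rule *a, const struct range_rule *b)
{
    assert(a != NULL && b != NULL);
    int overlap = a->lo <= b->hi && b->lo <= a->hi;
    return overlap && strcmp(a->server, b->server) != 0;
}

/* Counts conflicting pairs in a rule set by checking every pair once. */
size_t count_conflicts(const struct range_rule *rules, size_t n)
{
    size_t conflicts = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            conflicts += (size_t)rules_conflict(&rules[i], &rules[j]);
    return conflicts;
}
```

Expressed this way, a rule for values above 50000 routed to ace and a rule for values between 0 and 50000 routed to wait do not conflict, while a third rule covering 40000 to 60000 and routed to wait would conflict with the first.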

99
LCS Demo
  • We set up viva.uccs.edu as a content switch and
    wait and ace as two real servers.
  • URL switching demo:
    http://viva.uccs.edu/lcs1/ routes to ace.uccs.edu
    http://viva.uccs.edu/lcs2/ routes to wait.uccs.edu
  • XML web switching (e-commerce applications):
    http://archie.uccs.edu/acsd/lcs/xmldemo.html
    When the 2nd subTotal tag > 50000, route to ace.
    When the 2nd subTotal tag < 50000, route to wait.
  • Let us know if you have problems accessing them.
    My students may be working on LCS extensions.

100
LCS Rule Example
  R4:  if (atoi(rule_fields[1].value) > 50000) {
         return route_to("ace", NON_STICKY, saddr); }
  R5:  if ((atoi(rule_fields[1].value) > 0) &&
           (atoi(rule_fields[1].value) < 50000)) {
         IP_RULE_MSG("serverwait\n");
         return route_to("wait", NON_STICKY, saddr); }
  R10: if (strstr(url, "lcs1") != NULL) {
         IP_RULE_MSG("serverace\n");
         return route_to("ace", NON_STICKY, saddr); }
  R11: if (strstr(url, "lcs2") != NULL) {
         IP_RULE_MSG("serverwait\n");
         return route_to("wait", NON_STICKY, saddr); }
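
The rules above can be exercised outside the kernel with a small user-space sketch. Everything that is not in the slide is an assumption made for illustration: the rule_field struct, the route_to() stub that returns a server index, the NO_MATCH value, the apply_rules() wrapper, and the reading of rule_fields1 as an array element holding the extracted subTotal text. The IP_RULE_MSG logging calls are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NON_STICKY 0
#define NO_MATCH   (-1)

/* Extracted field as name/value text (hypothetical layout). */
struct rule_field { const char *name; const char *value; };

static const char *const real_servers[] = { "ace", "wait" };

/* User-space stand-in for the kernel route_to(): returns the index of the
 * chosen real server instead of forwarding the connection. */
static int route_to(const char *server, int sticky, unsigned long saddr)
{
    (void)sticky;
    (void)saddr;
    for (int i = 0; i < 2; i++)
        if (strcmp(real_servers[i], server) == 0)
            return i;
    return NO_MATCH;
}

/* Applies rules R4, R5, R10, and R11 in order; the first match wins. */
int apply_rules(const char *url, const struct rule_field *rule_fields,
                unsigned long saddr)
{
    assert(url != NULL && rule_fields != NULL);
    int subtotal = atoi(rule_fields[1].value);

    if (subtotal > 50000)                                 /* R4 */
        return route_to("ace", NON_STICKY, saddr);
    if (subtotal > 0 && subtotal < 50000)                 /* R5 */
        return route_to("wait", NON_STICKY, saddr);
    if (strstr(url, "lcs1") != NULL)                      /* R10 */
        return route_to("ace", NON_STICKY, saddr);
    if (strstr(url, "lcs2") != NULL)                      /* R11 */
        return route_to("wait", NON_STICKY, saddr);
    return NO_MATCH;
}
```

As written, a subTotal of exactly 50000 matches neither R4 nor R5, so such a request falls through to the URL rules.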

101
Intel 7280 Demo
  • http://cs.uccs.edu/chow/pub/master/ycai/doc/csdemo.html

102
Related Load Balancing Research Results
  • Modified the Apache status module to report:
  • total bytes to be transferred by child processes
  • average document transfer speed
  • Modified LB-DNS to receive server status and
    bandwidth probing results.
  • LB-DNS returns the IP address of the best server
    based on a weight contributed by both server load
    and bandwidth.
  • Modified the WebStone benchmark to test the
    performance of load balancing web server clusters.
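
The idea of ranking servers by a weight that combines load and bandwidth can be sketched as below. The scoring formula, the load_weight parameter, and all names are assumptions chosen for illustration; the slides do not give the formula LB-DNS actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Status reported for one server (hypothetical layout). */
struct server_status {
    const char *ip;
    double load;        /* 0.0 (idle) to 1.0 (saturated); lower is better */
    double bandwidth;   /* probed bandwidth to the client; higher is better */
};

/* Combines idleness and normalized bandwidth into one score in [0, 1].
 * load_weight in [0, 1] sets how much the load term counts. */
double server_score(const struct server_status *s, double load_weight,
                    double max_bandwidth)
{
    double bw_term = (max_bandwidth > 0.0) ? s->bandwidth / max_bandwidth : 0.0;
    return load_weight * (1.0 - s->load) + (1.0 - load_weight) * bw_term;
}

/* Returns the IP address of the highest-scoring server, or NULL if n is 0. */
const char *best_server_ip(const struct server_status *servers, size_t n,
                           double load_weight)
{
    assert(servers != NULL || n == 0);
    double max_bw = 0.0;
    for (size_t i = 0; i < n; i++)
        if (servers[i].bandwidth > max_bw)
            max_bw = servers[i].bandwidth;

    const char *best = NULL;
    double best_score = -1.0;
    for (size_t i = 0; i < n; i++) {
        double score = server_score(&servers[i], load_weight, max_bw);
        if (score > best_score) {
            best_score = score;
            best = servers[i].ip;
        }
    }
    return best;
}
```

With load_weight set to 0 the choice depends on bandwidth alone, and with 1 on server load alone; values in between trade the two off.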

103
Load Balancing Systems
[Figure labels: Bandwidth Probe Results; Modified
Web Server 1 ... Modified Web Server n; Statistics
Gathering Daemon; Server Delay; Server Ranking
(/tmp/StatFile); LBA Modified DNS; Request for Web
pages]
104
Connection Rate: LBA vs. Round-Robin
The round-robin test was run only once.
105
Conclusion
  • Content Delivery Networks improve Internet
    content retrieval.
  • LVS provides a low-cost layer 4 switching service
    for clusters.
  • A Linux Content Switch with generic rules can be
    easily configured for a wide variety of
    value-added services:
  • Premium services
  • Load balancing/highly available server farms
  • Firewall
  • Bandwidth control/traffic shaping
  • It requires efficient SW/HW architecture and rule
    matching algorithms to reduce processing
    overhead.
  • Content rule design and conflict detection are
    important and challenging.
  • TCP delayed binding can be improved.

106
References
  • http://www.linuxvirtualserver.org/
  • http://www.akamai.com/
  • http://cs.uccs.edu/chow/pub/contentsw/talk/contentswitching.ppt
  • [Aron2000] Mohit Aron, "Differential and
    predictable QoS in web server systems," Ph.D.
    dissertation, Rice University, Oct. 2000.
  • [Zhang97] Lixia Zhang, Sally Floyd, and Van
    Jacobson, "Adaptive Web Caching," April 25, 1997.
    http://www-nrg.ee.lbl.gov/floyd/web.html
  • [Esi2001] Edge Side Includes, http://www.esi.org/.
  • [Chow2001a] C. Edward Chow and Indira Semwal,
    "Web Load Balancing Through More Accurate Server
    Report," Proceedings of PDCAT 2001, Taipei,
    Taiwan.
  • [Chow2001b] C. Edward Chow, Ganesh Godavari, and
    Jianhua Xie, "Content Switch Rules and their
    Conflict Detection," Proceedings of PDCAT 2001,
    Taipei, Taiwan.
  • [Chow2001c] C. Edward Chow and Weihong Wang, "The
    Design and Implementation of Linux LVS-based
    Content Switch," Proceedings of PDCAT 2001,
    Taipei, Taiwan.
  • [Aversa2000] Luis Aversa and Azer Bestavros,
    "Load Balancing a Cluster of Web Servers Using
    Distributed Packet Rewriting," Proceedings of
    IPCCC 2000.
  • [Cao98] Pei Cao, Jin Zhang, and Kevin Beach,
    "Active Cache: Caching Dynamic Contents on the
    Web," http://www.cs.wisc.edu/cao/papers/active-cache.ps