Title: A Study of Applications for Optical Circuit-Switched Networks
1. A Study of Applications for Optical Circuit-Switched Networks
Supported by NSF ITR-0312376, NSF EIN-0335190,
and DOE DE-FG02-04ER25640 grants
2. Outline
- Introduction
- CHEETAH Background
- CHEETAH concept and network
- CHEETAH end-host software
- Analytical Models of GMPLS Networks
- Application (App) I: Web Transfer App
- App II: Parallel File Transfers
- Summary and Conclusions
3. Introduction
- Many optical connection-oriented (CO) testbeds
- E.g., CANARIE's CA*net 4, UKLight, and CHEETAH
- Primarily designed for e-Science apps
- Use Generalized Multiprotocol Label Switching (GMPLS): immediate-request calls, subject to call blocking
- Motivation: extend these GMPLS networks to millions of users
- Problem statement
- What apps are well served by GMPLS networks?
- How to design apps to use GMPLS networks efficiently?
4. Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)
- Designed as an add-on service to the Internet that leverages the Internet's existing services
[Figure: CHEETAH concept — each end host has two NICs: NIC I connects through IP routers to the packet-switched Internet, and NIC II connects through an Ethernet-SONET gateway to the optical circuit-switched CHEETAH network.]
5. CHEETAH Network
[Figure: CHEETAH network topology — end hosts (zelda1-zelda5, wukong, mvstu6 at UVa, a CUNY host) attach through edge switches (Centaur FastIron FESX448, MCNC Catalyst 7600, NCSU M20, UVa Catalyst 4948, CUNY Foundry, NYC/WASH HOPI Force10) to Sycamore SN16000 switches at MCNC, NC and Atlanta, GA, with connectivity to the WASH Abilene T640 and ORNL, TN; link types include direct fibers, 1G VLANs, MPLS tunnels, and an OC-192 lambda.]
6. CHEETAH End-host Software
- OCS: Optical Connectivity Service
- RD: Routing Decision
- RSVP-TE: Resource ReSerVation Protocol with Traffic Engineering extensions
- C-TCP: Circuit-TCP
7. Outline (repeated; next section: Analytical Models of GMPLS Networks)
8. Analytical Models of GMPLS Networks
- Problem: what apps are suitable for GMPLS networks?
- App properties
- Per-circuit BW
- Call-holding time
- Measures of suitability
- Call-blocking probability, Pb
- Link utilization, U
- Assumptions
- Call arrival rate follows a Poisson process
- Single link
- Single class: all apps are of the same type
- A link of capacity C carries m circuits, so per-circuit BW = C/m
- m is a measure of high-throughput vs. moderate-throughput apps
- For high-throughput apps (e.g., e-Science), m is small
9. BW Sharing Models
- Two kinds of apps, depending on whether the call-holding time depends on the per-circuit BW
- File size distribution, with a shape parameter and a scale parameter k
- Call blocking given by the Erlang-B formula: Pb = (A^m / m!) / Σ_{i=0}^{m} (A^i / i!), with offered load A in Erlangs
- Crossover file size
10. Numerical Results: call-holding time is independent of per-circuit BW
- Two equations, four variables
- Fix U and m, compute Pb and the offered load
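As a small worked example of the "fix U and m, then compute Pb" step, here is a minimal sketch under the assumptions of slides 8-9 (single link, Poisson arrivals, Erlang-B blocking); the fixed-point relation U·m = A·(1 − Pb) and the function names are assumptions of this sketch, not taken from the original analysis.

```python
# Erlang-B sketch: given a target utilization U and circuit count m, find the
# offered load A (Erlangs) and blocking probability Pb consistent with
# U * m = A * (1 - Pb). Assumes Poisson arrivals and the standard Erlang-B model.
def erlang_b(A, m):
    """Blocking probability for offered load A on m circuits (stable recursion)."""
    b = 1.0
    for k in range(1, m + 1):
        b = A * b / (k + A * b)
    return b

def solve_for_load(U, m, iterations=50):
    A = U * m                      # initial guess: ignore blocking
    for _ in range(iterations):
        Pb = erlang_b(A, m)
        A = U * m / (1.0 - Pb)     # fixed-point update on the carried-load equation
    return A, erlang_b(A, m)

for m in (10, 100, 1000):
    A, Pb = solve_for_load(0.8, m)  # target U = 80%
    print(f"m={m:5d}  offered load A={A:8.1f} Erlangs  Pb={Pb:.4f}")
```

For U = 80% this reproduces the qualitative trend on the following slides: Pb is on the order of 20% at m = 10 but well below 1% at m = 100.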
11. Numerical Results: call-holding time is independent of per-circuit BW
- Conclusions: to get high U
- Small m (~10): high Pb, so book-ahead or call queuing is needed
- Large m (~1000): a high call arrival rate, so a large user population N is needed
- Intermediate m (~100): a large call-holding time is preferred
12. Numerical Results: call-holding time is dependent on per-circuit BW
- Conclusions: to get high U
- Small m (~10): high Pb, so book-ahead or call queuing is needed
- As m increases, N does not increase
- For m = 100, U = 80% is achievable with a small Pb
13. Conclusions for Analysis
- Ideal apps require a per-circuit rate on the order of one-hundredth of the link capacity
- Apps whose call-holding time is independent of the per-circuit BW: a long call-holding time is preferred
- Apps whose call-holding time is dependent on the per-circuit BW: a short call-holding time is needed
14. Outline (repeated; next section: Application I: Web Transfer App)
15. App I: Web Transfer App on CHEETAH
- Why web transfer?
- Web-based apps are ubiquitous
- Based on the previous analysis, m ≈ 100 is suitable for CHEETAH
- Consists of a software package, WebFT
- Leverages CGI for deployment without modifying web client or web server software
- Integrated with the CHEETAH end-host software APIs to use the CHEETAH network transparently to users
16. WebFT Architecture
[Figure: WebFT architecture — on the server side, a web server (e.g., Apache) invokes CGI scripts (download.cgi, redirection.cgi) that drive the WebFT sender; on the client side, a web browser (e.g., Mozilla) issues the URL and receives the response while the WebFT receiver handles the data. Both ends use the CHEETAH end-host software APIs and daemons (OCS, RD, RSVP-TE, C-TCP); control messages travel over the Internet, and data transfers use a dedicated circuit.]
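To make the CGI-based deployment concrete, here is a minimal, hypothetical sketch of a redirection-style CGI script in the spirit of redirection.cgi; the helper names ocs_circuit_available and start_webft_sender, the query parameter, and the fallback URL are assumptions for illustration only, not the actual WebFT code.

```python
#!/usr/bin/env python3
# Hypothetical sketch of a redirection-style CGI script (not the actual WebFT code).
# It checks whether a CHEETAH circuit can be used for the requested file and either
# hands the transfer to a circuit-based sender or falls back to a plain HTTP download.
import os
from urllib.parse import parse_qs

def ocs_circuit_available(client_ip):        # assumed helper: would consult the OCS daemon
    return False                              # placeholder decision

def start_webft_sender(client_ip, path):      # assumed helper: would launch the WebFT sender
    pass

query = parse_qs(os.environ.get("QUERY_STRING", ""))
requested = query.get("file", [""])[0]
client_ip = os.environ.get("REMOTE_ADDR", "")

if requested and ocs_circuit_available(client_ip):
    # Data would then flow over the dedicated CHEETAH circuit via C-TCP.
    start_webft_sender(client_ip, requested)
    print("Content-Type: text/plain\r\n")
    print("Transfer started over the CHEETAH circuit.")
else:
    # Fall back to an ordinary HTTP download of the same file over the Internet path.
    print("Status: 302 Found")
    print("Location: /files/" + requested)
    print()
```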
17. Experimental Testbed for WebFT
[Figure: experimental testbed — zelda3 (NCSU, NC) and wukong (Atlanta, GA) each have NIC I facing the Internet via IP routers and NIC II facing the CHEETAH network through Sycamore SN16000 switches at MCNC, NC and Atlanta, GA.]
- zelda3 and wukong: Dell machines running Linux FC3 with ext2/ext3 file systems and RAID-0 SCSI disks
- RTT between them: 24.7 ms on the Internet path and 8.6 ms on the CHEETAH circuit
- Apache HTTP Server 2.0 is loaded on zelda3
18. Experimental Results for WebFT
[Figure: the web page used to test WebFT]
- Test parameters
- Test.rm: 1.6 GB file; circuit rate: 1 Gbps
- Test results
- Throughput: 680 Mbps; transfer delay: 19 s (1.6 GB ≈ 12.8 Gb, and 12.8 Gb / 680 Mbps ≈ 19 s)
19. Outline (repeated; next section: App II: Parallel File Transfers)
20. App II: Parallel File Transfers on CHEETAH
- Motivation: e-Science projects need to share large volumes of data (TB or PB)
- Goal: achieve multi-Gb/s throughput
- Two factors limit throughput
- TCP's congestion-control algorithm
- End-host limitations
- Solutions to relieve end-host limitations
- Single-host solution
- Cluster solution, which has two variations
- General case: non-split source file
- Special case: split source file
21. General-Case Cluster Solution
[Figure: the original source file is split across sending hosts 1..n, each host i transfers its portion in parallel to the corresponding receiving host i, and the portions are assembled at the original sink.]
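As a rough illustration of the split / transfer / assemble flow in the figure, here is a minimal sketch assuming explicit file splitting; in the actual solution the "split" and "assemble" steps are handled by PVFS2 striping and GridFTP striped transfers, as described on the following slides, and the piece names here are arbitrary.

```python
# Minimal sketch of the split / transfer / assemble idea (illustration only;
# the general-case solution uses PVFS2 striping and GridFTP striped transfers).
import os

def split(path, n):
    """Split 'path' into n roughly equal pieces; return the piece file names."""
    size = os.path.getsize(path)
    chunk = (size + n - 1) // n
    pieces = []
    with open(path, "rb") as src:
        for i in range(n):
            piece = f"{path}.part{i}"
            with open(piece, "wb") as dst:
                dst.write(src.read(chunk))
            pieces.append(piece)
    return pieces

def assemble(pieces, out_path):
    """Concatenate the pieces (each transferred by one host) back into one file."""
    with open(out_path, "wb") as dst:
        for piece in pieces:
            with open(piece, "rb") as src:
                dst.write(src.read())

# Each piece would be handed to one cluster host for a parallel transfer;
# the 'transfer' step itself is omitted here.
```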
22. Software Tools: GridFTP and PVFS2
- GridFTP: a data-transfer protocol for the Grid
- Extends FTP with features for partial file transfers, multi-streaming, and striping
- We mainly use the GridFTP striped-transfer feature
- PVFS: Parallel Virtual File System
- An open-source implementation of a parallel file system
- Stripes a file across multiple I/O servers, like RAID-0
- We use the second version, PVFS2
23. GridFTP Striped Transfer
[Figure: globus-url-copy coordinates a sending GridFTP front end and a receiving GridFTP front end; the SPAS command returns a list of host-port pairs from the receiving side, the SPOR command passes that list to the sending side, and the sending data nodes then initiate data connections to the receiving data nodes.]
24. General-Case Cluster Solution: Design
25. General-Case Cluster Solution: Implementation
- To get high throughput, we need each data node to be responsible for the data blocks on its local disks
- Make PVFS2 and GridFTP use the same stripe pattern
- Problems
- PVFS2 1.0.1 does not provide a utility to inspect data distribution
- Data connections between sending and receiving nodes are random
[Figure: PVFS2 data nodes S1..Sn]
26. Random Data Connections
[Figure: PVFS2 data nodes S1..Sn with randomly assigned data connections]
27. Random Data Connections (continued)
[Figure: PVFS2 data nodes S1..Sn with randomly assigned data connections]
28. Implementation: Modifications to PVFS2
- Goal: know a priori how a file is striped in PVFS2 (see the sketch after this list)
- Used the strace command to trace the system calls made by pvfs2-cp
- pvfs2-fs-dump gives the (non-deterministic) I/O-server order of file distribution
- pvfs2-cp ignores the -s option for configuring the stripe size
- Modify the PVFS2 code
- For load balance, PVFS2 stripes files starting at a random server offset, jitter = rand() % num_io_servers
- Set jitter to -1 to get a fixed order of data distribution
- Change the default stripe size (originally 64 KB)
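To show what knowing the stripe layout a priori enables, here is a minimal sketch assuming simple round-robin striping with a fixed starting server (the random jitter removed) and a configurable stripe size; it is a model of the resulting layout, not PVFS2 code, and the 64 MB stripe size and server names in the example are arbitrary.

```python
# Minimal model of round-robin striping with a fixed starting server (jitter removed).
# Not PVFS2 code; it only predicts which I/O server holds each stripe of a file.
def stripe_map(file_size, stripe_size, io_servers, start_index=0):
    """Return a list of (stripe_number, server) pairs for a file of 'file_size' bytes."""
    layout = []
    num_stripes = (file_size + stripe_size - 1) // stripe_size
    for stripe in range(num_stripes):
        server = io_servers[(start_index + stripe) % len(io_servers)]
        layout.append((stripe, server))
    return layout

# Example: a 1 GB file, 64 MB stripes, five I/O servers named sunfire1..sunfire5.
servers = [f"sunfire{i}" for i in range(1, 6)]
for stripe, server in stripe_map(1 << 30, 64 << 20, servers):
    print(f"stripe {stripe:2d} -> {server}")
```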
29. Implementation: Modifications to GridFTP
- Goal: use a deterministic matching sequence between sending and receiving data nodes (a sketch of the matching follows this list)
- Method: modify the implementation of the SPAS and SPOR commands
- SPAS: sort the list of host-port pairs by IP address for the receiving data nodes
- SPOR: have the sending data nodes initiate data connections to the receiving data nodes in sequence
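A minimal sketch of the deterministic matching idea (not the modified GridFTP C code): sort the SPAS host-port list by numeric IP address, then pair sending and receiving data nodes positionally, as the modified SPOR handling would; the addresses below are hypothetical.

```python
# Sketch of the deterministic sender/receiver matching (not the GridFTP implementation).
from ipaddress import ip_address

def match_nodes(senders, receiver_pairs):
    """senders: list of sending-node IPs; receiver_pairs: list of (ip, port) from SPAS."""
    # Sort the SPAS host-port list by numeric IP so the order is reproducible.
    ordered_receivers = sorted(receiver_pairs, key=lambda hp: ip_address(hp[0]))
    ordered_senders = sorted(senders, key=ip_address)
    # Pair the i-th sender with the i-th receiver, as the modified SPOR handling would.
    return list(zip(ordered_senders, ordered_receivers))

# Example with hypothetical cluster addresses.
senders = ["10.0.0.5", "10.0.0.1", "10.0.0.3"]
receivers = [("10.0.1.9", 50002), ("10.0.1.2", 50002), ("10.0.1.7", 50002)]
for sender, (ip, port) in match_nodes(senders, receivers):
    print(f"{sender} -> {ip}:{port}")
```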
30. Experimental Results
- Conducted on a 22-node cluster (sunfire)
- The modifications reduced network-and-disk contention
- Performance of the PVFS2 implementation was poor
31. Summary and Conclusions
- Analytical models of GMPLS networks
- Ideal apps require a per-circuit rate on the order of one-hundredth of the link capacity
- Application I: Web Transfer Application
- Provides deterministic data service to CHEETAH clients over dedicated end-to-end circuits
- Requires no modifications to web client or web server software, by leveraging CGI
- Application II: Parallel File Transfers
- Implemented a general-case cluster solution using PVFS2 and the GridFTP striped transfer
- Modified PVFS2 and GridFTP code to reduce network-and-disk contention
32. Publication List
- M. Veeraraghavan, X. Fang, and X. Zheng, "On the suitability of applications for GMPLS networks," submitted to IEEE Globecom 2006
- X. Fang, X. Zheng, and M. Veeraraghavan, "Improving web performance through new networking technologies," IEEE ICIW'06, February 23-25, 2006, Guadeloupe, French Caribbean
33. Future Work
- Analytical models of GMPLS networks
- Multi-class models
- Multiple-link and network-wide models
- Application I: Web Transfer Application
- Design a partial CO web transfer to enable non-CHEETAH hosts to use CHEETAH
- Connect multiple CO networks to further reduce RTT
- Application II: Parallel File Transfers
- Test the general-case cluster solution on CHEETAH
- Work on PVFS2, or try GPFS, to get high I/O throughput
34. A Classification of Networks that Reflects Sharing Modes
35. The flow chart for the WebFT sender
36. The WebFT Receiver
- Integrates with the CHEETAH end-host software modules, similarly to the WebFT sender
- Runs as a daemon in the background on the client host to avoid manual intervention
- Also provides the WebFT sender with a desired circuit rate
37. Experimental Results for WebFT
38. PVFS2 Architecture
39. Experimental Configuration
- Configuration of PVFS2 I/O servers
- The 1st PVFS2 file system: sunfire1 through sunfire5
- The 2nd PVFS2 file system: sunfire10, plus sunfire6 through sunfire9
- Configuration of GridFTP servers
- Sending front end: sunfire1, with data nodes sunfire1 through sunfire5
- Receiving front end: sunfire10, with data nodes sunfire10 and sunfire6 through sunfire9
- GridFTP striped transfer command:
- globus-url-copy -vb -dbg -stripe ftp://sunfire1:50001/pvfs2/test_1G ftp://sunfire10:50002/pvfs2/test_1G1 2> dbg1.txt
40. Four Conditions to Avoid Unnecessary Network-and-disk Contention
- Know a priori how data are striped in PVFS2
- PVFS2 I/O servers and GridFTP servers run on the same hosts
- GridFTP stripes data across data nodes in the same sequence as PVFS2 does across its I/O servers
- GridFTP and PVFS2 use the same stripe size
41. (Figure-only slide)
42. The Specific Cluster Solution for TSI
43. Numerical Results: call-holding time is dependent on per-circuit BW
- Conclusions
- Large m (~1000) does not increase N