Title: Web100


1
Web100
  • Wendy Huntoon - PSC
  • Jim Ferguson - NCSA
  • I2 Members Meeting
  • May 2002

2
Outline
  • Project Overview
  • Motivation: What is the problem?
  • Web100 Collaboration
  • Progress to Date
  • Standardization Process
  • Code Release
  • Code Capabilities
  • Overview of Users
  • Web100 Resources

3
Motivations: What's the Problem?
  • High performance flows slower than line rate
  • Delays continue/increase even with higher
    bandwidth
  • TCP tuning issues are non-trivial
  • Poorly conceived stacks
  • Router/switch buffer queues inadequate
  • Slow start and AIMD algorithm
  • Eliminate/dramatically reduce the wizard gap
  • Need for kernel instrumentation set for TCP
    variables
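
The "slow start and AIMD" item can be made concrete with a toy model. The sketch below is a simplification written for illustration, not Web100 code: the function name, the per-RTT granularity, and the starting values are all assumptions. It shows why a single loss on a long path is expensive: the window is halved at once but regrows by only one segment per round trip.

```python
def aimd_trace(rtts, loss_rtts, start_cwnd=1.0, ssthresh=64.0):
    """Return the congestion window (in segments) at the start of each RTT.

    Simplified model (an assumption, not a full TCP implementation):
    slow start doubles cwnd each RTT up to ssthresh, congestion avoidance
    adds one segment per RTT, and a loss halves cwnd.
    """
    cwnd = start_cwnd
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_rtts:
            # Multiplicative decrease: halve the window on loss.
            ssthresh = max(cwnd / 2.0, 2.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: exponential growth, capped at ssthresh.
            cwnd = min(cwnd * 2.0, ssthresh)
        else:
            # Additive increase: one segment per round trip.
            cwnd += 1.0
    return trace


# Hypothetical run: 8 round trips with one loss during round trip 5.
print(aimd_trace(8, {5}, start_cwnd=1.0, ssthresh=8.0))
```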

4
The Wizard Gap
  • TCP over a long haul path
      Year    Wizards     Non-wizards   Ratio
              1 Mb/s      300 kb/s      3:1
              10 Mb/s
      1995    100 Mb/s
              1 Gb/s      3 Mb/s        300:1
  • Scientists/researchers not happy with this
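
The ratio column is just the wizard rate divided by the non-wizard rate. A minimal check of the two complete rows follows; the helper name is made up for this example. The slide rounds the second ratio down to 300:1.

```python
def wizard_gap(wizard_bps, non_wizard_bps):
    """Ratio of expert-tuned throughput to untuned throughput."""
    return wizard_bps / non_wizard_bps


rows = [
    ("1 Mb/s vs 300 kb/s", 1e6, 300e3),
    ("1 Gb/s vs 3 Mb/s", 1e9, 3e6),
]
for label, wizard, non_wizard in rows:
    print(f"{label}: roughly {wizard_gap(wizard, non_wizard):.0f}:1")
```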

6
TCP tuning is painful debugging
  • All problems limit performance
  • IP routing, long round trip times
  • Improper MSS negotiations or path MTU discovery
  • IP Packet reordering
  • Packet losses, congestion, lame hardware
  • TCP sender or receiver buffer space
  • Inefficient applications
  • Any one problem can mask all the others and
    confound all but the best (and few) tuning gurus
  • Need for better diagnostics and visibility into
    problems
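
Two of the items above, long round trip times and buffer space, interact through a standard relation: a TCP connection cannot have more than one window of data in flight per round trip, so throughput is at most window / RTT. The sketch below works through that arithmetic. The function names and the example figures (a 64 KiB buffer, a 70 ms path) are assumptions chosen for illustration, not values from the presentation.

```python
def window_limited_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(target_bps, rtt_seconds):
    """Buffer (bandwidth-delay product) needed to sustain target_bps."""
    return target_bps * rtt_seconds / 8


# Assumed example: a 64 KiB buffer on a 70 ms path.
limit = window_limited_throughput_bps(64 * 1024, 0.070)
print(f"64 KiB window, 70 ms RTT: at most {limit / 1e6:.2f} Mb/s")

# Assumed example: buffer needed to fill a 1 Gb/s path at the same RTT.
need = window_needed_bytes(1e9, 0.070)
print(f"1 Gb/s at 70 ms RTT needs about {need / 1e6:.2f} MB of buffer")
```

Under these assumed numbers a default-sized buffer caps the flow at a few Mb/s regardless of link speed, which is the kind of hidden limit the slides argue needs better visibility.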

7
Goal and Method
  • Make it easy (transparent) for non-experts to
    achieve higher throughput performance
  • Enhance TCP capabilities with better (finer
    grain) kernel instrumentation and automatic
    controls
  • Real time triage capability determines sender,
    receiver, and/or network bottlenecks

8
Why Focus on TCP
  • TCP has an ideal vantage point into throughput
    problem space
  • TCP can identify bottleneck subsystem(s)
  • TCP already measures the network (some)
  • TCP can measure the application
  • TCP can adjust itself (auto-tuning feedback)

9
Web100 Collaboration
  • Funded by the NSF
  • Currently Year 2 of a 3 Year grant.
  • Cisco URP for initial seed funding.
  • Collaborators
  • PSC (Matt Mathis, R. Reddy, Janet Brown, John
    Heffner)
  • NCAR (Peter O'Neil, Marla Meehl)
  • NCSA (John Estabrook, Tanya Brethour, Stephen
    Engelhardt, Jim Ferguson)

10
What is in the code
  • Web100 software consists of
  • TCP Kernel Instrument Set (TCP-KIS)
  • Instruments coded directly into the operating
    system kernel.
  • Derived Instrument Set (DIS)
  • Information that is collected based on KIS
    parameters.
  • Application Code
  • Tools, applications, etc. that use the
    information provided by the KIS and DIS.

11
Kernel Instrument Set
  • Definition
  • Set of instruments designed to collect as much of
    the information as possible to enable a user to
    isolate the performance problems of a TCP
    connection.
  • How it is implemented
  • Each instrument is a variable in a "stats"
    structure that is linked through the kernel
    socket structure.
  • The Linux /proc interface is used to expose these
    instruments outside the kernel.
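
The arrangement described above, a per-connection "stats" structure reachable from the socket and exported as text, can be modelled in a few lines. This is a userland toy, not the kernel code: the class names, the field names, and the text format are hypothetical, and the slides do not describe the real /proc layout.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ConnStats:
    # Hypothetical instruments; the real set is much larger.
    pkts_out: int = 0
    data_bytes_out: int = 0
    pkts_retrans: int = 0


@dataclass
class Socket:
    local: tuple
    remote: tuple
    # Each socket carries its own stats structure.
    stats: ConnStats = field(default_factory=ConnStats)

    def send_segment(self, payload_len, retransmit=False):
        """Update the instruments on the transmit path."""
        self.stats.pkts_out += 1
        self.stats.data_bytes_out += payload_len
        if retransmit:
            self.stats.pkts_retrans += 1


def export_stats(sock):
    """Render the instruments as 'name value' lines (assumed format)."""
    return "\n".join(f"{name} {value}" for name, value in asdict(sock.stats).items())


sock = Socket(("10.0.0.1", 5001), ("10.0.0.2", 80))
sock.send_segment(1460)
sock.send_segment(1460, retransmit=True)
print(export_stats(sock))
```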

12
What is the TCP-KIS?
  • TCP-KIS instruments group naturally into
    categories.
  • Currently roughly 19 categories.
  • Already more than 125 instruments have been
    developed.
  • For each instrument
  • Precise (standards ready) definition.
  • Instrument code in the kernel
  • Implementation verification tests
  • Does the kernel implementation meet the
    definition?
  • Prototype diagnostic tool(s) to demonstrate
    functionality and effectiveness.

13
TCP-KIS
  • Basic instrumentation examples
  • Connection ID: 5-tuple that uniquely identifies a
    connection.
  • State: determines what protocol features or
    algorithms are enabled.
  • Traffic out: statistics aggregate packets and
    traffic sent out on a connection.

14
Local Sender Triage
  • Group of instruments associated with the local
    sender.
  • Determine what subsystems are throttling TCP data
    transmission.
  • Three parallel sets of instruments that measure
  • Receiver Window
  • Network Congestion
  • Sender's Availability
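
One way to read the three parallel measurements is as shares of the connection's sending time. The sketch below assumes each instrument set yields the time spent limited by the receiver window, by network congestion, or by the sender itself. The function name and its arguments are hypothetical, not Web100 variable names.

```python
def triage(rwin_limited_s, cwnd_limited_s, sender_limited_s):
    """Return (dominant bottleneck, share of time per subsystem).

    Inputs are seconds the sender spent limited by each subsystem
    (assumed to be available from the three instrument sets).
    """
    times = {
        "receiver window": rwin_limited_s,
        "network congestion": cwnd_limited_s,
        "sender": sender_limited_s,
    }
    total = sum(times.values())
    if total <= 0:
        raise ValueError("no limited time recorded")
    shares = {name: t / total for name, t in times.items()}
    bottleneck = max(shares, key=shares.get)
    return bottleneck, shares


# Assumed sample: 1 s receiver-limited, 7 s congestion-limited, 2 s sender-limited.
bottleneck, shares = triage(1.0, 7.0, 2.0)
print(bottleneck, {name: f"{share:.0%}" for name, share in shares.items()})
```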

15
Local Sender Groups
  • Other groups of instruments associated with the
    Local Sender
  • Local Sender Congestion Model
  • Local Sender Loss Model
  • Local Sender Re-order Model
  • Local Sender RTT
  • Local Sender Segment Size
  • Local Sender Bottlenecks
  • Local Sender Tuning

16
Other Instruments
  • Similar instruments for the Local Receiver.
  • Observed Receiver instruments
  • Often inferred from the data stream.
  • E.g., Observed Receiver: the receiver's state is
    inferred from the ACK stream.
  • Application Interface
  • Future instruments to collect statistics on how
    the application is using the network.

17
Userland Distribution
  • Released asynchronously with kernel distribution
  • Currently at Alpha 1.1
  • Version 1.2 release imminent
  • Consists of
  • The web100 library
  • Command line utilities
  • GUI utilities

18
Web100 Library
  • Web100 kernel exposes critical TCP
    variables/instruments through /proc
  • Web100 library provides the necessary access
    functions to access these variables/instruments
  • Functions
  • Read the value of a variable/instrument
  • Snapshot of a group (facilitates atomic reading
    of a group of variables)
  • Modify tunable variables (e.g., send buffer size)
  • Etc
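
The three kinds of function listed above can be illustrated with a small stand-in. This is not the web100 library API: the class, its method names, and the variable names are invented for the example. The point it shows is why a group snapshot matters: reading variables one at a time can mix values from before and after a kernel update, whereas a snapshot copies the whole group in one step.

```python
import threading


class InstrumentGroup:
    """Hypothetical stand-in for a group of per-connection variables."""

    def __init__(self, variables, tunable=()):
        self._lock = threading.Lock()
        self._vars = dict(variables)
        self._tunable = set(tunable)

    def read(self, name):
        """Read the current value of one variable."""
        with self._lock:
            return self._vars[name]

    def snapshot(self):
        """Copy every variable in the group under one lock (atomic read)."""
        with self._lock:
            return dict(self._vars)

    def write(self, name, value):
        """Modify a tunable variable; other variables are read-only."""
        if name not in self._tunable:
            raise PermissionError(f"{name} is read-only")
        with self._lock:
            self._vars[name] = value

    def kernel_update(self, **changes):
        """Simulate the kernel updating several instruments together."""
        with self._lock:
            self._vars.update(changes)


group = InstrumentGroup(
    {"pkts_out": 10, "pkts_retrans": 1, "snd_buf": 65536},
    tunable={"snd_buf"},
)
snap = group.snapshot()                       # consistent view of the group
group.kernel_update(pkts_out=20, pkts_retrans=3)
group.write("snd_buf", 262144)                # tunable, so allowed
print(snap["pkts_out"], group.read("pkts_out"), group.read("snd_buf"))
```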

19
Utilities
  • Command line utilities
  • Useful in batch scripts
  • Serve as demo codes for the usage of web100
    library
  • GUI utilities
  • Based on GTK
  • Useful for troubleshooting network applications
  • Serve as examples for application developers

20
GUI Sample Screens DTB
21
Connection Selector
22
Looking at a Variable
23
Timeline - Year 1
  • Alpha code development
  • Establish User Support
  • www.web100.org
  • Initial User Community
  • Very limited to begin with.
  • Knowledgeable users, expected to provide
    technical input on the code.
  • Understand and develop applications.

24
Timeline - Year 2
  • Began standardization process.
  • Develop MIB
  • Submit to IETF
  • Develop public code
  • Fix bugs in alpha versions
  • Add instrumentation
  • Code release
  • Continue code development
  • Identify and add new instruments

25
Code Releases - To date
  • Initial Release
  • Alpha0.2, released May 23, 2001
  • Alpha0.3, released Sept. 19, 2001
  • Alpha 1.0: Separation of Kernel and Userland code
  • Kernel Patch
  • Alpha 1.1 for Linux 2.4.16, released March 18,
    2002
  • Alpha 1.0, released March 1, 2002
  • Alpha 1.0, released February 26, 2002
  • Userland
  • Alpha 1.1, released February 28, 2002
  • Alpha 1.0, released February 26, 2002

26
Timeline - Year 3
  • New pathprobe diagnostic tool (work in progress,
    unreleased).
  • Add another 10-12 instruments.
  • Review instruments and code with other wizards.
  • Gain vendor support for ideas and code.
  • Finalize IETF draft by December IETF meeting.

27
Milestones
  • Over a year of testing with 30 alpha testers
  • Including SLAC, ORNL, LBNL, and universities
  • www.net100.org
  • Modified Linux kernel supports 2.4.16
  • Separation between KIS and library functions
  • draft-ietf-tsvwg-tcp-mib-extension-00.txt
  • draft-ietf-ipngwg-rfc2012-update-01.txt

28
Web100 Collaborator Activity
  • Rich Carlson, ANL
  • Tom Dunnigan, ORNL
  • Tom Hacker, U. of Michigan
  • Doug Chang, SLAC
  • Andreas Burkhardt and Matt Grob, Qualcomm
  • Larry Dunn and Scott Dier, Cisco/U. of Minnesota
  • Jason Lee, LBL

29
Collaborator Assistance
  • Bugs!
  • Kernel
  • Utilities
  • Release
  • Request new features
  • Review and criticize documentation
  • Way too easy on us

30
Collaborator Activity
  • Carlson/ANL working on a troubleshooting guide
    for LANs.
  • Set up a network of 13 identically equipped PIII
    machines connected via a Cisco 5500 network
    switch, running Web100-enabled Linux.
  • Introduces typical network faults (duplex
    mismatches, other config errors) and analyzes
    data for signatures of these faults.
  • Modified Iperf 1.2 to collect variables and
    reverse flow.

31
Collaborator Activity
  • Dunnigan/ORNL has found web100 helpful in seeing
    losses/retransmission and congestion avoidance
    parameters of individual TCP flows, and for
    tuning flows
  • Has developed a Web100-enabled ttcp
  • Has developed a daemon that logs web100 variables
    for designated paths when a flow closes
  • Has developed an autotuning daemon that uses
    web100 to tune flows, including modifications to
    web100 to support "event notification", so the
    daemon knows when a new flow/socket is opened
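
The "event notification" idea can be sketched as a small publish/subscribe loop: one handler tunes a flow when it opens, another logs its variables when it closes. Everything here is hypothetical (the class, the event names, the flow dictionary, and the fixed buffer size) and is not taken from the ORNL daemons.

```python
class FlowEvents:
    """Minimal event dispatcher for flow open/close notifications."""

    def __init__(self):
        self._handlers = {"open": [], "close": []}

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, flow):
        for handler in self._handlers[event]:
            handler(flow)


events = FlowEvents()
closed_flow_log = []


def autotune(flow):
    # Assumed policy: raise the send buffer to a fixed target on open.
    flow["snd_buf"] = max(flow["snd_buf"], 1 << 20)


def log_on_close(flow):
    # Record a copy of the flow's variables when it closes.
    closed_flow_log.append(dict(flow))


events.subscribe("open", autotune)
events.subscribe("close", log_on_close)

flow = {"remote": "192.0.2.10", "snd_buf": 65536, "pkts_retrans": 0}
events.emit("open", flow)    # daemon learns of the new flow and tunes it
flow["pkts_retrans"] = 4     # the flow runs and accumulates statistics
events.emit("close", flow)   # daemon logs the final values
print(closed_flow_log)
```

Without notification, a daemon would have to poll for new connections and could miss short flows entirely; a callback on open avoids that.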

32
Collaborator Activity
  • Hacker/U.Michigan has been using the web100
    software to help tune and diagnose end-to-end
    network performance problems across the U-M
    campus network as well as across Abilene for the
    Visible Human and Atlas projects at U-M.
  • Chang/SLAC is looking to fix a performance problem
    between Linux and Solaris machines.

33
Collaborator Activity
  • Qualcomm is using Web100 to measure TCP
    performance over certain types of high speed
    wireless links under development. Web100 is
    partially integrated into some other tools - in
    the sense that output reports are published
    automatically in a format similar to other tools
    Qualcomm uses.
  • Dunn/Cisco is currently using Web100 for a class
    at U. Minnesota. Includes accounts on a test
    machine at NCSA.

34
Collaborator Activity
  • Lee/LBL has obtained accounts at SLAC and ANL for
    WAN testing, and has co-located an LBL machine
    in Washington, D.C. to do testing over SuperNet.
    Testing of this setup is still in progress.
  • Keith Jackson at LBL has written Python wrappers
    to the Web100 calls using SWIG.

35
Web100 Summary
  • Main WWW site www.web100.org
  • Freely available software distribution
  • www.web100.org/download
  • hundreds of downloads
  • Please be cognizant of impacts on others
  • Please use, test, provide feedback, contribute
    code
  • IETF standards process to benefit all
  • Attention turning to working with OS vendors to
    incorporate standards enhancements into their
    stacks