Lessons Learned Monitoring the WAN

1
Lessons Learned Monitoring the WAN
  • Les Cottrell, SLAC
  • ESnet R&D Advisory Workshop, April 23, 2007
  • Arlington, Virginia

Partially funded by DOE and by Internet2
2
Uses of Measurements
  • Automated problem identification & troubleshooting
  • Alerts for network administrators, e.g.
  • Baselines, bandwidth changes in time-series,
    iperf, SNMP
  • Alerts for systems people
  • OS/Host metrics
  • Forecasts for Grid middleware, e.g. replica
    manager, data placement
  • Engineering, planning, SLA (set & verify),
    expectations
  • Also (not addressed here):
  • Security: spot anomalies, intrusion detection
  • Accounting

3
WAN History
  • PingER (1994), IEPM-BW (2001), Netflow
  • E2E, active, regular, end-user view,
  • all hosts owned by individual sites,
  • core mainly centrally designed & developed
    (homogeneous), contributions from FNAL, GATech,
    NIIT (close collaboration)
  • Why are you monitoring?
  • network trouble management, planning,
    auditing/setting SLAs, and Grid forecasting are very
    different, though they may use the same measurements

4
PingER (1994)
  • PingER project originally (1995) for measuring
    network performance for the US, European & Japanese HEP
    community; now mainly R&E sites
  • Extended this century to measure the Digital Divide
  • Collaboration with the ICTP Science Dissemination
    Unit, http://sdu.ictp.it
  • ICFA/SCIC http://icfa-scic.web.cern.ch/ICFA-SCIC/
  • >120 countries (99% of the world's connected population)
  • >35 monitor sites in 14 countries
  • Uses the ubiquitous ping facility
  • Monitors 44 sites in S. Asia
  • Maybe the most extensive active E2E monitoring in
    the world

5
PingER Design Details
  • PingER design (1994: no web services or RRD,
    security not a big thing, etc.)
  • Simple: no remote software (ping is everywhere), no
    probe development; monitor host install is ~0.5 day of
    effort for a sys-admin
  • Data centrally gathered, archived, analyzed, so the
    hard jobs (archiving, analysis, viz) do NOT
    require distribution; only one copy
  • Database: flat ASCII files (raw data and analyzed
    data, one file/pair/day). Compression saves a factor of 6
    (~100 GB)
  • Data available via web (lots of use, some uses
    unexpected, often analysis with Excel)

6
PingER Lessons
  • Measurement code rewritten twice: once to add
    extra data, once to document (perldoc) /
    parameterize / simplify installation
  • Gathering code (uses LYNX or FTP) pulls from the
    archive; no major mods in 10 years
  • Most of the development is for downloading and analyzing
    data, viz, and management
  • New ways to use the data (jitter, out-of-order,
    duplicates, derived throughput, MOS) all required
    study of the data, then implementation and integration
  • Dirty data (pathologies not related to the network)
    require filtering or filling before analysis
  • Had to develop an easy make/install download,
    instructions, FAQ; new installs still require
    communication
  • pre-reqs, getting name registration, getting cron
    jobs running, getting the web server running,
    unblocking, clarifying documentation (often non-native
    English speakers)
  • Documentation (tutorials, help, FAQs), publicity
    (brochures, papers, maps, presentations/travel),
    getting funding/writing proposals
  • Monitor availability of (developed tools to
    simplify/automate):
  • monitor sites (hosts stop working, security
    blocks, hosts replaced, site forgets), nudge
    contacts
  • critical remote sites (beacons), choose a new one
    (automatically updates monitor sites)
  • Validate/update metadata (name, address,
    institute, lat/long, contact) in the database (need
    easy update)

7
IEPM-BW (2001)
  • 40 target hosts in 13 countries
  • Bottlenecks vary from 0.5 Mbits/s to 1 Gbits/s
  • Traverse 50 ASes, 15 major Internet providers
  • 5 targets at PoPs, rest at end sites
  • Added Sunnyvale for UltraLight
  • Covers all USATLAS tier 0, 1, 2 sites
  • Recently added FZK, QAU
  • Main author (Connie Logg) retired

8
IEPM-BW Design Details
  • IEPM-BW (2001)
  • More focused (than PingER): fewer sites (e.g.
    BaBar collaborators), more intense, more probe
    tools (iperf, thrulay, pathload, traceroute,
    owamp, bbftp, ...), more flexibility
  • Complete code set (measurement, archive, analyze,
    viz) at each monitoring site. Data distributed.
  • Needs a dedicated host
  • Remote sites need code installed
  • Originally executed remotely via ssh, which still needed
    code installed
  • Security, accounts (require training), recovery
    problems
  • Major changes with time:
  • Use servers rather than ssh for remote hosts
  • Use MySQL for configuration databases rather
    than requiring Perl scripts
  • Provide management tools for configuration data,
    etc.
  • Add/replace probes

9
IEPM-BW Lessons (1)
  • Problems & recommendations
  • Need the right versions of MySQL, gnuplot, Perl (and
    modules) installed on hosts
  • All possible failure modes for probe tools need
    to be understood and accommodated
  • Timeout everything, clean up hung processes (see
    the sketch after this list)
  • Keep logfiles for a day or so for debugging
  • Review how processes run with Netflow (mainly
    manual)
  • Scheduling:
  • don't run file transfer, iperf, thrulay, pathload
    at the same time on the same path
  • Limit duration and frequency of intensive probes
    so they do not impact the network
  • Host loses disk, upgrades OS, loses DNS,
    applications upgraded (e.g. gnuplot), IEPM
    database zapped, etc.
  • Need backup
  • Have a local host as a target for sanity checks
    (e.g. monitoring host-based issues)
  • Monitor the monitoring host load (e.g. Ganglia,
    Nagios)
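
A minimal sketch of the "timeout everything, clean up hung processes" advice,
using Python's subprocess module; the probe command and timeout value are
illustrative assumptions, not the actual (Perl-based) IEPM-BW code.

    import subprocess

    def run_probe(cmd, timeout_s=120):
        """Run a probe command, kill it if it hangs, and return its output (or None)."""
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, text=True)
        try:
            out, _err = proc.communicate(timeout=timeout_s)
            return out
        except subprocess.TimeoutExpired:
            proc.kill()           # clean up the hung probe
            proc.communicate()    # reap it so it does not linger as a zombie
            return None

    # Hypothetical use: give an iperf-style probe 60 s before giving up
    result = run_probe(["iperf", "-c", "target.example.org", "-t", "10"], timeout_s=60)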

10
IEPM-BW Lessons (2)
  • Different paths need different probes
    (performance and interest related)
  • Experiences with probes (a lot of work to
    understand, analyze & compare):
  • OWAMP vs. ping: OWAMP needs a server and accurate
    time; ping is round-trip only, available everywhere,
    but may be blocked
  • Traceroute: need to analyze the significance of
    results
  • Packet pair separation:
  • ABwE: noisy, inaccurate especially on Gbps paths
  • Pathchirp better, pathload best (most intense,
    approaches iperf); problems at 10 Gbps; look at
    pathneck
  • TCP:
  • thrulay: more information, more manageable than
    iperf
  • need to keep TCP buffers optimized/updated
  • File transfer:
  • Disk-to-disk is close to iperf/thrulay
  • disk measures the file/disk system, not the network,
    but the end-user view is important
  • Adding new hosts is still not easy

11
Other Lessons (1)
  • Traceroute is no good for layers 2 & 1
  • Packet-pair separation at 10 Gbps is below the
    available timing granularity
  • A net admin cannot review thousands of graphs each
    day
  • need event detection, alert notification, and
    diagnosis assistance
  • Comparing QoS vs. best effort requires adding path
    reservation
  • Keeping TCP buffer parameters optimized is difficult
  • Network configurations are not static
  • Forecasting is hard if the path is congested; need to
    account for diurnal etc. variations

12
Examples of real data
  • Caltech thrulay (Nov05-Mar06, 0-800 Mbps):
    misconfigured windows, new path, very noisy
  • UToronto multi-stream iperf (Nov05-Jan06, 0-250 Mbps):
    seasonal effects, daily & weekly
  • UTDallas pathchirp vs. thrulay vs. iperf
    (Mar-10-06 to Mar-20-06, 0-120 Mbps): some effects are
    seasonal, others are not; events may affect
    multiple metrics
  • Events can be caused by host or site congestion
  • Few route changes result in bandwidth changes
    (~20%)
  • Many significant events are not associated with
    route changes (~50%)

13
Netflow et al.
  • Switch identifies a flow by src/dst ports and protocol
  • Cuts a record for each flow (see the sketch after
    this list)
  • src, dst, ports, protocol, TOS, start and end time
  • Collect the records and analyze
  • No intrusive traffic; real traffic, real
    collaborators, real applications
  • No accounts/pwds/certs/keys
  • No reservations, etc.
  • Characterize traffic: top talkers, applications,
    flow lengths, etc.
  • May be able to use for forecasting for some sites
    and event detection (security also wants it)
  • LHC-OPN requires edge routers to provide Netflow
    data
  • Internet2 backbone:
  • http://netflow.internet2.edu/weekly/
  • SLAC:
  • www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html
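
As an illustration of what such a record carries, a simplified sketch in Python
(field names are assumptions for exposition, not the actual NetFlow export layout):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FlowKey:
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        protocol: int              # e.g. 6 = TCP, 17 = UDP

    @dataclass
    class FlowRecord:
        key: FlowKey
        tos: int                   # type-of-service byte
        start_s: float             # flow start time
        end_s: float               # flow end time
        packets: int
        octets: int                # bytes in the flow

    # Hypothetical record (documentation-range IPs, made-up values)
    rec = FlowRecord(FlowKey("192.0.2.10", "198.51.100.2", 45112, 5001, 6),
                     tos=0, start_s=0.0, end_s=60.0, packets=500000, octets=730000000)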

14
Netflow limitations
  • Can be a lot of data to collect each day; needs
    lots of CPU
  • Hundreds of MBytes to GBytes
  • Use of dynamic ports makes it harder to detect the
    application
  • GridFTP, bbcp, bbftp can use fixed ports (but may
    not)
  • P2P often uses dynamic ports
  • Discriminate the type of flow based on headers (not
    relying on ports)
  • Types: bulk data, interactive
  • Discriminators: inter-arrival time, length of
    flow, packet length, volume of flow
  • Use machine learning/neural nets to cluster flows
    (see the sketch after this list)
  • E.g. http://www.pam2004.org/papers/166.pdf
  • Aggregation of parallel flows (needs care, but
    not difficult)
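
A minimal sketch of clustering flows by such discriminators, using scikit-learn's
KMeans on made-up feature vectors; the feature values, scaling and cluster count
are illustrative assumptions, not the method of the paper cited above.

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row: [mean inter-arrival time (s), flow duration (s),
    #            mean packet length (bytes), flow volume (bytes)] -- made-up values
    flows = np.array([
        [0.0005, 300.0, 1460.0, 2.0e9],   # looks like bulk data
        [0.0004, 600.0, 1460.0, 5.0e9],   # bulk data
        [0.2500,  60.0,   80.0, 2.0e4],   # looks interactive (ssh-like)
        [0.3000, 120.0,  100.0, 4.0e4],   # interactive
    ])

    # Standardize the features so volume does not dominate, then cluster into 2 types
    scaled = (flows - flows.mean(axis=0)) / flows.std(axis=0)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
    print(labels)                          # e.g. [0 0 1 1]: bulk data vs. interactive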

15
NG PerfSONAR
  • Our future focus (for us the 3rd generation); NSF
    proposal
  • Open source, open community
  • Both end users (LHC, GATech, SLAC, Delaware) and
    network providers (ESnet, I2, GEANT, European NRENs,
    Brazil, ...); achieve critical mass?
  • Many developers from multiple fields
  • Requires, from the get-go, shared code,
    documentation, collaboration
  • Hopefully not as dependent on funding as a single
    team, so persistent?
  • Transparent gathering and storage of measurements,
    both from NRENs and end users
  • Sharing of information across autonomous domains
  • Uses standard formats
  • More comprehensive view
  • AAA to provide protection of sensitive data
  • Reduces debugging time
  • Access to multiple components of the path
  • No need to play telephone tag
  • Currently mainly middleware; needs:
  • Data mining and viz
  • Topology, also at layers 1 & 2
  • Forecasting
  • Event detection and event diagnosis

16
Challenges
  • Probe tools fail at > 1 Gbps
  • Dedicated circuits, QoS, layers 2 & 1
  • Impact of new technology (e.g. TOE NICs)
  • Auto event detection, alerts, diagnosis
  • Integrate passive & active measurements
  • Tie in end systems, file/disk systems, apps
  • Sustainability (funding disappears)
  • Factorise components (measure, archive, analyze)
    AND tasks
  • Provide standard, published interfaces
  • Engage the community (multiple developers, users,
    providers); this has its own challenges

17
Questions, More information
  • Comparisons of Active Infrastructures:
  • www.slac.stanford.edu/grp/scs/net/proposals/infra-mon.html
  • Some active public measurement infrastructures:
  • www-iepm.slac.stanford.edu/
  • www-iepm.slac.stanford.edu/pinger/
  • www.slac.stanford.edu/grp/scs/net/talk06/IEPM-BW%20Deployment.ppt
  • e2epi.internet2.edu/owamp/
  • amp.nlanr.net/ (no longer available)
  • Monitoring tools:
  • www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
  • www.caida.org/tools/
  • Google for iperf, thrulay, bwctl, pathload,
    pathchirp
  • Event detection:
  • www.slac.stanford.edu/grp/scs/net/papers/noms/noms14224-122705-d.doc

18
More Slides
19
Active E2E Monitoring
20
E.g. Using Active IEPM-BW measurements
  • Focus on high performance for a few hosts needing
    to send data to a small number of collaborator
    sites, e.g. the HEP tiered model
  • Makes regular measurements with probe tools:
  • ping (RTT, connectivity), owamp (one-way delay),
    traceroute (routes)
  • pathchirp, pathload (available bandwidth)
  • iperf (single & multi-stream), thrulay (achievable
    throughput)
  • supports bbftp, bbcp (file transfer applications,
    not network)
  • Looking at GridFTP, but it is complex, requiring
    renewal of certificates
  • Choice of probes depends on the importance of the
    path, e.g.
  • For major paths (tier 0, 1 & some 2) use the full
    suite
  • For tier 3 use just ping and traceroute
  • Running at major HEP sites (CERN, SLAC, FNAL,
    BNL, Caltech, Taiwan, SNV) to about 40 remote
    sites
  • http://www.slac.stanford.edu/comp/net/iepm-bw.slac.stanford.edu/slac_wan_bw_tests.html

21
IEPM-BW Measurement Topology
  • 40 target hosts in 13 countries
  • Bottlenecks vary from 0.5 Mbits/s to 1 Gbits/s
  • Traverse 50 ASes, 15 major Internet providers
  • 5 targets at PoPs, rest at end sites
  • Added Sunnyvale for UltraLight
  • Adding FZK Karlsruhe
  • [Map: measurement topology, including Taiwan
    reached via TWAREN]
22
Top page [screenshot]
23
Probes: Ping/traceroute
  • Ping is still useful:
  • Is the path connected / node reachable?
  • RTT, jitter, loss
  • Great for low-performance links (e.g. Digital
    Divide), e.g. AMP (NLANR)/PingER (SLAC)
  • Nothing to install, but may be blocked
  • OWAMP/I2 is similar but one-way
  • But needs a server installed at the other end and good
    timers
  • Now built into IEPM-BW
  • Traceroute:
  • Needs good visualization (traceanal/SLAC)
  • No use for dedicated layer 1 or 2 circuits
  • However, still want to know the topology of paths

24
Probes: Packet Pair Dispersion
  • Used by pathload, pathchirp, ABwE for available bandwidth
  • Send packets with a known separation
  • See how the separation changes due to the bottleneck
    (see the sketch after this list)
  • Can be of low network intrusiveness, e.g. ABwE uses only 20
    packets/direction, and is fast (< 1 sec)
  • From the PAM paper, pathchirp is more accurate than
    ABwE, but:
  • Ten times as long (10 s vs. 1 s)
  • More network traffic (factor of 10)
  • Pathload: a factor of 10 more again
  • http://www.pam2005.org/PDF/34310310.pdf
  • IEPM-BW now supports ABwE, pathchirp, pathload
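
A toy illustration of the packet-pair idea: back-to-back packets are spread apart
by the bottleneck link, so the received separation gives a capacity estimate of
roughly packet size / dispersion. The numbers are assumptions for illustration,
not ABwE or pathchirp internals.

    # Packet-pair capacity estimate: C ~= packet_size / received_dispersion
    packet_size_bits = 1500 * 8       # 1500-byte probe packets
    received_gap_s = 12e-6            # assumed measured separation of 12 microseconds

    bottleneck_bps = packet_size_bits / received_gap_s
    print(f"Estimated bottleneck capacity: {bottleneck_bps / 1e6:.0f} Mbits/s")  # ~1000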

25
BUT
  • Packet pair dispersion relies on accurate timing
    of the inter-packet separation
  • At > 1 Gbps this is getting beyond the resolution of
    Unix clocks (see the arithmetic after this list)
  • AND 10GE NICs are offloading functions:
  • Coalescing interrupts, Large Send & Receive
    Offload, TOE
  • Need to work with TOE vendors
  • Turn off offload (Neterion supports multiple
    channels; one can eliminate offload to get more
    accurate timing in the host)
  • Do timing in the NICs
  • No standards for interfaces
  • Possibly use packet trains, e.g. pathneck
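
The rough arithmetic behind the timing problem (assumed packet size): at 10 Gbits/s
a 1500-byte packet is serialized in about 1.2 microseconds, so the separations a
packet-pair tool must resolve are comparable to, or below, typical host timer and
interrupt-coalescing granularity.

    # Serialization time of a 1500-byte packet at different link speeds
    packet_bits = 1500 * 8
    for gbps in (1, 10):
        t_us = packet_bits / (gbps * 1e9) * 1e6
        print(f"{gbps:>2} Gbits/s: {t_us:.1f} microseconds per packet")
    # 1 Gbits/s: 12.0 us; 10 Gbits/s: 1.2 us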

26
Achievable Throughput
  • Use TCP or UDP to send as much data as possible
    memory-to-memory from source to destination
  • Tools: iperf (bwctl/I2), netperf, thrulay (from
    Stas Shalunov/I2), udpmon
  • Pseudo file copy: bbcp also has a memory-to-memory
    mode to avoid disk/file problems

27
BUT
  • At 10 Gbits/s on a transatlantic path slow start
    takes over 6 seconds
  • To get 90% of the measurement in congestion avoidance,
    need to measure for ~1 minute (5.25 GBytes at
    7 Gbits/s, today's typical performance); see the
    arithmetic after this list
  • Needs scheduling to scale, and even then:
  • It's not disk-to-disk or application-to-application
  • So use bbcp, bbftp, or GridFTP
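
The arithmetic behind those figures, under the stated assumptions (a ~6 s slow-start
ramp and ~7 Gbits/s steady throughput): the ramp alone moves about 5.25 GBytes, and
keeping it to ~10% of the test implies measuring for roughly a minute.

    ramp_s = 6            # assumed slow-start duration on a transatlantic path
    rate_gbps = 7         # assumed typical achievable throughput (2007)

    ramp_gbytes = ramp_s * rate_gbps / 8
    print(f"Data moved during slow start: ~{ramp_gbytes:.2f} GBytes")        # ~5.25

    measure_s = ramp_s / 0.10          # keep the ramp to ~10% of the measurement
    total_gbytes = measure_s * rate_gbps / 8
    print(f"Measure for ~{measure_s:.0f} s, i.e. ~{total_gbytes:.1f} GBytes total")  # ~60 s, ~52.5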

28
AND
  • For testbeds such as UltraLight, UltraScienceNet,
    etc., one has to reserve the path
  • So the measurement infrastructure needs to add the
    capability to reserve the path (so it needs an API to the
    reservation application)
  • OSCARS from ESnet is developing a web services
    interface (http://www.es.net/oscars/)
  • For lightweight probes, have a persistent capability
  • For more intrusive probes, must reserve just before
    making the measurement

29
Visualization & Forecasting in the Real World
30
Examples of real data
  • Caltech thrulay (Nov05-Mar06, 0-800 Mbps):
    misconfigured windows, new path, very noisy
  • UToronto multi-stream iperf (Nov05-Jan06, 0-250 Mbps):
    seasonal effects, daily & weekly
  • UTDallas pathchirp vs. thrulay vs. iperf
    (Mar-10-06 to Mar-20-06, 0-120 Mbps): some effects are
    seasonal, others are not; events may affect
    multiple metrics
  • Events can be caused by host or site congestion
  • Few route changes result in bandwidth changes
    (~20%)
  • Many significant events are not associated with
    route changes (~50%)

31
Scatter plots & histograms
  • Scatter plots quickly identify correlations
    between metrics, e.g. thrulay (Mbps) vs. RTT (ms),
    and pathchirp & iperf (Mbps) vs. thrulay
  • Histograms of throughput (Mbits/s) for pathchirp and
    thrulay quickly identify variability or
    multimodality
32
Changes in network topology (BGP) can result in
dramatic changes in performance
  • [Figures: snapshot of the traceroute summary table and
    sample traceroute trees generated from the table,
    showing the remote host reached via Los-Nettos (100 Mbps)]
  • [Figure: ABwE measurements (one/minute for 24 hours,
    Thurs Oct 9 9:00am to Fri Oct 10 9:01am) of dynamic BW
    capacity (DBC), available BW (DBC-XT) and cross-traffic
    (XT) in Mbits/s vs. hour; drop in performance from the
    original path (SLAC-CENIC-Caltech) to
    SLAC-ESnet-Los-Nettos (100 Mbps)-Caltech, then back to
    the original path; changes detected by IEPM-iperf and
    ABwE; the ESnet-Los-Nettos segment in the path is
    100 Mbits/s]
  • Notes:
  • 1. Caltech misrouted via the Los-Nettos 100 Mbps
    commercial net 14:00-17:00
  • 2. ESnet/GEANT working on routes from 02:00 to 14:00
  • 3. A previous occurrence went unnoticed for 2 months
  • 4. Next step is to auto-detect and notify
33
On the other hand
  • Route changes may affect the RTT (shown in yellow)
  • Yet have no noticeable effect on available
    bandwidth or throughput
  • [Figure: time series of available bandwidth,
    achievable throughput, and route changes]
34
However
  • Elegant graphics are great for understanding problems,
    BUT:
  • There can be thousands of graphs to look at (many site
    pairs, many devices, many metrics)
  • Need automated problem recognition AND diagnosis
  • So we are developing tools to reliably detect
    significant, persistent changes in performance
  • Initially using a simple plateau algorithm to
    detect step changes (see the sketch after this list)
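
A minimal sketch of a plateau-style step detector (illustrative thresholds and
buffer lengths, not the exact IEPM-BW algorithm): keep a history buffer and a
shorter trigger buffer, and flag an event when the trigger mean moves outside the
history mean by more than k standard deviations.

    from statistics import mean, stdev

    def plateau_events(series, hist_len=50, trig_len=10, k=2.0):
        """Return indices where the recent mean steps outside history mean +/- k*sigma."""
        events = []
        for i in range(hist_len + trig_len, len(series) + 1):
            history = series[i - hist_len - trig_len : i - trig_len]
            trigger = series[i - trig_len : i]
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(mean(trigger) - mu) > k * sigma:
                events.append(i - 1)   # index of the latest sample in the trigger buffer
        return events

    # Example: a step down from ~800 Mbits/s to ~400 Mbits/s is flagged
    ts = [800 + (j % 5) for j in range(80)] + [400 + (j % 5) for j in range(40)]
    print(plateau_events(ts)[:3])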

35
Seasonal Effects on Events
  • Change in bandwidth (drops) between 19:00 & 22:00
    Pacific Time (7:00-10:00am Pakistan time)
  • Causes more anomalous events around this time

36
Forecasting
  • Over-provisioned paths should have a pretty flat
    time series:
  • Short/local-term smoothing
  • Long-term linear trends
  • Seasonal smoothing
  • But seasonal trends (diurnal, weekly) need to be
    accounted for on about 10 of our paths
  • Use Holt-Winters triple exponentially weighted
    moving averages (see the sketch after this list)
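
A minimal sketch of additive Holt-Winters (triple exponential smoothing) as it might
be applied to an hourly bandwidth series with a daily cycle; the smoothing constants
and season length are illustrative assumptions, not the production IEPM settings.

    import math

    def holt_winters_additive(y, m=24, alpha=0.3, beta=0.05, gamma=0.2):
        """One-step-ahead additive Holt-Winters forecasts for series y, season length m."""
        level = sum(y[:m]) / m
        trend = 0.0
        season = [y[i] - level for i in range(m)]   # crude initial seasonal indices
        forecasts = []
        for t in range(m, len(y)):
            forecasts.append(level + trend + season[t % m])
            prev_level = level
            level = alpha * (y[t] - season[t % m]) + (1 - alpha) * (level + trend)
            trend = beta * (level - prev_level) + (1 - beta) * trend
            season[t % m] = gamma * (y[t] - level) + (1 - gamma) * season[t % m]
        return forecasts

    # Example: a synthetic hourly series with a diurnal swing around 500 Mbits/s
    hourly = [500 + 100 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
    print(holt_winters_additive(hourly)[-3:])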

37
Experimental Alerting
  • Have false positives down to a reasonable level
    (a few per week), so sending alerts to developers
  • Saved in a database
  • Links to traceroutes, event analysis, time series

38
Passive
  • Active monitoring:
  • Pro: regularly spaced data on known paths, can
    make measurements on demand
  • Con: adds traffic to the network, can interfere with
    real data and measurements
  • What about passive?

39
Netflow et al.
  • Switch identifies a flow by src/dst ports and protocol
  • Cuts a record for each flow
  • src, dst, ports, protocol, TOS, start and end time
  • Collect the records and analyze
  • Can be a lot of data to collect each day; needs
    lots of CPU
  • Hundreds of MBytes to GBytes
  • No intrusive traffic; real traffic, real
    collaborators, real applications
  • No accounts/pwds/certs/keys
  • No reservations, etc.
  • Characterize traffic: top talkers, applications,
    flow lengths, etc.
  • LHC-OPN requires edge routers to provide Netflow
    data
  • Internet2 backbone:
  • http://netflow.internet2.edu/weekly/
  • SLAC:
  • www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html

40
Typical day's flows
  • Very much work in progress
  • Look at the SLAC border
  • Typical day:
  • 28K flows/day
  • 75 sites with > 100 KB bulk-data flows
  • A few hundred flows > 1 GByte
  • Collect records for several weeks
  • Filter on 40 major collaborator sites, big (>
    100 KBytes) flows, bulk transport apps/ports
    (bbcp, bbftp, iperf, thrulay, scp, ftp ...)
  • Divide by remote site, aggregate parallel streams
    (see the sketch after this list)
  • Look at the throughput distribution
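
A minimal sketch of the "aggregate parallel streams" step (a simplified record
format, not SLAC's actual Netflow schema): flow records to the same remote site are
grouped, their bytes summed, and a throughput derived from the combined time span;
a real implementation would also bin by transfer/time window.

    from collections import defaultdict

    # Hypothetical simplified records: (remote_site, start_s, end_s, bytes)
    records = [
        ("padova.example", 0.0,  60.0, 1.0e9),   # four parallel bbcp streams
        ("padova.example", 0.1,  60.2, 1.1e9),
        ("padova.example", 0.2,  59.9, 0.9e9),
        ("padova.example", 0.0,  60.1, 1.0e9),
        ("bnl.example",   10.0, 100.0, 3.0e9),
    ]

    by_site = defaultdict(list)
    for site, start, end, nbytes in records:
        by_site[site].append((start, end, nbytes))

    for site, flows in by_site.items():
        total_bytes = sum(b for _, _, b in flows)
        span_s = max(e for _, e, _ in flows) - min(s for s, _, _ in flows)
        print(f"{site}: {total_bytes * 8 / span_s / 1e6:.0f} Mbits/s over {span_s:.0f} s")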

41
Netflow et al.
  • Peaks at known capacities and RTTs
  • RTTs might suggest windows are not optimized; peaks
    at the default OS window size (BW = Window/RTT); see
    the example after this list
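
The BW = Window/RTT rule of thumb behind those peaks, with an assumed 64 KByte
default window at a few representative RTTs:

    # TCP throughput ceiling ~ window size / round-trip time
    window_bytes = 64 * 1024                 # assumed default OS window size
    for rtt_ms in (20, 80, 170):             # e.g. regional, cross-country, transatlantic
        bw_mbps = window_bytes * 8 / (rtt_ms / 1000) / 1e6
        print(f"RTT {rtt_ms:>3} ms -> ~{bw_mbps:4.1f} Mbits/s ceiling")
    # 20 ms -> ~26.2, 80 ms -> ~6.6, 170 ms -> ~3.1 Mbits/s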

42
How many sites have enough flows?
  • In May '05 found 15 sites at the SLAC border with >
    1440 flows (one per 30 minutes)
  • Maybe enough for time-series forecasting of
    seasonal effects
  • Three sites (Caltech, BNL, CERN) were actively
    monitored
  • The rest were "free" (not actively monitored)
  • Only 10 sites have big seasonal effects in
    active measurements
  • The remainder need fewer flows

43
Mining data for sites
  • Real application use (bbftp) over 4 months
  • Gives a rough idea of the throughput (and confidence)
    for 14 sites seen from SLAC

44
Multi-month
Bbcp throughput from SLAC to Padova
  • bbcp SLAC to Padova
  • Fairly stable with time, but large variance
  • Many non-network-related factors

45
Netflow limitations
  • Use of dynamic ports makes it harder to detect the
    application
  • GridFTP, bbcp, bbftp can use fixed ports (but may
    not)
  • P2P often uses dynamic ports
  • Discriminate the type of flow based on headers (not
    relying on ports)
  • Types: bulk data, interactive
  • Discriminators: inter-arrival time, length of
    flow, packet length, volume of flow
  • Use machine learning/neural nets to cluster flows
  • E.g. http://www.pam2004.org/papers/166.pdf
  • Aggregation of parallel flows (needs care, but
    not difficult)
  • Can be used for giving performance forecasts
  • Unclear if it can be used for detecting steps in
    performance

46
Conclusions
  • Some tools fail at higher speeds
  • Throughputs often depend on non-network factors:
  • Host interface speeds (DSL, 10 Mbps Ethernet,
    wireless), loads, resource congestion
  • Configurations (window sizes, hosts, number of
    parallel streams)
  • Applications (disk/file vs. mem-to-mem)
  • Looking at distributions by site, often
    multi-modal
  • Predictions may have large standard deviations
  • Need automated assistance to diagnose events

47
In Progress
  • Working on Netflow viz (currently at BNL & SLAC),
    then work with other LHC sites to deploy
  • Add support for pathneck
  • Look at other forecasters, e.g. ARMA/ARIMA, maybe
    Kalman filters, neural nets
  • Working on diagnosis of events
  • Multi-metrics, multi-paths
  • Signed a collaborative agreement with Internet2 to
    collaborate with PerfSONAR
  • Provide web services access to IEPM data
  • Provide analysis, forecasting and event detection
    for PerfSONAR data
  • Use PerfSONAR (e.g. router) data for diagnosis
  • Provide viz of PerfSONAR route information
  • Apply to LHCnet
  • Look at layer 1 & 2 information