PIPE Dreams

Learn more at: https://chep03.ucsd.edu

1
PIPE Dreams
  • Troubleshooting Network Performance for
    Production Science Data Grids
  • Presented by Warren Matthews at CHEP03, San
    Diego, March 24-28, 2003

2
Abstract
The vision of science grids allocating resources
to analyze huge quantities of HENP data clearly
depends on reliable network performance. Tools
developed at SLAC in conjunction with the
Internet2 PIPES project will help to ensure this.
In this talk, these tools will be discussed and
the procedure for publishing performance data, in
particular using the Globus Toolkit's MDS and web
services, will be reviewed. The subsequent
analysis and troubleshooting methodology will be
discussed with real-world examples from the
Particle Physics Data Grid (PPDG) and the
European Data Grid (EDG).
3
Overview
  • What is the problem?
  • What is PIPES?
  • Network performance monitoring
  • Problem identification

4
Network Monitoring for the Grid
  • The Data Grid consists of many components that
    must interoperate

(Diagram: farms, data stores and requestors
connected through the network, with a resource
broker.)
5
Allocate Resources
  • The resource broker must be fully informed
  • Measurement is required!

(Diagram: the same farms, data stores, requestors
and resource broker connected through the network,
now annotated with measurements: 12% packet loss,
an OC48 link, 80% utilization.)
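
As an illustration of why the broker needs measurements, the sketch below ranks candidate sites by a crude estimate of spare capacity. It is not taken from the talk: the `PathMeasurement` record, the `headroom_mbps` formula and the site names are all hypothetical, and the figures assume the slide's annotations mean 12% packet loss and 80% utilization on an OC48 (roughly 2488 Mbps).

```python
from dataclasses import dataclass


@dataclass
class PathMeasurement:
    """Hypothetical record a monitoring system could hand to a broker."""
    site: str
    capacity_mbps: float  # nominal link capacity
    utilization: float    # fraction of capacity in use, 0.0 to 1.0
    packet_loss: float    # fraction of packets lost, 0.0 to 1.0


def headroom_mbps(m: PathMeasurement) -> float:
    """Crude estimate of spare capacity, discounted by packet loss."""
    return m.capacity_mbps * (1.0 - m.utilization) * (1.0 - m.packet_loss)


def choose_site(measurements):
    """Return the site whose path currently shows the most headroom."""
    return max(measurements, key=headroom_mbps).site


paths = [
    PathMeasurement("farm-a", 2488.0, 0.80, 0.00),  # OC48 at 80% utilization
    PathMeasurement("farm-b", 622.0, 0.10, 0.12),   # lightly used, 12% loss
    PathMeasurement("farm-c", 1000.0, 0.30, 0.00),
]
print(choose_site(paths))
```

Without the measurements, a broker that looked only at nominal capacity would pick the OC48 path; with them, the busy link is no longer the obvious choice.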
6
What is PIPES?
  • Internet2
  • End-to-end performance initiative
  • PI Performance Evaluation System (PIPES)
  • PIPES Monitoring Platform (PMP)
  • Overlap with goals of HENP
  • Tremendous resources

7
IEPM-BW
  • Package developed at SLAC
  • Measurement Engine
  • Iperf, bbftp, bbcp, ping, traceroute
  • ABwE, OWAMP, UDPmon, GridFTP
  • Job Manager
  • Data Storage and data server
  • Analysis Engine
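
A measurement engine of this kind has to turn tool output into numbers that can be stored. The following is a minimal sketch of that step for ping, assuming the summary format printed by the common Linux ping; it is not IEPM-BW code, the function name is hypothetical, and `SAMPLE` is made-up text rather than a real measurement.

```python
import re

# Made-up ping summary in the style of Linux ping; not a real measurement.
SAMPLE = """\
--- example.org ping statistics ---
10 packets transmitted, 9 received, 10% packet loss, time 9012ms
rtt min/avg/max/mdev = 150.112/152.480/158.934/2.511 ms
"""


def parse_ping_summary(text):
    """Extract the loss percentage and RTT statistics from a ping summary."""
    loss = re.search(r"([\d.]+)% packet loss", text)
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", text)
    if loss is None or rtt is None:
        raise ValueError("unrecognised ping output")
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(x) for x in rtt.groups())
    return {
        "loss_pct": float(loss.group(1)),
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
        "rtt_mdev_ms": rtt_mdev,
    }


summary = parse_ping_summary(SAMPLE)
print(summary)
```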

8
(Map of monitoring sites, monitored hosts and
networks. Legend: Monitoring Site. Labels: LANL,
EDG, KEK, CERN, TRIUMF, NIKHEF, NERSC, FNAL, IN2P3,
ANL, CHI, PPDG/GriPhyN, SNV, ESnet, ORNL, RAL, JLAB,
NY, UCL, SLAC, UManc, Imperial, JAnet, DL, NNW, BNL,
APAN, Stanford, RIKEN, INFN-Roma, INFN-Padua, Geant,
CalREN, INFN-Milan, Abilene, SEA, CESnet, NASA, WASH,
SOX, HSTN, ATL, DNVR, CLV, IPLS, UTAH, SDSC, UFL,
CALTECH, I2, UTDallas, UMich, Rice, NCSA.)
9
BaBar Grid
(Network diagram. Labels: NNW, Manchester, TVN, RAL,
Janet, ESnet, SWERN, SLAC, Bristol, Geant, Stanford,
DFN, Dresden, Calren, Abilene, Renater, IN2P3.
Link speeds shown: 10 Gbps, 622 Mbps, 1 Gbps,
2.5 Gbps.)
10
(No Transcript)
11
Problem Identification
  • Typical Scenario
  • User complains file transfer is slow
  • Net admin runs ping, traceroute, iperf test
  • Complain to upstream provider
  • Proactive
  • What do we mean by throughput?
  • How do we know there was a performance hit?
  • Our approach is based on diurnal changes
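
One way to read the diurnal idea is that a measurement should be judged against what is usual for that hour of the day rather than against a single global figure. The sketch below illustrates that reading only; the function names, the median baseline and the 50% tolerance are assumptions, and the history values are invented.

```python
from collections import defaultdict
from statistics import median


def hourly_baseline(history):
    """Map hour of day (0-23) to the median of past measurements in that hour."""
    by_hour = defaultdict(list)
    for hour, mbps in history:
        by_hour[hour].append(mbps)
    return {hour: median(values) for hour, values in by_hour.items()}


def is_performance_hit(hour, mbps, baseline, tolerance=0.5):
    """True if mbps is below tolerance times the usual value for this hour."""
    usual = baseline.get(hour)
    if usual is None:
        return False  # no history for this hour, so no judgement is made
    return mbps < tolerance * usual


# Invented history: (hour of day, throughput in Mbps).
history = [(3, 400.0), (3, 420.0), (3, 410.0),
           (15, 200.0), (15, 180.0), (15, 220.0)]
baseline = hourly_baseline(history)

print(is_performance_hit(15, 190.0, baseline))  # afternoon measurement
print(is_performance_hit(3, 190.0, baseline))   # same value at night
```

The same 190 Mbps is unremarkable in the busy afternoon hour but well below the night-time norm, which is the distinction a single fixed threshold cannot make.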

12
(No Transcript)
13
Alarms
  • Too much to keep track of
  • Rather not wait for complaints
  • Automated Alarms
  • Rolling average à la RIPE-TT
  • May not be the best approach
  • AMP Automated Detection System
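
A rolling-average alarm can be sketched in a few lines. This is a generic illustration, not a reproduction of the RIPE-TT or AMP algorithms; the class name, window size and 70% fraction are assumptions chosen for the example.

```python
from collections import deque


class RollingAverageAlarm:
    """Flag a sample that falls well below the mean of the previous window."""

    def __init__(self, window=5, fraction=0.7):
        self.samples = deque(maxlen=window)
        self.fraction = fraction

    def update(self, value):
        """Return True if value < fraction * mean of the last full window."""
        alarm = False
        if len(self.samples) == self.samples.maxlen:
            mean = sum(self.samples) / len(self.samples)
            alarm = value < self.fraction * mean
        # The sample joins the window whether or not it raised an alarm.
        self.samples.append(value)
        return alarm


detector = RollingAverageAlarm(window=3, fraction=0.7)
results = [detector.update(v) for v in [500, 510, 490, 505, 300, 495]]
print(results)
```

Note that in this sketch the low sample enters the window and drags the average down, so a sustained drop gradually stops looking abnormal. That is one possible weakness of a plain rolling average.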

14
(No Transcript)
15
(No Transcript)
16
Limitations
  • Could be over an hour before alarm is generated
  • More frequent measurements impact the network and
    measurements overlap
  • Low impact tools allow finer grained measurement
  • Use NWS multi-variate method
  • Use SciDAC ABwE tool
  • Use PingER, OWAMP
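
The trade-off between measurement frequency and alarm delay can be made concrete with some arithmetic. The sketch below assumes tests run one at a time so they never overlap, and that an alarm needs a fixed number of consecutive bad results; the tool durations, target count and function names are hypothetical, not figures from the talk.

```python
def min_cycle_seconds(num_targets, tool_seconds):
    """Shortest cycle in which every tool runs once per target, one at a time."""
    return num_targets * sum(tool_seconds.values())


def alarm_delay_seconds(cycle_seconds, samples_needed):
    """Upper bound on the time to collect samples_needed results for one target."""
    return cycle_seconds * samples_needed


# Hypothetical test durations in seconds.
tools = {"iperf": 20, "bbftp": 30, "ping": 10}
cycle = min_cycle_seconds(num_targets=40, tool_seconds=tools)
delay = alarm_delay_seconds(cycle, samples_needed=2)
print(cycle / 60, "minutes per cycle;", delay / 60, "minutes to alarm")
```

Under these assumed numbers each target is measured every 40 minutes, so an alarm that waits for two bad results can take 80 minutes. Shortening the cycle means either fewer targets or lighter-weight tools.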

17
(No Transcript)
18
Publishing
  • There are many monitoring projects; publishing
    data allows them to interoperate
  • MDS
  • EDG NM Schema
  • Web Services
  • GLUE NE Schema
  • GGF NMWG
  • Hierarchy Doc
  • Tools Doc

./get_data 2003 3 18 6 1 41 1.61 1.601 1.62 0
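
To give a feel for what publishing a measurement involves, here is a minimal sketch that serialises one result as XML. The element and attribute names are hypothetical and do not follow the MDS, EDG, GLUE or GGF NMWG schemas; the host names, timestamp and value are placeholders.

```python
import xml.etree.ElementTree as ET


def measurement_to_xml(source, destination, tool, timestamp, value, unit):
    """Serialise one measurement as a small XML document (hypothetical layout)."""
    root = ET.Element("measurement", {"tool": tool})
    ET.SubElement(root, "source").text = source
    ET.SubElement(root, "destination").text = destination
    ET.SubElement(root, "timestamp").text = timestamp
    result = ET.SubElement(root, "value", {"unit": unit})
    result.text = str(value)
    return ET.tostring(root, encoding="unicode")


doc = measurement_to_xml(
    "site-a.example.org", "site-b.example.org",
    "iperf", "2003-01-01T00:00:00Z", 305.2, "Mbps",
)
print(doc)
```

Agreeing on names and units like these across projects is the job the schemas listed above are meant to do.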
19
Net Rat
  • Alarm System
  • Multiple tools
  • Multiple measurement points
  • Trigger further measurements
  • Cross reference off site stats
  • Informant database
  • No measurement is authoritative
  • Cannot even believe a measurement
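
If no single measurement is authoritative, an alarm can be made to wait for corroboration. The sketch below is one simple way to express that, not the Net Rat implementation: the function name, the rule of requiring two distinct tools and two distinct measurement points, and the site names are all assumptions.

```python
def corroborated(reports, min_tools=2, min_points=2):
    """Decide whether enough independent checks agree that there is a problem.

    reports maps (tool, measurement_point) to True when that check saw a problem.
    """
    tools = {tool for (tool, _), bad in reports.items() if bad}
    points = {point for (_, point), bad in reports.items() if bad}
    return len(tools) >= min_tools and len(points) >= min_points


# Only one tool at one point sees a problem: not corroborated.
single = {
    ("iperf", "site-a"): True,
    ("ping", "site-a"): False,
    ("iperf", "site-b"): False,
}

# Two different tools at two different points agree: corroborated.
agreeing = {
    ("iperf", "site-a"): True,
    ("ping", "site-b"): True,
    ("bbftp", "site-a"): False,
}

print(corroborated(single), corroborated(agreeing))
```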

20
Log
03/20/2003 20:13:46 ALARM pcgiga throughput=305.224 ctresh=512.95 athresh=312.91
03/20/2003 20:13:48 TRACE no change in route detected
03/20/2003 20:16:07 CALM Throughput within acceptable limits. ALARM CANCELLED
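
One plausible reading of the log is that the two thresholds give the alarm hysteresis: it is raised when throughput drops below one value and cancelled only when throughput recovers above a higher one. That interpretation is an assumption, as are the function and parameter names below. Only the figures 305.224, 312.91 and 512.95 come from the log; the other samples are invented.

```python
def alarm_events(samples, alarm_threshold, cancel_threshold):
    """Return (index, event) pairs using separate raise and cancel thresholds."""
    alarmed = False
    events = []
    for i, value in enumerate(samples):
        if not alarmed and value < alarm_threshold:
            alarmed = True
            events.append((i, "ALARM"))
        elif alarmed and value > cancel_threshold:
            alarmed = False
            events.append((i, "CALM"))
    return events


# 305.224 is the logged throughput; the other samples are invented.
events = alarm_events(
    [600.0, 305.224, 400.0, 520.0],
    alarm_threshold=312.91,
    cancel_threshold=512.95,
)
print(events)
```

With a gap between the two thresholds, a throughput that hovers around the alarm level does not toggle the alarm on every measurement.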
21
Toward a Monitoring Infrastructure
  • MAGGIE
  • Measurement and Analysis package built on
    NIMI/Akenti
  • EGEE
  • production-quality Data Grid for Europe

22
More Information
  • IEPM Home Page
  • IEPM-BW
  • I2 E2E and PIPES
  • RIPE-TT
  • AMP Automated Event Detection
  • NWS
  • ABWE

23
End
This talk was made possible by the IEPM team at SLAC
(Les Cottrell, Connie Logg, Jiri Navratil, Jerrod
Williams, Fabrizio Coccetti), and the many
developers and maintainers around the world.