Title: DataTAG project presentation - Internet2 Spring meeting (Arlington)
1. DataTAG Project
IST Copenhagen, 4-6 Nov 2002
Slides collected by Cristina Vistoli, INFN-CNAF
2. Work packages
- WP1: Establishment of a high performance intercontinental Grid testbed (CERN)
- WP2: High performance networking (PPARC)
- WP3: Bulk data transfer validations and application performance monitoring (UvA)
- WP4: Interoperability between Grid domains (INFN)
3. Project focus
- Grid related network research (WP1, WP2, WP3)
  - High performance transport protocols
  - Inter-domain QoS
  - Advance bandwidth reservation
- Interoperability between European and US Grids (WP4)
- N.B. In principle open to other EU Grid projects, as well as ESA, for demonstrations
4. DataTAG project
[Network map: major 2.5/10 Gbps circuits between Europe and the USA, connecting CERN (Geneva) over the 2.5 Gbps transatlantic circuit to StarLight/STAR TAP (Chicago) and New York, with peerings to Abilene, ESnet and MREN.]
5. WP1 Status
- 2.5 Gbps transatlantic lambda between CERN (Geneva) and StarLight (Chicago)
  - Circuit in place since August 20
  - Part of the Amsterdam-Chicago-Geneva wave triangle
- Phase 1 with Cisco ONS 15454 layer 2 muxes (August-September, iGRID2002)
- Phase 2 with Cisco 7606 routers (October)
- Phase 3 with Alcatel 1670 layer 2 muxes (November)
- Also extending to the French optical testbed VTHD (2.5 Gbps to INRIA/Lyon)
  - And through VTHD to EU/ATRIUM
  - And, of course, to GEANT
6. Multi-vendor testbed with layer 3 and layer 2 capabilities
[Testbed diagram: CERN (Geneva), StarLight (Chicago) and INFN (Bologna) interconnected over the 2.5 Gbps research circuit and 1.25 Gbps GbE links, using Cisco (6509, GBE), Juniper and Alcatel equipment with layer 2 muxes (M), and peerings to GEANT, Abilene and ESnet at Starlight.]
7. Testbed deployment status
- Multi-vendor testbed with layer 2 and layer 3 capabilities
- Interesting results already achieved:
  - TRIUMF-CERN 2 Gbps lightpath demo (disk to disk)
  - Terabyte file transfer (Monte Carlo simulated events)
  - Single-stream TCP/IP (with S. Ravot/Caltech patches)
  - 8 Terabytes in 24 hours (memory to memory)
8. Phase I (iGRID2002)
9. Phase II (October 2002): generic configuration
[Diagram: servers at CERN and StarLight, each behind a GigE switch and a Cisco 7606 router, linked over the 2.5 Gbps circuit.]
10. Phase III (November 2002)
[Diagram: CERN servers behind a GigE switch, an Alcatel 1670 multiplexer, and Alcatel 7770 / Cisco 7606 routers (n x GigE), connected to StarLight (Juniper M10, Cisco ONS 15454, servers and GigE switch mirroring the CERN side) and onward to the VTHD routers, GEANT via Amsterdam, Abilene, ESnet and Canarie.]
11. WP2
- The deployment of network services for Grid applications across multiple domains, for the optimal utilization of network resources. Network services include advance reservation and differentiated packet treatment, while optimal utilization requires the tuning of existing transport protocols, elements of traffic engineering, and the identification and testing of new ones.
- Task 1: Transport applications for high bandwidth-delay connections
- Task 2: End-to-end inter-domain QoS
- Task 3: Advance reservation
12. WP2.1
- Transport applications
- The goal of this task is the demonstration and deployment of high performance transport applications for efficient and reliable data exchange over high bandwidth-delay connections (a window-sizing sketch follows this slide).
- Demonstration of sustained, reliable and robust multi-gigabit/s data replication over long distances is in itself an important goal; however, it is also essential to ensure that these applications are deployed within the intercontinental testbed context and are usable from the middleware layer by applications.
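The difficulty on such paths is the bandwidth-delay product: at 2.5 Gbps and a transatlantic RTT of roughly 100 ms (both figures appear elsewhere in these slides), a single TCP stream needs a window of about 30 MB to keep the pipe full, far beyond default socket buffers. A minimal sketch of that arithmetic, for illustration only:

    # Bandwidth-delay product: the TCP window needed to keep a long, fat pipe full.
    # The 2.5 Gbps and ~100 ms RTT figures come from these slides; the helper itself
    # is only an illustration, not a DataTAG tool.
    def required_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
        """Return the bandwidth-delay product in bytes."""
        return bandwidth_bps * rtt_s / 8.0

    bdp = required_window_bytes(2.5e9, 0.100)
    print(f"Window needed to fill 2.5 Gbps at 100 ms RTT: {bdp / 2**20:.1f} MiB")  # ~29.8 MiB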
13. WP2.1
- In addition, there are several other related areas which will be investigated:
  1. Application-controllable Transport Protocols (ATPs), for example Parallel TCP (PTCP) and Forward Error Corrected UDP (FEC-UDP), but not limited to those (see the parallel-stream sketch after this list)
  2. Protocols for high frequency but relatively small bulk data transfers: Cached Parallel TCP (CPTCP)
  3. Enhanced TCP implementations in combination with (enhanced) congestion control mechanisms, for example Explicit Congestion Notification (ECN)
  4. The potential for the use of bandwidth brokers, together with a review of their current specification/implementation status
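To illustrate the Parallel TCP idea in item 1, the sketch below splits one in-memory buffer across several TCP connections opened in parallel threads. The host, port, stream count and the receiver-side reassembly (omitted here) are assumptions, not part of the PTCP work described on the slide.

    # Minimal parallel-TCP sketch: split one in-memory buffer across N streams.
    # Host, port and stream count are illustrative, not DataTAG settings; the
    # receiver is assumed to handle reordering/reassembly.
    import socket
    import threading

    def send_chunk(host: str, port: int, chunk: bytes) -> None:
        with socket.create_connection((host, port)) as s:
            s.sendall(chunk)

    def parallel_send(host: str, port: int, data: bytes, streams: int = 8) -> None:
        size = (len(data) + streams - 1) // streams
        chunks = [data[i * size:(i + 1) * size] for i in range(streams)]
        threads = [threading.Thread(target=send_chunk, args=(host, port, c))
                   for c in chunks if c]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # parallel_send("receiver.example.org", 5001, b"\x00" * 10**8, streams=8)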
14. WP2.2
- End-to-end inter-domain QoS
- Why QoS? Grid traffic needs:
  - Critical data access
  - Service lookup across the WAN
  - Interactive and video applications
- Differentiated Services in the testbed... BUT QoS mechanisms only work inside a single domain.
- Demonstration of:
  - QoS propagation across more than one domain
  - QoS available from Grid middleware (a DSCP-marking sketch follows this slide)
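As a concrete illustration of requesting differentiated packet treatment from middleware, traffic is typically marked with a DiffServ code point (DSCP) on the sending socket. A minimal sketch, assuming a platform that exposes IP_TOS; the EF code point is only an example choice:

    # Mark outgoing traffic with a DiffServ code point via the IP TOS byte.
    # EF (DSCP 46) is shown purely as an example; requires a platform exposing IP_TOS.
    import socket

    EF_DSCP = 46                   # Expedited Forwarding
    TOS_VALUE = EF_DSCP << 2       # DSCP occupies the upper 6 bits of the TOS byte

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
    # sock.connect(("remote.example.org", 5001))  # subsequent packets carry the marking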
15. (January 2003)
16. WP2.3
- Advance Reservation
- Evaluation of the different advance reservation approaches and their interoperability between Grid domains. This should lead to the deployment of an advance reservation service in the international testbed.
- From a functional point of view, the main blocks that have to be studied and defined are (an admission-control sketch follows this list):
  - the user/application protocol
  - the admission control algorithm
  - the intra-domain protocol
  - the inter-domain protocol
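A minimal sketch of the admission-control block named above: accept a new bandwidth reservation only if, at every point of its interval, the sum of overlapping reservations plus the new request stays within link capacity. The class names, capacity figure and piecewise-constant load model are illustrative assumptions, not the DataTAG algorithm.

    # Toy admission control for advance bandwidth reservations.
    # Accept a request only if capacity is never exceeded during its interval.
    from dataclasses import dataclass

    @dataclass
    class Reservation:
        start: float      # seconds on a common clock
        end: float
        bandwidth: float  # bps

    def admit(request: Reservation, booked: list[Reservation], capacity: float) -> bool:
        # Load is piecewise constant, so checking every boundary inside the
        # requested interval is sufficient.
        points = {request.start} | {r.start for r in booked} | {r.end for r in booked}
        for t in points:
            if request.start <= t < request.end:
                load = sum(r.bandwidth for r in booked if r.start <= t < r.end)
                if load + request.bandwidth > capacity:
                    return False
        return True

    # Example: 2.5 Gbps link with one existing 2 Gbps booking.
    existing = [Reservation(0, 3600, 2.0e9)]
    print(admit(Reservation(1800, 5400, 1.0e9), existing, 2.5e9))  # False: overlap exceeds capacity
    print(admit(Reservation(3600, 5400, 1.0e9), existing, 2.5e9))  # True: no overlap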
17. Example of Generic AAA Architecture (RFC 2903)
[Architecture diagram: three AAA servers, one each at the (virtual) user organization (registration dept.), the bandwidth provider (bandwidth broker), and the service organization (purchase dept.); each AAA server contains a rule-based engine, a policy repository and an application-specific module. Users hold contracts and budgets, and the chain delivers a QoS-enabled network service.]
18. Generic AAA (RFC 2903) based Bandwidth on Demand
[iGrid2002 demo diagram: hosts A and C (192.168.1.5, 192.168.1.6) on one 802.1Q VLAN switch (Enterasys Matrix E5) and hosts B and D (192.168.2.3, 192.168.2.4) on another, linked by 1 Gb SX; an AAA server with a policy DB receives the AAA request and sets up the path. A hypothetical sketch of this request flow follows.]
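The sketch below is a hypothetical rendering of the flow in the two slides above: a user request reaches an AAA server, whose rule-based engine checks the policy repository and, if the rule passes, drives an application-specific module that would configure the VLAN path. Field names, the quota rule and the ASM behaviour are invented for illustration; only the endpoint addresses come from the demo slide.

    # Hypothetical AAA bandwidth-on-demand flow: request -> rule-based engine ->
    # policy check -> application-specific module that would configure the VLAN path.
    # Field names and the policy rule are illustrative, not from RFC 2903 or DataTAG.
    def rule_based_engine(request: dict, policy_db: dict) -> bool:
        user_quota = policy_db.get(request["user"], 0)
        return request["bandwidth_mbps"] <= user_quota

    def vlan_asm(request: dict) -> None:
        # Application Specific Module: here we only print what would be configured.
        print(f"Would provision a VLAN from {request['src']} to {request['dst']} "
              f"at {request['bandwidth_mbps']} Mb/s until {request['end_time']}")

    def aaa_server(request: dict, policy_db: dict) -> str:
        if rule_based_engine(request, policy_db):
            vlan_asm(request)
            return "granted"
        return "denied"

    policy_db = {"alice": 1000}       # per-user quota in Mb/s (illustrative)
    request = {"user": "alice", "src": "192.168.1.5", "dst": "192.168.2.3",
               "bandwidth_mbps": 800, "end_time": "2002-11-22T18:00Z"}
    print(aaa_server(request, policy_db))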
19. Upcoming work
- Separate ASM and RBE and allow ASMs to be loaded/unloaded dynamically.
- Implement pre-allocation mechanisms (based on GARA; collaboration with Volker Sander).
- Create ASMs for other bandwidth managers (e.g. Alcatel BonD, Cisco CTM, Level-3 Ontap).
- Create an ASM to talk to other domains (OMNInet).
- Allow RBEs to talk to each other (define messages).
- Integrate the BoD AAA client into middleware, e.g. by allowing integration with GridFTP and with the VOMS authentication and user authorization system.
- Build a WS interface abstraction for pre-allocation and subsequent usage (see the interface sketch below).
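A minimal sketch of what the web-service abstraction in the last bullet could expose for pre-allocation and subsequent usage; the method names and the GridFTP reference in the docstring are assumptions about shape, not the actual DataTAG interface.

    # Hypothetical shape of a pre-allocation (bandwidth-on-demand) service interface.
    # Method and field names are invented for illustration only.
    from abc import ABC, abstractmethod

    class PreAllocationService(ABC):
        @abstractmethod
        def reserve(self, src: str, dst: str, bandwidth_mbps: int,
                    start: str, end: str) -> str:
            """Return a reservation token if admission control accepts the request."""

        @abstractmethod
        def claim(self, token: str) -> None:
            """Activate a previously made reservation (e.g. before a GridFTP transfer)."""

        @abstractmethod
        def release(self, token: str) -> None:
            """Tear down the reservation and free the bandwidth."""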
20. WP3 objectives
- Bulk data transfer and application performance monitoring
- Innovative monitoring tools are required to measure and understand the performance of high speed intercontinental networks and their potential for real Grid applications.
21. Tasks in WP3
- Task 3.1: Performance validation (months 1-12)
  - Create, collect and test network tools to cope with the extreme lambda environment (high RTT, high bandwidth)
  - Measure basic properties and establish a baseline performance benchmark
- Task 3.2: End user performance validation/monitoring/optimization (months 6-24)
  - Use out-of-band tools to measure and monitor what performance a user should in principle be able to reach (a throughput-probe sketch follows this slide)
- Task 3.3: Application performance validation, monitoring and optimization (months 6-24)
  - Use diagnostic libraries and tools to monitor and optimize real applications and to compare their performance with the Task 3.2 outcome.
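In the spirit of the out-of-band tools in Task 3.2 (and not one of the actual WP3 tools), the sketch below measures memory-to-memory TCP throughput to a receiver that simply discards the data; host, port and transfer volume are assumptions.

    # Minimal memory-to-memory TCP throughput probe, illustrative only.
    import socket
    import time

    def measure_throughput(host: str, port: int, total_bytes: int = 10**8) -> float:
        """Send total_bytes of zeros and return the achieved rate in Mb/s."""
        payload = b"\x00" * (1 << 20)                 # 1 MiB blocks
        sent = 0
        start = time.monotonic()
        with socket.create_connection((host, port)) as s:
            while sent < total_bytes:
                s.sendall(payload)
                sent += len(payload)
        elapsed = time.monotonic() - start
        return sent * 8 / elapsed / 1e6

    # print(measure_throughput("receiver.example.org", 5001))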
22. Task 3.1 experiences using the NetherLight SURFnet lambda (AMS-CHI, full duplex GE over 2.5 Gbps SDH, 100 ms RTT)
- Single-stream TCP max throughput 80-150 Mbps, dependent on stream duration (see the window arithmetic below)
- Similar for BSD and Linux and for different adapters
- UDP measurements show effects of the hardware buffer size in the ONS equipment when assigning lower SDH bandwidths; see www.science.uva.nl/wsjouw/datatag/lambdaperf.html
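One plausible, hedged reading of the 80-150 Mbps single-stream ceiling is a window-limited transfer: throughput times RTT gives the window actually in flight, which at 100 ms RTT corresponds to roughly 1-2 MB, a typical socket-buffer scale. The slide itself does not state this interpretation; the arithmetic is:

    # Effective TCP window implied by the observed single-stream rates at 100 ms RTT.
    # The "window-limited" interpretation is an inference, not stated on the slide.
    for rate_mbps in (80, 150):
        window_bytes = rate_mbps * 1e6 * 0.100 / 8
        print(f"{rate_mbps} Mb/s at 100 ms RTT -> ~{window_bytes / 2**20:.2f} MiB in flight")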
23. Summary of status
- Tools have been investigated, selected and, where necessary, adapted to benchmark and characterize lambda networks
- New tools appear and need to be studied to add to the set
- The influence of layer 1 and layer 2 infrastructure properties on TCP throughput is under study on the SURFnet lambda testbed and NetherLight
- Monitoring setup is underway; inclusion of the WP7 toolset is the next step
- Application performance monitoring and optimization should start soon
24. WP4
- Interoperability between Grid domains
- To address issues of middleware interoperability between the European and US Grid domains and to enable a selected set of applications to run on the transatlantic Grid testbed.
25. Framework and relationships
- US partner: iVDGL
- Grid middleware:
  - DataGRID Release 1
  - GriPhyN/PPDG: VDT v1
  - Programme: GLUE
- Applications:
  - LHC experiments: ALICE, ATLAS, CMS
  - Virgo (LIGO)
  - CDF, D0, BaBar
- Plan for each experiment
26. Framework
27. Interoperability approach
- Grid services scenario and basic interoperability requirements
- Common VO scenario for the experiments in EU and US
- A set of application-independent mechanisms as basic grid functions:
  - Accessing storage or computing resources in a grid environment requires resource discovery and security mechanisms, logic for moving data reliably from place to place, scheduling of sets of computational and data movement operations, and monitoring of the entire system for faults and responding to those faults.
- Specific Data Grid mechanisms are built on top of a general basic grid infrastructure.
- These basic protocols are the basis for interoperability between different grid domains. One implementation of them, representing the de facto standard for Grid systems, is the Globus Toolkit, which has been adopted by DataGRID and GriPhyN/PPDG.
- This situation has certainly facilitated the definition of the interoperability approach.
28. Grid architectural model
29. Grid resource access
- The first and most important requirement for grid interoperability in this scenario is the need to access grid resources wherever they are, with:
  - common protocols,
  - common security, authentication and authorization basic mechanisms, and
  - common information describing grid resources.
- The Globus Toolkit provides access protocols (GRAM), information protocols (GIS) and the public key infrastructure (PKI)-based Grid Security Infrastructure (GSI). (A hedged information-query sketch follows this slide.)
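The information side of this was LDAP-based (as are the VO servers mentioned later), so resource discovery amounts to an LDAP search. A hedged sketch with the ldap3 library; the host, port, base DN, object class and attribute names are placeholders, not the real MDS or GLUE schema.

    # Hedged sketch: query an LDAP-based Grid information service for compute elements.
    # Host, port, base DN, filter and attributes are placeholders, not the actual
    # MDS/GLUE schema used by DataTAG.
    from ldap3 import Server, Connection, ALL

    server = Server("giis.example.org", port=2135, get_info=ALL)   # port is an assumption
    conn = Connection(server, auto_bind=True)                      # anonymous bind
    conn.search(search_base="o=grid",
                search_filter="(objectClass=ComputeElement)",      # placeholder class
                attributes=["hostName", "freeCPUs"])               # placeholder attributes
    for entry in conn.entries:
        print(entry)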
30. User oriented requirements
- On top of the core services, several flavours of grid scheduling, job submission, resource discovery and data handling can be developed; these must guarantee interoperability with the core services and their coexistence within the same grid domain. These services, together with sophisticated metadata catalogues, virtual data systems, etc., are of particular interest for the HEP experiment applications.
31. Core services
- GLUE programme first results:
  - Information System
    - CE schema defined and implemented
    - SE schema ongoing (almost complete)
    - NE schema not yet started
  - Authorization system
    - VO/LDAP server in common
    - Discussion and comparison between VOMS and CAS ongoing
  - Resource discovery systems review for future plans
  - New network service (bandwidth on demand) as a grid resource (NE), with an interoperable AA mechanism
32. Grid optimization or collective services
- State of the art in DataGRID and GriPhyN/PPDG:
  - how to schedule
  - how to access distributed and replicated data
- The EU DataGRID and US GriPhyN/PPDG projects provide different solutions to the above issues, as detailed below.
33. Joint EU-US grid demos
- IST2002: 4-6 November, Copenhagen
- SC2002: 16-22 November, Baltimore
- Goals:
  - Basic collaboration between European and US grid projects
  - Interoperability between grid domains for applications submitted by users from different virtual organizations
  - Controlled use of shared resources subject to agreed policy
  - Integrated use of heterogeneous resources from iVDGL and EDG testbed domains
- Infrastructure:
  - Web site: http://www.ivdgl.org/demo
  - Hypernews: http://atlassw1.phy.bnl.gov/HyperNews/get/intergrid.html
  - Mailing list: igdemo@datagrid.cnr.it
  - Archive: http://web.datagrid.cnr.it/hypermail/archivio/igdemo
  - GLUE testbed with common schema
  - VO (DataTAG and iVDGL) LDAP servers in EU and US
  - PACMAN cache with software distribution (DataTAG or iVDGL)
  - Planning document outline
34. Joint EU-US grid demos
- GLUE testbed with common GLUE schema and authorization/authentication tools
  - EDG 1.2 extensions, VDT 1.1.3 extensions
  - Old authentication/authorization tools: VO LDAP servers, mkgridmap, etc.
  - We rely on the availability of the new GLUE schema, RB and Information Providers
- Concentrate on visualization: CMS/GENIUS, ATLAS/GRAPPA, EDG/MAPCENTER (EDG/WP7), farm monitoring tools (EDG/WP4), iVDGL/GANGLIA, DataTAG/NAGIOS
  - Use web portals for job submission (CMS/Genius, ATLAS/Grappa)
  - Provide a world map of the sites involved (EDG-WP7/MapCenter, DataTAG/Nagios)
  - Monitor job status and statistics (EDG-WP4, DataTAG/Nagios). In Nagios, implemented: top users, top applications, total resources, averages over time, per VO
  - Monitor farms (EDG/WP4, DataTAG/Nagios)
  - Developing plugins/sensors for WP1/LB info (only on the EDG part)
  - Services monitoring (EDG/WP4, DataTAG/Nagios using MDS info)
- CMS and ATLAS demos
  - ATLAS simulation jobs; GRAPPA modified to use either the RB or explicit resources
  - Pythia CMSIM simulation jobs submitted to intercontinental resources with IMPALA/BOSS interfaced to VDT/MOP, EDG/JDL, Genius portal
  - Definition of the application demos in progress
35. LHC experiments
- The activity of the WP4 tasks has to be focused on what is already deployed and used by the LHC experiments for their current needs.
- A coordinated plan must be settled to further develop and integrate the current tools (both from the Grid projects and from specific experiment software) into a common (and/or interoperable) scenario.
- The experiments' individual plans were discussed with each of them, and the strategy to be followed was agreed. High-level requirements and areas of intervention have been identified through many discussions and meetings with all of the LHC experiments.
- One of the first results is that dedicated test layouts to experiment with the integration of specific components must be deployed. The scope of each test layout and the results expected have been preliminarily defined, and some of them are already active (having agreed with the interested experiments on the details and the resources needed).
- Those test layouts are already in progress and mostly concern CMS and ATLAS. ALICE has also defined its goals and is rapidly ramping up.