Title: ESnet4: Enabling Next Generation Distributed Experimental and Computational Science for the US DOE's Office of Science
1 ESnet4: Enabling Next Generation Distributed Experimental and Computational Science for the US DOE's Office of Science (Summer 2006)
- William E. Johnston, ESnet Department Head and Senior Scientist, Lawrence Berkeley National Laboratory - www.es.net
2 DOE Office of Science and ESnet
- The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, providing more than 40 percent of total funding for the Nation's research programs in high-energy physics, nuclear physics, and fusion energy sciences. (http://www.science.doe.gov)
- The large-scale science that is the mission of the Office of Science is dependent on networks for
- Sharing of massive amounts of data
- Supporting thousands of collaborators world-wide
- Distributed data processing
- Distributed simulation, visualization, and computational steering
- Distributed data management
- ESnet's mission is to enable those aspects of science that depend on networking and on certain types of large-scale collaboration
3 ESnet Layer 2 Architecture Provides Global High-Speed Internet Connectivity for DOE Facilities and Collaborators (Spring 2006)
[Network map: ESnet IP core and Science Data Network (SDN) core, packet-over-SONET optical rings and hubs (SNV, CHI, NYC, ATL, ALB, ELP), serving 42 end user sites: Office of Science sponsored (22), NNSA sponsored (12), Joint sponsored (3), Laboratory sponsored (6), and other sponsored (NSF LIGO, NOAA). International peerings include Japan (SINet), Australia (AARNet), Canada (CA*net4), Taiwan (TANet2, ASCC), SingAREN, Korea (KREONET2), GLORIAD (Russia, China), France, the Netherlands, and CERN (USLHCnet, DOE funded). Commercial and R&E peering points (MAE-E, PAIX-PA, Equinix, MREN, StarTap, PNWGPoP/PacificWave) plus high-speed peering points with Internet2/Abilene. Link types: international (high speed), 10 Gb/s SDN core, 10 Gb/s IP core, 2.5 Gb/s IP core, MAN rings (10 Gb/s), OC12 ATM (622 Mb/s), OC12 / GigEthernet, OC3 (155 Mb/s), and 45 Mb/s and less.]
4 Challenges Facing ESnet - A Changing Science Environment is the Key Driver of ESnet
- Large-scale collaborative science (big facilities, massive data, thousands of collaborators) is now a key element of the Office of Science (SC)
- The SC science community is almost equally split between Labs and universities, and SC facilities have users worldwide
- Very large international (non-US) facilities (e.g. LHC, ITER) and international collaborators are now also a key element of SC science
- Distributed systems for data analysis, simulations, instrument operation, etc., are becoming common
5 Changing Science Environment → New Demands on the Network
- Increased capacity
- Needed to accommodate a large and steadily increasing amount of data that must traverse the network
- High network reliability
- For interconnecting components of distributed large-scale science
- High-speed, highly reliable connectivity between Labs and US and international R&E institutions
- To support the inherently collaborative, global nature of large-scale science
- New network services to provide bandwidth guarantees
- For data transfer deadlines for remote data analysis, real-time interaction with instruments, coupled computational simulations, etc.
6 Future Network Planning by Requirements
- There are many stakeholders for ESnet
- SC supported scientists
- DOE national facilities
- SC programs
- Lab operations
- Lab general population
- Lab networking organizations
- Other R&E networking organizations
- non-DOE R&E institutions
- Requirements are determined by
- data characteristics of instruments and facilities
- examining the future process of science
- observing traffic patterns
7 Requirements from Instruments and Facilities
- Typical DOE large-scale facilities: Tevatron (FNAL), RHIC (BNL), SNS (ORNL), ALS (LBNL); supercomputer centers: NERSC (LBNL), NLCF (ORNL), Blue Gene (ANL)
- This is the hardware infrastructure of DOE science; the types of requirements can be summarized as follows
- Bandwidth: quantity of data produced, requirements for timely movement
- Connectivity: geographic reach - location of instruments, facilities, and users, plus the network infrastructure involved (e.g. ESnet, Abilene, GEANT)
- Services: guaranteed bandwidth, traffic isolation, IP multicast, etc.
- Data rates and volumes from facilities and instruments → bandwidth, connectivity, services (a simple volume-to-bandwidth conversion is sketched after this list)
- Large supercomputer centers (NERSC, NLCF)
- Large-scale science instruments (e.g. LHC, RHIC)
- Other computational and data resources (clusters, data archives, etc.)
- Some instruments have special characteristics that must be addressed (e.g. Fusion) → bandwidth, services
- Next generation of experiments and facilities, and upgrades to existing facilities → bandwidth, connectivity, services
- Addition of facilities increases bandwidth requirements
- Existing facilities generate more data as they are upgraded
- Reach of collaboration expands over time
- New capabilities require advanced services
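The bandwidth requirement in the data-rates bullet above is at bottom a unit conversion from data volume to sustained rate. A minimal sketch; the 100 TB/month volume and 50% average link utilization are illustrative assumptions, not figures from this talk:

```python
# Convert a facility's monthly data volume into a sustained bandwidth
# requirement. Example numbers are illustrative assumptions.

def required_gbps(tb_per_month: float, utilization: float = 0.5) -> float:
    """Sustained Gb/s needed to move tb_per_month when the link can only
    be driven at the given average utilization."""
    seconds_per_month = 30 * 24 * 3600
    bits = tb_per_month * 1e12 * 8   # terabytes -> bits
    return bits / seconds_per_month / utilization / 1e9

# A hypothetical instrument producing 100 TB/month needs ~0.6 Gb/s
# sustained at 50% average utilization.
print(f"{required_gbps(100):.2f} Gb/s")
```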
8 Requirements from Case Studies
- Advanced Scientific Computing Research (ASCR)
- NERSC (LBNL) (supercomputer center)
- NLCF (ORNL) (supercomputer center)
- ALCF (ANL) (supercomputer center)
- Basic Energy Sciences
- Advanced Light Source
- Macromolecular Crystallography
- Chemistry/Combustion
- Spallation Neutron Source
- Biological and Environmental Research
- Bioinformatics/Genomics
- Climate Science
- Fusion Energy Sciences
- Magnetic Fusion Energy/ITER
- High Energy Physics
- LHC
- Nuclear Physics
- RHIC
- There is a high level of correlation between network requirements for large- and small-scale science; the only difference is bandwidth
- Meeting the requirements of the large-scale stakeholders will cover the smaller ones, provided the required services set is the same
9 Science Network Requirements Aggregation Summary
10 Science Network Requirements Aggregation Summary (continued)
11 Requirements from Observing the Network
ESnet is currently transporting 600 to 700 terabytes/month, and this volume is increasing exponentially (approximately 10x every 46 months)
[Chart: ESnet Monthly Accepted Traffic, February 1990 - December 2005, in terabytes/month]
12 Footprint of SC Collaborators - Top 100 Traffic Generators
- Universities and research institutes that are the top 100 ESnet users
- The top 100 data flows generate 30% of all ESnet traffic (ESnet handles about 3×10^9 flows/mo.); the concentration arithmetic is sketched below
- 91 of the top 100 flows are from the Labs to other institutions (shown) (CY2005 data)
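A minimal sketch of the concentration arithmetic behind the second bullet, using only the figures quoted there:

```python
# Roughly 100 flows out of ~3x10^9 per month carry ~30% of all bytes.
total_flows = 3e9        # approximate ESnet flows per month (from the slide)
top_flows = 100
traffic_share = 0.30     # fraction of total volume in the top 100 flows

print(f"{top_flows / total_flows:.1e} of flows carry "
      f"{traffic_share:.0%} of traffic")
# ~3.3e-08 of flows carry 30% of traffic: a handful of science flows
# dominate, which is why circuit-oriented services target them.
```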
13 Who Generates ESnet Traffic?
ESnet Inter-Sector Traffic Summary, Nov. 2005
[Diagram: percentage traffic flows between DOE sites, ESnet, and the commercial, R&E (mostly universities), international (almost entirely R&E sites), and peering-point sectors; green = traffic coming into ESnet, blue = traffic leaving ESnet, plus traffic between ESnet sites, each as a percentage of total ingress or egress traffic. International flows include DOE collaborator traffic, including data.]
DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers
- Traffic notes
- more than 90% of all traffic is Office of Science
- less than 15% is inter-Lab
14 Network Observation: Large-Scale Science Flows by Site (among other things, these observations set the network footprint requirements)
15 Dominance of Science Traffic (1)
- Top 100 flows are increasing as a percentage of total traffic volume
- 99% to 100% of the top 100 flows are science data (100% starting mid-2005)
- A small number of large-scale science users account for a significant and growing fraction of total traffic volume
[Charts: top 100 flows in 1/05, 7/05, and 1/06, each plotted against a 2 TB/month reference line]
16 Requirements from Observing the Network
- In 4 years, we can expect a 10x increase in traffic over current levels without the addition of production LHC traffic
- Nominal average load on the busiest backbone links is 1.5 Gbps today
- In 4 years that figure will be 15 Gbps if current trends continue (the arithmetic is sketched below)
- Measurements of this kind are science-agnostic
- It doesn't matter who the users are, the traffic load is increasing exponentially
- Bandwidth trends drive the requirement for a new network architecture
- Current architecture is not scalable in a cost-efficient way
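A minimal sketch of the extrapolation behind these bullets, combining the observed growth rate from slide 11 (about 10x every 46 months) with today's 1.5 Gbps load:

```python
# Project the nominal backbone load forward 4 years from the observed
# exponential trend (~10x every 46 months, per slide 11).
monthly_factor = 10 ** (1 / 46)      # ~1.051, i.e. ~5.1% growth per month

current_load_gbps = 1.5              # nominal average load, busiest links
horizon_months = 48                  # 4 years

projected = current_load_gbps * monthly_factor ** horizon_months
print(f"monthly growth: {100 * (monthly_factor - 1):.1f}%")
print(f"load in 4 years: {projected:.1f} Gbps")  # ~16.6 Gbps, i.e. ~10x
```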
17 Requirements from Traffic Flow Characteristics
- Most ESnet science traffic has a source or sink outside of ESnet
- Drives requirement for high-bandwidth peering
- Reliability and bandwidth requirements demand that peering be redundant
- Multiple 10 Gbps peerings today; must be able to add more flexibly and cost-effectively
- Bandwidth and service guarantees must traverse R&E peerings
- Collaboration with other R&E networks on a common service framework is critical
- Seamless fabric
- Large-scale science is becoming the dominant user of the network
- Satisfying the demands of large-scale science traffic into the future will require a purpose-built, scalable architecture
- Traffic patterns are different from those of the commodity Internet
- Since large-scale science traffic will be the dominant user, the network should be architected to serve large-scale science
18 Virtual Circuit Characteristics
- Traffic isolation and traffic engineering
- Provides for high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP-based transport
- Enables the engineering of explicit paths to meet specific requirements
- e.g. bypass congested links, use lower bandwidth, lower latency paths
- Guaranteed bandwidth (Quality of Service, QoS)
- Addresses deadline scheduling (see the sketch after this list)
- Where fixed amounts of data have to reach sites on a fixed schedule, so that the processing does not fall far enough behind that it could never catch up; very important for experiment data analysis
- Reduces cost of handling high-bandwidth data flows
- Highly capable routers are not necessary when every packet goes to the same place
- Use lower cost (factor of 5x) switches, rather than routers, to forward the packets
- End-to-end connections are required between Labs and collaborator institutions
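A minimal sketch of the deadline-scheduling arithmetic: given a fixed amount of data that must arrive by a fixed time, the rate the circuit must guarantee follows directly. The 10 TB / 12 hour example is an illustrative assumption:

```python
# Minimum guaranteed circuit rate to move a fixed data set by a deadline.

def guaranteed_gbps(data_tb: float, deadline_hours: float) -> float:
    """Gb/s a virtual circuit must guarantee to move data_tb in deadline_hours."""
    bits = data_tb * 1e12 * 8
    return bits / (deadline_hours * 3600) / 1e9

# e.g. 10 TB of experiment data needed at a remote analysis site within
# 12 hours requires a ~1.9 Gb/s guaranteed circuit.
print(f"{guaranteed_gbps(10, 12):.2f} Gb/s")
```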
19 OSCARS: Guaranteed Bandwidth VC Service for SC Science
- ESnet On-demand Secure Circuits and Advance Reservation System (OSCARS)
- To ensure compatibility, the design and implementation is done in collaboration with the other major science R&E networks and end sites
- Internet2: Bandwidth Reservation for User Work (BRUW)
- Development of common code base
- GÉANT: Bandwidth on Demand (GN2-JRA3), Performance and Allocated Capacity for End-users (SA3-PACE), and Advance Multi-domain Provisioning System (AMPS); extends to NRENs
- BNL: TeraPaths - A QoS Enabled Collaborative Data Sharing Infrastructure for Peta-scale Computing Research
- GA: Network Quality of Service for Magnetic Fusion Research
- SLAC: Internet End-to-end Performance Monitoring (IEPM)
- USN: Experimental Ultra-Scale Network Testbed for Large-Scale Science
- In its current phase this effort is being funded as a research project by the Office of Science, Mathematical, Information, and Computational Sciences (MICS) Network R&D Program
- A prototype service has been deployed as a proof of concept
- To date more than 20 accounts have been created for beta users, collaborators, and developers
- More than 100 reservation requests have been processed (a schematic of what such a request captures is sketched below)
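A hypothetical sketch (not the actual OSCARS interface, which this talk does not detail) of the information a guaranteed-bandwidth reservation request must capture; all field names and endpoint identifiers here are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CircuitReservation:
    src: str             # originating site (hypothetical identifier)
    dst: str             # destination site (hypothetical identifier)
    bandwidth_mbps: int  # rate the virtual circuit must guarantee
    start: datetime      # reservation window start
    end: datetime        # reservation window end

request = CircuitReservation(
    src="fnal-rt1", dst="cern-usl", bandwidth_mbps=2000,
    start=datetime(2006, 7, 1, 8, 0), end=datetime(2006, 7, 1, 20, 0),
)
# A reservation system admits the request only if every link on the chosen
# path has 2 Gb/s of uncommitted capacity across the whole window.
```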
20 ESnet's Place in U.S. and International Science
- ESnet, Internet2/Abilene, and National Lambda Rail (NLR) provide most of the nation's transit networking for basic science
- Abilene provides national transit networking for most of the US universities by interconnecting the regional networks (mostly via the GigaPoPs)
- ESnet provides national transit networking and ISP service for the DOE Labs
- NLR provides various science-specific and network R&D circuits
- GÉANT plays a role in Europe similar to Abilene and ESnet in the US: it interconnects the European National Research and Education Networks, to which the European R&E sites connect
- GÉANT currently carries essentially all ESnet traffic to Europe
- LHC use of LHCnet to CERN is still ramping up
21Federated Trust Services
- Remote, multi-institutional, identity
authentication is critical for distributed,
collaborative science in order to permit sharing
computing and data resources, and other Grid
services - The science community needs PKI to formalize the
existing web of trust within science
collaborations and to extend that trust into
cyber space - The function, form, and policy of the ESnet trust
services are driven entirely by the requirements
of the science community and by direct input from
the science community - Managing cross site trust agreements among many
organizations is crucial for authentication in
collaborative environments - ESnet assists in negotiating and managing the
cross-site, cross-organization, and international
trust relationships to provide policies that are
tailored to collaborative science
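A minimal sketch, using the third-party Python "cryptography" package, of the basic relying-party check that federated PKI enables: verifying that a user certificate was issued by a trusted CA (e.g. DOEGrids, next slide). File names are illustrative assumptions, and a real Grid deployment also checks validity periods, revocation, and policy:

```python
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import padding

with open("doegrids-ca.pem", "rb") as f:       # trusted CA certificate
    ca_cert = x509.load_pem_x509_certificate(f.read())
with open("user-cert.pem", "rb") as f:         # certificate presented by a user
    user_cert = x509.load_pem_x509_certificate(f.read())

# Raises InvalidSignature if the user certificate was not signed by this CA.
ca_cert.public_key().verify(
    user_cert.signature,
    user_cert.tbs_certificate_bytes,
    padding.PKCS1v15(),                        # assumes an RSA CA key
    user_cert.signature_hash_algorithm,
)
print("certificate chains to the trusted CA")
```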
22 DOEGrids CA (one of several CAs) Usage Statistics
Report as of Jun 19, 2006
23 Technical Approach for the Next Generation ESnet
- ESnet has developed a technical approach designed to meet all known requirements
- Significantly increased capacity
- New architecture for
- Increased reliability
- Support of traffic engineering (separate IP and circuit-oriented traffic)
- New circuit services for
- Dynamic bandwidth reservation
- Traffic isolation
- Traffic engineering
- Substantially increased US and international R&E network connectivity and reliability
- All of this is informed by SC program and community input
24 ESnet4 Architecture and Configuration
Core networks: 40-50 Gbps in 2009, 160-400 Gbps in 2011-2012
[Network map: production IP core (10 Gbps) and Science Data Network core (20-30-40 Gbps); MANs (20-60 Gbps) or backbone loops for site access; IP core and SDN hubs at Seattle, Sunnyvale, LA, San Diego, Boise, Denver, Albuquerque, Tulsa, Kansas City, Houston, Chicago, Cleveland, Atlanta, Jacksonville, Boston, New York, and Washington DC, connecting the primary DOE Labs. International connections: CERN (30 Gbps), Europe (GÉANT), Canada (CANARIE), Asia-Pacific, Australia, GLORIAD (Russia and China), and South America (AMPATH). High-speed cross-connects with Internet2/Abilene. Fiber path is 14,000 miles / 24,000 km.]