Hans Hoffman has described the scale of the problems that we are facing; I will try to describe how we are trying to solve them

Slides: 21
Provided by: COLL68
Transcript and Presenter's Notes

1
Developing an operational Grid
Hans Hoffman has described the scale of the
problems that we are facing; I will try to
describe how we are trying to solve them.
2
Developing an operational Grid
  • GridPP is involved in two projects:
    • EU DataGrid Project
    • SAMGrid Project
  • These projects are built on top of common products
    such as the Globus Toolkit and CondorG.

I will describe some of the aspects of the
middleware developed in these projects and how we
are deploying it.
3
Developing an operational Grid
In 15 minutes this will have to be a very brief
overview. For more information, see the many
posters we have here, talk to the people on the
booth, or look at the GridPP website:
http://www.gridpp.ac.uk/
4
The DataGrid Project
I will only talk about a few of these boxes
5
The DataGrid Project
  • Resource Management
  • Users describe their jobs using a set of
    Condor ClassAds. A job is then submitted to a
    Resource Broker from any User Interface (UI)
    machine.
  • The Resource Broker (RB) is at the centre of
    resource management. It matches the
    requirements of the job to the available
    resources, using the Condor ClassAd libraries.
  • Information about available resources is cached
    by the Information Index (II), which the RB
    queries.
  • The IIs in turn acquire their information by
    interrogating individual GRISes and national
    GIISes.
  • Information about the location of data is stored
    in a replica catalogue.
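
The matching step described above can be sketched in a few lines of Python. This is only an illustration of requirement-and-rank matchmaking, not the Resource Broker's code and not the Condor ClassAd library: the resource records, the site names and the helper `match` are hypothetical. The attribute names are borrowed from the example job description on the next slide, and ranking by the product of CPU count and average SpecInt rating is an assumption.

```python
# Hypothetical resource records, standing in for what an
# Information Index might have cached. All values are invented.
resources = [
    {"name": "ce-a", "LRMSType": "PBS", "OpSys": "Linux RH 6.2",
     "TotalCPUs": 16, "AverageSI00": 400},
    {"name": "ce-b", "LRMSType": "LSF", "OpSys": "Linux RH 6.2",
     "TotalCPUs": 64, "AverageSI00": 500},
    {"name": "ce-c", "LRMSType": "PBS", "OpSys": "Linux RH 6.1",
     "TotalCPUs": 32, "AverageSI00": 380},
    {"name": "ce-d", "LRMSType": "PBS", "OpSys": "Linux RH 7.2",
     "TotalCPUs": 128, "AverageSI00": 600},
]


def requirements(other):
    """Job requirement: PBS batch system and one of two OS releases."""
    return (other["LRMSType"] == "PBS"
            and other["OpSys"] in ("Linux RH 6.1", "Linux RH 6.2"))


def rank(other):
    """Job preference (assumed): more CPUs and faster CPUs rank higher."""
    return other["TotalCPUs"] * other["AverageSI00"]


def match(resources, requirements, rank):
    """Keep resources that satisfy the requirements, best rank first."""
    candidates = [r for r in resources if requirements(r)]
    return sorted(candidates, key=rank, reverse=True)


matches = match(resources, requirements, rank)
best = matches[0]["name"] if matches else None
print(best, [m["name"] for m in matches])
```

Keeping the requirement (a yes/no filter) separate from the rank (an ordering among the survivors) mirrors the split between the Requirements and Rank attributes of the job description.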

6
The DataGrid Project
An example:
Executable = "WP1testF";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/"};
OutputSandbox = {"sim.err","sim.err","testD.out"};
Rank = other.TotalCPUs * other.AverageSI00;
Requirements = other.LRMSType == "PBS" \
  && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2");
RetryCount = 2;
Arguments = "file1";
InputData = "LF:test10099-1001";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
OutputSE = "grid001.cnaf.infn.it";
7
The DataGrid Project
Resource Management
If the RB is able to match the job to a resource,
it passes the job over to the Job Submission
Service (JSS), which then submits the job to the
selected resource. The JSS is based on CondorG.
Logging information is kept at each stage.
All user interaction is via the UI. Users are able
to list resources that match their requirements,
submit jobs, examine the status of submitted jobs,
access all logging information about their jobs,
and cancel jobs.
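
As a rough illustration of "logging information is kept at each stage", the hand-over from broker to submission service can be modelled as a job object that records every status change. The state names (`SUBMITTED`, `MATCHED`, and so on), the `Job` class and the `broker` function are hypothetical; the real DataGrid status names and logging service are not given in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A job that records every status transition (hypothetical model)."""
    job_id: str
    status: str = "SUBMITTED"
    log: list = field(default_factory=list)

    def advance(self, new_status, detail=""):
        # Record (old status, new status, detail) before changing state,
        # so the full history can be shown to the user later.
        self.log.append((self.status, new_status, detail))
        self.status = new_status


def broker(job, matched_resource):
    """Pass a matched job on for submission; log each stage."""
    if matched_resource is None:
        job.advance("NO_MATCH", "no resource satisfies the requirements")
        return job
    job.advance("MATCHED", matched_resource)
    job.advance("HANDED_TO_JSS", matched_resource)
    job.advance("RUNNING", matched_resource)
    return job


job = broker(Job("job-1"), "ce-c")
unmatched = broker(Job("job-2"), None)
for old, new, detail in job.log:
    print(f"{job.job_id}: {old} -> {new} ({detail})")
```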
8
The DataGrid Project
Information Services (R-GMA)
The design and implementation are based on the Grid
Monitoring Architecture (GMA) of the GGF, with the
term "directory" replaced by "registry" to avoid
any implied structure.
9
The DataGrid Project
The current implementation uses servlet
technology, with APIs in Java, C, C++, Perl and
Python.
10
The DataGrid Project
Information Services (R-GMA)
  • Gives the impression of one RDBMS per VO.
  • Currently there are two types of producer:
  • A circular buffer producer
  • No RDBMS is used and SQL queries are handled by
    the code. A consumer may miss records if it is
    too slow.
  • A database producer
  • Uses a database to hold data so data is never
    lost; however, it is slower and requires a
    clean-up strategy to avoid indefinite growth.
  • More producers are being implemented at the
    request of the users.
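
Why a slow consumer can miss records with a circular buffer is easy to see in a small sketch. The class below is a hypothetical stand-in, not the R-GMA producer API: it keeps only the most recent `capacity` records and numbers them, so a consumer can tell how many were overwritten since its last read.

```python
from collections import deque


class CircularBufferProducer:
    """Keeps only the newest `capacity` records (hypothetical sketch)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entry when full.
        self._buffer = deque(maxlen=capacity)
        self._next_seq = 0

    def publish(self, record):
        self._buffer.append((self._next_seq, record))
        self._next_seq += 1

    def read_since(self, last_seq):
        """Return (records, new_last_seq, missed) for one consumer.

        `last_seq` is the sequence number of the last record the
        consumer has seen, or -1 if it has seen nothing yet.
        """
        available = [(s, r) for s, r in self._buffer if s > last_seq]
        if not available:
            return [], last_seq, 0
        # Records between last_seq and the oldest one still buffered
        # were overwritten before the consumer came back.
        missed = available[0][0] - (last_seq + 1)
        return [r for _, r in available], available[-1][0], missed


producer = CircularBufferProducer(capacity=3)
for i in range(5):
    producer.publish(f"load-{i}")

records, last_seq, missed = producer.read_since(-1)
print(records, last_seq, missed)
```

A database producer avoids the loss by never overwriting, which is exactly why it needs a clean-up strategy instead.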

11
The DataGrid Project
Information Services (R-GMA)
An example: CPU load at various sites. Note that
all information is timestamped.
SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'
would give the output of producer 1.
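
To make the query concrete, here is a sketch that runs it against an in-memory SQLite table. SQLite is only a stand-in for the "one RDBMS per VO" view; R-GMA itself is not used here. The table name and the Country and Site columns come from the slide, while the remaining columns (`Facility`, `LoadAvg`, `MeasuredAt`) and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CPULoad ("
    "Country TEXT, Site TEXT, Facility TEXT, "
    "LoadAvg REAL, MeasuredAt TEXT)"
)

# Invented sample rows; every row carries a timestamp.
conn.executemany(
    "INSERT INTO CPULoad VALUES (?, ?, ?, ?, ?)",
    [
        ("UK", "RAL", "CDF", 0.3, "2002-09-01T12:00:00"),
        ("UK", "RAL", "ATLAS", 1.6, "2002-09-01T12:00:05"),
        ("UK", "GLA", "CDF", 0.4, "2002-09-01T12:00:07"),
        ("CH", "CERN", "ALICE", 0.5, "2002-09-01T12:00:09"),
    ],
)

# The query from the slide, with the literals passed as parameters.
rows = conn.execute(
    "SELECT * FROM CPULoad WHERE Country = ? AND Site = ?",
    ("UK", "RAL"),
).fetchall()

for row in rows:
    print(row)
```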
12
The DataGrid Project
Network Monitoring
Essential if network information is to be used in
brokering. SLAC's IEPM uses PingER to measure
round-trip time; iperf, bbftp and bbcp to measure
TCP throughput; and UDPmon to measure UDP
throughput.
Sample results from IEPM monitoring between SLAC
and Daresbury.
13
The DataGrid Project
Network Monitoring
The network monitoring information can then be
published via an LDAP service.
14
The DataGrid Project
Installation and configuration (LCFG)
Configuration of large numbers of different
machines can be very troublesome. DataGrid uses
LCFG. Each machine has its own profile, which can
include general site profiles and individual
configuration options. The profile is then
published in XML.
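
The idea of a per-machine profile built from shared site settings plus individual options can be sketched as a layered merge followed by XML serialisation. This is not LCFG and does not reproduce its XML schema: the `profile` and `resource` element names, the `build_profile` helper and the site-level values are hypothetical. The two `update.*` resources are taken from the worker-node example on the next slide.

```python
import xml.etree.ElementTree as ET


def build_profile(hostname, *layers):
    """Merge configuration layers (later layers win) and emit XML."""
    merged = {}
    for layer in layers:
        merged.update(layer)

    root = ET.Element("profile", {"host": hostname})
    for name in sorted(merged):
        resource = ET.SubElement(root, "resource", {"name": name})
        resource.text = str(merged[name])
    return ET.tostring(root, encoding="unicode")


# Hypothetical site-wide defaults shared by every machine.
site_defaults = {
    "nfs.mounts": "/home",
    "update.modlist": "",
}

# Host-specific options, overriding the site defaults where they clash.
host_specific = {
    "update.modlist": "label",
    "update.mod_label": "alias eth0 eepro100",
}

xml_profile = build_profile("gw31", site_defaults, host_specific)
print(xml_profile)
```

Applying the layers in order, general first and specific last, is what lets one site profile serve many machines while each host still overrides what it needs.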
15
The DataGrid Project
/* gw31
   BARRY'S WN */
/* Host specific definitions */
#define HOSTNAME gw31
/* Some useful macros */
#include "macros-cfg.h"
/* Site specific definitions */
#include "site-cfg-farm.h.ic"
/* Linux default resources */
#include "linuxdef-cfg.h"
/* LCFG client specific resources */
#include "client_testbed-cfg.h"
/* Well, obviously, if you read the title !!!!!!! */
#include "WorkerNode-cfg.h"
/* Specific NIC */
update.modlist label
update.mod_label alias eth0 eepro100
XML published profile
16
How do we know what is working?
We monitor each site.
17
SAMGrid
SAM is used by the DØ experiment, so SAM is
operational, but is it a Grid? Currently SAM is
mainly a data management tool. Locations of
replicas are stored in a central database. Files
are moved to a running job as and when needed,
currently at 1 TB/day. SAM can only submit jobs to
local resources. It has been modified to use
gridftp and is being updated for remote submission
using CondorG, ready very soon (in testing now).
18
SAMGrid
Real Data files from FNAL
MC files from NIKHEF
19
Developing an operational Grid
Conclusions
DataGrid is being deployed across Europe and is
just starting to be used (by the ATLAS experiment
last week). SAM is already being used by many users
and is being modified for remote submission and to
use transfer protocols such as gridftp.
20
(No Transcript)