Upgrade D0 farm
1
Upgrade D0 farm
2
Reasons for upgrade
  • RedHat 7 needed for D0 software
  • New versions of
    • ups/upd v4_6
    • fbsng v1_3fp2_1
    • sam
  • Use of farm for MC and analysis
  • Integration in farm network

3
MC production on farm
  • Input requests
  • Request translated into an mc_runjob macro
  • Stages
    • mc_runjob on batch server (hoeve)
    • MC job on node
    • SAM store on file server (schuur)
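The stages above end up as one fbs job with dependent sections; a minimal hypothetical sketch (not the actual mc_runjob code) of how such a job description could be generated for a request:

```python
# Hypothetical sketch: build an fbs-style job description with the three
# dependent sections described above: mcc (MC job on a node), rcp (copy
# to the file server), sam (SAM store). Section names and queues follow
# the slides; the function itself is invented for illustration.
def build_jdf(workdir):
    sections = [
        ("mcc", "FastQ", None),          # MC job on a farm node
        ("rcp", "IOQ", "done(mcc)"),     # copy output after mcc finishes
        ("sam", "IOQ", "done(rcp)"),     # SAM store after the copy
    ]
    lines = []
    for name, queue, depend in sections:
        lines.append("SECTION %s" % name)
        lines.append("EXEC=%s/batch_%s" % (workdir, name))
        lines.append("NUMPROC=1")
        lines.append("QUEUE=%s" % queue)
        if depend:
            lines.append("DEPEND=%s" % depend)
        lines.append("STDOUT=%s/stdout_%s" % (workdir, name))
    return "\n".join(lines)

print(build_jdf("/d0gstar/curr/minbias-02073214824"))
```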

4
[Diagram: an mcc request arrives at the farm server, which submits an fbs job with stages 1 mcc, 2 rcp, 3 sam: fbs(mcc) runs on a node (100 cpus, 40 GB each), fbs(rcp,sam) on the file server (1.2 TB). mcc input and output move between node and file server; data goes to the datastore at FNAL/SARA, metadata to the SAM DB.]
5
[Diagram: as slide 4, but the fbs job has only stages 1 mcc and 2 rcp; the SAM store to the datastore is done by a cron job (cron sam) on the file server.]
6
[Diagram: on hoeve, mc_runjob runs as fbsuser and fbs jobs are submitted, partly from cron; the node runs the cp and mcc steps as fbsuser; on schuur, fbsuser runs the rcp step and willem runs sam. Data flows from node to schuur; control from hoeve.]
7
SECTION mcc
  EXEC=/d0gstar/curr/minbias-02073214824/batch
  NUMPROC=1
  QUEUE=FastQ
  STDOUT=/d0gstar/curr/minbias-02073214824/stdout
  STDERR=/d0gstar/curr/minbias-02073214824/stdout
SECTION rcp
  EXEC=/d0gstar/curr/minbias-02073214824/batch_rcp
  NUMPROC=1
  QUEUE=IOQ
  DEPEND=done(mcc)
  STDOUT=/d0gstar/curr/minbias-02073214824/stdout_rcp
  STDERR=/d0gstar/curr/minbias-02073214824/stdout_rcp
8
#!/bin/sh
. /usr/products/etc/setups.sh
cd /d0gstar/mcc/mcc-dist
. mcc_dist_setup.sh
mkdir -p /data/curr/minbias-02073214824
cd /data/curr/minbias-02073214824
cp -r /d0gstar/curr/minbias-02073214824/* .
touch /d0gstar/curr/minbias-02073214824/.`uname -n`
sh minbias-02073214824.sh `pwd` > log
touch /d0gstar/curr/minbias-02073214824/`uname -n`
/d0gstar/bin/check minbias-02073214824
batch_rcp runs on schuur
#!/bin/sh
i=minbias-02073214824
if [ -f /d0gstar/curr/$i/OK ]
then
  mkdir -p /data/disk2/sam_cache/$i
  cd /data/disk2/sam_cache/$i
  node=`ls /d0gstar/curr/$i/node*`
  node=`basename $node`
  job=`echo $i | awk '{print substr($0,length($0)-8,9)}'`
  rcp -pr $node:/data/dest/d0reco/reco*$job .
  rcp -pr $node:/data/dest/reco_analyze/rAtpl*$job .
  rcp -pr $node:/data/curr/$i/Metadata/*.params .
  rcp -pr $node:/data/curr/$i/Metadata/*.py .
  rsh -n $node rm -rf /data/curr/$i
  rsh -n $node rm -rf /data/dest/*/*$job*
  touch /d0gstar/curr/$i/RCP
fi
batch runs on node
9
runs on schuur, called by fbs or cron
#!/bin/sh
locate()
{
  file=`grep "import " import_$1_$job.py | awk -F \" '{print $2}'`
  sam locate $file | fgrep -q $file
  return $?
}
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=hoeve
export SAM_STATION
mv tosam tosam1
LIST=`cat tosam1`
for job in $LIST
do
  cd /data/disk2/sam_cache/$job
  list='gen d0g sim'
  for i in $list
  do
    until locate $i || (sam declare import_${i}_${job}.py; locate $i)
    do
      sleep 60
    done
  done
  list='reco recoanalyze'
  for i in $list
  do
    sam store --descrip=import_${i}_${job}.py --source=`pwd`
    return=$?
    echo Return code sam store $return
  done
done
echo Job finished ...
declare gen, d0g, sim
store reco, recoanalyze
10
Filestream
  • Fetch input from sam
  • Read input file from schuur
  • Process data on node
  • Copy output to schuur
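The four filestream steps above can be pictured as a loop over delivered files; a toy sketch in which hypothetical stand-ins replace the real sam and rcp commands:

```python
# Toy sketch of the filestream loop: for each file, fetch the input
# (sam + read from schuur), process it on the node, and copy the output
# back. The string handling here is a stand-in for the real transfers.
def filestream(filenames, process):
    results = []
    for name in filenames:
        data = "contents of " + name   # steps 1-2: fetch from sam, read from schuur
        out = process(data)            # step 3: process data on the node
        results.append((name, out))    # step 4: copy output back to schuur
    return results

# Example: "process" just counts characters.
print(filestream(["f1", "f2"], len))
```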

11
hoeve
node
schuur
attach filestream
mc_runjob
cron
fbs submit
rcp d0exe
rcp
fbs submit
sam
data
control
12
Analysis on farm
  • Stages
    • Read files from sam
    • Copy files to node(s)
    • Perform analysis on node
    • Copy files to file server
    • Store files in sam

13
[Diagram: analysis job as three fbs steps: fbs(1) sam + rcp (fetch input from the datastore at FNAL/SARA to the file server, 1.2 TB), fbs(2) analyze (on a node, 100 cpus, 40 GB each), fbs(3) rcp + sam (copy back and store). Metadata goes to the SAM DB; control via fbs.]
14
[Diagram: on triviaal, willem runs sam to fetch the input; fbsuser rcp copies it to node-2, where fbsuser runs the analysis program; fbsuser rcp copies the output back and willem runs sam to store it.]
15
SECTION sam
  EXEC=/home/willem/batch_sam
  NUMPROC=1
  QUEUE=IOQ
  STDOUT=/home/willem/stdout
  STDERR=/home/willem/stdout
batch.jdf
batch_sam
#!/bin/sh
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal rcp -r /stage/triviaal/sam_cache/boo node-2:/data/test >> log
16
[Diagram: as slide 13, with the steps grouped as 1. sam (fetch, on the file server), 2. rcp + analyze + rcp (on the node), 3. rcp + sam (store); fbs(2) runs on the node, fbs(1) and fbs(3) on the file server.]
17
[Diagram: willem runs sam on triviaal and fbsuser submits the fbs job; on node-2, fbsuser runs rcp, the analysis program, and rcp; willem stores the output with sam.]
18
rsh -l fbsuser triviaal fbs submit willem/batch_node.jdf

SECTION sam
  EXEC=/d0gstar/batch_node
  NUMPROC=1
  QUEUE=FastQ
  STDOUT=/d0gstar/stdout
  STDERR=/d0gstar/stdout

#!/bin/sh
uname -a
date
19
SECTION ana
  EXEC=/d0gstar/batch_node
  NUMPROC=1
  QUEUE=FastQ
  STDOUT=/d0gstar/stdout
  STDERR=/d0gstar/stdout
SECTION sam
  EXEC=/home/willem/batch
  NUMPROC=1
  QUEUE=IOQ
  STDOUT=/home/willem/stdout
  STDERR=/home/willem/stdout

#!/bin/sh
. /usr/products/etc/setups.sh
setup fbsng
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal fbs submit /home/willem/batch_node.jdf

#!/bin/sh
rcp -pr server:/stage/triviaal/sam_cache/boo /data/test
. /d0/fnal/ups/etc/setups.sh
setup root -q KCC_4_0:exception:opt:thread
setup kailib
root -b -q /d0gstar/test.C

test.C:
gSystem->cd("/data/test/boo")
gSystem->Exec("pwd")
gSystem->Exec("ls -l")
20
get_file.py
# This file sets up and runs a SAM project.
import os, sys, string, time, signal
from re import *
from globals import *
import run_project
from commands import *

# Set the following variables to appropriate values.
# Consult database for valid choices
sam_station = "triviaal"
# Consult database for valid choices
project_definition = "op_moriond_p1014"
# A particular snapshot version, last or new
snapshot_version = 'new'
# Consult database for valid choices
appname = "test"
version = "1"
group = "test"
# The maximum number of files to get from sam
max_file_amt = 5
# for additional debug info use "--verbose"
verbosity = "--verbose"
verbosity = ""
# Give up on all exceptions
give_up = 1

def file_ready(filename):
    # Replace this python subroutine with whatever you want to do to
    # process the file that was retrieved. This function will only be
    # called in the event of a successful delivery.
    print "File ", filename, " has been delivered!"
    os.system('cp ' + filename + ' /stage/triviaal/sam')
    return
21
Disk partitioning hoeve
/d0
/fnal
/mcc
/fbsng
/mcc-dist
/mc_runjob
/d0dist
/d0usr
/curr
/fnal -> /d0/fnal
/d0usr -> /fnal/d0usr
/d0dist -> /fnal/d0dist
/usr/products -> /fnal/ups
22
ana_runjob
  • Is analogous to mc_runjob
  • Creates and submits analysis jobs
  • Input
    • get_file.py with SAM project name
    • Project defines files to be processed
    • analysis script
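The input above can be pictured as a small configuration: a SAM project name (which defines the file set) plus the analysis script. A hypothetical sketch, reusing the variable names from get_file.py; the function itself is invented:

```python
# Hypothetical sketch of the input ana_runjob needs. The dictionary
# keys reuse variable names from get_file.py (sam_station,
# project_definition, snapshot_version); ana_input is illustrative.
def ana_input(project, analysis_script, station="triviaal"):
    return {
        "sam_station": station,
        "project_definition": project,   # the project defines the files
        "snapshot_version": "new",
        "analysis_script": analysis_script,
    }

cfg = ana_input("op_moriond_p1014", "/d0gstar/test.C")
print(cfg["project_definition"])
```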

23
Integration with grid (1)
  • At present separate clusters
    • D0, LHCb, Alice, DAS cluster
  • hoeve and schuur in farm network

24
Present network layout
[Diagram: present network layout with ajax on hefnet, a router to surfnet, and hoeve, schuur and the nodes on a switch; NFS between servers and nodes.]
25
New network layout
[Diagram: new network layout: ajax on hefnet and a lambda connection reach a farm router; switches connect the D0 (hoeve, schuur), LHCb and alice clusters plus booder; NFS within the farm network.]
26
New network layout
[Diagram: the same new network layout with the das-2 cluster also attached to the farm router.]
27
Server tasks
  • hoeve
    • software server
    • farm server
  • schuur
    • fileserver
    • sam node
  • booder
    • home directory server
    • in backup scheme

28
Integration with grid (2)
  • Replace fbs with pbs or condor
    • pbs on Alice and LHCb nodes
    • condor on das cluster
  • Use EDG installation tool LCFG
  • Install d0 software with rpm
  • Problem with sam (uses ups/upd)

29
Integration with grid (3)
  • Package mcc in rpm
  • Separate programs from working space
  • Use cfg commands to steer mc_runjob
  • Find better place for card files
  • Input structure now created on node

30
Grid job
PBS job
submit
#!/bin/sh
macro=$1
pwd=`pwd`
cd /opt/fnal/d0/mcc/mcc-dist
. mcc_dist_setup.sh
cd $pwd
dir=/opt/fnal/d0/mcc/mc_runjob/py_script
python $dir/Linker.py script=$macro

[willem@tbn09 willem]$ cat test.pbs
# PBS batch job script
#PBS -o /home/willem/out
#PBS -e /home/willem/err
#PBS -l nodes=1
# Changing to directory as requested by user
cd /home/willem
# Executing job as requested by user
./submit minbias.macro
31
RunJob class for grid
class RunJob_farm(RunJob_batch):
    def __init__(self, name=None):
        RunJob_batch.__init__(self, name)
        self.myType = "runjob_farm"

    def Run(self):
        self.jobname = self.linker.CurrentJob()
        self.jobnaam = string.splitfields(self.jobname, '/')[-1]
        comm = 'chmod +x ' + self.jobname
        commands.getoutput(comm)
        if self.tdconf['RunOption'] == 'RunInBackground':
            RunJob_batch.Run(self)
        else:
            bq = self.tdconf['BatchQueue']
            dirn = os.path.dirname(self.jobname)
            print dirn
            comm = 'cd ' + dirn + '; sh ' + self.jobnaam + ' `pwd` > stdout'
            print comm
            runcommand(comm)
32
To be decided
  • Location of minimum bias files
  • Location of MC output

33
Job status
  • Job status is recorded in
    • fbs
    • /d0/mcc/curr/<job_name>
    • /data/mcc/curr/<job_name>
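Since the batch scripts record progress by touching marker files in the job directory (OK after the MC step, RCP after the copy step), status can be read off the directory contents. A hedged sketch; the function and status strings are illustrative, only the marker names come from the scripts:

```python
import os

# Sketch: derive a job's status from the marker files the batch scripts
# touch in /d0/mcc/curr/<job_name>. "OK" marks a finished MC step,
# "RCP" marks a completed copy; anything else is still running (or failed).
def job_status(jobdir):
    if os.path.exists(os.path.join(jobdir, "RCP")):
        return "output copied"
    if os.path.exists(os.path.join(jobdir, "OK")):
        return "mcc finished"
    return "running or failed"
```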

34
SAM servers
  • On master node
    • station
    • fss
  • On master and worker nodes
    • stager
    • bbftp