Farm Batch System (FBS) and Fermi Inter-Process Communication and Synchronization Toolkit (FIPC)


1
Farm Batch System (FBS) and Fermi Inter-Process
Communication and Synchronization Toolkit (FIPC)
  • M.Breitung, J.Fromm, T.Levshina, I.Mandrichenko,
    M.Schweitzer
  • Fermi National Accelerator Laboratory

2
CHEP 2000 Presentation
  • Off-Line Data Processing for Run II
  • FBS
  • Requirements
  • Design and features
  • FIPC
  • Why FIPC ?
  • Design and features
  • FBS and FIPC

3
Off-line Data Processing for Run II
  • CDF and D0 off-line processing power estimate
    • 100-250 thousand MIPS
    • 350-900 Pentium 500 MHz CPUs
  • Linux PC farms will be used for off-line
    processing
    • 170-450 dual-CPU PCs
  • Number of processes
    • 350-900 concurrent processes
  • Typical farm job is a parallel job
    • 10 parallel processes per job
    • Duration: 10 hours
  • Number of jobs
    • 35-90 concurrent jobs
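
The PC and job counts on this slide follow from the CPU estimate by simple division. The sketch below reproduces that arithmetic, assuming one process per CPU; the function and constant names are illustrative only. The slide rounds the lower PC count to 170.

```python
# Back-of-the-envelope farm sizing from the figures on this slide.
# Assumption: one user process per CPU. All names are illustrative.
CPUS_PER_PC = 2          # dual-CPU PCs
PROCESSES_PER_JOB = 10   # typical parallel job size


def farm_size(cpus: int) -> dict:
    """Derive PC, process, and job counts from a CPU count."""
    return {
        "pcs": cpus // CPUS_PER_PC,
        "processes": cpus,
        "jobs": cpus // PROCESSES_PER_JOB,
    }


low, high = farm_size(350), farm_size(900)
print(low)   # {'pcs': 175, 'processes': 350, 'jobs': 35}
print(high)  # {'pcs': 450, 'processes': 900, 'jobs': 90}
```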

4
Farm Batch System (FBS)
  • Requirements and Design
  • Features
  • Status and Future

5
FBS Requirements
  • Scalability: FBS should scale up to
    • 2000 processors
    • 2000 concurrent user processes
    • 200 simultaneously running jobs
    • 200 jobs per hour started
  • Unit of operation: parallel job
    • Typical job size is 10 processes

6
FBS Requirements
  • Cost and Reliability
    • Low maintenance and support cost
    • Low cost per node
    • Robust with respect to node shutdowns
    • Should not require 24/7 support of all farm nodes
    • Should recover after failure of FBS components
  • Portability
    • Linux and other UNIX OS flavors

7
FBS Design: Load Balancing
  • Traditional solution: load measuring
  • FBS solution: resource allocation
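
The slide contrasts reacting to measured load with booking declared needs up front. The following is a minimal sketch of the second idea only, not the FBS scheduler: the `Node` class, the `allocate` function, and the resource names `cpu` and `disk` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # resource name -> units not yet allocated (hypothetical resource names)
    free: dict = field(default_factory=dict)


def allocate(nodes, request):
    """Reserve `request` on the first node with room; return its name or None."""
    for node in nodes:
        if all(node.free.get(res, 0) >= amount for res, amount in request.items()):
            for res, amount in request.items():
                node.free[res] -= amount
            return node.name
    return None


nodes = [Node("n1", {"cpu": 2, "disk": 5}), Node("n2", {"cpu": 2, "disk": 20})]
first = allocate(nodes, {"cpu": 1, "disk": 10})   # only n2 has enough disk
second = allocate(nodes, {"cpu": 2, "disk": 10})  # n2 has 1 cpu left, so no node fits
```

Because the decision uses booked amounts rather than sampled load, a node is never oversubscribed while a process is still ramping up.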

8
FBS Design: Farm Model

9
FBS Design: Job and Sections
  • An FBS job consists of sections.
  • Each section is an array of identical processes
    of a certain type.
  • Sections are identified by name.
  • Sections can depend on one or more other sections
    of the job.
  • Dependency types
    • Done successfully
    • Failed
    • Finished
    • Started
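
One way to picture the four dependency types is as predicates over the state of the section being depended on. This is a sketch under assumptions: the state names, the dependency keywords, and the reading of "Started" as "no longer waiting" are illustrative and are not taken from FBS.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    DONE = "done"        # finished successfully
    FAILED = "failed"


# One predicate per dependency type on the slide (keyword names are hypothetical).
DEPENDENCY = {
    "done": lambda s: s is State.DONE,
    "failed": lambda s: s is State.FAILED,
    "finished": lambda s: s in (State.DONE, State.FAILED),
    "started": lambda s: s is not State.WAITING,
}


def ready(section, deps, states):
    """A waiting section may start once all of its dependencies are satisfied."""
    return states[section] is State.WAITING and all(
        DEPENDENCY[kind](states[other]) for kind, other in deps.get(section, [])
    )


deps = {"Process": [("done", "Init")], "CleanUp": [("finished", "Process")]}
states = {"Init": State.DONE, "Process": State.RUNNING, "CleanUp": State.WAITING}
```

With these states, `ready("CleanUp", deps, states)` is false until `Process` reaches either `DONE` or `FAILED`.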

10
FBS Sample Job Description File

SECTION Init
    QUEUE IO_QUEUE
    EXEC my_bin/dump_tape.sh XYZ1234 /mnt/stage/XYZ1234
    NUMPROC 1
    STDERR /dev/null
    STDOUT logs/j.n.out
    DISK 3

SECTION Process
    QUEUE CPU_QUEUE
    EXEC my_bin/do_processing.sh /mnt/stage/XYZ1234
    NUMPROC 5
    STDERR logs/j.n.errors
    STDOUT logs/proc_j.n.log
    DISK 10
    NEED 1
    DEPEND done(Init)

SECTION CleanUp
    QUEUE FAST_QUEUE
    EXEC my_bin/std_cleanup.sh /mnt/stage/XYZ1234
    NUMPROC 1
    DEPEND exited(Process)
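
The job description above reads as a sequence of keywords, each followed by its value. The sketch below tokenizes such text into one dictionary per section. It assumes the keyword set seen in this sample and whitespace-separated tokens; it is not the FBS parser, and an EXEC argument that happens to equal a keyword would be misread.

```python
# Keywords that appear in the sample job description (assumed to be the full set here).
KEYWORDS = {"QUEUE", "EXEC", "NUMPROC", "STDERR", "STDOUT", "DISK", "NEED", "DEPEND"}


def parse_jdf(text):
    """Split a job description into {section name: {keyword: value string}}."""
    sections = {}
    current = key = None
    tokens = iter(text.split())
    for tok in tokens:
        if tok == "SECTION":
            current = sections.setdefault(next(tokens), {})  # next token is the name
            key = None
        elif tok in KEYWORDS and current is not None:
            key = tok
            current[key] = ""
        elif key is not None:
            # Any other token extends the value of the most recent keyword.
            current[key] = (current[key] + " " + tok).strip()
    return sections


sample = """
SECTION Init QUEUE IO_QUEUE EXEC my_bin/dump_tape.sh XYZ1234 /mnt/stage/XYZ1234
NUMPROC 1 DISK 3
SECTION Process QUEUE CPU_QUEUE NUMPROC 5 DEPEND done(Init)
"""
job = parse_jdf(sample)
```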

11
FBS Design: Components

12
FBS Status
  • In production since fall 1998
    • Fixed target experiments (15 nodes, Linux, OSF1)
    • Prototype farm for CDF and D0 (18 nodes, Linux)
  • Currently, 2 fixed target farms (37 and 21 nodes,
    Linux, IRIX)
  • CDF and D0 are setting up 50-node farms (Linux,
    IRIX)
  • Successfully used for more than a year for
    off-line data processing

13
FBS Re-design Project (FBSNG)
  • Goals
    • Stop using LSF as scheduler and job storage
    • Reduce support and maintenance cost
    • Make room for new features
    • Abstract resources
    • Customizable scheduler
    • Make FBS more farm-friendly and farm-aware
    • Avoid possible scalability problems
  • Status
    • We plan to release the first version in April-May 2000

14
Fermi Inter-process Communication and
Synchronization Toolkit (FIPC)
  • Why FIPC ?
  • Design and Features
  • FBS and FIPC

15
Why FIPC ?
  • FBS: long-term resource allocation
    • Batch system provides long-term control.
      Resources are allocated for job lifetime.
  • FIPC: short-term resource allocation
    • Some resources are used for only short intervals
      during job execution
    • Transfer data over network: watch for network
      overload
    • Access shared disk areas: uploading output data
  • ... and non-resource-related synchronization and
    communication

16
FIPC Objects
  • Gate, counted semaphore
    • Has room for a certain number of clients
    • Client can wait at the gate, enter the gate, exit
      the gate
  • Lock, binary semaphore
    • Equivalent to Gate with room for 1 client
    • Client can lock and unlock the lock
  • Client queue
    • Client enters the queue, waits in queue, exits
      the queue
  • Integer flag
    • Client can wait for value to reach threshold, and
      optionally increment, or decrement, or set new
      value
  • List of strings (double-ended queue)
    • Client can append or insert a string to the
      list's tail or head
    • Remove first or last item of the list
  • String variable
    • Client can perform set or match-and-set
      operations using Regular Expressions notation
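
For readers who know thread primitives, the Gate and the integer flag have close single-process analogs. The sketch below builds them on Python's `threading` module purely as an illustration of the semantics listed above. The class and method names are hypothetical and this is not the FIPC Python binding, which works across farm nodes.

```python
import threading


class Gate:
    """Counted semaphore: at most `room` clients inside at once."""

    def __init__(self, room):
        self._sem = threading.BoundedSemaphore(room)

    def enter(self, timeout=None):
        """Wait at the gate; return False if `timeout` seconds pass without entry."""
        return self._sem.acquire(timeout=timeout)

    def exit(self):
        self._sem.release()


class Flag:
    """Integer flag: wait until a predicate on the value holds, then optionally set it."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    @property
    def value(self):
        with self._cond:
            return self._value

    def wait(self, predicate, new_value=None, timeout=None):
        with self._cond:
            if not self._cond.wait_for(lambda: predicate(self._value), timeout):
                return False
            if new_value is not None:
                # Test and update happen under one lock, so the step is atomic.
                self._value = new_value
                self._cond.notify_all()
            return True


lock = Gate(1)   # a Lock is a Gate with room for one client
```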

17
FIPC Design
  • FIPC Servers run on some farm nodes (server
    nodes).
  • A server node can run one or more FIPC Servers.
  • Servers communicate via Ring Protocol.
  • Servers are redundant: all have the same information
    about all FIPC objects.
  • FIPC objects are truly distributed.
  • Servers can go down and then re-join the Ring at
    any time.
  • A client communicates with a randomly selected
    server.
  • All operations on FIPC objects are atomic.
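
The slides do not describe the Ring Protocol itself, so the following is only a toy model of the properties listed above: every live server holds the same objects, an update travels around the ring, and a server that was down copies state from a live peer when it re-joins. All class and method names are hypothetical.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.objects = {}   # object path -> value


class Ring:
    def __init__(self, names):
        self.servers = [Server(n) for n in names]

    def update(self, start, path, value):
        """Apply an update at server `start` and pass it once around the ring."""
        n = len(self.servers)
        for step in range(n):
            server = self.servers[(start + step) % n]
            if server.up:               # servers that are down are skipped
                server.objects[path] = value

    def rejoin(self, index):
        """A returning server copies state from the next live server in the ring."""
        n = len(self.servers)
        server = self.servers[index]
        for step in range(1, n):
            peer = self.servers[(index + step) % n]
            if peer.up:
                server.objects = dict(peer.objects)
                break
        server.up = True


ring = Ring(["a", "b", "c"])
ring.update(0, "/test/writing_f", 1)
ring.servers[1].up = False              # server "b" goes down
ring.update(2, "/test/writing_f", 0)    # "b" misses this update
ring.rejoin(1)                          # "b" catches up on re-join
```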

18
Using FIPC
  • FIPC is written in Python
    • Portability
  • Command line user interface
    • Shell level commands
  • GUI
    • Monitoring, simple operations
  • API
    • Python binding
    • Plans for C/C++ bindings

19
FIPC Example: readers/writers problem
  • writer.csh
      fipc create flag /test/writing_f 1
      fipc create queue /test/writer_q
      while (1)
        fipc append /test/writer_q
        while (fipc qwait -t 100 /test/writer_q)
          fipc clean queue /test/writer_q
        end
        fipc fwait /test/writing_f \> 0
        write_file
        fipc fset /test/writing_f 0
        fipc remove /test/writer_q
      end
  • reader.csh
      fipc create flag /test/writing_f 1
      fipc create queue /test/reader_q
      while (1)
        fipc append /test/reader_q
        while (fipc qwait -t 100 /test/reader_q)
          fipc clean queue /test/reader_q
        end
        fipc fwait /test/writing_f \< 1
        read_file
        fipc fset /test/writing_f 1
        fipc remove /test/reader_q
      end
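
The flag handshake in the two scripts can be mimicked inside one process. In this sketch a writer thread waits for the flag to be above 0, writes, and sets it to 0; a reader thread waits for it to be below 1, reads, and sets it back to 1. It uses a `threading.Condition` instead of FIPC, omits the client queues, and every name in it is illustrative.

```python
import threading


def run(rounds=3):
    state = {"writing_f": 1}      # 1: writer's turn, 0: reader's turn
    cond = threading.Condition()
    shared, seen = [], []

    def wait_then(predicate, action, new_value):
        """Wait on the flag, do the work, then set the flag for the other side."""
        with cond:
            cond.wait_for(lambda: predicate(state["writing_f"]))
            action()
            state["writing_f"] = new_value
            cond.notify_all()

    def writer():
        for i in range(rounds):
            wait_then(lambda v: v > 0, lambda: shared.append(i), 0)

    def reader():
        for _ in range(rounds):
            wait_then(lambda v: v < 1, lambda: seen.append(shared[-1]), 1)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, seen


written, read = run(3)
```

Because each side hands the flag to the other, writes and reads strictly alternate, so the reader sees every value exactly once.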

20
FIPC and FBS
  • FIPC was designed as a complementary product for
    FBS users.
  • However, FBS and FIPC are completely independent.
  • FIPC can be used in a batch or non-batch
    distributed environment.
  • FBS and FIPC form a suite of farm batch data
    processing tools that have been successfully used
    by fixed target experiments and will be used for
    Run II data processing.