1
Using Stork
Barcelona, 2006
2
Meet Friedrich
  • Friedrich is a scientist with a BIG problem.

*Frieda's twin brother
3
I have a lot of data to process.
4
Friedrich's problem
  • Friedrich has many large data sets to process.
    For each data set:
  • stage the data in from a remote server
  • run a job to process the data
  • stage the data out to a remote server

5
The Classic Data Transfer Job
#!/bin/sh
globus-url-copy source dest

Scripts often work fine for short, simple data
transfers, but...
6
Many things can go wrong!
  • These errors are more likely with large data
    sets:
  • The network is down.
  • The data server is unavailable.
  • The transferred data is corrupt.
  • The workflow does not know that the data was bad.
    (A plain script must handle all of these itself;
    see the sketch below.)

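Handling even the first of these failures in plain shell means hand-rolled retry logic. A minimal sketch, with hypothetical URLs and an arbitrary retry count:

#!/bin/sh
# Hand-rolled retry around globus-url-copy; Stork
# does this (and more) automatically.
SRC=gsiftp://server/path
DEST=file:///dir/file
tries=0
until globus-url-copy "$SRC" "$DEST"; do
    tries=`expr $tries + 1`
    if [ $tries -ge 3 ]; then
        echo "transfer failed after $tries attempts" >&2
        exit 1
    fi
    sleep 60    # back off before retrying
done

Even this only retries one failure mode; it cannot detect corrupt data or fall back to another protocol.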
7
Stork Solves Problems
  • Creates the concept of the data placement job
  • Managed and scheduled the same as any Condor job
  • Friedrich's jobs benefit from built-in fault
    tolerance

8
Supported Data Transfer Protocols
  • local file system
  • GridFTP
  • FTP
  • HTTP
  • SRB
  • NeST
  • SRM
  • ...and it is extensible to other protocols

9
Fault Tolerance
  • Retries failed jobs
  • Can also retry a failed data transfer job using
    an alternate protocol.
  • For example, first try GridFTP, then fall back
    to FTP (see the sketch below)
  • Retry stuck jobs
  • Configurable fault responses

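The submit file can express this fallback. A sketch, assuming the alt_protocols keyword described in the Stork section of the Condor manual; treat the exact keyword and its value format as assumptions:

// Sketch: try GridFTP first, then fall back to FTP.
// alt_protocols and its "src-dest" value format are
// assumptions here; check the Condor manual.
dap_type = "transfer"
src_url = "gsiftp://server/path"
dest_url = "file:///dir/file"
alt_protocols = "ftp-file"
log = "stage-in.log"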
10
Getting Stork
  • Stork is part of Condor, so get Condor...
  • Available as a free download from
  • http://www.cs.wisc.edu/condor
  • Currently available for Linux platforms

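A rough sketch of the download-and-install steps for a Personal Condor (the tarball name is illustrative, and the installer prompts vary by release):

# Download the current Linux release from
# http://www.cs.wisc.edu/condor, then:
tar xzf condor-x.y.z-linux-x86.tar.gz
cd condor-x.y.z
./condor_install    # choose a Personal Condor setup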
11
Personal Condor works well with Stork
  • This is Condor/Stork on your own workstation: no
    root access required, no system administrator
    intervention needed
  • After installation, Friedrich submits his jobs to
    his Personal Stork

12
Friedrich's Personal Condor
[Diagram: Friedrich's workstation runs the Condor
daemons (Master, Central Mgr., SchedD, StartD) plus
Stork. The DAG sends CPU jobs to N compute elements
and data jobs, through Stork, to external data
servers.]
13
Stork will ...
  • Keep an eye on data placement jobs and keep you
    posted on their progress
  • Throttle the maximum number of jobs running
  • Keep a log of job activities
  • Add fault tolerance to all jobs
  • Detect and retry failed data placement jobs

14
The Submit Description File
  • Just like the rest of Condor, a plain ASCII text
    file, but with a different format
  • Written in the new ClassAd language
  • Neither Stork nor Condor cares about file name
    extensions
  • The contents of the file tell Stork about the job:
  • data placement type, source/destination
    location/protocol, proxy location, alternate
    protocols to try

15
Simple Submit File
// c style comment lines
// file name is stage-in.stork
dap_type = "transfer"
src_url = "http://server/path"
dest_url = "file:///dir/file"
log = "stage-in.log"

Note: different format from Condor submit files
16
Another Simple Submit File
// c style comment lines
// file name is stage-in.stork
dap_type = "transfer"
src_url = "gsiftp://server/path"
dest_url = "file:///dir/file"
x509proxy = "default"
log = "stage-in.log"

Note: different format from Condor submit files
17
Running stork_submit
  • Give stork_submit the name of the submit file
  • stork_submit stage-in.stork
  • stork_submit parses the submit file, checks it
    for errors, and sends the job to the Stork server.
  • stork_submit returns the created job id (a job
    handle)

18
Sample stork_submit
stork_submit stage-in.stork
Sending request:
dest_url = "file:///dir/file"
src_url = "http://server/path"
dap_type = "transfer"
log = "path/stage-in.log"
Request assigned id: 1    (the job id)
19
The Job Queue
  • stork_submit sends the job to the Stork server
  • The Stork server manages the local job queue
  • View the queue with stork_q or stork_status

20
Job Status
  • stork_q queries all active jobs
  • stork_q
  • stork_status queries the given job id, which may
    be active or complete
  • stork_status 12

21
Removing jobs
  • To remove a data placement job from the queue,
    use stork_rm
  • You may only remove jobs that you own
  • (Unix root may remove anyone's jobs)
  • Give a specific job ID
  • stork_rm 21 removes a single job

22
Use Log Files
// c style comment lines
dap_type = "transfer"
src_url = "gsiftp://server/path"
dest_url = "file:///dir/file"
x509proxy = "default"
log = "stage-in.log"
23
Sample Stork User Log
000 (001.-01.-01) 04/17 19:30:00 Job submitted from host: <128.105.121.53:54027>
...
001 (001.-01.-01) 04/17 19:30:01 Job executing on host: <128.105.121.53:9621>
...
008 (001.-01.-01) 04/17 19:30:01 job type: transfer
...
008 (001.-01.-01) 04/17 19:30:01 src_url: gsiftp://server/path
...
008 (001.-01.-01) 04/17 19:30:01 dest_url: file:///dir/file
...
005 (001.-01.-01) 04/17 19:30:02 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    0  -  Total Bytes Received By Job
...
24
Stork and DAGMan
  • Data placement jobs are integrated with Condor's
    DAGMan, and Friedrich benefits

25
Defining Friedrich's DAG
26
Friedrich's DAG
[Diagram: input1 and input2 both feed crunch, which
produces result.]
27
The DAG Input File
  • file name is friedrich.dag
  • DATA input1 input1.stork
  • DATA input2 input2.stork
  • JOB crunch process.submit
  • DATA result result.stork
  • PARENT input1 input2 CHILD crunch
  • PARENT crunch CHILD result

28
One of the Stork Submit Files
// file name is input1.stork
dap_type = "transfer"
src_url = "http://north.cs.wisc.edu/friedrich/data1"
dest_url = "file:///home/friedrich/in1"
log = "in1.log"
29
Condor Submit Description File
  • file name is process.submit
  • universe = vanilla
  • executable = process
  • input = in1
  • output = crunch.result
  • error = crunch.err
  • log = crunch.log
  • queue

30
Stork Submit File
// file name is result.stork
dap_type = "transfer"
src_url = "file:///home/friedrich/crunch.result"
dest_url = "http://north.cs.wisc.edu/friedrich/final.results"
log = "result.log"
31
Friedrich Submits the DAG
  • While Friedrich's current working directory is
    /home/friedrich
  • condor_submit_dag friedrich.dag

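A sketch of the session; condor_q shows the CPU jobs, stork_q the data placement jobs, and the log file name follows DAGMan's usual dagfile.dagman.out convention:

cd /home/friedrich
condor_submit_dag friedrich.dag
# follow DAGMan's progress
tail -f friedrich.dag.dagman.out
# CPU jobs appear in the Condor queue,
condor_q
# data placement jobs in the Stork queue
stork_q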
32
In Review
  • With Stork, Friedrich now can
  • Submit his data processing jobs and go home,
    because
  • Stork manages the data transfers, including fault
    detection and retry
  • Condor DAGMan manages the dependencies

33
Additional Resources
  • http://www.cs.wisc.edu/condor/stork/
  • Condor Manual, Stork section
  • stork-announce@cs.wisc.edu list
  • stork-discuss@cs.wisc.edu list

34
Additional Slides
35
Important Parameters
  • STORK_MAX_NUM_JOBS limits the number of active
    jobs
  • STORK_MAX_RETRY limits job attempts before the
    job is marked as failed
  • STORK_MAXDELAY_INMINUTES specifies the hung-job
    threshold (see the sketch below)

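These are set in the Condor configuration file. A sketch with illustrative values (not shipped defaults):

# condor_config entries; values are examples only
STORK_MAX_NUM_JOBS = 10
STORK_MAX_RETRY = 5
STORK_MAXDELAY_INMINUTES = 60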
36
Current Restrictions
  • Currently best suited for Personal Stork mode
  • Local file paths must be valid on the Stork
    server, including the submit directory
  • To share data, successive jobs in a DAG must use
    a shared filesystem

37
Future Work
  • Enhance multi-user fair share
  • Enhance support for DAGs without shared file
    system
  • Enhance scheduling with configurable job
    requirements and rank
  • Add DAP job matchmaking
  • Additional platform ports