SAMGrid: the merging mechanism

1
SAM-Grid: the merging mechanism
  • Gabriele Garzoglio for the SAM-Grid Team

2
Overview
  • Context
  • Monte Carlo production
  • The Merging mechanism
  • General comments
  • Room for improvements
  • Conclusion

3
Context
  • JIM and SAM can be used together to produce
    Monte Carlo events at three execution sites:
    Wisconsin, Manchester, and Lyon. Other sites are
    also joining.
  • For every request, Monte Carlo thumbnails are
    produced in parallel at each cluster.
  • JIM provides a mechanism to merge the files
    produced.

4
Production mechanism
  • The user submits a job to SAM-Grid to process N
    events of a certain request.
  • Eventually the job is dispatched to an execution
    site.
  • The grid job is split into M parallel local jobs.
  • Each local job uses Runjob to produce the events
    and store the files.
  • Thumbnails are stored into a SAM location (disk)
    close to the cluster.
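
The split of one grid job into M local jobs can be sketched as below. The slides do not say how the N events are divided, so the even split is an assumption, and `split_events` is a hypothetical helper rather than part of JIM or Runjob.

```python
def split_events(total_events: int, num_jobs: int) -> list[int]:
    """Return how many events each of num_jobs local jobs would produce.

    Assumption: events are divided as evenly as possible; the slides do
    not describe the real splitting policy.
    """
    if num_jobs <= 0:
        raise ValueError("num_jobs must be positive")
    if total_events < 0:
        raise ValueError("total_events must not be negative")
    base, extra = divmod(total_events, num_jobs)
    # The first `extra` jobs take one additional event so the sizes sum to N.
    return [base + 1 if i < extra else base for i in range(num_jobs)]


# Example: a request for 10,000 events split over 3 local jobs.
chunks = split_events(10_000, 3)
```

Each entry of `chunks` is the event count one local job would be asked to produce.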

5
Merging mechanism I
  • When the production jobs are finished, a merging
    job can be submitted to SAM-Grid.
  • The job needs to know a dataset definition for
    the files to be merged. Basic consistency checks
    are applied at the client site.
  • Eventually the job is dispatched to an execution
    site.

6
Merging mechanism II
  • A single local job is submitted to the batch
    system. The job stages the files from SAM locally
    and uses copyd0om (via Runjob) to merge them.
  • merge.py manipulates the metadata and stores the
    merged files to SAM.
  • When the file is handed over to FSS, the
    thumbnail locations are undeclared from the
    database.
  • The files can then be physically removed.
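
The order of operations on this slide might be sketched as follows. This is an illustration only: `run_merge_job`, the in-memory `catalog`, and the `merge` and `store` callables are hypothetical stand-ins for the copyd0om step, the hand-over to FSS, and the SAM database. Keeping the inputs declared when the store fails is an assumption, not something the slides state.

```python
from typing import Callable


def run_merge_job(
    catalog: dict[str, set[str]],
    input_names: list[str],
    merged_name: str,
    merge: Callable[[list[str]], None],
    store: Callable[[str], bool],
) -> list[str]:
    """Merge the inputs, store the result, then undeclare input locations.

    Returns the input names whose files may now be physically removed.
    `catalog` maps a file name to its declared locations (a stand-in for
    the location records kept in the database).
    """
    missing = [name for name in input_names if not catalog.get(name)]
    if missing:
        raise LookupError(f"no declared location for: {missing}")

    merge(input_names)           # stand-in for the merge step run via Runjob
    if not store(merged_name):   # stand-in for handing the merged file to FSS
        return []                # assumption: keep inputs declared on failure

    for name in input_names:
        catalog[name].clear()    # undeclare the thumbnail locations
    return list(input_names)


# Example with trivial stand-ins for the merge and store steps.
catalog = {"thumb_a": {"disk1"}, "thumb_b": {"disk1"}}
removable = run_merge_job(
    catalog, ["thumb_a", "thumb_b"], "merged_ab",
    merge=lambda names: None, store=lambda name: True,
)
```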

7
General Comments
  • The dataset definition that identifies the files
    to merge can use dimensions like request id, grid
    job id of the production job, event size,
    dates...
  • We do basic checks before submission, e.g. are
    the files in the dataset unmerged thumbnails,
    and do they have well-formed parents?
  • Files produced at different sites can be merged:
    the mechanism relies on SAM for the delivery.
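
The pre-submission checks could look roughly like the sketch below. `FileRecord` and its fields are hypothetical and do not reflect the SAM metadata schema. Reading "well-formed parents" as "a non-empty list of non-empty parent names" is an assumption, and the `merged` flag is taken as given by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical view of one file in the dataset to be merged."""
    name: str
    tier: str                      # e.g. "thumbnail"
    merged: bool                   # assumed to be supplied by the caller
    parents: list[str] = field(default_factory=list)


def check_dataset(files: list[FileRecord]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not files:
        problems.append("dataset is empty")
    for f in files:
        if f.tier != "thumbnail":
            problems.append(f"{f.name}: not a thumbnail")
        if f.merged:
            problems.append(f"{f.name}: already merged")
        # Assumed meaning of "well formed": at least one non-empty parent.
        if not f.parents or any(not p for p in f.parents):
            problems.append(f"{f.name}: missing or empty parent")
    return problems


good = FileRecord("thumb_1", "thumbnail", merged=False, parents=["reco_1"])
bad = FileRecord("thumb_2", "thumbnail", merged=True, parents=[])
report = check_dataset([good, bad])
```

Collecting every problem, rather than stopping at the first, lets the client show the user a full report before anything is submitted.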

8
Room for improvement... I
  • We don't have a way to distinguish merged
    thumbnails (no dimension, no data tier). We try
    not to depend on the distinction, and we use
    file names if we need to guess.
  • We should optimize the checks at the client
  • We should improve the logic for storing merged
    thumbnails, e.g. use an asynchronous store
    instead of a synchronous one
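
The difference between a synchronous and an asynchronous store can be illustrated with Python's standard `concurrent.futures` module. `store_file` is a hypothetical placeholder for a blocking store call and does not use any SAM interface.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def store_file(name: str) -> str:
    """Placeholder for a blocking store; it only returns a status string."""
    return f"stored {name}"


def store_async(executor: ThreadPoolExecutor, names: list[str]) -> dict[str, Future]:
    """Submit one store per file and return immediately with the futures."""
    return {name: executor.submit(store_file, name) for name in names}


with ThreadPoolExecutor(max_workers=2) as pool:
    pending = store_async(pool, ["merged_1", "merged_2"])
    # The job could continue with other work here instead of waiting.
    results = {name: future.result() for name, future in pending.items()}
```

With a synchronous store the job would block on each file in turn. Here the waiting is deferred to the single point where the results are collected.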

9
Room for improvement... II
  • Removal of used unmerged thumbnails:
      • using a file status vs. undeclaring the
        location from the database?
      • automatic physical file removal
      • usage of SAM caches (e.g. sam import file)
  • Working on automation, by automatically launching
    merging jobs
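
The "file status" alternative to undeclaring locations might be sketched as below. The status names and the helpers `mark_merged` and `sweep` are hypothetical. They only illustrate how a status could drive automatic physical removal.

```python
from enum import Enum
from typing import Callable


class ThumbnailStatus(Enum):
    AVAILABLE = "available"   # produced, not yet used by a merge job
    MERGED = "merged"         # consumed by a merge job, safe to remove
    REMOVED = "removed"       # physical file already deleted


def mark_merged(statuses: dict[str, ThumbnailStatus], names: list[str]) -> None:
    """Flag the inputs of a finished merge job instead of undeclaring them."""
    for name in names:
        if statuses.get(name) is not ThumbnailStatus.AVAILABLE:
            raise ValueError(f"{name} is not an available thumbnail")
    for name in names:
        statuses[name] = ThumbnailStatus.MERGED


def sweep(statuses: dict[str, ThumbnailStatus],
          delete: Callable[[str], None]) -> list[str]:
    """Physically remove every file flagged as merged; return their names."""
    removed = []
    for name, status in statuses.items():
        if status is ThumbnailStatus.MERGED:
            delete(name)      # stand-in for the physical removal
            statuses[name] = ThumbnailStatus.REMOVED
            removed.append(name)
    return removed


statuses = {"thumb_a": ThumbnailStatus.AVAILABLE,
            "thumb_b": ThumbnailStatus.AVAILABLE}
mark_merged(statuses, ["thumb_a"])
deleted = sweep(statuses, delete=lambda name: None)
```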

10
Conclusions
  • SAM-Grid uses SAM to select the files to be
    merged
  • The location of production is decoupled from the
    location of merging
  • We use the SAM-Grid infrastructure for job
    dispatch and monitoring
  • The infrastructure has been in production since
    March.
  • The SAR institutions have developed a JIM
    interface to mc_farm and are ramping up.
  • We can still improve the overall robustness and
    automation