Run Control Update and Database Information Overview

Provided by: jimk53
Transcript and Presenter's Notes

1
Run Control Update and Database Information
Overview
  • Er, I Meant Resource Management.

2
Resource Management
  • Request - How we define to the system the hardware
    that RTES is to allocate, configure, etc., and the
    language by which the experimenter will define the
    computational task
  • Very preliminary discussion in CD (no outside
    commentary yet, expect a lot of discussion!)
  • Quickly decided we needed a context/perspective:
    when are database accesses needed, and which
    databases in particular?
  • Never made it past run control

3
(Experiment) Database Overview
  • run conditions - snapshots of running conditions
  • run configuration - run types and associated
    information
  • run history - summary of a completed run
  • trigger definitions
  • algorithm parameters (thresholds)
  • executables
  • event/segment/dataset catalog
  • luminosity
  • prescales
  • calibration
  • alignment
  • geometry
  • hardware and resource database
  • farm monitoring - history data
  • fault management strategies

4
Run Control Manager
  • Added layer on top of the run control instances we've
    listed in the past. Launches the run control
    needed for each partition.
  • Desirable to allow for possible cache
    synchronization well before data taking starts.
  • Power turns on at the experiment
  • Must be robust against instance failures

Power down
5
Notes
  • It is imagined that as nodes themselves power up,
    they load the OS and establish routes/connections
    as available.
  • Initialize - ready the system for creation of run
    control instances.
  • Verify/start global servers that are partition
    independent
  • May involve attempting to synchronize cached
    executables on nodes and some configuration
    constants.
  • Synchronized with database trigger information
    and perhaps with other nodes
  • Available partitions can be created

6
Run Control Instance
7
Run Control States
  • What database information is needed, and when?
  • Each partition needs to reserve the resources
    before configuring them
  • This can require some negotiation, i.e., it needs
    n processors of type x, but substitutions may be
    allowed.
  • For physics data taking runs, this will typically
    be done automatically
  • Users may need total control during commissioning
    runs
  • Does Abort take you to before the reservation of
    resources, or after?
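
A minimal sketch of the reservation negotiation described above, assuming a simple count of free processors per type. The function name `reserve`, the `substitutes` table, and the all-or-nothing behaviour are illustrative assumptions, not something the slides specify.

```python
from collections import Counter


def reserve(pool, needed, substitutes=None):
    """Try to reserve processors for one partition (hypothetical helper).

    pool:        dict, processor type -> number of free processors
    needed:      dict, processor type -> number requested
    substitutes: dict, processor type -> ordered list of acceptable stand-ins
    Returns a dict of granted counts per type, or None if the request cannot
    be met. On failure the pool is left untouched.
    """
    substitutes = substitutes or {}
    free = Counter(pool)   # work on a copy so a failed request has no side effects
    granted = Counter()
    for wanted, count in needed.items():
        # Prefer the requested type, then fall back to allowed substitutions.
        for candidate in [wanted] + substitutes.get(wanted, []):
            take = min(count, free[candidate])
            if take:
                free[candidate] -= take
                granted[candidate] += take
                count -= take
            if count == 0:
                break
        if count > 0:
            return None    # negotiation failed; nothing was reserved
    pool.clear()
    pool.update(free)      # commit the reservation
    return dict(granted)
```

For example, a request for three processors of type "x" against a pool holding two "x" and three "y" succeeds only if "y" is listed as an acceptable substitute for "x".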

8
Run Control States
  • Once reserved, resources need to be configured
    based on run type.
  • Some may have a state already loaded from a
    previous run.
  • Configuring a resource can be time consuming!
  • Runs are restarted often, many times an hour,
    during commissioning.
  • Configure should first verify if a specific
    resource needs to be reconfigured.
  • Verification needs to be FAST (seconds)
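
One possible way to make the verify-before-configure step cheap is to compare a digest of the wanted configuration with the digest recorded at the last load. This is a sketch only: the `fingerprint` and `configure` names, the dictionary used to model a resource, and the choice of SHA-256 over a JSON dump are all assumptions for illustration.

```python
import hashlib
import json


def fingerprint(config):
    """Return a stable digest of a configuration dictionary."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def configure(resource, wanted, load):
    """Configure `resource` only if its loaded state differs from `wanted`.

    resource: dict modelling one resource (hypothetical representation)
    wanted:   configuration dictionary for the requested run type
    load:     callable performing the slow, real configuration
    Returns True if the slow load ran, False if verification showed the
    resource already holds the wanted configuration.
    """
    digest = fingerprint(wanted)
    if resource.get("loaded_digest") == digest:
        return False              # fast path: state left over from a previous run
    load(resource, wanted)        # slow path: actually reconfigure
    resource["loaded_digest"] = digest
    return True
```

With this shape, a run restarted with an unchanged run type pays only for the digest comparison, while a changed run type triggers the full load.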

9
Run Control States
  • Reset takes you back to a state of having the
    resources reserved.
  • Clear frees resources.
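
The transitions mentioned on these slides can be sketched as a small table-driven state machine. The state names ("idle", "reserved", "configured", "running") and the commands other than Reset and Clear are assumptions, since the state diagram itself is not reproduced in this transcript. Abort is left out because the slides leave its target state as an open question.

```python
# (current state, command) -> next state. Names are illustrative assumptions.
TRANSITIONS = {
    ("idle", "reserve"): "reserved",
    ("reserved", "configure"): "configured",
    ("configured", "start"): "running",
    ("running", "stop"): "configured",
    # Reset returns to the state of having the resources reserved.
    ("configured", "reset"): "reserved",
    ("running", "reset"): "reserved",
    # Clear frees the resources.
    ("reserved", "clear"): "idle",
    ("configured", "clear"): "idle",
}


class RunControlInstance:
    """Toy run control instance for a single partition."""

    def __init__(self):
        self.state = "idle"

    def handle(self, command):
        """Apply `command`, returning the new state or raising ValueError."""
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"{command!r} is not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the transitions in one table makes it easy to add or move Abort once its meaning is settled.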

10
Questions raised
  • what is the real definition of a segment within a
    run control (RC) instance?
  • how relevant is partition to the DB?
  • is it only seen in the hardware/resource
    database?
  • does it otherwise care?
  • where does prescaling information really live?
  • what is its relationship with info in other
    databases?
  • can some "hardware initialization" take place
    outside of a run control instance?
    (available/updated executables downloaded to disks
    and flash, for example)

11
Potential Requirements
  • RC instance can run independent of the overall
    state machine (orphaned if master state machine
    dies)
  • Trigger farm will need to report status on a
    periodic basis or change event basis (subject to
    thresholds delivered asynchronously), e.g.:
  • CPU usage / load
  • configuration or fault correction action taken
  • memory/disk usage
  • per event size and processing time statistics
  • I/O performance
  • network utilization / load
  • various failures
  • throughput on nodes
  • fast verification (software, configuration,
    constants), (seconds)
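
The periodic-or-on-change reporting requirement could look roughly like the sketch below. The `StatusReporter` class, its method names, and the interpretation of a threshold as "minimum change since the last report" are assumptions made for illustration. Time is passed in explicitly so the logic can be exercised without a real clock.

```python
class StatusReporter:
    """Decide when a node should report a metric (hypothetical sketch)."""

    def __init__(self, period, thresholds=None):
        self.period = period                      # seconds between unconditional reports
        self.thresholds = dict(thresholds or {})  # metric -> minimum change worth reporting
        self.last_sent = {}                       # metric -> (time, value) of last report

    def set_threshold(self, metric, delta):
        """Accept a threshold update delivered asynchronously."""
        self.thresholds[metric] = delta

    def observe(self, metric, value, now):
        """Return (metric, value) if a report is due, otherwise None."""
        if metric in self.last_sent:
            sent_at, sent_value = self.last_sent[metric]
            periodic_due = now - sent_at >= self.period
            delta = self.thresholds.get(metric)
            change_due = delta is not None and abs(value - sent_value) >= delta
            if not (periodic_due or change_due):
                return None
        # First observation, periodic report, or threshold crossing.
        self.last_sent[metric] = (now, value)
        return (metric, value)
```

A node would call `observe` for each metric it samples (CPU load, memory usage, and so on) and forward any non-None result to the monitoring service.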

12
Potential Requirements
  • operation consistent with the run control state
    diagram
  • experiment wide databases (not application
    specific)
  • spawned run control instances do not necessarily
    specify the list of nodes where they will run
    (dynamically determined), but they should be able
    to do so (i.e., certain subnet, class of machine,
    highway)
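
As a rough illustration of the last requirement, node placement can be driven by optional constraints rather than an explicit node list. The `select_nodes` function and the node attributes used here ("subnet", "machine_class", "highway") are hypothetical names chosen for the example.

```python
def select_nodes(nodes, count, **constraints):
    """Pick `count` node names matching every given constraint.

    nodes:       list of dicts, each with a "name" plus descriptive attributes
    constraints: attribute=value pairs; with none given, any node qualifies
    Returns a list of node names, or None if too few nodes match.
    """
    matching = [
        node for node in nodes
        if all(node.get(key) == value for key, value in constraints.items())
    ]
    if len(matching) < count:
        return None
    return [node["name"] for node in matching[:count]]
```

Calling it without constraints corresponds to fully dynamic placement, while passing, say, a subnet restricts the choice without naming individual nodes.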