Slides: 12
Provided by: brockca
Learn more at: http://hmi.stanford.edu
Transcript and Presenter's Notes

Title: JSOC Pipeline Processing Environment


1
JSOC Pipeline Processing Environment
  • Rasmus Munk Larsen, Stanford University
  • rmunk_at_quake.stanford.edu
  • 650-725-5485

2
Overview
  • JSOC data series organization
  • Pipeline execution environment
  • Pipeline software architecture
  • Co-I analysis module contribution
  • Pipeline Data Products

3
JSOC logical data organization
  • Evolved from MDI dataset concept to
  • Fix known limitations/problems
  • Accommodate more complex data models required by
    higher-level processing
  • Main design features
  • Separation of meta-data (keywords) and image data
  • No need to re-write large image files when only
    keywords change (lev1.8 problem)
  • No (fewer) out-of-date keyword values in FITS
    headers
  • Can bind to most recent values on export
  • Easier data access
  • All access in terms of (collections of) data
    records, which are the atomic units of a data
    series
  • A dataset name is a query specifying a set of
    data records (possibly from multiple data
    series)
  • jsoc:hmi_lev0_cam1_fg?recordnum=12345 (a
    specific filtergram with unique record number
    12345)
  • jsoc:hmi_lev0_cam1_fg12300-12330 (a minute's
    worth of filtergrams from camera 1)
  • jsoc:hmi_lev1_fd_V?T_OBS>2008-11-01 AND
    T_OBS<2008-12-01 AND N_MISSING<100
  • Storage and tape management must be transparent
    to user
  • Chunking of data records into storage units for
    efficient tape/disk usage done internally
  • Completely separate storage and catalog (i.e.
    series record) databases: more modular design
  • Legacy MDI modules should run on top of new
    storage service
  • Storing keywords in relational database system
    (Oracle)
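The idea that a dataset name is a query over keyword values can be illustrated with a minimal Python sketch. It is not JSOC code: the record list, the `select_records` helper, and the sample keyword values are all hypothetical, and only keywords (never image data) are consulted when selecting records.

```python
from datetime import datetime

# Hypothetical keyword table: one dict per data record of a series.
# Record numbers and keyword values are made up for illustration.
records = [
    {"recordnum": 12345, "T_OBS": datetime(2008, 10, 31, 23, 59), "N_MISSING": 0},
    {"recordnum": 12346, "T_OBS": datetime(2008, 11, 15, 12, 0), "N_MISSING": 3},
    {"recordnum": 12347, "T_OBS": datetime(2008, 11, 20, 6, 30), "N_MISSING": 250},
    {"recordnum": 12348, "T_OBS": datetime(2008, 12, 2, 0, 0), "N_MISSING": 0},
]


def select_records(records, predicate):
    """Return the record numbers of all records whose keywords satisfy predicate."""
    return [r["recordnum"] for r in records if predicate(r)]


# Counterpart of: T_OBS > 2008-11-01 AND T_OBS < 2008-12-01 AND N_MISSING < 100
selected = select_records(
    records,
    lambda r: datetime(2008, 11, 1) < r["T_OBS"] < datetime(2008, 12, 1)
    and r["N_MISSING"] < 100,
)
print(selected)
```

Because the selection touches only the keyword table, changing a keyword value never requires rewriting an image file, which is the point of separating meta-data from image data.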

4
Logical Data Organization
  • JSOC Data Series: hmi_lev0_cam1_fg,
    aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V,
    aia_lev0_FE171
  • Data records for series hmi_lev1_fd_V:
    hmi_lev1_fd_V12345 through hmi_lev1_fd_V12353
  • Single hmi_lev1_fd_V data record
  • Keywords: RECORDNUM 12345 (unique serial
    number), SERIESNUM 5531704 (slots since epoch),
    T_OBS 2009.01.05_232240_TAI, DATAMIN
    -2.537730543544E03, DATAMAX 1.935749511719E03,
    ..., P_ANGLE (LINK ORBIT, KEYWORD SOLAR_P)
  • Links: ORBIT (hmi_lev0_orbit, SERIESNUM
    221268160), CALTABLE (hmi_lev0_dopcal, RECORDNUM
    7), L1 (hmi_lev0_cam1_fg, RECORDNUM 42345232),
    R1 (hmi_lev0_cam1_fg, RECORDNUM 42345233)
  • Data Segments: V_DOPPLER
  • Storage Unit Directory

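The P_ANGLE keyword on this slide is not stored in the record itself but is read through the ORBIT link from the SOLAR_P keyword of another record. A minimal sketch of that lookup follows, assuming a hypothetical in-memory layout; the `LinkedKeyword` type, the `get_keyword` helper, and the SOLAR_P value are invented for illustration, while the series names and record identifiers come from the slide.

```python
from typing import NamedTuple


class LinkedKeyword(NamedTuple):
    """Marks a keyword whose value lives in a linked record (hypothetical type)."""
    link: str      # name of the link to follow
    keyword: str   # keyword to read in the linked record


# Hypothetical in-memory stand-in for two series. The SOLAR_P value is made up.
series = {
    "hmi_lev0_orbit": {
        221268160: {"keywords": {"SOLAR_P": 1.5}, "links": {}},
    },
    "hmi_lev1_fd_V": {
        12345: {
            "keywords": {
                "SERIESNUM": 5531704,
                "P_ANGLE": LinkedKeyword(link="ORBIT", keyword="SOLAR_P"),
            },
            "links": {"ORBIT": ("hmi_lev0_orbit", 221268160)},
        },
    },
}


def get_keyword(series_name, record_id, name):
    """Return a keyword value, following a link when the keyword is linked."""
    record = series[series_name][record_id]
    value = record["keywords"][name]
    if isinstance(value, LinkedKeyword):
        target_series, target_id = record["links"][value.link]
        return get_keyword(target_series, target_id, value.keyword)
    return value


print(get_keyword("hmi_lev1_fd_V", 12345, "P_ANGLE"))
```

Under this layout, updating SOLAR_P in the orbit record changes what every linked velocity record reports for P_ANGLE, without touching those records.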
5
JSOC Series Definition (JSD)

Global series information
    Seriesname:   "testclass1"
    Description:  "This is a small example of a JSOC series definition."
    Author:       "Rasmus Munk Larsen"
    Owners:       "rmunk"
    Unitsize:     10
    Archive:      1
    Retention:    permanent
    Tapegroup:    127
    Primary Index

Keywords
  Format:
    Keyword: <name>, link, <linkname>, <target keyword name>
  or
    Keyword: <name>, <type>, <default value>, <format>, <unit>, <comment>

    Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
    Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
    Keyword: "keywd2", datetime, "1970-01-01 00:00:00", "%-s", "unit5", "Comment5"
    Keyword: "keywd3", timestamp, "19700101000000", "%-s", "unit6", "Comment6"
    Keyword: "keywd4", string, "", "%-s", "unit7", "Comment7"
    Keyword: "keywd5", link, "link1", "keywd0"
    Keyword: "keywd6", char, '\0', "%d", "unit1", "Comment1"
    Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"

Links
  Format:
    Link: <name>, <target series>, static|dynamic

    Link: "link0", "testclass0", static
    Link: "link1", "testclass0", dynamic

Data segments
  Format:
    Data: <name>, <type>, <naxis>, <axis dims>, <unit>, <protocol>

    Data: "x-axis", float, 1, 100, "m", fits
    Data: "y-axis", float, 1, 200, "m", fits
    Data: "z-axis", float, 1, 50, "m", fits
    Data: "pressure", float, 3, 100, 200, 50, "kg/(s2m)", fitz
    Data: "velocity", float, 4, 100, 200, 50, 3, "m/s", fitz

Creating a new Data Series
  • testclass1.jsd is read by the JSD parser
  • The JSD parser issues SQL to the Oracle database
  • SQL: INSERT INTO series_catalog
    VALUES(testclass1,rmunk,
  • SQL: CREATE TABLE testclass1 ( recnum
    integer not null unique, keywd0
    binary_float,

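The step from a series definition to a CREATE TABLE statement can be sketched as follows. This is an illustration only, not the JSD parser: it assumes keyword lines of the form `Keyword: "name", type, ...`, the helper names are hypothetical, and the type mapping is a guess except for float to binary_float, which is the only mapping shown on the slide. Linked keywords are assumed not to get a column of their own.

```python
import csv

# Assumed mapping from JSD keyword types to Oracle column types.
# Only float -> binary_float appears on the slide; the rest are guesses.
TYPE_MAP = {
    "float": "binary_float",
    "double": "binary_double",
    "int": "integer",
    "string": "varchar2(255)",
}


def parse_keywords(jsd_text):
    """Yield (name, type) for each non-link 'Keyword:' line in jsd_text."""
    for line in jsd_text.splitlines():
        line = line.strip()
        if not line.startswith("Keyword:"):
            continue
        body = line[len("Keyword:"):].strip()
        fields = next(csv.reader([body], skipinitialspace=True))
        name, kw_type = fields[0], fields[1]
        if kw_type != "link":  # assumption: linked keywords are not stored as columns
            yield name, kw_type


def create_table_sql(series_name, jsd_text):
    """Build a CREATE TABLE statement with one column per stored keyword."""
    columns = ["recnum integer not null unique"]
    for name, kw_type in parse_keywords(jsd_text):
        columns.append(f"{name} {TYPE_MAP[kw_type]}")
    return f"CREATE TABLE {series_name} ({', '.join(columns)})"


JSD = '''
Seriesname: "testclass1"
Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
Keyword: "keywd5", link, "link1", "keywd0"
Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
'''

sql = create_table_sql("testclass1", JSD)
print(sql)
```

The sketch covers only four keyword types; datetime, timestamp, and char keywords would need mappings that the slide does not specify.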
6
Pipeline batch processing (a.k.a. MDI mapfile)
  • Pipeline processing is scheduled in batches by
    PUI, a data-driven pipeline scheduler inherited
    from MDI
  • A pipeline batch is a single atomic transaction
  • If no module fails, all data records are
    committed and become visible to other clients of
    the archive
  • If a failure occurs, all data records are deleted
    and the database is rolled back
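The all-or-nothing behaviour of a batch can be sketched with Python's built-in sqlite3 module standing in for the Oracle database. The `run_batch` helper, the table layout, and the two example modules are hypothetical; the sketch only shows the commit-on-success, rollback-on-failure pattern described above.

```python
import sqlite3


def run_batch(conn, modules):
    """Run all modules in one transaction; commit only if none of them fails."""
    try:
        for module in modules:
            module(conn)
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # discard every record written by this batch
        return False


def good_module(conn):
    conn.execute("INSERT INTO records VALUES (1, 'hmi_lev1_fd_V')")


def failing_module(conn):
    conn.execute("INSERT INTO records VALUES (2, 'hmi_lev1_fd_V')")
    raise RuntimeError("module failed")


def count_records(conn):
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recnum INTEGER PRIMARY KEY, series TEXT)")
conn.commit()

# A batch with a failing module leaves no records behind.
failed_ok = run_batch(conn, [good_module, failing_module])
after_failed = count_records(conn)

# A batch in which every module succeeds is committed as a whole.
good_ok = run_batch(conn, [good_module])
after_good = count_records(conn)

print(failed_ok, after_failed, good_ok, after_good)
```

Note that the record written by the successful module in the first batch is also discarded, because the batch, not the individual module, is the unit of commitment.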

7
Pipeline Client-Server Architecture
  • Pipeline client process
  • Analysis code (C/Fortran/IDL/Matlab)
  • JSOC Library calls: OpenRecords, CloseRecords,
    GetKeyword, SetKeyword, GetLink, SetLink,
    OpenDataSegment, CloseDataSegment
  • File I/O and Data Segment I/O to JSOC Disks
  • Data Record Management Service (DRMS)
  • Record Cache (Keywords, Links, Data paths)
  • Storage Unit Management Service (SUMS)
  • SUMS requests: AllocUnit, GetUnit, PutUnit
  • Storage unit transfer
  • Tape Archive Service
  • Oracle Database Server, reached by SQL query
  • Series Catalog, Record Catalogs, Storage Database

8
Co-I contributions and collaboration
  • Contributions from co-I teams
  • Software for intermediate and high level analysis
    modules
  • Output data series definition
  • Keywords, links, data segments, size of storage
    units etc.
  • Documentation (detailed enough to understand the
    contributed code)
  • Test data and intended results for verification
  • Time
  • Explain algorithms and implementation
  • Help with verification
  • Collaborate on improvements if required (e.g.
    performance or maintainability)
  • Contributions from HMI team
  • Pipeline execution environment
  • Software and hardware resources (development
    environment, libraries, tools)
  • Time
  • Help with defining data series
  • Help with porting code to JSOC API
  • If needed, collaborate on algorithmic
    improvements, tuning for JSOC hardware,
    parallelization
  • Verification

9
HMI module status and MDI heritage
Diagram column headings: Primary observables;
Intermediate and high level data products
Diagram box labels:
  • Internal rotation
  • Heliographic Doppler velocity maps
  • Spherical Harmonic Time series
  • Mode frequencies and splitting
  • Internal sound speed
  • Full-disk velocity, sound speed maps (0-30Mm)
  • Local wave frequency shifts
  • Ring diagrams
  • Doppler Velocity
  • Carrington synoptic v and cs maps (0-30Mm)
  • Time-distance Cross-covariance function
  • Tracked Tiles of Dopplergrams
  • Wave travel times
  • High-resolution v and cs maps (0-30Mm)
  • Egression and Ingression maps
  • Wave phase shift maps
  • Deep-focus v and cs maps (0-200Mm)
  • Far-side activity index
  • Stokes I,V
  • Line-of-sight Magnetograms
  • Line-of-Sight Magnetic Field Maps
  • Stokes I,Q,U,V
  • Full-disk 10-min Averaged maps
  • Vector Magnetograms, Fast algorithm
  • Vector Magnetic Field Maps
  • Vector Magnetograms, Inversion algorithm
  • Coronal magnetic Field Extrapolations
  • Tracked Tiles
  • Tracked full-disk 1-hour averaged Continuum maps
  • Coronal and Solar wind models
  • Continuum Brightness
  • Solar limb parameters
  • Brightness feature maps
  • Brightness Images

10
Questions this meeting should address
  • List of all science data products
  • Which data products, including intermediate ones,
    should be produced by JSOC?
  • What cadence, resolution, coverage etc.
    will/should each data product have?
  • Eventually a JSOC series description must be
    written for each one.
  • Which data products should be computed on the fly
    and which should be archived?
  • Have we got the basic pipeline right? Are there
    maturing new techniques that have been
    overlooked?
  • Detailing each branch of the processing pipeline
  • What are the detailed steps in each branch?
  • Can some of the computational steps be
    encapsulated in general tools that can be shared
    among different branches (example: tracking)?
  • What are the computer resource requirements of
    computational steps?
  • Contributed analysis modules
  • Who will contribute code?
  • Which codes are mature enough for inclusion?
    Should be at least working research code now,
    since integration has to begin by c. mid 2006.

11
Example Global Seismology Pipeline