JSOC Pipeline Processing Environment - PowerPoint PPT Presentation

About This Presentation

Slides: 12

Provided by: brockca

Learn more at: http://hmi.stanford.edu

Transcript and Presenter's Notes

1
JSOC Pipeline Processing Environment

Rasmus Munk Larsen, Stanford University
rmunk_at_quake.stanford.edu
650-725-5485

2
Overview

JSOC data series organization
Pipeline execution environment
Pipeline software architecture
Co-I analysis module contribution
Pipeline Data Products

3
JSOC logical data organization

Evolved from MDI dataset concept to
Fix known limitations/problems
Accommodate more complex data models required by higher-level processing
Main design features
Separation of meta-data (keywords) and image data
No need to re-write large image files when only keywords change (lev1.8 problem)
No (fewer) out-of-date keyword values in FITS headers
Can bind to most recent values on export
Easier data access
All access in terms of (collections of) data records, which are the atomic units of a data series
A dataset name is a query specifying a set of data records (possibly from multiple data series)
jsoc:hmi_lev0_cam1_fg?recordnum=12345 (a specific filtergram with unique record number 12345)
jsoc:hmi_lev0_cam1_fg12300-12330 (a minute's worth of filtergrams from camera 1)
jsoc:hmi_lev1_fd_V?T_OBS>2008-11-01 AND T_OBS<2008-12-01 AND N_MISSING<100
Storage and tape management must be transparent to user
Chunking of data records into storage units for efficient tape/disk usage done internally
Completely separate storage and catalog (i.e. series record) databases: more modular design
Legacy MDI modules should run on top of new storage service
Storing keywords in relational database system (Oracle)
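
The idea that a dataset name is a query over keyword values can be sketched in a few lines of Python. This is only an illustration, not the JSOC implementation: the in-memory table, the helper names select_records and november_2008_good, and the sample keyword values are all made up for the example. The predicate mirrors the third dataset name above (T_OBS within November 2008 and N_MISSING below 100).

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the keyword table of one data series.
# Only keywords are stored here; the image data would live elsewhere.
records = [
    {"recordnum": 1, "T_OBS": datetime(2008, 10, 31, 23, 59), "N_MISSING": 0},
    {"recordnum": 2, "T_OBS": datetime(2008, 11, 15, 12, 0), "N_MISSING": 12},
    {"recordnum": 3, "T_OBS": datetime(2008, 11, 20, 6, 30), "N_MISSING": 250},
    {"recordnum": 4, "T_OBS": datetime(2008, 12, 2, 0, 0), "N_MISSING": 3},
]


def select_records(table, predicate):
    """Return the record numbers of all records satisfying the predicate."""
    return [rec["recordnum"] for rec in table if predicate(rec)]


def november_2008_good(rec):
    """T_OBS > 2008-11-01 AND T_OBS < 2008-12-01 AND N_MISSING < 100."""
    in_window = datetime(2008, 11, 1) < rec["T_OBS"] < datetime(2008, 12, 1)
    return in_window and rec["N_MISSING"] < 100


selected = select_records(records, november_2008_good)
print(selected)  # [2]
```

Because the selection touches only keyword values, no image file has to be opened to resolve a dataset name, which is the point of keeping meta-data and image data separate.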

4
Logical Data Organization
JSOC Data Series
hmi_lev0_cam1_fg
aia_lev0_cont1700
hmi_lev1_fd_M
hmi_lev1_fd_V
aia_lev0_FE171
Data records for series hmi_lev1_fd_V
hmi_lev1_fd_V12345, hmi_lev1_fd_V12346, hmi_lev1_fd_V12347, hmi_lev1_fd_V12348, hmi_lev1_fd_V12349, hmi_lev1_fd_V12350, hmi_lev1_fd_V12351, hmi_lev1_fd_V12352, hmi_lev1_fd_V12353
Single hmi_lev1_fd_V data record
Keywords
RECORDNUM 12345 (Unique serial number)
SERIESNUM 5531704 (Slots since epoch)
T_OBS 2009.01.05_23:22:40_TAI
DATAMIN -2.537730543544E+03
DATAMAX 1.935749511719E+03
...
P_ANGLE LINK:ORBIT,KEYWORD:SOLAR_P
Links
ORBIT hmi_lev0_orbit, SERIESNUM 221268160
CALTABLE hmi_lev0_dopcal, RECORDNUM 7
L1 hmi_lev0_cam1_fg, RECORDNUM 42345232
R1 hmi_lev0_cam1_fg, RECORDNUM 42345233
Data Segments
V_DOPPLER
Storage Unit Directory
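
The record layout on this slide (keywords, links, data segments) and the linked keyword P_ANGLE can be sketched as a small data structure. This is a rough sketch under assumptions, not the JSOC data model itself: the class names, the get_keyword helper, the tuple encoding of a linked keyword, the SOLAR_P value 1.5 and the segment path are all hypothetical. The series names and record numbers are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    target_series: str  # series the link points into
    key_name: str       # "RECORDNUM" or "SERIESNUM"
    key_value: int      # value identifying the target record


@dataclass
class DataRecord:
    series: str
    keywords: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)
    segments: dict = field(default_factory=dict)


def get_keyword(record, name, catalog):
    """Return a keyword value, following a link if the keyword is linked."""
    value = record.keywords[name]
    if isinstance(value, tuple):  # assumed encoding: (link name, target keyword)
        link_name, target_keyword = value
        link = record.links[link_name]
        target = catalog[(link.target_series, link.key_name, link.key_value)]
        return get_keyword(target, target_keyword, catalog)
    return value


# Orbit record; the SOLAR_P value is a placeholder, not real data.
orbit = DataRecord("hmi_lev0_orbit",
                   keywords={"SERIESNUM": 221268160, "SOLAR_P": 1.5})
catalog = {("hmi_lev0_orbit", "SERIESNUM", 221268160): orbit}

dopplergram = DataRecord(
    "hmi_lev1_fd_V",
    keywords={"RECORDNUM": 12345, "SERIESNUM": 5531704,
              "P_ANGLE": ("ORBIT", "SOLAR_P")},
    links={"ORBIT": Link("hmi_lev0_orbit", "SERIESNUM", 221268160)},
    segments={"V_DOPPLER": "path/inside/storage/unit"},  # placeholder path
)

p_angle = get_keyword(dopplergram, "P_ANGLE", catalog)
print(p_angle)  # 1.5
```

Resolving P_ANGLE through the link at read time is one way to get the "bind to most recent values" behaviour mentioned earlier: the value is looked up in the orbit record rather than copied into the Dopplergram record.
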
5
JSOC Series Definition (JSD)
Global series information
Seriesname: "testclass1"
Description: "This is a small example of a JSOC series definition."
Author: "Rasmus Munk Larsen"
Owners: "rmunk"
Unitsize: 10
Archive: 1
Retention: permanent
Tapegroup: 127
Primary Index
Keywords
Format: Keyword: <name>, link, <linkname>, <target keyword name>
or Keyword: <name>, <type>, <default value>, <format>, <unit>, <comment>
Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
Keyword: "keywd2", datetime, "1970-01-01 00:00:00", "%-s", "unit5", "Comment5"
Keyword: "keywd3", timestamp, "19700101000000", "%-s", "unit6", "Comment6"
Keyword: "keywd4", string, "", "%-s", "unit7", "Comment7"
Keyword: "keywd5", link, "link1", "keywd0"
Keyword: "keywd6", char, '\0', "%d", "unit1", "Comment1"
Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
Links
Format: Link: <name>, <target series>, static | dynamic
Link: "link0", "testclass0", static
Link: "link1", "testclass0", dynamic
Data segments
Format: Data: <name>, <type>, <naxis>, <axis dims>, <unit>, <protocol>
Data: "x-axis", float, 1, 100, "m", fits
Data: "y-axis", float, 1, 200, "m", fits
Data: "z-axis", float, 1, 50, "m", fits
Data: "pressure", float, 3, 100, 200, 50, "kg/(s2m)", fitz
Data: "velocity", float, 4, 100, 200, 50, 3, "m/s", fitz
Creating a new Data Series
testclass1.jsd
JSD parser
SQL: INSERT INTO series_catalog VALUES(testclass1,rmunk,
SQL: CREATE TABLE testclass1 ( recnum integer not null unique, keywd0 binary_float,
Oracle database
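
The step from a series definition to a CREATE TABLE statement can be illustrated with a short Python sketch. It is not the JSD parser: it starts from already-parsed (name, type) pairs, the function name create_table_sql is hypothetical, and the type mapping is an assumption. Only float to binary_float and the recnum column appear on the slide; the other column types are guesses, and skipping link keywords (which would need no column of their own) is also an assumption.

```python
# Assumed mapping from JSD keyword types to Oracle column types.
# Only "float" -> binary_float is taken from the slide.
TYPE_MAP = {
    "float": "binary_float",
    "double": "binary_double",
    "int": "integer",
    "string": "varchar2(255)",
}


def create_table_sql(series_name, keywords):
    """Build a CREATE TABLE statement for a series from (name, type) pairs."""
    columns = ["recnum integer not null unique"]
    for name, jsd_type in keywords:
        if jsd_type == "link":
            # Assumption: a linked keyword is resolved through its link
            # and gets no column in this series' table.
            continue
        columns.append(f"{name} {TYPE_MAP[jsd_type]}")
    return f"CREATE TABLE {series_name} ( {', '.join(columns)} )"


keywords = [
    ("keywd0", "float"),
    ("keywd1", "double"),
    ("keywd4", "string"),
    ("keywd5", "link"),
    ("keywd7", "int"),
]
sql = create_table_sql("testclass1", keywords)
print(sql)
```
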
6
Pipeline batch processing (a.k.a. MDI mapfile)

Pipeline processing is scheduled in batches by PUI, a data-driven pipeline scheduler inherited from MDI
A pipeline batch is a single atomic transaction
If no module fails, all data records are committed and become visible to other clients of the archive
If a failure occurs, all data records are deleted and the database is rolled back
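
The all-or-nothing behaviour of a batch can be sketched with an ordinary database transaction. The example below uses Python's built-in sqlite3 module as a stand-in for the Oracle server; the records table, the run_batch helper and the two toy modules are hypothetical and only illustrate the commit/rollback pattern.

```python
import sqlite3


def run_batch(conn, modules):
    """Run all modules in one transaction; commit only if none fails."""
    try:
        for module in modules:
            module(conn)
        conn.commit()      # records become visible to other clients
        return True
    except Exception:
        conn.rollback()    # every record written by the batch is discarded
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recnum INTEGER PRIMARY KEY, note TEXT)")
conn.commit()


def good_module(conn):
    conn.execute("INSERT INTO records VALUES (1, 'ok')")


def failing_module(conn):
    conn.execute("INSERT INTO records VALUES (2, 'partial')")
    raise RuntimeError("module failed")


# Batch with a failing module: nothing survives, not even record 1.
ok_first = run_batch(conn, [good_module, failing_module])
count_after_failure = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

# Batch where every module succeeds: the record is committed.
ok_second = run_batch(conn, [good_module])
count_after_success = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

print(ok_first, count_after_failure, ok_second, count_after_success)
```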

7
Pipeline Client-Server Architecture
(Architecture diagram; components and labels)
Pipeline client process
Analysis code: C/Fortran/IDL/Matlab
JSOC Library
OpenRecords, CloseRecords
GetKeyword, SetKeyword
GetLink, SetLink
OpenDataSegment, CloseDataSegment
Record Cache (Keywords, Links, Data paths)
Data Segment I/O
File I/O
JSOC Disks
Data Record Management Service (DRMS)
Storage Unit Management Service (SUMS)
AllocUnit, GetUnit, PutUnit
Storage unit transfer
Tape Archive Service
Oracle Database Server
SQL query
Series Catalog
Record Catalogs
Storage Database
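
How analysis code might use the library calls named in the diagram can be sketched with a toy client. This is a mock written for illustration, not the JSOC library: the Python method names, the MockCatalog class, and the choice to write cached changes back only when records are closed are all assumptions. The diagram only shows that the client holds a record cache and talks to the record management service.

```python
class MockCatalog:
    """Stand-in for the record catalog held by the database server."""

    def __init__(self):
        self.rows = {}  # (series, recnum) -> dict of keyword values


class PipelineClient:
    """Toy client with a local record cache (hypothetical API)."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cache = {}

    def open_records(self, series, recnums):
        # Copy keyword values into the client-side cache.
        for n in recnums:
            self.cache[(series, n)] = dict(self.catalog.rows.get((series, n), {}))
        return [(series, n) for n in recnums]

    def get_keyword(self, handle, name):
        return self.cache[handle][name]

    def set_keyword(self, handle, name, value):
        # Changes stay in the cache until the records are closed.
        self.cache[handle][name] = value

    def close_records(self, handles):
        # Assumption: cached changes are written back on close.
        for handle in handles:
            self.catalog.rows[handle] = self.cache.pop(handle)


catalog = MockCatalog()
catalog.rows[("hmi_lev1_fd_V", 12345)] = {"N_MISSING": 0}

client = PipelineClient(catalog)
handles = client.open_records("hmi_lev1_fd_V", [12345])
client.set_keyword(handles[0], "N_MISSING", 7)
before_close = catalog.rows[("hmi_lev1_fd_V", 12345)]["N_MISSING"]
client.close_records(handles)
after_close = catalog.rows[("hmi_lev1_fd_V", 12345)]["N_MISSING"]
print(before_close, after_close)  # 0 7
```
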
8
co-I contributions and collaboration

Contributions from co-I teams
Software for intermediate and high level analysis modules
Output data series definition
Keywords, links, data segments, size of storage units etc.
Documentation (detailed enough to understand the contributed code)
Test data and intended results for verification
Time
Explain algorithms and implementation
Help with verification
Collaborate on improvements if required (e.g. performance or maintainability)
Contributions from HMI team
Pipeline execution environment
Software and hardware resources (development environment, libraries, tools)
Time
Help with defining data series
Help with porting code to JSOC API
If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
Verification

9
HMI module status and MDI heritage
Intermediate and high level data products
Primary observables
Internal rotation
Heliographic Doppler velocity maps
Spherical Harmonic Time series
Mode frequencies and splitting
Internal sound speed
Full-disk velocity and sound speed maps (0-30 Mm)
Local wave frequency shifts
Ring diagrams
Doppler Velocity
Carrington synoptic v and cs maps (0-30Mm)
Time-distance Cross-covariance function
Tracked Tiles Of Dopplergrams
Wave travel times
High-resolution v and cs maps (0-30Mm)
Egression and Ingression maps
Wave phase shift maps
Deep-focus v and cs maps (0-200Mm)
Far-side activity index
Stokes I,V
Line-of-sight Magnetograms
Line-of-Sight Magnetic Field Maps
Stokes I,Q,U,V
Full-disk 10-min Averaged maps
Vector Magnetograms: Fast algorithm
Vector Magnetic Field Maps
Vector Magnetograms: Inversion algorithm
Coronal magnetic Field Extrapolations
Tracked Tiles
Tracked full-disk 1-hour averaged Continuum maps
Coronal and Solar wind models
Continuum Brightness
Solar limb parameters
Brightness feature maps
Brightness Images
10
Questions this meeting should address

List of all science data products
Which data products, including intermediate ones, should be produced by JSOC?
What cadence, resolution, coverage etc. will/should each data product have?
Eventually a JSOC series description must be written for each one.
Which data products should be computed on the fly and which should be archived?
Have we got the basic pipeline right? Are there maturing new techniques that have been overlooked?
Detailing each branch of the processing pipeline
What are the detailed steps in each branch?
Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
What are the computer resource requirements of the computational steps?
Contributed analysis modules
Who will contribute code?
Which codes are mature enough for inclusion?
Should be at least working research code now, since integration has to begin by c. mid-2006.