Learn more at: http://hmi.stanford.edu

1
DRMS Core System
  • Karen Tian
  • ktian@Stanford.EDU

2
Requirements and usage status
  • Requirements
    • 1 TB per year
    • Thousands of transactions per day
    • Stable system
  • Usage started in late 2005
  • Current users
    • SID project
    • MDI data export
    • HMI ground test data ingest
  • DB size
    • SUMS + DRMS: 11 GB
    • DRMS: > 7.7M records (HMI ground: 7.2M records)

3
DRMS software structure

4
A typical module
  • Select input
    • drms_open_records()
  • Processing/analysis
    • drms_getkey_()
    • drms_segment_read()
  • Write output
    • drms_create_records() or drms_clone_records()
    • drms_setkey_()
    • drms_segment_write()
    • drms_close_records()
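The call sequence above can be sketched as a self-contained mock. This is not the DRMS API: the real drms_* functions take an environment handle and status pointers whose signatures are not given in these slides, so every type and function below (Record, RecordSet, open_records, create_records, close_records, run_module, and the keyword names) is hypothetical and only mirrors the order of steps: select input, read keywords and segments, create output, set keywords and write segments, then close to commit.

```c
#include <assert.h>
#include <string.h>

#define MAX_RECS 16   /* hypothetical capacity of a record set */

typedef struct {
    int t_obs;        /* hypothetical prime key */
    double datamean;  /* hypothetical keyword */
    double data[4];   /* stands in for a data segment */
} Record;

typedef struct {
    int n;
    Record recs[MAX_RECS];
} RecordSet;

/* "Select input": pick the records whose prime key lies in [t0, t1]. */
static int open_records(const RecordSet *series, int t0, int t1, RecordSet *out) {
    out->n = 0;
    for (int i = 0; i < series->n; i++) {
        const Record *r = &series->recs[i];
        if (r->t_obs < t0 || r->t_obs > t1) continue;
        if (out->n == MAX_RECS) return -1;
        out->recs[out->n++] = *r;
    }
    return 0;
}

/* "Write output": start n blank output records. */
static void create_records(RecordSet *out, int n) {
    assert(n >= 0 && n <= MAX_RECS);
    memset(out, 0, sizeof *out);
    out->n = n;
}

/* "Close": commit the output by appending it to the target series. */
static int close_records(RecordSet *series, const RecordSet *out) {
    if (series->n + out->n > MAX_RECS) return -1;
    for (int i = 0; i < out->n; i++) series->recs[series->n++] = out->recs[i];
    return 0;
}

/* A module: read each input's segment, compute its mean, then write the mean
   as a keyword and the mean-subtracted data as the output segment. */
static int run_module(const RecordSet *in_series, RecordSet *out_series, int t0, int t1) {
    RecordSet in, out;
    if (open_records(in_series, t0, t1, &in) != 0) return -1;
    create_records(&out, in.n);
    for (int i = 0; i < in.n; i++) {
        double sum = 0.0;
        for (int j = 0; j < 4; j++) sum += in.recs[i].data[j];   /* role of drms_segment_read() */
        out.recs[i].t_obs = in.recs[i].t_obs;                    /* copy the prime key */
        out.recs[i].datamean = sum / 4.0;                        /* role of drms_setkey_() */
        for (int j = 0; j < 4; j++)                              /* role of drms_segment_write() */
            out.recs[i].data[j] = in.recs[i].data[j] - out.recs[i].datamean;
    }
    return close_records(out_series, &out);                      /* role of drms_close_records() */
}
```

Nothing reaches the output series until the final close, which is the property the later slides on long-running transactions depend on.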

6
DB performance evaluation
  • Query speed
    • Table size (both width and length)
      • Number of records in a series: 150M records
      • Fraction of a second for a query on an indexed keyword; > 30 minutes on a non-indexed keyword
      • Performance depends on hardware (disk speed, etc.)
      • Avoid such big tables: split them into smaller tables
  • Insert speed: bulk inserts are faster than individual inserts
  • Concurrency
    • No problems yet with table locking
    • Performance depends on the workload mix
  • Index type
    • Order matters in a composite index
    • Is there a better way to implement the prime key? Currently it is a composite index
    • Additional indices for selected keywords
  • Dataset names
    • GROUP BY, ORDER BY
      • GROUP BY to select the most recent version
      • ORDER BY to sort according to the prime key
      • Upgrade PostgreSQL to get better sorting performance
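Why order matters in a composite index can be illustrated with a sorted array standing in for an index on a two-column prime key. The key names t_rec and camera are hypothetical, and this is a simplified model rather than how PostgreSQL stores a B-tree. Because entries are ordered by the first column and then the second, a lookup on the leading column is a binary search, while a lookup on the second column alone must examine every entry, which mirrors the indexed versus non-indexed gap noted above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-column prime key, indexed in (t_rec, camera) order. */
typedef struct {
    int t_rec;
    int camera;
} Key;

/* Lexicographic order: compare t_rec first; camera only breaks ties. */
static int key_cmp(const void *pa, const void *pb) {
    const Key *a = pa, *b = pb;
    if (a->t_rec != b->t_rec) return (a->t_rec > b->t_rec) - (a->t_rec < b->t_rec);
    return (a->camera > b->camera) - (a->camera < b->camera);
}

/* Sort the entries so the array behaves like an ordered composite index. */
static void build_index(Key *idx, size_t n) {
    qsort(idx, n, sizeof *idx, key_cmp);
}

/* Lookup on the leading column: binary search for the first entry with
   t_rec >= t. *ncmp counts probes, which grows like log2(n). */
static size_t lower_bound_trec(const Key *idx, size_t n, int t, size_t *ncmp) {
    assert(ncmp != NULL);
    size_t lo = 0, hi = n;
    *ncmp = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        (*ncmp)++;
        if (idx[mid].t_rec < t) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Lookup on the second column alone: the index order does not help,
   so every entry is examined (*ncmp == n). */
static size_t count_camera(const Key *idx, size_t n, int cam, size_t *ncmp) {
    assert(ncmp != NULL);
    size_t hits = 0;
    *ncmp = 0;
    for (size_t i = 0; i < n; i++) {
        (*ncmp)++;
        if (idx[i].camera == cam) hits++;
    }
    return hits;
}
```

With 1024 entries the leading-column lookup needs at most 11 probes, while the second-column lookup needs all 1024; swapping the column order would reverse which query is cheap.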

7
Records in memory
  • The current implementation uses one query to get all keywords
    • Efficient for the DB query
    • Inefficient in memory usage, especially when only interested in
      • a subset of a record's keywords
      • a single keyword (a column) across a set of records
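The memory trade-off above can be made concrete with a small model. It assumes, hypothetically, a series with 64 keywords per record held row-major in a Table struct. fetch_all stands in for the current one-query-gets-all-keywords behaviour, and fetch_column stands in for a narrower fetch that returns one keyword across the record set; neither function is part of DRMS.

```c
#include <assert.h>
#include <stdlib.h>

#define NKEYS 64  /* hypothetical: keywords per record in a series */

/* The "database" table: nrec rows of NKEYS keyword values, row-major. */
typedef struct {
    size_t nrec;
    const double *vals;   /* nrec * NKEYS values */
} Table;

/* Current approach: one query returns every keyword of every selected record.
   Returns a malloc'd nrec*NKEYS array and reports its size in *bytes. */
static double *fetch_all(const Table *t, size_t *bytes) {
    *bytes = t->nrec * NKEYS * sizeof(double);
    double *rows = malloc(*bytes);
    if (rows == NULL) return NULL;
    for (size_t i = 0; i < t->nrec * NKEYS; i++) rows[i] = t->vals[i];
    return rows;
}

/* Alternative: fetch a single keyword (a column) across the record set,
   so only nrec values are ever allocated. */
static double *fetch_column(const Table *t, int k, size_t *bytes) {
    assert(k >= 0 && k < NKEYS);
    *bytes = t->nrec * sizeof(double);
    double *col = malloc(*bytes);
    if (col == NULL) return NULL;
    for (size_t i = 0; i < t->nrec; i++) col[i] = t->vals[i * NKEYS + k];
    return col;
}
```

For a column of one keyword the footprint shrinks by a factor of NKEYS; the cost is an extra query whenever another keyword is needed later.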

9
Processing/Analysis
  • Warning against long-running (> 1 day) transactions
    • Drain on system resources
    • VACUUM cannot delete dead rows
    • The replication tool Slony-I cannot start while transactions are open
  • No checkpointing available yet
    • Difficult to commit in the middle of a module, because SUs (storage units) are not committed until the end of a session
    • Applications need to break up jobs into manageable pieces
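Breaking a job into manageable pieces can be sketched as a driver that hands fixed-size ranges of records to a callback, one short unit of work at a time, so no single transaction stays open for the whole job. The ChunkFn callback, the Tally example, and the chunk size are hypothetical; in practice each chunk would be a separate module run or session.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: process records [first, first + count) inside one
   short transaction and return 0 on success. */
typedef int (*ChunkFn)(size_t first, size_t count, void *ctx);

/* Split nrec records into pieces of at most `chunk` records. Returns the
   number of chunks completed; stops at the first failure so the job can be
   resumed from that chunk instead of redoing everything. */
static size_t run_in_chunks(size_t nrec, size_t chunk, ChunkFn fn, void *ctx) {
    assert(chunk > 0);
    size_t done = 0;
    for (size_t first = 0; first < nrec; first += chunk) {
        size_t count = (nrec - first < chunk) ? nrec - first : chunk;
        if (fn(first, count, ctx) != 0) break;
        done++;
    }
    return done;
}

/* Example callback: tallies records handled and the largest chunk seen.
   fail_at != 0 simulates a failure for any chunk starting at or after it. */
typedef struct { size_t total, largest, fail_at; } Tally;

static int tally_chunk(size_t first, size_t count, void *ctx) {
    Tally *t = ctx;
    if (t->fail_at != 0 && first >= t->fail_at) return -1;
    t->total += count;
    if (count > t->largest) t->largest = count;
    return 0;
}
```

For example, 1000 records with a chunk size of 300 run as four pieces (300, 300, 300, 100); a failure in the third piece leaves the first 600 records committed.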

11
Write output
  • Transient records: intermediate results, removed at the end of a session
    • The current implementation leaves dead rows in the series table
    • Alternative: CREATE TEMPORARY TABLE
      • No dead rows to vacuum
      • Added complexity for drms_open_records()
  • Modify a series definition, e.g., add keywords
  • Updatable records?
    • Currently a DRMS record can be written only once.
    • To update such a record, one first makes a clone of it, then updates the clone
      • The clone and the original share the same prime keys
      • Query APIs automatically pick up the latest version unless told otherwise
    • Would like to allow records to be updated within the same session
    • Drawback: this upsets our vacuum plan
      • With insert-only access, DRMS tables require minimal vacuuming. Allowing records to be updated, even within the same session, leaves dead rows behind, which makes DRMS tables candidates for vacuuming.
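The clone-then-update scheme can be sketched as an insert-only table in which an "update" appends a copy of the latest row under the same prime key with a newer version number, and lookups pick the highest version. Row, Series, the version counter, and the keyword names are hypothetical; the sketch shows the semantics described above, not DRMS's actual storage.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 32   /* hypothetical capacity */

typedef struct {
    int prime_key;   /* hypothetical single prime key */
    long version;    /* grows with every insert; newest wins */
    double value;    /* keyword that gets "updated" */
    int quality;     /* another keyword, carried over by a clone */
} Row;

/* Insert-only series: rows are appended, never modified in place. */
typedef struct {
    size_t n;
    long next_version;
    Row rows[CAP];
} Series;

static int insert_record(Series *s, int key, double value, int quality) {
    assert(s != NULL);
    if (s->n == CAP) return -1;
    Row r = {key, ++s->next_version, value, quality};
    s->rows[s->n] = r;
    return (int)s->n++;
}

/* Index of the latest version for key, or -1 if absent (the role played by
   GROUP BY when a query selects the most recent version). */
static int latest(const Series *s, int key) {
    int best = -1;
    for (size_t i = 0; i < s->n; i++)
        if (s->rows[i].prime_key == key &&
            (best < 0 || s->rows[i].version > s->rows[best].version))
            best = (int)i;
    return best;
}

/* "Update" by cloning: copy the latest row, give it a newer version, and
   change only the clone. The original row is left untouched. */
static int clone_and_update(Series *s, int key, double new_value) {
    int src = latest(s, key);
    if (src < 0 || s->n == CAP) return -1;
    Row r = s->rows[src];            /* copy every keyword of the latest version */
    r.version = ++s->next_version;   /* same prime key, newer version */
    r.value = new_value;             /* the update applies only to the clone */
    s->rows[s->n] = r;
    return (int)s->n++;
}
```

Because no row is ever rewritten, the table accumulates no dead rows from updates; allowing in-place updates would change that, which is the vacuum concern raised above.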

12
Remote DRMS and SUMS
  • Remote DRMS
    • Remote DB replicates a subset of HMI/AIA data series
    • Subscription based
    • Slony-I log shipping
    • Minimal customization of DRMS code
    • Stanford may subscribe to series originating from remote sites
  • Remote SUMS
    • Use the higher bits of the SUID to identify SUMS sites
    • if local
      • SUM_get(SUID)
    • elsif cached
      • map global SUID -> local SUID, then SUM_get(local SUID)
    • else
      • fetch from remote SUMS asynchronously, then ingest into local SUMS
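The SUID-routing logic above can be sketched in C. The bit split is an assumption: here the top 16 bits of a 64-bit SUID hold a hypothetical site number and the rest is that site's own counter; the slides only say "higher bits". CacheEntry, SuAction, make_suid, and resolve_suid are hypothetical names, and the sketch decides which path to take rather than calling SUMS itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: top SITE_BITS bits = SUMS site, remainder = local counter. */
#define SITE_BITS 16
#define SITE_SHIFT (64 - SITE_BITS)
#define LOCAL_MASK ((UINT64_C(1) << SITE_SHIFT) - 1)

static uint64_t make_suid(uint16_t site, uint64_t local_part) {
    return ((uint64_t)site << SITE_SHIFT) | (local_part & LOCAL_MASK);
}

static uint16_t suid_site(uint64_t suid) { return (uint16_t)(suid >> SITE_SHIFT); }

/* Hypothetical cache entry: a remote (global) SUID and the local SUID under
   which its copy was ingested into local SUMS. */
typedef struct { uint64_t global, local; } CacheEntry;

typedef enum { SU_LOCAL, SU_CACHED, SU_FETCH_REMOTE } SuAction;

/* Follow the if / elsif / else above. *to_get receives the SUID to pass to
   the local SUM_get() when the SU is available locally. */
static SuAction resolve_suid(uint64_t suid, uint16_t my_site,
                             const CacheEntry *cache, size_t ncache,
                             uint64_t *to_get) {
    assert(to_get != NULL);
    if (suid_site(suid) == my_site) { *to_get = suid; return SU_LOCAL; }
    for (size_t i = 0; i < ncache; i++)
        if (cache[i].global == suid) { *to_get = cache[i].local; return SU_CACHED; }
    *to_get = 0;   /* caller should queue an asynchronous remote fetch, then ingest */
    return SU_FETCH_REMOTE;
}
```

Encoding the site in the SUID keeps SUIDs globally unique without coordination between sites, at the cost of fewer bits for each site's own counter.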

13
Export tools
  • DRMS interaction with SUMS
    • drms_open_records() does not stage SUs
    • Head-of-line blocking queue design
      • Batch multiple SUM_get() requests into one
      • Need a better polling mechanism
  • Export: staging data only
    • An SU staging task takes time, easily a few hours, resulting in long-running transactions if run as modules
    • Need to provide a non-blocking alternative, either through a direct SUMS connection or a staging option in drms_open_records()
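The head-of-line blocking problem can be illustrated by comparing two ways of draining a queue of staging requests. StageReq and its ready flag are hypothetical stand-ins for SUMS request state. A strictly in-order queue can deliver nothing while its head is still being staged, whereas a non-blocking poll delivers whatever is already staged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical staging request: ready becomes nonzero once the SU is staged. */
typedef struct { long suid; int ready; } StageReq;

/* Head-of-line blocking: requests are served strictly in order, so one slow
   request at the head holds back everything behind it. Returns how many
   requests can be delivered right now. */
static size_t deliverable_fifo(const StageReq *q, size_t n) {
    size_t k = 0;
    while (k < n && q[k].ready) k++;
    return k;
}

/* Non-blocking poll: deliver any request that is already staged,
   regardless of its position. Returns the count of ready requests. */
static size_t deliverable_any(const StageReq *q, size_t n) {
    assert(q != NULL || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (q[i].ready) k++;
    return k;
}
```

With one unstaged request at the head and three staged behind it, the in-order queue delivers nothing while the poll delivers three.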