Digital Library Development - PowerPoint PPT Presentation

1
Digital Library Development
  • Reagan Moore
  • George Kremenek
  • Yifeng Cui
  • Jing Zhu
  • Yuanfang Hu
  • Marcio Faerman
  • San Diego Supercomputer Center
  • July 2006 at SCEC/CME AHM

2
Topics
  • TeraShake2 Runs Data
  • Overview
  • Detail of a typical production run
  • Data Archival Management
  • Run-time Data Management
  • Archival to SRB
  • Replication to HPSS
  • Data Access Management
  • Multiple access approaches to meet different
    needs
  • SRB commands
  • mySRB
  • Scenario-oriented Web Interfaces
  • Web access integrated into a single GridSphere
    portal

3
TeraShake2 Runs Data Overview
  • 13 large-scale production runs
  • - 3 in Wave Propagation Mode
  • - 10 in Dynamic Mode
  • 30 TS2..dyn 200m validation runs
  • - in preparation for 100m production run
  • TeraShake 2 Data Archival
  • - On SRB: 14.4TB online (disk), 14.4TB offline (tape)
  • - On HPSS: 28.8TB

4
Data Detail of a Typical TeraShake2 Production Run
  • TS2.2.wav 200m
  • TS2.2.dyn 100m

5
Data Archival Management
  • Challenges
  • - Ensure a smooth, complete run by avoiding disk
    space shortages
  • - Ensure the completeness and correctness of data
    staging for such a large volume and file size
  • - Ensure the reliability and integrity of archived
    data for future access
  • Solutions
  • - Run-time archival
  • - MD5 checksums
  • - Scripts to check the progress/status of data
    archival and re-ingest data that failed to archive
  • - Workflow (in progress)
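
The MD5 check and re-ingest scripts mentioned on this slide are not shown in the presentation. A minimal sketch of the idea, assuming the checksums recorded by the archive are available as a plain dictionary (the function names and the `archived_checksums` mapping are hypothetical, not part of SRB or HPSS):

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large output files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_reingest(local_dir, archived_checksums):
    """List local file names whose archived checksum is missing or differs.

    archived_checksums is a hypothetical dict {file name: hex MD5} that a
    site script would fill from whatever the archive reports.
    """
    failed = []
    for path in sorted(Path(local_dir).iterdir()):
        if not path.is_file():
            continue
        if archived_checksums.get(path.name) != md5_of_file(path):
            failed.append(path.name)
    return failed
```

A file would be removed from disk only when it does not appear in the returned list; anything in the list is a candidate for re-ingestion.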

6
Run-time Data Archival/Transfer Management
  • Data are archived to SRB/HPSS immediately after
    being generated, then removed from disk
  • - used when disk space is tight
  • Transfer partial run results among TeraGrid
    clusters for restart
  • - e.g. TS2.3.dyn 100m run
  • - first 7000 timesteps conducted on the TG IA-64
    machine
  • - then migrated to DataStar for better performance
  • - transfer checkpoint files from TG IA-64 to
    DataStar for run restart
  • - transfer partial results from TG IA-64 to
    DataStar
  • Techniques employed
  • - tgcp, GridFTP, GPFS-WAN
  • Delete former checkpoint files when new
    checkpoint files are generated, to save disk space
    for the run
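
The checkpoint cleanup described in the last bullet can be sketched as follows. This is an illustration only: the file-name pattern `checkpoint_<timestep>.bin` and the function name are assumptions, since the presentation does not show how TeraShake checkpoint files are named or pruned.

```python
from pathlib import Path


def prune_old_checkpoints(checkpoint_dir, pattern="checkpoint_*.bin", keep=1):
    """Delete all but the `keep` newest checkpoint files; return removed names.

    The file-name pattern is hypothetical. Files are ordered by the integer
    timestep embedded in the name, e.g. checkpoint_7000.bin.
    """
    def timestep(path):
        return int(path.stem.split("_")[-1])

    # Sort numerically so that timestep 10000 counts as newer than 2000.
    checkpoints = sorted(Path(checkpoint_dir).glob(pattern), key=timestep)
    stale = checkpoints[:-keep] if keep > 0 else checkpoints
    removed = []
    for path in stale:
        path.unlink()
        removed.append(path.name)
    return removed
```

In practice such a function would be called only after the newest checkpoint has been written completely, so that a usable restart point always remains on disk.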

7
Data Archival Management
  • Staging to both SRB and HPSS for dual backup
  • Different Strategies based on
  • Data Importance
  • Access Frequency
  • Many pre-processing and post-processing scripts
    facilitate data transfer/archival

8
Digital Library Data Access Approaches
  • SRB Commands
  • - Unix-like commands for accessing SRB data
    and metadata
  • - command line utilities that run in the Unix,
    Windows or Mac OS command shells
  • Scenario-Oriented Web Interfaces
  • - seismogram display of each scenario run,
    more intuitive
  • MySRB
  • - https://srb.npaci.edu/mysrb331all.shtml
  • - a Web-based Browser and Query Tool for SRB
  • Service-Oriented Query
  • - Integration with Joanna Muench's service-oriented
    architecture to provide queries on the TeraShake
    data collection
  • - Work in progress
  • Integrated Web Access Environment
  • http://webwork.sdsc.edu:8080/gridsphere
  • http://www.sdsc.edu/SCEC

9
Scenario-Oriented Interface Updates
  • TeraShake Interface
  • Generalization of services to apply to new
    simulations
  • Automatic detection and update of new run data
    registration
  • Mechanism: configuration file stored in SRB
    (/home/sceclib.scec/TSRunsList.txt)
  • TS1.2 /home/sceclib.scec/TeraShake1/3000x1500x400/TS1.2
  • TS1.3 /home/sceclib.scec/TeraShake1/3000x1500x400/TS1.3
  • TS1.4 /home/sceclib.scec/TeraShake1/3000x1500x400/TS1.4
  • TS2.1.wav /home/sceclib.scec/TeraShake2/TS2.1.wav
  • TS2.2.wav /home/sceclib.scec/TeraShake2/TS2.2.wav
  • TS2.3.wav /home/sceclib.scec/TeraShake2/TS2.3.wav
  • The TeraShake SRB collection manager may update the
    configuration file once a new run is registered and
    ready for display/access, or when the SRB collection
    is reorganized
  • The web interface automatically checks the
    configuration file in SRB and updates its list of
    available runs and its display
  • Changes are completely transparent to end users
  • User Defined MetaData Display
  • LAWeb Interface
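
The configuration-file mechanism on this slide can be illustrated with a short sketch. It assumes that each line of the file pairs a run name with its SRB collection path, separated by whitespace; the transcript does not preserve the actual separator, and the function names are hypothetical. Fetching the file from SRB is left out.

```python
def parse_runs_list(text):
    """Map run names to SRB collection paths from the configuration text.

    Assumes one run per line: name, whitespace, path. Blank lines and
    malformed lines are skipped.
    """
    runs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, path = parts
        runs[name] = path
    return runs


def new_runs(known, latest):
    """Return run names present in `latest` but not in `known`, sorted."""
    return sorted(set(latest) - set(known))


# Example: the interface already lists TS1.2 and then re-reads the file.
known = {"TS1.2": "/home/sceclib.scec/TeraShake1/3000x1500x400/TS1.2"}
latest = parse_runs_list(
    "TS1.2 /home/sceclib.scec/TeraShake1/3000x1500x400/TS1.2\n"
    "TS2.1.wav /home/sceclib.scec/TeraShake2/TS2.1.wav\n"
)
print(new_runs(known, latest))
```

Because the interface only reads the mapping, reorganizing the collection requires changing a path in the file rather than changing the interface itself.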

10
Integration with GridSphere Technology
  • Motivation
  • Provide an integrated web portal for end-user
    access
  • Single login, access to multiple resources
  • Highly self-configurable
  • Built a GridSphere portlet that provides
    hyperlinks to the two scenario-oriented web
    interfaces
  • Future work under consideration
  • Re-implement scenario-oriented web interfaces
    directly as GridSphere portlets
  • Integrate access to MySRB

11
Integration with Joanna Muench's Service-Oriented
Architecture (Work in Progress)
  • Goal
  • Build a web interface for metadata-based queries
    over the TeraShake data collection in SRB
  • - system metadata
  • - user-defined metadata

12
Summary
  • SDSC provides persistent, reliable storage and
    maintenance for SCEC simulation data
  • 168TB storage, 3.5 million files
  • Various data access approaches are provided to suit
    end users' preferences and different needs
  • Technology improvements and upgrades are completely
    transparent to end users
  • - SRB updates
  • - HPSS updates
  • - Web technology improvements