FreeLoader: Lightweight Data Management for Scientific Visualization

1
FreeLoader: Lightweight Data Management for Scientific Visualization
  • Vincent Freeh1
  • Xiaosong Ma1,2
  • Nandan Tammineedi1
  • Jonathan Strickland1
  • Sudharshan Vazhkudai2
  • 1. North Carolina State University
  • 2. Oak Ridge National Laboratory
  • September, 2004

2
Roadmap
  • Motivation
  • FreeLoader architecture
  • Initial design and optimization
  • Preliminary results
  • In-progress and future work

3
Motivation: Data Avalanche
  • More data to process
  • Science, industry, government
  • Example: scientific data
  • Better observational instruments
  • Better experimental instruments
  • More simulation power

[Images: PE Gene Sequencer (from http://www.genome.uci.edu/); Space Telescope
(picture courtesy of Jim Gray, SLAC Data Management Workshop)]
4
Motivation: Needs for Remote Data
Data acquisition, reduction, analysis, visualization, storage
[Diagram: raw data flows from a data acquisition system over a high-speed
network to supercomputers and remote storage (with associated metadata);
remote users with local computing and storage, as well as local users,
access the data]
5
Motivation: Remote Data Sources
  • Supercomputing centers
  • Shared file systems
  • Archiving systems
  • Data centers
  • Internet
  • World Wide Telescope / Virtual Observatory
  • NCBI bio databases
  • Tools used in access
  • FTP, GridFTP
  • Grid file systems
  • Customized data migration program
  • Web browser

6
Motivation: Insufficient Local Storage
  • End users consume data locally
  • Convenience and control
  • Better CPU/memory configurations
  • Problem 1: local space is needed to hold the data
  • Problem 2: getting data from remote sources is
    slow
  • Dataset characteristics
  • Write-once, read-many (or read a few times)
  • Raw data often discarded
  • Shared interest in the same data among groups
  • Primary copy archived somewhere

7
Condor for Storage?
  • Harnessing the storage resources of individual
    workstations, just as Condor harnesses idle CPU cycles

8
Why would it work, and work well?
  • Average workstations have more and more GBs of disk
  • And half of that space is idle!
  • Even a modest contribution (contribution ≪ available space) can amass a
    staggering aggregate capacity, e.g., 100 workstations donating 10 GB each
    already provide 1 TB
  • Increasing numbers of workstations are online
    most of the time (cf. desktop grid research)
  • Access locality, aggregate I/O and network
    bandwidth, data sharing

9
Use Cases
  • FreeLoader storage cloud as a:
  • Cache
  • Local, client-side scratch
  • Intermediate hop
  • Grid replica
  • RAS for Terascale Supercomputers

10
Related Work and Design Issues
  • Design Issues / Assumptions
  • Scalability: O(100) or O(1000)
  • Commodity Components
  • User Autonomy
  • Security and trust
  • Heterogeneity
  • Large, write-once, read-many datasets
  • Transparent
  • Naming
  • Grid Aware
  • Related Work
  • Network/Distributed File Systems (NFS, LOCUS)
  • Parallel File Systems (PVFS, XFS)
  • Serverless File Systems (FARSITE, xFS, GFS)
  • Peer-to-Peer Storage (OceanStore, PAST, CFS)
  • Grid Storage Services (LegionFS, SRB, IBP, SRM,
    GASS)

11
Intended Role of FreeLoader
  • What the scavenged storage is not
  • Not a replacement for high-end storage
  • Not a file system
  • Not intended to integrate resources at
    wide-area scale
  • What it is
  • A low-cost, best-effort alternative to scientific
    data sources
  • Intended to facilitate
  • transient access to large, read-only datasets
  • data sharing within an administrative domain
  • To be used in conjunction with higher-end storage
    systems

12
FreeLoader Architecture
13
Storage Layer
  • Benefactors
  • Morsels as a unit of contribution
  • Basic morsel operations: new(), free(), get(), put()
    (see the sketch after this list)
  • Space Reclaim
  • User withdrawal / space shrinkage
  • Data Integrity through checksums
  • Performance history
  • Pools
  • Benefactor registrations (soft state)
  • Dataset distributions
  • Metadata
  • Selection heuristics
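The morsel operations above suggest a very small benefactor-side interface. Below is a minimal sketch, assuming a fixed 1 MB morsel size and SHA-1 checksums; all names are hypothetical illustrations, not FreeLoader's actual code.

```python
import hashlib

MORSEL_SIZE = 1 << 20  # assumed 1 MB unit of contribution


class Benefactor:
    """Illustrative sketch only: a benefactor's morsel store exposing the four
    basic operations named on this slide, with checksum-based integrity."""

    def __init__(self, contributed_bytes):
        self.capacity = contributed_bytes // MORSEL_SIZE
        self.morsels = {}    # morsel id -> bytes
        self.checksums = {}  # morsel id -> SHA-1 digest
        self.next_id = 0

    def new(self):
        """Allocate an empty morsel; fails when the contribution is exhausted."""
        if len(self.morsels) >= self.capacity:
            raise OSError("no contributed space left")
        morsel_id, self.next_id = self.next_id, self.next_id + 1
        self.morsels[morsel_id] = b""
        return morsel_id

    def free(self, morsel_id):
        """Release a morsel, e.g. on space reclaim or user withdrawal."""
        self.morsels.pop(morsel_id, None)
        self.checksums.pop(morsel_id, None)

    def put(self, morsel_id, data):
        """Store morsel data and remember its checksum."""
        self.morsels[morsel_id] = data[:MORSEL_SIZE]
        self.checksums[morsel_id] = hashlib.sha1(self.morsels[morsel_id]).hexdigest()

    def get(self, morsel_id):
        """Return morsel data, verifying integrity against the stored checksum."""
        data = self.morsels[morsel_id]
        if hashlib.sha1(data).hexdigest() != self.checksums[morsel_id]:
            raise IOError("morsel corrupted")
        return data
```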

14
Management Layer
  • Manager
  • Pool registrations
  • Metadata: datasets-to-pools, pools-to-benefactors mappings,
    etc. (see the sketch after this list)
  • Availability
  • Redundant Array of Replicated Morsels
  • Minimum replication factor for morsels
  • Where to replicate?
  • Which morsel replica to choose?
  • Grid Awareness
  • Information Providers
  • Space reservations
  • Transfer protocols
  • Transparent Access
  • Namespace
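A rough sketch of the manager-side bookkeeping described above (pool registrations, dataset/pool/benefactor mappings, replica placement and choice). The data structures and the random placement/selection policies are assumptions for illustration, not FreeLoader's actual heuristics.

```python
import random


class Manager:
    """Illustrative sketch only: manager metadata plus a trivial replica policy."""

    def __init__(self, min_replicas=2):
        self.min_replicas = min_replicas  # minimum replication factor for morsels
        self.pools = {}                   # pool id -> set of benefactor addresses
        self.dataset_pools = {}           # dataset name -> pools holding its morsels
        self.morsel_replicas = {}         # morsel id -> [(pool id, benefactor)]

    def register_pool(self, pool_id, benefactors):
        """Pool registration (kept as soft state in the real system)."""
        self.pools[pool_id] = set(benefactors)

    def place_morsel(self, dataset, morsel_id):
        """Pick where a morsel's replicas live; random here, heuristic in practice."""
        sites = [(p, b) for p, members in self.pools.items() for b in members]
        chosen = random.sample(sites, min(self.min_replicas, len(sites)))
        self.morsel_replicas[morsel_id] = chosen
        self.dataset_pools.setdefault(dataset, set()).update(p for p, _ in chosen)

    def choose_replica(self, morsel_id):
        """Which replica to serve a read from; performance history could guide this."""
        return random.choice(self.morsel_replicas[morsel_id])
```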

15
Dataset Striping
  • Stripe datasets across benefactors
  • The morsel doubles as the basic unit of striping (see the round-robin sketch after this list)
  • Multiple-fold benefits
  • Higher aggregate access bandwidth
  • Better resource usage
  • Lowering impact per benefactor
  • Tradeoff between access rates and availability
  • Need to consider
  • Heterogeneity, network connections
  • Working together with replication
  • Serving partial datasets
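The simplest instance of such striping is round-robin placement of a dataset's morsels across benefactors. The sketch below assumes a 1 MB morsel and ignores heterogeneity and replication; it only illustrates why reads can proceed from all donors in parallel.

```python
MORSEL_SIZE = 1 << 20  # assumed 1 MB morsel, doubling as the striping unit


def stripe(dataset: bytes, benefactors: list) -> dict:
    """Map each benefactor to the (index, data) morsels it should hold,
    assigned round-robin across the donor list."""
    layout = {b: [] for b in benefactors}
    for offset in range(0, len(dataset), MORSEL_SIZE):
        index = offset // MORSEL_SIZE
        target = benefactors[index % len(benefactors)]
        layout[target].append((index, dataset[offset:offset + MORSEL_SIZE]))
    return layout
```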

16
Current Status
  • Client I/O interface: reserve(), cancel(), store(), retrieve(),
    delete(), open(), close(), read(), write() (a usage sketch follows below)
  • Benefactor morsel operations: new(), free(), get(), put()
  • (A) services (UDP)
  • Dataset creation/deletion
  • Space reservation
  • (B) services (UDP/TCP)
  • Dataset retrieval
  • Hints
  • (C) services (UDP)
  • Registration
  • Benefactor alerts, warnings, alarms to the manager
  • (D) services (UDP/TCP)
  • Dataset store
  • Morsel request
  • Simple data striping

[Diagram: an application calls the client I/O interface; the client, the
manager, and the benefactor daemons (running on each machine's OS)
communicate through the (A)-(D) services above]
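As a rough illustration of how these pieces fit together, here is a minimal client-side sketch that stripes a dataset across benefactor objects exposing the new()/put()/get() operations sketched on the Storage Layer slide. It omits the manager's role and the (A)-(D) wire protocols, and every name is hypothetical.

```python
MORSEL_SIZE = 1 << 20  # assumed 1 MB morsel


class Client:
    """Illustrative sketch only: store/retrieve a dataset as striped morsels."""

    def __init__(self, benefactors):
        self.benefactors = benefactors  # handles exposing new()/put()/get()
        self.catalog = {}               # dataset name -> ordered [(benefactor, morsel id)]

    def store(self, name, data):
        """Create the dataset and push its morsels round-robin to benefactors."""
        placement = []
        for offset in range(0, len(data), MORSEL_SIZE):
            target = self.benefactors[(offset // MORSEL_SIZE) % len(self.benefactors)]
            morsel_id = target.new()                            # allocate a morsel
            target.put(morsel_id, data[offset:offset + MORSEL_SIZE])
            placement.append((target, morsel_id))
        self.catalog[name] = placement

    def retrieve(self, name):
        """Fetch all morsels back and reassemble the dataset in order."""
        return b"".join(b.get(mid) for b, mid in self.catalog[name])
```

With the earlier Benefactor sketch, `Client([Benefactor(10 * MORSEL_SIZE) for _ in range(4)])` would stripe a stored dataset across four donors.
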
17
Preliminary Results: Experiment Setup
  • FreeLoader prototype running at ORNL
  • Client Box
  • AMD Athlon 700MHz
  • 400MB memory
  • Gig-E card
  • Linux 2.4.20-8
  • Benefactors
  • Group of heterogeneous Linux workstations
  • Contributing 7GB-30GB each
  • 100Mb cards

18
Sample Data Sources
  • Local GPFS
  • Attached to ORNL SPs
  • Accessed through GridFTP
  • 1MB TCP buffer, 4 parallel streams
  • Local HPSS
  • Accessed through the HSI client, highly optimized
  • Hot data already in the disk cache, no tape access needed
  • Cold data purged; retrievals done at large intervals
  • Remote NFS
  • At the NCSU HPC center
  • Accessed through GridFTP
  • 1MB TCP buffer, 4 parallel streams (an example invocation follows this list)
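For reference, a GridFTP pull with the parameters above (1 MB TCP buffer, four parallel streams) could be issued roughly as follows; the host and paths are placeholders, not the actual ORNL or NCSU endpoints.

```python
# Illustrative only: invoking globus-url-copy with this slide's transfer parameters.
import subprocess

subprocess.run(
    [
        "globus-url-copy",
        "-p", "4",             # 4 parallel TCP streams
        "-tcp-bs", "1048576",  # 1 MB TCP buffer size
        "gsiftp://gridftp.example.org/gpfs/dataset.raw",  # placeholder source
        "file:///scratch/dataset.raw",                    # placeholder destination
    ],
    check=True,
)
```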

19
FreeLoader Data Retrieval Performance
[Chart: data retrieval throughput (MB/s)]
20
Impact Tests
  • How uncomfortable might donors feel?
  • A set of tests at NCSU
  • Benefactor performing local tasks
  • Client retrieving datasets at a given rate

21
CPU-intensive Task
[Chart: task completion time (s)]
22
Network-intensive Task
[Chart: normalized download time]
23
Disk-intensive Task
[Chart: disk throughput (MB/s)]
24
Mixed Task: Linux Kernel Compilation
[Chart: compilation time (s)]
25
In-progress and Future Work
  • In-progress
  • APIs for use as scratch space
  • Windows support
  • Future
  • Complete pool structure, registration
  • Intelligent data distribution, service profiling
  • Benefactor impact control, self-configuration
  • Naming and replication
  • Grid awareness
  • Potential extensions
  • Harnessing local storage at cluster nodes?
  • Complementing commercial storage servers?

26
Further Information
  • http://www.csm.ornl.gov/vazhkuda/Morsels/