High Performance I/O and Data Management System Group Seminar


1
High Performance I/O and Data Management System
Group Seminar
  • Xiaosong Ma
  • Department of Computer Science
  • North Carolina State University
  • September 12, 2003

2
Roadmap
  • Introduction
  • Research area description
  • Past research
  • Future research directions

3
About Myself
  • Xiaosong Ma
  • Pronunciation: Shiao-song
  • Homepage through the faculty directory
  • Brief bio
      • B.S., Peking University, China
      • Ph.D., UIUC
  • Hobbies
      • Traveling
      • Food
      • Photography, movies, tennis

4
High-Performance Computing
  • Enabled by increasing computational power
      • Scientific computation
      • Parallel data mining
      • Web data processing
  • High-performance computing in daily life
      • Weather forecast
      • Web crawling and web search
      • Games, movie graphics, virtual reality

5
Past Research
  • I/O performance optimization for parallel
    applications
  • High-level buffering and prefetching techniques
  • Hiding the I/O cost
  • Utilizing idle resources to maximize inter-task
    parallelism
  • Lightweight database support for visualization
    applications
  • Making optimizations portable and adaptive

6
Parallel I/O in Scientific Simulations
  • Write-intensive
  • Collective and periodic
  • Bottleneck-prone
  • Poor stepchild
  • Traditional collective I/O focused on data
    transfer


[Figure: execution timeline alternating between computation and I/O phases]

7
Active Buffering
  • Hides periodic I/O costs behind computation
    phases [IPDPS 02, ICS 02, IPDPS 03]
  • Organizes idle memory resources into buffer
    hierarchy
  • Controlled by state machines
  • Flexible regarding buffer space availability
  • Adapts to the application's output pattern
  • Flexible software architecture
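
The idea of hiding output costs behind computation can be illustrated with a small sketch. This is not the Panda or ROMIO implementation: the class name `ActiveBuffer`, the `sink` callable, and the `max_buffers` limit are assumptions made for illustration, and the state machines and buffer hierarchy described above are reduced to a single bounded queue. When buffer space runs out, the sketch falls back to a plain synchronous write, which is one simple way to stay flexible regarding buffer space availability.

```python
import queue
import threading


class ActiveBuffer:
    """Toy output buffer: a background thread drains buffered writes.

    Hypothetical illustration only; not the API of any real I/O library.
    """

    def __init__(self, sink, max_buffers=4):
        self._sink = sink                          # callable doing the real write
        self._queue = queue.Queue(maxsize=max_buffers)
        self._sink_lock = threading.Lock()         # serializes calls to the sink
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def write(self, data):
        """Buffer a copy of data if space allows, else write synchronously.

        Note: an overflow write may reach the sink before older buffered
        data, so this sketch suits writes that carry their own destination.
        """
        try:
            self._queue.put_nowait(bytes(data))
            return "buffered"
        except queue.Full:
            with self._sink_lock:
                self._sink(bytes(data))
            return "direct"

    def _drain(self):
        # Runs in the background thread while the caller keeps computing.
        while True:
            item = self._queue.get()
            if item is None:                       # sentinel queued by close()
                return
            with self._sink_lock:
                self._sink(item)

    def close(self):
        """Flush everything still buffered and stop the writer thread."""
        self._queue.put(None)
        self._writer.join()
```

A caller would invoke `write()` at the end of each computation phase and `close()` once at the end of the run; the return value of `write()` shows whether that output was hidden or paid for synchronously.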

8
AB vs. Asynchronous I/O

9
Deployment of Active Buffering
  • Panda Parallel I/O Library
      • University of Illinois
      • Client-server architecture
  • ROMIO Parallel I/O Library
      • Argonne National Lab
      • Popular MPI-IO implementation, included in MPICH
      • Server-less architecture
      • ABT (Active Buffering with Threads)

10
Sample Execution with ABT
[Figure: timeline of a sample ABT execution. Labels along the time axis: comp. phase 1 through comp. phase 4, data reorganization and buffering after each of the first three computation phases, and I/O phases 2 and 3.]
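
To get a feel for how much time such an overlapped execution might save, the timeline can be approximated with a simple cost model. The model below is an assumption made for illustration, not a result from the talk: every phase has the same compute time, buffering cost, and write time, buffer space is unlimited, and a single background writer handles one write at a time. The function name `simulate_run` and its parameters are hypothetical.

```python
def simulate_run(n_phases, compute, io, buffer_cost=0.0, hidden=True):
    """Return the total run time of n_phases compute-then-output steps.

    hidden=False: every write blocks the main thread (plain synchronous I/O).
    hidden=True:  the main thread only pays buffer_cost per phase, and a
                  background writer performs the writes one after another.
    """
    if not hidden:
        return n_phases * (compute + io)

    main_clock = 0.0     # time at which the main thread is next free
    writer_clock = 0.0   # time at which the background writer is next free
    for _ in range(n_phases):
        # Compute, then reorganize and copy the output into the buffer.
        main_clock += compute + buffer_cost
        # The write starts once the data is buffered and the writer is idle.
        writer_clock = max(writer_clock, main_clock) + io
    # The run ends when both the computation and the last write are done.
    return max(main_clock, writer_clock)
```

Under these assumptions, three phases with 10 time units of computation, 4 of I/O, and 1 of buffering take 42 units synchronously and 37 units with the writes hidden. When I/O dominates (for example 2 units of computation and 10 of I/O), the writer becomes the bottleneck and most of the I/O cost can no longer be hidden.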
11
I/O in Visualization
  • Periodic reads
  • Dual modes of operation
      • Interactive
      • Batch-mode
  • Harder to overlap I/O with computation


[Figure: execution timeline alternating between computation and I/O phases]

12
Lightweight Data Management
  • Processing a large number of datasets
  • Scientific data are structured
  • Conventional DBMS rarely used in parallel
    scientific codes
  • GODIVA framework [ICDE 04]
      • General Object Data Interface for Visualization
        Applications
      • In-memory database managing data buffer locations
      • Relational database-like interfaces
      • Developer-controllable prefetching and caching
      • Developer-supplied read functions

13
GODIVA Architecture

14
Sample Record Instance
  • Sample query
  • Where is the temperature array holding block_0003
    at time-step 0.000075 in a fluid record?
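
A query of this kind can be pictured as a keyed lookup that returns the buffer itself rather than a copy. The sketch below is only an illustration of that idea and does not reproduce GODIVA's actual interface: the class `RecordStore`, its methods, and the choice of (block id, time-step) as the record key are assumptions.

```python
from array import array


class RecordStore:
    """Toy in-memory index from record keys to data buffers (hypothetical)."""

    def __init__(self):
        # (record type, key values) -> {field name: buffer}
        self._records = {}

    def insert(self, record_type, key, field_name, buffer):
        """Register where the buffer for one field of one record lives."""
        fields = self._records.setdefault((record_type, key), {})
        fields[field_name] = buffer

    def lookup(self, record_type, key, field_name):
        """Return the registered buffer object itself; no data is copied."""
        return self._records[(record_type, key)][field_name]


# "Where is the temperature array holding block_0003 at time-step 0.000075
# in a fluid record?"
store = RecordStore()
temperature = array("d", [300.0, 301.5, 299.8])   # made-up sample values
store.insert("fluid", ("block_0003", 0.000075), "temperature", temperature)
found = store.lookup("fluid", ("block_0003", 0.000075), "temperature")
```

Because the store only tracks buffer locations, the visualization code keeps working on its own arrays directly; the index just answers where a given array is.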

15
Prefetching and Caching
  • process unit
  • readUnit
  • addUnit and waitUnit
  • finishUnit and deleteUnit

    // add all units.
    addUnit("fluid_file1", read_file)
    addUnit("fluid_file2", read_file)

    // process array records in fluid_file1
    waitUnit("fluid_file1")
    do_visualization_computation("fluid_file1")
    deleteUnit("fluid_file1")

    // process array records in fluid_file2
    waitUnit("fluid_file2")
    do_visualization_computation("fluid_file2")
    deleteUnit("fluid_file2")
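
The add/wait/delete pattern above can be mimicked with a background thread pool. The following is a minimal sketch under stated assumptions, not GODIVA's implementation: the class `UnitCache` and its snake_case methods are hypothetical stand-ins for `addUnit`, `waitUnit`, and `deleteUnit`, and the read function is assumed to take the unit name and return the loaded data.

```python
from concurrent.futures import ThreadPoolExecutor


class UnitCache:
    """Toy prefetcher: units are read in the background, then cached."""

    def __init__(self, workers=1):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}   # unit name -> Future of the background read
        self._data = {}      # unit name -> loaded data

    def add_unit(self, name, read_fn):
        """Queue a unit for background reading with a developer-supplied function."""
        self._pending[name] = self._pool.submit(read_fn, name)

    def wait_unit(self, name):
        """Block until the unit is loaded, then return its data."""
        if name not in self._data:
            self._data[name] = self._pending.pop(name).result()
        return self._data[name]

    def delete_unit(self, name):
        """Drop the unit's data, cancelling the read if it has not started."""
        self._data.pop(name, None)
        future = self._pending.pop(name, None)
        if future is not None:
            future.cancel()

    def shutdown(self):
        """Wait for outstanding reads and release the worker threads."""
        self._pool.shutdown(wait=True)
```

With this sketch, the second file is being read while the first is still being visualized, which is the overlap the slide's call sequence aims for.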

16
Voyager on a Single-processor Workstation

17
Voyager on a Dual-processor Cluster Node

18
Future work: I/O Performance Prediction
  • Objective: to predict the I/O time for
    high-performance applications
  • Challenge: lack of information in the Grid
    environment
  • Knowledge of applications or systems not
    available
  • Hard to simulate real applications in real
    environments
  • Hard to predict scalability
  • How do we parameterize an application?

19
Future work: Scientific Data Management
  • Objective: to manage data in scientific
    applications effectively and efficiently
  • Challenge: two research worlds not well connected
  • Conventional databases not suitable for HPC
  • Scientific databases designed for specific
    applications
  • General approach? Need to handle storage and I/O
    for different types of datasets and their
    distribution

20
Summary
  • Wide area of potential research
      • Parallel computing
      • Databases
      • Operating systems/storage systems
  • Many open problems and new challenges

21
References
  • [ICDE 04] Xiaosong Ma, Marianne Winslett, John Norris, Xiangmin
    Jiao and Robert Fiedler, "GODIVA: Lightweight Data Management for
    Scientific Visualization," the 20th International Conference on
    Data Engineering, 2004.
  • [PhD Thesis] Xiaosong Ma, "Hiding Periodic I/O Costs for Parallel
    Applications," PhD thesis, University of Illinois, 2003.
  • [IPDPS 03] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and
    Shengke Yu, "Improving MPI-IO Output Performance with Active
    Buffering Plus Threads," 2003 International Parallel and
    Distributed Processing Symposium.
  • [PDSECA 03] Xiaosong Ma, Xiangmin Jiao, Michael Campbell and
    Marianne Winslett, "Flexible and Efficient Parallel I/O for
    Large-Scale Multi-component Simulations," the 4th Workshop on
    Parallel and Distributed Scientific and Engineering Computing
    with Applications.
  • [ICS 02] Jonghyun Lee, Xiaosong Ma, Marianne Winslett and Shengke
    Yu, "Active Buffering Plus Compressed Migration: An Integrated
    Solution to Parallel Simulations' Data Transport Needs," the 16th
    ACM International Conference on Supercomputing.
  • [IPDPS 02] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and
    Shengke Yu, "Faster Collective Output through Active Buffering,"
    2002 International Parallel and Distributed Processing Symposium.