Title: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets.
Art Wetzel, Greg Hood and Markus Dittrich
National Resource for Biomedical Supercomputing, Pittsburgh Supercomputing Center
awetzel_at_psc.edu, 412-268-3912, www.psc.edu and www.nrbsc.org

R. Clay Reid, Jeff Lichtman, Wei-Chung Allen Lee
Harvard Medical School; Allen Institute for Brain Science; Center for Brain Science, Harvard University

Davi Bock
HHMI Janelia Farm

David Hall and Scott Emmons
Albert Einstein College of Medicine
Jan 11, 2012

Connectomics Data Project Overview
Reconstructing brain circuits requires high resolution electron microscopy over long distances: BIG DATA.

[Figure: annotated EM cross-section (www.coolschool.ca/lor/BI12/unit12/U12L04.htm) showing a dendrite, a dendritic spine, vesicles 30 nm in diameter, and a synaptic junction >500 nm wide with a 20 nm cleft gap. For scale comparison: recent ICs have 32 nm features, 22 nm chips are being delivered, and gate oxide is 1.2 nm thick.]
A 10 Tvoxel dataset aligned by our group was an essential part of the March 2011 Nature paper with Davi Bock, Clay Reid and Harvard colleagues. Now we are working on two datasets of 100 TB each and expect to reach PBs in 2-3 years.
The CS project is to implement and test a prototype virtual filesystem to address common problems associated with neural circuit and other massive datasets.
- The most important aim is reducing unwanted data duplication as raw data are preprocessed for final analysis. The virtual filesystem addresses this by replacing redundant storage with on-the-fly computation.
- The second aim is to provide a convenient framework for efficient on-the-fly computation on multidimensional datasets within high performance parallel computing environments using both CPU and GPGPU processing.
- The Filesystem in Userspace (FUSE) mechanism provides a convenient implementation basis that will work across a variety of systems. There are many existing FUSE codes that serve as useful examples.
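As a sketch of the on-the-fly idea (the raw data, function names, and the per-byte "correction" below are illustrative assumptions, not the project's actual code), the core of such a filesystem is a read handler that computes the requested bytes of a virtual file at access time rather than reading a stored duplicate:

```python
# Sketch: serve a "corrected" view of raw image bytes on demand, the way
# a FUSE read() callback might, instead of storing a duplicate corrected
# file. The raw source and the inversion "correction" are hypothetical.

RAW = bytes(range(256))  # stand-in for a raw EM image tile on disk

def correct(b: int) -> int:
    """Hypothetical per-byte intensity correction (here: inversion)."""
    return 255 - b

def virtual_read(offset: int, size: int) -> bytes:
    """Return bytes [offset, offset+size) of the virtual corrected file.
    Only the requested window is computed; nothing is stored."""
    window = RAW[offset:offset + size]
    return bytes(correct(b) for b in window)
```

A FUSE read(path, size, offset, fh) callback would simply delegate to a function like `virtual_read`, so the preprocessed dataset exists only as computation rather than as a second copy on disk.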
We would eventually like to have a flexible software framework that allows a combination of common prewritten and user-written application codes to operate together and take advantage of parallel CPU and GPGPU technologies.
Multidimensional data structures that provide efficient random and sequential access, analogous to the 1D representations provided by standard filesystems, will be part of this work.
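One common layout for such structures (a sketch under assumed chunk and volume sizes, not the project's chosen design) stores the volume as fixed-size 3D chunks and maps a voxel coordinate to a chunk index plus a linear offset within that chunk, much as a filesystem maps a byte offset to a block number and an offset within the block:

```python
# Sketch: map a 3D voxel coordinate to (chunk index, offset in chunk),
# analogous to how a 1D filesystem maps a byte offset to (block, offset).
# The chunk and volume dimensions are illustrative assumptions.

CHUNK = (64, 64, 64)  # chunk edge lengths in z, y, x

def locate(z: int, y: int, x: int):
    """Return ((cz, cy, cx), linear offset inside that chunk)."""
    cz, oz = divmod(z, CHUNK[0])
    cy, oy = divmod(y, CHUNK[1])
    cx, ox = divmod(x, CHUNK[2])
    offset = (oz * CHUNK[1] + oy) * CHUNK[2] + ox
    return (cz, cy, cx), offset
```

With this layout a sequential x-scan stays inside one chunk for CHUNK[2] voxels at a time, while random access costs only one divmod per axis.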
Students working on this project will have access to a parallel cluster which holds our large datasets, along with the compilers and other tools required. Minimal end-to-end functionality with simple linear transforms can likely be achieved in about 8 weeks and then extended as time permits. Please contact Art Wetzel (awetzel_at_psc.edu) with any further questions.
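For the "simple linear transforms" milestone, the minimal operation is an affine mapping from output coordinates back to input coordinates with nearest-neighbor sampling. The sketch below (coefficients and helper name are hypothetical, not the project's code) shows the idea on a tiny image:

```python
# Sketch: resample a small image through a 2x3 affine transform,
# the kind of simple linear transform the 8-week milestone targets.
# Output pixel (r, c) is pulled from the input location
# (a*r + b*c + ty, d*r + e*c + tx), rounded to the nearest pixel.

def affine_sample(img, h_out, w_out, m):
    """img: list of rows; m = (a, b, ty, d, e, tx)."""
    a, b, ty, d, e, tx = m
    out = []
    for r in range(h_out):
        row = []
        for c in range(w_out):
            sr = round(a * r + b * c + ty)  # source row
            sc = round(d * r + e * c + tx)  # source column
            if 0 <= sr < len(img) and 0 <= sc < len(img[0]):
                row.append(img[sr][sc])
            else:
                row.append(0)  # outside the source: pad with 0
        out.append(row)
    return out

img = [[1, 2], [3, 4]]
identity = (1, 0, 0, 0, 1, 0)   # reproduces the input unchanged
shift = (1, 0, 0, 0, 1, 1)      # samples one column to the right
```

In the virtual-filesystem setting, a read of the "aligned" dataset would invoke a transform like this over only the requested region, again avoiding a stored duplicate.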