Title: Using HDF5 tools for performance tuning and troubleshooting
1. Using HDF5 tools for performance tuning and troubleshooting

2. Introduction
- HDF5 tools may be very useful for performance tuning and troubleshooting
- Discover objects and their properties in HDF5 files
  - h5dump -p
- Get file size overhead information
  - h5stat
- Get locations of the objects in a file
  - h5ls
- Discover differences
  - h5diff, h5ls
- Location of raw data
  - h5ls -vra
3. h5stat
- Prints different statistics about an HDF5 file
- Helps
  - To troubleshoot size overhead in HDF5 files
  - To choose specific object properties and storage strategies
- To use
  - h5stat --help
  - h5stat file.h5
- The spec can be found at http://www.hdfgroup.org/RFC/h5stat/
- Let us know if you need some special type of statistics
4. h5stat
- Reports two types of statistics
- High-level information about objects (examples)
  - Number of different objects (groups, datasets, datatypes) in a file
  - Number of unique datatypes
  - Size of raw data in a file
- Information about objects' structural metadata
  - Sizes of structural metadata (total/free)
  - Object headers, local and global heaps
  - Sizes of B-trees
  - Object header fragmentation
5. h5stat
- Examples of high-level information
- File information
  - # of unique groups: 10008
  - # of unique datasets: 30
  - # of unique named datatypes: 0
  - Max. # of links to object: 1
  - Max. depth of hierarchy: 4
  - Max. # of objects in group: 19
- Group bins
  - # of groups of size 0: 10000
  - # of groups of size 1 - 9: 7
  - # of groups of size 10 - 99: 1
- Max. dimension size of 1-D datasets: 1643
6. h5stat
- Conclusion
  - There are a lot of empty groups in the file: a good candidate for the compact group feature (see the sketch after this slide)
  - Some datasets use user-defined filters and may not be readable by the HDF5 library
  - SZIP compression is needed to read some datasets
- "Oh, my application uses buffers of size 1024 to read data. No wonder it crashes on reading. Do I have all the filters needed to read the data?"
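
The compact group feature mentioned above is configured through a group creation property list. The following is a minimal sketch, assuming HDF5 1.8 or later; the file name, group name, and phase-change thresholds are illustrative, not taken from the presentation:

    /* Sketch: keep small groups in compact storage (assumed names and thresholds) */
    #include "hdf5.h"

    int main(void)
    {
        /* Request the newer file format so compact/dense group storage is used */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

        hid_t file = H5Fcreate("many_groups.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Store up to 16 links compactly; convert back from dense below 8 links */
        hid_t gcpl = H5Pcreate(H5P_GROUP_CREATE);
        H5Pset_link_phase_change(gcpl, 16, 8);

        hid_t grp = H5Gcreate2(file, "/empty_group", H5P_DEFAULT, gcpl, H5P_DEFAULT);

        H5Gclose(grp);
        H5Pclose(gcpl);
        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }

An empty group created this way carries no symbol-table B-tree or local heap, which is the kind of overhead h5stat flagged for the 10000 empty groups.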
7. h5stat
- Examples of structural metadata information
- Object header size (total/unused)
  - Groups: 1808/72
  - Datasets: 15792/832
- Dataset storage information
  - Total raw data size: 6140688
- Dataset datatype #3
  - Count (total/named): (2/0)
  - Size (desc./elmt): (10/65535)
- Dataset datatype #4
  - Count (total/named): (1/0)
  - Size (desc./elmt): (10/32000)
8. h5stat
- Conclusions
  - File size: 6228197
  - 1.5% overhead (not bad at all!)
  - Some elements are of size 65535 and 32000
- "Oh! Is it really what I want? Should I use another datatype and take advantage of compression?" (see the sketch after this slide)
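
If compression looks attractive, one way to pursue it is a chunked dataset with the deflate filter. This is a hedged sketch rather than the presenter's code; the file name, dataset name, extent, chunk size, and compression level are all illustrative:

    /* Sketch: chunked + gzip-compressed dataset (assumed names and sizes) */
    #include "hdf5.h"

    int main(void)
    {
        hid_t file = H5Fcreate("compressed.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t dims[1]  = {100000};
        hsize_t chunk[1] = {4096};
        hid_t space = H5Screate_simple(1, dims, NULL);

        /* Compression in HDF5 requires chunked storage */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 1, chunk);
        H5Pset_deflate(dcpl, 6);   /* gzip level 6 */

        hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_INT, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }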
9. Case study: Using HDF5 tools to debug a problem
- "My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files OK and there are no differences. What am I doing wrong?"
- h5diff good.h5 bad.h5
  - Datatype: </Definitions/timespec> and </Definitions/timespec> 1 differences found
- h5ls -vr good.h5
  - /Definitions/timespec Type
  - Location: 0:1:0:900
- h5debug good.h5 900
  - Message Information:
    - Type class: compound
    - Size: 8 bytes
- h5debug bad.h5 900
  - Message Information:
    - Type class: compound
    - Size: 16 bytes
- (a sketch of how such a compound type ends up with two sizes follows this slide)
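
One plausible reason for the two sizes: VS2005 made time_t 64-bit by default, while VS2003 used a 32-bit time_t, so a struct built from it (and a compound type derived via sizeof/HOFFSET) doubles in size. The sketch below is an assumption about how the application's datatype might have been defined; the struct and field names are illustrative, not from the original code:

    /* Sketch: a compound type whose size tracks the compiler's time_t */
    #include <time.h>
    #include "hdf5.h"

    typedef struct {
        time_t tv_sec;    /* 4 bytes with VS2003, 8 bytes with VS2005 */
        time_t tv_nsec;
    } my_timespec;

    hid_t make_timespec_type(void)
    {
        /* Pick a member type whose size matches time_t on this compiler */
        hid_t member_t = (sizeof(time_t) == 8) ? H5T_NATIVE_LLONG : H5T_NATIVE_INT;

        hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(my_timespec));
        H5Tinsert(t, "tv_sec",  HOFFSET(my_timespec, tv_sec),  member_t);
        H5Tinsert(t, "tv_nsec", HOFFSET(my_timespec, tv_nsec), member_t);
        return t;   /* caller uses or commits the type, then calls H5Tclose() */
    }

Built this way, the same source compiles to an 8-byte compound under VS2003 and a 16-byte compound under VS2005, matching what h5debug reported.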
10. Case study: Using HDF5 tools to debug a problem
- Conclusions
  - Compound datatype timespec requires a different number of bytes on VS2005 (16 bytes = 2 x 8 bytes) and on VS2003 (8 bytes = 2 x 4 bytes)
- "Oh! How do I read my data back? I assumed that my struct would need only 8 bytes per element, but it needs 16 bytes on VS2005. I need the H5Tget_native_type function to find the type of my data in memory." (see the sketch after this slide)
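
A hedged sketch of that fix: read back through the native equivalent of the file datatype, so the in-memory element size is whatever H5Tget_native_type reports rather than a hard-coded 8 bytes. The dataset path here is an assumption for illustration:

    /* Sketch: size the read buffer from the native datatype, not from sizeof guesses */
    #include <stdlib.h>
    #include "hdf5.h"

    int main(void)
    {
        hid_t file = H5Fopen("bad.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dset = H5Dopen2(file, "/Definitions/timespec_data", H5P_DEFAULT);

        /* File datatype -> native (in-memory) datatype */
        hid_t ftype = H5Dget_type(dset);
        hid_t mtype = H5Tget_native_type(ftype, H5T_DIR_ASCEND);

        hid_t space = H5Dget_space(dset);
        hssize_t npoints = H5Sget_simple_extent_npoints(space);

        /* Allocate by what the native type actually needs (8 or 16 bytes per element) */
        void *buf = malloc((size_t)npoints * H5Tget_size(mtype));
        H5Dread(dset, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        free(buf);
        H5Sclose(space);
        H5Tclose(mtype);
        H5Tclose(ftype);
        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }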
11. Where is my data?
- h5ls -var be_data.h5
  - Opened "be_data.h5" with sec2 driver.
  - /Array Dataset {5/5, 6/6}
  - Location: 0:1:0:792
  - Links: 1
  - Modified: 2006-04-07 15:08:39 CDT
  - Storage: 240 logical bytes, 240 allocated bytes, 100.00% utilization
  - Type: IEEE 64-bit big-endian float
  - Address: 2048
- 30 8-byte elements can be read from address 2048 by a non-HDF5 application (see the sketch after this slide)
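
For that last point, a non-HDF5 reader only needs the byte offset, element count, and byte order that h5ls reported. A minimal sketch, assuming a little-endian host (so the big-endian values must be byte-swapped):

    /* Sketch: read the 30 contiguous big-endian doubles at offset 2048 without HDF5 */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        FILE *fp = fopen("be_data.h5", "rb");
        if (!fp) return 1;

        double data[30];
        fseek(fp, 2048, SEEK_SET);            /* Address reported by h5ls */
        fread(data, sizeof(double), 30, fp);  /* 240 bytes of raw dataset storage */
        fclose(fp);

        /* Values are stored big-endian; reverse each 8-byte element for this host */
        for (int i = 0; i < 30; i++) {
            uint8_t b[8], r[8];
            memcpy(b, &data[i], 8);
            for (int j = 0; j < 8; j++) r[j] = b[7 - j];
            memcpy(&data[i], r, 8);
        }

        printf("first element: %g\n", data[0]);
        return 0;
    }

This only works because the dataset is contiguous and unfiltered (240 logical = 240 allocated bytes); chunked or compressed data cannot be read this way.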
12. Questions? Comments?
- Thank you!