A Lightweight Histogram Interface Layer

Slides: 21
Provided by: charles62

1
A Lightweight Histogram Interface Layer
  • CHEP 2000
  • Session F (F320)
  • Thursday Feb 10 2000

2
Introduction
  • Current histogramming software packages, such as
    PAW, ROOT, and JAS, have enormous functionality.
  • They are no longer simply histogramming packages,
    but have added data analysis and visualization
    features.
  • The tight integration between these features has
    made it difficult to separate the statistical
    data gathering feature from the analysis and
    graphical presentation features.
  • This results in significant overheads, if only
    the histogramming aspect is needed.

3
Introduction (cont)
  • Many histogramming packages are wedded to a
    specific i/o format.
  • Very few translation programs exist to convert
    between various formats.
  • Makes it very hard to use analysis and
    visualization tools that are not part of the
    package used to generate the histogram.
  • Users have very little freedom to choose the
    package best suited to their needs, or the ones
    they are most familiar with.

4
Why an Interface Layer
  • Since it is format independent, and has no i/o
    (file or visual) requirements, it is not wedded
    to a specific part of the analysis procedure.
  • It can sit between components, such as between
    the data acquisition component and the analysis
    component, offering the ability to use various
    formats in different applications.

5
Design Requirements
  • Platform and i/o format independent
  • Lightweight - low overhead, minimal non-histogram
    features
  • Possibility to histogram any data type
  • Ability to use within an analysis schema, as an
    interface between different components, or as a
    standalone utility
  • Ability to use as a translator between various
    i/o formats
  • i/o formats user extensible
  • Easy implementation by user

6
Required Qualities of a Histogram
  • A collection of statistical data related to a
    particular process.
  • Should not contain any information unrelated to
    the statistical data, such as colour, fitting
    parameters, line width, cuts, etc.
  • Number of bins + overflow/underflow
  • Bin edges
  • Entries per bin + associated errors
  • Identification information, such as an ID or name
  • n + 3 + 2n

7
Minimal Set of Useful Methods
  • weighted entries
  • reset()
  • bin contents, errors, centers, edges
  • bin numbers <-> bin edges/centers
  • simple arithmetic operations (e.g. +, -)
  • mean(), rms()
  • min(), max()
  • rebin(), resize()
  • change title
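
The method list above can be sketched as a small class. This is only an illustration under stated assumptions, not the package's actual API: the class name SimpleHist, the method names, and the convention of keeping underflow in slot 0 and overflow in slot n+1 are all hypothetical, and only fixed-width bins are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal 1D histogram with fixed-width bins.
// Slot 0 is underflow, slots 1..n are regular bins, slot n+1 is overflow.
class SimpleHist {
public:
    SimpleHist(double lo, double hi, std::size_t nbins)
        : lo_(lo), hi_(hi), sumw_(nbins + 2, 0.0), sumw2_(nbins + 2, 0.0),
          sw_(0.0), swx_(0.0), swx2_(0.0) {
        assert(nbins > 0 && hi > lo);
    }

    std::size_t nbins() const { return sumw_.size() - 2; }
    double width() const { return (hi_ - lo_) / nbins(); }

    // bin numbers <-> bin edges/centers
    std::size_t findBin(double x) const {
        if (x < lo_) return 0;
        if (x >= hi_) return nbins() + 1;
        std::size_t b = 1 + static_cast<std::size_t>((x - lo_) / width());
        return std::min(b, nbins());  // guard against floating-point rounding
    }
    double binLowEdge(std::size_t b) const { return lo_ + (b - 1) * width(); }  // b in 1..n
    double binCenter(std::size_t b) const { return binLowEdge(b) + 0.5 * width(); }

    // Weighted entry; the moments only use in-range values.
    void fill(double x, double w = 1.0) {
        std::size_t b = findBin(x);
        sumw_[b] += w;
        sumw2_[b] += w * w;
        if (b >= 1 && b <= nbins()) {
            sw_ += w;
            swx_ += w * x;
            swx2_ += w * x * x;
        }
    }

    double binContent(std::size_t b) const { return sumw_[b]; }
    double binError(std::size_t b) const { return std::sqrt(sumw2_[b]); }

    double mean() const { return sw_ != 0.0 ? swx_ / sw_ : 0.0; }
    // Spread about the mean (standard deviation of the filled values).
    double rms() const {
        if (sw_ == 0.0) return 0.0;
        double m = mean();
        double var = swx2_ / sw_ - m * m;
        return var > 0.0 ? std::sqrt(var) : 0.0;
    }

    void reset() {
        std::fill(sumw_.begin(), sumw_.end(), 0.0);
        std::fill(sumw2_.begin(), sumw2_.end(), 0.0);
        sw_ = swx_ = swx2_ = 0.0;
    }

private:
    double lo_, hi_;
    std::vector<double> sumw_;   // sum of weights per slot
    std::vector<double> sumw2_;  // sum of squared weights per slot
    double sw_, swx_, swx2_;     // running sums for mean() and rms()
};
```

Keeping running sums means mean() and rms() do not depend on the binning, at the cost of three extra values per histogram.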

8
What Gets Histogrammed
  • Normally we are used to histogramming ints and floats.
  • What about entire objects?
  • To histogram an object, have to define which
    aspect of the object is used to order the
    histogram.
  • Can provide this ordering every time a histogram
    is filled, but nicer to associate an ordering
    mechanism with the histogram itself.
  • Define a function which provides this ordering,
    and give the histogram object a pointer to it.
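
A minimal sketch of this idea, assuming a hypothetical Track type and hypothetical names (ObjectHist, setQuantFunction, trackPt): the histogram keeps a function pointer that maps each object onto the axis, so callers fill it with whole objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event type, used only for illustration.
struct Track {
    float pt;    // transverse momentum
    int charge;
};

// Ordering function: maps an object onto the histogram axis.
float trackPt(const Track& t) { return t.pt; }

// The histogram stores a pointer to the ordering function.
template <typename T>
class ObjectHist {
public:
    typedef float (*QuantFunction)(const T&);

    ObjectHist(float lo, float hi, std::size_t nbins)
        : lo_(lo), hi_(hi), counts_(nbins, 0u), quant_(0) {
        assert(nbins > 0 && hi > lo);
    }

    void setQuantFunction(QuantFunction f) { quant_ = f; }

    // Returns false if no ordering function is set or the value is out of range.
    bool fill(const T& obj) {
        if (!quant_) return false;
        float x = quant_(obj);
        if (x < lo_ || x >= hi_) return false;
        std::size_t b = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * counts_.size());
        if (b >= counts_.size()) b = counts_.size() - 1;  // guard against rounding
        ++counts_[b];
        return true;
    }

    unsigned count(std::size_t b) const { return counts_[b]; }

private:
    float lo_, hi_;
    std::vector<unsigned> counts_;
    QuantFunction quant_;
};
```

Attaching the function once, rather than passing it on every fill, keeps the fill call site identical for ints, floats, and objects.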

9
Types of Histograms
  • BINNED
  • bin edges defined when created.
  • Either fixed or variable width
  • UNBINNED
  • only for very small data samples
  • can be converted to BINNED
  • AUTO-BINNED
  • starts off as UNBINNED, automatically converted
    to BINNED after a set number of entries.
  • Conversion routines calculate bin edges with
    either fixed width, or to maximize occupancy in
    each bin.
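
The two conversion strategies might look roughly like the functions below. The names fixedWidthEdges and equalOccupancyEdges are hypothetical, and "maximize occupancy" is read here as "roughly equal occupancy per bin" (edges at sample quantiles), which is an assumption about what the slide means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-width edges spanning the minimum and maximum of the buffered sample.
std::vector<double> fixedWidthEdges(const std::vector<double>& sample, std::size_t nbins) {
    assert(nbins > 0 && !sample.empty());
    double lo = *std::min_element(sample.begin(), sample.end());
    double hi = *std::max_element(sample.begin(), sample.end());
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i <= nbins; ++i)
        edges[i] = lo + (hi - lo) * i / nbins;
    return edges;
}

// Edges placed at sample quantiles, so each bin holds about the same
// number of the buffered entries. The sample is taken by value and sorted.
std::vector<double> equalOccupancyEdges(std::vector<double> sample, std::size_t nbins) {
    assert(nbins > 0 && sample.size() >= nbins);
    std::sort(sample.begin(), sample.end());
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i < nbins; ++i)
        edges[i] = sample[i * sample.size() / nbins];
    edges[nbins] = sample.back();  // last edge is the sample maximum
    return edges;
}
```

An AUTO-BINNED histogram would buffer entries until the set threshold is reached, call one of these to obtain edges, and then refill itself from the buffer.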

10
Use Overview

11
Internal Storage
  • If memory utilization is very tight, the user may
    want to limit the precision of the statistical
    data
  • User can choose between 4 and 8 byte internal
    record keeping
  • bin contents
  • bin errors
  • number of entries
  • number of equivalent entries
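
One way to offer this choice is to make the record-keeping type a template parameter. The sketch below uses a hypothetical BinStore class and stores only contents and squared weights; it is meant to show the memory/precision trade-off, not the package's real layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The record-keeping type is a template parameter, so the same code
// serves 4-byte (float) and 8-byte (double) storage.
template <typename Store>
class BinStore {
public:
    explicit BinStore(std::size_t nbins)
        : contents_(nbins, Store(0)), sumw2_(nbins, Store(0)) {
        assert(nbins > 0);
    }

    void add(std::size_t bin, Store w) {
        contents_[bin] += w;
        sumw2_[bin] += w * w;
    }

    Store content(std::size_t bin) const { return contents_[bin]; }

    // Bytes used for per-bin data (contents plus squared weights).
    std::size_t bytes() const { return 2 * contents_.size() * sizeof(Store); }

private:
    std::vector<Store> contents_;
    std::vector<Store> sumw2_;
};
```

The price of 4-byte storage is precision: a float bin that already holds 16777216 (2^24) no longer changes when a unit-weight entry is added, whereas a double bin continues to count.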

12
Memory Usage
  • Dynamic memory allocation is neat, but the
    implementation (often) sucks. There will always be
    an overhead to using it.
  • Pre-allocate memory - fairly easy to do with a
    BINNED histogram.
  • Limit use of dynamic structures.
  • Only run into trouble if need to re-size or
    re-bin a histogram after it's been created.
  • UNBINNED histograms can either pre-allocate
    memory, or dynamically allocate on the fly.
  • Total overhead per histogram: 80 bytes.

13
Implementation Details
  • The requirement to be able to histogram objects
    has a serious implication - use of templates.
  • The histogram object becomes a templated object,
    with parameters the type of object to be
    histogrammed and the type of internal record
    keeping data
  • Histogram<object type, (float|double)>
  • For UNBINNED histograms, STL vectors are used if
    dynamic memory management is chosen.
  • Similar syntax for 2D histograms.
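
A two-parameter template of this shape could be sketched as follows. The defaults (float for both parameters), the constructor signatures, and the content() accessor are assumptions made for illustration; only arithmetic types are handled, and out-of-range values are simply ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// T is the type being histogrammed, Store is the internal record-keeping
// type (float or double). Hypothetical sketch, arithmetic T only.
template <typename T = float, typename Store = float>
class Histogram {
public:
    // Fixed-width bins on [lo, hi).
    Histogram(T lo, T hi, std::size_t nbins)
        : edges_(nbins + 1), contents_(nbins, Store(0)) {
        assert(nbins > 0 && lo < hi);
        for (std::size_t i = 0; i <= nbins; ++i)
            edges_[i] = lo + (static_cast<double>(hi) - lo) * i / nbins;
    }

    // Variable-width bins from a sorted list of bin edges.
    explicit Histogram(const std::vector<double>& edges) : edges_(edges) {
        assert(edges.size() >= 2);
        contents_.assign(edges.size() - 1, Store(0));
    }

    void Fill(T x, Store w = Store(1)) {
        double v = static_cast<double>(x);
        if (v < edges_.front() || v >= edges_.back()) return;  // out of range
        // upper_bound finds the first edge above v; the bin index is one less.
        std::size_t b = std::upper_bound(edges_.begin(), edges_.end(), v)
                        - edges_.begin() - 1;
        contents_[b] += w;
    }

    Store content(std::size_t b) const { return contents_[b]; }
    std::size_t nbins() const { return contents_.size(); }

private:
    std::vector<double> edges_;
    std::vector<Store> contents_;
};
```

With default template arguments, Histogram<> is a float histogram with float storage, and Histogram<int,double> keeps 8-byte records for integer data.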

14
Usage
  • Simple histogram of floats, fixed bin width
  • Histogram<> h1(-10.,10.,100);
  • h1.Fill(X);
  • Histogram of ints, variable bin width, double
    precision
  • Histogram<int,double> h2(Xedge);
  • Histogram of Muon object, automatically binned to
    maximize occupancy
  • float MuonQuantFunction(const Muon& M);
  • Histogram<Muon> h3(AUTOBINNED);
  • h3.SetQuantFunction( MuonQuantFunction );

15
I/O
  • File manager class used to read and write
    histograms from/to disk in a variety of formats
  • Internal histograms are only converted to a
    particular format when they are written.
  • File manager can easily be extended to encompass
    new file formats.
  • Current formats
  • ASCII flat file
  • HBOOK
  • ROOT
  • XDR / DSL
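
One plausible way to make the file manager user-extensible is a registry of writer objects, one per format. Everything named below (HistData, HistWriter, AsciiWriter, FileManager, registerFormat) is hypothetical, and only an ASCII writer is shown; HBOOK or ROOT writers would be further subclasses calling those libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Format-neutral snapshot of a histogram (hypothetical layout).
struct HistData {
    std::string name;
    std::vector<double> edges;     // n + 1 bin edges
    std::vector<double> contents;  // n bin contents
};

// One subclass per output format.
class HistWriter {
public:
    virtual ~HistWriter() {}
    virtual void write(const HistData& h, std::ostream& out) const = 0;
};

// ASCII flat-file writer: one "low high content" line per bin.
class AsciiWriter : public HistWriter {
public:
    void write(const HistData& h, std::ostream& out) const {
        out << "# " << h.name << "\n";
        for (std::size_t i = 0; i < h.contents.size(); ++i)
            out << h.edges[i] << " " << h.edges[i + 1] << " "
                << h.contents[i] << "\n";
    }
};

// Registry keyed by format name; users add a format by registering a writer.
class FileManager {
public:
    void registerFormat(const std::string& fmt, const HistWriter* w) {
        assert(w != 0);
        writers_[fmt] = w;
    }

    // Conversion to a concrete format happens only here, at write time.
    bool write(const std::string& fmt, const HistData& h, std::ostream& out) const {
        std::map<std::string, const HistWriter*>::const_iterator it = writers_.find(fmt);
        if (it == writers_.end()) return false;
        it->second->write(h, out);
        return true;
    }

private:
    std::map<std::string, const HistWriter*> writers_;  // non-owning pointers
};
```

Because the in-memory histogram never knows about any format, adding a new one touches only the registry, not the histogram class.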

16
Ntuples
  • ntuples are trickier than histograms, as there
    are several different types (column-wise vs.
    row-wise, ROOT trees, etc)
  • For the moment, have implemented them in the most
    trivial way: arrays/vectors of structs.
  • struct S { float E; int np; Muon M; };
  • ntuple<S> nt;
  • S.E = ....
  • nt.Fill(S);
  • Simple accessor methods also provided.
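
A vector-of-structs ntuple along these lines might look like the sketch below. The Muon fields, the row() and column() accessors, and the use of a pointer to data member for column extraction are assumptions; the slides do not say what the accessor methods are.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Muon type, used only for illustration.
struct Muon {
    float px, py, pz;
};

// Row type, following the struct on the slide.
struct S {
    float E;
    int np;
    Muon M;
};

// Row-wise ntuple: a thin wrapper around a vector of structs.
template <typename Row>
class ntuple {
public:
    void Fill(const Row& r) { rows_.push_back(r); }

    std::size_t size() const { return rows_.size(); }

    const Row& row(std::size_t i) const {
        assert(i < rows_.size());
        return rows_[i];
    }

    // Extract one column through a pointer to data member.
    template <typename Field>
    std::vector<Field> column(Field Row::*member) const {
        std::vector<Field> out;
        out.reserve(rows_.size());
        for (std::size_t i = 0; i < rows_.size(); ++i)
            out.push_back(rows_[i].*member);
        return out;
    }

private:
    std::vector<Row> rows_;
};
```

For example, nt.column(&S::E) returns all energies as a std::vector<float>, which could then be passed to a histogram's fill loop.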

17
Additional Functionality
  • Even though no complex functions are provided
    within the package, users may find it necessary
    to create them as needed.
  • Library functions can easily be added to provide
    user-specific histogram/ntuple operations.
  • For instance, if a user needs to perform a double
    gaussian fit to a histogram, it is very easy to
    add this function in an external library,
    declared as a friend.
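
The friend mechanism can be illustrated with a much simpler stand-in than a double-Gaussian fit. In this sketch the Hist class and the peakPosition function are hypothetical; the point is only that the core class names the external routine while its body lives in a separate user library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Hist {
public:
    Hist(double lo, double hi, std::size_t nbins)
        : lo_(lo), hi_(hi), contents_(nbins, 0.0) {
        assert(nbins > 0 && hi > lo);
    }

    void fill(double x, double w = 1.0) {
        if (x < lo_ || x >= hi_) return;
        std::size_t b = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * contents_.size());
        if (b >= contents_.size()) b = contents_.size() - 1;  // guard against rounding
        contents_[b] += w;
    }

    // The core class only declares the external routine as a friend.
    friend double peakPosition(const Hist& h);

private:
    double lo_, hi_;
    std::vector<double> contents_;
};

// Would normally live in a separate user library. Stand-in for a real fit:
// returns the center of the most populated bin.
double peakPosition(const Hist& h) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < h.contents_.size(); ++i)
        if (h.contents_[i] > h.contents_[best]) best = i;
    double width = (h.hi_ - h.lo_) / h.contents_.size();
    return h.lo_ + (best + 0.5) * width;
}
```

A friend declaration gives the external function direct access to the bin arrays without widening the public interface, at the cost of one extra line in the class for each such function.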

18
Additions in the Pipeline
  • Ability to use shared memory
  • Extend i/o format to include JAS
  • Internal conversion to ROOT/HBOOK/JAS
  • Profile histograms
  • Further support for ntuples
  • Adhere to AIDA interface

19
Pipedreams
  • Create an adaptor to a memory resident histogram
    object to allow multi-format access.
  • Basic histogram object sits in memory, presents
    different representations of itself to various
    components - e.g. looks like an HBOOK histogram to
    MINUIT, a ROOT histogram to a ROOT-specific
    process. If modifications are made to histogram
    by other applications, can re-synchronize and
    update itself.

20
Conclusions
  • Makes a clean break between statistical data
    gathering, and analysis and visualization tasks.
  • Enables histogramming of complex types.
  • Simple and small implementation that is well
    suited to memory restricted tasks, such as online
    data taking.
  • Provides the user with the freedom to choose from
    a wide variety of different analysis and
    visualization tools.
  • Easily extensible, whether to new i/o formats or
    specific analysis functions.