Input/Output (io)

Introduction

ctapipe.io contains functions and classes related to reading, writing, and in-memory storage of event data.

Reading Event Data

This module provides a set of event sources: Python generators that loop through an input file or stream and fill in ctapipe.core.Container classes, defined below. They are designed so that ctapipe can be independent of the file format used for event data, and new formats may be supported by simply adding a plug-in.

The underlying mechanism is a set of EventSource sub-classes that read data in various formats, with a common interface and automatic command-line configuration parameters. These are generally constructed in a generic way by using EventSource(file_or_url) which will construct the appropriate EventSource subclass based on the input file’s type.

The resulting EventSource then works like a Python collection and can be looped over, providing data for each subsequent event. If looped over multiple times, each iteration starts again at the beginning of the file (except in the case of streams that cannot be restarted):

with EventSource(input_url="file.simtel.gz") as source:
    for event in source:
        do_something_with_event(event)

If you need random access to events rather than looping over all events in order, you can use the EventSeeker class to allow random access by event index or event_id. This may not be efficient for some EventSources if the underlying file type does not support random access.

Creating a New EventSource Plugin

An example can be found in:

https://github.com/cta-observatory/ctapipe_io_sst1m

Container Classes

Event data that is intended to be read from or written to files is stored in subclasses of ctapipe.core.Container, the structure of which is defined in the containers module (see reference API below). Each element in the container is a ctapipe.core.Field, specifying the default value, a description, and a default unit if necessary. The following rules should be followed when creating a Container for new data:

  • Containers both provide a way to exchange data (in-memory) between parts of the code and define the schema for any output files to be written.

  • All items in a Container should be expected to be updated at the same frequency. Think of a Container as the column definitions of a table, therefore representing a single row in a table. For example, if the container holds event-by-event info, it should not contain an item that does not change between events (that belongs in another container); otherwise that item will be written out for each event and waste space.

  • A Container should be filled in all at once, not at different times during the data processing (to allow for parallelization and to avoid hard-to-read code).

  • Containers may contain a dictionary of metadata (in their meta dictionary) that will become headers in any output file (this metadata must not change on an event-by-event basis).

  • Algorithms should not update values in a container that have already been filled in by another algorithm. Instead, prefer a new data item, or a second copy of the Container with the updated values.

  • Fields in a container should be one of the following:

  • scalar values (int, float, bool)

  • numpy.ndarray if the data are not scalar (use only simple dtypes that can be written to output files)

  • a ctapipe.core.Container class (in case a hierarchy is needed)

  • a ctapipe.core.Map of ctapipe.core.Container or scalar values, if the hierarchy needs multiple copies of the same Container, organized by some variable-length index (e.g. by tel_id or algorithm name)

  • Fields that should not be in a container class:

  • dicts

  • classes that are not a subclass of ctapipe.core.Container

  • any other type that cannot be translated automatically into the column of an output table.

Serialization of Containers

The ctapipe.io.TableWriter and ctapipe.io.TableReader base classes provide an interface to implement subclasses that write/read Containers to/from table-like data files. Currently the only implementation is for writing HDF5 tables via the ctapipe.io.HDF5TableWriter. The output that the ctapipe.io.HDF5TableWriter produces can be read either one-row-at-a-time using the ctapipe.io.HDF5TableReader, or more generically using the pytables or pandas packages (note, however, that any table with array values in a column cannot be read into a pandas.DataFrame, since DataFrames support only scalar values).

Writing Output Files

The DL1Writer Component allows one to write a series of events (stored in ctapipe.containers.ArrayEventContainer) to a standardized HDF5 format DL1 file following the DL1 data model. This includes all related datasets such as the instrument and simulation configuration information, simulated shower and image information, and observed images and parameters. It can be used in an event loop like:

with DL1Writer(event_source=source, output_path="events.dl1.h5") as write_dl1:
    for event in source:
        calibrate(event)
        write_dl1(event)

Reading Output Tables

In addition to using an EventSource to read R0-DL1 data files, one can also access full tables for files that are in HDF5 format (e.g. DL1 files). The read_table function will load any table in an HDF5 file into an astropy.table.QTable in memory, while maintaining units, column descriptions, and other ctapipe metadata. Astropy Tables can also be converted to pandas DataFrames via their to_pandas() method, as long as the table does not contain any vector columns.

import numpy as np
from astropy import units as u
from ctapipe.io import read_table

mctable = read_table("events.dl1.h5", "/simulation/event/subarray/shower")
# strip the units before taking the logarithm
mctable['logE'] = np.log10(mctable['energy'].to_value(u.TeV))
mctable.write("output.fits")

Standard Metadata Headers

The ctapipe.io.metadata package provides functions for generating standard CTA metadata headers and attaching them to output files.

Reference/API

ctapipe.io Package

Functions

get_array_layout(instrument_name)

Returns the array layout for the given instrument as an astropy.table.Table object.

read_table(h5file, path)

Get a table from a ctapipe-format HDF5 table as an astropy.table.QTable object, retaining units.

Classes

HDF5TableWriter(filename[, group_name, …])

A very basic table writer that can take a container (or more than one) and write it to an HDF5 file.

HDF5TableReader(filename, **kwargs)

Reader that reads a single row of an HDF5 table at once into a Container.

TableWriter([parent, add_prefix])

Base class for writing Container classes as rows of an output table, where each Field becomes a column.

TableReader()

Base class for row-wise table readers.

EventSeeker(event_source[, config, parent])

Provides the functionality to seek through a ctapipe.io.EventSource to find a particular event.

EventSource([input_url, config, parent])

Parent class for EventSources.

SimTelEventSource([input_url, config, parent])

Read events from a SimTelArray data file (in EventIO format).

DL1EventSource([input_url, config, parent])

Event source for files in the ctapipe DL1 format.

DataLevel

Enum of the different Data Levels

DL1Writer(event_source[, config, parent])

Serialize a sequence of events into an HDF5 DL1 file, in the correct format.

Class Inheritance Diagram

Inheritance diagram of ctapipe.io.hdf5tableio.HDF5TableWriter, ctapipe.io.hdf5tableio.HDF5TableReader, ctapipe.io.tableio.TableWriter, ctapipe.io.tableio.TableReader, ctapipe.io.eventseeker.EventSeeker, ctapipe.io.eventsource.EventSource, ctapipe.io.simteleventsource.SimTelEventSource, ctapipe.io.dl1eventsource.DL1EventSource, ctapipe.io.datalevels.DataLevel, ctapipe.io.dl1writer.DL1Writer

ctapipe.io.tableio Module

Classes

TableReader()

Base class for row-wise table readers.

TableWriter([parent, add_prefix])

Base class for writing Container classes as rows of an output table, where each Field becomes a column.

Class Inheritance Diagram

Inheritance diagram of ctapipe.io.tableio.TableReader, ctapipe.io.tableio.TableWriter

ctapipe.io.hdf5tableio Module

Implementations of TableWriter and TableReader for HDF5 files

Classes

HDF5TableWriter(filename[, group_name, …])

A very basic table writer that can take a container (or more than one) and write it to an HDF5 file.

HDF5TableReader(filename, **kwargs)

Reader that reads a single row of an HDF5 table at once into a Container.

Class Inheritance Diagram

Inheritance diagram of ctapipe.io.hdf5tableio.HDF5TableWriter, ctapipe.io.hdf5tableio.HDF5TableReader

ctapipe.io.metadata Module

Management of CTA Reference Metadata, as defined in the CTA Top-Level Data Model document [ctatopleveldatamodel], version 1A. This information is required to be attached to the header of any generated files.

The class Reference collects all required reference metadata and can be turned into a flat dictionary. The user should try to fill out all fields, or use a helper to fill them (as in Activity.from_provenance()).

ref = Reference(
    contact=Contact(name="Some User", email="user@me.com"),
    product=Product(format='hdf5', ...),
    process=Process(...),
    activity=Activity(...),
    instrument=Instrument(...),
)

some_astropy_table.meta = ref.to_dict()
some_astropy_table.write("output.ecsv")

Functions

write_to_hdf5(metadata, h5file)

Write metadata fields to a PyTables HDF5 file handle.

Classes

Reference(*args, **kwargs)

All the reference Metadata required for a CTA output file, plus a way to turn it into a dict() for easy addition to the header of a file

Contact(*args, **kwargs)

Contact information

Process(*args, **kwargs)

Process (top-level workflow) information

Product(*args, **kwargs)

Data product information

Activity(*args, **kwargs)

Activity (tool) information

Instrument(*args, **kwargs)

Instrumental Context

Class Inheritance Diagram

Inheritance diagram of ctapipe.io.metadata.Reference, ctapipe.io.metadata.Contact, ctapipe.io.metadata.Process, ctapipe.io.metadata.Product, ctapipe.io.metadata.Activity, ctapipe.io.metadata.Instrument