Writing Containers to a tabular format

The TableWriter/TableReader sub-classes allow you to write a ctapipe.core.Container class and its meta-data to an output table. They treat the Fields in the Container as columns in the output, and automatically generate a schema. Here we will go through an example of writing out data and reading it back with Pandas, PyTables, and a ctapipe.io.TableReader:
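To make "Fields become columns" concrete, here is a minimal sketch that is not ctapipe code. It uses a standard-library dataclass as a hypothetical stand-in for a Container and derives a NumPy structured dtype (the "schema") from its fields. The names ExampleEvent, schema_from_fields and rows_to_table are invented for illustration. The real HDF5TableWriter ends up with a PyTables table description instead.

```python
from dataclasses import dataclass, fields

import numpy as np


# Hypothetical stand-in for a Container: the annotated attributes of this
# dataclass play the role of the Field definitions.
@dataclass
class ExampleEvent:
    a_int: int = 0
    a_float: float = 0.0
    a_bool: bool = False


def schema_from_fields(cls):
    """Build a NumPy structured dtype with one column per dataclass field."""
    return np.dtype([(f.name, np.dtype(f.type)) for f in fields(cls)])


def rows_to_table(objects, schema):
    """Store each object as one row, reading its attributes column by column."""
    return np.array(
        [tuple(getattr(obj, name) for name in schema.names) for obj in objects],
        dtype=schema,
    )


schema = schema_from_fields(ExampleEvent)
events = [ExampleEvent(i, i * 0.5, i % 2 == 0) for i in range(3)]
table = rows_to_table(events, schema)
print(schema.names)
print(table["a_float"])
```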

In this example, we will use the HDF5TableWriter, which writes to HDF5 datasets using PyTables. Currently this is the only implemented TableWriter.

Caveats to think about:

  • vector columns in Containers can be written, but some libraries like Pandas cannot read them (so you must use PyTables or Astropy to read outputs that have vector columns)

  • units are stored in the table metadata, but some libraries like Pandas ignore them and all other metadata
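As a rough illustration of what a vector column is, here is a plain NumPy sketch (not ctapipe output) of a structured array in which every row stores a whole array under one column. The column names event_id and image are hypothetical.

```python
import numpy as np

# One scalar column and one vector column: each row holds 4 values in "image".
dtype = np.dtype([("event_id", np.int64), ("image", np.float64, (4,))])

table = np.zeros(3, dtype=dtype)
table["event_id"] = [1, 2, 3]
table["image"][1] = [0.5, 1.0, 1.5, 2.0]  # fill the vector cell of row 1

# Selecting the vector column gives a 2-D array (rows x vector length),
# which does not fit the one-value-per-cell layout of a flat DataFrame column.
image_column = table["image"]
print(image_column.shape)
```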

Create some example Containers

[1]:
from ctapipe.io import HDF5TableWriter
from ctapipe.core import Container, Field
from astropy import units as u
import numpy as np
[2]:
class VariousTypesContainer(Container):

    a_int = Field(int, 'some int value')
    a_float = Field(float, 'some float value with a unit', unit=u.m)
    a_bool = Field(bool, 'some bool value')
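    # np.int, np.float and np.bool were only aliases of the Python builtins
    # (removed in NumPy 1.24); use the explicit NumPy scalar types instead
    a_np_int = Field(np.int64, 'a numpy int')
    a_np_float = Field(np.float64, 'a numpy float')
    a_np_bool = Field(np.bool_, 'np.bool')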

Let’s also make a dummy stream (a generator) that creates a series of these containers:

[3]:
def create_stream(n_event):

    data = VariousTypesContainer()
    for i in range(n_event):

        data.a_int = int(i)
        data.a_float = float(i) * u.cm  # given in cm; converted to the Field unit (m) when written
        data.a_bool = (i % 2) == 0
        data.a_np_int = np.int64(i)
        data.a_np_float = np.float64(i)
        data.a_np_bool = np.bool_((i % 2) == 0)

        yield data
[4]:
for data in create_stream(2):

    for key, val in data.items():

        print('{}: {}, type : {}'.format(key, val, type(val)))
a_int: 0, type : <class 'int'>
a_float: 0.0 cm, type : <class 'astropy.units.quantity.Quantity'>
a_bool: True, type : <class 'bool'>
a_np_int: 0, type : <class 'numpy.int64'>
a_np_float: 0.0, type : <class 'numpy.float64'>
a_np_bool: True, type : <class 'numpy.bool_'>
a_int: 1, type : <class 'int'>
a_float: 1.0 cm, type : <class 'astropy.units.quantity.Quantity'>
a_bool: False, type : <class 'bool'>
a_np_int: 1, type : <class 'numpy.int64'>
a_np_float: 1.0, type : <class 'numpy.float64'>
a_np_bool: False, type : <class 'numpy.bool_'>
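Note that create_stream builds a single VariousTypesContainer before the loop and refills it on every iteration, so every yielded item is the same object. That is fine when each item is used (e.g. written) before the next one is produced, but it can surprise you if you store the items. A stdlib-only sketch of the effect, with a hypothetical Event class standing in for the container:

```python
import copy


class Event:
    """Hypothetical minimal stand-in for VariousTypesContainer."""

    def __init__(self):
        self.a_int = 0


def create_stream(n_event):
    data = Event()  # a single object, created once before the loop
    for i in range(n_event):
        data.a_int = i  # refilled in place on every iteration
        yield data


# Using each item before asking for the next one sees every value...
seen_while_streaming = [data.a_int for data in create_stream(3)]

# ...but storing the items keeps three references to the SAME object,
# which only holds the last value assigned to it.
collected = list(create_stream(3))
values_after_collecting = [data.a_int for data in collected]

# Copying each item as it arrives gives independent snapshots.
snapshots = [copy.deepcopy(data) for data in create_stream(3)]
values_from_snapshots = [data.a_int for data in snapshots]
```

With the real container, creating a fresh VariousTypesContainer() inside the loop is the simplest way to get independent objects.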

Writing the Data (and good practices)

How not to do it:

[5]:
h5_table = HDF5TableWriter('container.h5', group_name='data')

for data in create_stream(10):

    h5_table.write('table', data)

h5_table.close()

In that case the file is not guaranteed to be closed properly, for instance if an error occurs inside the for loop. Let’s deliberately introduce a mistake and see what happens.

[6]:
try:
    h5_table = HDF5TableWriter('container.h5', group_name='data')

    for data in create_stream(10):

        h5_table.write('table', data)
        0/0  # cause an error

    h5_table.close()
except Exception as err:
    print("FAILED!", err)
FAILED! division by zero

Now the file was not closed properly, because the error skipped the call to h5_table.close(). Let’s see what happens if we execute the same code again:

[7]:
try:
    h5_table = HDF5TableWriter('container.h5', group_name='data')

    for data in create_stream(10):

        h5_table.write('table', data)
        0/0  # cause an error
    h5_table.close()
except Exception as err:
    print("FAILED!", err)
FAILED! The file 'container.h5' is already opened.  Please close it before reopening in write mode.

Ah, it seems that the file was never closed! Now I am stuck. Maybe I should restart the kernel? Ahh, no, I don’t want to lose everything. Can I just close it?

[8]:
h5_table.close()

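It worked! (h5_table still refers to the writer created in cell [6], since the second attempt failed before the assignment.)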
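Without a with statement, the classic way to guarantee that close() runs is try/finally: the finally clause executes even when the loop raises. A minimal sketch with a hypothetical ToyWriter class that only tracks whether its "file" is open (nothing is written to disk, and this is not the ctapipe API):

```python
class ToyWriter:
    """Hypothetical writer that only records rows and its open/closed state."""

    def __init__(self, filename):
        self.filename = filename
        self.is_open = True
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def close(self):
        self.is_open = False


writer = ToyWriter("container.h5")
try:
    try:
        for i in range(10):
            writer.write(i)
            0 / 0  # the same deliberate error as in the cells above
    finally:
        writer.close()  # runs whether or not the loop raised
except ZeroDivisionError as err:
    print("FAILED!", err)

print(writer.is_open)
```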

It is better to use a context manager (a with statement), which closes the file even if an error occurs:

[9]:
try:
    with HDF5TableWriter('container.h5', group_name='data') as h5_table:

        for data in create_stream(10):

            h5_table.write('table', data)
            0/0
except Exception as err:
    print("FAILED:", err)
print('Done')
FAILED: division by zero
Done
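Roughly speaking, the with statement works because the writer implements the context-manager protocol: __exit__ is called on the way out of the block, whether or not an exception was raised. A minimal sketch of that protocol on a hypothetical ToyWriter (no real file involved; this is not ctapipe's implementation):

```python
class ToyWriter:
    """Hypothetical writer implementing the context-manager protocol."""

    def __init__(self, filename):
        self.filename = filename
        self.is_open = True

    def close(self):
        self.is_open = False

    def __enter__(self):
        return self  # the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # always called when leaving the with block
        return False  # do not swallow the exception


try:
    with ToyWriter("container.h5") as writer:
        0 / 0
except ZeroDivisionError as err:
    print("FAILED:", err)

print(writer.is_open)
```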
[10]:
!ls container.h5
container.h5

Appending new Containers

To append new containers, the writer must be opened in append mode with mode='a'. But first, let’s look at what happens if we don’t.

[11]:
for i in range(2):

    with HDF5TableWriter('container.h5', mode='w', group_name='data_{}'.format(i)) as h5_table:

        for data in create_stream(10):

            h5_table.write('table', data)

        print(h5_table._h5file)
container.h5 (File) ''
Last modif.: 'Tue Dec  1 09:30:28 2020'
Object Tree:
/ (RootGroup) ''
/data_0 (Group) ''
/data_0/table (Table(0,), fletcher32, shuffle, blosc:zstd(5)) 'Storage of VariousTypesContainer'

container.h5 (File) ''
Last modif.: 'Tue Dec  1 09:30:28 2020'
Object Tree:
/ (RootGroup) ''
/data_1 (Group) ''
/data_1/table (Table(0,), fletcher32, shuffle, blosc:zstd(5)) 'Storage of VariousTypesContainer'

[12]:
!rm -f container.h5

OK, so with mode='w' the writer erases the content of the file each time it opens it: only /data_1 is left after the second iteration. Now let’s append data groups to the file instead (using mode='a'):
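As an analogy only (plain text files, not HDF5), Python's built-in open() follows the same 'w'/'a' convention. The helper write_twice is hypothetical:

```python
import os
import tempfile


def write_twice(path, mode):
    """Open `path` twice with `mode`, writing one line each time."""
    for i in range(2):
        with open(path, mode) as f:
            f.write("data_{}\n".format(i))
    with open(path) as f:
        return f.read().split()


with tempfile.TemporaryDirectory() as tmp_dir:
    # 'w' truncates the file on every open: only the last write survives.
    after_w = write_twice(os.path.join(tmp_dir, "mode_w.txt"), "w")
    # 'a' creates the file if needed, then keeps existing content and appends.
    after_a = write_twice(os.path.join(tmp_dir, "mode_a.txt"), "a")

print(after_w)  # ['data_1']
print(after_a)  # ['data_0', 'data_1']
```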

[13]:
for i in range(2):

    with HDF5TableWriter('container.h5', mode='a', group_name='data_{}'.format(i)) as h5_table:

        for data in create_stream(10):

            h5_table.write('table', data)

        print(h5_table._h5file)
container.h5 (File) ''
Last modif.: 'Tue Dec  1 09:30:29 2020'
Object Tree:
/ (RootGroup) ''
/data_0 (Group) ''
/data_0/table (Table(0,), fletcher32, shuffle, blosc:zstd(5)) 'Storage of VariousTypesContainer'

container.h5 (File) ''
Last modif.: 'Tue Dec  1 09:30:29 2020'
Object Tree:
/ (RootGroup) ''
/data_0 (Group) ''
/data_0/table (Table(10,), fletcher32, shuffle, blosc:zstd(5)) 'Storage of VariousTypesContainer'
/data_1 (Group) ''
/data_1/table (Table(0,), fletcher32, shuffle, blosc:zstd(5)) 'Storage of VariousTypesContainer'

So we can append data groups, as long as the group_name does not already exist. Let’s try to overwrite the data group data_1:

[14]:
try:
    with HDF5TableWriter('container.h5', mode='a', group_name='data_1') as h5_table:
        for data in create_stream(10):
            h5_table.write('table', data)
except Exception as err:
    print("Failed as expected:", err)
Failed as expected: group ``/data_1`` already has a child node named ``table``

Good! I cannot overwrite my data. The context manager also closed the file despite the error:

[15]:
print(bool(h5_table._h5file.isopen))
False

Reading the Data

Reading the whole table at once:

For this, you have several choices. Since we used the HDF5TableWriter in this example, at least these options are available:

  • Pandas

  • PyTables

  • Astropy Table

For other TableWriter implementations, other readers may be possible, depending on the format.

Reading with Pandas:

Pandas is a convenient way to read the output. However, be warned that so far Pandas does not read the table metadata or the column units, so that information is lost!

[16]:
import pandas as pd

data = pd.read_hdf('container.h5', key='/data_0/table')
data.head()
[16]:
   a_int  a_float  a_bool  a_np_int  a_np_float  a_np_bool
0      0     0.00    True         0         0.0       True
1      1     0.01   False         1         1.0      False
2      2     0.02    True         2         2.0       True
3      3     0.03   False         3         3.0      False
4      4     0.04    True         4         4.0       True

Reading with PyTables

[17]:
import tables
h5 = tables.open_file('container.h5')
table = h5.root['data_0']['table']
table
[17]:
/data_0/table (Table(10,), fletcher32, shuffle, blosc:zstd(5)) 'Storage of VariousTypesContainer'
  description := {
  "a_int": Int64Col(shape=(), dflt=0, pos=0),
  "a_float": Float64Col(shape=(), dflt=0.0, pos=1),
  "a_bool": BoolCol(shape=(), dflt=False, pos=2),
  "a_np_int": Int64Col(shape=(), dflt=0, pos=3),
  "a_np_float": Float64Col(shape=(), dflt=0.0, pos=4),
  "a_np_bool": BoolCol(shape=(), dflt=False, pos=5)}
  byteorder := 'little'
  chunkshape := (1927,)

Note that, unlike with Pandas, we can still access the metadata here (including the unit of a_float):

[18]:
table.attrs
[18]:
/data_0/table._v_attrs (AttributeSet), 24 attributes:
   [CLASS := 'TABLE',
    CTAPIPE_VERSION := '0.1.dev1+gddc003f',
    FIELD_0_FILL := 0,
    FIELD_0_NAME := 'a_int',
    FIELD_1_FILL := 0.0,
    FIELD_1_NAME := 'a_float',
    FIELD_2_FILL := False,
    FIELD_2_NAME := 'a_bool',
    FIELD_3_FILL := 0,
    FIELD_3_NAME := 'a_np_int',
    FIELD_4_FILL := 0.0,
    FIELD_4_NAME := 'a_np_float',
    FIELD_5_FILL := False,
    FIELD_5_NAME := 'a_np_bool',
    NROWS := 10,
    TITLE := 'Storage of VariousTypesContainer',
    VERSION := '2.7',
    a_bool_DESC := 'some bool value',
    a_float_DESC := 'some float value with a unit',
    a_float_UNIT := 'm',
    a_int_DESC := 'some int value',
    a_np_bool_DESC := 'np.bool',
    a_np_float_DESC := 'a numpy float',
    a_np_int_DESC := 'a numpy int']
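Since Pandas drops these attributes, one workaround is to read them yourself and re-attach the units afterwards. A stdlib-only sketch, assuming the <column>_UNIT and <column>_DESC naming visible in the attribute listing above; column_metadata is a hypothetical helper, not part of ctapipe:

```python
def column_metadata(attrs, suffix):
    """Collect {column: value} from attributes named '<column><suffix>'."""
    return {
        name[: -len(suffix)]: value
        for name, value in attrs.items()
        if name.endswith(suffix)
    }


# A subset of the attributes listed above, copied into a plain dict.
attrs = {
    "NROWS": 10,
    "a_float_DESC": "some float value with a unit",
    "a_float_UNIT": "m",
    "a_int_DESC": "some int value",
}

units = column_metadata(attrs, "_UNIT")
descriptions = column_metadata(attrs, "_DESC")
print(units)  # {'a_float': 'm'}
```

With the PyTables table from above, such a dict could be built with something like {name: getattr(table.attrs, name) for name in table.attrs._v_attrnames}, and an astropy Quantity restored from the Pandas column with u.Quantity(data['a_float'].to_numpy(), units['a_float']).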

Reading one-row-at-a-time:

Rather than using the full-table methods, if you want to read it row-by-row (e.g. to maintain compatibility with an existing event loop), you can use a TableReader instance.

The advantage here is that units and other metadata are retained and re-applied.
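Conceptually, a row reader walks the table one row at a time and re-attaches the stored units to each value. The sketch below is not ctapipe's implementation: it uses a NumPy structured array as a stand-in for the HDF5 table, represents a value with a unit as a (value, unit) tuple instead of an astropy Quantity, and read_rows is a hypothetical name.

```python
import numpy as np

# Hypothetical stand-ins: a small structured array playing the role of the
# stored table, and the unit metadata recovered from its attributes.
stored = np.array(
    [(0, 0.00), (1, 0.01), (2, 0.02)],
    dtype=[("a_int", np.int64), ("a_float", np.float64)],
)
units = {"a_float": "m"}


def read_rows(table, units):
    """Yield one dict per row, pairing unit-carrying values with their unit."""
    row_dict = {}  # reused for every row, like a container refilled in place
    for row in table:
        for name in table.dtype.names:
            value = row[name].item()  # NumPy scalar -> Python scalar
            row_dict[name] = (value, units[name]) if name in units else value
        yield row_dict


for data in read_rows(stored, units):
    print(data)
```

As with create_stream, the same object is yielded each time, so copy it (e.g. dict(data)) if you want to keep rows around.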

[19]:
from ctapipe.io import HDF5TableReader

def read(mode):

    print('reading mode {}'.format(mode))

    with HDF5TableReader('container.h5', mode=mode) as h5_table:

        for table_path in ['/data_0/table', '/data_1/table']:

            print(table_path)

            # read() yields the given container, refilled with each row of the table
            for data in h5_table.read(table_path, VariousTypesContainer()):

                print(data.as_dict())
[20]:
read('r')
reading mode r
/data_0/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
/data_1/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
[21]:
read('r+')
reading mode r+
/data_0/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
/data_1/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
[22]:
read('a')
reading mode a
/data_0/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
/data_1/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
[23]:
read('w')
reading mode w
/data_0/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}
/data_1/table
{'a_int': 0, 'a_float': <Quantity 0. m>, 'a_bool': True, 'a_np_int': 0, 'a_np_float': 0.0, 'a_np_bool': True}
{'a_int': 1, 'a_float': <Quantity 0.01 m>, 'a_bool': False, 'a_np_int': 1, 'a_np_float': 1.0, 'a_np_bool': False}
{'a_int': 2, 'a_float': <Quantity 0.02 m>, 'a_bool': True, 'a_np_int': 2, 'a_np_float': 2.0, 'a_np_bool': True}
{'a_int': 3, 'a_float': <Quantity 0.03 m>, 'a_bool': False, 'a_np_int': 3, 'a_np_float': 3.0, 'a_np_bool': False}
{'a_int': 4, 'a_float': <Quantity 0.04 m>, 'a_bool': True, 'a_np_int': 4, 'a_np_float': 4.0, 'a_np_bool': True}
{'a_int': 5, 'a_float': <Quantity 0.05 m>, 'a_bool': False, 'a_np_int': 5, 'a_np_float': 5.0, 'a_np_bool': False}
{'a_int': 6, 'a_float': <Quantity 0.06 m>, 'a_bool': True, 'a_np_int': 6, 'a_np_float': 6.0, 'a_np_bool': True}
{'a_int': 7, 'a_float': <Quantity 0.07 m>, 'a_bool': False, 'a_np_int': 7, 'a_np_float': 7.0, 'a_np_bool': False}
{'a_int': 8, 'a_float': <Quantity 0.08 m>, 'a_bool': True, 'a_np_int': 8, 'a_np_float': 8.0, 'a_np_bool': True}
{'a_int': 9, 'a_float': <Quantity 0.09 m>, 'a_bool': False, 'a_np_int': 9, 'a_np_float': 9.0, 'a_np_bool': False}