Exploring Raw Data

Here are just some very simple examples of going through and inspecting the raw data, and making some plots using ctapipe. The data explored here are raw Monte Carlo data, which is Data Level “R0” in CTA terminology (e.g. it is before any processing that would happen inside a Camera or off-line)


from ctapipe.utils import get_dataset_path
from ctapipe.io import EventSource, EventSeeker
from ctapipe.visualization import CameraDisplay
from ctapipe.instrument import CameraGeometry
from matplotlib import pyplot as plt
from astropy import units as u
%matplotlib inline
/usr/local/lib/python3.8/site-packages/setuptools_scm/git.py:68: UserWarning: "/github/workspace" is shallow and may cause errors
  warnings.warn('"{}" is shallow and may cause errors'.format(wd.path))

To read SimTelArray format data, ctapipe uses the pyeventio library (which is installed automatically along with ctapipe). The following lines however will load any data known to ctapipe (multiple EventSources are implemented, and chosen automatically based on the type of the input file.

All data access first starts with an EventSource, and here we use a helper function event_source that constructs one. The resulting source object can be iterated over like a list of events. We also here use an EventSeeker which provides random-access to the source (by seeking to the given event ID or number)

source = EventSource(get_dataset_path("gamma_test_large.simtel.gz"), max_events=100, back_seekable=True)
seeker = EventSeeker(source)

Explore the contents of an event

note that the R0 level is the raw data that comes out of a camera, and also the lowest level of monte-carlo data.

event = seeker.get_event_index(3)  # get 3rd event
Seeking event by iterating through events.. (potentially long process)
                       index.*: event indexing information
                          r0.*: Raw Data
                          r1.*: R1 Calibrated Data
                         dl0.*: DL0 Data Volume Reduced Data
                         dl1.*: DL1 Calibrated image
                         dl2.*: Reconstructed Shower Information
                  simulation.*: Simulated Event Information
                     trigger.*: central trigger information
                         count: number of events processed
                    pointing.*: Array and telescope pointing positions
                 calibration.*: Container for calibration coefficients for the
                                current event
                         mon.*: container for event-wise monitoring data (MON)

the event is just a class with a bunch of data items in it. You can see a more compact represntation via:

                        tel[*]: map of tel_id to R0CameraContainer

printing the event structure, will currently print the value all items under it (so you get a lot of output if you print a high-level container):

{'alt': <Angle 1.19518363 rad>,
 'az': <Angle 0.1114234 rad>,
 'core_x': <Quantity 51.67511749 m>,
 'core_y': <Quantity 69.65403748 m>,
 'energy': <Quantity 1.86374986 TeV>,
 'h_first_int': <Quantity 18171.7265625 m>,
 'shower_primary_id': 0,
 'x_max': <Quantity 379.53845215 g / cm2>}
dict_keys([2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 19, 20, 21])

note that the event has 2 telescopes in it: 38,40… Let’s try the next one:

event = seeker.get_event_index(5) # get the next event
dict_keys([6, 10, 15, 18, 22, 27, 35])

now, we have a larger event with many telescopes… Let’s look at the data from CT7:

teldata = event.r0.tel[15]
{'waveform': array([[[ 77,  92, 101, ..., 126, 128, 117],
        [100,  87, 101, ...,  94,  89,  94],
        [111, 103,  98, ...,  74,  80,  77],
        [105, 108, 105, ..., 137, 126, 115],
        [119, 149, 178, ...,  87,  93,  87],
        [102,  93,  98, ..., 147, 137, 130]]], dtype=uint16)}
                      waveform: numpy array containing ADC samples(n_channels,
                                n_pixels, n_samples)

Note that some values are unit quantities (astropy.units.Quantity) or angular quantities (astropy.coordinates.Angle), and you can easily maniuplate them:

$0.46449643 \; \mathrm{TeV}$
$464.49643 \; \mathrm{GeV}$
$7.4420533 \times 10^{-8} \; \mathrm{J}$
print("Altitude in degrees:", event.simulation.shower.alt.deg)
Altitude in degrees: 71.7223821345782

Look for signal pixels in a camera

again, event.r0.tel[x] contains a data structure for the telescope data, with some fields like waveform.

Let’s make a 2D plot of the sample data (sample vs pixel), so we can see if we see which pixels contain Cherenkov light signals:

plt.pcolormesh(teldata.waveform[0])  # note the [0] is for channel 0
plt.xlabel("sample number")
Text(0, 0.5, 'Pixel_id')

Let’s zoom in to see if we can identify the pixels that have the Cherenkov signal in them

plt.xlabel("sample number")
print("waveform[0] is an array of shape (N_pix,N_slice) =",teldata.waveform[0].shape)
waveform[0] is an array of shape (N_pix,N_slice) = (1764, 25)

Now we can really see that some pixels have a signal in them!

Lets look at a 1D plot of pixel 270 in channel 0 and see the signal:

trace = teldata.waveform[0][719]
plt.plot(trace, drawstyle='steps')
[<matplotlib.lines.Line2D at 0x7f7b3019d340>]

Great! It looks like a standard Cherenkov signal!

Let’s take a look at several traces to see if the peaks area aligned:

for pix_id in range(718,723):
    plt.plot(teldata.waveform[0][pix_id], label="pix {}".format(pix_id), drawstyle='steps')
<matplotlib.legend.Legend at 0x7f7b30155820>

Look at the time trace from a Camera Pixel

ctapipe.calib.camera includes classes for doing automatic trace integration with many methods, but before using that, let’s just try to do something simple!

Let’s define the integration windows first: By eye, they seem to be reaonsable from sample 8 to 13 for signal, and 20 to 29 for pedestal (which we define as the sum of all noise: NSB + electronic)

for pix_id in range(718,723):
plt.fill_betweenx([0,1600],19,24,color='red',alpha=0.3, label='Ped window')
plt.fill_betweenx([0,1600],5,9,color='green',alpha=0.3, label='Signal window')
<matplotlib.legend.Legend at 0x7f7b3019d940>

Do a very simplisitic trace analysis

Now, let’s for example calculate a signal and background in a the fixed windows we defined for this single event. Note we are ignoring the fact that cameras have 2 gains, and just using a single gain (channel 0, which is the high-gain channel):

data = teldata.waveform[0]
peds = data[:, 19:24].mean(axis=1)  # mean of samples 20 to 29 for all pixels
sums = data[:, 5:9].sum(axis=1)/(13-8)    # simple sum integration
phist = plt.hist(peds, bins=50, range=[0,150])
plt.title("Pedestal Distribution of all pixels for a single event")
Text(0.5, 1.0, 'Pedestal Distribution of all pixels for a single event')

let’s now take a look at the pedestal-subtracted sums and a pedestal-subtracted signal:

plt.plot(sums - peds)
plt.xlabel("pixel id")
plt.ylabel("Pedestal-subtracted Signal")
Text(0, 0.5, 'Pedestal-subtracted Signal')

Now, we can clearly see that the signal is centered at 0 where there is no Cherenkov light, and we can also clearly see the shower around pixel 250.

# we can also subtract the pedestals from the traces themselves, which would be needed to compare peaks properly
for ii in range(270,280):
    plt.plot(data[ii] - peds[ii], drawstyle='steps', label="pix{}".format(ii))
<matplotlib.legend.Legend at 0x7f7b3018b550>

Camera Displays

It’s of course much easier to see the signal if we plot it in 2D with correct pixel positions!

note: the instrument data model is not fully implemented, so there is not a good way to load all the camera information (right now it is hacked into the inst sub-container that is read from the Monte-Carlo file)

camgeom = source.subarray.tel[24].camera.geometry
title="CT24, run {} event {} ped-sub".format(event.index.obs_id,event.index.event_id)
disp = CameraDisplay(camgeom,title=title)
disp.image = sums - peds
disp.cmap = plt.cm.RdBu_r
disp.set_limits_percent(95)  # autoscale

It looks like a nice signal! We have plotted our pedestal-subtracted trace integral, and see the shower clearly!

Let’s look at all telescopes:

note we plot here the raw signal, since we have not calculated the pedestals for each)

for tel in event.r0.tel.keys():
    camgeom = source.subarray.tel[tel].camera.geometry
    title="CT{}, run {} event {}".format(tel,event.index.obs_id,event.index.event_id)
    disp = CameraDisplay(camgeom,title=title)
    disp.image = event.r0.tel[tel].waveform[0].sum(axis=1)
    disp.cmap = plt.cm.RdBu_r

some signal processing…

Let’s try to detect the peak using the scipy.signal package: http://docs.scipy.org/doc/scipy/reference/signal.html

from scipy import signal
import numpy as np
pix_ids = np.arange(len(data))
has_signal = sums > 300

widths = np.array([8,]) # peak widths to search for (let's fix it at 8 samples, about the width of the peak)
peaks = [signal.find_peaks_cwt(trace,widths) for trace in data[has_signal] ]

for p,s in zip(pix_ids[has_signal],peaks):
    print("pix{} has peaks at sample {}".format(p,s))
    plt.plot(data[p], drawstyle='steps-mid')
pix630 has peaks at sample [8]
pix696 has peaks at sample [ 8 20]
pix714 has peaks at sample [ 7 20]
pix716 has peaks at sample [ 8 20]
pix717 has peaks at sample [ 8 20]
pix718 has peaks at sample [ 8 20]
pix719 has peaks at sample [ 8 21]
pix803 has peaks at sample [ 8 21]

clearly the signal needs to be filtered first, or an appropriate wavelet used, but the idea is nice

[ ]: