Using the ctapipe Provenance service

The provenance functionality is used automatically by most of ctapipe (particularly ctapipe.core.Tool and the functions in ctapipe.io and ctapipe.utils), so normally you don’t have to work with it directly. It tracks both input and output files, as well as details of the machine and software environment in which a Tool was executed.

Here we show some very low-level functions of this system:

[1]:
from ctapipe.core import Provenance
from pprint import pprint

Activities

The basic unit of provenance is an activity, which is generally an executable or a step in a script. Activities can be nested (i.e. contain sub-activities), as shown below, but normally this is not required:

[2]:
p = Provenance()  # note this is a singleton, so there is only ever one global provenance object
p.clear()
p.start_activity()
p.add_input_file("test.txt")

p.start_activity("sub")
p.add_input_file("subinput.txt")
p.add_input_file("anothersubinput.txt")
p.add_output_file("suboutput.txt")
p.finish_activity("sub")

p.start_activity("sub2")
p.add_input_file("sub2input.txt")
p.finish_activity("sub2")

p.finish_activity()
[3]:
p.finished_activity_names
[3]:
['sub', 'sub2', '/usr/local/bin/python3']
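
The ordering above, with sub-activities listed before the activity that contains them, is what you would expect if running activities are kept on a stack and recorded when they finish. The following is a minimal sketch of that idea only. `ToyProvenance` and its default activity name "main" are hypothetical stand-ins for illustration, not ctapipe's implementation:

```python
class ToyProvenance:
    """Hypothetical stand-in used for illustration; not ctapipe's Provenance."""

    def __init__(self):
        self._active = []    # stack of activities that are still running
        self._finished = []  # activities in the order in which they finished

    def start_activity(self, name="main"):
        self._active.append({"activity_name": name, "input": [], "output": []})

    def add_input_file(self, url):
        # Files are attached to the innermost running activity.
        self._active[-1]["input"].append({"url": url, "role": None})

    def finish_activity(self, name="main"):
        if not self._active or self._active[-1]["activity_name"] != name:
            raise ValueError(f"activity {name!r} is not the innermost running one")
        self._finished.append(self._active.pop())

    @property
    def finished_activity_names(self):
        return [a["activity_name"] for a in self._finished]


toy = ToyProvenance()
toy.start_activity()
toy.add_input_file("test.txt")
toy.start_activity("sub")
toy.add_input_file("subinput.txt")
toy.finish_activity("sub")
toy.start_activity("sub2")
toy.finish_activity("sub2")
toy.finish_activity()

names = toy.finished_activity_names
print(names)  # ['sub', 'sub2', 'main']
```

Because the outer activity is the last one to finish, it ends up last in the list, just as the Python executable does in the real output.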

Activities have associated input and output entities (files or other objects):

[4]:
[(x['activity_name'], x['input']) for x in p.provenance]
[4]:
[('sub',
  [{'url': '/github/workspace/docs/examples/subinput.txt', 'role': None},
   {'url': '/github/workspace/docs/examples/anothersubinput.txt',
    'role': None}]),
 ('sub2',
  [{'url': '/github/workspace/docs/examples/sub2input.txt', 'role': None}]),
 ('/usr/local/bin/python3',
  [{'url': '/github/workspace/docs/examples/test.txt', 'role': None}])]
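
The same pattern works for outputs. As a sketch, the records below are hand-copied, shortened stand-ins for `p.provenance` (only the keys used here are kept, and the paths are abbreviated), so the snippet runs without ctapipe. `outputs_by_activity` is a name introduced for this example:

```python
# Shortened stand-ins for the records in p.provenance (illustrative only).
records = [
    {"activity_name": "sub",
     "input": [{"url": "subinput.txt", "role": None},
               {"url": "anothersubinput.txt", "role": None}],
     "output": [{"url": "suboutput.txt", "role": None}]},
    {"activity_name": "sub2",
     "input": [{"url": "sub2input.txt", "role": None}],
     "output": []},
]

# Map each activity name to the URLs of the files it wrote.
outputs_by_activity = {
    rec["activity_name"]: [entity["url"] for entity in rec["output"]]
    for rec in records
}
print(outputs_by_activity)
```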

Activities track when they were started and finished:

[5]:
[(x['activity_name'], x['duration_min']) for x in p.provenance]
[5]:
[('sub', 0.00013333333341414288),
 ('sub2', 0.00014999999994103064),
 ('/usr/local/bin/python3', 0.0036833333333774476)]
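
The `duration_min` values are in minutes and are consistent with the `start` and `stop` timestamps stored in each record (the full record is shown in the next section). As a check, this sketch recomputes the duration of the 'sub' activity from its two timestamps using only the standard library. `minutes_between` is a helper name introduced here, not a ctapipe function:

```python
from datetime import datetime


def minutes_between(start_utc, stop_utc):
    """Return the elapsed time in minutes between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_utc)
    stop = datetime.fromisoformat(stop_utc)
    return (stop - start).total_seconds() / 60


# Timestamps copied from the 'sub' activity record.
duration = minutes_between("2020-12-01T09:30:24.201", "2020-12-01T09:30:24.209")
print(duration)
```

The result agrees with the recorded value only to the millisecond precision of the stored timestamps, so the trailing digits differ.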

Full provenance

The provenance object is a list of activities, and for each one many details are collected:

[6]:
p.provenance[0]
[6]:
{'activity_name': 'sub',
 'activity_uuid': 'f827ed7c-1f8a-4d7b-ad33-30d1bf05b711',
 'start': {'time_utc': '2020-12-01T09:30:24.201'},
 'stop': {'time_utc': '2020-12-01T09:30:24.209'},
 'system': {'ctapipe_version': '0.1.dev1+gddc003f',
  'ctapipe_resources_version': 'not installed',
  'eventio_version': '1.4.2',
  'ctapipe_svc_path': None,
  'executable': '/usr/local/bin/python3',
  'platform': {'architecture_bits': '64bit',
   'architecture_linkage': '',
   'machine': 'x86_64',
   'processor': '',
   'node': 'e54bc70aaa4e',
   'version': '#32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020',
   'system': 'Linux',
   'release': '5.4.0-1031-azure',
   'libcver': ('glibc', '2.28'),
   'num_cpus': 2,
   'boot_time': '2020-12-01T09:25:05.000'},
  'python': {'version_string': '3.8.2 (default, Feb 26 2020, 15:09:34) \n[GCC 8.3.0]',
   'version': ('3', '8', '2'),
   'compiler': 'GCC 8.3.0',
   'implementation': 'CPython'},
  'environment': {'CONDA_DEFAULT_ENV': None,
   'CONDA_PREFIX': None,
   'CONDA_PYTHON_EXE': None,
   'CONDA_EXE': None,
   'CONDA_PROMPT_MODIFIER': None,
   'CONDA_SHLVL': None,
   'PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
   'LD_LIBRARY_PATH': None,
   'DYLD_LIBRARY_PATH': None,
   'USER': None,
   'HOME': '/github/home',
   'SHELL': None},
  'arguments': ['/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py',
   '-f',
   '/tmp/tmpp8irx_og.json',
   '--HistoryManager.hist_file=:memory:'],
  'start_time_utc': '2020-12-01T09:30:24.208'},
 'input': [{'url': '/github/workspace/docs/examples/subinput.txt',
   'role': None},
  {'url': '/github/workspace/docs/examples/anothersubinput.txt',
   'role': None}],
 'output': [{'url': '/github/workspace/docs/examples/suboutput.txt',
   'role': None}],
 'status': 'sub',
 'duration_min': 0.00013333333341414288}

This can be better represented in JSON:

[7]:
print(p.as_json(indent=2))
[
  {
    "activity_name": "sub",
    "activity_uuid": "f827ed7c-1f8a-4d7b-ad33-30d1bf05b711",
    "start": {
      "time_utc": "2020-12-01T09:30:24.201"
    },
    "stop": {
      "time_utc": "2020-12-01T09:30:24.209"
    },
    "system": {
      "ctapipe_version": "0.1.dev1+gddc003f",
      "ctapipe_resources_version": "not installed",
      "eventio_version": "1.4.2",
      "ctapipe_svc_path": null,
      "executable": "/usr/local/bin/python3",
      "platform": {
        "architecture_bits": "64bit",
        "architecture_linkage": "",
        "machine": "x86_64",
        "processor": "",
        "node": "e54bc70aaa4e",
        "version": "#32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020",
        "system": "Linux",
        "release": "5.4.0-1031-azure",
        "libcver": [
          "glibc",
          "2.28"
        ],
        "num_cpus": 2,
        "boot_time": "2020-12-01T09:25:05.000"
      },
      "python": {
        "version_string": "3.8.2 (default, Feb 26 2020, 15:09:34) \n[GCC 8.3.0]",
        "version": [
          "3",
          "8",
          "2"
        ],
        "compiler": "GCC 8.3.0",
        "implementation": "CPython"
      },
      "environment": {
        "CONDA_DEFAULT_ENV": null,
        "CONDA_PREFIX": null,
        "CONDA_PYTHON_EXE": null,
        "CONDA_EXE": null,
        "CONDA_PROMPT_MODIFIER": null,
        "CONDA_SHLVL": null,
        "PATH": "/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "LD_LIBRARY_PATH": null,
        "DYLD_LIBRARY_PATH": null,
        "USER": null,
        "HOME": "/github/home",
        "SHELL": null
      },
      "arguments": [
        "/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py",
        "-f",
        "/tmp/tmpp8irx_og.json",
        "--HistoryManager.hist_file=:memory:"
      ],
      "start_time_utc": "2020-12-01T09:30:24.208"
    },
    "input": [
      {
        "url": "/github/workspace/docs/examples/subinput.txt",
        "role": null
      },
      {
        "url": "/github/workspace/docs/examples/anothersubinput.txt",
        "role": null
      }
    ],
    "output": [
      {
        "url": "/github/workspace/docs/examples/suboutput.txt",
        "role": null
      }
    ],
    "status": "sub",
    "duration_min": 0.00013333333341414288
  },
  {
    "activity_name": "sub2",
    "activity_uuid": "55365a01-ceef-4589-af0f-aa49bb63c285",
    "start": {
      "time_utc": "2020-12-01T09:30:24.211"
    },
    "stop": {
      "time_utc": "2020-12-01T09:30:24.220"
    },
    "system": {
      "ctapipe_version": "0.1.dev1+gddc003f",
      "ctapipe_resources_version": "not installed",
      "eventio_version": "1.4.2",
      "ctapipe_svc_path": null,
      "executable": "/usr/local/bin/python3",
      "platform": {
        "architecture_bits": "64bit",
        "architecture_linkage": "",
        "machine": "x86_64",
        "processor": "",
        "node": "e54bc70aaa4e",
        "version": "#32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020",
        "system": "Linux",
        "release": "5.4.0-1031-azure",
        "libcver": [
          "glibc",
          "2.28"
        ],
        "num_cpus": 2,
        "boot_time": "2020-12-01T09:25:05.000"
      },
      "python": {
        "version_string": "3.8.2 (default, Feb 26 2020, 15:09:34) \n[GCC 8.3.0]",
        "version": [
          "3",
          "8",
          "2"
        ],
        "compiler": "GCC 8.3.0",
        "implementation": "CPython"
      },
      "environment": {
        "CONDA_DEFAULT_ENV": null,
        "CONDA_PREFIX": null,
        "CONDA_PYTHON_EXE": null,
        "CONDA_EXE": null,
        "CONDA_PROMPT_MODIFIER": null,
        "CONDA_SHLVL": null,
        "PATH": "/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "LD_LIBRARY_PATH": null,
        "DYLD_LIBRARY_PATH": null,
        "USER": null,
        "HOME": "/github/home",
        "SHELL": null
      },
      "arguments": [
        "/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py",
        "-f",
        "/tmp/tmpp8irx_og.json",
        "--HistoryManager.hist_file=:memory:"
      ],
      "start_time_utc": "2020-12-01T09:30:24.219"
    },
    "input": [
      {
        "url": "/github/workspace/docs/examples/sub2input.txt",
        "role": null
      }
    ],
    "output": [],
    "status": "sub2",
    "duration_min": 0.00014999999994103064
  },
  {
    "activity_name": "/usr/local/bin/python3",
    "activity_uuid": "a40b466f-0076-489a-9c0c-02c9ee61413f",
    "start": {
      "time_utc": "2020-12-01T09:30:24.001"
    },
    "stop": {
      "time_utc": "2020-12-01T09:30:24.222"
    },
    "system": {
      "ctapipe_version": "0.1.dev1+gddc003f",
      "ctapipe_resources_version": "not installed",
      "eventio_version": "1.4.2",
      "ctapipe_svc_path": null,
      "executable": "/usr/local/bin/python3",
      "platform": {
        "architecture_bits": "64bit",
        "architecture_linkage": "",
        "machine": "x86_64",
        "processor": "",
        "node": "e54bc70aaa4e",
        "version": "#32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020",
        "system": "Linux",
        "release": "5.4.0-1031-azure",
        "libcver": [
          "glibc",
          "2.28"
        ],
        "num_cpus": 2,
        "boot_time": "2020-12-01T09:25:05.000"
      },
      "python": {
        "version_string": "3.8.2 (default, Feb 26 2020, 15:09:34) \n[GCC 8.3.0]",
        "version": [
          "3",
          "8",
          "2"
        ],
        "compiler": "GCC 8.3.0",
        "implementation": "CPython"
      },
      "environment": {
        "CONDA_DEFAULT_ENV": null,
        "CONDA_PREFIX": null,
        "CONDA_PYTHON_EXE": null,
        "CONDA_EXE": null,
        "CONDA_PROMPT_MODIFIER": null,
        "CONDA_SHLVL": null,
        "PATH": "/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "LD_LIBRARY_PATH": null,
        "DYLD_LIBRARY_PATH": null,
        "USER": null,
        "HOME": "/github/home",
        "SHELL": null
      },
      "arguments": [
        "/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py",
        "-f",
        "/tmp/tmpp8irx_og.json",
        "--HistoryManager.hist_file=:memory:"
      ],
      "start_time_utc": "2020-12-01T09:30:24.200"
    },
    "input": [
      {
        "url": "/github/workspace/docs/examples/test.txt",
        "role": null
      }
    ],
    "output": [],
    "status": "completed",
    "duration_min": 0.0036833333333774476
  }
]
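
One consequence of the JSON representation is that Python-specific types are not preserved: `None` becomes `null`, and tuples such as `libcver` become arrays, which load back as lists. A small sketch with the standard `json` module, using a hand-copied fragment of the record above rather than the real `p.provenance`:

```python
import json

# Fragment of the 'platform' block, copied by hand for illustration.
platform = {"libcver": ("glibc", "2.28"), "num_cpus": 2, "processor": ""}

text = json.dumps(platform, indent=2)
restored = json.loads(text)

# The tuple comes back as a list; the other values are unchanged.
print(type(restored["libcver"]).__name__, restored["libcver"])
```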

Storing provenance info in output files

  • This can already be stored in something like an HDF5 file header, which allows hierarchies.

  • Try to flatten the data so it can be stored in a key=value header in a FITS file (using the FITS extended keyword convention to allow >8 character keywords), or as a table.

[8]:
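def flatten_dict(y):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # descend into each key, extending the dotted prefix
            for key in x:
                flatten(x[key], name + key + '.')
        elif isinstance(x, list):
            # list items are addressed by their index
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '.')
        else:
            # leaf value: drop the trailing '.' from the accumulated name
            out[name[:-1]] = x

    flatten(y)
    return out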
[9]:
d = dict(activity=p.provenance)
[10]:
pprint(flatten_dict(d))
{'activity.0.activity_name': 'sub',
 'activity.0.activity_uuid': 'f827ed7c-1f8a-4d7b-ad33-30d1bf05b711',
 'activity.0.duration_min': 0.00013333333341414288,
 'activity.0.input.0.role': None,
 'activity.0.input.0.url': '/github/workspace/docs/examples/subinput.txt',
 'activity.0.input.1.role': None,
 'activity.0.input.1.url': '/github/workspace/docs/examples/anothersubinput.txt',
 'activity.0.output.0.role': None,
 'activity.0.output.0.url': '/github/workspace/docs/examples/suboutput.txt',
 'activity.0.start.time_utc': '2020-12-01T09:30:24.201',
 'activity.0.status': 'sub',
 'activity.0.stop.time_utc': '2020-12-01T09:30:24.209',
 'activity.0.system.arguments.0': '/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py',
 'activity.0.system.arguments.1': '-f',
 'activity.0.system.arguments.2': '/tmp/tmpp8irx_og.json',
 'activity.0.system.arguments.3': '--HistoryManager.hist_file=:memory:',
 'activity.0.system.ctapipe_resources_version': 'not installed',
 'activity.0.system.ctapipe_svc_path': None,
 'activity.0.system.ctapipe_version': '0.1.dev1+gddc003f',
 'activity.0.system.environment.CONDA_DEFAULT_ENV': None,
 'activity.0.system.environment.CONDA_EXE': None,
 'activity.0.system.environment.CONDA_PREFIX': None,
 'activity.0.system.environment.CONDA_PROMPT_MODIFIER': None,
 'activity.0.system.environment.CONDA_PYTHON_EXE': None,
 'activity.0.system.environment.CONDA_SHLVL': None,
 'activity.0.system.environment.DYLD_LIBRARY_PATH': None,
 'activity.0.system.environment.HOME': '/github/home',
 'activity.0.system.environment.LD_LIBRARY_PATH': None,
 'activity.0.system.environment.PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
 'activity.0.system.environment.SHELL': None,
 'activity.0.system.environment.USER': None,
 'activity.0.system.eventio_version': '1.4.2',
 'activity.0.system.executable': '/usr/local/bin/python3',
 'activity.0.system.platform.architecture_bits': '64bit',
 'activity.0.system.platform.architecture_linkage': '',
 'activity.0.system.platform.boot_time': '2020-12-01T09:25:05.000',
 'activity.0.system.platform.libcver': ('glibc', '2.28'),
 'activity.0.system.platform.machine': 'x86_64',
 'activity.0.system.platform.node': 'e54bc70aaa4e',
 'activity.0.system.platform.num_cpus': 2,
 'activity.0.system.platform.processor': '',
 'activity.0.system.platform.release': '5.4.0-1031-azure',
 'activity.0.system.platform.system': 'Linux',
 'activity.0.system.platform.version': '#32~18.04.1-Ubuntu SMP Tue Oct 6 '
                                       '10:03:22 UTC 2020',
 'activity.0.system.python.compiler': 'GCC 8.3.0',
 'activity.0.system.python.implementation': 'CPython',
 'activity.0.system.python.version': ('3', '8', '2'),
 'activity.0.system.python.version_string': '3.8.2 (default, Feb 26 2020, '
                                            '15:09:34) \n'
                                            '[GCC 8.3.0]',
 'activity.0.system.start_time_utc': '2020-12-01T09:30:24.208',
 'activity.1.activity_name': 'sub2',
 'activity.1.activity_uuid': '55365a01-ceef-4589-af0f-aa49bb63c285',
 'activity.1.duration_min': 0.00014999999994103064,
 'activity.1.input.0.role': None,
 'activity.1.input.0.url': '/github/workspace/docs/examples/sub2input.txt',
 'activity.1.start.time_utc': '2020-12-01T09:30:24.211',
 'activity.1.status': 'sub2',
 'activity.1.stop.time_utc': '2020-12-01T09:30:24.220',
 'activity.1.system.arguments.0': '/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py',
 'activity.1.system.arguments.1': '-f',
 'activity.1.system.arguments.2': '/tmp/tmpp8irx_og.json',
 'activity.1.system.arguments.3': '--HistoryManager.hist_file=:memory:',
 'activity.1.system.ctapipe_resources_version': 'not installed',
 'activity.1.system.ctapipe_svc_path': None,
 'activity.1.system.ctapipe_version': '0.1.dev1+gddc003f',
 'activity.1.system.environment.CONDA_DEFAULT_ENV': None,
 'activity.1.system.environment.CONDA_EXE': None,
 'activity.1.system.environment.CONDA_PREFIX': None,
 'activity.1.system.environment.CONDA_PROMPT_MODIFIER': None,
 'activity.1.system.environment.CONDA_PYTHON_EXE': None,
 'activity.1.system.environment.CONDA_SHLVL': None,
 'activity.1.system.environment.DYLD_LIBRARY_PATH': None,
 'activity.1.system.environment.HOME': '/github/home',
 'activity.1.system.environment.LD_LIBRARY_PATH': None,
 'activity.1.system.environment.PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
 'activity.1.system.environment.SHELL': None,
 'activity.1.system.environment.USER': None,
 'activity.1.system.eventio_version': '1.4.2',
 'activity.1.system.executable': '/usr/local/bin/python3',
 'activity.1.system.platform.architecture_bits': '64bit',
 'activity.1.system.platform.architecture_linkage': '',
 'activity.1.system.platform.boot_time': '2020-12-01T09:25:05.000',
 'activity.1.system.platform.libcver': ('glibc', '2.28'),
 'activity.1.system.platform.machine': 'x86_64',
 'activity.1.system.platform.node': 'e54bc70aaa4e',
 'activity.1.system.platform.num_cpus': 2,
 'activity.1.system.platform.processor': '',
 'activity.1.system.platform.release': '5.4.0-1031-azure',
 'activity.1.system.platform.system': 'Linux',
 'activity.1.system.platform.version': '#32~18.04.1-Ubuntu SMP Tue Oct 6 '
                                       '10:03:22 UTC 2020',
 'activity.1.system.python.compiler': 'GCC 8.3.0',
 'activity.1.system.python.implementation': 'CPython',
 'activity.1.system.python.version': ('3', '8', '2'),
 'activity.1.system.python.version_string': '3.8.2 (default, Feb 26 2020, '
                                            '15:09:34) \n'
                                            '[GCC 8.3.0]',
 'activity.1.system.start_time_utc': '2020-12-01T09:30:24.219',
 'activity.2.activity_name': '/usr/local/bin/python3',
 'activity.2.activity_uuid': 'a40b466f-0076-489a-9c0c-02c9ee61413f',
 'activity.2.duration_min': 0.0036833333333774476,
 'activity.2.input.0.role': None,
 'activity.2.input.0.url': '/github/workspace/docs/examples/test.txt',
 'activity.2.start.time_utc': '2020-12-01T09:30:24.001',
 'activity.2.status': 'completed',
 'activity.2.stop.time_utc': '2020-12-01T09:30:24.222',
 'activity.2.system.arguments.0': '/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py',
 'activity.2.system.arguments.1': '-f',
 'activity.2.system.arguments.2': '/tmp/tmpp8irx_og.json',
 'activity.2.system.arguments.3': '--HistoryManager.hist_file=:memory:',
 'activity.2.system.ctapipe_resources_version': 'not installed',
 'activity.2.system.ctapipe_svc_path': None,
 'activity.2.system.ctapipe_version': '0.1.dev1+gddc003f',
 'activity.2.system.environment.CONDA_DEFAULT_ENV': None,
 'activity.2.system.environment.CONDA_EXE': None,
 'activity.2.system.environment.CONDA_PREFIX': None,
 'activity.2.system.environment.CONDA_PROMPT_MODIFIER': None,
 'activity.2.system.environment.CONDA_PYTHON_EXE': None,
 'activity.2.system.environment.CONDA_SHLVL': None,
 'activity.2.system.environment.DYLD_LIBRARY_PATH': None,
 'activity.2.system.environment.HOME': '/github/home',
 'activity.2.system.environment.LD_LIBRARY_PATH': None,
 'activity.2.system.environment.PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
 'activity.2.system.environment.SHELL': None,
 'activity.2.system.environment.USER': None,
 'activity.2.system.eventio_version': '1.4.2',
 'activity.2.system.executable': '/usr/local/bin/python3',
 'activity.2.system.platform.architecture_bits': '64bit',
 'activity.2.system.platform.architecture_linkage': '',
 'activity.2.system.platform.boot_time': '2020-12-01T09:25:05.000',
 'activity.2.system.platform.libcver': ('glibc', '2.28'),
 'activity.2.system.platform.machine': 'x86_64',
 'activity.2.system.platform.node': 'e54bc70aaa4e',
 'activity.2.system.platform.num_cpus': 2,
 'activity.2.system.platform.processor': '',
 'activity.2.system.platform.release': '5.4.0-1031-azure',
 'activity.2.system.platform.system': 'Linux',
 'activity.2.system.platform.version': '#32~18.04.1-Ubuntu SMP Tue Oct 6 '
                                       '10:03:22 UTC 2020',
 'activity.2.system.python.compiler': 'GCC 8.3.0',
 'activity.2.system.python.implementation': 'CPython',
 'activity.2.system.python.version': ('3', '8', '2'),
 'activity.2.system.python.version_string': '3.8.2 (default, Feb 26 2020, '
                                            '15:09:34) \n'
                                            '[GCC 8.3.0]',
 'activity.2.system.start_time_utc': '2020-12-01T09:30:24.200'}
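
Note that the flattened output still contains tuple values (for example `libcver` and `python.version`), and that empty lists such as the `output` of the second activity are dropped entirely, which may or may not be what you want for a key=value header. A possible variant, offered as a sketch rather than as part of ctapipe, also expands tuples and keeps empty containers. `flatten_all` and its `sep` parameter are names introduced here:

```python
def flatten_all(obj, sep="."):
    """Flatten nested dicts, lists and tuples into a single-level dict."""
    out = {}

    def _flatten(x, prefix):
        if isinstance(x, dict):
            items = list(x.items())
        elif isinstance(x, (list, tuple)):
            items = list(enumerate(x))
        else:
            # Leaf value: store it under the accumulated key.
            out[prefix] = x
            return
        if not items and prefix:
            # Keep empty containers visible instead of silently dropping them.
            out[prefix] = x
        for key, value in items:
            _flatten(value, f"{prefix}{sep}{key}" if prefix else str(key))

    _flatten(obj, "")
    return out


flat = flatten_all({"platform": {"libcver": ("glibc", "2.28")}, "output": []})
print(flat)
```

Whether empty containers should be kept, and how values are then converted to strings, depends on the target format, so treat both choices as assumptions to adapt.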