Multivariate analysis (protopipe.mva)
Introduction
protopipe.mva contains utilities to build models for regression and classification. It is based on machine learning methods available in scikit-learn. Internally, tables are handled with the pandas Python library.
A separate regressor/classifier should be trained for each camera type.
For both types of model, the per-image estimates are averaged during the Data training and/or Production of DL2 data steps to obtain a global output for the event (energy or score/gammaness).
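A minimal pandas sketch of combining image-level estimates into one value per event. It assumes a plain arithmetic mean, although protopipe's actual averaging may be weighted. The column names (`obs_id`, `event_id`, `tel_id`, `reco_energy_tel`, `gammaness_tel`) and the function `average_per_event` are hypothetical and only used for illustration.

```python
import pandas as pd

# Hypothetical per-image table: one row per telescope image.
# Column names are illustrative assumptions, not protopipe's actual schema.
images = pd.DataFrame(
    {
        "obs_id": [1, 1, 1, 1, 1],
        "event_id": [10, 10, 10, 11, 11],
        "tel_id": [1, 2, 3, 1, 4],
        "reco_energy_tel": [0.9, 1.1, 1.0, 5.0, 7.0],  # TeV
        "gammaness_tel": [0.8, 0.6, 0.7, 0.2, 0.4],
    }
)


def average_per_event(df: pd.DataFrame) -> pd.DataFrame:
    """Average image-level estimates into one row per (obs_id, event_id).

    Uses an unweighted mean (an assumption made for this sketch).
    """
    return (
        df.groupby(["obs_id", "event_id"])
        .agg(
            reco_energy=("reco_energy_tel", "mean"),
            gammaness=("gammaness_tel", "mean"),
            n_images=("tel_id", "size"),
        )
        .reset_index()
    )


events = average_per_event(images)
print(events)
```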
Details
The data are split into training and test subsamples at the level of single-telescope images.
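The sketch below shows what splitting by image means, using scikit-learn's `train_test_split`. Each row is one telescope image, so images from the same event can end up in different subsamples. The feature columns and the 50/50 split fraction are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical image-level table: each row is a single telescope image.
rng = np.random.default_rng(0)
images = pd.DataFrame(
    {
        "event_id": np.repeat(np.arange(50), 2),  # two images per event
        "hillas_intensity": rng.uniform(50, 5000, size=100),
        "hillas_width": rng.uniform(0.01, 0.1, size=100),
    }
)

# Split row-wise (by image, not by event). The 0.5 fraction is an assumption.
train, test = train_test_split(images, train_size=0.5, random_state=0)
print(len(train), len(test))
```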
The class `TrainModel` uses a training sample composed of:
signal, for a regression model;
signal and background, for a classifier.
In the default analysis workflow, the signal consists of gamma rays and the background of protons.
A model can also be trained with scikit-learn's GridSearchCV, which runs an exhaustive, cross-validated search over a user-defined grid of hyper-parameters to find the best-performing combination.
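Here is a minimal sketch of assembling a classifier training sample from signal (gammas, label 1) and background (protons, label 0), then fitting a random forest. The feature names and the synthetic distributions are hypothetical placeholders for real image parameters. This is not `TrainModel`'s actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)


def fake_images(n, width_scale):
    """Hypothetical image features; real inputs would be image parameters."""
    return pd.DataFrame(
        {
            "hillas_width": rng.normal(width_scale, 0.01, size=n),
            "hillas_length": rng.normal(2 * width_scale, 0.02, size=n),
        }
    )


gammas = fake_images(200, 0.03)   # signal
protons = fake_images(200, 0.06)  # background

# Label signal as 1 and background as 0, then merge into one training sample.
gammas["label"] = 1
protons["label"] = 0
sample = pd.concat([gammas, protons], ignore_index=True)

features = ["hillas_width", "hillas_length"]
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(sample[features], sample["label"])

# Column 1 of predict_proba is the probability of label 1 (a gammaness-like score).
scores = clf.predict_proba(sample[features])[:, 1]
```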
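A minimal example of a hyper-parameter search with scikit-learn's `GridSearchCV`. The synthetic dataset, the grid values and the `roc_auc` scoring choice are illustrative assumptions, not protopipe defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for an image-level training sample.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Grid values are illustrative assumptions, not protopipe defaults.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,               # 3-fold cross-validation for each combination
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))

# With the default refit=True, the best combination is refit on all of X, y.
best_model = search.best_estimator_
```

All four combinations in the grid are evaluated. The cost therefore grows with the product of the number of values per hyper-parameter.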
Currently tested models:
sklearn.ensemble.RandomForestClassifier
sklearn.ensemble.RandomForestRegressor
sklearn.ensemble.AdaBoostRegressor
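The sketch below trains one independent regressor per camera type, using the two tested scikit-learn regressors. The camera names, the single log-size feature and the synthetic log-energy relation are hypothetical. It shows only the per-camera-type organisation, not how protopipe builds its features or models.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical training data keyed by camera type (names are illustrative).
data = {}
for cam in ["LSTCam", "NectarCam"]:
    log_size = rng.uniform(1.5, 4.0, size=200)
    log_energy = 0.8 * log_size - 2.0 + rng.normal(0, 0.05, size=200)
    data[cam] = (log_size.reshape(-1, 1), log_energy)

# Factories for the tested regressors; hyper-parameters are assumptions.
ESTIMATORS = {
    "RandomForestRegressor": lambda: RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoostRegressor": lambda: AdaBoostRegressor(n_estimators=50, random_state=0),
}

# One model per (camera type, estimator); fit() returns the fitted estimator.
models = {
    cam: {name: make().fit(X, y) for name, make in ESTIMATORS.items()}
    for cam, (X, y) in data.items()
}
```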
For details about the generation of each model type, please refer to Building the models.
Reference/API
protopipe.mva Package
Classes to build models based on machine learning methods.
Functions
Returns a DataStore with the reconstructed energy and the score/target columns of the model at event level.
Returns a DataStore with the `keepcols` columns and the score/target columns of the model at subarray-event level.
Initialize the parser of protopipe.scripts.build_model.
Plot feature distributions for several data sets.
Utility function to plot a histogram.
Plot the profile of a distribution.
Plot the ROC curve for a given set of model outputs and labels.
Add custom variables to the input data and optionally select it.
Save the model, and the data used to produce it, per camera type.
Split the data selected by cuts into training and test samples.
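The ROC utility in the list relates model outputs to true labels. A minimal sketch of the underlying computation with scikit-learn's `roc_curve` and `roc_auc_score` follows. The labels and scores are made-up values, and the sketch makes no claim about how protopipe's own plotting function is implemented.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up model outputs (e.g. gammaness) and true labels (1 = gamma, 0 = proton).
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])

# False/true positive rates for each score threshold.
fpr, tpr, thresholds = roc_curve(labels, scores)
auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.4f}")  # prints "AUC = 0.9375"
# To draw the curve, e.g. with matplotlib: plt.plot(fpr, tpr)
```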
Classes
Train a classification or regression model.