TrainModel

class protopipe.mva.TrainModel(case, feature_name_list, target_name=None)

Bases: object

Train a classification or regression model.

Parameters
case: str

Type of model to train: either regressor or classifier

feature_name_list: list

Names of the features used to train the model

target_name: str, optional

Name of the regression target

Methods Summary

get_optimal_model(init_model, ...[, refit, ...])

Get optimal hyperparameters for an estimator and return the best model.

split_data(data_sig, train_fraction[, ...])

Load and split data to build train/test samples.

Methods Documentation

get_optimal_model(init_model, tuned_parameters, scoring, cv, refit=True, verbose=2, njobs=1)

Get optimal hyperparameters for an estimator and return the best model.

The best parameters are obtained by performing an exhaustive search over specified parameter values.

Parameters
init_model: `~sklearn.base.BaseEstimator`

Model to optimise

tuned_parameters: dict

Contains parameter names and ranges to optimise on

scoring: str

Scoring metric used to evaluate the candidate models

cv: int

Number of splits (folds) used for cross-validation

refit: bool, str, or callable, default=True

Refit the estimator using the best found parameters on the whole dataset.

verbose: int

Controls the verbosity: the higher, the more messages.

Above 1, the computation time for each fold and parameter candidate is displayed. Above 2, the score is also displayed. Above 3, the fold and candidate parameter indexes are also displayed, together with the starting time of the computation.

njobs: int

Number of jobs to run in parallel. -1 means using all processors.

Returns
best_estimator: `~sklearn.base.BaseEstimator`

Best model
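The exhaustive search described above can be illustrated with a minimal, standard-library-only sketch. It is not protopipe's implementation: `expand_grid`, `exhaustive_search`, `toy_score`, and the parameter names in the grid are hypothetical, and `toy_score` merely stands in for the cross-validated score that `scoring` and `cv` would produce for a real estimator. It shows how a `tuned_parameters` dictionary expands into one candidate per combination of values, and how the best-scoring candidate is kept.

```python
from itertools import product


def expand_grid(tuned_parameters):
    """Yield one parameter dict per combination of the listed values."""
    names = sorted(tuned_parameters)
    for values in product(*(tuned_parameters[name] for name in names)):
        yield dict(zip(names, values))


def exhaustive_search(score_fn, tuned_parameters):
    """Score every combination and return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(tuned_parameters):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: parameter names mapped to the values to try.
tuned_parameters = {"n_estimators": [50, 100, 200], "max_depth": [3, 5]}


def toy_score(params):
    # Stand-in for a cross-validated score (higher is better).
    # Hypothetical: peaks at n_estimators=100, max_depth=5.
    return -abs(params["n_estimators"] - 100) - abs(params["max_depth"] - 5)


candidates = list(expand_grid(tuned_parameters))
best_params, best_score = exhaustive_search(toy_score, tuned_parameters)
print(len(candidates), best_params, best_score)
```

The number of candidates is the product of the lengths of the value lists (here 3 x 2), so the cost of the search grows quickly with each parameter added to the grid.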

split_data(data_sig, train_fraction, data_bkg=None, force_same_nsig_nbkg=False)

Load and split data to build train/test samples.

Parameters
data_sig: `~pandas.DataFrame`

Data frame of signal events

train_fraction: float

Fraction of events to build the training sample

data_bkg: `~pandas.DataFrame`, optional

Data frame of background events

force_same_nsig_nbkg: bool, optional

If True, the same number of signal and background events is used to build the classifier
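As a rough illustration of what a train fraction and signal/background balancing mean, the sketch below uses plain Python lists in place of data frames. It is not protopipe's code: `split_sample` and `balance` are hypothetical helpers, and two behaviours are assumptions made for the example, namely that the training sample is taken from the start of the data without shuffling, and that balancing truncates the larger sample to the size of the smaller one.

```python
def split_sample(events, train_fraction):
    """Return (train, test), taking the first train_fraction of events for training."""
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    n_train = int(len(events) * train_fraction)
    return events[:n_train], events[n_train:]


def balance(sig, bkg):
    """Truncate both samples to the size of the smaller one (assumed behaviour)."""
    n = min(len(sig), len(bkg))
    return sig[:n], bkg[:n]


# Hypothetical event lists standing in for the signal and background data frames.
sig = list(range(10))        # 10 signal events
bkg = list(range(100, 106))  # 6 background events

# Assumed analogue of force_same_nsig_nbkg=True: equal numbers of each class.
sig_bal, bkg_bal = balance(sig, bkg)

# Split each class separately with the same training fraction.
sig_train, sig_test = split_sample(sig_bal, 0.5)
bkg_train, bkg_test = split_sample(bkg_bal, 0.5)
print(len(sig_train), len(sig_test), len(bkg_train), len(bkg_test))
```

Balancing before splitting keeps the class ratio identical in the training and test samples, at the price of discarding events from the larger sample.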