split_train_test

protopipe.mva.split_train_test(survived_images, train_fraction, feature_name_list, target_name)

Split the data that survived the selection cuts into training and test samples.

If the estimator is a classifier, the data are split in a stratified fashion, using the target variable (target_name) as the class labels, so that each class keeps the same proportion in both samples.
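To illustrate what a stratified split means, here is a minimal sketch in plain pandas. It is not protopipe's implementation: the helper stratified_split, the column names hillas_intensity and label, and the toy 80/20 class mix are all hypothetical. The idea is to draw the same training fraction from each class independently, so the class proportions of the full sample are preserved in both the training and test samples.

```python
import pandas as pd


def stratified_split(df, label_col, train_fraction, seed=0):
    """Hypothetical helper (not part of protopipe): stratified train/test split.

    Draws the same fraction of rows from every class in label_col, so each
    class keeps its original share in both the training and test samples.
    """
    # Sample train_fraction of the rows within each class separately.
    train = df.groupby(label_col).sample(frac=train_fraction, random_state=seed)
    # All rows not drawn for training form the test sample.
    test = df.drop(index=train.index)
    return train, test


# Toy sample: 80 events of class 1 and 20 of class 0 (80% / 20%).
events = pd.DataFrame({
    "hillas_intensity": range(100),
    "label": [1] * 80 + [0] * 20,
})

train, test = stratified_split(events, "label", train_fraction=0.75)
print(len(train), len(test))  # 75 25
print(train["label"].mean(), test["label"].mean())  # 0.8 0.8
```

With a plain random split the class-1 fraction in a small test sample could drift away from 0.8; sampling per class keeps it fixed, which matters when one class is rare.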

Parameters
survived_images: pandas.DataFrame

Images that survived the selection cuts.

train_fraction: float

Fraction of the data to use for training, between 0 and 1; the remaining events form the test sample.

feature_name_list: list

Names of the columns (features) used to train the model.

target_name: str

Name of the column the model is trained to predict.

Returns
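X_train: pandas.DataFrame

Training features (the columns listed in feature_name_list).

X_test: pandas.DataFrame

Test features (the columns listed in feature_name_list).

y_train: pandas.DataFrame

Training target values (the target_name column).

y_test: pandas.DataFrame

Test target values (the target_name column).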

data_train: pandas.DataFrame

Training data indexed by observation ID and event ID.

data_test: pandas.DataFrame

Test data indexed by observation ID and event ID.
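The six return values are different views of the same two samples. The sketch below shows one plausible way they relate, assuming X_* and y_* are column selections of data_train and data_test. It is a hypothetical stand-in, not protopipe's code. split_sketch, the seed argument, and the column names hillas_width, hillas_length and true_energy are invented for illustration, and the split here is purely random (no stratification). The toy events are indexed by (obs_id, event_id), matching the description of data_train and data_test.

```python
import pandas as pd


def split_sketch(survived_images, train_fraction, feature_name_list, target_name, seed=0):
    """Hypothetical stand-in returning values in the documented order.

    Plain random split (no stratification); not protopipe's implementation.
    """
    # Draw the training rows at random; the remaining rows form the test sample.
    data_train = survived_images.sample(frac=train_fraction, random_state=seed)
    data_test = survived_images.drop(index=data_train.index)
    # Features and target are column selections of the same rows,
    # so they share the (obs_id, event_id) index of their sample.
    X_train = data_train[feature_name_list]
    X_test = data_test[feature_name_list]
    y_train = data_train[target_name]
    y_test = data_test[target_name]
    return X_train, X_test, y_train, y_test, data_train, data_test


# Toy events: two observations with five events each.
index = pd.MultiIndex.from_product([[1, 2], range(5)], names=["obs_id", "event_id"])
images = pd.DataFrame(
    {
        "hillas_width": range(10),
        "hillas_length": range(10, 20),
        "true_energy": range(20, 30),
    },
    index=index,
)

X_train, X_test, y_train, y_test, data_train, data_test = split_sketch(
    images, 0.6, ["hillas_width", "hillas_length"], "true_energy"
)
print(X_train.shape, X_test.shape)  # (6, 2) (4, 2)
print(X_train.index.equals(y_train.index))  # True
```

Because a single column name is selected in this sketch, y_train and y_test come out as one-dimensional pandas objects indexed like their feature frames, which is the shape most scikit-learn estimators expect for targets.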