synthgauge.metrics.classification

Utility metrics using scikit-learn-style classifiers.

Module Contents

Functions

classification_comparison(real, synth, feats, target, ...)

Classification utility metric.

synthgauge.metrics.classification.classification_comparison(real, synth, feats, target, classifier, test_prop=0.2, random_state=None, **kwargs)[source]

Classification utility metric.

This metric fits two identical classification models: one to real and one to synth. Both models are then tested against withheld real data. Utility scores are obtained by subtracting the precision, recall and f1 scores of the "synthetic" model's predictions from those of the "real" model's.

Parameters
  • real (pandas.DataFrame) – Dataframe containing the real data.

  • synth (pandas.DataFrame) – Dataframe containing the synthetic data.

  • feats (list of str) – List of column names to use as the input in the classification.

  • target (str) – Column to use as target in the classification.

  • classifier (scikit-learn estimator) – Classifier class with fit and predict methods.

  • test_prop (float or int, default 0.2) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.

  • random_state (int, optional) – Random seed for shuffling during the train-test split, and for the classification algorithm itself.

  • **kwargs (dict, optional) – Keyword arguments passed to the classifier.

Returns

  • precision_difference (float) – Precision of the real model minus that of the synthetic model.

  • recall_difference (float) – Recall of the real model minus that of the synthetic model.

  • f1_difference (float) – f1 score of the real model minus that of the synthetic model.

Notes

Some preprocessing is carried out before the models are trained. Numeric features are scaled and categorical features are one-hot-encoded.
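The kind of preprocessing described above can be sketched with scikit-learn's ColumnTransformer. This is an illustration of the general pattern, not the library's exact internal code; the column names and data are invented for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one numeric and one categorical feature.
df = pd.DataFrame({
    "age": [25, 40, 33, 58],          # numeric -> scaled
    "region": ["N", "S", "S", "E"],   # categorical -> one-hot-encoded
})

# Scale numeric columns, one-hot-encode categorical ones.
# sparse_threshold=0 forces a dense array for easy inspection.
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(), ["region"]),
    ],
    sparse_threshold=0,
)

X = preprocessor.fit_transform(df)
# X has one scaled numeric column plus three one-hot columns (E, N, S).
```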

A score of zero tells us the synthetic data is just as good as the real data for training the given classification model. Larger (positive) scores indicate poorer utility, since they mean the real-trained model outperformed the synthetic-trained one.
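The metric's core logic can be sketched with plain scikit-learn, which also shows why a perfect synthesiser scores zero. This is a minimal re-implementation for illustration (omitting the preprocessing step), not the function's actual source; the data and classifier choice are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy "real" dataset with a learnable binary label.
rng = np.random.default_rng(0)
real = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
real["label"] = (real["x1"] + real["x2"] > 0).astype(int)

feats, target = ["x1", "x2"], "label"
train, test = train_test_split(real, test_size=0.2, random_state=42)

# A perfect synthesiser would reproduce the real training data exactly,
# so here the "synthetic" data is just a copy of it.
synth = train.copy()

# Fit identical models to the real and synthetic training data.
model_real = LogisticRegression().fit(train[feats], train[target])
model_synth = LogisticRegression().fit(synth[feats], synth[target])

# Test both against the withheld real data and subtract the scores.
pred_real = model_real.predict(test[feats])
pred_synth = model_synth.predict(test[feats])

y = test[target]
precision_diff = precision_score(y, pred_real) - precision_score(y, pred_synth)
recall_diff = recall_score(y, pred_real) - recall_score(y, pred_synth)
f1_diff = f1_score(y, pred_real) - f1_score(y, pred_synth)
```

Because the synthetic data is an exact copy of the real training data, both models are identical and all three differences come out as zero; a worse synthesiser would typically push them positive.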