synthgauge.evaluator

The core class for evaluating datasets.

Module Contents

Classes

Evaluator

The central class in synthgauge, used to hold and evaluate data via metrics and visualisation.

class synthgauge.evaluator.Evaluator(real, synth, handle_nans='drop')

The central class in synthgauge, used to hold and evaluate data via metrics and visualisation.

Parameters
  • real (pandas.DataFrame) – Dataframe containing the real data.

  • synth (pandas.DataFrame) – Dataframe containing the synthetic data.

  • handle_nans (str, default "drop") – How to handle missing values. Use "drop" (the default) to drop them.

Returns

An Evaluator object ready for metric and visual evaluation.

Return type

synthgauge.Evaluator

property metrics

Return the metrics dictionary stored in the private __metrics attribute.

describe_numeric()

Summarise numeric features.

This function uses pandas.DataFrame.describe to calculate summary statistics for each numeric feature in self.real_data and self.synth_data.

Returns

Descriptive statistics for each numeric feature.

Return type

pandas.DataFrame

describe_categorical()

Summarise categorical features.

This function uses pandas.DataFrame.describe to calculate summary statistics for each object-type feature in self.real_data and self.synth_data.

Returns

Descriptive statistics for each object-type feature.

Return type

pandas.DataFrame

add_metric(name, alias=None, **kwargs)

Add a metric to the evaluator.

Metrics and their arguments are recorded to be run at a later time. This allows metric customisation but ensures that the same metric configuration is applied consistently, i.e. once added, the parameters do not require resupplying for each execution of the metric. Supplying a metric alias allows the same metric to be used multiple times with different parameters.

Note that self.real_data and self.synth_data will be passed automatically to metrics that expect these arguments. They should not be declared in **kwargs.

Parameters
  • name (str) – Name of the metric. Must match one of the functions in synthgauge.metrics.

  • alias (str, optional) – Alias to be given to this use of the metric in the results table. Allows the same metric to be used multiple times with different parameters. If not specified, name is used.

  • **kwargs (dict, optional) – Keyword arguments for the metric. Refer to the associated metric documentation for details.
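The record-now, run-later behaviour and the role of aliases can be illustrated without synthgauge itself. The sketch below is a hypothetical stand-in, not the library's implementation: ToyRegistry, range_gap and the trim parameter are invented for illustration, and the toy registry takes a function where add_metric takes a metric name.

```python
class ToyRegistry:
    """Hypothetical stand-in illustrating deferred metrics; not synthgauge code."""

    def __init__(self, real, synth):
        self.real, self.synth = real, synth
        self.metrics = {}  # alias -> (function, keyword arguments)

    def add(self, func, alias=None, **kwargs):
        # Fall back to the function name when no alias is given.
        self.metrics[alias or func.__name__] = (func, kwargs)

    def evaluate(self):
        # The data are supplied here, so callers never pass them as kwargs.
        return {
            alias: func(self.real, self.synth, **kwargs)
            for alias, (func, kwargs) in self.metrics.items()
        }


def range_gap(real, synth, trim=0):
    """Gap between value ranges after dropping `trim` items from each end."""
    def span(values):
        kept = sorted(values)[trim:len(values) - trim]
        return kept[-1] - kept[0]
    return abs(span(real) - span(synth))


registry = ToyRegistry(real=[1, 2, 3, 10], synth=[1, 2, 3, 4])
registry.add(range_gap)                                     # key: "range_gap"
registry.add(range_gap, alias="range_gap_trimmed", trim=1)  # same metric, new parameters
results = registry.evaluate()
```

Because each configuration is stored under its own key, the same metric appears twice in the results, once per alias, and neither call needs its parameters resupplied at evaluation time.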

add_custom_metric(alias, func, **kwargs)

Add a custom metric to the evaluator.

A custom metric uses any user-defined function that accepts the real and synthetic dataframes as the first and second positional arguments, respectively. Any other parameters must be defined as keyword arguments. The metric function can return a value of any desired type although scalar numeric values are recommended, or collections.namedtuples when there are multiple outputs.

Parameters
  • alias (str) – Alias for the metric to appear in the results table.

  • func (function) – Top-level metric function to be called during the evaluation step. The first two arguments of func must accept the real and synthetic dataframes (self.real_data and self.synth_data), in that order.

  • **kwargs (dict, optional) – Keyword arguments to be passed to func.
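A function with the shape described above might look like the following minimal sketch. The names moment_gap, MomentGap and feature are hypothetical, chosen only for illustration. The demo call uses plain dicts of lists as stand-ins for dataframes so that the snippet runs without pandas; looking a column up by name works the same way on a pandas.DataFrame.

```python
from collections import namedtuple
from statistics import mean, stdev

# Hypothetical result container: one named field per output value.
MomentGap = namedtuple("MomentGap", ["mean_gap", "std_gap"])


def moment_gap(real, synth, feature=None):
    """Absolute gaps in mean and standard deviation for one feature.

    `real` and `synth` come first and are positional; everything else
    is a keyword argument.
    """
    real_col = [float(v) for v in real[feature]]
    synth_col = [float(v) for v in synth[feature]]
    return MomentGap(
        mean_gap=abs(mean(real_col) - mean(synth_col)),
        std_gap=abs(stdev(real_col) - stdev(synth_col)),
    )


# Stand-in data; in practice these would be the real and synthetic dataframes.
real = {"age": [20, 30, 40]}
synth = {"age": [22, 32, 42]}
result = moment_gap(real, synth, feature="age")
```

Given an existing evaluator, such a function would be registered with something like evaluator.add_custom_metric("age_gap", moment_gap, feature="age"), following the signature documented above.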

copy_metrics(other)

Copy metrics from another evaluator.

To facilitate consistent comparisons of different synthetic datasets, this function copies the metrics dictionary from another Evaluator instance.

Parameters

other (Evaluator) – The other evaluator from which the metrics dictionary will be copied.

save_metrics(filename)

Save the current metrics dictionary to disk via pickle.

Parameters

filename (str) – Path to pickle file to save the metrics.

load_metrics(filename, overwrite=False)

Load metrics from disk.

Update or overwrite the current metric dictionary from a pickle.

Parameters
  • filename (str) – Path to metrics pickle file.

  • overwrite (bool, default False) – If True, all current metrics will be replaced with the loaded metrics. Default is False, which will update the current metric dictionary with the loaded metrics.
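The difference between updating and overwriting can be shown with plain dictionaries and pickle. This is a sketch under assumptions: the metric names and their contents are placeholders, the real layout of the metrics dictionary may differ, and "update" is assumed to behave like dict.update, so loaded entries win when a key exists in both.

```python
import os
import pickle
import tempfile

# Placeholder metric dictionaries; the real structure may differ.
saved = {"metric_a": {"bins": 10}, "metric_b": {}}
current = {"metric_a": {"bins": 50}, "metric_c": {}}

# Round-trip the saved metrics through a pickle file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.pkl")
    with open(path, "wb") as fh:
        pickle.dump(saved, fh)
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)

# overwrite=False (assumed dict.update semantics): keep current entries,
# add the loaded ones, and let loaded values replace clashing keys.
updated = {**current, **loaded}

# overwrite=True: the loaded metrics replace everything.
replaced = dict(loaded)
```

Under these assumptions, updated holds metric_a (with the loaded configuration), metric_b and metric_c, while replaced holds only metric_a and metric_b.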

drop_metric(metric)

Drops the named metric from the metrics dictionary.

Parameters

metric (str) – Key (name or alias, if specified) of the metric to remove.

evaluate(as_df=False)

Compute metrics for real and synth data.

Run through the metrics dictionary and execute each with its corresponding arguments. The results are returned as either a dictionary or dataframe.

Results are also silently stored as a dictionary in self.metric_results.

Parameters

as_df (bool, default False) – If True, return the results as a pandas.DataFrame; otherwise return a dictionary.

Returns

  • pandas.DataFrame – If as_df is True. Each row corresponds to a metric-value pair. Metrics with multiple values have multiple rows.

  • dict – If as_df is False. The keys are the metric names and the values are the metric values (grouped). Metrics with multiple values are assigned to a single key.
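The two return shapes differ mainly in how multi-valued metrics are laid out. The sketch below flattens a grouped dictionary into one row per metric-value pair. It illustrates the idea only: the metric names, the Gap tuple and the "metric-field" row labels are assumptions, not the labels synthgauge necessarily produces.

```python
from collections import namedtuple

# Hypothetical multi-valued result, as a custom metric might return.
Gap = namedtuple("Gap", ["mean_gap", "std_gap"])

# Grouped form: one key per metric, multiple values kept together.
grouped = {"simple": 0.25, "moments": Gap(mean_gap=2.0, std_gap=0.5)}

# Flattened form: one (label, value) row per metric-value pair.
rows = []
for metric, value in grouped.items():
    if hasattr(value, "_asdict"):
        # Named tuples expand into one row per field.
        rows.extend(
            (f"{metric}-{field}", v) for field, v in value._asdict().items()
        )
    else:
        rows.append((metric, value))
```

The grouped form is convenient for looking a metric up by key, while the flattened rows map directly onto a two-column table.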

plot_histograms(figcols=2, figsize=None)

Plot grid of feature distributions.

Convenience wrapper for synthgauge.plot.plot_histograms. This function uses the combined real and synthetic data sets and groups by ‘source’.

plot_histogram3d(data, x, y, x_bins='auto', y_bins='auto', figsize=None)

Plot 3D histogram.

Convenience wrapper for synthgauge.plot.plot_histogram3d.

Parameters
  • data ({"real", "synth", "combined"}) – Dataframe to pass to plotting function. Either “real” to pass self.real_data, “synth” to pass self.synth_data or “combined” to pass self.combined_data.

  • x (str) – Column to plot along the x-axis.

  • y (str) – Column to plot along the y-axis.

plot_correlation(feats=None, method='pearson', figcols=2, figsize=None, **kwargs)

Plot a grid of correlation heatmaps.

Convenience wrapper for synthgauge.plot.plot_correlation. Each dataset (real and synthetic) has a plot, as well as one for the differences in their correlations.

plot_crosstab(x, y, figsize=None, **kwargs)

Plot pairwise cross-tabulation.

Convenience wrapper for synthgauge.plot.plot_crosstab. Automatically sets real and synth parameters to the corresponding data in self.

plot_qq(feature, n_quantiles=None, figsize=None)

Plot quantile-quantile plot.

Convenience wrapper for synthgauge.plot.plot_qq.

Parameters
  • feature (str) – Feature to plot.

  • n_quantiles, figsize (optional) – Arguments passed to synthgauge.plot.plot_qq. Refer to its documentation for details.