synthgauge.metrics.density

Mean absolute difference in feature densities.

Module Contents

Functions

feature_density_mad(real, synth[, feats, bins])

Mean absolute difference of feature densities.

synthgauge.metrics.density.feature_density_mad(real, synth, feats=None, bins=10)

Mean absolute difference of feature densities.

For each feature, the data are binned and the difference between the real and synth densities is calculated in each bin. The mean absolute difference across all features and bins is then returned. A value close to 0 indicates that the real and synthetic datasets have similar feature distributions.
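The calculation described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: the name density_mad_sketch is hypothetical, and the sketch assumes numeric features, bin edges shared by both datasets (computed from the pooled values), and density-normalised histograms.

```python
import numpy as np
import pandas as pd


def density_mad_sketch(real, synth, feats=None, bins=10):
    """Illustrative only: mean absolute difference of binned densities."""
    if feats is None:
        # Assumption: "common features" means columns present in both frames.
        feats = [col for col in real.columns if col in synth.columns]

    diffs = []
    for feat in feats:
        # Assumption: both datasets share bin edges built from the pooled values.
        pooled = np.concatenate([real[feat].to_numpy(), synth[feat].to_numpy()])
        edges = np.histogram_bin_edges(pooled, bins=bins)
        real_density, _ = np.histogram(real[feat], bins=edges, density=True)
        synth_density, _ = np.histogram(synth[feat], bins=edges, density=True)
        diffs.append(real_density - synth_density)

    # Average the absolute differences over every bin of every feature.
    return float(np.mean(np.abs(np.concatenate(diffs))))


rng = np.random.default_rng(0)
real = pd.DataFrame(
    {"age": rng.normal(40, 10, 500), "income": rng.normal(30, 5, 500)}
)
similar = real.copy()   # identical distributions
shifted = real + 15     # every feature shifted by a constant

score_same = density_mad_sketch(real, similar)
score_shifted = density_mad_sketch(real, shifted)
```

In this sketch, identical data gives a score of exactly 0, while the shifted copy gives a score above 0.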

Parameters
  • real (pandas.DataFrame) – DataFrame containing the real data.

  • synth (pandas.DataFrame) – DataFrame containing the synthetic data.

  • feats (list of str or None, default None) – The features that will be used to compute the densities. If None (default), all common features are used.

  • bins (str or int, default 10) – Binning method for discretising the data. Can be anything accepted by numpy.histogram_bin_edges. Default uses 10 bins.
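As a rough illustration of what the bins argument accepts, the snippet below calls numpy.histogram_bin_edges directly with an integer and with a named method. The sample array is made up for the example, and how feature_density_mad passes bins through internally is not shown here.

```python
import numpy as np

# Hypothetical sample values, used only to show the two styles of `bins`.
values = np.arange(101, dtype=float)

# An integer asks for that many equal-width bins, so 10 bins give 11 edges.
edges_int = np.histogram_bin_edges(values, bins=10)

# A string selects one of NumPy's binning rules, such as "auto" or "fd";
# the number of bins then depends on the data.
edges_auto = np.histogram_bin_edges(values, bins="auto")

n_bins_int = len(edges_int) - 1
n_bins_auto = len(edges_auto) - 1
```

With a string method the number of bins is chosen from the data, so it varies with the values being binned.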

Returns

Mean absolute difference of feature densities.

Return type

float