:py:mod:`synthgauge.evaluator`
==============================

.. py:module:: synthgauge.evaluator

.. autoapi-nested-parse::

   The core class for evaluating datasets.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   synthgauge.evaluator.Evaluator


.. py:class:: Evaluator(real, synth, handle_nans='drop')

   The central class in `synthgauge`, used to hold and evaluate data
   via metrics and visualisation.

   :param real: Dataframe containing the real data.
   :type real: pandas.DataFrame
   :param synth: Dataframe containing the synthetic data.
   :type synth: pandas.DataFrame
   :param handle_nans: Whether to drop missing values. If yes, use "drop" (default).
   :type handle_nans: str, default "drop"

   :returns: An `Evaluator` object ready for metric and visual evaluation.
   :rtype: synthgauge.Evaluator

   .. py:property:: metrics

      Return __metrics.

   .. py:method:: describe_numeric()

      Summarise numeric features.

      This function uses `pandas.DataFrame.describe` to calculate
      summary statistics for each numeric feature in `self.real_data`
      and `self.synth_data`.

      :returns: Descriptive statistics for each numeric feature.
      :rtype: pandas.DataFrame


   .. py:method:: describe_categorical()

      Summarise categorical features.

      This function uses `pandas.DataFrame.describe` to calculate
      summary statistics for each object-type feature in
      `self.real_data` and `self.synth_data`.

      :returns: Descriptive statistics for each object-type feature.
      :rtype: pandas.DataFrame


   .. py:method:: add_metric(name, alias=None, **kwargs)

      Add a metric to the evaluator.

      Metrics and their arguments are recorded to be run at a later
      time. This allows metric customisation but ensures that the same
      metric configuration is applied consistently, i.e. once added,
      the parameters do not require resupplying for each execution of
      the metric. Supplying a metric alias allows the same metric to
      be used multiple times with different parameters.

      Note that `self.real_data` and `self.synth_data` will be passed
      automatically to metrics that expect these arguments. They
      should not be declared in `metric_kwargs`.

      :param name: Name of the metric. Must match one of the functions in
                   `synthgauge.metrics`.
      :type name: str
      :param alias: Alias to be given to this use of the metric in the results
                    table. Allows the same metric to be used multiple times with
                    different parameters. If not specified, `name` is used.
      :type alias: str, optional
      :param \*\*kwargs: Keyword arguments for the metric. Refer to the associated
                         metric documentation for details.
      :type \*\*kwargs: dict, optional


   .. py:method:: add_custom_metric(alias, func, **kwargs)

      Add a custom metric to the evaluator.

      A custom metric uses any user-defined function that accepts the
      real and synthetic dataframes as the first and second positional
      arguments, respectively. Any other parameters must be defined as
      keyword arguments. The metric function can return a value of any
      desired type although scalar numeric values are recommended, or
      `collections.namedtuples` when there are multiple outputs.

      :param alias: Alias for the metric to appear in the results table.
      :type alias: str
      :param func: Top-level metric function to be called during the evaluation
                   step. The first two arguments of `func` must be `self.real`
                   and `self.synth`.
      :type func: function
      :param \*\*kwargs: Keyword arguments to be passed to `func`.
      :type \*\*kwargs: dict, optional


   .. py:method:: copy_metrics(other)

      Copy metrics from another evaluator.

      To facilitate consistent comparisons of different synthetic
      datasets, this function copies the metrics dictionary from
      another `Evaluator` instance.

      :param other: The other evaluator from which the metrics dictionary will
                    be copied.
      :type other: Evaluator


   .. py:method:: save_metrics(filename)

      Save the current metrics dictionary to disk via `pickle`.

      :param filename: Path to pickle file to save the metrics.
      :type filename: str


   .. py:method:: load_metrics(filename, overwrite=False)

      Load metrics from disk.

      Update or overwrite the current metric dictionary from a pickle.

      :param filename: Path to metrics pickle file.
      :type filename: str
      :param overwrite: If `True`, all current metrics will be replaced with the
                        loaded metrics. Default is `False`, which will update the
                        current metric dictionary with the loaded metrics.
      :type overwrite: bool, default False


   .. py:method:: drop_metric(metric)

      Drops the named metric from the metrics dictionary.

      :param metric: Key (name or alias, if specified) of the metric to remove.
      :type metric: str


   .. py:method:: evaluate(as_df=False)

      Compute metrics for real and synth data.

      Run through the metrics dictionary and execute each with its
      corresponding arguments. The results are returned as either a
      dictionary or dataframe.

      Results are also silently stored as a dictionary in
      `self.metric_results`.

      :param as_df: If `True`, the results will be returned as a
                    `pandas.DataFrame`, otherwise a dictionary is returned.
                    Default is `False`.
      :type as_df: bool, default False

      :returns: * *pandas.DataFrame* -- If `as_df` is `True`. Each row corresponds to a metric-value
                  pair. Metrics with multiple values have multiple rows.
                * *dict* -- If `as_df` is `False`. The keys are the metric names and
                  the values are the metric values (grouped). Metrics with
                  multiple values are assigned to a single key.


   .. py:method:: plot_histograms(figcols=2, figsize=None)

      Plot grid of feature distributions.

      Convenience wrapper for `synthgauge.plot.plot_histograms`. This
      function uses the combined real and synthetic data sets and
      groups by `'source'`.


   .. py:method:: plot_histogram3d(data, x, y, x_bins='auto', y_bins='auto', figsize=None)

      Plot 3D histogram.

      Convenience wrapper for `synthgauge.plot.plot_histogram3d`.

      :param data: Dataframe to pass to plotting function. Either `"real"` to
                   pass `self.real_data`, `"synth"` to pass `self.synth_data`
                   or `"combined"` to pass `self.combined_data`.
      :type data: {"real", "synth", "combined"}
      :param x: Column to plot along the x-axis.
      :type x: str
      :param y: Column to plot alont the y-axis.
      :type y: str


   .. py:method:: plot_correlation(feats=None, method='pearson', figcols=2, figsize=None, **kwargs)

      Plot a grid of correlation heatmaps.

      Convenience wrapper for `synthgauge.plot.plot_correlation`. Each
      dataset (real and synthetic) has a plot, as well as one for the
      differences in their correlations.


   .. py:method:: plot_crosstab(x, y, figsize=None, **kwargs)

      Plot pairwise cross-tabulation.

      Convenience wrapper for `synthgauge.plot.plot_crosstab`.
      Automatically sets `real` and `synth` parameters to the
      corresponding data in `self`.


   .. py:method:: plot_qq(feature, n_quantiles=None, figsize=None)

      Plot quantile-quantile plot.

      Convenience wrapper for `synthgauge.plot.plot_qq`.

      :param feature: Feature to plot.
      :type feature: str
      :param \*\*kwargs: Keyword arguments to pass to `synthgauge.plot.plot_qq`.
      :type \*\*kwargs: dict, optional