synthgauge.datasets

Functions for creating toy datasets.

Module Contents

Functions

make_blood_types_df([noise, nan_prop, seed])

Create a toy dataset about blood types and physical attributes.

synthgauge.datasets.make_blood_types_df(noise=0, nan_prop=0, seed=None)

Create a toy dataset about blood types and physical attributes.

This function is used to create data for the package’s examples and its tests. Its outputs are not intended to imply or be used for any meaningful data analysis.

Parameters
  • noise (float, default 0) – Standard deviation of the Gaussian noise added to the data. Must be non-negative; zero adds no noise.

  • nan_prop (float, default 0) – Proportion of the dataset to replace with missing values.

  • seed (int, optional) – Seed used by all random samplers, for reproducibility.

Returns

data – A toy “blood type” dataset.

Return type

pandas.DataFrame

Notes

The amount of noise can be tuned to crudely simulate the creation of synthetic data.
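The idea behind the three parameters can be sketched with the standard library alone. The block below is an illustration only, not synthgauge's implementation: `perturb` and the `heights` list are hypothetical names invented for this example, and the real function may apply noise and missing values differently. It shows one seeded generator driving both the Gaussian noise and the choice of missing entries, so that the same seed with different `noise` values yields a clean series and a crudely "synthetic" counterpart.

```python
import math
import random


def perturb(values, noise=0.0, nan_prop=0.0, seed=None):
    """Hypothetical helper (not part of synthgauge) mimicking the parameters."""
    if noise < 0:
        raise ValueError("noise must be non-negative")
    if not 0 <= nan_prop <= 1:
        raise ValueError("nan_prop must be between 0 and 1")

    # A single seeded generator drives every random draw below.
    rng = random.Random(seed)

    # Add zero-mean Gaussian noise with standard deviation `noise`.
    noisy = [v + rng.gauss(0.0, noise) for v in values]

    # Replace a proportion `nan_prop` of the entries with NaN.
    n_missing = round(nan_prop * len(noisy))
    for i in rng.sample(range(len(noisy)), n_missing):
        noisy[i] = math.nan
    return noisy


# Made-up values, for illustration only.
heights = [150.0, 162.5, 171.0, 180.2, 168.4, 175.9, 158.3, 190.1]

original = perturb(heights, noise=0, seed=42)
synthetic = perturb(heights, noise=2.0, nan_prop=0.25, seed=42)
```

With the package installed, the analogous pair would presumably come from two calls to `make_blood_types_df` that share a `seed` and differ only in `noise`, each returning a `pandas.DataFrame`.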