synthgauge.datasets
Functions for creating toy datasets.
Module Contents
Functions
make_blood_types_df(noise=0, nan_prop=0, seed=None): Create a toy dataset about blood types and physical attributes.
- synthgauge.datasets.make_blood_types_df(noise=0, nan_prop=0, seed=None)
Create a toy dataset about blood types and physical attributes.
This function is used to create data for the package’s examples and its tests. Its outputs are not intended to imply or be used for any meaningful data analysis.
- Parameters
noise (float, default 0) – Standard deviation of the Gaussian noise added to the data. Must be non-negative; the default of zero adds no noise.
nan_prop (float, default 0) – Proportion of dataset to replace with missing values.
seed (int, optional) – Seed shared by all random samplers, so that results are reproducible.
- Returns
data – A toy “blood type” dataset.
- Return type
pandas.DataFrame
Notes
The amount of noise can be tuned to crudely simulate the creation of synthetic data.
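To make the roles of noise, nan_prop and seed concrete, here is a minimal standard-library sketch of the behaviour the parameter descriptions suggest. It is not the package's implementation: the helper name perturb, the sample heights, and the choice to round the number of missing entries are all assumptions made for illustration.

```python
import random


def perturb(values, noise=0.0, nan_prop=0.0, seed=None):
    """Hypothetical helper, not part of synthgauge.

    Adds zero-mean Gaussian noise with standard deviation `noise` to each
    value, then replaces a `nan_prop` share of the entries with None.
    """
    if noise < 0:
        raise ValueError("noise must be non-negative")
    if not 0 <= nan_prop <= 1:
        raise ValueError("nan_prop must be between 0 and 1")

    # A single seeded generator drives every random step below.
    rng = random.Random(seed)

    # With noise=0 every draw is exactly 0.0, so the values are unchanged.
    noisy = [v + rng.gauss(0.0, noise) for v in values]

    # Pick distinct positions to blank out (assumed rounding rule).
    n_missing = round(nan_prop * len(noisy))
    for i in rng.sample(range(len(noisy)), n_missing):
        noisy[i] = None
    return noisy


# Made-up example values, purely for demonstration.
heights = [160.0, 172.5, 181.0, 158.5, 190.0, 175.0, 166.5, 169.0, 177.5, 184.0]

clean = perturb(heights, seed=0)                           # defaults change nothing
noisy = perturb(heights, noise=2.0, nan_prop=0.2, seed=0)  # jittered, 2 values missing
repeat = perturb(heights, noise=2.0, nan_prop=0.2, seed=0) # same seed, same result
```

Raising noise moves the perturbed values further from the originals, which is the sense in which the noise level can crudely stand in for a synthetic version of the clean data. Reusing the same seed reproduces the same perturbed copy.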