perform
matching.perform
Functions for performing the matching itself.
Functions
Name | Description |
---|---|
add_private_index | Add anonymous match index to input datasets. |
calculate_performance | Calculate the performance of the match by counting the positives. |
perform_matching | Initiate the data, get similarities, and match the rows. |
add_private_index
matching.perform.add_private_index(df1, df2, match, size_assumed=10000, colname='private_index')
Add anonymous match index to input datasets.
The match index assigns indices to both matched and unmatched records, so that the two are indistinguishable. As a result, it does not leak any information about the other dataset.
add_private_index only works with unique one-to-one matches. This is because there is no way to match many-to-one without leaking information about the successful matches.
Parameters
Name | Type | Description | Default |
---|---|---|---|
df1 | pandas.DataFrame | A dataset. | required |
df2 | pandas.DataFrame | Another dataset. | required |
match | tuple[numpy.ndarray, numpy.ndarray] | A pair of matched indices, with no repeated indices. | required |
size_assumed | int | The assumed maximum size of each dataset. Default is 10,000. | 10000 |
colname | str | A column name for the new index. By default "private_index". | 'private_index' |
Returns
Type | Description |
---|---|
df1, df2: pd.DataFrame | The same as input data, with private matching index added. |
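The idea of an index that covers matched and unmatched records alike can be illustrated with a minimal sketch. This is not the library's implementation: the function name `sketch_private_index`, the `seed` argument, and the choice to draw labels from a pool of `2 * size_assumed` values are all assumptions made for illustration. Matched pairs share one random label, and every unmatched record gets its own unused label, so a party looking only at its own column cannot tell which rows were matched.

```python
import numpy as np
import pandas as pd


def sketch_private_index(df1, df2, match, size_assumed=10_000,
                         colname="private_index", seed=None):
    """Hypothetical illustration only, not the library's code."""
    rng = np.random.default_rng(seed)
    rows1 = np.asarray(match[0], dtype=int)  # positional rows matched in df1
    rows2 = np.asarray(match[1], dtype=int)  # positional rows matched in df2
    n1, n2, n_match = len(df1), len(df2), len(rows1)

    # Assumption: a pool of 2 * size_assumed unique labels is enough because
    # each dataset is assumed to hold at most size_assumed rows.
    pool = rng.permutation(2 * size_assumed)
    shared = pool[:n_match]                 # one label per matched pair
    only1 = pool[n_match:n1]                # labels for unmatched df1 rows
    only2 = pool[n1:n1 + n2 - n_match]      # labels for unmatched df2 rows

    matched1 = np.zeros(n1, dtype=bool)
    matched1[rows1] = True
    matched2 = np.zeros(n2, dtype=bool)
    matched2[rows2] = True

    idx1 = np.empty(n1, dtype=np.int64)
    idx2 = np.empty(n2, dtype=np.int64)
    idx1[rows1] = shared
    idx2[rows2] = shared
    idx1[~matched1] = only1
    idx2[~matched2] = only2

    # Return copies so the input frames are left unchanged.
    return df1.assign(**{colname: idx1}), df2.assign(**{colname: idx2})
```

After such a step, joining the two outputs on the new column would recover exactly the matched pairs, since unmatched labels never coincide across the two datasets.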
calculate_performance
matching.perform.calculate_performance(data_1, data_2, match)
Calculate the performance of the match by counting the positives.
Performance metrics are sent to the logger.
Parameters
Name | Type | Description | Default |
---|---|---|---|
data_1 | pandas.DataFrame | Data frame for PARTY1. | required |
data_2 | pandas.DataFrame | Data frame for PARTY2. | required |
match | tuple | Tuple of indices of matched pairs between the data frames. | required |
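As a rough sketch of what "counting the positives" could look like, the example below assumes that both data frames carry a ground-truth identifier column. The column name `true_id` and the helper `count_positives` are hypothetical and are not taken from the library. A matched pair counts as a true positive when the identifiers agree and as a false positive otherwise, and the counts are written to a logger.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def count_positives(data_1, data_2, match, id_col="true_id"):
    """Hypothetical illustration: count true and false positives in a match."""
    rows1 = np.asarray(match[0], dtype=int)
    rows2 = np.asarray(match[1], dtype=int)

    # Look up the assumed ground-truth identifier of each matched row.
    ids1 = data_1[id_col].to_numpy()[rows1]
    ids2 = data_2[id_col].to_numpy()[rows2]

    true_pos = int(np.sum(ids1 == ids2))
    false_pos = len(rows1) - true_pos
    logger.info("True positives: %d; false positives: %d", true_pos, false_pos)
    return true_pos, false_pos
```

Returning the counts as well as logging them is a choice made here to keep the sketch easy to check; the documented function only states that metrics are sent to the logger.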
perform_matching
matching.perform.perform_matching(data_1, data_2, embedder)
Initiate the data, get similarities, and match the rows.
Parameters
Name | Type | Description | Default |
---|---|---|---|
data_1 | pandas.DataFrame | Data frame for PARTY1. | required |
data_2 | pandas.DataFrame | Data frame for PARTY2. | required |
embedder | pprl.embedder.embedder.Embedder | Instance used to embed both data frames. | required |
Returns
Type | Description |
---|---|
pandas.DataFrame | Output for PARTY1. |
pandas.DataFrame | Output for PARTY2. |
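The "get similarities, and match the rows" step can be pictured with a small stand-in. The sketch below does not use the `Embedder` and is not the method `perform_matching` applies: it assumes each dataset has already been turned into a numeric matrix with no all-zero rows, computes plain cosine similarity, and pairs rows greedily one-to-one above a threshold. The function `greedy_match` and its `threshold` parameter are hypothetical. The result has the same shape as the `match` argument described above: a pair of index arrays with no repeated indices.

```python
import numpy as np


def greedy_match(emb1, emb2, threshold=0.5):
    """Hypothetical illustration: one-to-one matching on cosine similarity."""
    emb1 = np.asarray(emb1, dtype=float)
    emb2 = np.asarray(emb2, dtype=float)

    # Normalise rows so the dot product equals the cosine similarity.
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sims = a @ b.T

    rows, cols = [], []
    used_rows, used_cols = set(), set()
    # Visit candidate pairs from most to least similar.
    for flat in np.argsort(sims, axis=None)[::-1]:
        i, j = np.unravel_index(flat, sims.shape)
        if sims[i, j] < threshold:
            break  # every remaining pair is below the threshold
        if i in used_rows or j in used_cols:
            continue  # keep the match one-to-one
        rows.append(int(i))
        cols.append(int(j))
        used_rows.add(i)
        used_cols.add(j)
    return np.array(rows, dtype=int), np.array(cols, dtype=int)
```

Greedy pairing is used here only because it is short and has no dependencies beyond NumPy; an optimal assignment method would be a reasonable alternative when the greedy choice could block a better overall pairing.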