Preprocessing

ips_python.preprocessing.get_wordnet_pos(word)[source]

Map the POS tag of a word to the first-character code that lemmatize() accepts
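The mapping described above can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes Penn Treebank style tags (such as those produced by NLTK's pos_tag) and takes the tag directly rather than the word. The names WORDNET_POS and penn_tag_to_wordnet_pos are hypothetical. The single-character codes are the ones WordNet uses for adjective, noun, verb and adverb.

```python
# Hypothetical sketch: map the first letter of a Penn Treebank tag to the
# single-character WordNet POS code accepted by a WordNet lemmatizer.
WORDNET_POS = {
    "J": "a",  # adjective
    "N": "n",  # noun
    "V": "v",  # verb
    "R": "r",  # adverb
}


def penn_tag_to_wordnet_pos(tag: str) -> str:
    """Return the WordNet POS character for a tag, defaulting to noun."""
    # Only the first character of the tag decides the coarse word class.
    return WORDNET_POS.get(tag[:1].upper(), "n")


print(penn_tag_to_wordnet_pos("VBD"))  # past-tense verb tag
print(penn_tag_to_wordnet_pos("JJ"))   # adjective tag
```

Falling back to noun for unknown tags mirrors the lemmatizer's own default part of speech, so unrecognised tags behave as if no tag had been supplied.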

ips_python.preprocessing.preprocess_pipeline(df)[source]

Default process for taking the raw IATI data dump and processing the text for vectorizing

Parameters

df – dataframe of the raw IATI data with columns including identifier, description and title

Returns

dataframe of preprocessed data with _only_ the columns IATI_IDENTIFIER_COLUMN_NAME and DESCRIPTION_COLUMN_NAME

Modeling

ips_python.vectorize.create_tfidf_term_document_matrix(preprocessed_text_dataframe)[source]

Return a vectorizer object, a TF-IDF term document matrix and a list of words

input:

preprocessed_text_dataframe: dataframe of preprocessed text with ‘description’ column

output:

tuple: vectorizer, term_document_matrix, word_list

ips_python.vectorize.vectorize_input_text(processed_query_dataframe, vectorizer)[source]

input:

processed_query_dataframe: dataframe of processed user text

vectorizer: TfidfVectorizer object

output:

numpy array of vectorized user input
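The two vectorizing steps above can be sketched with scikit-learn's TfidfVectorizer. This is an illustration under assumptions, not the package's code: the sample descriptions and query are invented, and a plain list of strings stands in for the 'description' column of the preprocessed dataframe.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented sample text standing in for the 'description' column.
descriptions = [
    "clean water supply project",
    "rural health clinic construction",
    "water sanitation and health training",
]

# Fit the vocabulary and build the TF-IDF matrix in one step.
# Rows correspond to documents, columns to vocabulary terms.
vectorizer = TfidfVectorizer()
term_document_matrix = vectorizer.fit_transform(descriptions)

# Recover the words in column order from the fitted vocabulary.
word_list = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Reuse the fitted vectorizer so the query shares the same columns,
# then convert the sparse result to a dense numpy array.
query_vector = vectorizer.transform(["water health"]).toarray()

print(term_document_matrix.shape, query_vector.shape)
```

The query is passed through transform rather than fit_transform so that it is projected onto the vocabulary learned from the records; refitting would produce columns that no longer line up with the matrix.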

ips_python.cosine.get_cosine_similarity(processed_user_query_vector, term_document_matrix, iati_records)[source]

input:

processed_user_query_vector: vectorized query

term_document_matrix: TDM

iati_records: IATI records used in the TDM

output:

cosine similarity > 0 per iati.identifier
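As a rough sketch of the computation described above, the following pure-Python example scores a query vector against each record vector and keeps only scores greater than 0. It is an assumption-laden illustration rather than the package's implementation: the function names, the dictionary return type and the identifiers are hypothetical, and plain lists stand in for the matrix and the records.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector shares no terms with anything.
        return 0.0
    return dot / (norm_a * norm_b)


def positive_similarities(query_vector, document_vectors, identifiers):
    """Return {identifier: score} for records with similarity > 0."""
    scores = {}
    for identifier, vector in zip(identifiers, document_vectors):
        score = cosine_similarity(query_vector, vector)
        if score > 0:
            scores[identifier] = score
    return scores


# Hypothetical identifiers and toy vectors for illustration only.
result = positive_similarities(
    [1, 0, 1],
    [[1, 0, 1], [0, 1, 0], [1, 1, 0]],
    ["EX-1", "EX-2", "EX-3"],
)
print(result)
```

Records that share no terms with the query score exactly 0 and are dropped, which is why the output can contain fewer identifiers than the input.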

Results Refinement

ips_python.refinement.process_results(initial_result_df, full_iati_records, number_of_results=100)[source]

This is an example of Google style.

Parameters
  • param1 – This is the first param.

  • param2 – This is a second param.

Returns

This is a description of what is returned.

Raises

KeyError – Raises an exception.

Runtime Code

ips_python.script.download_data()[source]

Placeholder function showing that something must be run in order to procure the data

Package Utilities

ips_python.utils.get_data_path()[source]

Return the absolute filepath of the data directory

Should work consistently across OS
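One common way to build an OS-independent absolute path is pathlib, sketched below. This is an assumption about the approach, not the package's code: the function name data_path_sketch, the package_root argument and the "data" directory name are all hypothetical.

```python
from pathlib import Path


def data_path_sketch(package_root, dirname="data"):
    """Return an absolute path to dirname under package_root as a string."""
    # Path joins with the separator of the current OS, and resolve()
    # turns the result into an absolute path.
    return str((Path(package_root) / dirname).resolve())


print(data_path_sketch("."))
```

Joining with the / operator on Path objects avoids hard-coding a separator, which is what keeps the result consistent across operating systems.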

ips_python.utils.get_input_path()[source]

Return the absolute filepath of the input directory

Should work consistently across OS

ips_python.utils.get_timestamp_string_prefix()[source]

Return the date and time as a string in the format YYYY_MM_DD_HH_MM_SS, for example 2019_10_14_18_14_11
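The timestamp format shown above corresponds to a strftime pattern with underscore separators. The sketch below is illustrative only: the function name timestamp_prefix_sketch and its optional moment argument are hypothetical additions, included so the output can be checked against a fixed date.

```python
from datetime import datetime


def timestamp_prefix_sketch(moment=None):
    """Format a datetime as year_month_day_hour_minute_second."""
    if moment is None:
        # Default to the current local time.
        moment = datetime.now()
    return moment.strftime("%Y_%m_%d_%H_%M_%S")


print(timestamp_prefix_sketch(datetime(2019, 10, 14, 18, 14, 11)))
# → 2019_10_14_18_14_11
```

Because every field is zero-padded and ordered from largest to smallest unit, filenames prefixed this way sort chronologically when sorted alphabetically.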