Preprocessing
ips_python.preprocessing.get_wordnet_pos(word)
Map a POS tag to the first character that lemmatize() accepts.
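As a rough illustration of this mapping, the sketch below converts a Penn Treebank tag into the one-letter POS codes that WordNet lemmatizers take ('a', 'n', 'v', 'r'). It rests on assumptions: the documented function receives a word, so the real implementation presumably tags the word first, and the helper name `wordnet_pos_from_tag`, the Penn Treebank tag set, and the fallback to noun are all hypothetical.

```python
# Hypothetical helper, not part of ips_python: maps an already-computed
# Penn Treebank tag to the one-letter POS code used by WordNet lemmatizers.
WORDNET_POS_BY_INITIAL = {
    "J": "a",  # adjective
    "N": "n",  # noun
    "V": "v",  # verb
    "R": "r",  # adverb
}


def wordnet_pos_from_tag(treebank_tag):
    """Return 'a', 'n', 'v' or 'r' for a tag; fall back to noun (assumed)."""
    initial = treebank_tag[:1].upper()
    return WORDNET_POS_BY_INITIAL.get(initial, "n")


print(wordnet_pos_from_tag("VBD"))  # verb tag
print(wordnet_pos_from_tag("JJ"))   # adjective tag
print(wordnet_pos_from_tag("CD"))   # unmapped tag, falls back to noun
```

Falling back to noun for unmapped tags is a common choice because it matches the default part of speech of most WordNet lemmatizers, but the library may behave differently.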
ips_python.preprocessing.preprocess_pipeline(df)
Default pipeline that takes the raw IATI data dump and preprocesses its text for vectorizing.
- Parameters
df – dataframe of the raw IATI data with columns including identifier, description and title
- Returns
dataframe of preprocessed data with _only_ the columns IATI_IDENTIFIER_COLUMN_NAME and DESCRIPTION_COLUMN_NAME
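The input and output contract above can be sketched without the library. The example below is only an illustration under stated assumptions: it uses a list of dicts in place of a dataframe, the values of the two column-name constants are guesses, and the cleaning steps (lowercasing, dropping non-letters) are placeholders, since the actual steps of the pipeline are not documented here.

```python
import re

# Assumed values: the real constants live in ips_python and may differ.
IATI_IDENTIFIER_COLUMN_NAME = "iati-identifier"
DESCRIPTION_COLUMN_NAME = "description"


def clean_text(text):
    """Lowercase, replace non-letters with spaces, collapse whitespace (assumed steps)."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def preprocess_records(records):
    """Keep only the identifier and the cleaned description of each record."""
    return [
        {
            IATI_IDENTIFIER_COLUMN_NAME: record[IATI_IDENTIFIER_COLUMN_NAME],
            DESCRIPTION_COLUMN_NAME: clean_text(record[DESCRIPTION_COLUMN_NAME]),
        }
        for record in records
    ]


# Made-up record for illustration only.
raw = [
    {
        "iati-identifier": "XM-1",
        "title": "Water",
        "description": "Clean-water wells, 2019!",
    }
]
processed = preprocess_records(raw)
print(processed)
```

Note that the title column is dropped from the result, mirroring the "only the columns" guarantee in the Returns description.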
Modeling
ips_python.vectorize.create_tfidf_term_document_matrix(preprocessed_text_dataframe)
Return a vectorizer object, the TF-IDF term-document matrix, and the list of words.
- input:
preprocessed_text_dataframe: dataframe of preprocessed text with ‘description’ column
- output:
tuple: vectorizer, term_document_matrix, word_list
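To make the shape of the returned tuple concrete, here is a small pure-Python TF-IDF sketch. It is not the library's implementation: the function name `tfidf_term_document_matrix` is hypothetical, a plain dict of IDF weights stands in for the vectorizer object, tokenisation is a simple whitespace split, and the smoothed IDF formula is one common variant chosen for illustration. Rows of the matrix are documents and columns follow the order of the word list.

```python
import math
from collections import Counter


def tfidf_term_document_matrix(descriptions):
    """Return (idf_by_word, matrix, word_list) for whitespace-tokenised texts."""
    tokenised = [text.split() for text in descriptions]
    word_list = sorted({word for tokens in tokenised for word in tokens})
    n_docs = len(tokenised)
    # Document frequency: number of texts that contain each word.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    # Smoothed IDF (one common variant): ln((1 + n) / (1 + df)) + 1.
    idf = {
        word: math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
        for word in word_list
    }
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)  # raw term frequency per document
        matrix.append([counts[word] * idf[word] for word in word_list])
    return idf, matrix, word_list


# Made-up descriptions for illustration only.
descriptions = ["clean water wells", "water supply", "school meals"]
idf, matrix, words = tfidf_term_document_matrix(descriptions)
print(words)
print(matrix[0])
```

Words that appear in several descriptions, such as "water" here, receive a lower IDF weight than words that appear in only one.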
Results Refinement
ips_python.refinement.process_results(initial_result_df, full_iati_records, number_of_results=100)
This is an example of Google style.
- Parameters
param1 – This is the first param.
param2 – This is a second param.
- Returns
This is a description of what is returned.
- Raises
KeyError – Raises an exception.