embedding.EmbeddingHandler

embedding.EmbeddingHandler(self, embedding_model_name=config['llm']['embedding_model_name'], db_dir=config['llm']['db_dir'], k_matches=20)

Handles embedding operations for the Chroma vector store.

Parameters

Name	Type	Description	Default
`embedding_model_name`	str	The name of the embedding model to use. Defaults to the value specified in the configuration file.	`config['llm']['embedding_model_name']`
`db_dir`	str	The directory where the vector store database is located. Defaults to the value specified in the configuration file. If None, the embedding database is non-persistent.	`config['llm']['db_dir']`
`k_matches`	int	The number of nearest matches to retrieve. Defaults to 20.	`20`

Methods

Name	Description
embed_index	Embeds the index entries into the vector store.
search_index	Returns k document chunks with the highest relevance to the query.
search_index_multi	Returns k document chunks with the highest relevance to a list of query fields.

embedding.EmbeddingHandler.embed_index(from_empty=True, sic=None, file_object=None)

Embeds the index entries into the vector store.

Name	Type	Description	Default
`from_empty`	bool	Whether to drop the current vector store content and start fresh.	`True`
`sic`	SIC	The SIC hierarchy object. If None, the hierarchy is loaded from files specified in the config.	`None`
`file_object`	StringIO object	The index file as a StringIO object. If provided, the file is read line by line and embedded. Each line is expected to have the format `code: description`.	`None`
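
The `code: description` line format expected for `file_object` can be illustrated with a small parser. This is only a sketch of the input format, not the handler's own reader: the helper name `parse_index_lines` and the sample entries are hypothetical and exist purely for illustration.

```python
from io import StringIO


def parse_index_lines(file_object):
    """Hypothetical helper: split 'code: description' lines into pairs."""
    entries = []
    for line in file_object:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so descriptions may contain colons.
        code, _, description = line.partition(":")
        entries.append((code.strip(), description.strip()))
    return entries


# Illustrative sample data in the expected format (not taken from a real index).
sample = StringIO("01110: Growing of cereals\n01120: Growing of rice\n")
entries = parse_index_lines(sample)
```

A `StringIO` object built in this way is the kind of value that would be passed as `file_object` to `embed_index`.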

embedding.EmbeddingHandler.search_index(query, return_dicts=True)

Returns k document chunks with the highest relevance to the query.

Name	Type	Description	Default
`query`	str	Question for which most relevant index entries will be returned.	required
`return_dicts`	bool	If True, results are returned as a list of dictionaries; otherwise they are returned as document tuples. Defaults to True.	`True`

Returns

Type	Description
list[dict]	List of the top k index entries by relevance (as dictionaries when `return_dicts` is True).
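
To make "top k entries by relevance" concrete, the following toy sketch ranks a few index entries against a query and returns the best k as a list of dictionaries. It is not the handler's implementation: the real class relies on an embedding model and a Chroma vector store, whereas this stand-in uses simple word-count vectors with cosine similarity. The function names, dictionary keys (`code`, `title`, `score`) and sample entries are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, index, k=2):
    """Return the k entries most similar to the query, as a list of dicts."""
    q = Counter(query.lower().split())
    scored = [
        {"code": code, "title": title,
         "score": cosine(q, Counter(title.lower().split()))}
        for code, title in index
    ]
    # Highest score first, then keep only the first k entries.
    return sorted(scored, key=lambda d: d["score"], reverse=True)[:k]


# Made-up index entries (hypothetical codes and titles).
index = [
    ("A1", "growing of rice"),
    ("A2", "retail sale of bread"),
    ("A3", "growing of wheat"),
]
results = top_k("rice growing", index, k=2)
```

Here `results` holds the two entries that share the most vocabulary with the query, ordered from most to least similar.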

embedding.EmbeddingHandler.search_index_multi(query)

Returns k document chunks with the highest relevance to a list of query fields.

Name	Type	Description	Default
`query`	list[str]	List of query fields (in priority order) for which the most relevant index entries will be returned, e.g. `[industry_descr, job_title, job_descr]`.	required

Returns

Type	Description
list[dict]	List of the top k index entries by relevance.
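
The reference above does not say how results for the individual query fields are combined. Purely as an assumption, one plausible strategy is to search each field separately and merge the result lists so that entries found by higher-priority fields come first, with duplicates dropped. The sketch below shows that merge step only: `merge_by_priority`, the `code` key and the sample results are hypothetical, and the actual method may work differently.

```python
def merge_by_priority(result_lists, k):
    """Merge per-field result lists, earlier lists taking priority.

    Assumption: each result is a dict with a unique "code" key.
    """
    merged, seen = [], set()
    for results in result_lists:  # lists arrive in priority order
        for entry in results:
            if entry["code"] not in seen:
                seen.add(entry["code"])
                merged.append(entry)
    return merged[:k]


# Made-up results for three query fields, in priority order:
# industry_descr, job_title, job_descr.
by_field = [
    [{"code": "A1"}, {"code": "A3"}],
    [{"code": "A3"}, {"code": "B2"}],
    [{"code": "C9"}],
]
merged = merge_by_priority(by_field, k=3)
```

With this strategy, an entry matched by the first field always ranks ahead of one matched only by a later field, which is one way to honour the priority order of the query list.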