embedding.EmbeddingHandler

embedding.EmbeddingHandler(self, embedding_model_name=config['llm']['embedding_model_name'], db_dir=config['llm']['db_dir'], k_matches=20)

Handles embedding operations for the Chroma vector store.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| embedding_model_name | str | The name of the embedding model to use. Defaults to the value in the configuration file. | config['llm']['embedding_model_name'] |
| db_dir | str | The directory where the vector store database is located. Defaults to the value in the configuration file. If None, the embedding database is non-persistent. | config['llm']['db_dir'] |
| k_matches | int | The number of nearest matches to retrieve. | 20 |
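
Because the defaults are read from `config` inside the signature, Python evaluates them once, when the function is defined, rather than on every call. The sketch below illustrates this with a hypothetical stand-in function (`make_handler_args`) and placeholder config values; neither is taken from the library. It only mirrors the documented parameter names.

```python
# Hypothetical config with placeholder values; the real values come from
# the project's configuration file.
config = {"llm": {"embedding_model_name": "placeholder-model", "db_dir": "placeholder/db"}}


def make_handler_args(
    embedding_model_name=config["llm"]["embedding_model_name"],
    db_dir=config["llm"]["db_dir"],
    k_matches=20,
):
    """Stand-in that mirrors the documented signature and returns its arguments."""
    return {
        "embedding_model_name": embedding_model_name,
        "db_dir": db_dir,
        "k_matches": k_matches,
    }


# Defaults were captured when the function was defined...
defaults = make_handler_args()

# ...so changing the config afterwards does not affect them.
config["llm"]["db_dir"] = "another/db"
still_defaults = make_handler_args()

# Explicit arguments always win; db_dir=None is the documented way to
# request a non-persistent store.
in_memory = make_handler_args(db_dir=None, k_matches=5)
```

In practice this suggests passing `db_dir` and `embedding_model_name` explicitly whenever they should differ from what the configuration held at import time.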

Methods

| Name | Description |
|---|---|
| embed_index | Embeds the index entries into the vector store. |
| search_index | Returns the k document chunks most relevant to a single query string. |
| search_index_multi | Returns the k document chunks most relevant to a list of query fields. |

embed_index

embedding.EmbeddingHandler.embed_index(from_empty=True, sic=None, file_object=None)

Embeds the index entries into the vector store.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| from_empty | bool | Whether to drop the current vector store content and start fresh. | True |
| sic | SIC | The SIC hierarchy object. If None, the hierarchy is loaded from the files specified in the config. | None |
| file_object | StringIO | The index file as a StringIO object. If provided, it is read line by line and each line is embedded. Each line is expected to have the format code: description. | None |
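
The `code: description` line format expected in `file_object` can be illustrated with a small parser. This is a minimal sketch, not the library's own reader: `parse_index_lines` is a hypothetical helper, the sample entries are made up, and skipping blank or colon-free lines is an assumption of the sketch.

```python
from io import StringIO


def parse_index_lines(file_object):
    """Yield (code, description) pairs from lines shaped like 'code: description'."""
    for raw_line in file_object:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so descriptions may contain colons.
        code, sep, description = line.partition(":")
        if not sep:
            continue  # skip lines without a colon (an assumption of this sketch)
        yield code.strip(), description.strip()


# Made-up sample entries, for illustration only.
index_file = StringIO(
    "00001: Example activity A\n"
    "\n"
    "00002: Example activity B: with a colon\n"
)
entries = list(parse_index_lines(index_file))
```

A `StringIO` built this way is the kind of object the `file_object` parameter describes.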

search_index

embedding.EmbeddingHandler.search_index(query, return_dicts=True)

Returns k document chunks with the highest relevance to the query.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The question for which the most relevant index entries are returned. | required |
| return_dicts | bool | If True, results are returned as a list of dictionaries; otherwise as document tuples. | True |

Returns

| Type | Description |
|---|---|
| list[dict] | List of the top k index entries by relevance. |
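
To make "top k entries by relevance" concrete, the following sketch ranks a few toy entries by cosine distance to a query vector and keeps the k closest. It is an illustration of nearest-match retrieval in general, not the handler's implementation: the 2-D vectors, the helper names, and the dictionary keys (`code`, `title`, `distance`) are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vector, entries, k):
    """Score every entry against the query and return the k closest as dicts."""
    scored = [
        {"code": code, "title": title, "distance": cosine_distance(query_vector, vector)}
        for code, title, vector in entries
    ]
    return sorted(scored, key=lambda item: item["distance"])[:k]


# Toy index: (code, title, embedding). Real embeddings have many more dimensions.
toy_entries = [
    ("A", "Example entry A", [1.0, 0.0]),
    ("B", "Example entry B", [0.7, 0.7]),
    ("C", "Example entry C", [0.0, 1.0]),
]

results = top_k([1.0, 0.0], toy_entries, k=2)
```

Here `k` plays the role that `k_matches` plays in the constructor: it caps how many entries come back.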

search_index_multi

embedding.EmbeddingHandler.search_index_multi(query)

Returns k document chunks with the highest relevance to a list of query fields.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| query | list[str] | List of query fields, in priority order, for which the most relevant index entries are returned, e.g. [industry_descr, job_title, job_descr]. | required |

Returns

| Type | Description |
|---|---|
| list[dict] | List of the top k index entries by relevance. |
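
The reference above does not say how results for several query fields are combined. One plausible approach, sketched here purely as an assumption, is to search each non-empty field separately, keep the best (lowest-distance) hit per entry, and return the k closest overall. `search_multi`, `fake_search`, the canned results, and the dictionary keys are hypothetical and exist only for this illustration.

```python
def search_multi(fields, search, k):
    """Search each non-empty field and merge hits, keeping the best hit per code."""
    best = {}
    for field in fields:  # fields are visited in priority order
        if not field:
            continue  # ignore missing or empty fields
        for hit in search(field):
            code = hit["code"]
            if code not in best or hit["distance"] < best[code]["distance"]:
                best[code] = hit
    # sorted() is stable, so ties keep the order in which codes were first seen,
    # which favours higher-priority fields.
    return sorted(best.values(), key=lambda hit: hit["distance"])[:k]


# Canned results standing in for a real single-query search.
CANNED = {
    "school": [{"code": "A", "distance": 0.2}, {"code": "B", "distance": 0.5}],
    "teacher": [{"code": "B", "distance": 0.1}, {"code": "C", "distance": 0.4}],
}


def fake_search(query):
    """Return canned hits for a query, or an empty list if none are defined."""
    return CANNED.get(query, [])


results = search_multi(["school", None, "teacher"], fake_search, k=2)
```

Entry B appears under both fields, so only its closer hit (distance 0.1) is kept before ranking.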