embedding.EmbeddingHandler
embedding.EmbeddingHandler(self, embedding_model_name=config['llm']['embedding_model_name'], db_dir=config['llm']['db_dir'], k_matches=20)
Handles embedding operations for the Chroma vector store.
Parameters
Name | Type | Description | Default |
---|---|---|---|
embedding_model_name | str | The name of the embedding model to use. Defaults to the value specified in the configuration file. | config['llm']['embedding_model_name'] |
db_dir | str | The directory where the vector store database is located. Defaults to the value specified in the configuration file. If None, the embedding database is non-persistent. | config['llm']['db_dir'] |
k_matches | int | The number of nearest matches to retrieve. | 20 |
Methods
Name | Description |
---|---|
embed_index | Embeds the index entries into the vector store. |
search_index | Returns k document chunks with the highest relevance to the query. |
search_index_multi | Returns k document chunks with the highest relevance to a list of query fields given in priority order. |
embed_index
embedding.EmbeddingHandler.embed_index(from_empty=True, sic=None, file_object=None)
Embeds the index entries into the vector store.
Parameters
Name | Type | Description | Default |
---|---|---|---|
from_empty | bool | Whether to drop the current vector store content and start fresh. | True |
sic | SIC | The SIC hierarchy object. If None, the hierarchy is loaded from files specified in the config. | None |
file_object | StringIO | The index file as a StringIO object. If provided, the file is read line by line and embedded. Each line is expected to have the format `code: description`. | None |
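The `code: description` line format expected for `file_object` can be illustrated with a minimal sketch that uses only the standard library. The sample codes and descriptions are illustrative placeholders, and the helper `parse_index_lines` is hypothetical: it is not part of the package, and how the handler itself splits each line is an assumption here.

```python
from io import StringIO

# Illustrative index content: one "code: description" entry per line.
index_text = (
    "01110: Growing of cereals\n"
    "01120: Growing of rice\n"
    "\n"
    "47110: Retail sale in non-specialised stores\n"
)


def parse_index_lines(file_object):
    """Yield (code, description) pairs from a 'code: description' file.

    Hypothetical helper, shown only to make the line format concrete.
    """
    for line in file_object:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so descriptions may contain colons.
        code, _, description = line.partition(":")
        yield code.strip(), description.strip()


file_object = StringIO(index_text)
entries = list(parse_index_lines(file_object))
print(entries)
```

A `StringIO` object built this way is the kind of input that `file_object` describes; rewind it with `file_object.seek(0)` before passing it on if it has already been read.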
search_index
embedding.EmbeddingHandler.search_index(query, return_dicts=True)
Returns k document chunks with the highest relevance to the query.
Parameters
Name | Type | Description | Default |
---|---|---|---|
query | str | Question for which the most relevant index entries will be returned. | required |
return_dicts | bool | If True, data is returned as a list of dictionaries, otherwise as document tuples. | True |
Returns
Type | Description |
---|---|
list[dict] | List of the top k index entries by relevance. |
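To make "top k entries by relevance" concrete, here is a minimal conceptual sketch of nearest-match retrieval over toy vectors. It is not the handler's or Chroma's implementation: the cosine-similarity scoring, the function names, and the dictionary keys (`code`, `title`, `score`) are all assumptions chosen for illustration, and the actual keys returned by `search_index` may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, entries, k):
    """Return the k entries most similar to query_vec as a list of dicts."""
    scored = [
        {
            "code": entry["code"],
            "title": entry["title"],
            "score": cosine_similarity(query_vec, entry["vector"]),
        }
        for entry in entries
    ]
    # Highest similarity first, then keep only the first k results.
    scored.sort(key=lambda match: match["score"], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings" (hypothetical data).
toy_entries = [
    {"code": "A", "title": "first entry", "vector": [1.0, 0.0]},
    {"code": "B", "title": "second entry", "vector": [0.0, 1.0]},
    {"code": "C", "title": "third entry", "vector": [1.0, 1.0]},
]

matches = top_k_matches([1.0, 0.1], toy_entries, k=2)
print([match["code"] for match in matches])  # ['A', 'C']
```

In the handler, the value of k comes from the `k_matches` constructor argument rather than from a per-call parameter.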
search_index_multi
embedding.EmbeddingHandler.search_index_multi(query)
Returns k document chunks with the highest relevance to a list of query fields given in priority order.
Parameters
Name | Type | Description | Default |
---|---|---|---|
query | list[str] | List of query fields (in priority order) for which the most relevant index entries will be returned, e.g. [industry_descr, job_title, job_descr]. | required |
Returns
Type | Description |
---|---|
list[dict] | List of the top k index entries by relevance. |
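One possible way to assemble the priority-ordered `query` list from a survey record is sketched below. The helper `build_query_fields`, the sample record, and the choice to drop empty fields are assumptions made for illustration; the documentation above does not say how `search_index_multi` treats empty strings.

```python
def build_query_fields(
    record, field_order=("industry_descr", "job_title", "job_descr")
):
    """Return the non-empty field values from `record` in priority order.

    Hypothetical helper: it only prepares the list of strings and does not
    call the embedding handler.
    """
    fields = []
    for name in field_order:
        value = (record.get(name) or "").strip()
        if value:
            fields.append(value)
    return fields


# Hypothetical survey response with one empty field.
record = {
    "job_title": "Barista",
    "industry_descr": "Coffee shop",
    "job_descr": "",
}

query = build_query_fields(record)
print(query)  # ['Coffee shop', 'Barista']
```

The resulting list has the shape that the `query` parameter describes, with the industry description first because it comes first in the assumed priority order.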