llm.ClassificationLLM
llm.ClassificationLLM(self, model_name=config['llm']['llm_model_name'], llm=None, embedding_handler=None, max_tokens=1600, temperature=0.0, verbose=False, openai_api_key=None)
Wraps the logic for using an LLM to classify a respondent’s data against a provided index. Supports a direct (one-shot) generative LLM method and Retrieval Augmented Generation (RAG).
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name of the model. Defaults to the value in the config file. Used if no LLM object is passed. | config['llm']['llm_model_name'] |
| llm | LLM | LLM to use. Optional. | None |
| embedding_handler | EmbeddingHandler | Embedding handler. Optional. If None, a default embedding handler is retrieved based on the config file. | None |
| max_tokens | int | Maximum number of tokens to generate. Defaults to 1600. | 1600 |
| temperature | float | Temperature of the LLM. Defaults to 0.0. | 0.0 |
| verbose | bool | Whether to print verbose output. Defaults to False. | False |
| openai_api_key | str | OpenAI API key. Optional, but needed for OpenAI models. | None |
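The constructor arguments above can be assembled in one place before the classifier is created. The following is a minimal sketch, not part of the library: `build_classifier_kwargs` is a hypothetical helper, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption made for illustration.

```python
import os

# Defaults copied from the documented constructor signature.
DOCUMENTED_DEFAULTS = {
    "max_tokens": 1600,
    "temperature": 0.0,
    "verbose": False,
}


def build_classifier_kwargs(env=None, **overrides):
    """Assemble keyword arguments for ClassificationLLM (hypothetical helper)."""
    env = os.environ if env is None else env
    kwargs = dict(DOCUMENTED_DEFAULTS)
    # OPENAI_API_KEY is an assumed variable name, not one the library mandates.
    kwargs["openai_api_key"] = env.get("OPENAI_API_KEY")
    # Explicit overrides win over the documented defaults.
    kwargs.update(overrides)
    return kwargs


kwargs = build_classifier_kwargs(env={"OPENAI_API_KEY": "sk-placeholder"}, verbose=True)
# With the package installed, the result could be passed on as:
# classifier = ClassificationLLM(**kwargs)
```

Leaving `model_name`, `llm` and `embedding_handler` out of the dictionary means the documented fallbacks (config file values and the default embedding handler) apply.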
Methods
| Name | Description |
| --- | --- |
| get_sic_code | Generates a SIC classification based on respondent’s data. |
| get_soc_code | Generates a SOC classification based on respondent’s data. |
| rag_general_code | Generates a classification answer based on respondent’s data. |
| rag_sic_code | Generates a SIC classification based on respondent’s data using a RAG approach. |
get_sic_code
llm.ClassificationLLM.get_sic_code(industry_descr, job_title, job_description)
Generates a SIC classification based on respondent’s data, with the whole condensed index embedded in the query.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| industry_descr | str | Description of the industry. | required |
| job_title | str | Title of the job. | required |
| job_description | str | Description of the job. | required |
Returns
| Type | Description |
| --- | --- |
| SicResponse | Generated response to the query. |
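A call to `get_sic_code` needs all three text fields. The sketch below shows one way to map a survey record onto the documented arguments. `RecordingClassifier` is a hypothetical stand-in so the example runs without the package or an API key, and the record key names are assumptions; a real `ClassificationLLM` instance would return a `SicResponse` rather than a string.

```python
class RecordingClassifier:
    """Hypothetical stand-in exposing the documented get_sic_code signature."""

    def __init__(self):
        self.calls = []

    def get_sic_code(self, industry_descr, job_title, job_description):
        self.calls.append((industry_descr, job_title, job_description))
        # A real ClassificationLLM returns a SicResponse; a plain string is used here.
        return "stub-response"


def classify_sic(classifier, record):
    """Pass one survey record (assumed key names) to get_sic_code."""
    return classifier.get_sic_code(
        industry_descr=record["industry_descr"],
        job_title=record["job_title"],
        job_description=record["job_description"],
    )


record = {
    "industry_descr": "primary school education",
    "job_title": "teaching assistant",
    "job_description": "supporting pupils in the classroom",
}
stub = RecordingClassifier()
response = classify_sic(stub, record)
```

Because `classify_sic` only relies on the method signature, the stub can be swapped for a real classifier without changing the calling code.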
get_soc_code
llm.ClassificationLLM.get_soc_code(job_title, job_description, level_of_education, manage_others, industry_descr)
Generates a SOC classification based on respondent’s data, with the whole condensed index embedded in the query.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| job_title | str | The title of the job. | required |
| job_description | str | The description of the job. | required |
| level_of_education | str | The level of education required for the job. | required |
| manage_others | bool | Indicates whether the job involves managing others. | required |
| industry_descr | str | The description of the industry. | required |
Returns
| Type | Description |
| --- | --- |
| SocResponse | The generated response to the query. |
Raises
| Type | Description |
| --- | --- |
| ValueError | If there is an error parsing the response from the LLM model. |
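Since `get_soc_code` raises `ValueError` when the LLM response cannot be parsed, callers may want to handle that case explicitly. The following is a minimal sketch of one option: `soc_code_or_none` and `UnparseableClassifier` are hypothetical names, and the field values are made up for illustration.

```python
def soc_code_or_none(classifier, **fields):
    """Return the get_soc_code result, or None if the response cannot be parsed."""
    try:
        return classifier.get_soc_code(**fields)
    except ValueError:
        # Documented failure mode: the LLM response could not be parsed.
        return None


class UnparseableClassifier:
    """Hypothetical stand-in that always fails, to exercise the error path."""

    def get_soc_code(self, job_title, job_description, level_of_education,
                     manage_others, industry_descr):
        raise ValueError("could not parse LLM response")


fields = {
    "job_title": "ward nurse",
    "job_description": "caring for patients on a hospital ward",
    "level_of_education": "degree",
    "manage_others": False,
    "industry_descr": "hospital activities",
}
result = soc_code_or_none(UnparseableClassifier(), **fields)
```

Returning `None` is only one choice; logging the error or re-raising with the respondent identifier attached may suit a batch pipeline better.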
rag_general_code
llm.ClassificationLLM.rag_general_code(respondent_data, candidates_limit=7)
Generates a classification answer based on respondent’s data using RAG and a custom index.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| respondent_data | dict | A dictionary containing respondent data. | required |
| candidates_limit | int | The maximum number of candidate codes to consider. Defaults to 7. | 7 |
Returns
| Type | Description |
| --- | --- |
| RagResponse | The generated classification response to the query. |
Raises
| Type | Description |
| --- | --- |
| ValueError | If there is an error during the parsing of the response. |
| ValueError | If the default embedding handler is required but not loaded correctly. |
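When many respondents are classified in one run, a single `ValueError` should not necessarily stop the whole batch. The sketch below collects successes and failures separately. `classify_batch` and `FlakyClassifier` are hypothetical, and the dictionary keys are assumptions, since the expected keys of `respondent_data` are not listed here.

```python
def classify_batch(classifier, respondents, candidates_limit=7):
    """Run rag_general_code over many respondents, separating failures."""
    results, failures = [], []
    for index, respondent_data in enumerate(respondents):
        try:
            results.append(
                classifier.rag_general_code(
                    respondent_data, candidates_limit=candidates_limit
                )
            )
        except ValueError as error:
            # Keep the position so the failed record can be inspected later.
            failures.append((index, str(error)))
    return results, failures


class FlakyClassifier:
    """Hypothetical stand-in: fails on empty input, echoes otherwise."""

    def rag_general_code(self, respondent_data, candidates_limit=7):
        if not respondent_data:
            raise ValueError("no respondent data to parse")
        return {
            "fields_seen": sorted(respondent_data),
            "candidates_limit": candidates_limit,
        }


respondents = [{"job_title": "baker", "industry_descr": "bakery"}, {}]
results, failures = classify_batch(FlakyClassifier(), respondents, candidates_limit=3)
```

A real `ClassificationLLM` would return `RagResponse` objects in `results`; the stub returns plain dictionaries only to keep the example self-contained.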
rag_sic_code
llm.ClassificationLLM.rag_sic_code(industry_descr, job_title=None, job_description=None, expand_search_terms=True, code_digits=5, candidates_limit=5)
Generates a SIC classification based on respondent’s data using a RAG approach.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| industry_descr | str | The description of the industry. | required |
| job_title | str | The job title. Defaults to None. | None |
| job_description | str | The job description. Defaults to None. | None |
| expand_search_terms | bool | Whether to expand the search terms to include job title and description. Defaults to True. | True |
| code_digits | int | The number of digits in the generated SIC code. Defaults to 5. | 5 |
| candidates_limit | int | The maximum number of SIC code candidates to consider. Defaults to 5. | 5 |
Returns
| Type | Description |
| --- | --- |
| SicResponse | The generated response to the query. |
Raises
| Type | Description |
| --- | --- |
| ValueError | If there is an error during the parsing of the response. |
| ValueError | If the default embedding handler is required but not loaded correctly. |
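To make the effect of `expand_search_terms` concrete, the sketch below shows how the set of texts used for candidate retrieval might change with the flag. This is an illustration of the documented behaviour only, not the library's implementation: `build_search_terms` is a hypothetical function, and how the library actually combines the fields is an assumption.

```python
def build_search_terms(industry_descr, job_title=None, job_description=None,
                       expand_search_terms=True):
    """Collect the text fields that would drive candidate retrieval (illustrative)."""
    terms = [industry_descr]
    if expand_search_terms:
        # Optional fields are included only when they were actually supplied.
        terms.extend(t for t in (job_title, job_description) if t)
    return terms


expanded = build_search_terms("software consultancy", job_title="developer")
narrow = build_search_terms(
    "software consultancy", job_title="developer", expand_search_terms=False
)
```

With expansion on, the job title joins the industry description as a search term; with it off, only the industry description is used, which may matter when the job title is misleading about the employer's industry.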