llm.ClassificationLLM

llm.ClassificationLLM(self, model_name=config['llm']['llm_model_name'], llm=None, embedding_handler=None, max_tokens=1600, temperature=0.0, verbose=False, openai_api_key=None)

Wraps the logic for using an LLM to classify a respondent’s data against a provided index. Supports both a direct (one-shot) generative LLM method and Retrieval Augmented Generation (RAG).

Parameters

Name Type Description Default
model_name str Name of the model. Defaults to the value in the config file. Used if no LLM object is passed. config['llm']['llm_model_name']
llm LLM LLM to use. Optional. None
embedding_handler EmbeddingHandler Embedding handler. Optional. If None, a default embedding handler is retrieved based on the config file. None
max_tokens int Maximum number of tokens to generate. Defaults to 1600. 1600
temperature float Sampling temperature of the LLM. Defaults to 0.0. 0.0
verbose bool Whether to print verbose output. Defaults to False. False
openai_api_key str OpenAI API key. Optional, but needed for OpenAI models. None
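
The interaction between model_name and llm described above can be sketched as follows. This is only an illustration of the documented precedence (a supplied LLM object wins; otherwise the model name is used), not the class's actual constructor. The function describe_llm_choice, the stand-in config dictionary, and the placeholder model name are assumptions made for the example.

```python
from typing import Any, Optional

# Hypothetical stand-in for the package configuration; the model name is a placeholder.
config = {"llm": {"llm_model_name": "example-model"}}


def describe_llm_choice(
    model_name: str = config["llm"]["llm_model_name"],
    llm: Optional[Any] = None,
) -> str:
    """Report where the model would come from under the documented precedence."""
    if llm is not None:
        # An explicitly supplied LLM object takes priority over model_name.
        return f"using supplied LLM object: {llm!r}"
    # No LLM object was passed, so the model is looked up by name.
    return f"creating model by name: {model_name}"


print(describe_llm_choice())
print(describe_llm_choice(llm="stub-llm"))
```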

Methods

Name Description
get_sic_code Generates a SIC classification based on respondent’s data
get_soc_code Generates a SOC classification based on respondent’s data
rag_general_code Generates a classification answer based on respondent’s data using RAG and a custom index
rag_sic_code Generates a SIC classification based on respondent’s data using a RAG approach

get_sic_code

llm.ClassificationLLM.get_sic_code(industry_descr, job_title, job_description)

Generates a SIC classification based on a respondent’s data by embedding the whole condensed index in the query.

Parameters

Name Type Description Default
industry_descr str Description of the industry. required
job_title str Title of the job. required
job_description str Description of the job. required

Returns

Type Description
SicResponse Generated response to the query.
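
The idea of embedding a whole condensed index in the query can be sketched as below. This is a minimal illustration rather than the library's implementation: build_direct_prompt, the prompt wording, and the three-entry sample index are assumptions made for the example, and a real condensed index would list every code.

```python
# Illustrative subset only; a real condensed index would cover the full classification.
SAMPLE_INDEX = {
    "56101": "Licensed restaurants",
    "62012": "Business and domestic software development",
    "86210": "General medical practice activities",
}


def build_direct_prompt(industry_descr, job_title, job_description, index=SAMPLE_INDEX):
    """Put the whole index and the respondent's answers into one prompt (hypothetical helper)."""
    index_lines = "\n".join(f"{code}: {title}" for code, title in sorted(index.items()))
    return (
        "Choose the best matching SIC code from the index below.\n"
        f"Index:\n{index_lines}\n"
        f"Industry description: {industry_descr}\n"
        f"Job title: {job_title}\n"
        f"Job description: {job_description}\n"
    )


prompt = build_direct_prompt("GP surgery", "Receptionist", "Booking patient appointments")
print(prompt)
```

Because the whole index travels with every query, the prompt grows with the size of the index, which is why a condensed index is used for this method.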

get_soc_code

llm.ClassificationLLM.get_soc_code(job_title, job_description, level_of_education, manage_others, industry_descr)

Generates a SOC classification based on a respondent’s data by embedding the whole condensed index in the query.

Parameters

Name Type Description Default
job_title str The title of the job. required
job_description str The description of the job. required
level_of_education str The level of education required for the job. required
manage_others bool Indicates whether the job involves managing others. required
industry_descr str The description of the industry. required

Returns

Type Description
SocResponse The generated response to the query.

Raises

Type Description
ValueError If there is an error parsing the response from the LLM.
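
A caller may want to guard against the ValueError listed above. The sketch below shows one common pattern for turning an unparseable model reply into a ValueError. It is an assumption about the general approach, not the library's parsing code: ParsedSoc, parse_soc_reply, the JSON field names, and the placeholder code "1234" are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedSoc:
    """Hypothetical stand-in for a structured response; not the library's SocResponse."""

    soc_code: str
    codable: bool


def parse_soc_reply(raw_reply: str) -> ParsedSoc:
    """Parse a JSON reply, converting any parsing problem into ValueError."""
    try:
        data = json.loads(raw_reply)
        return ParsedSoc(soc_code=str(data["soc_code"]), codable=bool(data["codable"]))
    except (json.JSONDecodeError, KeyError, TypeError) as err:
        # Malformed JSON, a missing field, or a non-object reply all end up here.
        raise ValueError(f"Could not parse LLM reply: {raw_reply!r}") from err


ok = parse_soc_reply('{"soc_code": "1234", "codable": true}')
print(ok)

try:
    parse_soc_reply("Sorry, I cannot answer that.")
except ValueError as err:
    print("Handled:", err)
```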

rag_general_code

llm.ClassificationLLM.rag_general_code(respondent_data, candidates_limit=7)

Generates a classification answer based on a respondent’s data using RAG and a custom index.

Parameters

Name Type Description Default
respondent_data dict A dictionary containing respondent data. required
candidates_limit int The maximum number of candidate codes to consider. Defaults to 7. 7

Returns

Type Description
RagResponse The generated classification response to the query.

Raises

Type Description
ValueError If there is an error during the parsing of the response.
ValueError If the default embedding handler is required but not loaded correctly.
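
The retrieval step that candidates_limit controls can be sketched as follows. This is a toy illustration under stated assumptions: word overlap stands in for embedding similarity search, and shortlist_candidates, the index contents, the codes "A1" to "A4", and the respondent_data keys are all hypothetical rather than taken from the library.

```python
def shortlist_candidates(respondent_data, index, candidates_limit=7):
    """Rank index entries by word overlap with the respondent data and keep the top few."""
    query_words = set(" ".join(str(v) for v in respondent_data.values()).lower().split())
    scored = []
    for code, title in index.items():
        title_words = set(title.lower().split())
        union = query_words | title_words
        # Jaccard overlap: shared words divided by all distinct words.
        overlap = len(query_words & title_words) / len(union) if union else 0.0
        scored.append((overlap, code, title))
    # Highest overlap first; ties are broken by code for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(code, title) for _, code, title in scored[:candidates_limit]]


custom_index = {
    "A1": "software development",
    "A2": "restaurant and catering",
    "A3": "medical practice",
    "A4": "software consultancy",
}
respondent_data = {
    "job_title": "developer",
    "industry_descr": "software development for banks",
}
candidates = shortlist_candidates(respondent_data, custom_index, candidates_limit=2)
print(candidates)
```

In a RAG flow, only the shortlisted candidates are then placed in the prompt, so a smaller candidates_limit gives a shorter prompt at the risk of leaving out the correct code.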

rag_sic_code

llm.ClassificationLLM.rag_sic_code(industry_descr, job_title=None, job_description=None, expand_search_terms=True, code_digits=5, candidates_limit=5)

Generates a SIC classification based on a respondent’s data using a RAG approach.

Parameters

Name Type Description Default
industry_descr str The description of the industry. required
job_title str The job title. Defaults to None. None
job_description str The job description. Defaults to None. None
expand_search_terms bool Whether to expand the search terms to include job title and description. Defaults to True. True
code_digits int The number of digits in the generated SIC code. Defaults to 5. 5
candidates_limit int The maximum number of SIC code candidates to consider. Defaults to 5. 5

Returns

Type Description
SicResponse The generated response to the query.

Raises

Type Description
ValueError If there is an error during the parsing of the response.
ValueError If the default embedding handler is required but not loaded correctly.
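
The effect of expand_search_terms can be sketched as below. This is one plausible reading of the parameter description, assuming the optional fields are simply appended to the industry description to form the search text; build_search_query is a hypothetical helper, not part of the library.

```python
def build_search_query(industry_descr, job_title=None, job_description=None,
                       expand_search_terms=True):
    """Build the text used to search for candidate codes (hypothetical helper)."""
    parts = [industry_descr]
    if expand_search_terms:
        # Only add the optional fields that were actually provided.
        parts += [text for text in (job_title, job_description) if text]
    return " ".join(parts)


print(build_search_query("primary school", job_title="teacher"))
print(build_search_query("primary school", job_title="teacher", expand_search_terms=False))
```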