Code: Import methods and initialise
from sic_soc_llm import setup_logging, get_config
from sic_soc_llm.embedding import EmbeddingHandler
from sic_soc_llm.llm import ClassificationLLM
logger = setup_logging('sic_classifier')
config = get_config()
Demonstration notebook for the ClassificationLLM, using Retrieval Augmented Generation (RAG) with Standard Industrial Classification (SIC) codes.
The retrieval part of the RAG-based SIC classification requires a correctly populated vector store. By default, the EmbeddingHandler loads the SIC data structure with all its activities, using the files specified in sic_soc_llm_config.toml. This may take several minutes.
For more details about the SIC data structure and the data files it requires, see the SIC data structure tutorial.
As we have already initialised the EmbeddingHandler, we can pass it to the ClassificationLLM object. This is not essential: the ClassificationLLM will initialise its own EmbeddingHandler (based on the same config values) if one is not provided. Note that sic_demo_llm should be replaced with the LLM of your choice.
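The optional-handler behaviour described above can be pictured with a small sketch. Both classes below are hypothetical stand-ins written for illustration; they are not the sic_soc_llm classes, and the real constructor signatures may differ. The sketch only shows the pattern of reusing a supplied handler and falling back to a fresh one when none is given.

```python
from typing import Optional


class StubEmbeddingHandler:
    """Hypothetical stand-in for an embedding handler (not the library class)."""

    def __init__(self) -> None:
        # In a real workflow, the slow vector store load would happen around here.
        self.loaded = True


class StubClassifier:
    """Hypothetical stand-in showing the 'reuse or create' pattern."""

    def __init__(self, embedding_handler: Optional[StubEmbeddingHandler] = None) -> None:
        # Reuse the caller's handler when one is given; otherwise build a new one.
        if embedding_handler is None:
            embedding_handler = StubEmbeddingHandler()
        self.embedding_handler = embedding_handler


shared = StubEmbeddingHandler()
with_shared = StubClassifier(embedding_handler=shared)
with_default = StubClassifier()

print(with_shared.embedding_handler is shared)   # True: the supplied handler is reused
print(with_default.embedding_handler is shared)  # False: a separate handler was created
```

Passing the existing handler avoids paying the loading cost a second time, which is presumably why the notebook does so.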
Load a few examples of possible survey responses and classify them using the SIC classifier.
sic_examples = [
    {
        "industry_descr": "we provide care to thousands of patients across north east lincolnshire",
        "job_title": "anaesthetist",
        "job_description": "give anaesthetics for surgical, medical and psychiatric procedures",
    },
    {
        "industry_descr": "we catch fish on the north sea from grimsby port",
        "job_title": None,
        "job_description": None,
    },
    {
        "industry_descr": "bitcoin trading",
        "job_title": None,
        "job_description": None,
    },
    {
        "industry_descr": "we match tutors to pupils for extra help outside of school",
        "job_title": None,
        "job_description": "help gcse and a level students achieve the best possible results",
    },
]
for item in sic_examples:
    # Get response from LLM
    response, short_list, call_dict = sic_llm.rag_sic_code(
        industry_descr=item["industry_descr"],
        job_title=item["job_title"],
        job_description=item["job_description"],
    )

    # Print the input fields
    print("Input:")
    for v, w in item.items():
        print(f"  {v}: {w}")
    print("")

    # Print every field of the response object
    print("Response:")
    for x, y in response.__dict__.items():
        print(f"  {x}: {y}")
    print("")
    print("===========================================")
    print("")
Input:
industry_descr: we provide care to thousands of patients across north east lincolnshire
job_title: anaesthetist
job_description: give anaesthetics for surgical, medical and psychiatric procedures
Response:
codable: True
followup: None
sic_code: 86101
sic_descriptive: Hospital activities
sic_candidates: [SicCandidate(sic_code='86101', sic_descriptive='Hospital activities', likelihood=0.9), SicCandidate(sic_code='86220', sic_descriptive='Specialist medical practice activities', likelihood=0.1)]
reasoning: The company's main activity is providing care to patients, which aligns with the 'Hospital activities' SIC code. The job title and description also suggest a hospital setting. However, there is a small possibility that the company could fall under 'Specialist medical practice activities' as the job title is a specialist role.
===========================================
Input:
industry_descr: we catch fish on the north sea from grimsby port
job_title: None
job_description: None
Response:
codable: True
followup: None
sic_code: 03110
sic_descriptive: Marine fishing
sic_candidates: [SicCandidate(sic_code='03110', sic_descriptive='Marine fishing', likelihood=1.0)]
reasoning: The company's main activity is described as 'catching fish on the north sea from grimsby port', which aligns with the 'Marine fishing' category under SIC code 03110.
===========================================
Input:
industry_descr: bitcoin trading
job_title: None
job_description: None
Response:
codable: True
followup: None
sic_code: 66190
sic_descriptive: Other activities auxiliary to financial services, except insurance and pension funding
sic_candidates: [SicCandidate(sic_code='66190', sic_descriptive='Other activities auxiliary to financial services, except insurance and pension funding', likelihood=0.7), SicCandidate(sic_code='64191', sic_descriptive='Banks', likelihood=0.2), SicCandidate(sic_code='64991', sic_descriptive='Security dealing on own account', likelihood=0.1)]
reasoning: The company's main activity is bitcoin trading, which falls under 'Other activities auxiliary to financial services, except insurance and pension funding'. However, it could also potentially fall under 'Banks' or 'Security dealing on own account', but these are less likely.
===========================================
Input:
industry_descr: we match tutors to pupils for extra help outside of school
job_title: None
job_description: help gcse and a level students achieve the best possible results
Response:
codable: True
followup: None
sic_code: 85590
sic_descriptive: Other education nec
sic_candidates: [SicCandidate(sic_code='85590', sic_descriptive='Other education nec', likelihood=0.9), SicCandidate(sic_code='85600', sic_descriptive='Educational support activities', likelihood=0.1)]
reasoning: The company's main activity of matching tutors to pupils for extra help outside of school aligns with the 'Other education nec' category (SIC code 85590). The job description of helping GCSE and A level students achieve the best possible results further supports this classification. The 'Educational support activities' category (SIC code 85600) could also be a possibility, but is less likely given the specific tutoring focus of the company.
===========================================
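Each response lists its candidates with a likelihood, so one natural follow-up is to flag responses where more than one code remains plausible. The sketch below is one possible way to do that. The Candidate dataclass is a stand-in that mirrors the fields printed above (sic_code, sic_descriptive, likelihood) rather than the library's own class, and both the plausible_candidates helper and the 0.2 threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Stand-in mirroring the fields printed for each SicCandidate above."""
    sic_code: str
    sic_descriptive: str
    likelihood: float


def plausible_candidates(candidates: List[Candidate], min_likelihood: float = 0.2) -> List[Candidate]:
    """Return candidates at or above min_likelihood, most likely first."""
    kept = [c for c in candidates if c.likelihood >= min_likelihood]
    return sorted(kept, key=lambda c: c.likelihood, reverse=True)


# Values copied from the "bitcoin trading" response above.
bitcoin = [
    Candidate("66190", "Other activities auxiliary to financial services, except insurance and pension funding", 0.7),
    Candidate("64191", "Banks", 0.2),
    Candidate("64991", "Security dealing on own account", 0.1),
]

shortlist = plausible_candidates(bitcoin)
needs_review = len(shortlist) > 1  # more than one plausible code remains

print([c.sic_code for c in shortlist])  # ['66190', '64191']
print(needs_review)                     # True
```

With the same threshold, the "marine fishing" response, which has a single candidate at likelihood 1.0, would not be flagged.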