1. SIC data structure

Demonstration notebook for the SIC data structure.

Code: Import methods and initialise
import random

from sic_soc_llm import setup_logging, get_config
from sic_soc_llm.data_models import sic_hierarchy, sic_data_access

logger = setup_logging("sic_data_notebook")
config = get_config()
seed = 3847693223

The SIC hierarchy object requires two additional datasets that are not part of the repository: the SIC structure and the SIC index. The following code downloads both from the ONS website if they are not already available locally.

Code: Make sure all required SIC datasets are available
import requests
from pathlib import Path

sic_urls = [
    "https://www.ons.gov.uk/file?uri=/methodology/classificationsandstandards/ukstandardindustrialclassificationofeconomicactivities/uksic2007/publisheduksicsummaryofstructureworksheet.xlsx",
    "https://www.ons.gov.uk/file?uri=/methodology/classificationsandstandards/ukstandardindustrialclassificationofeconomicactivities/uksic2007/uksic2007indexeswithaddendumdecember2022.xlsx"
]

file_paths = [
    Path(config['lookups']['sic_structure']),
    Path(config["lookups"]["sic_index"])
]

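for url, file_path in zip(sic_urls, file_paths):
    # Download only the files that are missing locally
    if not file_path.exists():
        r = requests.get(url, timeout=60)
        r.raise_for_status()  # avoid saving an HTTP error page as a spreadsheet
        file_path.parent.mkdir(exist_ok=True, parents=True)
        with open(file_path, "wb") as outfile:
            outfile.write(r.content)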

Load SIC index

Code: Load SIC index
sic_index_filepath = config["lookups"]["sic_index"]
sic_index_df = sic_data_access.load_sic_index(sic_index_filepath)

sic_index_df.sample(5, random_state=seed)
       uk_sic_2007  activity
2773   77120        Commercial vehicle (light) hire (without driver)
14807  22290        Trays made of plastic (manufacture)
13042  24410        Silver (manufacture)
8325   20600        Man-made staple fibres, not carded, combed or ...
8684   49319        Metropolitan scheduled passenger land transpor...

Load SIC structure

Code: Load SIC structure
sic_structure_filepath = config["lookups"]["sic_structure"]
sic_df = sic_data_access.load_sic_structure(sic_structure_filepath)

sic_df.sample(5, random_state=seed)
     description                                        section  most_disaggregated_level  level_headings
694  Other retail sale of new goods in specialised ...  G        47789                     Sub Class
803  Other software publishing                          J        58290                     Class
507  Water collection, treatment and supply             E        360                       Group
286  Manufacture of ceramic sanitary fixtures           C        23420                     Class
512  Waste collection, treatment and disposal activ...  E        38                        Division
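
In the sample above, the Class rows (58290 and 23420) carry five digits in most_disaggregated_level, which suggests that shorter codes may be zero-padded in that column. As a minimal sketch under that assumption, the hypothetical helper below trims a code back to the digit count implied by its level. The LEVEL_DIGITS mapping and the function name are illustrative only and are not part of sic_soc_llm.

```python
# Digit counts per level, using the level_headings values seen in the sample.
# This mapping is an assumption made for illustration.
LEVEL_DIGITS = {"Division": 2, "Group": 3, "Class": 4, "Sub Class": 5}


def trim_to_level(code: str, level: str) -> str:
    """Return `code` cut down to the number of digits its level implies."""
    n_digits = LEVEL_DIGITS[level]
    if len(code) < n_digits:
        raise ValueError(f"{code!r} is too short for level {level!r}")
    return code[:n_digits]


print(trim_to_level("58290", "Class"))      # 5829
print(trim_to_level("47789", "Sub Class"))  # 47789
print(trim_to_level("38", "Division"))      # 38
```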

Create SIC hierarchy

Code: Create SIC hierarchy
sic = sic_hierarchy.load_hierarchy(sic_df, sic_index_df)

print(f"There are {len(sic):,} entries in the hierarchy")
There are 1,187 entries in the hierarchy
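
To make the idea of a code hierarchy concrete, here is a toy sketch that links codes to their parents by trimming trailing digits. It does not reproduce how sic_hierarchy.load_hierarchy works internally; the function name and the dictionary layout are hypothetical, and the four entries are taken from outputs shown elsewhere in this notebook.

```python
from __future__ import annotations


def build_toy_hierarchy(entries: dict[str, str]) -> dict[str, dict]:
    """Link each code to its nearest ancestor that shares a digit prefix."""
    nodes = {
        code: {"description": text, "parent": None, "children": []}
        for code, text in entries.items()
    }
    for code in sorted(nodes):
        # Trim one digit at a time until a known ancestor code is found
        prefix = code[:-1]
        while prefix and prefix not in nodes:
            prefix = prefix[:-1]
        if prefix:
            nodes[code]["parent"] = prefix
            nodes[prefix]["children"].append(code)
    return nodes


toy = build_toy_hierarchy(
    {
        "011": "Growing of non-perennial crops",
        "0111": "Growing of cereals (except rice), leguminous crops and oil seeds",
        "329": "Manufacturing nec",
        "3299": "Other manufacturing nec",
    }
)
print(toy["3299"]["parent"])   # 329
print(toy["011"]["children"])  # ['0111']
```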

Example lookup

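The hierarchy supports lookups in a variety of common SIC formatting patterns: with or without the leading section letter, with or without the dot separator, and with trailing "x" padding. In some cases a 4-digit class also serves as the 5-digit code, so 01110 resolves to the same entry as 0111.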

Code: Example lookup
print(sic["A011xx"])
print(sic["A011"])
print(sic["011"])
print(sic["01.1"])

print(sic["A0111x"])
print(sic["0111"])
print(sic["01110"])
01.1: "Growing of non-perennial crops"
01.1: "Growing of non-perennial crops"
01.1: "Growing of non-perennial crops"
01.1: "Growing of non-perennial crops"
01.11: "Growing of cereals (except rice), leguminous crops and oil seeds"
01.11: "Growing of cereals (except rice), leguminous crops and oil seeds"
01.11: "Growing of cereals (except rice), leguminous crops and oil seeds"
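
The lookups above all resolve to the same entries despite their different spellings. A minimal sketch of how such strings could be reduced to a common form follows. It is not the package's implementation: normalise_sic_code and format_sic_code are hypothetical helpers, the "dd.dd/d" display form for 5-digit codes is an assumption, and the fallback from 01110 to 01.11 shown in the output is deliberately not covered.

```python
import re


def normalise_sic_code(raw: str) -> str:
    """Reduce a SIC code string to its bare digits (hypothetical helper)."""
    code = raw.strip().upper().replace(".", "").replace("/", "")
    code = re.sub(r"^[A-U]", "", code)  # drop a leading section letter
    code = code.rstrip("X")             # drop trailing "x" padding
    if not code.isdigit() or not 2 <= len(code) <= 5:
        raise ValueError(f"Unrecognised SIC code: {raw!r}")
    return code


def format_sic_code(digits: str) -> str:
    """Render bare digits in dotted form, e.g. 011 becomes 01.1."""
    if len(digits) <= 2:
        return digits
    if len(digits) <= 4:
        return f"{digits[:2]}.{digits[2:]}"
    # Assumed display form for 5-digit sub-classes
    return f"{digits[:2]}.{digits[2:4]}/{digits[4]}"


for raw in ["A011xx", "A011", "011", "01.1", "A0111x", "0111"]:
    print(raw, "->", format_sic_code(normalise_sic_code(raw)))
```

The first four inputs print 01.1 and the last two print 01.11, matching the labels in the notebook output.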

Select a random example

Code: Example SIC index entry
random.seed(seed)
sic_node = random.choice(sic.nodes)

sic_node.print_all()
32.99: "Other manufacturing nec"
Section: C
Parent: 32.9: "Manufacturing nec"
Children: []

detail=
includes=["manufacture of protective safety equipment: fire-resistant and protective safety clothing linemen's safety belts and other belts for occupational use cork life preservers plastics hard hats and other personal safety equipment of plastics  fire-fighting protection suits metal safety headgear and other metal personal safety devices ear and noise plugs (e.g. for swimming and noise protection) gas masks", 'manufacture of pens and pencils of all kinds whether or not mechanical', 'manufacture of pencil leads', 'manufacture of date, sealing or numbering stamps, hand-operated devices for printing, or embossing labels, hand printing sets, prepared typewriter ribbons and inked pads', 'manufacture of globes', 'manufacture of umbrellas, sun-umbrellas, walking sticks, seat-sticks', 'manufacture of buttons, press-fasteners, snap-fasteners, press-studs, slide fasteners', 'manufacture of cigarette lighters', 'manufacture of articles of personal use: smoking pipes, combs, hair slides, scent sprays, vacuum flasks and other vacuum vessels for personal or household use, wigs, false beards, eyebrows', "manufacture of miscellaneous articles: candles, tapers and the like; artificial flowers, fruit and foliage; jokes and novelties; hand sieves and hand riddles; tailors' dummies; burial coffins etc.", 'manufacture of floral baskets, bouquets, wreaths and similar articles', 'taxidermy activities']
excludes=['manufacture of lighter wicks', 'manufacture of workwear and service apparel (e.g. laboratory coats, work overalls, uniforms)', 'manufacture of paper novelties']

Activities:
    - Amber turning (manufacture)
    - Artificial flowers and fruit made of paper (manufacture)
    - Artificial flowers and fruit made of plastic (manufacture)
    - Artificial flowers and fruit made of textiles (manufacture)
    - Ballpoint pen and refill (manufacture)
    - Bedfolder (manufacture)
    - Bladder dressing (manufacture)
    - Boiler covering (not asbestos or slag wool) (manufacture)
    - Boiler packing (not asbestos or slag wool) (manufacture)
    - Bone working (manufacture)
    - Briar pipe (manufacture)
    - Buttons (manufacture)
    - Buttons made of glass (manufacture)
    - Candle (manufacture)
    - Carbon ribbon (manufacture)
    - Carnival article (manufacture)
    - Carry cot (manufacture)
    - Cartridge refill for fountain pen (manufacture)
    - Catgut (manufacture)
    - Chalk for drawing or writing (manufacture)
    - Cigarette lighter (manufacture)
    - Coffin board (manufacture)
    - Coffins (manufacture)
    - Collar stud (manufacture)
    - Combs (other than of hard rubber, plastic or metal) (manufacture)
    - Conjuring apparatus (manufacture)
    - Cork life preservers (manufacture)
    - Crayon (manufacture)
    - Cut, make, trim of fire-resistant and protective safety clothing, fee or contract basis (manufacture)
    - Cutlery handles made of horn, ivory, tortoise shell, etc. (manufacture)
    - Date sealing stamps (manufacture)
    - Date stamp and accessories (manufacture)
    - Devotional article (manufacture)
    - Ear and noise plugs (e.g. For swimming and noise protection) (manufacture)
    - Easel (manufacture)
    - Embossing devices (hand operated) for labels (manufacture)
    - False beard (manufacture)
    - False eyebrow (manufacture)
    - Feather curling (manufacture)
    - Feather ornament (manufacture)
    - Feather purifying (manufacture)
    - Feather sorting (manufacture)
    - Felt tipped pen (manufacture)
    - Fibre tipped pen (manufacture)
    - Fire resistant and protective safety clothing of leather (manufacture)
    - Fire-fighting protection suits (manufacture)
    - Firelighter (manufacture)
    - Fire-resistant and protective safety clothing (manufacture)
    - Flint for lighters (manufacture)
    - Fountain pen (manufacture)
    - Fountain pen nib (manufacture)
    - Gas masks (manufacture)
    - Gas masks (with mechanical parts or replaceable filters for protection against biological agents) (manufacture)
    - Gauntlet (protective) (manufacture)
    - Globes (manufacture)
    - Gut for musical instruments and sports goods (manufacture)
    - Gut scraping and spinning (manufacture)
    - Hair pad making (manufacture)
    - Hair preparation for wig making (manufacture)
    - Hair slides (manufacture)
    - Hand printing sets (manufacture)
    - Hand riddles (manufacture)
    - Hand sieves (manufacture)
    - Hard hats and other personal safety equipment of plastics (manufacture)
    - Horn and tortoise shell working (manufacture)
    - Horn pressing (manufacture)
    - Industrial protective headgear (manufacture)
    - Ink pad (manufacture)
    - Instruments for educational or exhibition purposes (manufacture)
    - Ivory working (manufacture)
    - Jokes and novelties (manufacture)
    - Life vests made of cork (manufacture)
    - Life vests non textile (manufacture)
    - Lifebelts (manufacture)
    - Lifebelts made of cork (manufacture)
    - Lifebuoy made of cork (manufacture)
    - Lifejacket made of cork (manufacture)
    - Lifejacket non textile (manufacture)
    - Lighter fuel in containers not exceeding 300cc (liquid or liquefied gas) (manufacture)
    - Linemen's safety belts and other belts for occupational use (manufacture)
    - Marker pen (manufacture)
    - Masks incorporating eye protection or a facial shield (manufacture)
    - Models for educational or exhibition purposes (manufacture)
    - Models for geographical use made of wax or plaster (manufacture)
    - Models made of plaster (manufacture)
    - Models made of wax (manufacture)
    - Natural sponge preparation (manufacture)
    - Nightlight (manufacture)
    - Numbering stamps (manufacture)
    - Parasol (manufacture)
    - Pastel (manufacture)
    - Pen nibs (manufacture)
    - Pencil (manufacture)
    - Pencil leads (manufacture)
    - Penholder (manufacture)
    - Pens for writing or drawing (manufacture)
    - Personal safety devices of metal (manufacture)
    - Plaster cast (manufacture)
    - Prepared typewriter ribbons (manufacture)
    - Press-fasteners (manufacture)
    - Press-studs (manufacture)
    - Printing devices (hand operated) (manufacture)
    - Propelling pencil (manufacture)
    - Protective gloves for industrial use (manufacture)
    - Protective headgear (manufacture)
    - Protective headgear for industrial use (manufacture)
    - Ribbon (inked) (manufacture)
    - Riding caps (manufacture)
    - Roller pens and refills (manufacture)
    - Safety headgear made of metal (manufacture)
    - Safety helmets made of plastic (manufacture)
    - Scale models (manufacture)
    - Scent sprays (manufacture)
    - Scientific models for educational and exhibition purposes (manufacture)
    - Sealing stamps (manufacture)
    - Seals for use with sealing wax (manufacture)
    - Seat-sticks (manufacture)
    - Slates for writing (manufacture)
    - Slide fasteners (manufacture)
    - Smokers' requisites (manufacture)
    - Smoking pipes (manufacture)
    - Snap fasteners (manufacture)
    - Sponge bleaching (manufacture)
    - Sponge trimming (manufacture)
    - Stamps made of rubber (manufacture)
    - Stylographic pen (manufacture)
    - Sun car (manufacture)
    - Sunshade (manufacture)
    - Sun-umbrellas (manufacture)
    - Tailors' chalk (manufacture)
    - Tailors' dummy (not plastic) (manufacture)
    - Tapers and the like (manufacture)
    - Taxidermy activities (manufacture)
    - Teaching aids (electronic) (manufacture)
    - Toothpicks made of bone (manufacture)
    - Trainer (electronic training equipment) (manufacture)
    - Typewriter ribbons (manufacture)
    - Umbrella (manufacture)
    - Uniform helmets (manufacture)
    - Vacuum flask (complete) (manufacture)
    - Vacuum jar (manufacture)
    - Vacuum vessels for personal or household use (manufacture)
    - Walking sticks (manufacture)
    - Whalebone cutting and splitting (manufacture)
    - Wig (manufacture)
    - Writing instrument sets (manufacture)