Week 9: Introduction to Large Language Models

2026-02-26

Welcome to Week 9

Theme: Introduction to Large Language Models

Goals:

  • Position LLMs within AI, ML, and NLP

  • Build intuition for tokens, next-token prediction, and attention

  • Understand knowledge bases, chunking, embeddings, and semantic retrieval

AI, ML & NLP Foundations

  • AI: systems that perform tasks which appear intelligent.

  • ML: models that learn patterns from data.

  • NLP: ML for human language.

  • LLMs: large NLP models that generate and reason over text.

Hierarchy view:

AI
└── ML
  └── NLP
    └── LLMs

Examples:

  • AI: chatbots
  • ML: classification model
  • NLP: translation/summarisation
  • LLM: ChatGPT

How LLMs Generate Text

  • Text is split into tokens (words, pieces of words, punctuation).

  • At each step, the model predicts the most likely next token.

  • It repeats this process token-by-token to build a full response.

Token demo:

"Business Services and Administration Managers"
→ ["Business", " Services", " and", " Administration", " Managers"]
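
As a rough illustration only (real LLM tokenizers use learned subword vocabularies such as BPE, not simple splitting), the demo split above can be mimicked like this:

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Keep each word together with its leading space, mirroring how
    # many real tokenizers attach the space to the following token.
    return re.findall(r" ?\S+", text)

print(toy_tokenize("Business Services and Administration Managers"))
# → ['Business', ' Services', ' and', ' Administration', ' Managers']
```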

Next-token example after “Business Services and”:

Candidate token   Probability
Administration    0.51
Managers          0.23
Workers           0.08
Professionals     0.06
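
Picking the next token under greedy decoding is then just an argmax over the candidates. The probabilities below are the illustrative values from the table, not real model outputs:

```python
# Illustrative probabilities copied from the table above.
next_token_probs = {
    "Administration": 0.51,
    "Managers": 0.23,
    "Workers": 0.08,
    "Professionals": 0.06,
}

def greedy_next(probs: dict[str, float]) -> str:
    # Greedy decoding: always choose the highest-probability token.
    # (Sampling strategies like temperature or top-k pick differently.)
    return max(probs, key=probs.get)

print(greedy_next(next_token_probs))  # → Administration
```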

Transformer and Attention

  • Transformers process all tokens in a sequence in parallel, so relationships between distant tokens can be captured efficiently.

  • Attention helps the model decide which earlier words matter most for the next token.

  • This is why wording and context influence outputs.

Example sentence:

"Code 1211 covers finance managers, and it includes detailed unit-group descriptions."

When predicting “it”, attention weights may focus on:

  • Code 1211 (high)
  • finance managers (high)
  • unit-group descriptions (medium)

Takeaway:

  • Attention is a relevance mechanism inside the model.
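
The relevance idea can be sketched numerically: attention turns raw relevance scores into normalised weights via a softmax. The scores below are invented for illustration; a real transformer computes them from learned query and key vectors.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    # Convert raw relevance scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores when predicting the token "it".
phrases = ["Code 1211", "finance managers", "unit-group descriptions"]
weights = softmax([2.0, 2.0, 1.0])  # made-up scores for illustration

for phrase, weight in zip(phrases, weights):
    print(f"{phrase}: {weight:.2f}")
```

The two high-scoring phrases end up with most of the weight, matching the "high / high / medium" pattern above.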

Embeddings & Similarity

  • Embeddings turn text into vectors that represent meaning.

  • Texts with similar meaning are close in vector space.

  • Cosine similarity measures closeness between vectors.

  • This enables meaning-based retrieval beyond exact keyword matches.
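
A minimal sketch of cosine similarity, using tiny made-up vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a · b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": similar meanings point in similar directions.
manager  = [0.9, 0.1, 0.2]
director = [0.8, 0.2, 0.3]
banana   = [0.1, 0.9, 0.1]

print(cosine_similarity(manager, director))  # close to 1
print(cosine_similarity(manager, banana))    # much lower
```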

Cosine similarity illustration

Knowledge Base (KB)

  • A knowledge base is a trusted collection of documents for your domain.

  • Examples: ISCO-08 group titles, definitions, task descriptions, and unit groups.

  • The quality of the KB strongly affects answer quality.

  • In practice: better context usually beats clever prompting.

Strong KB                                        Weak KB
Official ISCO-08 codes and definitions           Unverified occupation labels
Group hierarchy (major→sub-major→minor→unit)     Flat list without structure
Definition + usual tasks + included unit groups  Code-only records
Clear document ownership                         Unknown provenance

Chunking Documents

  • Large documents are split into smaller pieces called chunks.

  • Chunking makes retrieval faster and more precise.

  • Good chunks are focused, coherent, and contain enough context.

  • If chunks are too large, retrieval gets noisy; if too small, meaning is lost.

Example:

Chunk size choice   Typical issue
Too large           Multiple topics mixed together
Too small           Loses definitions/table context
Balanced            One idea + enough context

Practical tip:

  • Keep each chunk tied to code + title + definition (and keep task text together).
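
A minimal sketch of that tip, assuming hypothetical ISCO-08-style record fields (the field names and sample text below are invented for illustration):

```python
# Hypothetical unit-group records; field names are assumptions.
records = [
    {
        "code": "1211",
        "title": "Finance Managers",
        "definition": "Plan, direct and coordinate financial operations...",
        "tasks": "Tasks include preparing budgets and overseeing reporting.",
    },
]

def record_to_chunk(rec: dict) -> str:
    # One chunk per unit group: code + title + definition + tasks,
    # so retrieval never separates a code from its meaning.
    return f"{rec['code']} {rec['title']}\n{rec['definition']}\n{rec['tasks']}"

for chunk in map(record_to_chunk, records):
    print(chunk)
```

Keeping these fields in one chunk means a query matching the definition also retrieves the code and title it belongs to.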

Let’s Dive Into The Live Session