2026-02-26
Theme: Introduction to Large Language Models
Goals:
Position LLMs within AI, ML, and NLP
Build intuition for tokens, next-token prediction, and attention
Understand knowledge bases, chunking, embeddings, and semantic retrieval
AI: systems performing tasks that look intelligent.
ML: models learn patterns from data.
NLP: ML for human language.
LLMs: large NLP models that generate and reason over text.
Text is split into tokens (words, pieces of words, punctuation).
At each step, the model predicts the most likely next token.
It repeats this process token-by-token to build a full response.
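This token-by-token loop can be sketched with a toy lookup-table "model" (the vocabulary and probabilities below are invented for illustration; a real LLM computes them with a neural network):

```python
# Toy next-token model: maps a context string to candidate tokens with
# probabilities. Values are made up for illustration only.
TOY_MODEL = {
    "Business Services and": [(" Administration", 0.51), (" Managers", 0.23),
                              (" Workers", 0.08), (" Professionals", 0.06)],
    "Business Services and Administration": [(" Managers", 0.90)],
}

def next_token(context: str) -> str:
    """Greedy decoding: pick the single most probable next token."""
    candidates = TOY_MODEL.get(context, [("<eos>", 1.0)])
    return max(candidates, key=lambda pair: pair[1])[0]

def generate(prompt: str, max_steps: int = 5) -> str:
    """Repeatedly append the predicted token until end-of-sequence."""
    text = prompt
    for _ in range(max_steps):
        tok = next_token(text)
        if tok == "<eos>":
            break
        text += tok
    return text

# generate("Business Services and") builds the full phrase one token at a time.
```

Greedy decoding is the simplest strategy; real systems often sample from the distribution instead.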
Token demo:
"Business Services and Administration Managers"
→ ["Business", " Services", " and", " Administration", " Managers"]

Next-token example after "Business Services and":

| Candidate token | Probability |
|---|---|
| Administration | 0.51 |
| Managers | 0.23 |
| Workers | 0.08 |
| Professionals | 0.06 |
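The table above is a probability distribution, and the model can either take the argmax (greedy) or sample from it. A minimal sketch, with the remaining probability mass lumped into a hypothetical "&lt;other&gt;" bucket:

```python
import random

# Candidate next tokens after "Business Services and" (probabilities from the
# table; the leftover mass is lumped into "<other>" for illustration).
candidates = ["Administration", "Managers", "Workers", "Professionals", "<other>"]
probs = [0.51, 0.23, 0.08, 0.06, 0.12]

# Greedy decoding always picks the highest-probability token.
greedy = candidates[probs.index(max(probs))]

# Sampling picks tokens in proportion to their probability, so the
# output can differ from run to run.
random.seed(0)  # fixed seed just to make the demo repeatable
sampled = random.choices(candidates, weights=probs, k=1)[0]
```

Sampling is why the same prompt can produce different completions across runs.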
Transformers process relationships among all tokens in parallel, which lets them use long-range context efficiently.
Attention helps the model decide which earlier words matter most for the next token.
This is why wording and context influence outputs.
Example sentence:
"Code 1211 covers finance managers, and it includes detailed unit-group descriptions."
When predicting “it”, attention weights may focus on:
Code 1211 (high)finance managers (high)unit-group descriptions (medium)Takeaway:
Embeddings turn text into vectors that represent meaning.
Texts with similar meaning are close in vector space.
Cosine similarity measures closeness between vectors.
This enables meaning-based retrieval beyond exact keyword matches.
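Cosine similarity can be computed directly from two vectors. A self-contained sketch with tiny hand-made "embeddings" (real embeddings have hundreds of dimensions and come from a model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented 3-d vectors: related meanings point in similar directions.
manager  = [0.9, 0.1, 0.30]
director = [0.8, 0.2, 0.35]
banana   = [0.1, 0.9, 0.00]

# cosine_similarity(manager, director) is much higher than
# cosine_similarity(manager, banana), mirroring their meanings.
```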
A knowledge base is a trusted collection of documents for your domain.
Examples: ISCO-08 group titles, definitions, task descriptions, and unit groups.
The quality of the KB strongly affects answer quality.
In practice: better context usually beats clever prompting.
| Strong KB | Weak KB |
|---|---|
| Official ISCO-08 codes and definitions | Unverified occupation labels |
| Group hierarchy (major→sub-major→minor→unit) | Flat list without structure |
| Definition + usual tasks + included unit groups | Code-only records |
| Clear document ownership | Unknown provenance |
Large documents are split into smaller pieces called chunks.
Chunking makes retrieval faster and more precise.
Good chunks are focused, coherent, and contain enough context.
If chunks are too large, retrieval gets noisy; if too small, meaning is lost.
Example:
| Chunk size choice | Typical issue |
|---|---|
| Too large | Multiple topics mixed together |
| Too small | Loses definitions/table context |
| Balanced | One idea + enough context |
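One simple way to get balanced chunks is to split on paragraph boundaries and pack paragraphs up to a size limit, so no chunk cuts a thought in half. A minimal sketch (the size limit and splitting rule are illustrative choices, not a standard):

```python
def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters,
    keeping whole paragraphs together so each chunk stays coherent."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Production pipelines often add overlap between chunks or split on headings; this shows only the core idea.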
Practical tip: keep each chunk to one idea, and include enough surrounding context (such as the heading or definition it belongs to) so it stands alone.
Vector search retrieves chunks whose embeddings are closest to the query embedding.
Semantic search finds related meaning, not just matching words.
Keyword search asks: “Do terms match?”
Semantic search asks: “Does the meaning match?”
In practice, semantic retrieval helps when users phrase questions differently from source text.
Mini comparison:
| User query | Keyword search | Semantic search |
|---|---|---|
| “people who direct operations and financial planning” | May miss exact ISCO wording | Finds 1211 definition chunk |
| “administration managers in business services” | Needs close term overlap | Matches title + task chunks semantically |
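Vector search boils down to ranking KB chunks by similarity to the query embedding. A toy end-to-end sketch (the 3-d vectors are invented stand-ins; a real system gets them from an embedding model, and the chunk texts are abbreviated ISCO-style entries):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings for two KB chunks (invented values).
kb = {
    "1211 Finance managers: plan and direct financial operations": [0.9, 0.2, 0.1],
    "1439 Services managers not elsewhere classified":             [0.3, 0.8, 0.2],
}

# Query: "people who direct operations and financial planning" --
# no exact word overlap with the 1211 title, but a nearby embedding.
query = [0.85, 0.25, 0.15]

# Retrieve the chunk whose embedding is closest to the query embedding.
best = max(kb, key=lambda chunk: cosine(kb[chunk], query))
```

Keyword search would need the query to share terms with the chunk; here the ranking comes purely from vector closeness.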