Week 9: Introduction to Large Language Models

2026-02-26

Welcome to Week 9

Theme: Introduction to Large Language Models

Goals:

  • Position LLMs within AI, ML, and NLP

  • Build intuition for tokens, next-token prediction, and attention

  • Understand knowledge bases, chunking, embeddings, and semantic retrieval

AI, ML & NLP Foundations

  • AI: systems that perform tasks which appear intelligent.

  • ML: models that learn patterns from data.

  • NLP: ML for human language.

  • LLMs: large NLP models that generate and reason over text.

Hierarchy view:

AI
└── ML
  └── NLP
    └── LLMs

Examples:

  • AI: chatbots
  • ML: classification model
  • NLP: translation/summarisation
  • LLM: ChatGPT

How LLMs Generate Text

  • Text is split into tokens (words, pieces of words, punctuation).

  • At each step, the model predicts the most likely next token.

  • It repeats this process token-by-token to build a full response.

Token demo:

"Business Services and Administration Managers"
→ ["Business", " Services", " and", " Administration", " Managers"]
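
As a rough illustration only (real LLM tokenizers use learned subword vocabularies such as BPE, not simple splitting), the demo split above can be mimicked like this:

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Keep each word together with its leading space, mirroring how
    # many real tokenizers attach the space to the following token.
    return re.findall(r" ?\S+", text)

print(toy_tokenize("Business Services and Administration Managers"))
# → ['Business', ' Services', ' and', ' Administration', ' Managers']
```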

Next-token example after “Business Services and”:

Candidate token   Probability
Administration    0.51
Managers          0.23
Workers           0.08
Professionals     0.06
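
Picking the next token under greedy decoding is then just an argmax over the candidates. The probabilities below are the illustrative values from the table, not real model outputs:

```python
# Illustrative probabilities copied from the table above.
next_token_probs = {
    "Administration": 0.51,
    "Managers": 0.23,
    "Workers": 0.08,
    "Professionals": 0.06,
}

def greedy_next(probs: dict[str, float]) -> str:
    # Greedy decoding: always choose the highest-probability token.
    # (Sampling strategies like temperature or top-k pick differently.)
    return max(probs, key=probs.get)

print(greedy_next(next_token_probs))  # → Administration
```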

Transformer and Attention

  • Transformers process all tokens in a sequence in parallel, so relationships between distant tokens can be captured efficiently.

  • Attention helps the model decide which earlier words matter most for the next token.

  • This is why wording and context influence outputs.

Example sentence:

"Code 1211 covers finance managers, and it includes detailed unit-group descriptions."

When predicting “it”, attention weights may focus on:

  • Code 1211 (high)
  • finance managers (high)
  • unit-group descriptions (medium)

Takeaway:

  • Attention is a relevance mechanism inside the model.
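
The relevance idea can be sketched numerically: attention turns raw relevance scores into normalised weights via a softmax. The scores below are invented for illustration; a real transformer computes them from learned query and key vectors.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    # Convert raw relevance scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores when predicting the token "it".
phrases = ["Code 1211", "finance managers", "unit-group descriptions"]
weights = softmax([2.0, 2.0, 1.0])  # made-up scores for illustration

for phrase, weight in zip(phrases, weights):
    print(f"{phrase}: {weight:.2f}")
```

The two high-scoring phrases end up with most of the weight, matching the "high / high / medium" pattern above.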

Embeddings & Similarity

  • Embeddings turn text into vectors that represent meaning.

  • Texts with similar meaning are close in vector space.

  • Cosine similarity measures closeness between vectors.

  • This enables meaning-based retrieval beyond exact keyword matches.
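
A minimal sketch of cosine similarity, using tiny made-up vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a · b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": similar meanings point in similar directions.
manager  = [0.9, 0.1, 0.2]
director = [0.8, 0.2, 0.3]
banana   = [0.1, 0.9, 0.1]

print(cosine_similarity(manager, director))  # close to 1
print(cosine_similarity(manager, banana))    # much lower
```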

Cosine similarity illustration

Knowledge Base (KB)

  • A knowledge base is a trusted collection of documents for your domain.

  • Examples: ISCO-08 group titles, definitions, task descriptions, and unit groups.

  • The quality of the KB strongly affects answer quality.

  • In practice: better context usually beats clever prompting.

Strong KB                                        Weak KB
Official ISCO-08 codes and definitions           Unverified occupation labels
Group hierarchy (major→sub-major→minor→unit)     Flat list without structure
Definition + usual tasks + included unit groups  Code-only records
Clear document ownership                         Unknown provenance

Chunking Documents

  • Large documents are split into smaller pieces called chunks.

  • Chunking makes retrieval faster and more precise.

  • Good chunks are focused, coherent, and contain enough context.

  • If chunks are too large, retrieval gets noisy; if too small, meaning is lost.

Example:

Chunk size choice   Typical issue
Too large           Multiple topics mixed together
Too small           Loses definitions/table context
Balanced            One idea + enough context

Practical tip:

  • Keep each chunk tied to code + title + definition (and keep task text together).
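
A minimal sketch of that tip, assuming hypothetical ISCO-08-style record fields (the field names and sample text below are invented for illustration):

```python
# Hypothetical unit-group records; field names are assumptions.
records = [
    {
        "code": "1211",
        "title": "Finance Managers",
        "definition": "Plan, direct and coordinate financial operations...",
        "tasks": "Tasks include preparing budgets and overseeing reporting.",
    },
]

def record_to_chunk(rec: dict) -> str:
    # One chunk per unit group: code + title + definition + tasks,
    # so retrieval never separates a code from its meaning.
    return f"{rec['code']} {rec['title']}\n{rec['definition']}\n{rec['tasks']}"

for chunk in map(record_to_chunk, records):
    print(chunk)
```

Keeping these fields in one chunk means a query matching the definition also retrieves the code and title it belongs to.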

Let’s Dive Into The Live Session