Week 3: LLMs, RAG, and why they matter for official statistics

2026-04-21

Welcome

LLMs, RAG, and why they matter for official statistics

Preliminary session 3

This session builds a shared mental model for the rest of the course.

StatsChat and the course

StatsChat at a glance

  • developed by the UK Office for National Statistics (ONS) with the Kenya National Bureau of Statistics (KNBS)
  • lets users ask questions over official reports
  • combines retrieval and generation
  • used here as a running example for the course

What this course is trying to do

Main focus

  • understand the pipeline
  • use systems like StatsChat critically
  • think about adaptation for your own NSO

Later in the course

  • retrieval in more detail
  • generation in more detail
  • evaluation and limitations
  • local adaptation questions

Today’s aims

  • understand what an LLM is
  • understand what RAG adds
  • see why grounding matters for official statistics
  • prepare for later sessions on retrieval, generation, and evaluation

Why this matters

If someone asks a question about a statistical report, what do we want the answer to be?

  • clear
  • correct
  • recent
  • traceable to a source

Quick check

What matters most in an answer about official statistics?

A. Fluency
B. Correctness
C. Recency
D. Traceable source
E. All of the above

What is an LLM?

A practical definition

A large language model (LLM) is a model trained on large amounts of text that generates language by predicting likely next tokens, one at a time.

Powerful, but not magical.

What LLMs often do well

Providing information

  • summarise
  • explain
  • answer questions

Generating and editing text

  • draft
  • rewrite
  • transform text

Useful concepts

Prompt

The instruction or input we give the model.

Context

The text the model sees when answering.

Token

A token is a piece of text the model processes.

For us

unemployment = 1 word

For a model

Possible tokens: un | employment

The exact split depends on the model. The main point is that tokens are not always the same as words.
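
To see this concretely, here is a minimal sketch using the open-source tiktoken tokenizer. The choice of tiktoken and its "cl100k_base" encoding is an illustrative assumption, not necessarily what StatsChat uses, and the exact split will vary by tokenizer.

    import tiktoken

    # Load a widely used tokenizer encoding (assumes the tiktoken package is installed).
    enc = tiktoken.get_encoding("cl100k_base")

    token_ids = enc.encode("unemployment")

    # Print the token IDs and the text piece each one represents;
    # the split is decided by the tokenizer, not by dictionary words.
    print(token_ids)
    print([enc.decode([t]) for t in token_ids])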

Plain LLMs can still go wrong

  • fluent but wrong
  • no trusted source
  • outdated information
  • overconfident answers

Why that matters here

In official statistics, we often need answers that are:

  • grounded in a known document
  • easy to verify
  • clear about definitions, metadata, and important caveats
  • sensitive to recent publications

So what problem does RAG solve?

RAG = retrieval-augmented generation

A RAG system retrieves relevant source material before asking the model to answer.

LLM vs RAG

Plain LLM workflow

question → model → answer

RAG workflow

question → retrieve relevant sources → model + sources → answer
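
To make the two workflows concrete, here is a minimal, self-contained Python sketch of the RAG pattern. The toy word-overlap retriever and the example passages are illustrative assumptions, not the StatsChat implementation; real systems typically use embedding-based retrieval and send the final prompt to an LLM.

    from dataclasses import dataclass

    @dataclass
    class Passage:
        reference: str  # e.g. bulletin title and date
        text: str

    def retrieve(question: str, passages: list[Passage], top_k: int = 2) -> list[Passage]:
        # Toy retrieval: rank passages by word overlap with the question.
        # Real systems usually use embeddings and a vector index instead.
        q_words = set(question.lower().split())
        return sorted(
            passages,
            key=lambda p: len(q_words & set(p.text.lower().split())),
            reverse=True,
        )[:top_k]

    def build_prompt(question: str, sources: list[Passage]) -> str:
        # The model is instructed to answer only from the retrieved sources,
        # which is what makes the answer grounded and traceable.
        cited = "\n\n".join(f"[{p.reference}]\n{p.text}" for p in sources)
        return f"Answer using only these sources:\n\n{cited}\n\nQuestion: {question}"

    docs = [
        Passage("Consumer Price Bulletin, March 2025",
                "Year-on-year inflation was 6.2% in March 2025, up from 5.8% in February."),
        Passage("Labour Force Report, Q1 2025",
                "Unemployment changed little in the first quarter of 2025."),
    ]

    question = "What was inflation in March 2025?"
    print(build_prompt(question, retrieve(question, docs)))

In the plain LLM workflow, the model would see only the question; here it also sees the retrieved passages and their references, and that is the whole difference.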

Quick check

What makes a RAG workflow different?

A. The model is always larger
B. The system retrieves relevant source material before generation
C. The model is trained again for each question
D. The answer is guaranteed to be correct

Why RAG matters for official statistics

Often needed

  • trusted reports
  • recent publications
  • definitions, metadata, and important caveats

Often useful

  • source grounding
  • easier verification
  • organisation-specific knowledge

Important caution

RAG helps, but it does not guarantee correctness

  • retrieval may miss the best evidence
  • the wrong passage may be retrieved
  • the model may still misread the source
  • the source itself may be incomplete or ambiguous

Worked example

Question

According to the National Statistics Office’s Consumer Price Bulletin, what was the year-on-year inflation rate in March 2025, and which categories contributed most to the increase?

Use this example to compare a plain LLM-style answer with a grounded answer.

Plain LLM-style answer

Inflation in March 2025 was around 5.8%, mainly driven by food prices and transport costs. Housing-related costs may also have contributed. This suggests inflation remained elevated during the period.

What to notice

  • sounds plausible
  • fairly fluent
  • partly specific
  • no clear source
  • slightly inaccurate

Grounded / RAG-style answer

According to the Consumer Price Bulletin, March 2025, the year-on-year inflation rate was 6.2%, up from 5.8% in February 2025.

The bulletin states that the increase was mainly driven by food and non-alcoholic beverages, transport, and housing, water, electricity, gas and other fuels.

What to notice

  • cites the source
  • more precise
  • easier to verify
  • includes measure type
  • still not guaranteed correct

Which answer would you trust more?

Compare them using

  • source grounding
  • specificity
  • traceability

And still ask

  • what could still go wrong?
  • what would you want to verify?

StatsChat connection

StatsChat is one example of this broader pattern:

  • a user asks a question
  • the system retrieves relevant material
  • the model answers using that material

Later sessions will unpack these steps in more detail.

Optional deeper dive

A simple mental model of an LLM

  • code that runs the model
  • parameters / weights learned from training
  • input text goes in
  • next-token predictions come out
  • repeated predictions generate text

A useful intuition: the structure is understandable, but the learned parameters are extremely large and complex.
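
As a toy illustration of that loop, the sketch below replaces the learned parameters with a hand-written probability table (a deliberate simplification; a real model predicts probabilities over tens of thousands of possible tokens using billions of weights):

    import random

    def next_token_probabilities(context):
        # Stub standing in for the trained model: a real LLM computes these
        # probabilities from its learned weights for every token in its vocabulary.
        if context and context[-1] == "inflation":
            return {"rose": 0.6, "was": 0.3, "fell": 0.1}
        return {"inflation": 0.5, "the": 0.3, "unemployment": 0.2}

    def generate(prompt, max_new_tokens=4):
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            probs = next_token_probabilities(tokens)
            # Sample the next token in proportion to its predicted probability,
            # then feed it back in: repeated predictions generate text.
            choices, weights = zip(*probs.items())
            tokens.append(random.choices(choices, weights=weights)[0])
        return tokens

    print(" ".join(generate(["the"])))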

Optional deeper dive

From base model to assistant

  • pretraining builds broad language ability
  • post-training makes the model more useful as an assistant
  • tools can extend what the overall system can do
  • RAG is one way of adding retrieval to the system

Key takeaways

  • LLMs are powerful, but fluency is not the same as trustworthiness
  • RAG adds retrieval before generation
  • grounding matters for official statistics
  • RAG helps, but it is not magical

Between now and the lab

Please complete the short task on the course page.

Be ready to discuss:

  • which answer you trusted more
  • why
  • where a RAG-style system could help in your organisation