Week 3: LLMs, RAG, and why they matter for official statistics

2026-04-21

Welcome

LLMs, RAG, and why they matter for official statistics

Preliminary session 3

This session builds a shared mental model for the rest of the course.

StatsChat and the course

StatsChat at a glance

  • developed by the UK Office for National Statistics (ONS) with the Kenya National Bureau of Statistics (KNBS)
  • lets users ask questions over official reports
  • combines retrieval and generation
  • used here as a running example for the course

What this course is trying to do

Main focus

  • understand the pipeline
  • use systems like StatsChat critically
  • think about adaptation for your own NSO

Later in the course

  • retrieval in more detail
  • generation in more detail
  • evaluation and limitations
  • local adaptation questions

Today’s aims

  • understand what an LLM is
  • understand what RAG adds
  • see why grounding matters for official statistics
  • prepare for later sessions on retrieval, generation, and evaluation

Why this matters

If someone asks a question about a statistical report, what do we want the answer to be?

  • clear
  • correct
  • recent
  • traceable to a source

Quick check

What matters most in an answer about official statistics?

A. Fluency
B. Correctness
C. Recency
D. Traceable source
E. All of the above

What is an LLM?

A practical definition

A large language model (LLM) is a model trained on large amounts of text that generates language by predicting likely next tokens, one at a time.

Powerful, but not magical.

What LLMs often do well

Providing information

  • summarise
  • explain
  • answer questions

Generating and editing text

  • draft
  • rewrite
  • transform text

Useful concepts

Prompt

The instruction or input we give the model.

Context

The text the model sees when answering.

Token

A token is a piece of text the model processes.

For us

unemployment = 1 word

For a model

Possible tokens: un | employment

The exact split depends on the model. The main point is that tokens are not always the same as words.
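
To see this concretely, here is a minimal sketch using the open-source tiktoken tokenizer. The choice of tiktoken and its "cl100k_base" encoding is an illustrative assumption, not necessarily what StatsChat uses, and the exact split will vary by tokenizer.

    import tiktoken

    # Load a widely used tokenizer encoding (assumes the tiktoken package is installed).
    enc = tiktoken.get_encoding("cl100k_base")

    token_ids = enc.encode("unemployment")

    # Print the token IDs and the text piece each one represents;
    # the split is decided by the tokenizer, not by dictionary words.
    print(token_ids)
    print([enc.decode([t]) for t in token_ids])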

Plain LLMs can still go wrong

  • fluent but wrong
  • no trusted source
  • outdated information
  • overconfident answers

Why that matters here

In official statistics, we often need answers that are:

  • grounded in a known document
  • easy to verify
  • clear about definitions, metadata, and important caveats
  • sensitive to recent publications

So what problem does RAG solve?

RAG = retrieval-augmented generation

A RAG system retrieves relevant source material before asking the model to answer.

LLM vs RAG

Plain LLM workflow

question → model → answer

RAG workflow

question → retrieve relevant sources → model + sources → answer
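
To make the two workflows concrete, here is a minimal, self-contained Python sketch of the RAG pattern. The toy word-overlap retriever and the example passages are illustrative assumptions, not the StatsChat implementation; real systems typically use embedding-based retrieval and send the final prompt to an LLM.

    from dataclasses import dataclass

    @dataclass
    class Passage:
        reference: str  # e.g. bulletin title and date
        text: str

    def retrieve(question: str, passages: list[Passage], top_k: int = 2) -> list[Passage]:
        # Toy retrieval: rank passages by word overlap with the question.
        # Real systems usually use embeddings and a vector index instead.
        q_words = set(question.lower().split())
        return sorted(
            passages,
            key=lambda p: len(q_words & set(p.text.lower().split())),
            reverse=True,
        )[:top_k]

    def build_prompt(question: str, sources: list[Passage]) -> str:
        # The model is instructed to answer only from the retrieved sources,
        # which is what makes the answer grounded and traceable.
        cited = "\n\n".join(f"[{p.reference}]\n{p.text}" for p in sources)
        return f"Answer using only these sources:\n\n{cited}\n\nQuestion: {question}"

    docs = [
        Passage("Consumer Price Bulletin, March 2025",
                "Year-on-year inflation was 6.2% in March 2025, up from 5.8% in February."),
        Passage("Labour Force Report, Q1 2025",
                "Unemployment changed little in the first quarter of 2025."),
    ]

    question = "What was inflation in March 2025?"
    print(build_prompt(question, retrieve(question, docs)))

In the plain LLM workflow, the model would see only the question; here it also sees the retrieved passages and their references, and that is the whole difference.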

Quick check

What makes a RAG workflow different?

A. The model is always larger
B. The system retrieves relevant source material before generation
C. The model is trained again for each question
D. The answer is guaranteed to be correct

Why RAG matters for official statistics

Often needed

  • trusted reports
  • recent publications
  • definitions, metadata, and important caveats

Often useful

  • source grounding
  • easier verification
  • organisation-specific knowledge

Important caution

RAG helps, but it does not guarantee correctness

  • retrieval may miss the best evidence
  • the wrong passage may be retrieved
  • the model may still misread the source
  • the source itself may be incomplete or ambiguous

Worked example

Question

According to the National Statistics Office’s Consumer Price Bulletin, what was the year-on-year inflation rate in March 2025, and which categories contributed most to the increase?

Use this example to compare a plain LLM-style answer with a grounded answer.

Plain LLM-style answer

Inflation in March 2025 was around 5.8%, mainly driven by food prices and transport costs. Housing-related costs may also have contributed. This suggests inflation remained elevated during the period.

What to notice

  • sounds plausible
  • fairly fluent
  • partly specific
  • no clear source
  • slightly inaccurate

Grounded / RAG-style answer

According to the Consumer Price Bulletin, March 2025, the year-on-year inflation rate was 6.2%, up from 5.8% in February 2025.

The bulletin states that the increase was mainly driven by food and non-alcoholic beverages, transport, and housing, water, electricity, gas and other fuels.

What to notice

  • cites the source
  • more precise
  • easier to verify
  • includes measure type
  • still not guaranteed correct

Which answer would you trust more?

Compare them using

  • source grounding
  • specificity
  • traceability

And still ask

  • what could still go wrong?
  • what would you want to verify?

StatsChat connection

StatsChat is one example of this broader pattern:

  • a user asks a question
  • the system retrieves relevant material
  • the model answers using that material

Later sessions will unpack these steps in more detail.

Optional deeper dive

A simple mental model of an LLM

  • code that runs the model
  • parameters / weights learned from training
  • input text goes in
  • next-token predictions come out
  • repeated predictions generate text

A useful intuition: the structure is understandable, but the learned parameters are extremely large and complex.
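
As a toy illustration of that loop, the sketch below replaces the learned parameters with a hand-written probability table (a deliberate simplification; a real model predicts probabilities over tens of thousands of possible tokens using billions of weights):

    import random

    def next_token_probabilities(context):
        # Stub standing in for the trained model: a real LLM computes these
        # probabilities from its learned weights for every token in its vocabulary.
        if context and context[-1] == "inflation":
            return {"rose": 0.6, "was": 0.3, "fell": 0.1}
        return {"inflation": 0.5, "the": 0.3, "unemployment": 0.2}

    def generate(prompt, max_new_tokens=4):
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            probs = next_token_probabilities(tokens)
            # Sample the next token in proportion to its predicted probability,
            # then feed it back in: repeated predictions generate text.
            choices, weights = zip(*probs.items())
            tokens.append(random.choices(choices, weights=weights)[0])
        return tokens

    print(" ".join(generate(["the"])))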

Optional deeper dive

From base model to assistant

  • pretraining builds broad language ability
  • post-training makes the model more useful as an assistant
  • tools can extend what the overall system can do
  • RAG is one way of adding retrieval to the system

Key takeaways

  • LLMs are powerful, but fluency is not the same as trustworthiness
  • RAG adds retrieval before generation
  • grounding matters for official statistics
  • RAG helps, but it is not magical

Between now and the lab

Please complete the short task on the course page.

Be ready to discuss:

  • which answer you trusted more
  • why
  • where a RAG-style system could help in your organisation