Week 3: LLMs, RAG, and why they matter for official statistics
2026-04-21
Welcome
LLMs, RAG, and why they matter for official statistics
Preliminary session 3
This session builds a shared mental model for the rest of the course.
StatsChat and the course
StatsChat at a glance
- developed by the UK Office for National Statistics (ONS) with the Kenya National Bureau of Statistics (KNBS)
- lets users ask questions over official reports
- combines retrieval and generation
- used here as a running example for the course
What this course is trying to do
Main focus
- understand the pipeline
- use systems like StatsChat critically
- think about adaptation for your own NSO
Later in the course
- retrieval in more detail
- generation in more detail
- evaluation and limitations
- local adaptation questions
Today’s aims
- understand what an LLM is
- understand what RAG adds
- see why grounding matters for official statistics
- prepare for later sessions on retrieval, generation, and evaluation
Why this matters
If someone asks a question about a statistical report, what do we want?
- clear
- correct
- recent
- traceable to a source
Quick check
What matters most in an answer about official statistics?
A. Fluency
B. Correctness
C. Recency
D. Traceable source
E. All of the above
What is an LLM?
A practical definition
A large language model is a model trained on large amounts of text that generates language by predicting likely next tokens.
Powerful, but not magical.
What LLMs often do well
- summarise
- explain
- answer questions
Generating and editing text
- draft
- rewrite
- transform text
Useful concepts
Prompt
The instruction or input we give the model.
Context
The text the model sees when answering.
Token
A token is a small piece of text, such as a word or part of a word, that the model processes.
For us
unemployment
= 1 word
For a model
Possible tokens:
un | employment
The exact split depends on the model. The main point is that tokens are not always the same as words.
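To make the idea concrete, here is a toy greedy longest-match tokenizer. The vocabulary below is invented for illustration; real models learn much larger subword vocabularies from data, so the actual split will differ from model to model.

```python
# Toy greedy longest-match tokenizer. The vocabulary is invented for
# illustration; real models learn vocabularies of tens of thousands
# of subword pieces from training data.
VOCAB = {"un", "employ", "employment", "ment", "rate"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining prefix that is in the vocabulary.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Fall back to a single character if nothing matches.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("unemployment"))  # one word, two tokens: ['un', 'employment']
```

With this toy vocabulary, "unemployment" becomes two tokens, while "rate" stays a single token, which is all the main point requires: tokens are not always the same as words.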
Plain LLMs can still go wrong
- fluent but wrong
- no trusted source
- outdated information
- overconfident answer
Why that matters here
In official statistics, we often need answers that are:
- grounded in a known document
- easy to verify
- clear about definitions, metadata, and important caveats
- sensitive to recent publications
So what problem does RAG solve?
RAG = retrieval + generation
A RAG system retrieves relevant source material before asking the model to answer.
LLM vs RAG
Plain LLM workflow
question → model → answer
RAG workflow
question → retrieve sources → model + sources → answer
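The difference can be sketched in a few lines of Python. The documents and the simple word-overlap retrieval rule below are invented placeholders, not StatsChat's actual components; a real system would call an actual LLM and a proper retriever.

```python
# Minimal sketch of plain-LLM vs RAG prompting. The corpus is a toy
# stand-in for a real document store.
DOCUMENTS = [
    "Consumer Price Bulletin March 2025: year-on-year inflation was 6.2%.",
    "Labour Force Survey Q1 2025: the unemployment rate was 5.1%.",
]

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(DOCUMENTS, key=lambda d: len(q_words & set(d.lower().split())))

def plain_llm_prompt(question: str) -> str:
    # Plain LLM workflow: the model sees only the question.
    return question

def rag_prompt(question: str) -> str:
    # RAG workflow: retrieve first, then ask the model to answer
    # using only the retrieved source.
    source = retrieve(question)
    return f"Using only this source:\n{source}\nAnswer: {question}"

question = "What was inflation in March 2025?"
print(plain_llm_prompt(question))
print(rag_prompt(question))
```

The only structural change is the retrieval step before generation; the model itself is unchanged, which is why option B in the quick check below is the right answer.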
Quick check
What makes a RAG workflow different?
A. The model is always larger
B. The system retrieves relevant source material before generation
C. The model is trained again for each question
D. The answer is guaranteed to be correct
Why RAG matters for official statistics
Often needed
- trusted reports
- recent publications
- definitions, metadata, and important caveats
Often useful
- source grounding
- easier verification
- organisation-specific knowledge
Important caution
RAG helps, but it does not guarantee correctness:
- retrieval may miss the best evidence
- the wrong passage may be retrieved
- the model may still misread the source
- the source itself may be incomplete or ambiguous
Worked example
Question
According to the National Statistics Office’s Consumer Price Bulletin, what was the year-on-year inflation rate in March 2025, and which categories contributed most to the increase?
Use this example to compare a plain LLM-style answer with a grounded answer.
Plain LLM-style answer
Inflation in March 2025 was around 5.8%, mainly driven by food prices and transport costs. Housing-related costs may also have contributed. This suggests inflation remained elevated during the period.
What to notice
- sounds plausible
- fairly fluent
- partly specific
- no clear source
- slightly inaccurate
Grounded / RAG-style answer
According to the Consumer Price Bulletin, March 2025, the year-on-year inflation rate was 6.2%, up from 5.8% in February 2025.
The bulletin states that the increase was mainly driven by food and non-alcoholic beverages, transport, and housing, water, electricity, gas and other fuels.
What to notice
- cites the source
- more precise
- easier to verify
- states the measure (year-on-year rate)
- still not guaranteed correct
Which answer would you trust more?
Compare them using:
- source grounding
- specificity
- traceability
And still ask:
- what could still go wrong?
- what would you want to verify?
StatsChat connection
StatsChat is one example of this broader pattern:
- a user asks a question
- the system retrieves relevant material
- the model answers using that material
Later sessions will unpack these steps in more detail.
Optional deeper dive
A simple mental model of an LLM
- code that runs the model
- parameters / weights learned from training
- input text goes in
- next-token predictions come out
- repeated predictions generate text
A useful intuition: the code that runs the model is short and understandable; the complexity lives in the billions of learned parameters.
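The "repeated predictions generate text" loop can be sketched with a toy lookup table standing in for the learned parameters. The table below is invented for illustration; a real model predicts a probability over every token in its vocabulary at each step.

```python
# Toy illustration of generation by repeated next-token prediction.
# The lookup table plays the role of the learned parameters.
NEXT_TOKEN = {
    "inflation": "rose",
    "rose": "to",
    "to": "6.2%",
    "6.2%": "in",
    "in": "March",
    "March": "2025",
}

def generate(prompt: str, max_tokens: int = 6) -> str:
    """Repeatedly predict the next token and append it to the text."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        next_tok = NEXT_TOKEN.get(tokens[-1])
        if next_tok is None:  # no prediction available: stop
            break
        tokens.append(next_tok)
    return " ".join(tokens)

print(generate("inflation"))  # -> inflation rose to 6.2% in March 2025
```

Nothing in the loop knows whether "6.2%" is true; it is simply the most likely continuation in the table, which is exactly why grounding matters.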
Optional deeper dive
From base model to assistant
- Pretraining builds broad language ability
- Post-training makes the model more useful as an assistant
- Tools can extend what the overall system can do
- RAG is one way of adding retrieval to the system
Key takeaways
- LLMs are powerful, but fluency is not the same as trustworthiness
- RAG adds retrieval before generation
- grounding matters for official statistics
- RAG helps, but it is not magical
Between now and the lab
Please complete the short task on the course page.
Be ready to discuss:
- which answer you trusted more
- why
- where a RAG-style system could help in your organisation