UNECA StatsChat course

Published

8 June 2026

Welcome to the UNECA StatsChat course!

This course will introduce StatsChat as a practical example of a Retrieval-Augmented Generation (RAG) system for official statistics. Across the course, we will look at how a system like this works, how it can be used appropriately, and how similar approaches could be adapted to the context of different national statistical offices (NSOs).

The course will have two main parts:

3 preliminary sessions, intended to give a helpful foundation in Git, Python, and LLM / RAG basics
6 core sessions, centred on understanding and adapting a RAG system using StatsChat as the main example

The preliminary sessions are designed as support sessions, particularly for anyone who would find it helpful to refresh or build confidence in these areas before the main course begins.

Course Structure

Primers

Week	Topic	Highlights
Week 1	GitHub and repo orientation	How to navigate a repository, read documentation, understand branches and pull requests, and use the repo as technical documentation.
Week 2	Python and command-line basics	A light introduction to files, folders, JSON, running scripts, packages, and the role of Python in the StatsChat pipeline.
Week 3	LLMs and RAG foundations	A shared mental model of LLMs, grounding, hallucination risk, and why retrieval matters for official statistics.

Core sessions

Week	Topic	Highlights
Week 4	Introduction to StatsChat and RAG	What problem StatsChat solves; end-to-end workflow; what makes this a RAG system; where it fits in an NSO setting.
Week 5	Documents and ingestion	How reports become machine-readable inputs; PDF challenges; metadata; source tracking; what would change for HTML, Word, spreadsheets, or APIs.
Week 6	Chunking, embeddings, and vector search	Why the system does not query raw reports directly; how chunking and embeddings create a searchable representation.
Week 7	Retrieval and evidence selection	How relevant passages are found; reranking; thresholds; recency; how weak retrieval affects final answers.
Week 8	Generation, prompting, and grounded answers	What the LLM does with retrieved context; answer formatting; confidence, refusal, and traceability.
Week 9	Evaluation, limitations, and local adaptation	Failure modes; evaluation approaches; readiness for deployment; adapting StatsChat for local documents, languages, and users.
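
As a small taste of what Weeks 6 and 7 cover, the idea of chunking a report, turning the chunks into vectors, and searching them can be sketched in a few lines of Python. This is an illustrative toy only, not the StatsChat implementation: the function names (`chunk_text`, `embed`, `retrieve`), the chunk sizes, and the sample text are all assumptions made up for this example, and a simple word-count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=12, overlap=4):
    """Split text into overlapping windows of words (sizes are hypothetical)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question, with their scores."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]


# Made-up sample text, standing in for an extracted report.
report = (
    "The consumer price index rose in March. "
    "Food prices were the main driver of inflation. "
    "The labour force survey shows unemployment fell in the first quarter. "
    "Youth unemployment remains higher than the national average."
)

chunks = chunk_text(report)
top = retrieve("What happened to unemployment?", chunks)
for score, chunk in top:
    print(f"{score:.2f}  {chunk}")
```

In a real RAG system the retrieved chunks, together with their source metadata, would then be passed to the LLM as context for a grounded answer. The word-count vectors here only match exact words, which is one reason real systems use learned embeddings instead.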