UNECA StatsChat course

Published

5 May 2026

Welcome to the UNECA StatsChat course!

This course introduces StatsChat as a practical example of a Retrieval-Augmented Generation (RAG) system for official statistics. Across the course, we will look at how a system like this works, how it can be used appropriately, and how similar approaches could be adapted to different national statistical office (NSO) contexts.

The course has two main parts: three weeks of primers (Weeks 1 to 3), followed by six weeks of core sessions (Weeks 4 to 9).

Course Structure

Primers

| Week | Topic | Highlights |
|------|-------|------------|
| Week 1 | GitHub and repo orientation | How to navigate a repository, read documentation, understand branches and pull requests, and use the repo as technical documentation. |
| Week 2 | Python and command-line basics | A light introduction to files, folders, JSON, running scripts, packages, and the role of Python in the StatsChat pipeline. |
| Week 3 | LLMs and RAG foundations | A shared mental model of LLMs, grounding, hallucination risk, and why retrieval matters for official statistics. |

Core sessions

| Week | Topic | Highlights |
|------|-------|------------|
| Week 4 | Introduction to StatsChat and RAG | What problem StatsChat solves; end-to-end workflow; what makes this a RAG system; where it fits in an NSO setting. |
| Week 5 | Documents and ingestion | How reports become machine-readable inputs; PDF challenges; metadata; source tracking; what would change for HTML, Word, spreadsheets, or APIs. |
| Week 6 | Chunking, embeddings, and vector search | Why the system does not query raw reports directly; how chunking and embeddings create a searchable representation. |
| Week 7 | Retrieval and evidence selection | How relevant passages are found; reranking; thresholds; recentness; how weak retrieval affects final answers. |
| Week 8 | Generation, prompting, and grounded answers | What the LLM does with retrieved context; answer formatting; confidence, refusal, and traceability. |
| Week 9 | Evaluation, limitations, and local adaptation | Failure modes; evaluation approaches; readiness for deployment; adapting StatsChat for local documents, languages, and users. |