🖥️ Week 10 Lecture
From Retrieval to Generation
By the end of this lecture, you should be able to:
- explain what tokenisation does and why it matters for context windows,
- describe encoder-only, encoder-decoder, and decoder-only transformer architectures,
- use the HuggingFace `text-generation` pipeline with a chat template,
- add a generation step to your RAG pipeline with traceable citations,
- diagnose whether a wrong answer is a retrieval problem or a generation problem.
📋 Logistics
📍 Location: Monday, 23 March 2026, 4-6 pm at SAL.G.03
This page is slides-first. Use the deck below during and after class.
📖 Preparation
- You attended the 🖥️ W09 Lecture and 💻 W09 Lab.
- You have chunks stored in ChromaDB and a Recall@5 baseline for your PS2 data.
- You can run the W10 notebooks in the `rag` environment.
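Before the lecture, you can sanity-check this setup with a minimal sketch like the one below. The collection name `ps2_chunks` and the path `./chroma` are placeholders; use whatever you created in the W09 lab, and pass your embedding function to `get_collection` if you used a custom one.

```python
# Sanity check: can we open the PS2 collection and pull back 5 chunks?
import chromadb

client = chromadb.PersistentClient(path="./chroma")   # placeholder path
collection = client.get_collection("ps2_chunks")      # placeholder name

results = collection.query(
    query_texts=["a question from your PS2 data"],
    n_results=5,
)
for chunk_id, chunk in zip(results["ids"][0], results["documents"][0]):
    print(chunk_id, chunk[:80])
```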
🗣️ What we will cover in this lecture
- How language models turn text into numbers: tokenisation, BPE, token budgets (sketch below).
- How transformers work: attention, encoder-only vs decoder-only architectures (sketch below).
- The HuggingFace `text-generation` pipeline: chat templates, key parameters (sketch below).
- Adding citations to RAG output (the NotebookLM pattern).
- Cross-encoder reranking as a second retrieval stage (sketch below).
- Prompt engineering: minimal vs strict system messages, extractive-style prompting.
- Live notebook demo: full pipeline from retrieval to cited answer.
- Diagnosing failures: retrieval problem vs generation problem (sketch below).
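To make the token-budget point concrete, here is a minimal tokenisation sketch. It uses the GPT-2 BPE tokeniser purely for illustration; swap in the tokeniser of whichever model you actually generate with (see the appendix for the models used in this lecture).

```python
# How many tokens does a chunk cost against the context window?
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # illustrative BPE tokeniser
chunk = "Retrieval-augmented generation grounds answers in retrieved text."
ids = tok.encode(chunk)

print(len(ids), "tokens")                 # the chunk's token budget
print(tok.convert_ids_to_tokens(ids))     # the underlying BPE pieces
```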
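The heart of a transformer layer is scaled dot-product attention. A minimal sketch, assuming PyTorch and toy random tensors:

```python
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import torch
import torch.nn.functional as F

def attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # query-key similarities
    weights = F.softmax(scores, dim=-1)          # one mixing distribution per token
    return weights @ v                           # weighted sum of values

q = k = v = torch.randn(4, 8)      # self-attention over 4 tokens, dim 8
print(attention(q, k, v).shape)    # torch.Size([4, 8])
```

In a decoder-only model, a causal mask additionally blocks attention to future tokens; encoder-only models attend in both directions.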
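Next, a sketch of the `text-generation` pipeline with a chat-format input: passing a list of role/content messages makes the pipeline apply the model's chat template for you. The model name here is an assumption for illustration (the models actually used are listed in the appendix), and the strict, citation-demanding system message shows the NotebookLM-style pattern we will build on.

```python
from transformers import pipeline

# Illustrative small instruct model; substitute the lecture's model.
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

context = "[1] Photosynthesis converts light energy into chemical energy."
messages = [
    # "Strict" system message: answer only from context, cite chunk numbers.
    {"role": "system", "content": (
        "Answer ONLY from the numbered context. After each claim, cite "
        "the supporting chunk like [1]. If the context does not contain "
        "the answer, say so.\n\nContext:\n" + context)},
    {"role": "user", "content": "What does photosynthesis do?"},
]

out = pipe(messages, max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```

Because each context chunk keeps its number, any [n] marker in the answer can be traced back to a stored chunk id, which is what makes the citations auditable.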
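For the second retrieval stage, a cross-encoder rescores each (query, chunk) pair jointly: slower than the bi-encoder used for the first pass, but usually more accurate. A sketch using `sentence-transformers` and a common public reranker (not necessarily the one used in class):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What does photosynthesis do?"
candidates = [
    "Mitochondria are the powerhouse of the cell.",
    "Photosynthesis converts light energy into chemical energy.",
]
scores = reranker.predict([(query, c) for c in candidates])

# Keep candidates in descending score order; pass the top few on
# to the generation step.
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```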
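Finally, the first question to ask about any wrong answer: did the right chunk even reach the model? If not, it is a retrieval problem; if the chunk was retrieved but ignored or misquoted, it is a generation problem. A hypothetical helper (the names `gold_chunk_id` and `retrieved_ids` stand in for your PS2 annotations):

```python
def is_retrieval_problem(gold_chunk_id: str, retrieved_ids: list[str]) -> bool:
    """True if the supporting evidence never reached the model."""
    return gold_chunk_id not in retrieved_ids

if is_retrieval_problem("chunk_042", ["chunk_007", "chunk_013", "chunk_042"]):
    print("Retrieval problem: fix chunking, embeddings, or reranking.")
else:
    print("Generation problem: the model had the evidence; fix the prompt.")
```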
📚 Lecture Materials
🎬 Facilitation Slides
Use keyboard arrows to navigate. You can also open the deck in fullscreen.
📥 Lecture Notebooks
Two notebooks accompany this lecture. NB00 benchmarks chunking strategies and retrieval configurations. NB01 adds the generation step.
📎 Appendix
Course links
- 💻 W10 Lab
- ✍️ Problem Set 2
- 📜 Syllabus
Tokenisation and transformers
HuggingFace generation
Models used in this lecture