🖥️ Week 10 Lecture
From Retrieval to Generation
By the end of this lecture, you should be able to:
- explain what tokenisation does and why it matters for context windows,
- describe encoder-only, encoder-decoder, and decoder-only transformer architectures,
- use the HuggingFace `text-generation` pipeline with a chat template,
- add a generation step to your RAG pipeline with traceable citations,
- diagnose whether a wrong answer is a retrieval problem or a generation problem.
📋 Logistics
📍 Location: Monday, 23 March 2026, 4-6 pm at SAL.G.03
This page is slides-first. Use the deck below during and after class.
📖 Preparation
- You attended the 🖥️ W09 Lecture and 💻 W09 Lab.
- You have chunks stored in ChromaDB and a Recall@5 baseline for your PS2 data.
- You can run the W10 notebooks in the `rag` environment.
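Before the lecture, you can sanity-check this setup with a minimal sketch like the one below. The collection name `ps2_chunks` and the path `./chroma` are placeholders; use whatever you created in the W09 lab, and pass your embedding function to `get_collection` if you used a custom one.

```python
# Sanity check: can we open the PS2 collection and pull back 5 chunks?
import chromadb

client = chromadb.PersistentClient(path="./chroma")   # placeholder path
collection = client.get_collection("ps2_chunks")      # placeholder name

results = collection.query(
    query_texts=["a question from your PS2 data"],
    n_results=5,
)
for chunk_id, chunk in zip(results["ids"][0], results["documents"][0]):
    print(chunk_id, chunk[:80])
```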
🗣️ What we will cover in this lecture
- How language models turn text into numbers: tokenisation, BPE, token budgets (sketch below).
- How transformers work: attention, encoder-only vs decoder-only architectures (sketch below).
- The HuggingFace `text-generation` pipeline: chat templates, key parameters (sketch below).
- Adding citations to RAG output (the NotebookLM pattern).
- Cross-encoder reranking as a second retrieval stage (sketch below).
- Prompt engineering: minimal vs strict system messages, extractive-style prompting.
- Live notebook demo: full pipeline from retrieval to cited answer.
- Diagnosing failures: retrieval problem vs generation problem (sketch below).
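To make the token-budget point concrete, here is a minimal tokenisation sketch. It uses the GPT-2 BPE tokeniser purely for illustration; swap in the tokeniser of whichever model you actually generate with (see the appendix for the models used in this lecture).

```python
# How many tokens does a chunk cost against the context window?
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # illustrative BPE tokeniser
chunk = "Retrieval-augmented generation grounds answers in retrieved text."
ids = tok.encode(chunk)

print(len(ids), "tokens")                 # the chunk's token budget
print(tok.convert_ids_to_tokens(ids))     # the underlying BPE pieces
```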
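The heart of a transformer layer is scaled dot-product attention. A minimal sketch, assuming PyTorch and toy random tensors:

```python
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import torch
import torch.nn.functional as F

def attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # query-key similarities
    weights = F.softmax(scores, dim=-1)          # one mixing distribution per token
    return weights @ v                           # weighted sum of values

q = k = v = torch.randn(4, 8)      # self-attention over 4 tokens, dim 8
print(attention(q, k, v).shape)    # torch.Size([4, 8])
```

In a decoder-only model, a causal mask additionally blocks attention to future tokens; encoder-only models attend in both directions.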
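Next, a sketch of the `text-generation` pipeline with a chat-format input: passing a list of role/content messages makes the pipeline apply the model's chat template for you. The model name here is an assumption for illustration (the models actually used are listed in the appendix), and the strict, citation-demanding system message shows the NotebookLM-style pattern we will build on.

```python
from transformers import pipeline

# Illustrative small instruct model; substitute the lecture's model.
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

context = "[1] Photosynthesis converts light energy into chemical energy."
messages = [
    # "Strict" system message: answer only from context, cite chunk numbers.
    {"role": "system", "content": (
        "Answer ONLY from the numbered context. After each claim, cite "
        "the supporting chunk like [1]. If the context does not contain "
        "the answer, say so.\n\nContext:\n" + context)},
    {"role": "user", "content": "What does photosynthesis do?"},
]

out = pipe(messages, max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```

Because each context chunk keeps its number, any [n] marker in the answer can be traced back to a stored chunk id, which is what makes the citations auditable.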
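For the second retrieval stage, a cross-encoder rescores each (query, chunk) pair jointly: slower than the bi-encoder used for the first pass, but usually more accurate. A sketch using `sentence-transformers` and a common public reranker (not necessarily the one used in class):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What does photosynthesis do?"
candidates = [
    "Mitochondria are the powerhouse of the cell.",
    "Photosynthesis converts light energy into chemical energy.",
]
scores = reranker.predict([(query, c) for c in candidates])

# Keep candidates in descending score order; pass the top few on
# to the generation step.
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```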
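Finally, the first question to ask about any wrong answer: did the right chunk even reach the model? If not, it is a retrieval problem; if the chunk was retrieved but ignored or misquoted, it is a generation problem. A hypothetical helper (the names `gold_chunk_id` and `retrieved_ids` stand in for your PS2 annotations):

```python
def is_retrieval_problem(gold_chunk_id: str, retrieved_ids: list[str]) -> bool:
    """True if the supporting evidence never reached the model."""
    return gold_chunk_id not in retrieved_ids

if is_retrieval_problem("chunk_042", ["chunk_007", "chunk_013", "chunk_042"]):
    print("Retrieval problem: fix chunking, embeddings, or reranking.")
else:
    print("Generation problem: the model had the evidence; fix the prompt.")
```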
📚 Lecture Materials
🎬 Facilitation Slides
Use keyboard arrows to navigate. You can also open the deck in fullscreen.
📥 Lecture Notebooks
Two notebooks accompany this lecture. NB00 benchmarks chunking strategies and retrieval configurations. NB01 adds the generation step.
📎 Appendix
Course links
- 💻 W10 Lab
- ✍️ Problem Set 2
- 📜 Syllabus
Tokenisation and transformers
HuggingFace generation
Models used in this lecture