DS205 2025-2026 Winter Term

πŸ–₯️ Week 10 Lecture

From Retrieval to Generation

Author

Dr Jon Cardoso-Silva

Published

30 March 2026

πŸ₯… Learning Goals

By the end of this lecture, you should be able to:

  • explain what tokenisation does and why it matters for context windows
  • describe encoder-only, encoder-decoder, and decoder-only transformer architectures
  • use the HuggingFace text-generation pipeline with a chat template
  • add a generation step to your RAG pipeline with traceable citations
  • diagnose whether a wrong answer is a retrieval problem or a generation problem

πŸ“ Logistics

πŸ“Location: Monday, 23 March 2026, 4-6 pm at SAL.G.03

This page is slides-first. Use the deck below during and after class.

πŸ“‹ Preparation

  • You attended the πŸ–₯️ W09 Lecture and πŸ’» W09 Lab.
  • You have chunks stored in ChromaDB and a Recall@5 baseline for your PS2 data.
  • You can run the W10 notebooks in the rag environment.

πŸ—£οΈ What we will cover in this lecture

  • How language models turn text into numbers (tokenisation, BPE, token budgets).
  • How transformers work: attention, encoder-only vs decoder-only architectures.
  • The HuggingFace text-generation pipeline: chat templates, key parameters.
  • Adding citations to RAG output (the NotebookLM pattern).
  • Cross-encoder reranking as a second retrieval stage.
  • Prompt engineering: minimal vs strict system messages, extractive-style prompting.
  • Live notebook demo: full pipeline from retrieval to cited answer.
  • Diagnosing failures: retrieval problem vs generation problem.

πŸ““ Lecture Materials

🎬 Facilitation Slides

Use keyboard arrows to navigate. You can also open the deck in fullscreen.

πŸ“₯ Lecture Notebooks

Two notebooks accompany this lecture. NB00 benchmarks chunking strategies and retrieval configurations. NB01 adds the generation step.

πŸ”– Appendix