🖥️ Week 09 Lecture
Chunking, Embeddings, and Retrieval
By the end of this lecture, you should be able to: i) compare two chunking strategies, ii) run a regex baseline for target language, iii) compare Word2Vec, similarity MiniLM, and Q&A MiniLM retrieval, iv) interpret Recall@5 results and choose a model for PS2.
📍 Logistics
📍 Location: Monday, 16 March 2026, 4-6 pm at SAL.G.03
This page is slides-first. Use the deck below during and after class.
📋 Preparation
- You attended the 🖥️ W08 Lecture and 💻 W08 Lab.
- You have a first extraction workflow running for your PS2 PDFs.
- You can run the W09 notebooks in the `rag` environment.

Could we ask a small but important favour? The LSE runs a course survey every term, and your feedback genuinely shapes how this module is taught next year. It takes about 3 minutes.
💡 Note: Please assess all the instructors you have interacted with (Jon counts as a teacher too!).
Last updated: 14 March 2026
🗣️ What we will cover in this lecture
- Chunking strategy A vs B and why boundaries change retrieval outcomes.
- Regex baseline and manual reference IDs for target statements.
- Word2Vec vs sentence-transformer vs Q&A retrieval comparison.
- Recall@5 interpretation and model-selection decisions for PS2.
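To make the last item concrete, here is a minimal sketch of how Recall@5 could be computed for a single query. It assumes Recall@k means the fraction of manually labelled reference IDs that appear among the top k retrieved chunks; the lecture notebook may define or aggregate it differently. The function name `recall_at_k` and the chunk IDs are hypothetical and purely illustrative.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of relevant IDs found among the first k retrieved IDs.

    Assumed definition (not taken from the lecture materials):
    |top-k retrieved ∩ relevant| / |relevant|.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical data: chunk IDs ranked by a retriever (best first),
# and the manual reference IDs for the same query.
retrieved = ["c07", "c02", "c11", "c05", "c09", "c01"]
relevant = ["c02", "c09", "c01", "c30"]

# Top 5 contains c02 and c09, so 2 of the 4 reference IDs are found.
print(recall_at_k(retrieved, relevant, k=5))  # 0.5
```

Under this definition, a model that ranks the reference chunks higher scores closer to 1.0, so comparing the score across retrievers on the same queries and reference IDs gives a like-for-like basis for choosing between them.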
📓 Lecture Materials
🎬 Facilitation Slides
Use keyboard arrows to navigate. You can also open the deck in fullscreen.
📥 Lecture Notebook
🔖 Appendix
Useful links
- 💻 W09 Lab
- ✍️ Problem Set 2
- 📔 Syllabus
Notebook and model links