📓 Syllabus

DS205 (2025/26 Winter Term)

Check this page every week, as some details may change.

Last updated: 03 March 2026, 18:00 GMT

🗓️ Week 01 (19 Jan 2026 - 23 Jan 2026)
Python Skills Refresher (Prerequisites & Recap Guide)

Python Skills Refresher (Prerequisites & Recap Guide)
🎞️ Slides

Introduction to the course, logistics, and environment setup. Working with the Open Food Facts API to refresh pandas skills and understand REST API fundamentals.

Hands-on Practice with Open Food Facts API
🛣️ Roadmap Tutorial

Building foundational skills in API consumption and data processing with pandas.

We love hearing from you! Don’t hesitate to contact us for help.

  • Slack: Post questions in the #help channel. Jon will check messages daily.

  • 💬 Office Hours: Wednesdays, 2:00 pm - 5:00 pm (bookable via StudentHub).

We will announce additional support options and sessions as the term progresses.

🗓️ Week 02 (26 Jan 2026 - 30 Jan 2026)
Introduction to Web Scraping with Scrapy

Introduction to Web Scraping with Scrapy
🎞️ Slides 🖥️ Live Demo

Understanding web document structure, XPath and CSS selectors. Introduction to the Scrapy framework for web scraping.
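
For orientation, here is a minimal sketch of what an XPath query does. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath, as a stand-in; in Scrapy the equivalent calls are `response.xpath(...)` and `response.css(...)`. The markup, the class names, and the `extract_products` helper are invented for illustration and do not come from any real website.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed listing markup (not taken from any real site).
MARKUP = """
<html>
  <body>
    <ul id="products">
      <li class="product"><a href="/p/1">Oat Milk</a><span class="price">1.80</span></li>
      <li class="product"><a href="/p/2">Rye Bread</a><span class="price">2.25</span></li>
    </ul>
  </body>
</html>
"""


def extract_products(markup: str) -> list[dict]:
    """Return one dict per product found in a listing page."""
    root = ET.fromstring(markup.strip())
    products = []
    # XPath: every <li> anywhere in the tree whose class attribute is "product".
    for item in root.findall(".//li[@class='product']"):
        link = item.find("./a")                       # direct child <a>
        price = item.find("./span[@class='price']")   # direct child price <span>
        products.append(
            {
                "name": link.text,
                "url": link.get("href"),
                "price": float(price.text),
            }
        )
    return products


products = extract_products(MARKUP)
for product in products:
    print(product)
```

Real pages are rarely well-formed XML, which is one reason to rely on Scrapy's selectors in the lab rather than this stand-in.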

Practising XPath and CSS Selectors with Scrapy (UK Supermarket)
🛣️ Roadmap Tutorial

Hands-on scraping practice with live product data from the UK Supermarket e-commerce website. Goal: scrape a diverse set of information from a single webpage (first a listing page, then a single product page).

🗒️ Note: in the 💻 W02 Lab and 💻 W03 Lab, your goal is to mirror what you saw in the previous day's lecture, this time for a different website. Rather than collecting Wikipedia pages (as I will be doing), you will collect product data from UK Supermarket. Do this in the dedicated GitHub repository for your Problem Set 1. That’s right, you are already building your Problem Set 1 from Week 02!

Problem Set 1 instructions will be released this week.

Due: Thursday, 26 February 2026, 8pm UK time (Week 06). Worth 20% of final grade.

Problem Set 1 is designed to be built incrementally over the first half of the term. Starting from Week 02, each lab session guides you through a specific part of the project. You’ll work on structured lab activities during class time, with take-home components to prepare for the following week. Each week builds on the previous one, so stay on track with the weekly lab work.

🗓️ Week 03 (02 Feb 2026 - 06 Feb 2026)
Crawling with Scrapy and Dynamic Content

Crawling with Scrapy and Introduction to dynamic content with Selenium
🖥️ Live Demo

Building scalable scrapers with Scrapy pipelines and handling pagination. Introducing Selenium for dynamic content scraping when Scrapy alone is not enough.

Building a Complete Scrapy Spider for UK Supermarket
🛣️ Roadmap Tutorial

Implementing a full scraping pipeline with data cleaning and storage. Goal: navigate and scrape a whole website rather than a single page. You decide whether to use Scrapy alone or to combine it with Selenium, depending on how the UK Supermarket website is structured.

🗓️ Week 04 (09 Feb 2026 - 13 Feb 2026)
Building Collaborative APIs with FastAPI

Building Collaborative APIs with FastAPI
🎞️ Slides

Introduction to FastAPI, Pydantic v2, and building APIs for data sharing. Collaborative development patterns.

FastAPI Development and Docker Introduction
🛣️ Roadmap Tutorial

Building APIs and using Docker to resolve environment conflicts.

Problem Set 1: Scrapy spider + FastAPI (peer hand-off)

Collaborative project in which you build your own web scraper but write the API for another student's project.

Important: The web scraping component of Problem Set 1 (which you’ve been building in Weeks 02-03) will be graded as a formative exercise. This portion will be assessed directly by Jon.

🗓️ Week 05 (16 Feb 2026 - 20 Feb 2026)
Data Pipelines and API Design

Data Pipelines and API Design
🖥️ Live Demo 🦸🏻 Super Tech Support

Data pipeline architecture (scraped, enriched, API layers), pipeline orchestration with click, and improving API schemas with Pydantic Field constraints and FastAPI Query parameters.

Super Tech Support: Problem Set 1 Working Session
🦸🏻 Super Tech Support

Dedicated working time for Problem Set 1. Bring your code and your partner’s repository; Jon will circulate to help with API design, enrichment logic, and collaboration workflow.

🗓️ Week 06 (23 Feb 2026 - 27 Feb 2026)
Reading Week

No lecture or lab this week.

Use this time to catch up on coursework and prepare for the second half of the term.

We will announce additional support sessions this week to help you finalise Problem Set 1.

Problem Set 1 Due: Thursday, 26 February 2026, 8pm UK time

Worth 20% of final grade. Submission via GitHub. Includes peer hand-off component.

🗓️ Week 07 (02 Mar 2026 - 06 Mar 2026)
From Food to Climate: Pipelines, Automation, and TPI

From Food to Climate: Pipelines, Automation, and TPI
🖥️ Live Demo 🗣️ Guest Speakers

Naming what you built in W01-W05 (ETL/ELT vocabulary, pipeline design principles), automating pipelines with GitHub Actions, and debugging with VS Code. Guest speakers: Ruikai Liu (Jorb.ai, former DS205 student) on system design and vibe coding, and the TPI Centre team on their climate assessment workflows and the CLEAR RAG system.

Building a Click Pipeline and Wiring It to GitHub Actions
🛣️ Roadmap Tutorial

Build a skeleton Click CLI that defines your pipeline stages for ✍️ Problem Set 2, run it locally, then create a GitHub Actions workflow that runs the same commands on a remote machine. Read the PS2 brief, browse TPI corporate pages, and choose your sector and companies.

✍️ Problem Set 2 released this week.

Build a RAG pipeline using TPI Centre corporate disclosure data. Choose a sector (Food Producers, Electrical Utilities, or Diversified Mining) and at least two companies. Worth 40% of final grade.

Due: Thursday, 26 March 2026, 8pm UK time (Week 10).

🗓️ Week 08 (09 Mar 2026 - 13 Mar 2026)
PDF Extraction and Introduction to Embeddings

PDF Extraction and Introduction to Embeddings
🎞️ Slides 🖥️ Live Demo

Extracting text from PDFs with unstructured, handling tables and mixed layouts. Introduction to word embeddings (Word2Vec) and transformer-based sentence embeddings.

Extracting and Embedding TPI Corporate Disclosures
🛣️ Roadmap Tutorial

Extract text from your chosen companies’ PDFs using unstructured. Inspect what comes out. Start generating embeddings with sentence-transformers. Directly applicable to ✍️ Problem Set 2.

🗓️ Week 09 (16 Mar 2026 - 20 Mar 2026)
Chunking and Vector Search

Chunking Strategies and Vector Search with ChromaDB
🎞️ Slides 🖥️ Live Demo

How to split extracted text into chunks suitable for retrieval. Storing and querying embeddings with ChromaDB. Evaluating retrieval quality.
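
One simple chunking strategy, sketched here under stated assumptions, is a fixed-size word window with overlap between neighbouring chunks. The function name `chunk_words` and the sizes used are illustrative choices, not course requirements; the lecture may cover other strategies (for example, splitting on sentences or document sections).

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    chunk boundary still appears whole in at least one chunk when it
    is shorter than the overlap.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Toy input: twelve placeholder "words" w0 .. w11.
sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

Larger overlaps reduce the chance of splitting a relevant passage in two, at the cost of storing more near-duplicate text.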

Building a Search System for Climate Disclosures
🛣️ Roadmap Tutorial

Chunk your extracted text, store embeddings in ChromaDB, and build retrieval that finds relevant passages for your ✍️ Problem Set 2 driving questions.
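
Conceptually, retrieval ranks stored vectors by their similarity to a query vector. The sketch below uses pure Python with hand-made three-dimensional vectors in place of real embeddings and of ChromaDB, purely to show the ranking idea. The passages, the vectors, and the `top_k` helper are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up passages with made-up vectors; a real pipeline would get the
# vectors from an embedding model and keep them in a vector database.
store = [
    ("passage A (placeholder text)", [0.9, 0.1, 0.0]),
    ("passage B (placeholder text)", [0.0, 0.2, 0.9]),
    ("passage C (placeholder text)", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Inspecting the ranked passages for each driving question by eye is a reasonable first check of retrieval quality before adding any generation step.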

🗓️ Week 10 (23 Mar 2026 - 27 Mar 2026)
Retrieval-Augmented Generation

Retrieval-Augmented Generation with Open-Source Models
🎞️ Slides 🦸🏻 Super Tech Support

Connecting retrieval to generation: prompt construction, using HuggingFace models for question answering, and evaluating RAG pipeline outputs. Dedicated support time for ✍️ Problem Set 2 completion.
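
As a rough illustration of prompt construction, the sketch below places numbered retrieved passages ahead of the question and instructs the model to stay within that context. The wording of the instruction, the `build_prompt` name, and the placeholder passages are assumptions made for this example; the course may use a different template, and no model is called here.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a RAG prompt from a question and retrieved passages."""
    # Number the passages so an answer can refer back to its sources.
    context = "\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder inputs; in a real pipeline the passages come from retrieval.
prompt = build_prompt(
    "What target does the company report?",
    ["first retrieved passage", "second retrieved passage"],
)
print(prompt)
```

The resulting string is what would be passed to a text-generation model; keeping the template in one function makes it easy to compare prompt variants when evaluating outputs.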

RAG Pipeline Completion Workshop
🛣️ Roadmap Tutorial 🦸🏻 Super Tech Support

Add the generation step to your pipeline. Evaluate results against the driving questions. Polish documentation. Dedicated support time for ✍️ Problem Set 2 submission.

✍️ Problem Set 2 Due: Thursday, 26 March 2026, 8pm UK time

RAG pipeline for TPI Centre Carbon Performance data. Worth 40% of final grade.

🗓️ Week 11 (30 Mar 2026 - 03 Apr 2026)
Final Project Launch: Capstone Projects with TPI

Final Project Launch: Capstone Projects with TPI
🎞️ Slides

Final project requirements and pre-defined capstone topics. Building on Problem Set 2 skills at group scale.

Final Project Planning and Q&A
🦸🏻 Super Tech Support

Form project groups, discuss capstone topics, and plan your approach.

Final Project: Group capstone project with TPI Centre

Due in Spring Term, Thursday 21 May 2026, 8pm. Group work worth 40% of final grade.