DS205 2025-2026 Winter Term

🗣️ Week 02 Lecture

From APIs to Web Scraping

Author

Dr Jon Cardoso-Silva

Published

26 January 2026

Last Updated: 25 January 2026

This lecture builds on Week 01 and moves from API requests to web scraping. We will focus on how the Web works, how HTML is structured, and how to choose between Scrapy and Selenium.

📍 Session Details

  • Date: Monday, 26 January 2026
  • Time: 16:00 - 18:00
  • Location: SAL.G.03

📋 Preparation

  1. Review the 🗣️ Week 01 Lecture if the Web concepts feel rusty.
  2. Download the lecture notebook so you can follow the live demo.

🗣️ Lecture Structure

We will cover:

  1. How the Web works: Internet vs Web, request and response, and why pages load in parts.
  2. HTML, CSS, JavaScript: What these layers do and how they shape scraping.
  3. Scrapy fundamentals: Why Scrapy is the default tool and how it differs from requests.
  4. Static vs dynamic content: Why some pages return empty HTML.
  5. Selenium on Nuvolos: Why setup matters in the two-container environment.
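To make item 4 concrete, here is a minimal sketch of why a scraper can come back empty-handed from a dynamic page. It uses only Python's standard library, not Scrapy or Selenium, and both HTML snippets are made-up placeholders rather than real pages. The helper name `count_tag` is hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return the number of <tag> elements found in the raw HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.get(tag, 0)


# Placeholder "static" page: the table rows are already in the HTML
# that the server sends back.
STATIC_HTML = """
<html><body>
  <table id="results">
    <tr><th>Item</th><th>Value</th></tr>
    <tr><td>Example A</td><td>1</td></tr>
    <tr><td>Example B</td><td>2</td></tr>
  </table>
</body></html>
"""

# Placeholder "dynamic" page: the server sends an empty container and a
# script. The rows would only exist after a browser runs the JavaScript.
DYNAMIC_HTML = """
<html><body>
  <div id="app"></div>
  <script src="bundle.js"></script>
</body></html>
"""

print("rows in static HTML: ", count_tag(STATIC_HTML, "tr"))
print("rows in dynamic HTML:", count_tag(DYNAMIC_HTML, "tr"))
```

A tool that only downloads and parses the response sees the second page exactly as written, with no rows to select. That is the situation in which a browser-driving tool such as Selenium becomes relevant.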

🎬 Lecture Slides

Use keyboard arrows to navigate. Select the slides below or view fullscreen.

📓 Lecture Notebook

Download the lecture notebook:

Download lecture notebook

🧪 Brief Practical Demonstration

During the lecture, I will:

  • Inspect a Wikipedia page and build selectors step by step.
  • Use read_html to pull a table, then discuss when that fails.
  • Move the logic into a minimal Scrapy spider.
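The steps above can be sketched roughly as follows. This is not the lecture notebook's code: it is a standard-library stand-in for the idea of selecting one table by its class and turning its rows into records, which `read_html` (assumed here to be the pandas function) or a Scrapy selector would do with far less code. The sample HTML is invented, the names `TableExtractor` and `extract_table` are hypothetical, and the sketch does not separate the rows of nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text from rows inside <table> elements with a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.depth = 0      # > 0 while inside a matching table
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or self.table_class in classes:
                self.depth += 1
        elif self.depth and tag == "tr":
            self._row = []
        elif self.depth and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self.depth:
            self.depth -= 1


def extract_table(html, table_class):
    """Return the rows of all tables carrying `table_class` as lists of strings."""
    parser = TableExtractor(table_class)
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample: one navigation table to skip, one data table to keep.
SAMPLE_HTML = """
<table class="navbox"><tr><td>Navigation, not data</td></tr></table>
<table class="wikitable sortable">
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Example A</td><td>2001</td></tr>
  <tr><td>Example B</td><td>2002</td></tr>
</table>
"""

header, *records = extract_table(SAMPLE_HTML, "wikitable")
for record in records:
    print(dict(zip(header, record)))
```

Filtering on the class is the same idea as a CSS selector such as `table.wikitable`: it narrows the page down to the one element that holds the data before any rows are read.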

✅ Final Thoughts

Tomorrow’s 💻 W02 Lab focuses on Selenium setup and a dynamic scraping task. Problem Set 1 follows after the lab.

🎥 Session Recording

Typically, the recordings are made available on Moodle in the afternoon. I’ll update this section once the recording is available.