🖥️ Week 03, Day 02 - Lecture

A Primer on Web Scraping + Project Support Workshop

Author

Dr Jon Cardoso-Silva

Last updated

31 July 2025

🥅 Learning Objectives
By the end of this session, you should be able to:

  • Understand when web scraping is appropriate and when an API is the better choice
  • Parse HTML content using Scrapy
  • Apply ethical scraping practices, including checking robots.txt
  • Organise your final project with a professional structure and templates
  • Make significant progress on your project with individual support

You asked for more project support time, so that’s what we’re doing today. Most of this session helps you make progress on your 📦 Final Project.

We’ll start with a quick look at web scraping (useful to know, but APIs come first). Then we’ll spend the rest of the time helping you organise and troubleshoot your projects.

Tuesday, 29 July 2025 | 10:00 – 13:00 📍 Location: CKK.2.06 (see LSE’s 🗺️ campus map)

Hour 1: Web Scraping Curiosity Demo

10:00 – 11:00

Quick Web Scraping Demo

Get the notebook for the web scraping demonstration:

Save it to your me204-study-notes repository or find it in Nuvolos under lab-notebooks/.

Sometimes you need data but there’s no API available. Web scraping can help in those situations. We’ll show you the basics so you know it’s an option.

Key Concepts

  • When to scrape: APIs first, scraping as backup
  • Ethical scraping: robots.txt, respectful requests
  • HTML parsing: Understanding web page structure
  • Scrapy basics: Python tools for scraping
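
As a rough illustration of the robots.txt check mentioned above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `me204-student-bot` user agent are all made up for this example; in practice you would load the real file from the site you want to scrape.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real scraper would load this from https://<site>/robots.txt instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical user agent string identifying our scraper.
USER_AGENT = "me204-student-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string rather than from the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether our user agent may fetch each (made-up) URL.
allowed_public = parser.can_fetch(USER_AGENT, "https://example.com/articles/")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/data")

# Seconds the site asks us to wait between requests (None if not specified).
delay = parser.crawl_delay(USER_AGENT)

print(allowed_public, allowed_private, delay)
```

A respectful scraper would skip any URL where `can_fetch` returns `False` and pause for at least `delay` seconds between requests.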

This Won’t Be Assessed

We’re showing you web scraping because it’s useful to know about. Your project work is what matters most.

Hour 2: 🦸 Super Tech Support Session

11:00 – 12:00

Individual Project Help

This hour is for your specific project problems. Bring your questions about:

  • NB01 issues: API authentication, data collection problems
  • Data strategy: Choosing appropriate sources and methods
  • Technical problems: Python errors, Git issues, data processing
  • Project planning: Scope, timeline, and feasibility

How This Works

I’ll walk around the room helping individuals. Bring your laptop with your project open. Be ready to show me what you’re working on and where you’re stuck.

Hour 3: Project Organisation Templates

12:00 – 13:00

Project Templates and Organisation

Your midterm work showed some common organisational issues. We’ll give you templates to fix these:

  • Clean notebook structure: Report mode vs tutorial mode
  • Better documentation: README best practices
  • File organisation: Proper folder structure
  • Git habits: What to commit and what to ignore

Template Downloads

Get the professional templates for your final project:

Use These Templates

These templates fix the structural problems we saw in your midterm work. Use them as starting points for your final project notebooks.

After the Session

This afternoon is more 🦸 Super Tech Support time. Come with specific questions and your project materials ready.

🦸 Afternoon Support

Individual project help and troubleshooting.

➡️ Bring Your Projects

Questions?

➡️ Ask on Slack

🔗 Extra Resources: Web Scraping

Useful references if you want to explore further.