Syllabus (Draft)
DS205 (2024/25 Winter Term)
Although Weeks 01-05 are confirmed, the details of the rest of the syllabus are still subject to change.
Check this page every week for more information on studying for the course.
Last updated: 20 January 2025
Week 01
20 Jan 2025 - 24 Jan 2025
Lecture
Introduction, Logistics, Course Overview, and pandas Refresher
Note: This course works with real data challenges through our partnership with the Transition Pathway Initiative (TPI). We will use TPI's actual datasets or data that supports their automation needs. Their team will work closely with us during the Term, giving us feedback on your work from our first lecture onwards. This direct industry connection means your coursework solves genuine business problems.
Lab
Exploratory Data Analysis with Pandas and Seaborn
In our first lab, you will work with the latest ASCOR framework data, a set of indicators published by the TPI Centre to assess how countries are managing the low-carbon transition and the impacts of climate change.
Practice
Dedicate 1 hour (or 2 if you are new to Git/GitHub) during the rest of the week to work on the W01 Formative Exercise (link does not exist yet)!
This practice exercise flips the usual data science workflow: you'll convert tabular data into JSON format. This prepares you for next week, when we will learn how to create APIs to serve data.
The exercise also gives newcomers hands-on practice with Git and GitHub.
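As a rough illustration of the tabular-to-JSON idea, here is a minimal sketch using only the Python standard library (in the course itself, pandas would be the more natural tool). The column names and values are made up for the example and are not taken from the actual exercise data.

```python
import csv
import io
import json

# Hypothetical sample table; the real exercise data and column names may differ.
CSV_TEXT = """country,year,indicator,value
GBR,2023,indicator_a,Yes
GBR,2023,indicator_b,No
BRA,2023,indicator_a,Partial
"""


def table_to_nested_json(csv_text: str) -> str:
    """Group flat rows by country and return the result as a JSON string."""
    nested: dict[str, list[dict[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = row.pop("country")  # the grouping key moves up one level
        nested.setdefault(country, []).append(row)
    return json.dumps(nested, indent=2)


print(table_to_nested_json(CSV_TEXT))
```

The point of the sketch is the change of shape: repeated values in a flat table become keys in a nested structure, which is closer to what an API would typically return.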
Support
We love hearing from you! Don't hesitate to contact us for help.
In this first week, the best ways to get help are:
Slack: Post questions in the #help channel. Jon will check messages daily.
In-person support: Drop-in sessions TBC.
Office Hours: You can book a 15-minute slot on StudentHub with any of your instructors this week. This is when we host office hours:
- Jon: Thursday, 22 January 2025, from 11 am to 1 pm.
Email: For administrative queries, contact (answered by Kevin), our Teaching & Assessment Support Officer.
Week 02
27 Jan 2025 - 31 Jan 2025
Lecture
REST APIs and Introduction to FastAPI
Note: You will learn about the purpose of APIs, how they facilitate data communication, and the key components of designing user-centric APIs. We will also introduce FastAPI, a modern Python web framework for creating our own APIs.
Lab
Hands-On: Build and Deploy a Simple API
In this lab, you will create your first API using FastAPI. You will implement a simple common endpoint to serve ASCOR data and learn to test and debug your API. The lab also covers essential GitHub workflows such as branching, pull requests, and code reviews.
Practice (+ Mini-competition)
Dedicate ~2 hours during the rest of the week to work on the W02 Formative Exercise (link does not exist yet)!
This exercise focuses on submitting a pull request (PR) to implement your API endpoint. You will compete for the best solution, with the winner earning a small reward (e.g., a DSI-branded water bottle).
Key Skills Practiced: Writing clean and reusable code, collaborating through GitHub workflows, and handling feedback on your PR.
Week 03
03 Feb 2025 - 07 Feb 2025
Lecture
Static Web Scraping with Scrapy
Note: This lecture introduces web scraping as a technique to access data when APIs are unavailable. We will cover the structure of web pages (HTML and DOM), the role of CSS/JavaScript, and demonstrate how to extract data using XPath and CSS Selectors. We will also discuss the ethics and legality of web scraping.
Lab
Hands-On: Practising XPath and CSS Selectors with Scrapy
In this lab, you will use Scrapy to extract data from a specific website selected for this exercise. You will design and test XPath/CSS Selectors to extract meaningful data and learn how to save it in a structured format. You will be encouraged to compare your results with your peers and discuss the challenges you faced.
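The selector idea can be tried without any setup. The sketch below uses the limited XPath support in Python's standard library as a stand-in; inside a Scrapy spider, the rough equivalent would be `response.xpath("//li[@class='report']")` or `response.css("li.report")`. The HTML fragment is invented for illustration and is deliberately well-formed, which real pages often are not.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; real pages are usually messier.
HTML = """
<html><body>
  <ul id="reports">
    <li class="report"><a href="/r/1">Report A</a><span class="year">2023</span></li>
    <li class="report"><a href="/r/2">Report B</a><span class="year">2024</span></li>
    <li class="advert"><a href="/ad">Sponsored</a></li>
  </ul>
</body></html>
"""


def extract_reports(html: str) -> list[dict[str, str]]:
    """Pull title, URL, and year out of every <li class="report"> element."""
    root = ET.fromstring(html.strip())
    rows = []
    # XPath: any <li> in the tree whose class attribute is exactly "report".
    for item in root.findall(".//li[@class='report']"):
        link = item.find("a")
        year = item.find("span[@class='year']")
        rows.append({"title": link.text, "url": link.get("href"), "year": year.text})
    return rows


# Save-ready structured output: a list of records serialised as JSON.
print(json.dumps(extract_reports(HTML), indent=2))
```

Note how the selector skips the `advert` item: choosing what not to match is often as important as choosing what to match.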
Practice
Dedicate ~2 hours during the rest of the week to work on the W03 Formative Exercise (link does not exist yet)! (Available after the W03 Lecture)
This exercise requires you to submit a Scrapy spider script using the template provided. The script should extract specific data points from the selected website, demonstrating effective use of XPath/CSS Selectors and saving the data in a structured format (e.g., JSON or CSV).
Key Skills Practiced: Crafting XPath/CSS Selectors, writing scalable scraping scripts, and handling data challenges like missing content.
Week 04
10 Feb 2025 - 14 Feb 2025
Lecture
Dynamic Web Scraping with Selenium
Note: This lecture introduces dynamic scraping techniques for handling JavaScript-rendered pages. We will discuss the limitations of static scraping tools, demonstrate Selenium for handling dynamic content, and explore headless versus standard browser modes.
Lab
Hands-On: Scraping Dynamic Content with Selenium
In this lab, you will practise scraping data from JavaScript-rendered pages using Selenium. The tasks include navigating dynamic content, handling authentication challenges, and saving extracted data in structured formats. You will also learn techniques to manage rate limits and avoid common scraping pitfalls.
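One common way to cope with rate limits is to retry with an increasing delay. The sketch below is an assumption about what such a technique could look like, not the method taught in the lab: `RateLimited`, `fetch_with_backoff`, and the injected `fetch` callable are hypothetical names. In a Selenium script, `fetch` might wrap a page load plus a check for a "too many requests" message.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical signal raised by a fetch function when the site says slow down."""


def fetch_with_backoff(
    fetch: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(), doubling the wait after each rate-limit signal."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)  # waits of 1x, 2x, 4x the base delay
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.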
Practice
Dedicate ~2 hours during the rest of the week to work on the W04 Formative Exercise (link does not exist yet)! (Available after the W04 Lab)
This exercise requires you to submit a Selenium script that extracts data from a dynamic website, handling JavaScript-rendered content and saving the data in a structured format like JSON or CSV.
Key Skills Practiced: Writing dynamic scraping scripts, handling authentication, managing rate limits, and working with structured data formats.
TIP: You really don't want to skip this formative exercise! It forms the basis for the W07 Summative Exercise (link does not exist yet).
Week 05
17 Feb 2025 - 21 Feb 2025
Lecture
Crawlers, Domain Mapping, and Change Detection
Note: This lecture introduces the concept of web crawlers as tools for mapping websites and detecting changes over time. We will explore their practical applications, such as monitoring updates to sustainability reports or company data, and discuss ethical and technical challenges.
Lab
Hands-On: Building a Simple Crawler with Scrapy
In this lab, you will use Scrapy to create a crawler for a selected website. You will learn to map the structure of a domain, categorise its sections, and detect changes over time. This includes handling challenges like pagination, duplicate URLs, and dynamic content.
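Two of the challenges mentioned above, duplicate URLs and change detection, can be sketched with the standard library alone. This is one possible approach under simple assumptions (exact-content hashing, a basic URL normalisation rule); the function names are illustrative and the lab may use Scrapy's own facilities instead.

```python
import hashlib
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Collapse trivially different URLs to one canonical form."""
    url, _fragment = urldefrag(url)        # drop "#section" anchors
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"   # treat "/a/" and "/a" as the same page
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def fingerprint(content: str) -> str:
    """Hash page content so that snapshots can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {url: fingerprint} snapshots taken by different crawls."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(u for u in new.keys() & old.keys() if new[u] != old[u]),
    }
```

Exact hashing flags any difference at all, including a changed timestamp in a footer, so a practical crawler would probably hash only the extracted main content rather than the raw page.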
Announcement
We will release the detailed instructions for the W07 Summative Exercise (link does not exist yet)! (Available after W04)
Task: While the details will be announced in the W05 Lecture, you will have the choice between two types of tasks: 1) build a web scraper + an accompanying API, or 2) build a web crawler to map a domain and detect content changes.
Submission via GitHub. Deadline: 5 March 2025, 8 pm.
Week 06
24 Feb 2025 - 28 Feb 2025
Additional Sessions
There is no lecture or lab this week. Instead, we will hold additional drop-in sessions to help you with your work.
Week 07
03 Mar 2025 - 07 Mar 2025
Summative 1
Work on the W07 Summative Exercise (link does not exist yet)! (Available after W04)
Build a web crawler to map a domain and detect content changes, emphasising scalable crawling techniques and ethical considerations. Apply your knowledge from Week 5 to implement this project.
Submission via GitHub. Deadline: 5 March 2025, 8 pm.
Lecture
Parsing PDFs and Websites: NLP Preprocessing and Word Embeddings
Lab
Hands-On: Extracting and Structuring Data from PDFs
Week 08
10 Mar 2025 - 14 Mar 2025
Week 09
17 Mar 2025 - 21 Mar 2025
Week 10
24 Mar 2025 - 28 Mar 2025
Summative
Submit your W10 Summative Assessment (link does not exist yet) via GitHub! (Deadline: 26 Mar 2025, 8 pm)
This final summative exercise focuses on building an enhanced retrieval-augmented generation (RAG) system, with advanced embedding and query strategies.
Lecture
Advanced RAG Architectures and Performance Optimisation
Lab
Optimising Your RAG System: Hands-On Debugging and Scaling