Syllabus
DS205 (2024/25 Winter Term)
Some details of the W07-W11 content are still subject to change, depending on how much progress we make in the course before half-term.
Check this page every week for more information on studying for the course.
Last updated: 09 February 2025
Week 01
20 Jan 2025 - 24 Jan 2025
Lecture
Introduction, Logistics, Course Overview, and pandas Refresher.
Slides
Note: This course works with real data challenges through our partnership with the Transition Pathway Initiative (TPI). We will use TPI's actual datasets or data that supports their automation needs. Their team will work closely with us during the Term, giving us feedback on your work from our first lecture onwards. This direct industry connection means your coursework solves genuine business problems.
Lab
Exploratory Data Analysis with Pandas: ASCOR Benchmarks
Roadmap Tutorial
In our first lab, you will work with the latest ASCOR framework data, a set of indicators published by the TPI Centre to assess how countries are managing the low-carbon transition and the impacts of climate change.
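A first pass at this kind of exploration might look like the sketch below. It runs on a tiny made-up table: the column names (`country`, `pillar`, `score`) and all values are hypothetical placeholders, not the real ASCOR schema, so swap in the columns from the file you load in the lab.

```python
import pandas as pd

# Hypothetical toy data: these columns and values are placeholders,
# NOT the real ASCOR schema or real country assessments.
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B", "C", "C"],
        "pillar": ["EP", "CP", "EP", "CP", "EP", "CP"],
        "score": [1.0, 0.5, 0.0, None, 0.5, 1.0],
    }
)


def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Return row counts, missing values and mean score per pillar."""
    grouped = df.groupby("pillar")["score"]
    return pd.DataFrame(
        {
            "n_rows": grouped.size(),
            "n_missing": grouped.apply(lambda s: int(s.isna().sum())),
            "mean_score": grouped.mean(),  # NaN values are skipped by default
        }
    )


summary = summarise(toy)
print(toy.dtypes)  # check column types first
print(summary)     # then look at coverage and averages per pillar
```

Checking types and missing values before plotting anything tends to save time later, because gaps in the data otherwise show up as confusing charts.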
Practice
Create an innovative visualisation of the ASCOR EP Pillar data in the W01 Formative Exercise!
This exercise challenges you to create clear and engaging visualisations of the Emission Pathways (EP) Pillar indicators from the ASCOR dataset. You'll practise pandas data manipulation while competing for a DSI tote bag! (The prize is there to encourage participation early on in the course.)
Seek help from us if you don't feel confident about your pandas skills.
Support
We love hearing from you! Don't hesitate to contact us for help.
In this first week, the best ways to get help are:
Slack: Post questions in the #help channel. Jon will check messages daily.
In-person support: Wednesday, 22 January 2025, 11 am to 1 pm, with Sara Luxmoore.
Office Hours: You can book a 15-minute slot on StudentHub with any of your instructors this week. This is when we host office hours:
- Jon: Thursdays, 11:00 am - 1:00 pm
- Alex: Wednesdays, 3:00 pm - 5:00 pm
- Barry: Fridays, 2:00 pm - 4:00 pm
Email: For administrative queries, contact (answered by Kevin), our Teaching & Assessment Support Officer.
Week 02
27 Jan 2025 - 31 Jan 2025
Lecture
REST APIs and Introduction to FastAPI
Slides
Note: You will learn about the fundamentals of web services and REST APIs. We'll cover key concepts like client-server architecture, HTTP methods, and how APIs enable data sharing between applications. This foundational knowledge will prepare you for hands-on API development with FastAPI in the coming weeks.
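To make the client-server idea concrete, here is a minimal sketch that uses only the Python standard library rather than FastAPI. A tiny server answers `GET /items/<id>` with JSON, and a client calls it over HTTP. The `/items/` route, the `ITEMS` data and the function names are all hypothetical, chosen just for illustration.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "database"; a real service would load real data.
ITEMS = {"1": {"id": 1, "name": "example"}}


class ItemHandler(BaseHTTPRequestHandler):
    """Server side: answers GET /items/<id> with a JSON document."""

    def do_GET(self):
        item = None
        if self.path.startswith("/items/"):
            item = ITEMS.get(self.path.rsplit("/", 1)[-1])
        status = 200 if item is not None else 404
        payload = item if item is not None else {"detail": "Not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Ignore any system proxy settings so requests stay on localhost.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url: str):
    """Client side: send a GET request, return (status code, parsed JSON)."""
    try:
        with OPENER.open(url, timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def demo():
    """Start the server on a free port, call it twice, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), ItemHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch(f"{base}/items/1"), fetch(f"{base}/items/99")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client only needs the URL, the HTTP method and the response format. It knows nothing about how the server stores its data, which is what lets separate applications share data through an API.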
Lab
Building APIs with FastAPI (Part I)
Roadmap Tutorial
This lab focuses on getting your development environment ready for API work. We'll ensure everyone has the necessary tools installed and configured properly. You'll learn about virtual environments and package management, and get comfortable with the development workflow we'll use throughout the course.
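As a rough illustration of what a virtual environment is, the sketch below creates one programmatically with the standard library's `venv` module. The helper name `create_env` and the throwaway folder name are assumptions made for the demo. In the lab you will more likely create the environment from the command line.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file."""
    # with_pip=False keeps this sketch fast and offline; for real work
    # you would normally want pip inside the environment (with_pip=True).
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


# Throwaway location for the demo only. For a real project, a folder such
# as ".venv" inside the repository is a common (but not mandatory) choice.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-env")
    env_created = cfg.exists()
    print("environment created:", env_created)
```

The command-line equivalent is `python -m venv .venv`, after which you activate the environment and install packages into it with `pip`.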
Practice + Mini-competition
Build an API endpoint for the ASCOR dataset.
Build an API endpoint for the ASCOR dataset with Pydantic validation. The best solution wins a DSI water bottle!
Follow the instructions in the W02-W03 Formative Exercise.
Update (29 Jan 2025): The deadline was extended to 7 February 2025 to give you more time to practise with APIs.
Week 03
03 Feb 2025 - 07 Feb 2025
Lecture
Data Validation with Pydantic Models
Live Demo
Note: This lecture continues our API development journey with a focus on data validation using Pydantic Models in FastAPI. Through live coding demonstrations, you'll learn about writing testable Python functions, implementing data validation, and using FastAPI's interactive documentation features. The session includes practical examples and testing strategies for API development.
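The sketch below gives a flavour of validation with a Pydantic model wrapped in a small, testable function. The `CountryScore` model, its field names and its value ranges are hypothetical and not taken from the ASCOR data. The code sticks to features that should behave the same in Pydantic v1 and v2.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class CountryScore(BaseModel):
    """Hypothetical record shape; field names are illustrative only."""

    country: str = Field(min_length=1)
    year: int = Field(ge=1990, le=2100)
    score: Optional[float] = Field(default=None, ge=0, le=1)


def parse_record(raw: dict) -> Optional[CountryScore]:
    """Return a validated model, or None if the input is invalid.

    Keeping this as a plain function (no web framework involved)
    makes it easy to unit-test on its own.
    """
    try:
        return CountryScore(**raw)
    except ValidationError:
        return None


# The string "2024" is coerced to an int; the year 1800 is out of range.
good = parse_record({"country": "Testland", "year": "2024", "score": 0.5})
bad = parse_record({"country": "Testland", "year": 1800})
print(good)
print(bad)
```

FastAPI accepts models like this as request and response types, so the same class can both validate incoming data and appear in the interactive documentation.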
Lab
Building APIs with FastAPI (Part II)
Super Tech Support
This lab serves as a super tech support session where you can continue developing your API skills. You'll have the opportunity to work on the W02-W03 Formative Exercise with guidance from your class teacher, focusing on implementing FastAPI endpoints and getting hands-on help with any challenges you're facing.
Practice
Keep working on your API endpoint.
Take advantage of this extended time to refine your API implementation. Focus on incorporating proper data validation with Pydantic models and ensuring your API endpoints are well-tested.
For the instructions, see the W02-W03 Formative Exercise page.
Week 04
10 Feb 2025 - 14 Feb 2025
Lecture
Introduction to Web Scraping
Slides
How are Web documents structured and how can we extract information from them? Are there any ethical and technical challenges?
In this course, we will use only the Scrapy framework to scrape the web (BeautifulSoup will not be allowed).
Lab
Practising XPath and CSS Selectors with Scrapy
Roadmap Tutorial
In this lab, you will use Scrapy to extract data from a specific website selected for this exercise. You will design and test XPath/CSS Selectors to extract meaningful data and learn how to save it in a structured format. You will be encouraged to compare your results with your peers and discuss the challenges you faced.
Practice
Compile a dataset from web-scraped data.
This activity will help you beyond just practising web scraping. You will also have to make decisions (and justify them) about how to structure your dataset.
Read the instructions in the W04-W05 Formative Exercise page.
Week 05
17 Feb 2025 - 21 Feb 2025
Lecture
Topics in Web Scraping: XPath Selectors, Item Pipelines, and Dynamic URL Discovery
Lab
Scraping Dynamic Content with Selenium
While not required for your current assignments, this lab introduces you to Selenium, a powerful tool for handling JavaScript-rendered pages.
These skills will be valuable if you encounter dynamic websites in the future, for your final projects or beyond this course.
Practice
Keep working on your web scraping project.
Read the instructions in the W04-W05 Formative Exercise page.
Announcement
We will release the detailed instructions for the W07 Summative Exercise (available after W04).
Task: While the details will be announced in the W05 Lecture, you will have the choice between two types of tasks: 1) build a web scraper + an accompanying API, or 2) build a web crawler to map a domain and detect content changes.
Submission via GitHub. Deadline: 5 March 2025, 8 pm.
Week 06
24 Feb 2025 - 28 Feb 2025
Additional Sessions
There is no lecture or lab this week. Instead, we will hold additional drop-in sessions to help you with your work.
Week 07
03 Mar 2025 - 07 Mar 2025
Summative 1
Work on the W07 Summative Exercise (available after W04).
Build a web crawler to map a domain and detect content changes, emphasising scalable crawling techniques and ethical considerations. Apply your knowledge from Week 5 to implement this project.
Submission via GitHub. Deadline: 5 March 2025, 8 pm.
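To give a rough idea of the two building blocks mentioned above, keeping a crawl inside one domain and detecting content changes, here is a standard-library sketch. It is not the required design for the assessment. The function names, the use of SHA-256 fingerprints and the `example.org` URLs are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import urljoin, urlparse


def same_domain(base_url: str, href: str):
    """Resolve `href` against `base_url`; return it only if it stays on the same host."""
    absolute = urljoin(base_url, href)
    if urlparse(absolute).netloc == urlparse(base_url).netloc:
        return absolute.split("#", 1)[0]  # drop fragments: same page
    return None


def fingerprint(content: str) -> str:
    """Stable hash of page content, with whitespace normalised."""
    normalised = " ".join(content.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(previous: dict, current_pages: dict) -> dict:
    """Compare stored fingerprints with freshly crawled pages.

    `previous` maps URL -> fingerprint from the last crawl;
    `current_pages` maps URL -> page text from this crawl.
    """
    current = {url: fingerprint(text) for url, text in current_pages.items()}
    shared = current.keys() & previous.keys()
    return {
        "new": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(url for url in shared if current[url] != previous[url]),
    }


# Hypothetical crawl results on a placeholder domain.
old = {
    "https://example.org/a": fingerprint("hello"),
    "https://example.org/b": fingerprint("bye"),
}
new_pages = {"https://example.org/a": "hello   world", "https://example.org/c": "new"}
report = detect_changes(old, new_pages)
print(report)
```

Storing only fingerprints keeps the comparison cheap, at the cost of not knowing what changed. On the ethical side, the standard library's `urllib.robotparser` can be used to check a site's `robots.txt` before fetching a URL.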
Lecture
Parsing PDFs and Websites: NLP Preprocessing and Word Embeddings
Lab
Hands-On: Extracting and Structuring Data from PDFs
Week 08
10 Mar 2025 - 14 Mar 2025
Week 09
17 Mar 2025 - 21 Mar 2025
Week 10
24 Mar 2025 - 28 Mar 2025
Summative
Submit your W10 Summative Assessment via GitHub! (Deadline: 26 Mar 2025, 8 pm)
This final summative exercise focuses on building an enhanced retrieval-augmented generation (RAG) system, with advanced embedding and query strategies.
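For orientation, the retrieval half of a RAG system can be sketched in a few lines. The version below is deliberately a toy: it swaps learned embeddings for bag-of-words counts and only builds a prompt string instead of calling a language model. The corpus, function names and prompt layout are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; a real system would index document chunks.
DOCS = [
    "Scrapy spiders extract data from web pages",
    "Pydantic models validate API request data",
    "Embeddings map text to vectors for similarity search",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts (NOT a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list) -> str:
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k=2))
    return f"Context:\n{context}\n\nQuestion: {query}"


top = retrieve("how do I validate request data", DOCS)
print(top)
print(build_prompt("how do I validate request data", DOCS))
```

The embedding and query strategies are the parts that change most in a real system: replacing `embed` with a learned model and adjusting how queries are formed both affect which context reaches the generator.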
Lecture
Advanced RAG Architectures and Performance Optimisation
Lab
Optimising Your RAG System: Hands-On Debugging and Scaling