LSE DS205 - Advanced Data Manipulation

📓 Syllabus

DS205 (2024/25 Winter Term)

📣 Some details of the W07-W11 content are still subject to change, depending on how much progress we make by half-term.

Check this page every week for more information on studying for the course.

Last updated: 09 February 2025

🗓️ Week 01
20 Jan 2025 - 24 Jan 2025

🗣️ Lecture

Introduction, Logistics, Course Overview, and pandas Refresher.
🎞️ Slides

Note: This course works with real data challenges through our partnership with the Transition Pathway Initiative (TPI). We will use TPI’s actual datasets or data that supports their automation needs. Their team will work closely with us during the Term, giving us feedback on your work from our first lecture onwards. This direct industry connection means your coursework solves genuine business problems.

💻 Lab

Exploratory Data Analysis with Pandas: ASCOR Benchmarks
🛣️ Roadmap Tutorial

In our first lab, you will work with the latest ASCOR framework data, a set of indicators published by the TPI Centre to assess how countries are managing the low-carbon transition and the impacts of climate change.

📝 Practice

Create an innovative visualisation of the ASCOR EP Pillar data in the 📝 W01 Formative Exercise!

This exercise challenges you to create clear and engaging visualisations of the Emission Pathways (EP) Pillar indicators from the ASCOR dataset. You’ll practise pandas data manipulation while competing for a DSI tote bag! 👜 (The prize is there to encourage early participation in the course.)

Seek help from us if you don’t feel confident about your pandas skills.

🛟 Support

We love hearing from you! Don’t hesitate to contact us for help.

In this first week, the best ways to get help are:

  • Slack: Post questions in the #help channel. Jon will check messages daily.

  • 🆘 In-person support: Wednesday, 22 January 2025, 11 am to 1 pm, with Sara Luxmoore.

  • 💬 Office Hours: You can book a 15-minute slot on StudentHub with any of your instructors this week. Office hours run at these times:

    • Jon: Thursdays, 11:00 am - 1:00 pm
    • Alex: Wednesdays, 3:00 pm - 5:00 pm
    • Barry: Fridays, 2:00 pm - 4:00 pm
  • 📧 Email: For administrative queries, contact our Teaching & Assessment Support Officer (emails are answered by Kevin).

🗓️ Week 02
27 Jan 2025 - 31 Jan 2025

🗣️ Lecture

REST APIs and Introduction to FastAPI
🎞️ Slides

Note: You will learn about the fundamentals of web services and REST APIs. We’ll cover key concepts like client-server architecture, HTTP methods, and how APIs enable data sharing between applications. This foundational knowledge will prepare you for hands-on API development with FastAPI in the coming weeks.

💻 Lab

Building APIs with FastAPI (Part I)
🛣️ Roadmap Tutorial

This lab focuses on getting your development environment ready for API work. We’ll ensure everyone has the necessary tools installed and configured properly. You’ll learn about virtual environments, package management, and get comfortable with the development workflow we’ll use throughout the course.

📝 Practice + 🏆 Mini-competition

Build an API endpoint for the ASCOR dataset.

Build an API endpoint for the ASCOR dataset with Pydantic validation. The best solution wins a DSI water bottle! 🎉

Follow the instructions in the 📝 W02-W03 Formative Exercise.

Update (29 Jan 2025): The deadline was extended to 7 February 2025 to give you more time to practise with APIs.

🗓️ Week 03
03 Feb 2025 - 07 Feb 2025

🗣️ Lecture

Data Validation with Pydantic Models
🖥️ Live Demo

Note: This lecture continues our API development journey with a focus on data validation using Pydantic Models in FastAPI. Through live coding demonstrations, you’ll learn about writing testable Python functions, implementing data validation, and using FastAPI’s interactive documentation features. The session includes practical examples and testing strategies for API development.
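To give a flavour of the validation ideas this lecture covers, here is a minimal sketch of a Pydantic model checking one record. It is not the lecture's own code: the `AssessmentRecord` model, its field names, the year bounds, the `validate_record` helper, and the sample values are all hypothetical, chosen only to resemble an ASCOR-style row.

```python
from pydantic import BaseModel, Field, ValidationError


class AssessmentRecord(BaseModel):
    """One row of a hypothetical ASCOR-style table (field names are assumed)."""

    country: str = Field(min_length=1)
    assessment_year: int = Field(ge=2000, le=2100)  # assumed plausible range
    indicator: str = Field(min_length=1)


def validate_record(raw: dict):
    """Return (record, []) when valid, or (None, [names of failing fields])."""
    try:
        return AssessmentRecord(**raw), []
    except ValidationError as exc:
        # Each error dict carries a "loc" tuple naming the offending field.
        return None, [str(err["loc"][0]) for err in exc.errors()]


# Illustrative inputs only; these values are made up.
good, good_errors = validate_record(
    {"country": "Brazil", "assessment_year": 2023, "indicator": "EP.1"}
)
bad, bad_errors = validate_record(
    {"country": "Brazil", "assessment_year": 1800, "indicator": "EP.1"}
)
```

Keeping the validation in a plain function like this makes it easy to test on its own. The same model class could then be used as a request or response type in a FastAPI endpoint.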

💻 Lab

Building APIs with FastAPI (Part II)
🦸🏻 Super Tech Support

This lab serves as a super tech support session where you can continue developing your API skills. You’ll have the opportunity to work on the W02-W03 Formative Exercise with guidance from your class teacher, focusing on implementing FastAPI endpoints and getting hands-on help with any challenges you’re facing.

📝 Practice

Keep working on your API endpoint.

Take advantage of this extended time to refine your API implementation. Focus on incorporating proper data validation with Pydantic models and ensuring your API endpoints are well-tested.

For the instructions, see the 📝 W02-W03 Formative Exercise page.

🗓️ Week 04
10 Feb 2025 - 14 Feb 2025

🗣️ Lecture

Introduction to Web Scraping
🎞️ Slides

How are Web documents structured and how can we extract information from them? Are there any ethical and technical challenges?

In this course, we will use only the Scrapy framework to scrape the web (BeautifulSoup will not be allowed 🙃).

💻 Lab

Practising XPath and CSS Selectors with Scrapy
🛣️ Roadmap Tutorial

In this lab, you will use Scrapy to extract data from a specific website selected for this exercise. You will design and test XPath/CSS Selectors to extract meaningful data and learn how to save it in a structured format. You will be encouraged to compare your results with your peers and discuss the challenges you faced.
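As a rough illustration of the selector design this lab describes, the sketch below uses Python's standard-library `xml.etree.ElementTree` as a stand-in for Scrapy. It supports only a subset of XPath and needs well-formed markup, whereas in a Scrapy spider the equivalent queries would go through `response.xpath()` or `response.css()`. The HTML snippet, the `report` class name, and the `extract_reports` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a scraped page.
PAGE = """
<html><body>
  <div class="report"><h2>Report A</h2><a href="/a.pdf">Download</a></div>
  <div class="report"><h2>Report B</h2><a href="/b.pdf">Download</a></div>
  <div class="advert"><h2>Ignore me</h2></div>
</body></html>
"""


def extract_reports(markup: str) -> list:
    """Return one dict per <div class="report">, ready to save as rows."""
    root = ET.fromstring(markup.strip())
    rows = []
    # XPath: every <div>, at any depth, whose class attribute equals "report".
    for div in root.findall(".//div[@class='report']"):
        rows.append(
            {
                "title": div.findtext("h2"),      # text of the child <h2>
                "url": div.find("a").get("href"),  # href of the child <a>
            }
        )
    return rows


reports = extract_reports(PAGE)
```

Returning a list of dictionaries with fixed keys is one simple way to keep the extracted data structured. Each dictionary maps naturally onto a row in a CSV or JSON Lines file.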

📝 Practice

Compile a dataset from web-scraped data.

This activity goes beyond practising web scraping: you will also have to make, and justify, decisions about how to structure your dataset.

Read the instructions in the 📝 W04-W05 Formative Exercise page.

🗓️ Week 05
17 Feb 2025 - 21 Feb 2025

🗣️ Lecture

Topics in Web Scraping: XPath Selectors, Item Pipelines, and Dynamic URL Discovery

💻 Lab

Scraping Dynamic Content with Selenium

While not required for your current assignments, this lab introduces you to Selenium, a powerful tool for handling JavaScript-rendered pages.

These skills will be valuable if you encounter dynamic websites in the future, for your final projects or beyond this course.

📝 Practice

Keep working on your web scraping project.

Read the instructions in the 📝 W04-W05 Formative Exercise page.

📣 Announcement

We will release the detailed instructions for the ✍️ W07 Summative Exercise after W04.

Task: While the details will be announced in the W05 Lecture, you will have the choice between two types of tasks: 1) build a web scraper + an accompanying API, or 2) build a web crawler to map a domain and detect content changes.

Submission via GitHub. Deadline: 5 March 2025, 8 pm.

🗓️ Week 06
24 Feb 2025 - 28 Feb 2025

🆘 Additional Sessions

There is no lecture or lab this week. Instead, we will hold additional drop-in sessions to help you with your work.

🗓️ Week 07
03 Mar 2025 - 07 Mar 2025

✍️ Summative 1

Work on the ✍️ W07 Summative Exercise (available after W04).

Build a web crawler to map a domain and detect content changes, emphasising scalable crawling techniques and ethical considerations. Apply your knowledge from Week 5 to implement this project.

Submission via GitHub. Deadline: 5 March 2025, 8 pm.
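One possible way to approach the "detect content changes" part of this task is to store a hash of each page's text on every crawl and compare the hashes between runs. The sketch below assumes that approach. It is not the official specification of the exercise, and the `fingerprint` and `detect_changes` helpers and the example.org URLs are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace, so reflowed text matches."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(previous: dict, current_pages: dict) -> dict:
    """Compare stored fingerprints (url -> hash) with fresh page text (url -> text)."""
    current = {url: fingerprint(body) for url, body in current_pages.items()}
    return {
        "new": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(
            url for url in current
            if url in previous and current[url] != previous[url]
        ),
    }


# Made-up crawl results, for illustration only.
first_crawl = {
    "https://example.org/": "Home  page",
    "https://example.org/about": "About us",
}
second_crawl = {
    "https://example.org/": "Home page",  # whitespace differs, content does not
    "https://example.org/about": "About us, updated",
    "https://example.org/news": "News",
}

snapshot = {url: fingerprint(body) for url, body in first_crawl.items()}
report = detect_changes(snapshot, second_crawl)
```

Storing only fingerprints keeps the snapshot small. The trade-off is that the report can say a page changed but not what changed, so a fuller solution might also keep the previous text.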

🗣️ Lecture

First Steps with Unstructured Data: PDF Extraction and Word Embeddings

💻 Lab

Hands-On: Extracting and Structuring Data from PDFs

🗓️ Week 08
10 Mar 2025 - 14 Mar 2025

🗣️ Lecture

Introduction to Transformers: BERT and Beyond

💻 Lab

Fine-Tune a Climate-Specific NLP Model Using HuggingFace

🗓️ Week 09
17 Mar 2025 - 21 Mar 2025

🗣️ Lecture

Introduction to RAG Systems: Embeddings and Vector Databases

💻 Lab

Build a Simple RAG System Using a Vector Database

🗓️ Week 10
24 Mar 2025 - 28 Mar 2025

✍️ Summative

Continue working on your ✍️ Problem Set 2

Deadline Extended: As announced in our Week 09 email, the deadline for Problem Set 2 has been extended to Friday, 4 April 2025, 8 pm UK time.

🗣️ Lecture

Problem Set 2 Support Session
🦸🏻 Super Tech Support

This session replaces the originally planned lecture on Advanced RAG Architectures. Instead, we’ll provide flexible, hands-on support for your Problem Set 2 implementation.

💻 Lab

Problem Set 2 Support Session
🦸🏻 Super Tech Support

This lab continues our focus on providing hands-on support for your Problem Set 2 implementation. Bring your questions and we’ll help you overcome any technical challenges.

🗓️ Week 11
31 Mar 2025 - 04 Apr 2025

🗣️ Lecture

System Architecture Review and Final Q&A

💻 Lab

Final Project Development and Submission Guidance


Copyright 2025, LSE