DS205 – Advanced Data Manipulation
19 Jan 2026
Current Focus:
Office Hours:
Wednesdays, 2:00 pm - 5:00 pm
Book via StudentHub
Recent Recognition:
LSESU Teaching Award for Feedback & Communication (2023)

Kevin Kittoe
Teaching & Assessment Administrator (DSI)
ADMINISTRATIVE SUPPORT
Contact Kevin (📧 DSI.ug@lse.ac.uk) for:
Key Information:
Why this course exists:
By the end of this course, you should be able to:
numpy, scipy, pandas, etc.) to automate and optimise data cleaning and data processing workflows.

| Weight | Type | Assessment | Timeline |
|---|---|---|---|
| 20% | Individual | Problem Set 1: Web Scraping & API Development | Release: Week 02. Due: ⏰ Week 06 (4 weeks!) |
| 40% | Individual | Problem Set 2: RAG System Implementation | Release: ~Week 07. Due: ⏰ Week 10 (~20 days) |
| 40% | Group Work | Final Project: RAG System Development, a real capstone project you will develop with/for the TPI Centre | Release: Week 11. Due: ⏰ 21 May 2026 (~2 months!) |
Build as you learn: In the 💻 Weekly Labs from W02–W04, your goal will be to adapt what you learned in the Monday lecture to a different data source.
Think of your code as a “public good” rather than a personal project.
Throughout this course, you'll write code knowing that someone else will need to understand it, use it, and build upon it. This mirrors real-world development of data products and data pipelines, where code is read far more often than it is written.
Aim for code that is:
All of your assignments (even individual ones) will have a peer component
You'll experience this cycle repeatedly:
This year, we're exploring two thematic arcs:
Weeks 01–05: Real Food vs Ultra-Processed Food
- Start with familiar, accessible data from an API (Open Food Facts)
- Collect unstructured data from the Web (web scraping principles)
- Write data pipelines that are reproducible and well-documented.
Weeks 07–11: Food Producers' Corporate Sustainability Assessments
- Progress to highly unstructured data (corporate statements, reports, etc.)
- Extract relevant information from unstructured data using NLP techniques
- Work with data used and produced by our collaborators at the TPI Centre.
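To give a flavour of the first arc, here is a minimal sketch of requesting one product record from Open Food Facts and keeping a couple of fields. Treat it as an illustration under assumptions, not the code from the lecture notebook: the endpoint pattern, the field names (product_name, nova_group), the User-Agent string and the sample payload are assumptions or hypothetical values, so check them against the official API documentation before relying on them.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the public Open Food Facts v2 product endpoint pattern.
API_URL = "https://world.openfoodfacts.org/api/v2/product/{barcode}.json"


def build_url(barcode: str) -> str:
    """Return the product URL for a barcode made of digits only."""
    if not barcode.isdigit():
        raise ValueError(f"barcode must contain digits only, got {barcode!r}")
    return API_URL.format(barcode=barcode)


def summarise(payload: dict) -> dict:
    """Keep only a few fields from a product response (field names assumed)."""
    product = payload.get("product", {})
    return {
        "found": payload.get("status") == 1,
        "name": product.get("product_name"),
        "nova_group": product.get("nova_group"),
    }


def fetch_product(barcode: str, timeout: float = 10.0) -> dict:
    """Download one product record and summarise it (needs network access)."""
    # Hypothetical User-Agent; identify your own project when calling a public API.
    request = Request(build_url(barcode), headers={"User-Agent": "ds205-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return summarise(json.load(response))


# Hypothetical, hand-written payload so the example runs offline.
sample = {"status": 1, "product": {"product_name": "Example Oat Bar", "nova_group": 4}}
summary = summarise(sample)
print(summary)
```

Keeping the URL building and the parsing in small separate functions means each step can be checked without touching the network, which fits the reproducible, well-documented pipelines this arc aims for.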
To take this course, you must have taken DS105, ST101, EC1B1 or equivalent (for General Course students)
I expect you to already know:
- int, float, str, bool, list, dict
- list, dict
- if, for, while
- def
- pandas

What would be great if you already knew:
(but we will cover them in the course)
- requests package to collect data from the Internet

Communication
Don't like your laptop for coding?
We have a dedicated cloud environment on Nuvolos.
Visit the Nuvolos - First Time Access guide to learn how to get access to the DS205 environment.
Read the syllabus for week-by-week information on how we will cover the course content and assessments.
After the break:
We will write a lot of code throughout this course, mostly in VS Code and Jupyter notebooks. We have a dedicated Nuvolos workspace for you to use. However, if you prefer to work locally on your own machine, you will need to make sure you have a few tools installed.
You have two options for your coding environment:
Option 1: Nuvolos Platform (Strongly recommended)
🎯 ACTION POINT: Access our Nuvolos - First Time guide.
This cloud environment comes pre-configured with all required tools. It also includes an AI coding assistant: GitHub Copilot (similar to OpenAI's Codex, Cursor, Claude Code, and Google's Antigravity).
Option 2: Local Setup (Prone to bugs we won't be able to help with)
You will need to install the following tools:
- GitHub CLI (command line interface) or GitHub Desktop (graphical user interface)
- Git (installed and configured)
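If you go for the local setup, a quick way to see which command-line tools are already available is sketched below. It assumes the usual executable names git and gh for Git and the GitHub CLI; GitHub Desktop is a graphical app and is not checked here. The tool labels and the function name are my own illustrative choices.

```python
import shutil
from typing import Dict, Optional

# Assumption: the usual executable names for Git and the GitHub CLI.
REQUIRED_TOOLS = {"Git": "git", "GitHub CLI": "gh"}


def find_tools(tools: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each tool label to its full path, or None if it is not on PATH."""
    return {label: shutil.which(executable) for label, executable in tools.items()}


report = find_tools(REQUIRED_TOOLS)
for label, location in report.items():
    print(f"{label}: {location or 'not found on PATH'}")
```

A tool reported as not found may still be installed but missing from your PATH, so treat the result as a first hint rather than a verdict.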
Initial dependencies for the course:
Establishing good habits early prepares you for:
Create a repository for course notes:
💡 TIP: Try to use Markdown to write your notes. Choose a file-naming convention that works for you. Here are two possible examples:
ds205-notes/
├── README.md
├── week01/
│   ├── lecture-notes.md
│   ├── lab-reflections.md
│   └── questions.md
├── week02/
└── [subsequent weeks]
ds205-notes/
├── README.md
├── w01-lecture-notes.md
├── w01-lab-reflections.md
├── w01-questions.md
├── w02-lecture-notes.md
├── w02-lab-reflections.md
├── w02-questions.md
└── [subsequent weeks]
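If you prefer the folder-per-week layout, you could scaffold it with a few lines of Python instead of creating each file by hand. This is only a sketch: the function name, the number of weeks and the use of a temporary directory for the demonstration are my own assumptions, so adjust them to your repository.

```python
import tempfile
from pathlib import Path

# File names taken from the folder-per-week example layout.
NOTE_FILES = ("lecture-notes.md", "lab-reflections.md", "questions.md")


def scaffold_notes(root: Path, weeks: range) -> list:
    """Create README.md plus one folder of empty note files per week."""
    root.mkdir(parents=True, exist_ok=True)
    created = [root / "README.md"]
    for week in weeks:
        week_dir = root / f"week{week:02d}"
        week_dir.mkdir(exist_ok=True)
        created.extend(week_dir / name for name in NOTE_FILES)
    for path in created:
        path.touch(exist_ok=True)  # creates missing files, leaves existing notes alone
    return created


# Demonstrate in a temporary directory so nothing touches your real notes.
with tempfile.TemporaryDirectory() as tmp:
    paths = scaffold_notes(Path(tmp) / "ds205-notes", range(1, 3))
    created_names = [p.relative_to(tmp).as_posix() for p in paths]

print(created_names)
```

To use it for real, call scaffold_notes with the path of your cloned repository and the range of weeks you want, then commit the new files.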
git clone https://github.com/YOUR_USERNAME/ds205-notes.git
We'll work through: W01-NB01-Lecture-Open-Food.ipynb (find it on Nuvolos). Today's demonstration shows the complete pipeline from data collection to interactive visualisation:
NOTE: I don't expect you to already know what UMAP is (we'll talk about it in Week 08), but I do expect you to understand most of the code I'll show.
Pay close attention when I'm demonstrating and take note of any code snippets or Python concepts that you don't understand.
I will skip a few details because they will be covered in the lab tomorrow.
LSE DS205 (2025/26)