ℹ️ Course Information
All you need to know about DS205 (2025/26)
Welcome to LSE DS205 - Advanced Data Manipulation, an LSE Data Science Institute course. This is where you’ll learn to master advanced data engineering techniques.
What is DS205 about?
DS205 is designed to advance your data manipulation and engineering skills to a professional-grade level. The course emphasises automation, efficiency, and scalability in data workflows, while maintaining a focus on ethical and practical implications in real-world applications.
The course uses two thematic arcs that build complexity progressively:
| Arc | Weeks | Theme | Data Sources |
|---|---|---|---|
| Real Food vs Ultra-Processed Food | 01-05 | Nutrition data mechanics | Open Food Facts API |
| Food Producers’ Corporate Sustainability Assessments | 07-11 | Corporate sustainability | TPI Centre Food Producers assessments |
You will engage in activities like:
- Building APIs for effective data sharing.
- Developing web scrapers to gather data when APIs are unavailable.
- Cleaning and structuring complex datasets.
- Applying modern NLP tools and data processing pipelines.
🥅 Intended Learning Outcomes
By the end of this course, you should be able to:
- Use appropriate Python packages (`numpy`, `scipy`, `pandas`, etc.) to automate and optimise data cleaning and data processing workflows.
- Build APIs using FastAPI for structured data retrieval.
- Develop scalable web scraping workflows with the Scrapy and Selenium packages.
- Proficiently use GitHub workflows for professional collaboration and version control.
- Apply NLP techniques to retrieve information from text data.
- Apply appropriate pre-trained deep learning models from the HuggingFace library to unstructured data.
- Collaborate effectively on shared codebases and contribute to projects with a real-world impact.
📓 Syllabus
Check the syllabus every week for the most up-to-date information on the course structure, schedule, and assessment details.
📟 Communication
At DS205, we believe learning happens everywhere. Frequent communication between students, peers, and teaching staff creates the best learning experience. We’ve set up multiple channels to support you throughout this course.
Slack
Slack serves as our primary communication hub. We use it for sharing resources, announcements, polls, and answering questions. For formal requests (like extensions), please use email.
Most informal communication happens through Slack. Use it to ask questions, share discoveries, and engage with your peers and instructors.
Things we love to see on Slack:
- Questions posted on the #help channel
- Students helping each other by sharing solutions and explanations
- Discussions about real-world data science applications
🗒️ IMPORTANT: You can contact instructors directly via Slack DMs, but we prioritise public channels like #help and #social. We typically don’t respond in the evenings or at weekends. Each week, teaching staff dedicate specific hours to answering Slack questions.
🗨️ Office Hours
Book one-on-one sessions with teaching staff to discuss course questions, practice exercises, or data science careers. Add a note when booking so we can prepare and make the most of our time together.
To book, search StudentHub for the name of the instructor you’d like to meet.
📧 E-mails
Use email for administrative queries like class changes or extension requests. Contact our Teaching and Assessment Administrator for these matters.
📧 For administrative matters and formal requests.
👥 People and 📍 Places
Our Team

| Name: | Dr Jon Cardoso-Silva |
|---|---|
| Links: | |
| Role at LSE DSI: | Assistant Professor (Education); at LSE since 2021 |
| Office Hours: | 🗓️ Book via StudentHub |
| Background: | |
| Research interests: | How Generative AI influences learning¹ |
COURSE LEADER

| Name: | Dr Barry Ledeatte |
|---|---|
| Links: | 📧 |
| Role at LSE DSI: | AI Learning Consultant |
| Office Hours: | 🗓️ Book via StudentHub |
CLASS TEACHER

| Name: | Kevin Kittoe |
|---|---|
| Links: | 📧 |
| Role at LSE: | Teaching & Assessment Administrator |
| Responsibilities: | Course access, submissions, extensions, and admin queries |
ADMIN
✍️ Assessment & Feedback
How will I be assessed in this course?
Your grade consists of:
- Problem Set 1 (20%): Web Scraping & API Development
- Problem Set 2 (40%): Semantic Search System
- Final Project (40%): RAG System Development (Group Work)
Assessment Timeline
| Weight | Type | Assessment | Release of Instructions | Due Date |
|---|---|---|---|---|
| 20% | Individual | Problem Set 1: Web Scraping & API Development | Week 02 | Week 06 |
| 40% | Individual | Problem Set 2: Semantic Search System | ~Week 07 | Week 10 |
| 40% | Group Work | Final Project: RAG System Development | Week 11 | 21 May 2026 |
🤖 Generative AI Policy
AI is fully authorised in this course. A full policy will be added here soon.
Footnotes
1. Read about the GENIAL project.