DS205 2025-2026 Winter Term

ℹ️ Course Information

All you need to know about DS205 (2025/26)

Welcome to LSE DS205 - Advanced Data Manipulation, an LSE Data Science Institute course. This is where you’ll learn to master advanced data engineering techniques.

What is DS205 about?

DS205 is designed to advance your data manipulation and engineering skills to a professional-grade level. The course emphasises automation, efficiency, and scalability in data workflows, while maintaining a focus on ethical and practical implications in real-world applications.

The course uses two thematic arcs that build complexity progressively:

| Arc | Weeks | Theme | Data Sources |
|---|---|---|---|
| Real Food vs Ultra-Processed Food | 01-05 | Nutrition data mechanics | Open Food Facts API |
| Food Producers’ Corporate Sustainability Assessments | 07-11 | Corporate sustainability | TPI Centre Food Producers assessments |

You will engage in activities like:

  • Building APIs for effective data sharing.
  • Developing web scrapers to gather data when APIs are unavailable.
  • Cleaning and structuring complex datasets.
  • Applying modern NLP tools and data processing pipelines.
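
As a rough illustration of the "cleaning and structuring" activity, the sketch below flattens nested product records into tidy rows using only the Python standard library. The sample records and the field names (`product_name`, `nova_group`, `nutriments`, `sugars_100g`) are assumptions, modelled loosely on the kind of data a food product database might return; they are not taken from the course materials. The helpers `to_float` and `flatten_product` are hypothetical names chosen for this example.

```python
from typing import Any, Optional

# Hypothetical sample records, invented for illustration only.
RAW_PRODUCTS: list[dict[str, Any]] = [
    {"product_name": "  Oat Drink ", "nova_group": "4", "nutriments": {"sugars_100g": "4.1"}},
    {"product_name": "Plain Yoghurt", "nova_group": 1, "nutriments": {}},
    {"product_name": "", "nova_group": None, "nutriments": {"sugars_100g": 12}},
]


def to_float(value: Any) -> Optional[float]:
    """Convert a value to float, returning None when it is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flatten_product(raw: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Turn one nested record into a flat row, or None if it has no usable name."""
    name = (raw.get("product_name") or "").strip()
    if not name:
        return None  # drop records that cannot be identified
    nova = to_float(raw.get("nova_group"))
    return {
        "name": name,
        "nova_group": int(nova) if nova is not None else None,
        "sugars_100g": to_float((raw.get("nutriments") or {}).get("sugars_100g")),
    }


# Keep only the records that survive cleaning.
rows = [row for raw in RAW_PRODUCTS if (row := flatten_product(raw)) is not None]
for row in rows:
    print(row)
```

Returning `None` for missing values, rather than a default such as `0`, keeps "unknown" distinct from "zero" in later analysis. In practice a library such as pandas would likely replace the hand-written loop.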

🥅 Intended Learning Outcomes

By the end of this course, you should be able to:

  • Use appropriate Python packages (numpy, scipy, pandas, etc.) to automate and optimise data cleaning and data processing workflows.
  • Build APIs using FastAPI for structured data retrieval.
  • Develop scalable web scraping workflows with the Scrapy and Selenium packages.
  • Proficiently use GitHub workflows for professional collaboration and version control.
  • Apply NLP techniques for retrieval of information from text data.
  • Apply appropriate pre-trained deep learning models from the HuggingFace library to unstructured data.
  • Collaborate effectively on shared codebases and contribute to projects with a real-world impact.

📓 Syllabus

Check the syllabus every week for the most up-to-date information on the course structure, schedule, and assessment details.

📟 Communication

At DS205, we believe learning happens everywhere. Frequent communication between students, peers, and teaching staff creates the best learning experience. We’ve set up multiple channels to support you throughout this course.

Slack

Slack is our primary communication hub and where most informal communication happens. We use it to share resources, announcements, and polls, and to answer questions. Use it to ask questions, share discoveries, and engage with your peers and instructors. For formal requests (like extensions), please use email.

Things we love to see on Slack:

  • Questions posted on the #help channel
  • Students helping each other by sharing solutions and explanations
  • Discussions about real-world data science applications

🗒️ IMPORTANT: You can contact instructors directly via Slack DMs, but we prioritise public channels like #help and #social. We typically don’t respond in the evenings or at weekends. Each week, teaching staff dedicate specific hours to Slack questions.

🗨️ Office Hours

Book one-on-one sessions with teaching staff to discuss course questions, practice exercises, or data science careers. Add a note when booking so we can prepare and make the most of our time together.

📅 Book your slot

Search for the name of one of the instructors. Add a note about what you’d like to discuss.

📧 E-mails

Use email for administrative queries like class changes or extension requests. Contact our Teaching and Assessment Administrator for these matters.

👥 People and 📍 Places

Our Team

COURSE LEADER

Name: Dr Jon Cardoso-Silva
Links: LSE, GitHub, LinkedIn, 📧
Role at LSE DSI: Assistant Professor (Education)
At LSE since 2021
Office Hours: 🗓️ Book via StudentHub
Background:
  • PhD in Computer Science (King’s College London)
  • Former roles: Tech Lead, Data Scientist, Software Engineer
Research interests: How Generative AI influences learning (see footnote 1).

CLASS TEACHER

Name: Dr Barry Ledeatte
Links: 📧
Role at LSE DSI: AI Learning Consultant
Office Hours: 🗓️ Book via StudentHub

ADMIN

Name: Kevin Kittoe
Links: 📧
Role at LSE: Teaching & Assessment Administrator
Responsibilities: Course access, submissions, extensions, and admin queries

✍️ Assessment & Feedback

How will I be assessed in this course?

Your grade consists of:

  1. Problem Set 1 (20%): Web Scraping & API Development
  2. Problem Set 2 (40%): Semantic Search System
  3. Final Project (40%): RAG System Development (Group Work)
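
To see how these weights combine, here is a minimal sketch that assumes the overall mark is a plain weighted average of the three component marks. The weights come from the list above; the example marks are hypothetical, and any rounding, capping, or penalty rules are not covered here.

```python
# Weights taken from the breakdown above; the marks below are hypothetical.
WEIGHTS = {"Problem Set 1": 0.20, "Problem Set 2": 0.40, "Final Project": 0.40}


def weighted_mark(marks: dict[str, float]) -> float:
    """Combine component marks (0-100) into one overall mark using WEIGHTS."""
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must cover exactly the three assessed components")
    return sum(WEIGHTS[name] * mark for name, mark in marks.items())


example_marks = {"Problem Set 1": 70.0, "Problem Set 2": 65.0, "Final Project": 60.0}
overall = weighted_mark(example_marks)
print(round(overall, 1))
```

Because the two later assessments carry 40% each, a mark on either of them moves the overall result twice as much as the same mark on Problem Set 1.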

Assessment Timeline

| Weight | Type | Assessment | Release of Instructions | Due Date |
|---|---|---|---|---|
| 20% | Individual | Problem Set 1: Web Scraping & API Development | Week 02 | Week 06 |
| 40% | Individual | Problem Set 2: Semantic Search System | ~Week 07 | Week 10 |
| 40% | Group Work | Final Project: RAG System Development | Week 11 | 21 May 2026 |

🤖 Generative AI Policy

AI is fully authorised in this course. A full policy will be added here soon.

Footnotes

  1. Read about the GENIAL project