LSE ME204

Data Engineering for the Social World

Last updated: 31 July 2025

🎯 Course Overview

ME204 teaches you to collect, clean, and analyse real-world data using Python. Over three weeks, you’ll learn to work with messy data and transform it into insights.

The course focuses on practical data engineering skills: collecting data through APIs, cleaning datasets with pandas, creating visualisations, and using professional workflows including Git and GitHub. You’ll work on projects using real data and build a website to showcase your findings.

Note

About the saying that data work is “80% data wrangling”: no one has scientifically proven this exact percentage, but experienced data professionals consistently report that most of their time goes into preparing and cleaning data rather than running fancy algorithms. That is why this course focuses so heavily on data engineering skills.

What you can expect to learn

By the end of this course, you will be able to:

🌐 Collect data from real websites and APIs using Python

🧹 Clean and preprocess messy real-world data into analysis-ready formats

🗄️ Apply SQL fundamentals and best practices for data storage

📊 Create effective visualisations for exploratory data analysis

⚙️ Use professional workflows including Git, GitHub, and Generative AI tools

🚀 Build and present a complete data project with web-based reporting
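As a rough illustration of how the cleaning and SQL storage outcomes above fit together, here is a minimal sketch. It uses only the Python standard library so that it runs anywhere; the course itself works with pandas. The rainfall records, the `rainfall` table, and the `clean` and `store` helpers are hypothetical, invented for this example rather than taken from the course materials.

```python
import sqlite3

# Hypothetical raw records, invented for illustration (not course data).
raw_rows = [
    {"date": "2025-07-01", "city": " London ", "rain_mm": "3.2"},
    {"date": "2025-07-02", "city": "london", "rain_mm": ""},
    {"date": "2025-07-03", "city": "LONDON", "rain_mm": "0.0"},
    {"date": "2025-07-03", "city": "LONDON", "rain_mm": "0.0"},  # duplicate
]


def clean(rows):
    """Normalise city names, parse numbers, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        city = row["city"].strip().title()
        # Empty strings become None so that SQLite stores them as NULL.
        rain = float(row["rain_mm"]) if row["rain_mm"].strip() else None
        record = (row["date"], city, rain)
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def store(records):
    """Write cleaned records to an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rainfall (date TEXT, city TEXT, rain_mm REAL, "
        "PRIMARY KEY (date, city))"
    )
    # Parameterised inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO rainfall VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


cleaned_rows = clean(raw_rows)
conn = store(cleaned_rows)
rainy_days = conn.execute(
    "SELECT COUNT(*) FROM rainfall WHERE rain_mm > 0"
).fetchone()[0]
print(f"{len(cleaned_rows)} clean rows, {rainy_days} rainy day(s)")
```

The missing reading is kept as NULL instead of being turned into zero, so the query for rainy days does not mistake "no data" for "no rain". Whether to keep, drop, or fill such gaps is the kind of decision you would be expected to explain in a project.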

✍️ Assessment Structure

Your grade consists of two main components:

Component       | Weight | Description                                                   | Due Date
Midterm Project | 25%    | Individual project: “Is London really all that rainy?”        | 22 July 2025
Final Project   | 75%    | Individual project: Bring your own data and research question | 01 August 2025

We mark for evidence of learning and understanding rather than just output. You’ll need to explain the rationale behind your key decisions and show engagement with course concepts.

🤖 Generative AI Policy

This course adopts Position 3: Full authorised use of generative AI. You can use any AI tools you want, including in assessments. We view AI as a tool to enhance learning, not replace it.

When marking, we look for evidence that you understand the concepts and can explain your choices. Even if you produce advanced work, if it doesn’t engage with course material or show your reasoning, you won’t get a good grade.

👥 Meet Your Teaching Team

Dr Jon Cardoso-Silva (Course Leader | Lecturer)
Links: LSE, GitHub, LinkedIn, 📧
Role at LSE: Assistant Professor (Education), LSE Data Science Institute
At LSE since 2021
Background:
  • PhD in Computer Science (King’s College London)
  • Former roles: Tech Lead, Data Scientist, Software Engineer
Likes to think about: How Generative AI is influencing the way we learn [1]

Dr Stuart Bramwell (Class Teacher)
DPhil in Politics (Oxford University)
Teaches: afternoon classes
Links: 📧

📚 What we expect from you

This course is designed for students with some basic coding experience. You should know about variables, data types, loops, functions, and lists. If you’re new to Python, we recommend reading the free online book Automate the Boring Stuff before the course starts.
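As a rough self-check (not an official test), the short sketch below uses each of the concepts listed above: variables, data types, a loop, a function, and a list. The readings are invented values and `count_rainy_days` is a hypothetical helper written for this example. If you can read it and predict what it prints, you probably have the background this course assumes.

```python
# Hypothetical daily rainfall readings in millimetres (invented values).
readings = [0.0, 3.2, 1.1, 0.0, 7.5]  # a list of floats


def count_rainy_days(values, threshold=0.0):
    """Return how many values are strictly above the threshold."""
    count = 0  # a variable holding an int
    for value in values:  # a loop over the list
        if value > threshold:
            count += 1
    return count


rainy = count_rainy_days(readings)
print(f"{rainy} of {len(readings)} days had rain")
```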

Coding beginners have done really well in previous ME204 iterations. You’ll need to put in a bit more effort than others, but you’ll be totally fine.

We expect you to be self-directed. If you feel behind, ask questions. If you’re ahead, help those around you. Let us know when we’ve been unclear about any topic.

📟 Getting help

Throughout the course, you can contact us via:

  • Slack: Our primary communication channel for questions and discussions
  • Office hours: Available during the course (details on Moodle)
  • Email: For formal requests and specific questions

For detailed course structure, schedules, and guides, check the 📓 Syllabus.

❓ Questions?

📧 Email us using the contact details above
🏢 Office Hours available during the course

Footnotes

  1. Read about the GENIAL project