LSE ME204
Data Engineering for the Social World
🎯 Course Overview
ME204 teaches you to collect, clean, and analyse real-world data using Python. Over three weeks, you’ll learn to work with messy data and transform it into insights.
The course focuses on practical data engineering skills: collecting data through APIs, cleaning datasets with pandas, creating visualisations, and using professional workflows including Git and GitHub. You’ll work on projects using real data and build a website to showcase your findings.
About the “80% data wrangling” saying: While no one has scientifically proven this exact percentage, experienced data professionals consistently report that most of their time goes into preparing and cleaning data rather than running fancy algorithms. This rings true in practice and explains why we focus so heavily on data engineering skills in this course.
What you can expect to learn
By the end of this course, you will be able to:
🌐 Collect data from real websites and APIs using Python
🧹 Clean and preprocess messy real-world data into analysis-ready formats
🗄️ Apply SQL fundamentals and best practices for data storage
📊 Create effective visualisations for exploratory data analysis
⚙️ Use professional workflows including Git, GitHub, and Generative AI tools
🚀 Build and present a complete data project with web-based reporting
✍️ Assessment Structure
Your grade consists of two main components:
Component | Weight | Description | Due Date |
---|---|---|---|
Midterm Project | 25% | Individual project: “Is London really all that rainy?” | 22 July 2025 |
Final Project | 75% | Individual project: Bring your own data and research question | 01 August 2025 |
We mark for evidence of learning and understanding rather than just output. You’ll need to explain the rationale behind your key decisions and show engagement with course concepts.
🤖 Generative AI Policy
This course adopts Position 3: Full authorised use of generative AI. You can use any AI tools you want, including in assessments. We view AI as a tool to enhance learning, not replace it.
When marking, we look for evidence that you understand the concepts and can explain your choices. Even advanced work won't get a good grade if it doesn't engage with course material or show your reasoning.
👥 Meet Your Teaching Team

Dr Jon Cardoso-Silva
Course Leader | Lecturer
Role at LSE: Assistant Professor (Education), LSE Data Science Institute; at LSE since 2021
Likes to think about: How Generative AI is influencing the way we learn¹

Dr Stuart Bramwell
Class Teacher
DPhil in Politics (Oxford University)
Teaches: afternoon classes
📚 What we expect from you
This course is designed for students with some basic coding experience. You should know about variables, data types, loops, functions, and lists. If you’re new to Python, we recommend reading the free online book Automate the Boring Stuff before the course starts.
Coding beginners have done really well in previous ME204 iterations. You’ll need to put in a bit more effort than others, but you’ll be totally fine.
We expect you to be self-directed. If you feel behind, ask questions. If you’re ahead, help those around you. Let us know when we’ve been unclear about any topic.
📟 Getting help
Throughout the course, you can contact us via:
- Slack: Our primary communication channel for questions and discussions
- Office hours: Available during the course (details on Moodle)
- Email: For formal requests and specific questions
For detailed course structure, schedules, and guides, check the 📓 Syllabus.
❓ Questions?
📧 Email the teaching team for formal requests and specific questions
🏢 Office Hours available during the course
Footnotes
1. Read about the GENIAL project.