Syllabus
LSE ME204 (2025) - Data Engineering for the Social World
Welcome to ME204! This is the course syllabus for the 2025 edition of the course.
Note from Jon:
The actual syllabus of the course has a sequence that differs from the one you saw in the course outline at the time you signed up for ME204.
I always revise my syllabi as I get closer to delivering my courses because I have new, better ideas about how things will flow, based on my most recent teaching experience, developments in industry, and so on.
Week 01 | Foundations & First Results
(14 July - 20 July)
This is a very practical course from Day 01. During this first week, you will jump straight into hands-on work, learning to collect data, use professional tools, and produce your first data visualisations.
Day 01
(Mon 14 Jul)
Foundations of Data Wrangling with Python
Objectives
Review the goals for today
At the end of the day you should be able to:
- Describe the role of data engineering in social science
- Understand the course structure, tools, and expectations
- Familiarise yourself with the development environment of the course (Nuvolos)
- Write some basic Python commands inside the Nuvolos environment
Morning Lecture
10.00am - 1.00pm
What we'll cover
- An overview of the course, its goals, and the roles of data engineering.
- A group exercise mapping the data pipeline for a real-world case study.
- An introduction to the course tools (Python, Git, Nuvolos) and assessment structure.
- A discussion on the course's AI policy and expectations for student engagement.
Afternoon Class
2.00pm - 5.00pm
Lab: Python Foundations Practice
Day 02
(Tue 15 Jul)
Working with Data Types and APIs
Objectives
Review the goals for today
At the end of the day you should be able to:
- Load and inspect data in CSV and JSON formats
- Fetch weather data from the OpenMeteo API
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Distinguishing between structured, semi-structured, and unstructured data.
- An overview of common file formats like CSV and JSON, and how to work with them.
- An introduction to Application Programming Interfaces (APIs) and their importance.
- Using Python's requests library to fetch live data from the OpenMeteo weather API.
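As a rough illustration of the topics above, the sketch below parses a small CSV and a small JSON document with Python's standard library, then shows how a call to the Open-Meteo forecast endpoint might be assembled with requests. The sample data, the function names, the endpoint URL, and the temperature_2m variable are assumptions made for illustration and are not the course's lab code.

```python
import csv
import io
import json

# Assumed URL of the public Open-Meteo forecast endpoint; check the API docs before relying on it.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def read_csv_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def hourly_temperatures(payload):
    """Pair each timestamp with its temperature from an Open-Meteo-style payload."""
    hourly = payload["hourly"]
    return list(zip(hourly["time"], hourly["temperature_2m"]))


def fetch_forecast(latitude, longitude):
    """Fetch an hourly temperature forecast (needs network access and requests)."""
    import requests  # imported lazily so the helpers above work without it

    params = {"latitude": latitude, "longitude": longitude, "hourly": "temperature_2m"}
    response = requests.get(OPEN_METEO_URL, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Made-up sample data standing in for real files and real API responses.
sample_csv = "city,temp_c\nLondon,21.5\nParis,24.0\n"
sample_json = '{"hourly": {"time": ["2025-07-15T00:00"], "temperature_2m": [17.2]}}'

rows = read_csv_rows(sample_csv)
pairs = hourly_temperatures(json.loads(sample_json))
```

Note that CSV values arrive as strings, so temp_c would need converting with float() before any arithmetic, whereas JSON numbers are already numeric. Calling fetch_forecast(51.5, -0.12) would request data for central London.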
Afternoon Class
2.00pm - 5.00pm
Lab: Working with CSV/JSON and collecting weather data
Day 03
(Wed 16 Jul)
Version Control and Professional Documentation
Objectives
Review the goals for today
At the end of the day you should be able to:
- Create a new Git repository and publish it to GitHub
- Commit your first project files
- Write professional documentation using Markdown
Morning Lecture
10.00am - 1.00pm
What we'll cover
- An introduction to the "why" and "how" of version control with Git.
- Core Git commands: committing files, pushing changes to a remote repository, and branching.
- Using GitHub to collaborate and host your projects.
- Writing clear project documentation using Markdown.
Afternoon Class
2.00pm - 5.00pm
Lab: Setting up your first repository and documenting it
Day 04
(Thu 17 Jul)
Creating Your First Data Visualisations
Objectives
Review the goals for today
At the end of the day you should be able to:
- Create line plots, scatter plots, and bar charts
- Use these visualisations for weather data analysis
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Fundamental principles of effective data visualisation.
- Creating static plots like line charts, scatter plots, and bar charts with Matplotlib and Seaborn.
- Guidelines for communicating insights clearly and avoiding common pitfalls.
- Using Generative AI to help write plotting code efficiently.
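To make the plot types above concrete, here is a minimal Matplotlib sketch that draws a line chart and a bar chart side by side. The temperature and rainfall values are made up for illustration, and the variable names are hypothetical; Seaborn is left out to keep the example short.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up daily values for illustration only.
days = ["Mon", "Tue", "Wed", "Thu"]
max_temp = [22.1, 24.3, 19.8, 21.0]
rain_mm = [0.0, 1.2, 6.5, 0.3]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(9, 3.5))

# Line chart: suited to showing change over time.
ax_line.plot(days, max_temp, marker="o")
ax_line.set_title("Daily maximum temperature")
ax_line.set_xlabel("Day")
ax_line.set_ylabel("Temperature (°C)")

# Bar chart: suited to comparing amounts across categories.
ax_bar.bar(days, rain_mm)
ax_bar.set_title("Daily rainfall")
ax_bar.set_xlabel("Day")
ax_bar.set_ylabel("Rainfall (mm)")

fig.tight_layout()  # stop axis labels from overlapping
```

Labelling both axes and stating units is a small habit that avoids one of the most common visualisation pitfalls. Adding fig.savefig("weather_overview.png") would write the figure to disk.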
Afternoon Class
2.00pm - 5.00pm
Lab: Visualising your weather data
Enjoy!
(Fri - Sun)
No classes on Friday
Many students who attend the LSE Summer School are visiting London for the first time. If that is your case, here is a suggestion for a day out in London:
- Take the 15 double-decker bus at its starting point in Trafalgar Square
- Enjoy views of some iconic London landmarks: the Royal Courts of Justice (near LSE), St Paul's Cathedral, the Tower of London, and Tower Bridge
- Get off the bus at the "Aldgate East" stop
- Then walk up to Brick Lane. If you are hungry, I recommend a visit to the Upmarket Food Hall or, if you feel like queueing, the famous Beigel Shop
Week 02 | Analysis & Automation
(21 July - 27 July)
In our second week, we will shift our focus to efficiency and more advanced data handling. You will learn to write faster, cleaner code and work with authenticated APIs and databases, which are essential skills for real-world data projects.
Day 01
(Mon 21 Jul)
From Loops to Vectorisation
Objectives
Review the goals for today
At the end of the day you should be able to:
- Convert Python loops to vectorised pandas operations
- Refactor data collection code for efficiency
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Understanding the concept of vectorised operations and why they are more efficient than loops.
- Using pandas to replace slow, manual loops with fast, vectorised equivalents.
- Leveraging Generative AI to help refactor and optimise data transformation code.
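The difference between a manual loop and a vectorised operation can be sketched as follows. The readings are made up, and the column names are hypothetical; the point is only that pandas applies the arithmetic to the whole column in one step.

```python
import pandas as pd

# Made-up temperature readings for illustration only.
df = pd.DataFrame({"temp_c": [18.0, 21.5, 25.0, 30.5]})

# Loop version: convert Celsius to Fahrenheit one value at a time.
fahrenheit_loop = []
for value in df["temp_c"]:
    fahrenheit_loop.append(value * 9 / 5 + 32)

# Vectorised version: the same arithmetic applied to the whole column at once.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# A vectorised comparison replaces an if-statement inside a loop.
df["is_hot"] = df["temp_c"] > 24
```

Both versions produce the same numbers. On four rows the speed difference is invisible, but on large datasets the vectorised form is typically much faster because the work happens in compiled code rather than in the Python interpreter.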
Afternoon Class
2.00pm - 5.00pm
Lab: Refactoring loops for efficiency
Day 02
(Tue 22 Jul)
Refactoring Workshop
Objectives
Review the goals for today
At the end of the day you should be able to:
- Prepare and submit the midterm assignment
- Address specific challenges based on a review of student work
Morning Lecture
10.00am - 1.00pm
What we'll cover
- A guided workshop to help you apply vectorisation concepts to your own code.
- Troubleshooting problems with Git/GitHub.
- A Q&A session to address any questions about the midterm assignment.
Midterm Deadline
Your midterm assignment is due by 8pm today.
Afternoon Class
2.00pm - 5.00pm
Lab: Midterm Submission Support
Day 03
(Wed 23 Jul)
Authenticated APIs & SQL Primer
Objectives
Review the goals for today
At the end of the day you should be able to:
- Access data from an authenticated API (e.g., Reddit)
- Write introductory database queries using SQL
Morning Lecture
10.00am - 1.00pm
What we'll cover
- An overview of common API authentication methods like API keys and OAuth.
- Hands-on practice with an authenticated API (e.g., Spotify, Reddit, or YouTube).
- Techniques for parsing the nested JSON data that APIs often return.
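As an illustration of the last point, the sketch below flattens a nested JSON payload in two ways: by walking the nesting in plain Python, and with pandas.json_normalize. The payload is made up and its field names are assumptions; real APIs such as Reddit or Spotify return differently shaped responses.

```python
import pandas as pd

# Made-up payload shaped loosely like a typical API response; field names are assumptions.
payload = {
    "data": [
        {"id": "a1", "author": {"name": "ana", "karma": 120}, "stats": {"score": 15}},
        {"id": "b2", "author": {"name": "ben", "karma": 45}, "stats": {"score": 3}},
    ]
}

# Plain Python: walk the nesting explicitly.
names = [item["author"]["name"] for item in payload["data"]]

# pandas: json_normalize flattens nested dictionaries into dotted column names.
flat = pd.json_normalize(payload["data"])
```

The resulting DataFrame has one row per item and columns such as author.name and stats.score, which makes the data ready for the usual pandas operations.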
Afternoon Class
2.00pm - 5.00pm
Lab: Working with Authenticated APIs and SQL
Day 04
(Thu 24 Jul)
Data Reshaping: pandas vs SQL
Objectives
Review the goals for today
At the end of the day you should be able to:
- Join datasets using both pandas and SQL
- Compare the two different methods for data merging
Morning Lecture
10.00am - 1.00pm
What we'll cover
- The logic of joining data from different sources (inner, outer, left, and right joins).
- Combining datasets using both pandas in Python and directly with SQL.
- Troubleshooting common issues that arise when merging real-world data.
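The join types listed above can be compared side by side with a small sketch. It runs the same left join once with pandas and once with SQL against an in-memory SQLite database; the two tables and their column names are made up for illustration.

```python
import sqlite3

import pandas as pd

# Made-up tables for illustration only. City 3 has no readings; city_id 4 has no city.
cities = pd.DataFrame({"city_id": [1, 2, 3], "city": ["London", "Paris", "Rome"]})
readings = pd.DataFrame({"city_id": [1, 1, 2, 4], "temp_c": [21.5, 19.0, 24.0, 30.0]})

# pandas: a left join keeps every city, even those without readings.
left_pandas = cities.merge(readings, on="city_id", how="left")

# SQL: the same left join, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cities.to_sql("cities", conn, index=False)
readings.to_sql("readings", conn, index=False)
left_sql = pd.read_sql_query(
    """
    SELECT c.city_id, c.city, r.temp_c
    FROM cities AS c
    LEFT JOIN readings AS r ON r.city_id = c.city_id
    ORDER BY c.city_id, r.temp_c DESC
    """,
    conn,
)
conn.close()

# An inner join drops rows that lack a match on either side.
inner_pandas = cities.merge(readings, on="city_id", how="inner")
```

Rome appears in both left joins with a missing temperature, while the reading for the unknown city_id 4 is dropped. Unmatched keys like these are a frequent source of surprises when merging real-world data, so checking row counts before and after a join is a useful habit.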
Project Announcement
The final project requirements will be revealed.
Afternoon Class
2.00pm - 5.00pm
Lab: Designing good exploratory questions
Enjoy!
(Fri - Sun)
No classes on Friday
This is a great opportunity to make progress on your final project, but don't forget to take a break and explore the city! Another suggestion is a visit to Borough Market, one of the largest and oldest food markets in London.
Week 03 | Databases & Pipelines
(28 July - 01 August)
The final week is about building robust data systems and communicating your findings effectively. You will learn how to design databases, scrape data from the web, and create interactive dashboards to present your analysis in a compelling way.
Day 01
(Mon 28 Jul)
Designing Databases and Merging Data
Objectives
Review the goals for today
At the end of the day you should be able to:
- Design a database schema for an SQLite database
- Join datasets using pandas and/or SQL
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Database fundamentals: tables, primary keys, and foreign keys.
- When to choose a database over a simple file-based system.
- Writing basic SQL queries to select, filter, group, and order data.
- Querying an SQLite database directly from Python.
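A minimal sketch of these ideas, using Python's built-in sqlite3 module, might look like this. The two-table schema, the table and column names, and the sample rows are hypothetical and chosen only to show a primary key, a foreign key, and a query that selects, filters, groups, and orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Hypothetical two-table schema: each reading points at exactly one city.
conn.executescript(
    """
    CREATE TABLE cities (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE readings (
        reading_id INTEGER PRIMARY KEY,
        city_id    INTEGER NOT NULL REFERENCES cities (city_id),
        day        TEXT NOT NULL,
        temp_c     REAL NOT NULL
    );
    """
)

# Made-up rows for illustration only.
conn.executemany("INSERT INTO cities (city_id, name) VALUES (?, ?)", [(1, "London"), (2, "Paris")])
conn.executemany(
    "INSERT INTO readings (city_id, day, temp_c) VALUES (?, ?, ?)",
    [(1, "2025-07-28", 21.0), (1, "2025-07-29", 23.0), (2, "2025-07-28", 26.0), (2, "2025-07-29", 18.0)],
)

# Select, filter, group, and order in one query.
query = """
    SELECT c.name, AVG(r.temp_c) AS avg_temp
    FROM readings AS r
    JOIN cities AS c ON c.city_id = r.city_id
    WHERE r.temp_c >= 20
    GROUP BY c.name
    ORDER BY avg_temp DESC
"""
averages = conn.execute(query).fetchall()

# The foreign key rejects a reading for a city that does not exist.
try:
    conn.execute("INSERT INTO readings (city_id, day, temp_c) VALUES (99, '2025-07-28', 18.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
conn.close()
```

The rejected insert shows one reason to prefer a database over loose files: the schema itself guards against inconsistent data, something a folder of CSV files cannot do.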
Afternoon Class
2.00pm - 5.00pm
Lab: Designing schemas and joining data
Day 02
(Tue 29 Jul)
A Primer on Web Scraping and Text Mining
Objectives
Review the goals for today
At the end of the day you should be able to:
- Parse HTML content from a web page
- Extract data from public websites (web scraping)
Morning Lecture
10.00am - 1.00pm
What we'll cover
- When to use web scraping vs. APIs, and the trade-offs of each approach.
- The ethical and legal considerations of collecting data from websites.
- Using Python's Scrapy library to parse HTML from web pages.
- Extracting structured data from simple HTML layouts.
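To show what extracting structured data from a simple HTML layout involves, here is a small sketch. Note that it uses html.parser from Python's standard library rather than Scrapy, purely so that it runs without extra installation; the HTML fragment and the LinkExtractor class are made up for illustration.

```python
from html.parser import HTMLParser

# Made-up page fragment for illustration only.
SAMPLE_HTML = """
<ul class="events">
  <li><a href="/events/1">Data Science Seminar</a></li>
  <li><a href="/events/2">Public Lecture</a></li>
</ul>
"""


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only keep non-blank text that appears inside an <a> tag with an href.
        if self._current_href is not None and data.strip():
            self.links.append({"href": self._current_href, "text": data.strip()})

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
links = parser.links
```

The result is a list of dictionaries, one per link, which can be passed straight to pandas.DataFrame. Libraries such as Scrapy offer CSS and XPath selectors that express the same extraction far more compactly.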
Afternoon Class
2.00pm - 5.00pm
Lab: Supervised web scraping and text analysis practice
Day 03
(Wed 30 Jul)
Building Interactive Visualisations and Dashboards
Objectives
Review the goals for today
At the end of the day you should be able to:
- Create interactive charts using Plotly
- Build a simple data dashboard using Streamlit
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Moving beyond static plots to create interactive visualisations using Plotly.
- Building simple web dashboards using Streamlit to showcase your data stories.
- Best practices for user interface design and interactive data presentation.
Afternoon Class
2.00pm - 5.00pm
Lab: Building interactive charts and dashboards
Day 04
(Thu 31 Jul)
Data Storytelling & Project Management
Objectives
Review the goals for today
At the end of the day you should be able to:
- Document a data project in a professional manner
- Establish a reproducible data analysis workflow
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Practical advice for managing collaborative projects with Git and GitHub.
- The importance of data versioning and ensuring your analysis is reproducible.
- Best practices for setting up a well-documented and professional project repository.
Afternoon Class
2.00pm - 5.00pm
Super Tech Support: Final project work session
Day 05
(Fri 01 Aug)
Deadline:
Submit your final project by 6pm today.