Syllabus
LSE ME204 (2025) - Data Engineering for the Social World
Welcome to ME204! This is the course syllabus for the 2025 edition of the course.
Note from Jon:
The actual syllabus of the course has a sequence that differs from the one you saw in the course outline at the time you signed up for ME204.
I always edit syllabi as I get closer to delivering my courses, because I have new, better ideas about how things should flow, based on my most recent teaching experience and on recent developments in industry.
Week 01 | Foundations & First Results
(14 July - 20 July)
This is a very practical course from Day 01. During this first week, you will jump straight into hands-on work, learning to collect data, use professional tools, and produce your first data visualisations.
🗓️ Day 01
(Mon 14 Jul)
Foundations of Data Wrangling with Python
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Describe the role of data engineering in social science
- Understand the course structure, tools, and expectations
- Familiarise yourself with the development environment of the course (Nuvolos)
- Write some basic Python commands inside the Nuvolos environment
Morning Lecture
10.00am - 1.00pm
What we'll cover
- An overview of the course, its goals, and the roles of data engineering.
- A group exercise mapping the data pipeline for a real-world case study.
- An introduction to the course tools (Python, Git, Nuvolos) and assessment structure.
- A discussion on the course's AI policy and expectations for student engagement.
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Python Foundations Practice
🗓️ Day 02
(Tue 15 Jul)
Working with Data Types and APIs
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Load and inspect data in CSV and JSON formats
- Fetch weather data from the OpenMeteo API
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Distinguishing between structured, semi-structured, and unstructured data.
- An overview of common file formats like CSV and JSON, and how to work with them.
- An introduction to Application Programming Interfaces (APIs) and their importance.
- Using Python's requests library to fetch live data from the OpenMeteo weather API.
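To give a flavour of the CSV and JSON handling listed above, here is a minimal sketch that uses only Python's standard library. The sample data is made up, and the JSON payload only assumes the general shape of a weather API response (parallel lists under an "hourly" key). The function names load_csv_rows and hourly_records are hypothetical, chosen for illustration.

```python
import csv
import io
import json

# Made-up CSV content standing in for a file on disk.
CSV_TEXT = """city,temperature_c
London,21.5
Paris,24.0
"""

# Made-up JSON payload, loosely shaped like a weather API response (an assumption).
JSON_TEXT = (
    '{"latitude": 51.5, "longitude": -0.12, '
    '"hourly": {"time": ["2025-07-15T00:00", "2025-07-15T01:00"], '
    '"temperature_2m": [16.2, 15.8]}}'
)


def load_csv_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def hourly_records(payload):
    """Flatten the parallel 'time' and 'temperature_2m' lists into row-like dicts."""
    hourly = payload["hourly"]
    return [
        {"time": t, "temperature_2m": temp}
        for t, temp in zip(hourly["time"], hourly["temperature_2m"])
    ]


rows = load_csv_rows(CSV_TEXT)
payload = json.loads(JSON_TEXT)
records = hourly_records(payload)

print(rows[0])     # CSV values arrive as strings
print(records[0])  # JSON keeps numbers as numbers
```

With the requests library, the payload would typically come from a live call such as requests.get(url, params=...).json() rather than from a string.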
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Working with CSV/JSON and collecting weather data
🗓️ Day 03
(Wed 16 Jul)
Version Control and Professional Documentation
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Create a new Git repository and publish it to GitHub
- Commit your first project files
- Write professional documentation using Markdown
Morning Lecture
10.00am - 1.00pm
What we'll cover
- An introduction to the "why" and "how" of version control with Git.
- Core Git commands: committing files, pushing changes to a remote repository, and branching.
- Using GitHub to collaborate and host your projects.
- Writing clear project documentation using Markdown.
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Setting up your first repository and documenting it
🗓️ Day 04
(Thu 17 Jul)
Creating Your First Data Visualisations
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Translate a sketched visualisation idea into functioning Python code using AI assistance.
- Navigate and draw inspiration from professional data visualisation galleries (Matplotlib and Seaborn).
- Critically evaluate the effectiveness of a data visualisation.
- Use an AI assistant to refactor and improve existing plotting code.
- Understand the workflow of progressing from raw data to a final, insightful visualisation.
Morning Lecture
10.00am - 1.00pm
What we'll cover
- A reverse-engineered walkthrough of a data analysis, starting from the final plots.
- A group activity to find and critique real-world data visualisations.
- A hands-on mini-competition to apply visualisation principles in practice.
- Framing today's work as creating your "first visualisations, not your ultimate ones."
Afternoon Class
2.00pm - 5.00pm
💻 Lab: The Visualisation Workflow (Sketch, Explore, Refactor)
Enjoy!
(Fri - Sun)
No classes on Friday
Many students who attend the LSE Summer School are visiting London for the first time. If that is the case for you, here is a suggestion for a day out in London:
- Take the number 15 double-decker bus from its starting point in Trafalgar Square
- Enjoy views of some iconic London landmarks: the Royal Courts of Justice (near LSE), St Paul's Cathedral, the Tower of London, and Tower Bridge
- Get off the bus at the "Aldgate East" stop
- Then walk up to Brick Lane. If you are hungry, I recommend a visit to the Upmarket Food Hall or, if you feel like queueing, the famous Beigel Shop
Week 02 | Analysis & Automation
(21 July - 27 July)
In our second week, we will shift our focus to efficiency and more advanced data handling. You will learn to write faster, cleaner code and work with authenticated APIs and databases, which are essential skills for real-world data projects.
🗓️ Day 01
(Mon 21 Jul)
From Loops to Vectorisation
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Convert Python loops to vectorised pandas operations
- Refactor data collection code for efficiency
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Understanding the concept of vectorised operations and why they are more efficient than loops.
- Using pandas to replace slow, manual loops with fast, vectorised equivalents.
- Leveraging Generative AI to help refactor and optimise data transformation code.
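As a small illustration of the loop-to-vectorisation idea above, the sketch below converts a column of temperatures in two ways. The data and the function names (to_fahrenheit_loop, to_fahrenheit_vectorised) are made up for this example; it is one possible pattern, not the exact code used in the lecture.

```python
import pandas as pd

# Made-up temperature readings in Celsius.
df = pd.DataFrame({"temp_c": [16.2, 15.8, 21.5, 24.0]})


def to_fahrenheit_loop(frame):
    """Convert value by value with an explicit Python loop (slow on large data)."""
    result = []
    for value in frame["temp_c"]:
        result.append(value * 9 / 5 + 32)
    return pd.Series(result, index=frame.index, name="temp_f")


def to_fahrenheit_vectorised(frame):
    """Convert the whole column in one expression; pandas applies it element-wise."""
    return (frame["temp_c"] * 9 / 5 + 32).rename("temp_f")


loop_result = to_fahrenheit_loop(df)
vectorised_result = to_fahrenheit_vectorised(df)

# Both approaches give the same values (compared with a floating-point tolerance).
pd.testing.assert_series_equal(loop_result, vectorised_result)
print(vectorised_result)
```

On four rows the difference in speed is invisible; the gap tends to appear once a column holds many thousands of values.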
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Refactoring loops for efficiency
🗓️ Day 02
(Tue 22 Jul)
Refactoring Workshop
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Prepare and submit the midterm assignment
- Address specific challenges based on a review of student work
Morning Lecture
10.00am - 1.00pm
What we'll cover
- A guided workshop to help you apply vectorisation concepts to your own code.
- Troubleshooting problems with Git/GitHub.
- A Q&A session to address any questions about the midterm assignment.
Midterm Deadline
Your midterm assignment is due by 8pm today.
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Midterm Submission Support
🗓️ Day 03
(Wed 23 Jul)
API Authentication Patterns
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Understand common API authentication methods (API keys, Basic Auth, Bearer tokens, OAuth 2.0)
- Apply authentication patterns to collect data from APIs like Reddit
- Implement secure credential management using .env files
- Manage API credentials safely in your GitHub repositories
- Recognise when different authentication methods are appropriate for different APIs
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Theory (Part 1): REST API fundamentals, HTTP headers vs parameters, and security principles for credential management.
- Practice (Part 2): Live Reddit developer account setup, hands-on authentication implementation, and secure repository management.
- Theory (Part 3): Comprehensive comparison of authentication patterns and their real-world applications.
- A reference toolkit approach: building copy-pasteable patterns you can use with any authenticated API.
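To illustrate the difference between credentials sent as parameters and credentials sent as HTTP headers, here is a minimal standard-library sketch of three of the patterns named above. The function names and the "api_key" parameter name are hypothetical; each real API documents its own names. OAuth 2.0 is not shown, since the token exchange depends on the provider.

```python
import base64


def api_key_params(api_key):
    """API key sent as a query parameter (the parameter name varies by API)."""
    return {"api_key": api_key}


def basic_auth_header(username, password):
    """HTTP Basic Auth: base64-encode 'username:password' into the Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_header(access_token):
    """Bearer token, for example an access token obtained through an OAuth 2.0 flow."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder credentials for illustration only. Real ones belong in a .env file
# that is listed in .gitignore, never in the repository itself.
print(api_key_params("demo_key"))
print(basic_auth_header("demo_user", "demo_pass"))
print(bearer_header("demo_token"))
```

Dictionaries like these can be passed to an HTTP client, for instance through the params and headers arguments of requests.get.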
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Mastering API Pagination
🗓️ Day 04
(Thu 24 Jul)
Data Reshaping: pandas vs SQL
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Join datasets using both pandas and SQL
- Compare the two different methods for data merging
Morning Lecture
10.00am - 1.00pm
What we'll cover
- The logic of joining data from different sources (inner, outer, left, and right joins).
- Combining datasets using both pandas in Python and directly with SQL.
- Troubleshooting common issues that arise when merging real-world data.
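The sketch below shows the same left join written once with pandas and once with SQL, assuming two small made-up tables that share a city_id key. The table and column names are hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3

import pandas as pd

# Two small made-up tables sharing a 'city_id' key.
cities = pd.DataFrame({"city_id": [1, 2, 3], "city": ["London", "Paris", "Rome"]})
readings = pd.DataFrame({"city_id": [1, 1, 2, 4], "temp_c": [21.5, 19.0, 24.0, 30.1]})

# pandas: a left join keeps every city, whether or not it has readings.
pandas_left = cities.merge(readings, on="city_id", how="left")

# SQL: load the same tables into an in-memory SQLite database and join there.
conn = sqlite3.connect(":memory:")
cities.to_sql("cities", conn, index=False)
readings.to_sql("readings", conn, index=False)
sql_left = pd.read_sql_query(
    """
    SELECT c.city_id, c.city, r.temp_c
    FROM cities AS c
    LEFT JOIN readings AS r ON r.city_id = c.city_id
    ORDER BY c.city_id, r.temp_c DESC
    """,
    conn,
)
conn.close()

print(pandas_left)
print(sql_left)
```

Rome has no readings, so it appears with a missing temperature in both results, while the reading for city_id 4 is dropped because no city matches it. Switching how="left" to "inner", "right", or "outer" is a quick way to explore the other join types.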
📢 Project Announcement
The final project requirements will be revealed.
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Designing good exploratory questions
Enjoy!
(Fri - Sun)
No classes on Friday
This is a great opportunity to make progress on your final project, but don't forget to take a break and explore the city! Another suggestion is a visit to Borough Market, one of the largest and oldest food markets in London.
Week 03 | Databases & Pipelines
(28 July - 01 Aug)
The final week is about building robust data systems and communicating your findings effectively. You will learn how to design databases, scrape data from the web, and create interactive dashboards to present your analysis in a compelling way.
🗓️ Day 01
(Mon 28 Jul)
Designing Databases and Merging Data
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Design a database schema for an SQLite database
- Join datasets using pandas and/or SQL
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Database fundamentals: tables, primary keys, and foreign keys.
- When to choose a database over a simple file-based system.
- Writing basic SQL queries to select, filter, group, and order data.
- Querying an SQLite database directly from Python.
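As a rough sketch of the ideas above, the example below creates two tables linked by a primary key and a foreign key, then runs a query that selects, filters, groups, and orders. The schema and data are hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: each reading points at exactly one city.
conn.executescript(
    """
    CREATE TABLE cities (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE readings (
        reading_id INTEGER PRIMARY KEY,
        city_id    INTEGER NOT NULL REFERENCES cities(city_id),
        temp_c     REAL NOT NULL
    );
    """
)
conn.executemany("INSERT INTO cities VALUES (?, ?)", [(1, "London"), (2, "Paris")])
conn.executemany(
    "INSERT INTO readings (city_id, temp_c) VALUES (?, ?)",
    [(1, 21.5), (1, 19.0), (2, 24.0)],
)

# Select, filter, group, and order in one query.
query = """
    SELECT c.name, COUNT(*) AS n_readings, AVG(r.temp_c) AS mean_temp
    FROM readings AS r
    JOIN cities AS c ON c.city_id = r.city_id
    WHERE r.temp_c > 15
    GROUP BY c.name
    ORDER BY mean_temp DESC
"""
summary = conn.execute(query).fetchall()
conn.close()

print(summary)
```

Keeping city names in their own table means each name is stored once, and the foreign key stops a reading from pointing at a city that does not exist.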
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Designing schemas and joining data
🗓️ Day 02
(Tue 29 Jul)
A Primer on Web Scraping and Text Mining
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Parse HTML content from a web page
- Extract data from public websites (web scraping)
Morning Lecture
10.00am - 1.00pm
What we'll cover
- When to use web scraping vs. APIs, and the trade-offs of each approach.
- The ethical and legal considerations of collecting data from websites.
- Using Python's Scrapy library to parse HTML from web pages.
- Extracting structured data from simple HTML layouts.
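To show what extracting structured data from a simple HTML layout can look like, here is a minimal sketch. Note the swap: the lecture uses Scrapy, whereas this example relies on the standard library's html.parser so that it runs without extra installs. The HTML fragment and the LinkExtractor class are made up for illustration.

```python
from html.parser import HTMLParser

# Made-up page fragment with a simple, regular layout.
HTML = """
<ul>
  <li class="event"><a href="/events/1">Data talk</a></li>
  <li class="event"><a href="/events/2">Python workshop</a></li>
</ul>
"""


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only keep non-empty text that appears inside an open <a> tag.
        if self._current_href is not None and data.strip():
            self.links.append({"href": self._current_href, "text": data.strip()})

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


parser = LinkExtractor()
parser.feed(HTML)
print(parser.links)
```

The result is a list of dictionaries, which is already close to the row-per-record shape needed for a table or a CSV file.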
Afternoon Class
2.00pm - 5.00pm
🦸 Super Tech Support
Get help with your project
🗓️ Day 03
(Wed 30 Jul)
Building a Report Website (+ Dashboards)
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Create interactive charts using Plotly
- Build a simple data dashboard using Streamlit
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Moving beyond static plots to create interactive visualisations using Plotly.
- Building simple web dashboards using Streamlit to showcase your data stories.
- Best practices for user interface design and interactive data presentation.
Afternoon Class
2.00pm - 5.00pm
💻 Lab: Building interactive charts and dashboards
🗓️ Day 04
(Thu 31 Jul)
Data Storytelling & Project Management
🥅 Objectives
Review the goals for today
At the end of the day you should be able to:
- Document a data project in a professional manner
- Establish a reproducible data analysis workflow
Morning Lecture
10.00am - 1.00pm
What we'll cover
- Practical advice for managing collaborative projects with Git and GitHub.
- The importance of data versioning and ensuring your analysis is reproducible.
- Best practices for setting up a well-documented and professional project repository.
Afternoon Class
2.00pm - 5.00pm
🦸 Super Tech Support: Final project work session
🗓️ Day 05
(Fri 01 Aug)
⏳ Deadline:
Submit your final project by 6pm today.