📓 Syllabus

LSE ME204 (2025) – Data Engineering for the Social World

Welcome to ME204! This is the course syllabus for the 2025 edition of the course.

Note from Jon:

The sequence of topics in this syllabus differs from the one in the course outline you saw when you signed up for ME204.

I always revise my syllabi as the start of a course approaches, because my most recent teaching experience and developments in industry give me new, better ideas about how things should flow.

Week 01 | Foundations & First Results

(14 July - 20 July)

This is a very practical course from Day 01. During this first week, you will jump straight into hands-on work, learning to collect data, use professional tools, and produce your first data visualisations.

๐Ÿ—“๏ธ Day 01
(Mon 14 Jul)

Foundations of Data Wrangling with Python

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Describe the role of data engineering in social science
  • Understand the course structure, tools, and expectations
  • Familiarise yourself with the development environment of the course (Nuvolos)
  • Write some basic Python commands inside the Nuvolos environment

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Foundations of Data Wrangling with Python
What weโ€™ll cover
  • An overview of the course, its goals, and the roles of data engineering.
  • A group exercise mapping the data pipeline for a real-world case study.
  • An introduction to the course tools (Python, Git, Nuvolos) and assessment structure.
  • A discussion on the courseโ€™s AI policy and expectations for student engagement.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Python Foundations Practice

๐Ÿ—“๏ธ Day 02
(Tue 15 Jul)

Working with Data Types and APIs

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Load and inspect data in CSV and JSON formats
  • Fetch weather data from the OpenMeteo API

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Working with Data Types and APIs
What weโ€™ll cover
  • Distinguishing between structured, semi-structured, and unstructured data.
  • An overview of common file formats like CSV and JSON, and how to work with them.
  • An introduction to Application Programming Interfaces (APIs) and their importance.
  • Using Pythonโ€™s requests library to fetch live data from the OpenMeteo weather API.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Working with CSV/JSON and collecting weather data
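As a taste of this lab, here is a minimal sketch, assuming OpenMeteo's forecast endpoint (https://api.open-meteo.com/v1/forecast) and its latitude, longitude and hourly parameters. It builds the request URL with the standard library's urllib.parse rather than requests, so it runs without extra installs, and it flattens the JSON response into CSV. The function names are hypothetical, and the sample response is hand-written for illustration, not real weather data.

```python
import csv
import io
import json
from urllib.parse import urlencode

# OpenMeteo forecast endpoint.
BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build a URL requesting hourly 2-metre air temperatures for one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def hourly_json_to_csv(raw_json: str) -> str:
    """Turn the parallel 'time' and 'temperature_2m' lists into CSV text."""
    hourly = json.loads(raw_json)["hourly"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["time", "temperature_2m"])  # header row
    for timestamp, temp in zip(hourly["time"], hourly["temperature_2m"]):
        writer.writerow([timestamp, temp])
    return buffer.getvalue()


# Hand-written sample shaped like an OpenMeteo response (not real data).
sample = json.dumps({
    "hourly": {
        "time": ["2025-07-15T00:00", "2025-07-15T01:00"],
        "temperature_2m": [17.1, 16.8],
    }
})

print(build_forecast_url(51.5074, -0.1278))  # central London
print(hourly_json_to_csv(sample))
```

In the lab you would instead pass the same params dictionary to requests.get(BASE_URL, params=params) and call .json() on the response, rather than using the hand-written sample.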

๐Ÿ—“๏ธ Day 03
(Wed 16 Jul)

Version Control and Professional Documentation

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Create a new Git repository and publish it to GitHub
  • Commit your first project files
  • Write professional documentation using Markdown

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Version Control with Git and Markdown documentation
What weโ€™ll cover
  • An introduction to the โ€œwhyโ€ and โ€œhowโ€ of version control with Git.
  • Core Git commands: committing files, pushing changes to a remote repository, and branching.
  • Using GitHub to collaborate and host your projects.
  • Writing clear project documentation using Markdown.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Setting up your first repository and documenting it

๐Ÿ—“๏ธ Day 04
(Thu 17 Jul)

Creating Your First Data Visualisations

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Translate a sketched visualisation idea into functioning Python code using AI assistance.
  • Navigate and draw inspiration from professional data visualisation galleries (Matplotlib and Seaborn).
  • Critically evaluate the effectiveness of a data visualisation.
  • Use an AI assistant to refactor and improve existing plotting code.
  • Understand the workflow of progressing from raw data to a final, insightful visualisation.

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ From Data to Insight: A Visualisation Workshop
What weโ€™ll cover
  • A reverse-engineered walkthrough of a data analysis, starting from the final plots.
  • A group activity to find and critique real-world data visualisations.
  • A hands-on mini-competition to apply visualisation principles in practice.
  • Framing todayโ€™s work as creating your โ€œfirst visualisations, not your ultimate ones.โ€

Afternoon Class
2.00pm - 5.00pm

💻 Lab: The Visualisation Workflow (Sketch, Explore, Refactor)

💂 Enjoy!
(Fri - Sun)

No classes on Friday

Many students who attend the LSE Summer School are visiting London for the first time. If that is your case, here is a suggestion for a day out in London:

  • Take the number 15 double-decker bus from its starting point in Trafalgar Square
  • Enjoy views of some iconic London landmarks: the Royal Courts of Justice (near LSE), St Paul’s Cathedral, the Tower of London, and Tower Bridge
  • Get off the bus at the ‘Aldgate East’ stop
  • Then walk up to Brick Lane. If you are hungry, I recommend a visit to the Upmarket Food Hall or, if you don’t mind queueing, the famous Beigel Shop

Week 02 | Analysis & Automation

(21 July - 27 July)

In our second week, we will shift our focus to efficiency and more advanced data handling. You will learn to write faster, cleaner code and work with authenticated APIs and databases, which are essential skills for real-world data projects.

๐Ÿ—“๏ธ Day 01
(Mon 21 Jul)

From Loops to Vectorisation

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Convert Python loops to vectorised pandas operations
  • Refactor data collection code for efficiency

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ From Loops to Vectorisation
What weโ€™ll cover
  • Understanding the concept of vectorised operations and why they are more efficient than loops.
  • Using pandas to replace slow, manual loops with fast, vectorised equivalents.
  • Leveraging Generative AI to help refactor and optimise data transformation code.
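A minimal sketch of the loop-to-vectorisation idea, assuming a pandas DataFrame with a hypothetical temp_c column. Both versions compute the same Fahrenheit values, but the vectorised one hands the whole column to pandas in a single expression instead of visiting rows one by one in Python.

```python
import pandas as pd

# Hypothetical hourly temperatures in degrees Celsius.
df = pd.DataFrame({"temp_c": [12.0, 18.5, 25.0, 30.2]})

# Loop version: Python visits each value one at a time.
temps_f = []
for value in df["temp_c"]:
    temps_f.append(value * 9 / 5 + 32)
df["temp_f_loop"] = temps_f

# Vectorised version: one expression applied to the whole column at once.
df["temp_f_vec"] = df["temp_c"] * 9 / 5 + 32

# Conditions vectorise too: a boolean Series replaces an if/else inside a loop.
df["is_hot"] = df["temp_c"] > 25

print(df)
```

On four rows the difference is invisible; the gap in speed becomes noticeable as the number of rows grows into the hundreds of thousands.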

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Refactoring loops for efficiency

๐Ÿ—“๏ธ Day 02
(Tue 22 Jul)

Refactoring Workshop

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Prepare and submit the midterm assignment
  • Address specific challenges based on a review of student work

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Refactoring Workshop & Midterm Q&A
What weโ€™ll cover
  • A guided workshop to help you apply vectorisation concepts to your own code.
  • Troubleshoot problems with Git/GitHub
  • Q&A session to address any questions about the midterm assignment.

⌛ Midterm Deadline

Your midterm assignment is due by 8pm today.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Midterm Submission Support

๐Ÿ—“๏ธ Day 03
(Wed 23 Jul)

API Authentication Patterns

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Understand common API authentication methods (API keys, Basic Auth, Bearer tokens, OAuth 2.0)
  • Apply authentication patterns to collect data from APIs like Reddit
  • Implement secure credential management using .env files
  • Manage API credentials safely in your GitHub repositories
  • Recognise when different authentication methods are appropriate for different APIs

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ API Authentication Patterns
What weโ€™ll cover
  • Theory (Part 1): REST API fundamentals, HTTP headers vs parameters, and security principles for credential management.
  • Practice (Part 2): Live Reddit developer account setup, hands-on authentication implementation, and secure repository management.
  • Theory (Part 3): Comprehensive comparison of authentication patterns and their real-world applications.
  • A reference toolkit approach: building copy-pasteable patterns you can use with any authenticated API.
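The credential-handling and header patterns can be sketched with the standard library alone. This is an illustration under stated assumptions, not a complete client: the environment variable names and the api_key parameter name are hypothetical, and the .env file is assumed to be loaded into the environment beforehand (for instance with python-dotenv's load_dotenv()) and excluded from Git via .gitignore.

```python
import base64
import os


def load_credentials() -> tuple[str, str]:
    """Read hypothetical REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET from the environment."""
    # The values are assumed to come from a .env file that is never committed.
    client_id = os.environ.get("REDDIT_CLIENT_ID")
    client_secret = os.environ.get("REDDIT_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Missing credentials: check your .env file")
    return client_id, client_secret


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """HTTP Basic Auth: base64-encode 'username:password' into a header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def bearer_header(access_token: str) -> dict[str, str]:
    """Bearer token auth, as used after an OAuth 2.0 token exchange."""
    return {"Authorization": f"Bearer {access_token}"}


def api_key_params(api_key: str) -> dict[str, str]:
    """Some APIs expect a key as a query parameter; its name varies by API."""
    return {"api_key": api_key}
```

Keeping secrets in the environment means the code you push to GitHub never contains them. In an OAuth 2.0 client-credentials flow, for example, the client ID and secret are typically sent with Basic Auth to obtain an access token, which is then sent as a Bearer token on later requests.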

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Mastering API Pagination
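Many APIs return results one page at a time, with a cursor pointing to the next page. Here is a generic sketch of the pagination loop, assuming each page is a dictionary with hypothetical items and next keys (Reddit's listings, for instance, call their cursor after). The page-fetching function is passed in, so the loop can be tested with fake pages before it is wired to a real API.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict],
             max_pages: int = 100) -> Iterator[dict]:
    """Yield items from successive pages until no 'next' cursor is returned.

    fetch_page(cursor) should return a dict shaped like
    {"items": [...], "next": <cursor or None>}; these key names are hypothetical.
    """
    cursor = None  # no cursor means "give me the first page"
    for _ in range(max_pages):  # hard cap so a buggy API cannot loop forever
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            break


# Fake pages standing in for real API responses.
FAKE_PAGES = {
    None: {"items": [{"id": 1}, {"id": 2}], "next": "p2"},
    "p2": {"items": [{"id": 3}], "next": "p3"},
    "p3": {"items": [{"id": 4}], "next": None},
}

all_items = list(paginate(lambda cursor: FAKE_PAGES[cursor]))
print([item["id"] for item in all_items])
```

Against a real API, fetch_page would wrap something like requests.get(url, params={"after": cursor}, headers=...) and return the parsed JSON; the max_pages cap also helps you stay within rate limits while testing.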

๐Ÿ—“๏ธ Day 04
(Thu 24 Jul)

Data Reshaping: pandas vs SQL

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Join datasets using both pandas and SQL
  • Compare the two different methods for data merging

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Data Reshaping: pandas vs SQL
What weโ€™ll cover
  • The logic of joining data from different sources (inner, outer, left, and right joins).
  • Combining datasets using both pandas in Python and directly with SQL.
  • Troubleshooting common issues that arise when merging real-world data.
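A minimal sketch of the same left join written both ways, assuming two small hypothetical tables (cities and readings). The SQL version loads the DataFrames into an in-memory SQLite database with DataFrame.to_sql and reads the result back with pd.read_sql_query.

```python
import sqlite3

import pandas as pd

# Two small hypothetical tables: cities and temperature readings.
cities = pd.DataFrame({"city_id": [1, 2, 3], "name": ["London", "Leeds", "York"]})
readings = pd.DataFrame({"city_id": [1, 1, 2], "temp_c": [17.0, 18.5, 15.2]})

# pandas: a left join keeps every city, filling missing readings with NaN.
merged_pd = cities.merge(readings, on="city_id", how="left")

# SQL: the same left join, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cities.to_sql("cities", conn, index=False)
readings.to_sql("readings", conn, index=False)
merged_sql = pd.read_sql_query(
    """
    SELECT c.city_id, c.name, r.temp_c
    FROM cities AS c
    LEFT JOIN readings AS r ON c.city_id = r.city_id
    ORDER BY c.city_id, r.temp_c
    """,
    conn,
)
conn.close()

print(merged_pd)
print(merged_sql)
```

Both results contain York with a missing temperature, because a left join keeps every row from the left table. Switching to how="inner" in pandas, or to a plain JOIN in SQL, would drop it.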

📢 Project Announcement

The final project requirements will be revealed.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Designing good exploratory questions

💂 Enjoy!
(Fri - Sun)

No classes on Friday

This is a great opportunity to make progress on your final project, but don’t forget to take a break and explore the city! Another suggestion is a visit to Borough Market, one of the largest and oldest food markets in London.

Week 03 | Databases & Pipelines

(28 July - 01 Aug)

The final week is about building robust data systems and communicating your findings effectively. You will learn how to design databases, scrape data from the web, and create interactive dashboards to present your analysis in a compelling way.

๐Ÿ—“๏ธ Day 01
(Mon 28 Jul)

Designing Databases and Merging Data

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Design a database schema for an SQLite database
  • Join datasets using pandas and/or SQL

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Designing Databases and Merging Data
What weโ€™ll cover
  • Database fundamentals: tables, primary keys, and foreign keys.
  • When to choose a database over a simple file-based system.
  • Writing basic SQL queries to select, filter, group, and order data.
  • Querying an SQLite database directly from Python.
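A sketch of a two-table schema with a primary key and a foreign key, plus one query that selects, filters, groups and orders, using Python's built-in sqlite3 module. The table and column names are hypothetical. Note that SQLite only enforces foreign keys once PRAGMA foreign_keys = ON has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# Schema: each reading points to exactly one city via a foreign key.
conn.executescript(
    """
    CREATE TABLE city (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE reading (
        reading_id INTEGER PRIMARY KEY,
        city_id    INTEGER NOT NULL REFERENCES city(city_id),
        temp_c     REAL
    );
    """
)
conn.executemany("INSERT INTO city (city_id, name) VALUES (?, ?)",
                 [(1, "London"), (2, "Leeds")])
conn.executemany("INSERT INTO reading (city_id, temp_c) VALUES (?, ?)",
                 [(1, 17.0), (1, 19.0), (2, 15.0)])

# SELECT + JOIN + WHERE + GROUP BY + ORDER BY in a single query.
rows = conn.execute(
    """
    SELECT c.name, AVG(r.temp_c) AS avg_temp, COUNT(*) AS n
    FROM reading AS r
    JOIN city AS c ON c.city_id = r.city_id
    WHERE r.temp_c IS NOT NULL
    GROUP BY c.name
    ORDER BY avg_temp DESC
    """
).fetchall()
print(rows)
```

The foreign key is what makes the schema robust: inserting a reading for a city_id that does not exist in the city table raises sqlite3.IntegrityError instead of silently storing an orphan row.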

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Designing schemas and joining data

๐Ÿ—“๏ธ Day 02
(Tue 29 Jul)

A Primer on Web Scraping and Text Mining

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Parse HTML content from a web page
  • Extract data from public websites (web scraping)

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ A Primer on Web Scraping and Text Mining
What weโ€™ll cover
  • When to use web scraping vs. APIs, and the trade-offs of each approach.
  • The ethical and legal considerations of collecting data from websites.
  • Using Pythonโ€™s Scrapy library to parse HTML from web pages.
  • Extracting structured data from simple HTML layouts.
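The lecture covers Scrapy, where a call such as Selector(text=html).css("h2.headline::text").getall() extracts the matching text. To keep this sketch runnable without extra installs, it swaps in the standard library's html.parser instead, which makes the underlying idea explicit: walk the tags and keep the text inside the ones you care about. The page structure and the headline class are hypothetical.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2 class="headline"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_headline = False
        self.headlines: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; this is an exact class match
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True
            self.headlines.append("")

    def handle_data(self, data):
        if self.in_headline:
            self.headlines[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_headline:
            self.headlines[-1] = self.headlines[-1].strip()
            self.in_headline = False


# A tiny hypothetical page standing in for a downloaded one.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">Rain expected in London</h2>
  <p>Some body text.</p>
  <h2 class="headline">Heatwave ends</h2>
  <h2 class="other">Not a headline</h2>
</body></html>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
print(parser.headlines)
```

For anything beyond simple layouts, CSS or XPath selectors (as in Scrapy) are far less error-prone than hand-written tag tracking like this.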

Afternoon Class
2.00pm - 5.00pm

🦸 Super Tech Support
Get help with your project

๐Ÿ—“๏ธ Day 03
(Wed 30 Jul)

Building a Report Website (+ Dashboards)

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Create interactive charts using Plotly
  • Build a simple data dashboard using Streamlit

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Building your project website
What weโ€™ll cover
  • Moving beyond static plots to create interactive visualisations using Plotly.
  • Building simple web dashboards using Streamlit to showcase your data stories.
  • Best practices for user interface design and interactive data presentation.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Building interactive charts and dashboards

๐Ÿ—“๏ธ Day 04
(Thu 31 Jul)

Data Storytelling & Project Management

๐Ÿฅ… Objectives

Review the goals for today At the end of the day you should be able to:
  • Document a data project in a professional manner
  • Establish a reproducible data analysis workflow

Morning Lecture
10.00am - 1.00pm

๐Ÿ–ฅ๏ธ Data Storytelling & Project Management
What weโ€™ll cover
  • Practical advice for managing collaborative projects with Git and GitHub.
  • The importance of data versioning and ensuring your analysis is reproducible.
  • Best practices for setting up a well-documented and professional project repository.

Afternoon Class
2.00pm - 5.00pm

🦸 Super Tech Support: Final project work session

๐Ÿ—“๏ธ Day 05
(Fri 01 Aug)

โณ Deadline:

Submit your final project by 6pm today. โœ