📓 Syllabus

LSE ME204 (2025) – Data Engineering for the Social World


Welcome to ME204! This is the course syllabus for the 2025 edition of the course.

Note from Jon:

The sequence of topics in this syllabus differs from the one in the course outline you saw when you signed up for ME204.

I always revise my syllabi as the start of a course gets closer, because by then I have new and better ideas about how things should flow, based on my most recent teaching experience, recent developments in industry, and so on.

Week 01 | Foundations & First Results

(14 July - 20 July)

This is a very practical course from Day 01. During this first week, you will jump straight into hands-on work, learning to collect data, use professional tools, and produce your first data visualisations.

🗓️ Day 01
(Mon 14 Jul)

Foundations of Data Wrangling with Python

🥅 Objectives

At the end of the day, you should be able to:
  • Describe the role of data engineering in social science
  • Understand the course structure, tools, and expectations
  • Familiarise yourself with the development environment of the course (Nuvolos)
  • Write some basic Python commands inside the Nuvolos environment

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Foundations of Data Wrangling with Python
What we’ll cover
  • An overview of the course, its goals, and the roles of data engineering.
  • A group exercise mapping the data pipeline for a real-world case study.
  • An introduction to the course tools (Python, Git, Nuvolos) and assessment structure.
  • A discussion on the course’s AI policy and expectations for student engagement.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Python Foundations Practice

🗓️ Day 02
(Tue 15 Jul)

Working with Data Types and APIs

🥅 Objectives

At the end of the day, you should be able to:
  • Load and inspect data in CSV and JSON formats
  • Fetch weather data from the OpenMeteo API

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Working with Data Types and APIs
What we’ll cover
  • Distinguishing between structured, semi-structured, and unstructured data.
  • An overview of common file formats like CSV and JSON, and how to work with them.
  • An introduction to Application Programming Interfaces (APIs) and their importance.
  • Using Python’s requests library to fetch live data from the OpenMeteo weather API.
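A minimal sketch of the Day 02 workflow, assuming Open-Meteo's public forecast endpoint (`https://api.open-meteo.com/v1/forecast`) and its `hourly=temperature_2m` variable. The helper names `fetch_forecast`, `hourly_to_rows` and `rows_to_csv` are illustrative, not course code, and the sample payload is hand-written with placeholder values so the example runs without network access.

```python
import csv
import io
import json

# Open-Meteo's public forecast endpoint (no API key required).
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def fetch_forecast(latitude: float, longitude: float) -> dict:
    """Request hourly temperatures for one location and return the parsed JSON."""
    # Imported here so the offline helpers below work even without requests installed.
    import requests

    params = {"latitude": latitude, "longitude": longitude, "hourly": "temperature_2m"}
    response = requests.get(FORECAST_URL, params=params, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
    return response.json()


def hourly_to_rows(payload: dict) -> list[dict]:
    """Turn the column-oriented 'hourly' block into one dict per timestamp."""
    hourly = payload["hourly"]
    return [
        {"time": t, "temperature_2m": temp}
        for t, temp in zip(hourly["time"], hourly["temperature_2m"])
    ]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise rows to CSV text (in practice you would write to a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["time", "temperature_2m"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Offline example: a hand-written payload with the same shape as the API response.
sample = json.loads(
    '{"hourly": {"time": ["2025-07-14T00:00", "2025-07-14T01:00"],'
    ' "temperature_2m": [17.1, 16.8]}}'
)
rows = hourly_to_rows(sample)
csv_text = rows_to_csv(rows)
loaded = list(csv.DictReader(io.StringIO(csv_text)))
print(loaded[0])
```

Values read back from a CSV file are strings (`'17.1'`), whereas JSON keeps numbers as numbers; this is one practical difference between the two formats. With network access, `hourly_to_rows(fetch_forecast(51.5, -0.12))` would produce the same kind of rows for central London.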

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Working with CSV/JSON and collecting weather data

🗓️ Day 03
(Wed 16 Jul)

Version Control and Professional Documentation

🥅 Objectives

At the end of the day, you should be able to:
  • Create a new Git repository and publish it to GitHub
  • Commit your first project files
  • Write professional documentation using Markdown

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Version Control with Git and Markdown documentation
What we’ll cover
  • An introduction to the “why” and “how” of version control with Git.
  • Core Git commands: committing files, pushing changes to a remote repository, and branching.
  • Using GitHub to collaborate and host your projects.
  • Writing clear project documentation using Markdown.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Setting up your first repository and documenting it

🗓️ Day 04
(Thu 17 Jul)

Creating Your First Data Visualisations

🥅 Objectives

At the end of the day, you should be able to:
  • Create line plots, scatter plots, and bar charts
  • Use these visualisations for weather data analysis

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Creating Your First Data Visualisations
What we’ll cover
  • Fundamental principles of effective data visualisation.
  • Creating static plots like line charts, scatter plots, and bar charts with Matplotlib and Seaborn.
  • Guidelines for communicating insights clearly and avoiding common pitfalls.
  • Using Generative AI to help write plotting code efficiently.
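As a rough illustration of the kind of plotting code covered on Day 04 (and that a generative AI assistant would typically suggest), here is a Matplotlib line chart with a title, labelled axes and units. The temperature values are placeholders, and the output file name `temperature.png` is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; you can drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder values, not real observations.
hours = list(range(6))
temperatures = [16.8, 16.2, 15.9, 16.5, 18.0, 19.4]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(hours, temperatures, marker="o")  # a line plot suits values ordered in time
ax.set_title("Hourly temperature (illustrative data)")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Temperature (°C)")  # always state units on the axis
ax.grid(alpha=0.3)  # light gridlines help reading values without clutter
fig.tight_layout()
fig.savefig("temperature.png", dpi=150)
```

Swapping `ax.plot` for `ax.scatter` or `ax.bar` gives the other two chart types listed in the objectives.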

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Visualising your weather data

💂 Enjoy!
(Fri - Sun)

No classes on Friday

Many students who attend the LSE Summer School are visiting London for the first time. If that is your case, here is a suggestion for a day out in London:

  • Take the number 15 double-decker bus from its starting point in Trafalgar Square
  • Enjoy views of some iconic London landmarks: the Royal Courts of Justice (near LSE), St Paul’s Cathedral, the Tower of London and Tower Bridge
  • Get off the bus at the ‘Aldgate East’ stop
  • Then walk up to Brick Lane. If you are hungry, I recommend a visit to the Upmarket Food Hall or, if you feel like queueing, the famous Beigel Shop

Week 02 | Analysis & Automation

(21 July - 27 July)

In our second week, we will shift our focus to efficiency and more advanced data handling. You will learn to write faster, cleaner code and work with authenticated APIs and databases, which are essential skills for real-world data projects.

🗓️ Day 01
(Mon 21 Jul)

From Loops to Vectorisation

🥅 Objectives

At the end of the day, you should be able to:
  • Convert Python loops to vectorised pandas operations
  • Refactor data collection code for efficiency

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 From Loops to Vectorisation
What we’ll cover
  • Understanding the concept of vectorised operations and why they are more efficient than loops.
  • Using pandas to replace slow, manual loops with fast, vectorised equivalents.
  • Leveraging Generative AI to help refactor and optimise data transformation code.
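A small sketch contrasting a Python loop with its vectorised pandas equivalent, using made-up temperatures and hypothetical column names. Both versions produce the same Fahrenheit values, but the vectorised expression operates on the whole column at once; the boolean mask and `pd.cut` show how `if`/`else` logic inside a loop can be replaced too.

```python
import pandas as pd

# Illustrative temperatures in Celsius (placeholder values).
df = pd.DataFrame({"temperature_c": [12.5, 18.0, 21.3, 25.8, 30.1]})

# Loop version: Python visits one value at a time (slow on large data).
fahrenheit_loop = []
for value in df["temperature_c"]:
    fahrenheit_loop.append(value * 9 / 5 + 32)
df["temperature_f_loop"] = fahrenheit_loop

# Vectorised version: one expression over the whole column.
df["temperature_f"] = df["temperature_c"] * 9 / 5 + 32

# Conditional logic: a boolean mask replaces an if/else inside a loop.
df["is_hot"] = df["temperature_c"] > 25

# Binning without a loop: pd.cut assigns each value to a labelled interval.
df["band"] = pd.cut(
    df["temperature_c"],
    bins=[-float("inf"), 15, 25, float("inf")],
    labels=["cool", "mild", "hot"],
)
print(df)
```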

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Refactoring loops for efficiency

🗓️ Day 02
(Tue 22 Jul)

Refactoring Workshop

🥅 Objectives

At the end of the day, you should be able to:
  • Prepare and submit the midterm assignment
  • Address specific challenges based on a review of student work

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Refactoring Workshop & Midterm Q&A
What we’ll cover
  • A guided workshop to help you apply vectorisation concepts to your own code.
  • Troubleshooting problems with Git/GitHub.
  • Q&A session to address any questions about the midterm assignment.

⌛ Midterm Deadline

Your midterm assignment is due by 8pm today.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Midterm Submission Support

🗓️ Day 03
(Wed 23 Jul)

Authenticated APIs & SQL Primer

🥅 Objectives

At the end of the day, you should be able to:
  • Access data from an authenticated API (e.g., Reddit)
  • Write introductory database queries using SQL

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Accessing Authenticated APIs and a Primer on SQL
What we’ll cover
  • An overview of common API authentication methods like API keys and OAuth.
  • Hands-on practice with an authenticated API (e.g., Spotify, Reddit, or YouTube).
  • Techniques for parsing the nested JSON data that APIs often return.
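Many authenticated APIs expect a token in an `Authorization: Bearer ...` header, although the exact scheme and how you obtain the token (a static API key or an OAuth flow) depend on the provider. The sketch below assumes a hypothetical `MY_API_TOKEN` environment variable and a made-up nested record; `flatten` is an illustrative helper that turns nested dictionaries into dotted column names.

```python
import os


def auth_headers(env_var: str = "MY_API_TOKEN") -> dict:
    """Build a bearer-token header from an environment variable.

    Reading the secret from the environment (not hard-coding it in a notebook)
    keeps it out of your Git history. The variable name is a placeholder.
    """
    token = os.environ.get(env_var)
    if token is None:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {token}"}


def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Recursively flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))  # recurse into nested objects
        else:
            flat[new_key] = value  # lists and scalars are kept as-is
    return flat


# A made-up record shaped like a typical nested API response.
post = {
    "id": "abc123",
    "title": "Example post",
    "author": {"name": "example_user", "karma": {"post": 10, "comment": 42}},
    "tags": ["python", "data"],
}
print(flatten(post))
```

For a list of such records, `pandas.json_normalize` performs a similar flattening and returns a DataFrame directly.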

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Working with Authenticated APIs and SQL

🗓️ Day 04
(Thu 24 Jul)

Data Reshaping: pandas vs SQL

🥅 Objectives

At the end of the day, you should be able to:
  • Join datasets using both pandas and SQL
  • Compare the two different methods for data merging

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Data Reshaping: pandas vs SQL
What we’ll cover
  • The logic of joining data from different sources (inner, outer, left, and right joins).
  • Combining datasets using both pandas in Python and directly with SQL.
  • Troubleshooting common issues that arise when merging real-world data.
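A minimal side-by-side comparison of the same left join in pandas and in SQL, assuming two small made-up tables (`cities` and `weather`) loaded into an in-memory SQLite database. Lisbon has no weather rows, so it appears once with a missing temperature, while the weather row for `city_id` 4 (no matching city) is dropped; both approaches return the same rows.

```python
import sqlite3

import pandas as pd

# Two small illustrative tables (made-up values).
cities = pd.DataFrame({"city_id": [1, 2, 3], "name": ["London", "Paris", "Lisbon"]})
weather = pd.DataFrame({"city_id": [1, 1, 2, 4], "temp_c": [18.2, 19.0, 21.5, 25.0]})

# pandas: a left join keeps every city, even those without weather rows.
merged_pd = cities.merge(weather, on="city_id", how="left")

# SQL: load the same tables into an in-memory SQLite database and join there.
conn = sqlite3.connect(":memory:")
cities.to_sql("cities", conn, index=False)
weather.to_sql("weather", conn, index=False)
merged_sql = pd.read_sql_query(
    """
    SELECT c.city_id, c.name, w.temp_c
    FROM cities AS c
    LEFT JOIN weather AS w ON c.city_id = w.city_id
    ORDER BY c.city_id, w.temp_c
    """,
    conn,
)
conn.close()

print(merged_pd)
print(merged_sql)
```

Changing `how="left"` to `"inner"`, `"right"` or `"outer"` gives the other join types; SQLite supports `RIGHT JOIN` and `FULL OUTER JOIN` only from version 3.39 onwards.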

📒 Project Announcement

The final project requirements will be revealed.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Designing good exploratory questions

💂 Enjoy!
(Fri - Sun)

No classes on Friday

This is a great opportunity to make progress on your final project, but don’t forget to take a break and explore the city! Another suggestion is a visit to Borough Market, one of the largest and oldest food markets in London.

Week 03 | Databases & Pipelines

(28 July - 01 Aug)

The final week is about building robust data systems and communicating your findings effectively. You will learn how to design databases, scrape data from the web, and create interactive dashboards to present your analysis in a compelling way.

🗓️ Day 01
(Mon 28 Jul)

Designing Databases and Merging Data

🥅 Objectives

At the end of the day, you should be able to:
  • Design a database schema for an SQLite database
  • Join datasets using pandas and/or SQL

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Designing Databases and Merging Data
What we’ll cover
  • Database fundamentals: tables, primary keys, and foreign keys.
  • When to choose a database over a simple file-based system.
  • Writing basic SQL queries to select, filter, group, and order data.
  • Querying an SQLite database directly from Python.
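A sketch of a two-table SQLite schema with a primary key and a foreign key, queried from Python with the built-in `sqlite3` module. The table and column names are hypothetical and the readings are placeholder values. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been run on each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE cities (
        city_id INTEGER PRIMARY KEY,   -- primary key: one row per city
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE readings (
        reading_id INTEGER PRIMARY KEY,
        city_id    INTEGER NOT NULL REFERENCES cities (city_id),  -- foreign key
        day        TEXT NOT NULL,
        temp_c     REAL
    );
    """
)

conn.executemany("INSERT INTO cities (city_id, name) VALUES (?, ?)", [(1, "London"), (2, "Paris")])
conn.executemany(
    "INSERT INTO readings (city_id, day, temp_c) VALUES (?, ?, ?)",
    [(1, "2025-07-28", 19.5), (1, "2025-07-29", 21.0), (2, "2025-07-28", 24.0)],
)
conn.commit()

# Select, filter, join, group and order in a single query.
query = """
    SELECT c.name, COUNT(*) AS n_readings, AVG(r.temp_c) AS mean_temp
    FROM readings AS r
    JOIN cities AS c ON c.city_id = r.city_id
    WHERE r.temp_c IS NOT NULL
    GROUP BY c.name
    ORDER BY mean_temp DESC
"""
rows = conn.execute(query).fetchall()
print(rows)

# The foreign key rejects a reading for a city that does not exist.
try:
    conn.execute(
        "INSERT INTO readings (city_id, day, temp_c) VALUES (?, ?, ?)",
        (99, "2025-07-28", 10.0),
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```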

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Designing schemas and joining data

🗓️ Day 02
(Tue 29 Jul)

A Primer on Web Scraping and Text Mining

🥅 Objectives

At the end of the day, you should be able to:
  • Parse HTML content from a web page
  • Extract data from public websites (web scraping)

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 A Primer on Web Scraping and Text Mining
What we’ll cover
  • When to use web scraping vs. APIs, and the trade-offs of each approach.
  • The ethical and legal considerations of collecting data from websites.
  • Using Python’s Scrapy library to parse HTML from web pages.
  • Extracting structured data from simple HTML layouts.
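The lecture uses Scrapy; to keep this sketch dependency-free, it swaps in Python's built-in `html.parser` module instead. The HTML snippet and its `event`, `title` and `date` class names are made up. In Scrapy, the same extraction would usually be written with CSS selectors such as `li.event span.title::text`.

```python
from html.parser import HTMLParser

# A tiny, made-up HTML snippet standing in for a downloaded page.
HTML = """
<ul class="events">
  <li class="event"><span class="title">Data Talk</span> <span class="date">2025-07-29</span></li>
  <li class="event"><span class="title">Python Meetup</span> <span class="date">2025-07-30</span></li>
</ul>
"""


class EventParser(HTMLParser):
    """Collect the text of <span class="title"> and <span class="date"> for each event."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._field = None  # which span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "event":
            self.events.append({})  # start a new record for each list item
        elif tag == "span" and attrs.get("class") in ("title", "date"):
            self._field = attrs["class"]

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field and self.events:
            self.events[-1][self._field] = data.strip()


parser = EventParser()
parser.feed(HTML)
print(parser.events)
```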

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Supervised web scraping and text analysis practice

🗓️ Day 03
(Wed 30 Jul)

Building Interactive Visualisations and Dashboards

🥅 Objectives

At the end of the day, you should be able to:
  • Create interactive charts using Plotly
  • Build a simple data dashboard using Streamlit

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Building Interactive Visualisations and Dashboards
What we’ll cover
  • Moving beyond static plots to create interactive visualisations using Plotly.
  • Building simple web dashboards using Streamlit to showcase your data stories.
  • Best practices for user interface design and interactive data presentation.

Afternoon Class
2.00pm - 5.00pm

💻 Lab: Building interactive charts and dashboards

🗓️ Day 04
(Thu 31 Jul)

Data Storytelling & Project Management

🥅 Objectives

At the end of the day, you should be able to:
  • Document a data project in a professional manner
  • Establish a reproducible data analysis workflow

Morning Lecture
10.00am - 1.00pm

🧑‍🏫 Data Storytelling & Project Management
What we’ll cover
  • Practical advice for managing collaborative projects with Git and GitHub.
  • The importance of data versioning and ensuring your analysis is reproducible.
  • Best practices for setting up a well-documented and professional project repository.
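One lightweight way to version data alongside code is to record a checksum for each data file in a manifest that you commit to Git, so you can tell when an input has changed between runs. This is a sketch of that idea, not the course's prescribed workflow; the `data/` layout and the `manifest.json` name are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest_path: Path) -> dict:
    """Record a checksum for every file under data_dir and save it as JSON."""
    manifest = {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


# Example in a throwaway directory; in a real project, point it at your data folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "weather.csv").write_text("time,temperature_2m\n")
    manifest = write_manifest(data_dir, Path(tmp) / "manifest.json")

print(manifest)
```

If a later run produces a different hash for the same file, the underlying data changed and any results derived from it should be regenerated.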

Afternoon Class
2.00pm - 5.00pm

🦸 Super Tech Support: Final project work session

🗓️ Day 05
(Fri 01 Aug)

⏳ Deadline:

Submit your final project by 6pm today. ✍️