💻 Week 02, Day 01 - Lab

From Loops to Vectorisation: Makeover Puzzles

Author

Dr Jon Cardoso-Silva

Last updated

21 July 2025

🥅 Learning Objectives

By the end of this lab, you should be able to: i) Use a list comprehension to process a list of datasets. ii) Combine a list of DataFrames into a single one using pd.concat. iii) Apply the .diff() and .cumsum() pattern to identify streaks in data. iv) Use groupby() and transform() to calculate group-specific statistics for your analysis.

ME204 course icon

In this morning’s lecture, we saw how to replace slow for loops with list comprehensions and powerful pandas vectorised operations. Now, it’s your turn to get some hands-on practice.

This lab contains two “puzzles” that challenge you to apply these new techniques to realistic problems.

Monday, 21 July 2025 | Either 2:00-3.30pm or 3.30-5:00pm 📍 Check your timetable for the location of your class

🛣️ Lab Roadmap

This document outlines the exercises you’ll complete in your Jupyter Notebook. Open ME204_W02D01_Lab.ipynb and follow along.

A Note on the Midterm Assignment

Your midterm is due tomorrow. If you feel that you are behind on it, you can use today’s lab session to work on that instead.

Stuart will be running the lab session as planned, focusing on the two puzzles below. This is valuable practice, but your priority should be the midterm if you need the time.

Part I: Puzzle 1 - Sales Surge Detection (45 min)

Imagine you’re a data analyst for a retail company. You need to find periods of unusually high sales, which are called “sales surges.”

Your Task: Identify “sales surges” which are defined as periods of 3 or more consecutive days where a store’s daily_sales has exceeded 120% of that store’s median sales for the entire period.

🎯 Your Turn

In your notebook, you’ll find the code that generates the df_sales DataFrame.

Your task is to apply the multi-step pattern we learned in the lecture to find the surges. Think back to how we solved the heatwave problem:

  1. First, establish a baseline for each store.
  2. Then, identify the days that meet the “surge” condition.
  3. Group the consecutive surge days together.
  4. Count the length of each group.
  5. Finally, create a report showing only the valid surges.

Part II: Puzzle 2 - Student Performance Streaks (45 min)

For the second puzzle, you’re an analyst trying to find students who are on an “improvement streak.”

Your Task: Find “improvement streaks”, which here are defined as periods of 3 or more consecutive assessments where a student’s score in a subject has increased compared to their previous one.

🎯 Your Turn

The notebook also has a function that creates a df_grades DataFrame.

The logical pattern here is almost identical to the first puzzle. The key differences are:

  1. You’ll need to groupby() both the student and the subject to analyse them separately.
  2. Your condition for “improvement” will depend on the difference from the previous assessment (.diff()).

Your final report should be a summary table showing each valid streak and the total score improvement during that streak.