💻 Week 04 Lab

From NumPy to Pandas

Published: 13 February 2025

🥅 Learning Goals
By the end of this lab, you should understand:

  • The limitations of nested np.where() operations
  • How Pandas can make complex logic more readable
  • When to use NumPy vs when to use Pandas
  • The value of writing maintainable code

Last Updated: 13 February 2025, 15:00 GMT

📍 Time and Location: Friday, 14 February 2025. Check your timetable for the precise time and location of your class.

📋 Preparation

Before starting this lab, make sure you:

  • Have completed the 📝 W04 Formative Exercise
  • Are caught up with 🗣️ W04 Lecture content
  • Have your notes about NumPy and Pandas up-to-date on Nuvolos

📣 We will play with pair programming today.

One of you will be the 🧑‍✈️ Pilot, the person typing the code. The other(s) will act as 🙋 Copilot(s) and will help guide the Pilot.

💡 TIP 1: You can swap roles between Part I and Part II of this lab.

💡 TIP 2: If you are playing the role of 🙋 Copilot, resist the urge to take over if the Pilot is struggling. Instead, try to explain your thinking differently until you both understand what to do.

🛣️ Lab Roadmap

As soon as you arrive in class, you can start setting up your notebook. Your class teacher will mediate the next parts of the session.

Set Up (15 mins)

Note to class teachers: Tell students that we will do a pair programming exercise soon and if they already have friends in this class, they can already sit together. First, though, ask them to follow the instructions below to set up their notebook.

First, let’s create a new notebook and structure it properly, with a good mix of Markdown and Python cells.

🎯 ACTION POINTS:

  1. Open VS Code in Nuvolos

  2. Navigate to the folder “Week 04 - Intro to Pandas”

  3. Create a new Jupyter Notebook:

    • Right-click the folder and select “New File”
    • Name it “W04 Lab - NumPy vs Pandas.ipynb”
  4. Set up your notebook with these cells:

    Cell 1 (Markdown): Add the title and your details

    # W04 Lab: From NumPy to Pandas
    
    **Author:** [Your Name]
    **Date:** 14 February 2025

    Cell 2 (Markdown): Add a section for imports

    ## Library Imports

    Cell 3 (Python): Add the required libraries

    import numpy as np
    import pandas as pd
    import requests

    Cell 4 (Markdown): Add a section for data collection

    ## Data Collection
    
    Collecting weather data for London (Summer 2024) using the Open-Meteo API.

    Cell 5 (Python): Add the data collection code

    # Collect weather data
    url = "https://archive-api.open-meteo.com/v1/archive"
    params = {
        "latitude": 51.5085,  # London
        "longitude": -0.1257,
        "start_date": "2024-06-01",
        "end_date": "2024-08-31",
        "daily": ["temperature_2m_max", "rain_sum"],
        "timezone": "GMT"
    }
    
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # stop early if the API returned an error
    weather_data = response.json()
    
    # Extract the data we need
    temperatures = weather_data['daily']['temperature_2m_max']
    rainfall = weather_data['daily']['rain_sum']
    dates = weather_data['daily']['time']

    Cell 6 (Markdown): Add a section for Part I

    ## Part I: The NumPy Approach
    
    To practice moving away from lists and using vectorised operations, we'll first create a classification system using `np.where()`:

This structure will help keep your notebook organised and easy to follow.
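The values extracted in Cell 5 are plain Python lists, so a natural next Python cell converts them to NumPy arrays before Part I. The sketch below is only an illustration: it uses short made-up lists in place of the real API response, and the names `temps_arr`, `rain_arr`, `dates_arr` and `is_hot` are hypothetical, so pick whatever names suit your notebook.

```python
import numpy as np

# Hypothetical stand-ins for the lists extracted in Cell 5 (made-up values)
temperatures = [26.1, 19.4, 22.8]
rainfall = [0.0, 3.2, 0.4]
dates = ["2024-06-01", "2024-06-02", "2024-06-03"]

# Convert the lists to arrays so that comparisons apply element-wise
temps_arr = np.array(temperatures, dtype=float)
rain_arr = np.array(rainfall, dtype=float)
dates_arr = np.array(dates, dtype="datetime64[D]")

# A vectorised comparison returns one boolean per day, with no loop needed
is_hot = temps_arr > 25
print(is_hot)
```

Checking that all three arrays have the same shape at this point is a cheap way to catch extraction mistakes before you start classifying.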

Part I: The Limits of NumPy (30 mins)

Note to class teachers: This is where students often struggle with nested np.where(). If many are stuck, briefly show the lecture example (slides and/or notebook). Your role is to encourage discussion between pairs rather than giving solutions. Watch for Copilots who try to take over typing - remind them their role is to explain their thinking. We’re preparing them to work as a team in the future.

🧑‍✈️ Pilot and 🙋 Copilot(s) will work together to classify London’s summer days based on temperature and rainfall.

Let’s look at our classification system:

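| Category   | Temperature Condition | Precipitation Condition |
|------------|-----------------------|-------------------------|
| Hot & Dry  | > 25°C                | < 1mm                   |
| Hot & Wet  | > 25°C                | ≥ 1mm                   |
| Mild & Dry | 20-25°C               | < 1mm                   |
| Mild & Wet | 20-25°C               | ≥ 1mm                   |
| Cool       | < 20°C                | any                     |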

🎯 ACTION POINTS:

  1. 👥 Together: Discuss the classification rules and plan how to implement them with np.where()
  2. 🧑‍✈️ Pilot: Implement the classification using np.where()
  3. 🙋 Copilot(s): Document any challenges you face in the process

BE WARNED: the point of this exercise is to practise writing code that works on whole arrays and replaces long if-else statements. The code WILL look ugly, though. That is part of the learning process. In Part II you will write more elegant code. (Those who came to the 🗣️ W04 Lecture will understand better why we are doing this.)
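To give a sense of what the nesting looks like, here is one possible sketch of the classification table with np.where(). It is not the only way to write it, and it is worth trying your own version before comparing. It runs on made-up values rather than the API data, the names `temps`, `rain` and `categories` are hypothetical, and it assumes that days at exactly 20°C or 25°C count as Mild.

```python
import numpy as np

# Made-up example values, chosen to hit every category in the table
temps = np.array([27.0, 26.5, 22.0, 21.0, 18.0])
rain = np.array([0.0, 2.0, 0.5, 1.0, 4.0])

# Each np.where(condition, value_if_true, value_if_false) plays the role
# of one if/else branch, so five categories need several nested calls.
categories = np.where(
    temps > 25,
    np.where(rain < 1, "Hot & Dry", "Hot & Wet"),
    np.where(
        temps >= 20,  # assumption: 20°C and 25°C both count as Mild
        np.where(rain < 1, "Mild & Dry", "Mild & Wet"),
        "Cool",
    ),
)
print(categories)
```

Notice how you have to read the expression from the outside in to follow the logic, and how adding a sixth category would mean another level of nesting.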

Part II: The Pandas Solution (30 mins)

Note to class teachers: The key insight here is structuring the function. If pairs are jumping straight to coding, pause them and encourage the “Together” brainstorming step. Look for good examples of clean Pandas solutions to share in the final discussion.

You might want to swap roles with your partner at this point.

In this part, you’re going to rewrite our weather classification system, this time using Pandas DataFrames instead of NumPy arrays. We hope that by the end you’ll see how the resulting code is more readable and easier to understand (even though you might struggle to produce it on the first go).

You will need to:

  • Convert your NumPy arrays to a Pandas DataFrame
  • Create a custom function that takes a row of data as input
  • Test the function with a single row of data before applying it to the whole DataFrame
  • Use the .apply() method to classify all the weather data at once

🎯 ACTION POINTS:

  1. 🧑‍✈️ Pilot: Convert your NumPy arrays to a Pandas DataFrame
  2. 👥 Together: Plan your function: what inputs it needs and what category it should return for each combination
  3. 🧑‍✈️ Pilot: Write the function and use .apply() to classify all days
  4. 🙋 Copilot(s): Compare this solution with the NumPy version - which is easier to understand?
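The steps above could look roughly like the sketch below. Treat it as one possible shape rather than the expected answer: the column names (`date`, `temperature`, `rainfall`, `category`) and the function name `classify_day` are hypothetical, the values are made up, and it again assumes that exactly 20°C and 25°C count as Mild.

```python
import numpy as np
import pandas as pd

# Made-up example values standing in for the arrays from Part I
dates = np.array(["2024-06-01", "2024-06-02", "2024-06-03"], dtype="datetime64[D]")
temps = np.array([27.0, 22.0, 18.0])
rain = np.array([0.0, 1.5, 4.0])

# Step 1: put the arrays side by side as named columns
df = pd.DataFrame({"date": dates, "temperature": temps, "rainfall": rain})


def classify_day(row):
    """Return the weather category for one row of the DataFrame."""
    if row["temperature"] < 20:
        return "Cool"
    warmth = "Hot" if row["temperature"] > 25 else "Mild"
    wetness = "Dry" if row["rainfall"] < 1 else "Wet"
    return f"{warmth} & {wetness}"


# Step 2: test the function on a single row before using it everywhere
first_label = classify_day(df.iloc[0])
print(first_label)

# Step 3: apply it to every row (axis=1 passes rows, not columns)
df["category"] = df.apply(classify_day, axis=1)
print(df)
```

Because the rules live in an ordinary function with plain if statements, adding a new category is a matter of adding a branch rather than another layer of nesting.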

Part III: Class Discussion (15 mins)

Note to class teachers: Share 1-2 contrasting examples from the class - ideally one complex NumPy solution and one clean Pandas solution. Guide discussion toward maintainability and readability rather than just getting the right answer. If time permits, ask how they would modify each solution to add a new weather category.

Your class teacher will:

  1. Show both solutions side by side
  2. Lead a discussion about the pros and cons of each approach
  3. Collect your thoughts on:
    • Which approach was easier to implement?
    • Which code is easier to read?
    • Which would be easier to modify if we added new categories?

You should now be ready for the ✍️ Mini Project 1, your first graded assignment, due 27 February at 8pm.

🏡 Take-Home Activity

If you didn’t finish the lab tasks during class time:

  1. Complete both the NumPy and Pandas implementations

  2. Add your own notes about NumPy and Pandas to the “Week 04 - Intro to Pandas” folder on Nuvolos, then commit and push your changes so that you can review them later

    • Compare the syntax differences between NumPy and Pandas
    • Note which operations were easier/harder in each library
    • Document any error messages you encountered and how you fixed them
    • Save examples of both working solutions for future reference
  3. Think about when you might prefer one approach over the other

    • Consider how ‘self-explanatory’ your code is (do you need a minute to work out what it does, or do you understand it straight away?)
    • Think about performance (time and memory) differences between the two approaches
    • Reflect on which version would be easier to explain to others

This will help prepare you for future assignments where youโ€™ll need to make similar implementation choices.
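For the performance question in the take-home activity, one rough way to compare the two approaches is to time them on a larger array. The sketch below is only illustrative: it uses randomly generated values rather than real weather data, the names are hypothetical, it measures time but not memory, and the exact timings will vary from machine to machine. Row-wise .apply() is typically slower than a vectorised NumPy expression, which is the trade-off to weigh against readability.

```python
import time

import numpy as np
import pandas as pd

# Randomly generated values (not real weather data), large enough to time
rng = np.random.default_rng(seed=0)
n = 10_000
temps = rng.uniform(10, 35, size=n)
rain = rng.uniform(0, 5, size=n)
df = pd.DataFrame({"temperature": temps, "rainfall": rain})


def classify_day(row):
    """Return the weather category for one row of the DataFrame."""
    if row["temperature"] < 20:
        return "Cool"
    warmth = "Hot" if row["temperature"] > 25 else "Mild"
    wetness = "Dry" if row["rainfall"] < 1 else "Wet"
    return f"{warmth} & {wetness}"


# Time the nested np.where() version
start = time.perf_counter()
numpy_labels = np.where(
    temps > 25,
    np.where(rain < 1, "Hot & Dry", "Hot & Wet"),
    np.where(
        temps >= 20,
        np.where(rain < 1, "Mild & Dry", "Mild & Wet"),
        "Cool",
    ),
)
numpy_seconds = time.perf_counter() - start

# Time the row-wise Pandas version
start = time.perf_counter()
pandas_labels = df.apply(classify_day, axis=1)
pandas_seconds = time.perf_counter() - start

print(f"np.where(): {numpy_seconds:.4f} s")
print(f".apply():   {pandas_seconds:.4f} s")

# Both approaches should agree on every label
same_labels = numpy_labels.tolist() == pandas_labels.tolist()
print("Same labels:", same_labels)
```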