💻 Week 04 Lab
From NumPy to Pandas
data:image/s3,"s3://crabby-images/7fdd0/7fdd0050b7606db98e8d2078680d19e8aa594be3" alt="Image representing data transformation and discovery themes."
Last Updated: 13 February 2025, 15:00 GMT
Time and Location: Friday, 14 February 2025. Check your timetable for the precise time and location of your class.
Preparation
Before starting this lab, make sure you:
- Have completed the W04 Formative Exercise
- Are caught up with 🗣️ W04 Lecture content
- Have your notes about NumPy and Pandas up to date on Nuvolos
📣 We will try out pair programming today.
One of you will be the 🧑‍✈️ Pilot, the person typing the code. The other(s) will act as Copilot(s) and help guide the Pilot.
💡 TIP 1: You can swap roles between Part I and Part II of this lab.
💡 TIP 2: If you are playing the role of Copilot, resist the urge to take over if the Pilot is struggling. Instead, try to explain your thinking differently until you both understand what to do.
🛣️ Lab Roadmap
As soon as you arrive in class, you can start setting up your notebook. Your class teacher will mediate the next parts of the session.
Set Up (15 mins)
Note to class teachers: Tell students that we will do a pair programming exercise soon and that, if they already have friends in this class, they can sit together now. First, though, ask them to follow the instructions below to set up their notebook.
First, let's create a new notebook and structure it properly, with a good mix of Markdown and Python cells.
🎯 ACTION POINTS:
1. Open VS Code in Nuvolos
2. Navigate to the folder "Week 04 - Intro to Pandas"
3. Create a new Jupyter Notebook:
   - Right-click the folder and select "New File"
   - Name it "W04 Lab - NumPy vs Pandas.ipynb"
4. Set up your notebook with these cells:
Cell 1 (Markdown): Add the title and your details
```markdown
# W04 Lab: From NumPy to Pandas

**Author:** [Your Name]

**Date:** 14 February 2025
```
Cell 2 (Markdown): Add a section for imports
```markdown
## Library Imports
```
Cell 3 (Python): Add the required libraries
```python
import numpy as np
import pandas as pd
import requests
```
Cell 4 (Markdown): Add a section for data collection
```markdown
## Data Collection

Collecting weather data for London (Summer 2024) using the Open-Meteo API.
```
Cell 5 (Python): Add the data collection code
```python
# Collect weather data
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
    "latitude": 51.5085,   # London
    "longitude": -0.1257,
    "start_date": "2024-06-01",
    "end_date": "2024-08-31",
    "daily": ["temperature_2m_max", "rain_sum"],
    "timezone": "GMT"
}
response = requests.get(url, params=params)
weather_data = response.json()

# Extract the data we need
temperatures = weather_data['daily']['temperature_2m_max']
rainfall = weather_data['daily']['rain_sum']
dates = weather_data['daily']['time']
```
Cell 6 (Markdown): Add a section for Part I
```markdown
## Part I: The NumPy Approach

To practice moving away from lists and using vectorised operations, we'll first create a classification system using `np.where()`.
```
This structure will help keep your notebook organized and easy to follow.
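Note that the API returns plain Python lists, and a comparison such as `temperatures > 25` only works element-wise once they are NumPy arrays. A minimal sketch of that conversion, using a made-up stand-in for `weather_data` (same shape as in Cell 5, but the values are illustrative only, not real London readings):

```python
import numpy as np

# Made-up stand-in for the API response, shaped like weather_data in Cell 5.
# These values are illustrative only, NOT real London readings.
weather_data = {
    "daily": {
        "time": ["2024-06-01", "2024-06-02", "2024-06-03", "2024-06-04"],
        "temperature_2m_max": [18.4, 22.1, 26.7, 24.9],
        "rain_sum": [3.2, 0.0, 0.4, 1.5],
    }
}

# Convert the lists into arrays. On a plain list, `> 25` raises a TypeError;
# on an array it compares every element. dtype=float turns any None into nan.
temperatures = np.array(weather_data["daily"]["temperature_2m_max"], dtype=float)
rainfall = np.array(weather_data["daily"]["rain_sum"], dtype=float)
dates = np.array(weather_data["daily"]["time"])

print(temperatures > 25)  # [False False  True False]
```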
Part I: The Limits of NumPy (30 mins)
Note to class teachers: This is where students often struggle with nested `np.where()` calls. If many are stuck, briefly show the lecture example (slides and/or notebook). Your role is to encourage discussion between pairs rather than to give solutions. Watch for Copilots who try to take over typing; remind them that their role is to explain their thinking. We're preparing them to work as a team in the future.
🧑‍✈️ Pilot and Copilot(s) will work together to classify London's summer days based on temperature and rainfall.
Letโs look at our classification system:
| Category | Temperature Condition | Precipitation Condition |
|---|---|---|
| Hot & Dry | > 25°C | < 1 mm |
| Hot & Wet | > 25°C | ≥ 1 mm |
| Mild & Dry | 20-25°C | < 1 mm |
| Mild & Wet | 20-25°C | ≥ 1 mm |
| Cool | < 20°C | any |
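In NumPy, each condition in the table becomes a boolean array (a mask) holding one True/False per day. A minimal sketch on made-up values (not real London data), building only the Hot & Dry mask; the variable names are our own:

```python
import numpy as np

# Made-up sample values (not real data): daily max temperature (°C) and rainfall (mm)
temperatures = np.array([18.4, 22.1, 26.7, 27.3])
rainfall = np.array([3.2, 0.0, 0.4, 1.5])

# One boolean mask per condition, with one True/False per day
is_hot = temperatures > 25   # [False False  True  True]
is_dry = rainfall < 1        # [False  True  True False]

# `&` combines masks element-wise. Written inline, each comparison needs
# parentheses because `&` binds more tightly than `>` and `<`:
# (temperatures > 25) & (rainfall < 1)
hot_and_dry = is_hot & is_dry

print(hot_and_dry)        # [False False  True False]
print(hot_and_dry.sum())  # 1 (True counts as 1, so this counts Hot & Dry days)
```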
🎯 ACTION POINTS:
- 👥 Together: Discuss the classification rules and plan how to implement them with `np.where()`
- 🧑‍✈️ Pilot: Implement the classification using `np.where()`
- Copilot(s): Document any challenges you face in the process
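One challenge worth documenting if you hit it: Python's `if`, `and` and `or` need a single True/False, so using them on a whole array raises a `ValueError`. A minimal sketch with made-up values showing the error and the array-friendly alternative:

```python
import numpy as np

# Made-up sample values (not real data)
temperatures = np.array([18.4, 26.7, 22.1])

# `if` needs ONE True/False, but the comparison produces one boolean
# per day, so NumPy refuses to guess and raises a ValueError.
try:
    if temperatures > 25:
        label = "Hot"
except ValueError as err:
    print("ValueError:", err)

# Array-friendly alternative: keep the comparison as a boolean mask
is_hot = temperatures > 25
print(is_hot)  # [False  True False]
```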
BE WARNED: the point of this exercise is to practice writing code that works on whole arrays and replaces long if-else statements. The code WILL look ugly, though. That is part of the learning process. In Part II you will write more elegant code. (Those who came to the 🗣️ W04 Lecture will understand better why we are doing this.)
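A minimal sketch of the nesting pattern on made-up temperatures, deliberately limited to the three temperature bands so the rainfall split is still yours to work out. Each `np.where(condition, value_if_true, value_if_false)` handles one decision, and the next decision goes in the "false" slot:

```python
import numpy as np

# Made-up sample temperatures (°C), not real data
temperatures = np.array([18.4, 22.1, 26.7, 25.0, 20.0])

# Each np.where() makes one decision; the "otherwise" branch holds
# the next np.where(), which is why the nesting grows so quickly.
temp_band = np.where(
    temperatures > 25,
    "Hot",
    np.where(temperatures >= 20, "Mild", "Cool"),
)

print(temp_band)  # ['Cool' 'Mild' 'Hot' 'Mild' 'Mild']
```

Adding the rainfall conditions means nesting further inside the "Hot" and "Mild" slots, which is exactly where readability starts to suffer.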
Part II: The Pandas Solution (30 mins)
Note to class teachers: The key insight here is structuring the function. If pairs are jumping straight to coding, pause them and encourage the "Together" brainstorming step. Look for good examples of clean Pandas solutions to share in the final discussion.
You might want to swap roles with your partner at this point.
In this part, you're going to rewrite our weather classification system, this time using Pandas DataFrames instead of NumPy arrays. We hope that by the end you'll see how the resulting code is more readable (even though you might struggle to produce it on the first attempt).
You will need to:
- Convert your NumPy arrays to a Pandas DataFrame
- Create a custom function that takes a row of data as input
- Test the function with a single row of data before applying it to the whole DataFrame
- Use the `.apply()` method to classify all the weather data at once
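The four steps above might look like this in a minimal sketch. It assumes made-up sample arrays, column names of our own choosing (`date`, `max_temp`, `rain`) and a hypothetical `classify_temperature` function that only handles the temperature bands, so the full five-category logic is left for you:

```python
import numpy as np
import pandas as pd

# Made-up sample arrays standing in for the ones from Part I (not real data)
dates = np.array(["2024-06-01", "2024-06-02", "2024-06-03"])
temperatures = np.array([18.4, 22.1, 26.7])
rainfall = np.array([3.2, 0.0, 0.4])

# 1. Convert the arrays into a DataFrame (column names are our own choice)
df = pd.DataFrame({
    "date": pd.to_datetime(dates),
    "max_temp": temperatures,
    "rain": rainfall,
})

# 2. Hypothetical row function: receives one row (a Series), returns a label.
#    Only the temperature bands are handled; the rainfall split is left to you.
def classify_temperature(row):
    if row["max_temp"] > 25:
        return "Hot"
    elif row["max_temp"] >= 20:
        return "Mild"
    else:
        return "Cool"

# 3. Test the function on a single row before applying it everywhere
print(classify_temperature(df.iloc[0]))  # Cool

# 4. axis=1 makes .apply() pass each row (not each column) to the function
df["temp_band"] = df.apply(classify_temperature, axis=1)
print(df)
```

Because the function reads each row by column name, adding a rainfall rule later means adding an `if` inside one readable function rather than another layer of nesting.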
🎯 ACTION POINTS:
- 🧑‍✈️ Pilot: Convert your NumPy arrays to a Pandas DataFrame
- 👥 Together: Plan your function: what inputs it needs and what category it should return for each combination
- 🧑‍✈️ Pilot: Write the function and use `.apply()` to classify all days
- Copilot(s): Compare this solution with the NumPy version. Which is easier to understand?
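Alongside comparing readability, it can help to check that both versions actually agree. A minimal sketch, again on made-up values, covering the temperature bands only and using a hypothetical `classify_temperature` helper:

```python
import numpy as np
import pandas as pd

# Made-up sample data (not real readings)
temperatures = np.array([18.4, 22.1, 26.7, 25.0])
df = pd.DataFrame({"max_temp": temperatures})

# NumPy version: nested np.where over the whole array
numpy_labels = np.where(temperatures > 25, "Hot",
                        np.where(temperatures >= 20, "Mild", "Cool"))

# Pandas version: a named row function applied with .apply(axis=1)
def classify_temperature(row):  # hypothetical helper, temperature bands only
    if row["max_temp"] > 25:
        return "Hot"
    if row["max_temp"] >= 20:
        return "Mild"
    return "Cool"

pandas_labels = df.apply(classify_temperature, axis=1)

# Element-wise comparison; .all() is True only if every day got the same label
print((pandas_labels == numpy_labels).all())  # True
```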
Part III: Class Discussion (15 mins)
Note to class teachers: Share 1-2 contrasting examples from the class - ideally one complex NumPy solution and one clean Pandas solution. Guide discussion toward maintainability and readability rather than just getting the right answer. If time permits, ask how they would modify each solution to add a new weather category.
Your class teacher will:
- Show both solutions side by side
- Lead a discussion about the pros and cons of each approach
- Collect your thoughts on:
- Which approach was easier to implement?
- Which code is easier to read?
- Which would be easier to modify if we added new categories?
You should now be ready for Mini Project 1, your first graded assignment, due 27 February at 8pm.
🏡 Take-Home Activity
If you didnโt finish the lab tasks during class time:
1. Complete both the NumPy and Pandas implementations
2. Add your own notes about NumPy and Pandas to the "Week 04 - Intro to Pandas" folder on Nuvolos, then commit and push your changes so that you can review them later. In your notes:
   - Compare the syntax differences between NumPy and Pandas
   - Note which operations were easier/harder in each library
   - Document any error messages you encountered and how you fixed them
   - Save examples of both working solutions for future reference
3. Think about when you might prefer one approach over the other:
   - Consider how self-explanatory your code is (do you need a minute to understand it, or can you understand it straight away?)
   - Think about performance (time and memory) differences between the two approaches
   - Reflect on which version would be easier to explain to others
This will help prepare you for future assignments where youโll need to make similar implementation choices.
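For the performance question, a rough sketch using Python's built-in `timeit` on 10,000 synthetic temperatures (made-up random data, temperature bands only, hypothetical `classify_temperature` helper). Exact timings depend on your machine, but a row-wise `.apply(axis=1)` calls a Python function once per row, so it is typically much slower than the vectorised `np.where()`; readability and speed are a trade-off worth weighing.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 10,000 made-up temperatures, used only for timing
rng = np.random.default_rng(seed=0)
temperatures = rng.uniform(10, 32, size=10_000)
df = pd.DataFrame({"max_temp": temperatures})

def classify_temperature(row):  # hypothetical helper, temperature bands only
    if row["max_temp"] > 25:
        return "Hot"
    if row["max_temp"] >= 20:
        return "Mild"
    return "Cool"

def numpy_version():
    # Vectorised: the comparisons run over the whole array at once
    return np.where(temperatures > 25, "Hot",
                    np.where(temperatures >= 20, "Mild", "Cool"))

def pandas_version():
    # Row-wise: the Python function is called once per row
    return df.apply(classify_temperature, axis=1)

# Run each version a few times and report the average seconds per run
for name, func in [("np.where", numpy_version), (".apply(axis=1)", pandas_version)]:
    seconds = timeit.timeit(func, number=5) / 5
    print(f"{name:>15}: {seconds:.4f} s per run")
```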