💻 Week 04 Lab

From NumPy to Pandas: Pair Programming

Author

Dr Jon Cardoso-Silva

Published

13 December 2025

🥅 Learning Goals

By the end of this lab, you should be able to:

Recognise the limitations of nested np.where() operations
Use Pandas vectorised boolean operations for complex logic
Chain multiple conditions with the & and | operators
Apply conditional logic to create new DataFrame columns efficiently

This lab builds directly on 🖥️ W04 Lecture where you learned about vectorisation with NumPy and Pandas. Today you’ll work in pairs to experience why Pandas vectorised operations are often clearer than nested np.where() calls.

📋 Preparation

Attend or at least watch the 🖥️ W04 Lecture, where you saw NumPy arrays, np.where(), and Pandas DataFrames
Find a partner for pair programming. You’ll swap roles halfway through the lab (your class teacher will mediate this)

🛣️ Lab Roadmap

Part	Activity Type	Focus	Time	Outcome
Part 0	🎯 ACTION POINTS	Setup & Roles	10 min	Download notebook + negotiate Pilot/Copilot roles
Part I	🎯 ACTION POINTS	NumPy Struggle	30 min	Experience nested `np.where()` complexity
Part II	🎯 ACTION POINTS	Pandas Vectorisation	30 min	Use boolean masks + conditional column creation
Wrap-Up	🗣️ TEACHING MOMENT	Reflection & Mini-Project 1	15 min	Compare approaches + preview assessment

💡 Today’s Design: The NumPy section is intentionally difficult. The goal is to experience the pain of nested conditionals before discovering how Pandas vectorised operations make the same logic clearer and more maintainable.

Part 0: Setup & Pair Programming Roles (10 min)

Note to class teachers: You can either assign pairs or let students negotiate roles. Emphasise that the less experienced person should be Pilot for Part I, and that you’ll swap roles for Part II. Circulate to ensure everyone has the notebook and data file.

Setup for Today:

Check that you can see the notebook and data file

These should be in your Nuvolos folder already. If not, use the buttons below to download the materials and save them to /files/week04/ on Nuvolos.

🌟 Optional: If you completed the W03 Extra Git/GitHub setup

If you followed the 🆕 W03 Extra: Step-by-step Git/GitHub instructions, you have a my-ds105a-notes repository!

You can copy these files there for backup:

# Copy notebook and data to your Git repository
cp /files/week04/W04-NB02-Lab-NumPy-to-Pandas.ipynb /files/my-ds105a-notes/week04/
cp /files/week04/data/london_summer_2024_weather.json /files/my-ds105a-notes/week04/data/

Then work from /files/my-ds105a-notes/week04/ instead. This way your work is version-controlled!

Negotiate roles
- 🧑‍✈️ Pilot: The person typing the code
- 🙋 Copilot: The person guiding and checking the code
Tip: The less experienced person should be Pilot for Part I. You’ll swap roles for Part II.
Open the notebook and introduce yourselves

Fill in your names and roles at the top of the notebook.

🔗 Today’s Challenge: You’ll classify London summer weather into categories like “Hot & Dry” and “Mild & Wet” using BOTH temperature and rainfall. First with NumPy (hard!), then with Pandas (cleaner!).

Part I: The NumPy Struggle (30 min)

Note to class teachers: Students work through Part I of the notebook (Steps 1-5). The Pilot types while the Copilot guides. Don’t rescue them from the nested np.where() struggle unless they’re completely stuck. This difficulty is pedagogically intentional. The reflection questions at Step 5 are critical, so make sure pairs discuss them.

Part I Goal: Experience the limitations of nested np.where() for complex logic.

Your pair will work through:

Step 1: Create NumPy arrays from the weather data
Step 2: Understand np.where() with a simple example
Step 3: Implement nested np.where() for weather classification (this will be challenging!)
Step 4: Verify your results
Step 5: Reflect on the experience
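
For reference, here is a minimal sketch of what Steps 2 and 3 can look like. It is not the notebook's solution: the sample values, the thresholds (25 °C for "hot", 1 mm for "wet") and the exact set of four categories are assumptions made for illustration, so use the classification table in your notebook for the real rules.

```python
import numpy as np

# Hypothetical sample values; the lab uses the real London summer 2024 data.
temps = np.array([29.5, 18.2, 26.1, 21.0])  # daily maximum temperature in °C
rain = np.array([0.0, 4.3, 2.5, 0.0])       # daily rainfall in mm

# Assumed thresholds, for illustration only.
HOT_THRESHOLD = 25.0
WET_THRESHOLD = 1.0

# Step 2: a simple np.where() with one condition and two outcomes.
simple_labels = np.where(temps >= HOT_THRESHOLD, "Hot", "Mild")

# Step 3: two variables need one np.where() nested inside each branch.
categories = np.where(
    temps >= HOT_THRESHOLD,
    np.where(rain >= WET_THRESHOLD, "Hot & Wet", "Hot & Dry"),
    np.where(rain >= WET_THRESHOLD, "Mild & Wet", "Mild & Dry"),
)

print(categories)
# ['Hot & Dry' 'Mild & Wet' 'Hot & Wet' 'Mild & Dry']
```

Notice that the rainfall test has to be written twice, once per temperature branch. Adding a third variable would double the nesting again, which is the pain point Step 5 asks you to reflect on.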

Key Questions to Discuss:

How easy was it to write the nested np.where() code?
How easy is it to read and understand now that it’s written?
Could you debug this code if it had a mistake?
What would happen if you needed to add another variable to the classification?

🎯 ACTION POINTS

🧑‍✈️ Pilot Run the setup cells (Library Imports and Loading the Data) to load the summer weather data.
👥 Together Review the classification table to understand what you’re building.
🧑‍✈️ Pilot Work through Steps 1-4 with guidance from the Copilot.
- 🙋 Copilot: Help think through the logic, but don’t take over. The struggle is part of learning!
👥 Together Complete Step 5 (reflection questions) before moving on.

💭 Remember: If you’re struggling with nested np.where(), that’s exactly the point! This experience will make Part II much more satisfying.

Part II: The Pandas Vectorisation Solution (30 min)

Note to class teachers: Announce role swap before Part II starts. The previous Copilot is now the Pilot. Students work through Part II (Steps 1-6). This should feel more structured than Part I. The approach uses boolean masks and vectorised operations rather than nested conditionals. The final reflection comparing both approaches is essential. If groups finish early, encourage them to explore the Optional Extension on .loc[].

⚠️ SWAP ROLES! The Copilot from Part I is now the Pilot.

Part II Goal: Solve the same problem using Pandas DataFrames with vectorised boolean operations.

Your pair will work through:

Step 1: Create a DataFrame from the weather data
Step 2: Create boolean condition columns
Step 3: Build the classification using vectorised operations
Step 4: Verify the results match the NumPy approach
Step 5: Compare code readability
Step 6: Final reflection on both approaches
Optional: Learn about .loc[] for more advanced conditional assignment
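
The steps above can be sketched roughly as follows. This is one possible approach under stated assumptions, not the notebook's official answer: the column names (temp_max, rain_mm, is_hot, is_wet), the sample rows and the thresholds are all hypothetical, and np.select() is just one convenient way to turn named boolean columns into labels.

```python
import numpy as np
import pandas as pd

# Step 1: a small DataFrame with hypothetical rows and column names.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-07-01", "2024-07-02", "2024-07-03", "2024-07-04"]),
    "temp_max": [29.5, 18.2, 26.1, 21.0],  # °C
    "rain_mm": [0.0, 4.3, 2.5, 0.0],       # mm
})

# Step 2: named boolean columns (thresholds are assumptions).
df["is_hot"] = df["temp_max"] >= 25.0
df["is_wet"] = df["rain_mm"] >= 1.0

# Step 3: one flat list of conditions, each paired with a label.
conditions = [
    df["is_hot"] & ~df["is_wet"],
    df["is_hot"] & df["is_wet"],
    ~df["is_hot"] & df["is_wet"],
]
labels = ["Hot & Dry", "Hot & Wet", "Mild & Wet"]
df["category"] = np.select(conditions, labels, default="Mild & Dry")

# Step 4: compare against the nested np.where() version from Part I.
nested = np.where(
    df["is_hot"],
    np.where(df["is_wet"], "Hot & Wet", "Hot & Dry"),
    np.where(df["is_wet"], "Mild & Wet", "Mild & Dry"),
)
results_match = bool((df["category"] == nested).all())
print(results_match)
```

Each rule now sits on its own line next to its label, so adding a category means adding one condition and one label rather than another level of nesting.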

Key Insight: Pandas vectorised operations let you chain conditions with & and | operators, making complex logic more readable than nested np.where() calls.
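
A short sketch of chaining conditions, using hypothetical column names and thresholds. The main thing to watch is operator precedence: & and | bind more tightly than comparisons such as >= and <, so each comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical sample data and column names.
df = pd.DataFrame({
    "temp_max": [29.5, 18.2, 26.1, 21.0],  # °C
    "rain_mm": [0.0, 4.3, 2.5, 0.0],       # mm
})

# AND: both conditions must hold on the same row.
hot_and_dry = (df["temp_max"] >= 25.0) & (df["rain_mm"] < 1.0)

# OR: at least one condition must hold.
hot_or_wet = (df["temp_max"] >= 25.0) | (df["rain_mm"] >= 1.0)

# NOT: ~ flips every value in a boolean Series.
not_hot = ~(df["temp_max"] >= 25.0)

# Pitfalls to avoid:
#   df["temp_max"] >= 25.0 & df["rain_mm"] < 1.0
#     Without parentheses, Python evaluates 25.0 & df["rain_mm"] first.
#   (df["temp_max"] >= 25.0) and (df["rain_mm"] < 1.0)
#     The keywords and/or raise a ValueError on a Series; use & and | instead.

print(hot_and_dry.tolist())
print(hot_or_wet.tolist())
print(not_hot.tolist())
```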

🎯 ACTION POINTS

👥 Together Swap roles! Confirm who is the new Pilot.
🧑‍✈️ New Pilot Create the DataFrame (Step 1) and discuss the advantages with your Copilot.
🧑‍✈️ New Pilot Create boolean condition columns (Step 2). Use vectorised comparisons!
- 🙋 New Copilot: Notice how readable these boolean columns are compared to nested conditions.
🧑‍✈️ New Pilot Build the classification (Step 3) using these boolean columns.
👥 Together Complete Steps 4-6, verifying results and reflecting on both approaches.
👥 Together If time allows, explore the Optional Extension on .loc[] for conditional assignment.
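
If you reach the Optional Extension, the sketch below shows the general shape of conditional assignment with .loc[]. The column names, thresholds and sample rows are again hypothetical, and the notebook may structure this differently.

```python
import pandas as pd

# Hypothetical sample data and column names.
df = pd.DataFrame({
    "temp_max": [29.5, 18.2, 26.1, 21.0],  # °C
    "rain_mm": [0.0, 4.3, 2.5, 0.0],       # mm
})
df["is_hot"] = df["temp_max"] >= 25.0
df["is_wet"] = df["rain_mm"] >= 1.0

# Start every row with a default label...
df["category"] = "Mild & Dry"

# ...then overwrite only the rows where each mask is True.
# The pattern is df.loc[row_mask, column_name] = value.
df.loc[df["is_hot"] & ~df["is_wet"], "category"] = "Hot & Dry"
df.loc[df["is_hot"] & df["is_wet"], "category"] = "Hot & Wet"
df.loc[~df["is_hot"] & df["is_wet"], "category"] = "Mild & Wet"

print(df[["temp_max", "rain_mm", "category"]])
```

Selecting the rows and the column in a single .loc[] call matters here. Writing df[mask]["category"] = ... instead may assign to a temporary copy and leave df unchanged.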

Wrap-Up & Next Steps (15 min)

Note to class teachers: Lead a brief class discussion about the contrasts students experienced. Ask for volunteers to share their Part I vs Part II reflections. Then emphasise that students should use vectorised approaches (.apply() and the .groupby() they will learn next week). Tell students that Mini-Project 1 will be available at the end of the day and that they should look out for a Moodle notification. Suggest that students who want to get started can collect data before revisiting this week’s concepts. Mention that W05 will cover .groupby() and seaborn visualisation (also needed for Mini-Project 1).

Class Discussion:

Your class teacher will lead a brief discussion comparing the NumPy and Pandas approaches.

Looking Ahead:

✍️ Mini-Project 1 will be released today at 6pm (due Week 06, Thursday 8pm)
- Eager to start? You can explore the API and collect data while this week’s concepts sink in
- Next week covers more Pandas operations and seaborn visualisation (also needed for Mini-Project 1)
🖥️ W05 Lecture: Advanced data transformations and visualisation design
Reading Week (W06): Dedicated time to complete Mini-Project 1, with additional drop-in sessions available

🔗 Useful Resources

📊 Essential Guides

3️⃣ Data Science Workflow: Complete workflow stages
4️⃣ Git & GitHub Guide: Version control commands

💻 Course Materials

🖥️ W04 Lecture: Vectorisation with NumPy and Pandas
📝 W04 Practice: Your loop-based heatwave detection solution
✍️ Mini-Project 1: Your first assessed project (due Week 06)

🆘 Getting Help

Slack: Post questions to the #help channel
Office Hours: Book via StudentHub
Check staff availability on ✋ Contact Hours

🌐 External Resources

NumPy Documentation: Array operations and np.where()
Pandas Documentation: DataFrames and .apply() method
Open-Meteo API: Weather data source

💡 Key Takeaway: For complex logic, Pandas vectorised operations with boolean masks are usually clearer and more maintainable than nested np.where() calls. The structured approach with named boolean columns makes your logic transparent and debuggable!