DS105 2025-2026 Autumn Term

💻 Week 04 Lab

From NumPy to Pandas: Pair Programming

Author

Dr Jon Cardoso-Silva

Published

27 October 2025

🥅 Learning Goals

By the end of this lab, you should be able to:

  • Experience the limitations of nested np.where() operations
  • Use Pandas vectorised boolean operations for complex logic
  • Chain multiple conditions with & and | operators
  • Apply conditional logic to create new DataFrame columns efficiently

This lab builds directly on the 🖥️ W04 Lecture, where you learned about vectorisation with NumPy and Pandas. Today you’ll work in pairs to experience why Pandas vectorised operations are often clearer than nested np.where() calls.

📋 Preparation

  • Attend or at least watch the 🖥️ W04 Lecture, where you saw NumPy arrays, np.where(), and Pandas DataFrames
  • Find a partner for pair programming. You’ll swap roles halfway through the lab (your class teacher will mediate this)

🛣️ Lab Roadmap

| Part | Activity Type | Focus | Time | Outcome |
|---|---|---|---|---|
| Part 0 | 🎯 ACTION POINTS | Setup & Roles | 10 min | Download notebook + negotiate Pilot/Copilot roles |
| Part I | 🎯 ACTION POINTS | NumPy Struggle | 30 min | Experience nested np.where() complexity |
| Part II | 🎯 ACTION POINTS | Pandas Vectorisation | 30 min | Use boolean masks + conditional column creation |
| Wrap-Up | 🗣️ TEACHING MOMENT | Reflection & Mini-Project 1 | 15 min | Compare approaches + preview assessment |

💡 Today’s Design: The NumPy section is intentionally difficult. The goal is to experience the pain of nested conditionals before discovering how Pandas vectorised operations make the same logic clearer and more maintainable.

Part 0: Setup & Pair Programming Roles (10 min)

Note to class teachers: You can either assign pairs or let students negotiate roles. Emphasise that the less experienced person should be Pilot for Part I, and that pairs will swap roles for Part II. Circulate to ensure everyone has the notebook and data file.

Setup for Today:

  1. Check that you can see the notebook and data file

    These should be in your Nuvolos folder already. If not, use the buttons below to download the materials and save them to /files/week04/ on Nuvolos.

🌟 Optional: If you completed the W03 Extra Git/GitHub setup

If you followed the 🆕 W03 Extra: Step-by-step Git/GitHub instructions, you have a my-ds105a-notes repository!

You can copy these files there for backup:

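# Create the destination folders first (does nothing if they already exist)
mkdir -p /files/my-ds105a-notes/week04/data
# Copy notebook and data to your Git repository
cp /files/week04/W04-NB02-Lab-NumPy-to-Pandas.ipynb /files/my-ds105a-notes/week04/
cp /files/week04/data/london_summer_2024_weather.json /files/my-ds105a-notes/week04/data/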

Then work from /files/my-ds105a-notes/week04/ instead. This way your work is version-controlled!

  2. Negotiate roles

    • 🧑‍✈️ Pilot: The person typing the code
    • 🙋 Copilot: The person guiding and checking the code

    Tip: The less experienced person should be Pilot for Part I. You’ll swap roles for Part II.

  3. Open the notebook and introduce yourselves

    Fill in your names and roles at the top of the notebook.

🔗 Today’s Challenge: You’ll classify London summer weather into categories like “Hot & Dry” and “Mild & Wet” using BOTH temperature and rainfall. First with NumPy (hard!), then with Pandas (cleaner!).

Part I: The NumPy Struggle (30 min)

Note to class teachers: Students work through Part I of the notebook (Steps 1-5). The Pilot types while the Copilot guides. Don’t rescue them from the nested np.where() struggle unless they’re completely stuck. This difficulty is pedagogically intentional. The reflection questions at Step 5 are critical, so ensure pairs discuss them.

Part I Goal: Experience the limitations of nested np.where() for complex logic.

Your pair will work through:

  • Step 1: Create NumPy arrays from the weather data
  • Step 2: Understand np.where() with a simple example
  • Step 3: Implement nested np.where() for weather classification (this will be challenging!)
  • Step 4: Verify your results
  • Step 5: Reflect on the experience
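
As a warm-up for Steps 1 and 2, here is a minimal sketch of a single np.where() call. The temperature values and the 25 °C threshold are made up for illustration; they are not taken from the lab's data file or classification table, so treat the notebook as the reference.

```python
import numpy as np

# Hypothetical daily maximum temperatures in °C (not the lab data)
temps = np.array([18.5, 26.0, 31.2, 22.4, 25.0])

# np.where(condition, value_if_true, value_if_false) works element-wise:
# the comparison produces a boolean array, and each True/False picks a label
labels = np.where(temps > 25, "Hot", "Mild")

print(labels)
```

Note that the comparison is strict here, so a day at exactly 25.0 is labelled "Mild". Deciding whether a threshold is inclusive is one of the details worth checking with your partner.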

Key Questions to Discuss:

  • How easy was it to write the nested np.where() code?
  • How easy is it to read and understand now that it’s written?
  • Could you debug this code if it had a mistake?
  • What would happen if you needed to add another variable to the classification?
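
To make these questions concrete, the sketch below shows what nesting looks like for two variables. It is only an illustration under stated assumptions: the values, the thresholds (25 °C and 1 mm), and the "Hot & Wet" and "Mild & Dry" labels are hypothetical, and the lab's own classification table may differ.

```python
import numpy as np

# Hypothetical values and thresholds, for illustration only
temps = np.array([30.0, 30.0, 20.0, 20.0])  # °C
rain = np.array([0.0, 5.0, 0.0, 5.0])       # mm

is_hot = temps > 25
is_wet = rain > 1

# The outer np.where() splits on temperature; each branch then needs
# its own inner np.where() to split on rainfall
labels = np.where(
    is_hot,
    np.where(is_wet, "Hot & Wet", "Hot & Dry"),
    np.where(is_wet, "Mild & Wet", "Mild & Dry"),
)

print(labels)
```

Two yes/no variables already take three np.where() calls. In this layout, a third yes/no variable would double the number of outcomes to eight and need seven calls.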

🎯 ACTION POINTS

  1. 🧑‍✈️ Pilot: Run the setup cells (Library Imports and Loading the Data) to load the summer weather data.

  2. 👥 Together: Review the classification table to understand what you’re building.

  3. 🧑‍✈️ Pilot: Work through Steps 1-4 with guidance from the Copilot.

    • 🙋 Copilot: Help think through the logic, but don’t take over. The struggle is part of learning!
  4. 👥 Together: Complete Step 5 (reflection questions) before moving on.

💭 Remember: If you’re struggling with nested np.where(), that’s exactly the point! This experience will make Part II much more satisfying.

Part II: The Pandas Vectorisation Solution (30 min)

Note to class teachers: Announce role swap before Part II starts. The previous Copilot is now the Pilot. Students work through Part II (Steps 1-6). This should feel more structured than Part I. The approach uses boolean masks and vectorised operations rather than nested conditionals. The final reflection comparing both approaches is essential. If groups finish early, encourage them to explore the Optional Extension on .loc[].

⚠️ SWAP ROLES! The Copilot from Part I is now the Pilot.

Part II Goal: Solve the same problem using Pandas DataFrames with vectorised boolean operations.

Your pair will work through:

  • Step 1: Create a DataFrame from the weather data
  • Step 2: Create boolean condition columns
  • Step 3: Build the classification using vectorised operations
  • Step 4: Verify the results match the NumPy approach
  • Step 5: Compare code readability
  • Step 6: Final reflection on both approaches
  • Optional: Learn about .loc[] for more advanced conditional assignment

Key Insight: Pandas vectorised operations let you chain conditions with & and | operators, making complex logic more readable than nested np.where() calls.
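
A minimal sketch of chaining conditions is shown below. The column names (temp_c, rain_mm), the data, and the thresholds are assumptions made for illustration; the notebook may use different ones. The ~ operator (element-wise "not") is not mentioned above but is commonly used alongside & and |.

```python
import pandas as pd

# Hypothetical data and thresholds, for illustration only
df = pd.DataFrame({
    "temp_c": [30.0, 30.0, 20.0, 20.0],
    "rain_mm": [0.0, 5.0, 0.0, 5.0],
})

# Named boolean columns: one vectorised comparison each
df["is_hot"] = df["temp_c"] > 25
df["is_wet"] = df["rain_mm"] > 1

# & is element-wise "and", | is "or", ~ is "not".
# Wrap raw comparisons in parentheses when combining them,
# because & and | bind more tightly than > and <.
df["hot_and_dry"] = df["is_hot"] & ~df["is_wet"]
df["hot_or_wet"] = (df["temp_c"] > 25) | (df["rain_mm"] > 1)

print(df)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the symbols & and | are used instead.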

🎯 ACTION POINTS

  1. 👥 Together: Swap roles! Confirm who is the new Pilot.

  2. 🧑‍✈️ New Pilot: Create the DataFrame (Step 1) and discuss the advantages with your Copilot.

  3. 🧑‍✈️ New Pilot: Create boolean condition columns (Step 2). Use vectorised comparisons!

    • 🙋 New Copilot: Notice how readable these boolean columns are compared to nested conditions.
  4. 🧑‍✈️ New Pilot: Build the classification (Step 3) using these boolean columns.

  5. 👥 Together: Complete Steps 4-6, verifying results and reflecting on both approaches.

  6. 👥 Together: If time allows, explore the Optional Extension on .loc[] for conditional assignment.
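
For the optional extension, here is one possible sketch of conditional assignment with .loc[]. It is not necessarily the approach the notebook takes; the column names, data, thresholds, and the "Hot & Wet" and "Mild & Dry" labels are hypothetical.

```python
import pandas as pd

# Hypothetical data and thresholds, for illustration only
df = pd.DataFrame({
    "temp_c": [30.0, 30.0, 20.0, 20.0],
    "rain_mm": [0.0, 5.0, 0.0, 5.0],
})
is_hot = df["temp_c"] > 25
is_wet = df["rain_mm"] > 1

# Start with a default label, then overwrite the rows matching each mask.
# df.loc[row_mask, column_name] = value assigns only where row_mask is True.
df["category"] = "Mild & Dry"
df.loc[is_hot & ~is_wet, "category"] = "Hot & Dry"
df.loc[is_hot & is_wet, "category"] = "Hot & Wet"
df.loc[~is_hot & is_wet, "category"] = "Mild & Wet"

print(df)
```

Each assignment reads as one rule on one line, so adding or changing a category does not require rewriting the others.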


Wrap-Up & Next Steps (15 min)

Note to class teachers: Lead a brief class discussion about the contrasts students experienced. Ask for volunteers to share their Part I vs Part II reflections. Then emphasise that students should use vectorisation approaches (.apply() and the .groupby() they will learn next week). Inform students that Mini-Project 1 will be available at the end of the day and that they should look out for a Moodle notification. Suggest that students who want to get started can collect data before revisiting this week’s concepts. Mention that W05 will cover groupby() and seaborn visualisation (also needed for Mini-Project 1).

Class Discussion:

Your class teacher will lead a brief discussion comparing the NumPy and Pandas approaches.

Looking Ahead:

  • ✍️ Mini-Project 1 will be released today at 6pm (due Week 06, Thursday 8pm)
    • Eager to start? You can explore the API and collect data while this week’s concepts sink in
    • Next week covers more Pandas operations and seaborn visualisation (also needed for Mini-Project 1)
  • 🖥️ W05 Lecture: Advanced data transformations and visualisation design
  • Reading Week (W06): Dedicated time to complete Mini-Project 1; additional drop-in sessions will be available

🔗 Useful Resources

📊 Essential Guides

💻 Course Materials

  • 🖥️ W04 Lecture: Vectorisation with NumPy and Pandas
  • 📝 W04 Practice: Your loop-based heatwave detection solution
  • ✍️ Mini-Project 1: Your first assessed project (due Week 06)

🆘 Getting Help

  • Slack: Post questions to #help channel
  • Office Hours: Book via StudentHub
  • Check staff availability on ✋ Contact Hours

🌐 External Resources

💡 Key Takeaway: For complex logic, Pandas vectorised operations with boolean masks are almost always clearer and more maintainable than nested NumPy conditionals. The structured approach with named boolean columns makes your logic transparent and debuggable!
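
One way to see the "debuggable" part in practice is to count how many rows each mask selects and to check that every row lands in exactly one category. The sketch below assumes hypothetical column names, data, thresholds, and category labels; it is an illustration, not the notebook's solution.

```python
import pandas as pd

# Hypothetical data and thresholds, for illustration only
df = pd.DataFrame({
    "temp_c": [30.0, 31.0, 20.0, 21.0, 19.0],
    "rain_mm": [0.0, 0.0, 5.0, 0.0, 6.0],
})
df["is_hot"] = df["temp_c"] > 25
df["is_wet"] = df["rain_mm"] > 1

# Each category is a readable combination of the named columns
masks = {
    "Hot & Dry": df["is_hot"] & ~df["is_wet"],
    "Hot & Wet": df["is_hot"] & df["is_wet"],
    "Mild & Dry": ~df["is_hot"] & ~df["is_wet"],
    "Mild & Wet": ~df["is_hot"] & df["is_wet"],
}

# Check 1: how many rows fall into each category?
counts = {label: int(mask.sum()) for label, mask in masks.items()}
print(counts)

# Check 2: every row should match exactly one category
matches_per_row = sum(mask.astype(int) for mask in masks.values())
print((matches_per_row == 1).all())
```

If a row matched zero or several masks, the second check would flag it immediately, which is much harder to spot inside a nested expression.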