💻 Week 04 Lab
From NumPy to Pandas: Pair Programming
By the end of this lab, you should be able to: i) Experience the limitations of nested np.where() operations, ii) Use Pandas vectorised boolean operations for complex logic, iii) Chain multiple conditions with & and | operators, iv) Apply conditional logic to create new DataFrame columns efficiently.
This lab builds directly on the 🖥️ W04 Lecture, where you learned about vectorisation with NumPy and Pandas. Today you’ll work in pairs to experience why Pandas vectorised operations are often clearer than nested np.where() calls.
Preparation
- Attend or at least watch the 🖥️ W04 Lecture, where you saw NumPy arrays, np.where(), and Pandas DataFrames
- Find a partner for pair programming. You’ll swap roles halfway through the lab (your class teacher will mediate this)
🛣️ Lab Roadmap
| Part | Activity Type | Focus | Time | Outcome |
|---|---|---|---|---|
| Part 0 | 🎯 ACTION POINTS | Setup & Roles | 10 min | Download notebook + negotiate Pilot/Copilot roles |
| Part I | 🎯 ACTION POINTS | NumPy Struggle | 30 min | Experience nested np.where() complexity |
| Part II | 🎯 ACTION POINTS | Pandas Vectorisation | 30 min | Use boolean masks + conditional column creation |
| Wrap-Up | 🗣️ TEACHING MOMENT | Reflection & Mini-Project 1 | 15 min | Compare approaches + preview assessment |
💡 Today’s Design: The NumPy section is intentionally difficult. The goal is to experience the pain of nested conditionals before discovering how Pandas vectorised operations make the same logic clearer and more maintainable.
Part 0: Setup & Pair Programming Roles (10 min)
Note to class teachers: You can either assign pairs or let students negotiate roles. Emphasise that the less experienced person should be Pilot for Part I, and that you’ll swap roles for Part II. Circulate to ensure everyone has the notebook and data file.
Setup for Today:
Check that you can see the notebook and data file
These should be in your Nuvolos folder already. If not, use the buttons below to download the materials and save them to /files/week04/ on Nuvolos.
Optional: If you completed the W03 Extra Git/GitHub setup
If you followed the W03 Extra: Step-by-step Git/GitHub instructions, you have a my-ds105a-notes repository!
You can copy these files there for backup:
# Create the target folders first in case they do not exist yet
mkdir -p /files/my-ds105a-notes/week04/data
# Copy notebook and data to your Git repository
cp /files/week04/W04-NB02-Lab-NumPy-to-Pandas.ipynb /files/my-ds105a-notes/week04/
cp /files/week04/data/london_summer_2024_weather.json /files/my-ds105a-notes/week04/data/
Then work from /files/my-ds105a-notes/week04/ instead. This way your work is version-controlled!
Negotiate roles
- 🧑‍✈️ Pilot: The person typing the code
- Copilot: The person guiding and checking the code
Tip: The less experienced person should be Pilot for Part I. You’ll swap roles for Part II.
Open the notebook and introduce yourselves
Fill in your names and roles at the top of the notebook.
Today’s Challenge: You’ll classify London summer weather into categories like “Hot & Dry” and “Mild & Wet” using BOTH temperature and rainfall. First with NumPy (hard!), then with Pandas (cleaner!).
Part I: The NumPy Struggle (30 min)
Note to class teachers: Students work through Part I of the notebook (Steps 1-5). The Pilot types while the Copilot guides. Don’t rescue them from the nested np.where() struggle unless they’re completely stuck; the difficulty is pedagogically intentional. The reflection questions at Step 5 are critical, so ensure pairs discuss them.
Part I Goal: Experience the limitations of nested np.where() for complex logic.
Your pair will work through:
- Step 1: Create NumPy arrays from the weather data
- Step 2: Understand np.where() with a simple example
- Step 3: Implement nested np.where() for weather classification (this will be challenging!)
- Step 4: Verify your results
- Step 5: Reflect on the experience
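To make Steps 2 and 3 concrete, here is a minimal sketch of a simple and a nested np.where() call. The arrays, the 25 °C and 1 mm thresholds, and the "Hot & Wet" and "Mild & Dry" labels are assumptions for illustration only; in the lab, use the arrays you build in Step 1 and the rules from the notebook’s classification table.

```python
import numpy as np

# Hypothetical daily values; the lab builds its arrays from the JSON data file.
temperature = np.array([28.5, 19.0, 26.1, 17.4, 30.2])  # daily max, in °C
rainfall = np.array([0.0, 4.2, 2.5, 0.0, 0.3])          # daily total, in mm

# Assumed thresholds, not necessarily those in the lab's classification table.
HOT_THRESHOLD = 25.0
WET_THRESHOLD = 1.0

# Simple case: one condition, two possible outcomes.
temp_label = np.where(temperature >= HOT_THRESHOLD, "Hot", "Mild")

# Nested case: two variables, four possible outcomes.
# The outer call splits on temperature, each inner call splits on rainfall.
weather_class = np.where(
    temperature >= HOT_THRESHOLD,
    np.where(rainfall >= WET_THRESHOLD, "Hot & Wet", "Hot & Dry"),
    np.where(rainfall >= WET_THRESHOLD, "Mild & Wet", "Mild & Dry"),
)

print(temp_label)
print(weather_class)
```

Even with only two variables, the rainfall test has to be written twice, once inside each branch of the temperature test. Keep that in mind when you reach the reflection questions.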
Key Questions to Discuss:
- How easy was it to write the nested np.where() code?
- How easy is it to read and understand now that it’s written?
- Could you debug this code if it had a mistake?
- What would happen if you needed to add another variable to the classification?
🎯 ACTION POINTS
🧑‍✈️ Pilot: Run the setup cells (Library Imports and Loading the Data) to load the summer weather data.
👥 Together: Review the classification table to understand what you’re building.
🧑‍✈️ Pilot: Work through Steps 1-4 with guidance from the Copilot.
- Copilot: Help think through the logic, but don’t take over. The struggle is part of learning!
👥 Together: Complete Step 5 (reflection questions) before moving on.
Remember: If you’re struggling with nested np.where(), that’s exactly the point! This experience will make Part II much more satisfying.
Part II: The Pandas Vectorisation Solution (30 min)
Note to class teachers: Announce role swap before Part II starts. The previous Copilot is now the Pilot. Students work through Part II (Steps 1-6). This should feel more structured than Part I. The approach uses boolean masks and vectorised operations rather than nested conditionals. The final reflection comparing both approaches is essential. If groups finish early, encourage them to explore the Optional Extension on .loc[].
⚠️ SWAP ROLES! The Copilot from Part I is now the Pilot.
Part II Goal: Solve the same problem using Pandas DataFrames with vectorised boolean operations.
Your pair will work through:
- Step 1: Create a DataFrame from the weather data
- Step 2: Create boolean condition columns
- Step 3: Build the classification using vectorised operations
- Step 4: Verify the results match the NumPy approach
- Step 5: Compare code readability
- Step 6: Final reflection on both approaches
- Optional: Learn about .loc[] for more advanced conditional assignment
Key Insight: Pandas vectorised operations let you chain conditions with & and | operators, making complex logic more readable than nested np.where() calls.
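As a rough illustration of this insight, the sketch below builds named boolean columns and combines them with &, | and ~. The column names, thresholds and the extra labels are assumptions, and np.select() is just one possible way to turn the masks into labels; the notebook may take a different route in Step 3.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the lab builds its DataFrame from the weather JSON file.
df = pd.DataFrame({
    "temperature": [28.5, 19.0, 26.1, 17.4, 30.2],  # daily max, in °C
    "rainfall": [0.0, 4.2, 2.5, 0.0, 0.3],          # daily total, in mm
})

# Named boolean columns (thresholds are assumed, not taken from the lab).
df["is_hot"] = df["temperature"] >= 25.0
df["is_wet"] = df["rainfall"] >= 1.0

# & means "and", | means "or", ~ means "not"; each works row by row.
hot_and_dry = df["is_hot"] & ~df["is_wet"]
hot_and_wet = df["is_hot"] & df["is_wet"]
mild_and_dry = ~df["is_hot"] & ~df["is_wet"]
mild_and_wet = ~df["is_hot"] & df["is_wet"]
df["is_notable"] = df["is_hot"] | df["is_wet"]  # hot, wet, or both

# Map each mask to its label; rows matching no mask get the default.
conditions = [hot_and_dry, hot_and_wet, mild_and_dry, mild_and_wet]
labels = ["Hot & Dry", "Hot & Wet", "Mild & Dry", "Mild & Wet"]
df["weather_class"] = np.select(conditions, labels, default="Unclassified")

print(df)
```

When you combine raw comparisons directly, wrap each one in parentheses, for example (df["temperature"] >= 25.0) & (df["rainfall"] < 1.0), because & and | bind more tightly than >= and <.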
🎯 ACTION POINTS
👥 Together: Swap roles! Confirm who is the new Pilot.
🧑‍✈️ New Pilot: Create the DataFrame (Step 1) and discuss the advantages with your Copilot.
🧑‍✈️ New Pilot: Create boolean condition columns (Step 2). Use vectorised comparisons!
- New Copilot: Notice how readable these boolean columns are compared to nested conditions.
🧑‍✈️ New Pilot: Build the classification (Step 3) using these boolean columns.
👥 Together: Complete Steps 4-6, verifying results and reflecting on both approaches.
👥 Together: If time allows, explore the Optional Extension on .loc[] for conditional assignment.
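If you do reach the optional extension, the following sketch shows one way conditional assignment with .loc[] can look, followed by a check against a nested np.where() result in the spirit of Step 4. The data, thresholds and labels are again assumptions for illustration, not the notebook’s own values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with assumed thresholds of 25 °C and 1 mm.
df = pd.DataFrame({
    "temperature": [28.5, 19.0, 26.1, 17.4, 30.2],
    "rainfall": [0.0, 4.2, 2.5, 0.0, 0.3],
})
is_hot = df["temperature"] >= 25.0
is_wet = df["rainfall"] >= 1.0

# Start from a default label, then overwrite the rows that match each mask.
df["weather_class"] = "Mild & Dry"
df.loc[is_hot & ~is_wet, "weather_class"] = "Hot & Dry"
df.loc[is_hot & is_wet, "weather_class"] = "Hot & Wet"
df.loc[~is_hot & is_wet, "weather_class"] = "Mild & Wet"

# Verification: rebuild the labels with nested np.where() and compare.
numpy_class = np.where(
    is_hot,
    np.where(is_wet, "Hot & Wet", "Hot & Dry"),
    np.where(is_wet, "Mild & Wet", "Mild & Dry"),
)
results_match = bool((df["weather_class"] == numpy_class).all())
print("Both approaches agree:", results_match)
```

Each .loc[] line reads as "for the rows where this mask is True, set this column to that value", so adding a category means adding one line rather than another level of nesting.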
Wrap-Up & Next Steps (15 min)
Note to class teachers: Lead a brief class discussion about the contrasts students experienced. Ask for volunteers to share their Part I vs Part II reflections. Then emphasise to students that they should use vectorisation approaches (.apply(), and the .groupby() they will learn next week). Inform students that Mini-Project 1 will be available at the end of the day; they can look out for a Moodle notification. Suggest that students who want to get started can collect data before revisiting this week’s concepts. Mention that W05 will cover groupby() and seaborn visualisation (also needed for Mini-Project 1).
Class Discussion:
Your class teacher will lead a brief discussion comparing the NumPy and Pandas approaches.
Looking Ahead:
- Mini-Project 1 will be released today at 6pm (due Week 06, Thursday 8pm)
- Eager to start? You can explore the API and collect data while this week’s concepts sink in
- Next week covers more Pandas operations and seaborn visualisation (also needed for Mini-Project 1)
- 🖥️ W05 Lecture: Advanced data transformations and visualisation design
- Reading Week (W06): Dedicated time to complete Mini-Project 1; additional drop-in sessions will be available
Useful Resources
Essential Guides
- 3️⃣ Data Science Workflow: Complete workflow stages
- 4️⃣ Git & GitHub Guide: Version control commands
💻 Course Materials
- 🖥️ W04 Lecture: Vectorisation with NumPy and Pandas
- W04 Practice: Your loop-based heatwave detection solution
- Mini-Project 1: Your first assessed project (due Week 06)
Getting Help
- Slack: Post questions to the #help channel
- Office Hours: Book via StudentHub
- Check staff availability on Contact Hours
External Resources
- NumPy Documentation: Array operations and np.where()
- Pandas Documentation: DataFrames and the .apply() method
- Open-Meteo API: Weather data source
💡 Key Takeaway: For complex logic, Pandas vectorised operations with boolean masks are almost always clearer and more maintainable than nested NumPy conditionals. The structured approach with named boolean columns makes your logic transparent and debuggable!
