DS105 2025-2026 Autumn Term

🖥️ Week 05 Lecture

Data Transformation and Visualization Design

Author

Dr Jon Cardoso-Silva

Published

27 October 2025

📍 Logistics

Time and Location: Thursday, 30 October 2025, 16:00 - 18:00, CLM 5.02

Last week you struggled with nested np.where() in the W04 Lab when classifying weather. Today, you’ll learn a much cleaner approach through custom functions and how to summarise temporal data to reveal insights. Then we’ll apply these transformations to create compelling visualisations.

📋 Preparation

  • Complete the 💻 W04 Lab (pair programming!)
  • Start exploring ✍️ Mini-Project 1 (released last week)
  • The skills you learn today will directly support your Mini-Project 1 work

🗣️ Lecture Overview

Part 1: From Loops to Functions (35 min)

  • Why functions solve the nested np.where() problem
  • Writing your own functions with def
  • Using .apply() to process entire datasets
  • When functions make code clearer
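
As a rough illustration of the ideas in Part 1, the sketch below classifies weather twice: once with nested np.where() and once with a custom function passed to .apply(). The column names (temperature, rainfall), the thresholds, and the function name classify_weather are all assumptions made for this example, not the ones used in the W04 Lab or the lecture notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names and thresholds are illustrative only
df = pd.DataFrame({
    "temperature": [2.0, 12.5, 24.0, 18.0],
    "rainfall": [0.0, 5.2, 0.0, 0.3],
})

# Nested np.where(): it works, but every extra category adds a nesting level
df["weather_nested"] = np.where(
    df["rainfall"] > 1.0, "rainy",
    np.where(df["temperature"] < 5.0, "cold",
             np.where(df["temperature"] > 20.0, "hot", "mild")),
)


def classify_weather(row):
    """Return a weather label for a single row (hypothetical rules)."""
    if row["rainfall"] > 1.0:
        return "rainy"
    if row["temperature"] < 5.0:
        return "cold"
    if row["temperature"] > 20.0:
        return "hot"
    return "mild"


# axis=1 passes each row to the function, so no explicit loop is needed
df["weather"] = df.apply(classify_weather, axis=1)
```

Both columns hold the same labels, but the function version reads top to bottom and can be tested on a single row before it is applied to the whole dataset.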

Part 2: Temporal Data & Grouping (35 min)

  • Converting timestamps to datetime objects
  • Extracting date components with .dt
  • Using .groupby() to summarise by year, month, day
  • Why summarisation makes patterns visible
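
The Part 2 steps can be sketched roughly as follows. The data, the column names (date, temperature), and the result name monthly_mean are assumptions for illustration. The source does not say which timestamp format the lecture data uses, so this example assumes ISO date strings.

```python
import pandas as pd

# Hypothetical daily readings: column names and values are illustrative only
df = pd.DataFrame({
    "date": ["2023-01-15", "2023-01-20", "2023-02-10", "2024-01-05"],
    "temperature": [4.0, 6.0, 8.0, 5.0],
})

# Convert the text column to proper datetime objects
df["date"] = pd.to_datetime(df["date"])

# The .dt accessor exposes date components once the column is datetime
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Summarise: one mean temperature per (year, month) pair
monthly_mean = (
    df.groupby(["year", "month"])["temperature"]
      .mean()
      .reset_index(name="mean_temperature")
)
```

Four daily rows collapse into three monthly rows here. With 20 years of daily data the same step would turn thousands of rows into a few hundred, which is what makes seasonal patterns easier to see.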

BREAK (10 min)

Part 3: Visualization Philosophy (25 min)

  • The plot_df pattern: prepare data, then visualise
  • Seaborn plot types: bar plots and line charts
  • Narrative titles that state findings, not descriptions
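
One possible reading of the plot_df pattern is sketched below: build a small, plot-ready DataFrame first, then hand it to seaborn. The data, the column names, and the title are made up for this example. The title states a finding that holds for this toy data only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rainfall readings: values are illustrative only
df = pd.DataFrame({
    "year": [2023, 2023, 2024, 2024],
    "rainfall": [1.0, 3.0, 4.0, 6.0],
})

# Step 1: prepare plot_df so it contains exactly what the chart will show
plot_df = df.groupby("year", as_index=False)["rainfall"].mean()

# Step 2: visualise plot_df, with no transformation inside the plotting call
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=plot_df, x="year", y="rainfall", ax=ax)

# Narrative title: states the finding rather than describing the axes
ax.set_title("Average rainfall was higher in 2024 than in 2023")
ax.set_xlabel("Year")
ax.set_ylabel("Mean rainfall (mm)")
```

Keeping the preparation step separate means plot_df can be inspected on its own before plotting, which makes it easier to check that the title's claim matches the numbers.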

Part 4: Bringing It Together (15 min)

  • Complete workflow demonstration
  • How this applies to Mini-Project 1

📓 Lecture Materials

Today’s lecture uses slides with a demonstration notebook for live coding. All materials will be available in your Nuvolos workspace under the week05/ folder, or you can download them directly below.

Lecture Slides

Today’s lecture covers advanced pandas transformations and seaborn visualization design.

Lecture Demonstration Notebook

This notebook accompanies the slides with code examples you can run yourself.

Data Files

The lecture uses extended W04 Lab weather data (20 years of temperature and rainfall):

💡 Key Concepts

  • Custom functions: Extract complex logic into testable, reusable code
  • .apply() method: Process entire datasets without explicit loops
  • DateTime operations: Convert timestamps and extract date components
  • .groupby() aggregations: Summarise data by categories or time periods
  • The plot_df pattern: Always prepare your plotting data first
  • Narrative titles: State your findings, don’t describe the data

🔖 Appendix

Post‑Lecture Actions

  • Review the lecture slides and notebook
  • Complete the 💻 W05 Lab (seaborn styling focus)
  • Continue working on ✍️ Mini-Project 1
  • Attend W05 Lab and drop-in sessions

Useful Links

Looking Ahead

  • Tomorrow (W05 Friday): Seaborn styling lab
  • Mini-Project 1: Ongoing - keep collecting data and experimenting
  • Week 06: Reading Week – focus time for Mini-Project 1 completion
  • Deadline: W06 Thursday 8pm (submit via GitHub)