💻 Week 08 Lab

Reshaping and merging workshop: build your NB02

Author

Dr Jon Cardoso-Silva

Published

24 May 2026

🥅 Learning Goals

By the end of this lab, you should be able to: i) merge your TfL journey data with the ONS Postcode Directory using clean postcode keys, ii) reshape your merged data for analysis using melt and pivot_table, and iii) run a working NB02 pipeline in your MP2 repo that transforms raw data into a processed DataFrame ready for EDA.

This lab builds directly on 🖥️ W08 Lecture, where you learned pd.concat(), .melt(), .pivot_table(), and pd.merge() using synthetic data. Today you practise those tools on shared exercises, then connect them to your own ✍️ Mini-Project 2 data.

📋 Preparation

Attend or watch the 🖥️ W08 Lecture
Have your MP2 repository open in Nuvolos. Your data/raw/ folder must contain london_postcodes-ons-postcodes-directory-feb22.csv; if it does not, follow the setup instructions on the ✍️ Mini-Project 2 page
Open the lab notebook from the Nuvolos shared folder (/files/week08/)
If you are not working on Nuvolos, download the lab notebook below:

🛣️ Lab Roadmap

Part	Activity Type	Focus	Time	Outcome
Part 1	🎯 ACTION POINTS	Path challenge: locate your ONS file	15 min	You load the ONS CSV from your MP2 repo using a relative path
Part 2	🗣️ TEACHING MOMENT	Guided merge with dirty postcodes	25 min	Everyone merges a practice DataFrame with the real ONS data
Part 3	🗣️ TEACHING MOMENT	Reshaping and visualisation	20 min	Everyone produces a melt → strip+box+mean plot
Part 4	🎯 ACTION POINTS	Work on your MP2	30 min	Your NB01 and NB02 are done before W09 Lecture
Wrap-Up	🗣️ TEACHING MOMENT	By-Thursday checklist	10 min	Everyone knows what to finish before W09

💡 Note: Parts 1–3 use shared data and a shared lab notebook. Part 4 is in your own MP2 repository.

Part 1: Check if you truly understand relative paths (15 min)

The lab notebook lives at /files/week08/ on Nuvolos. Your MP2 repository lives at /files/<your-github-repo-folder>/. Your ONS CSV sits at:

/files/<your-github-repo-folder>/data/raw/london_postcodes-ons-postcodes-directory-feb22.csv

🎯 ACTION POINTS

Open the lab notebook from /files/week08/
Define ONS_PATH as a relative path from the notebook to your ONS CSV.

That is, without copying the ONS file or hardcoding an absolute path! From inside the W08 NB02 notebook, you should be able to load the ONS CSV using pd.read_csv(ONS_PATH) and see the expected output.
Run the check cell: you should see ONS Postcode Directory loaded: (326214, ...) or similar
If you get a FileNotFoundError, check: is your repo folder name spelled correctly? Do you have one ../ too many or too few?

Tip

Think back to 🖥️ W03. The ../ pattern moves you up one directory level. From /files/week08/, one ../ takes you to /files/ — from there, you can navigate into your repo folder.
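To see how the ../ pattern resolves, here is a minimal sketch. The repo folder name my-mp2-repo is hypothetical (replace it with your own), and the notebook folder is the one given above. The sketch only builds and resolves the path as text, so it runs even where the file does not exist.

```python
import posixpath

# Folder the lab notebook runs from (stated in the lab text).
NOTEBOOK_DIR = "/files/week08"

# Hypothetical repo folder name: replace with your own.
REPO_FOLDER = "my-mp2-repo"

ONS_FILE = "london_postcodes-ons-postcodes-directory-feb22.csv"

# One ../ climbs from /files/week08/ to /files/, then we descend into the repo.
ONS_PATH = posixpath.join("..", REPO_FOLDER, "data", "raw", ONS_FILE)

# Resolve the relative path against the notebook folder to see where it points.
resolved = posixpath.normpath(posixpath.join(NOTEBOOK_DIR, ONS_PATH))

print(ONS_PATH)
print(resolved)
```

If the resolved path matches the absolute location of your ONS CSV, then pd.read_csv(ONS_PATH) should find the file when run from the lab notebook.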

Part 2: The merge challenge (25 min)

Note to class teachers: Keep this synchronised. The goal is for everyone to produce the same df_merged. Do not hint at the solution — let students work through it. Walk through the instructor notebook solution at the end of this part.

Your lab notebook defines this practice DataFrame for you:

df_practice = pd.DataFrame({
    "destination_name": [
        "Barking", "Barking",
        "Richmond", "Richmond",
        "Old Marylebone Rd",
    ],
    "destination_postcode": [
        "ig11 0ab",
        " IG11 0AB ",
        "TW9 1dn ",
        " tw9 1dn",
        "NW8 9JW",
    ],
    "duration_min": [62, 58, 44, 47, 38],
    "time_band": ["peak", "off-peak", "peak", "off-peak", "peak"],
})

🎯 ACTION POINTS

Merge df_practice with df_ons_full (keeping only pcds, oslaua, lsoa11, imd from the ONS side) so that df_merged looks exactly like this:

	destination_name	destination_postcode	duration_min	time_band	pcds	oslaua	lsoa11	imd
0	Barking	IG11 0AB	62	peak	IG11 0AB	E09000002	E01000092	6348.0
1	Barking	IG11 0AB	58	off-peak	IG11 0AB	E09000002	E01000092	6348.0
2	Richmond	TW9 1DN	44	peak	TW9 1DN	E09000027	E01003876	28654.0
3	Richmond	TW9 1DN	47	off-peak	TW9 1DN	E09000027	E01003876	28654.0
4	Old Marylebone Rd	NW8 9JW	38	peak	NaN	NaN	NaN	NaN

💡 Note: The class teacher will walk through a solution at the end of this part.
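For reference after the walkthrough, here is one possible approach, sketched under assumptions rather than taken from the instructor notebook. It uses a tiny stand-in for df_ons_full, called df_ons_small here (a hypothetical name), whose two rows are copied from the expected output above. The extra_col column is invented to show the column selection. The real ONS file may need further cleaning that this sketch does not cover.

```python
import pandas as pd

# Practice DataFrame, copied from the lab notebook.
df_practice = pd.DataFrame({
    "destination_name": ["Barking", "Barking", "Richmond", "Richmond", "Old Marylebone Rd"],
    "destination_postcode": ["ig11 0ab", " IG11 0AB ", "TW9 1dn ", " tw9 1dn", "NW8 9JW"],
    "duration_min": [62, 58, 44, 47, 38],
    "time_band": ["peak", "off-peak", "peak", "off-peak", "peak"],
})

# Stand-in for df_ons_full (assumption): values taken from the expected output.
df_ons_small = pd.DataFrame({
    "pcds": ["IG11 0AB", "TW9 1DN"],
    "oslaua": ["E09000002", "E09000027"],
    "lsoa11": ["E01000092", "E01003876"],
    "imd": [6348, 28654],
    "extra_col": ["x", "y"],  # hypothetical column we do not want to keep
})

# Clean the key: remove surrounding whitespace and normalise the case.
df_clean = df_practice.copy()
df_clean["destination_postcode"] = (
    df_clean["destination_postcode"].str.strip().str.upper()
)

# Left merge keeps every journey, even when the postcode has no ONS match.
ons_cols = ["pcds", "oslaua", "lsoa11", "imd"]
df_merged = df_clean.merge(
    df_ons_small[ons_cols],
    how="left",
    left_on="destination_postcode",
    right_on="pcds",
)

print(df_merged)
```

A left merge is used so that the unmatched journey stays in the result with NaN in the ONS columns; an inner merge would silently drop it.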

Part 3: Reshaping and visualisation (20 min)

Note to class teachers: Use df_all from the shared lecture data (data/tfl_journeys_all.csv). The groupby step is provided in the notebook - the challenge is melt and plotting. Before students start the plot, run the barplot discussion: “We have plot_df ready. I know we’ll cover visualisation properly next week, but Jon made one point about barplots today. What was it? What should we use instead when n is small?”

Starting from the 40-row synthetic dataset in the shared data/ folder, practise the full pipeline.

🎯 ACTION POINTS

Run the provided groupby cell to produce summary (mean duration per destination × time band)
Use .melt() on summary to create plot_df with columns destination, time_band, mean_duration_min
Produce a strip + box + mean overlay plot of duration_min from df_all, split by destination and time band.
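The first two steps can be sketched as follows. The df_all here is a small hypothetical stand-in, not the shared 40-row dataset, and the wide shape of summary (one row per destination, one column per time band) is an assumption about what the provided groupby cell produces.

```python
import pandas as pd

# Hypothetical stand-in for df_all (the real data has 40 rows).
df_all = pd.DataFrame({
    "destination": ["Barking"] * 4 + ["Richmond"] * 4,
    "time_band": ["peak", "peak", "off-peak", "off-peak"] * 2,
    "duration_min": [62, 60, 58, 56, 44, 46, 47, 45],
})

# Assumed shape of summary: wide, with one column per time band.
summary = (
    df_all.groupby(["destination", "time_band"])["duration_min"]
    .mean()
    .unstack("time_band")
    .reset_index()
)

# Melt back to long format: one row per destination and time band.
plot_df = summary.melt(
    id_vars="destination",
    var_name="time_band",
    value_name="mean_duration_min",
)

print(plot_df)
```

The long format is what most plotting functions expect: one column for each variable you map to an axis or a colour.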

Consider swapping the x axis with the hue. Does it make a difference if destination is on the x axis or if time band is on the x axis? Which one do you prefer for this dataset, and why?
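A strip + box + mean overlay can be sketched with plain matplotlib, as below. This is an illustration under assumptions: the lab notebook may use a different plotting library, the data is a randomly generated stand-in for df_all, and time band is shown as one panel each rather than as a hue. Swapping the roles of destinations and bands in the loops puts time band on the x axis instead.

```python
import matplotlib
matplotlib.use("Agg")  # only needed when running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_all: 5 journeys per destination and time band.
rng = np.random.default_rng(42)
rows = []
for dest, base in [("Barking", 60), ("Richmond", 45)]:
    for band, extra in [("peak", 5), ("off-peak", 0)]:
        for value in rng.normal(base + extra, 4, size=5):
            rows.append({"destination": dest, "time_band": band,
                         "duration_min": round(float(value), 1)})
df_all = pd.DataFrame(rows)

destinations = ["Barking", "Richmond"]
bands = ["peak", "off-peak"]
positions = np.arange(len(destinations))

fig, axes = plt.subplots(1, len(bands), sharey=True, figsize=(8, 4))
for ax, band in zip(axes, bands):
    subset = df_all[df_all["time_band"] == band]
    groups = [
        subset.loc[subset["destination"] == d, "duration_min"].to_numpy()
        for d in destinations
    ]
    # Box layer: median and quartiles for each destination.
    ax.boxplot(groups, positions=positions, widths=0.5, showfliers=False)
    for pos, values in zip(positions, groups):
        # Strip layer: every observation, with a little horizontal jitter.
        jitter = rng.uniform(-0.1, 0.1, size=len(values))
        ax.scatter(pos + jitter, values, color="black", alpha=0.6, s=15, zorder=3)
        # Mean layer: one red diamond per group.
        ax.scatter(pos, values.mean(), color="red", marker="D", s=50, zorder=4)
    ax.set_xticks(positions)
    ax.set_xticklabels(destinations)
    ax.set_title(band)
axes[0].set_ylabel("duration_min")
fig.tight_layout()
```

With only a handful of points per group, showing every observation alongside the box and the mean makes it harder to over-read a difference than a bar of means alone would.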

Tell the LSE about your experience in this course!

While you settle into Part 4, could we ask a small but important favour? The LSE runs a course survey every term, and your feedback genuinely shapes how this module is taught next year. It takes about 3 minutes. 🐼

Click here to complete the survey

💡 Note: Please assess all the instructors you have interacted with (Jon counts as a teacher too!).

Last updated: 12 March 2026

Part 4: Work on your Mini-Project 2 (30 min)

Note to class teachers: Students now work in their own repos. Circulate and prompt each student to articulate their ONS decision out loud before they start coding. The common mistake is merging ONS mechanically without knowing why. Also check for ../data/raw/ paths in NB02.

You have loaded the ONS data, merged on a cleaned postcode key, and practised the reshape-to-plot pipeline. Before you open your own NB02, think through how the ONS dataset fits into your specific project.

A decision to make:

How could you use the ONS data in a way that demonstrates you’ve genuinely understood today’s material — not just run the code?

Would you use it to select destination postcodes in NB01 — browsing oslaua, imd, or lsoa11 to justify where you looked?
Or would you use it after collection in NB02 — merging ONS attributes into your journey data so you can group or filter by geography in NB03?
Could you do both, and does combining both actually strengthen your analysis or just add noise?

There is no single right answer. The key is that your decision is documented in REPORT.md and visible in your code.

Wrap-Up & Next Steps (10 min)

Note to class teachers: Close by running through the by-Thursday checklist out loud. Ask two or three students to share their ONS decision and why. Students should leave knowing what is still outstanding, not just what they ran today.

Before You Leave:

You successfully loaded your ONS CSV using a relative path from a different folder
You can explain why row 4 in the merge exercise produced NaN and what that means for your own data
You know whether you are going to use ONS in NB01, NB02, or both, and you can say why
Your by-Thursday checklist is realistic given where you are right now
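To check for unmatched postcodes in your own data, one option is the indicator and validate parameters of merge. The sketch below uses small hypothetical DataFrames (journeys and ons are illustrative names), with postcodes assumed to be already cleaned.

```python
import pandas as pd

# Hypothetical, already-cleaned journey data.
journeys = pd.DataFrame({
    "destination_postcode": ["IG11 0AB", "TW9 1DN", "NW8 9JW"],
    "duration_min": [62, 44, 38],
})

# Hypothetical slice of the ONS table: NW8 9JW is deliberately absent.
ons = pd.DataFrame({
    "pcds": ["IG11 0AB", "TW9 1DN"],
    "oslaua": ["E09000002", "E09000027"],
})

checked = journeys.merge(
    ons,
    how="left",
    left_on="destination_postcode",
    right_on="pcds",
    indicator=True,          # adds a _merge column: "both" or "left_only"
    validate="many_to_one",  # raises if a postcode appears twice in ons
)

# Rows flagged left_only found no partner in the ONS table.
unmatched = checked.loc[checked["_merge"] == "left_only", "destination_postcode"]

print(checked["_merge"].value_counts())
print(unmatched.tolist())
```

Listing the unmatched keys tells you whether the NaN values come from a cleaning problem (stray spaces, wrong case) or from postcodes that are simply not in the directory.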

Looking Ahead:

Week 09 Lecture: EDA quality checks, mean vs median, correlation traps, and an introduction to closeread
Week 09 Lab: refine your EDA, work on REPORT.md, peer review
Week 10: ✍️ Mini-Project 2 deadline is Monday 23 March, 8 pm

🔗 Useful Resources

💻 Course Materials

🖥️ W08 Lecture

✍️ Mini-Project 2 brief

🆘 Getting Help

Slack: Post questions to #help channel
Office Hours: Book via StudentHub
Check staff availability on ✋ Contact Hours