💻 Week 08 Lab
Reshaping and merging workshop: build your NB02
By the end of this lab, you should be able to: i) merge your TfL journey data with the ONS Postcode Directory using clean postcode keys; ii) reshape your merged data for analysis using `melt` and `pivot_table`; iii) build a working NB02 pipeline in your MP2 repo that transforms raw data into a processed DataFrame ready for EDA.
This lab builds directly on the 🖥️ W08 Lecture, where you learned `pd.concat()`, `.melt()`, `.pivot_table()`, and `pd.merge()` using synthetic data. Today you practise those tools on shared exercises, then connect them to your own ✍️ Mini-Project 2 data.
Preparation
- Attend or watch the 🖥️ W08 Lecture
- Have your MP2 repository open in Nuvolos. Your `data/raw/` folder must contain `london_postcodes-ons-postcodes-directory-feb22.csv`; if it does not, follow the setup instructions on the ✍️ Mini-Project 2 page.
- Open the lab notebook from the Nuvolos shared folder (`/files/week08/`).
- If you are not working on Nuvolos, download the lab notebook below:
🛣️ Lab Roadmap
| Part | Activity Type | Focus | Time | Outcome |
|---|---|---|---|---|
| Part 1 | 🎯 ACTION POINTS | Path challenge: locate your ONS file | 15 min | You load the ONS CSV from your MP2 repo using a relative path |
| Part 2 | 🗣️ TEACHING MOMENT | Guided merge with dirty postcodes | 25 min | Everyone merges a practice DataFrame with the real ONS data |
| Part 3 | 🗣️ TEACHING MOMENT | Reshaping and visualisation | 20 min | Everyone produces a melt → strip + box + mean plot |
| Part 4 | 🎯 ACTION POINTS | Work on your MP2 | 30 min | Your NB01 and NB02 are done before the W09 Lecture |
| Wrap-Up | 🗣️ TEACHING MOMENT | By-Thursday checklist | 10 min | Everyone knows what to finish before W09 |
💡 Note: Parts 1–3 use shared data and a shared lab notebook. Part 4 is in your own MP2 repository.
Part 1: Check if you truly understand relative paths (15 min)
The lab notebook lives at /files/week08/ on Nuvolos. Your MP2 repository lives at /files/<your-github-repo-folder>/. Your ONS CSV sits at:
/files/<your-github-repo-folder>/data/raw/london_postcodes-ons-postcodes-directory-feb22.csv
🎯 ACTION POINTS
- Open the lab notebook from `/files/week08/`.
- Define `ONS_PATH` as a relative path from the notebook to your ONS CSV, without copying the ONS file or hardcoding an absolute path. From inside the W08 NB02 notebook, you should be able to load the ONS CSV using `pd.read_csv(ONS_PATH)` and see the expected output.
- Run the check cell: you should see `ONS Postcode Directory loaded: (326214, ...)` or similar.
- If you get a `FileNotFoundError`, check: is your repo folder name spelled correctly? Do you have one `../` too many (or too few)?
Think back to 🖥️ W03. The `../` pattern moves you up one directory level. From `/files/week08/`, one `../` takes you to `/files/`; from there, you can navigate into your repo folder.
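If you want to double-check your reasoning, Python's standard library can compute the relative path for you. The sketch below uses `posixpath.relpath` with the Nuvolos layout described above; `my-mp2-repo` is a hypothetical folder name standing in for your own repo folder. The absolute paths appear here only for the check: in your notebook you would still write the resulting relative string into `ONS_PATH` directly.

```python
import posixpath

# Folder the lab notebook runs from, and where the ONS CSV lives (Nuvolos layout).
# "my-mp2-repo" is a hypothetical placeholder: use your own repo folder name.
NOTEBOOK_DIR = "/files/week08"
ONS_ABSOLUTE = "/files/my-mp2-repo/data/raw/london_postcodes-ons-postcodes-directory-feb22.csv"

# relpath works out the route from NOTEBOOK_DIR to the CSV:
# one "../" up to /files, then down into the repo folder
ONS_PATH = posixpath.relpath(ONS_ABSOLUTE, start=NOTEBOOK_DIR)
print(ONS_PATH)
# ../my-mp2-repo/data/raw/london_postcodes-ons-postcodes-directory-feb22.csv

# How many levels the path climbs before it starts descending
levels_up = ONS_PATH.split("/").count("..")
print(levels_up)  # 1
```

Relative paths are resolved against the current working directory, which in Jupyter is usually the folder the notebook lives in; `os.getcwd()` shows it if you are unsure.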
Part 2: The merge challenge (25 min)
Note to class teachers: Keep this synchronised. The goal is for everyone to produce the same `df_merged`. Do not hint at the solution; let students work through it. Walk through the instructor notebook solution at the end of this part.
Your lab notebook defines this practice DataFrame for you:
```python
import pandas as pd

# Practice journeys, as defined in the lab notebook
df_practice = pd.DataFrame({
    "destination_name": [
        "Barking", "Barking",
        "Richmond", "Richmond",
        "Old Marylebone Rd",
    ],
    "destination_postcode": [
        "ig11 0ab",
        " IG11 0AB ",
        "TW9 1dn ",
        " tw9 1dn",
        "NW8 9JW",
    ],
    "duration_min": [62, 58, 44, 47, 38],
    "time_band": ["peak", "off-peak", "peak", "off-peak", "peak"],
})
```

🎯 ACTION POINTS
Merge df_practice with df_ons_full (keeping only pcds, oslaua, lsoa11, imd from the ONS side) so that df_merged looks exactly like this:
| | destination_name | destination_postcode | duration_min | time_band | pcds | oslaua | lsoa11 | imd |
|---|---|---|---|---|---|---|---|---|
| 0 | Barking | IG11 0AB | 62 | peak | IG11 0AB | E09000002 | E01000092 | 6348.0 |
| 1 | Barking | IG11 0AB | 58 | off-peak | IG11 0AB | E09000002 | E01000092 | 6348.0 |
| 2 | Richmond | TW9 1DN | 44 | peak | TW9 1DN | E09000027 | E01003876 | 28654.0 |
| 3 | Richmond | TW9 1DN | 47 | off-peak | TW9 1DN | E09000027 | E01003876 | 28654.0 |
| 4 | Old Marylebone Rd | NW8 9JW | 38 | peak | NaN | NaN | NaN | NaN |
💡 Note: The class teacher will walk through a solution at the end of this part.
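For reference after the walkthrough (class teachers: hold this back until students have attempted the challenge), here is a minimal sketch of one possible approach. Because `df_ons_full` is too large to reproduce here, it uses a hypothetical two-row stand-in, `df_ons_stub`, whose values are copied from the expected table above; in the lab notebook you would select `pcds`, `oslaua`, `lsoa11` and `imd` from `df_ons_full` instead. The idea is to normalise the key (strip surrounding spaces, upper-case) before a left merge, then use `indicator=True` to see which rows found no ONS match.

```python
import pandas as pd

# Practice journeys from the lab notebook
df_practice = pd.DataFrame({
    "destination_name": ["Barking", "Barking", "Richmond", "Richmond", "Old Marylebone Rd"],
    "destination_postcode": ["ig11 0ab", " IG11 0AB ", "TW9 1dn ", " tw9 1dn", "NW8 9JW"],
    "duration_min": [62, 58, 44, 47, 38],
    "time_band": ["peak", "off-peak", "peak", "off-peak", "peak"],
})

# Hypothetical stand-in for df_ons_full[["pcds", "oslaua", "lsoa11", "imd"]],
# using only the values shown in the expected output table
df_ons_stub = pd.DataFrame({
    "pcds": ["IG11 0AB", "TW9 1DN"],
    "oslaua": ["E09000002", "E09000027"],
    "lsoa11": ["E01000092", "E01003876"],
    "imd": [6348.0, 28654.0],
})

# 1. Clean the join key to match the ONS `pcds` format:
#    drop leading/trailing spaces, then upper-case every letter
df_practice["destination_postcode"] = (
    df_practice["destination_postcode"].str.strip().str.upper()
)

# 2. Left merge: keep every journey row, attach ONS columns where the postcode matches.
#    validate="many_to_one" raises an error if a postcode appears twice on the ONS side.
df_merged = pd.merge(
    df_practice,
    df_ons_stub,
    left_on="destination_postcode",
    right_on="pcds",
    how="left",
    validate="many_to_one",
)
print(df_merged)

# 3. Diagnose unmatched rows: indicator=True adds a `_merge` column
#    ("both" = matched, "left_only" = no ONS row for that postcode)
check = pd.merge(
    df_practice, df_ons_stub,
    left_on="destination_postcode", right_on="pcds",
    how="left", indicator=True,
)
print(check.loc[check["_merge"] == "left_only", ["destination_name", "destination_postcode"]])
```

The left merge keeps the `NW8 9JW` row and fills its ONS columns with `NaN` instead of silently dropping it, which is exactly what makes the missing match visible. Stripping and upper-casing only covers the kinds of mess present in `df_practice`; your own postcodes may need further checks.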
Part 3: Reshaping and visualisation (20 min)
Note to class teachers: Use `df_all` from the shared lecture data (`data/tfl_journeys_all.csv`). The groupby step is provided in the notebook; the challenge is the melt and the plot. Before students start the plot, run the barplot discussion: “We have `plot_df` ready. I know we’ll cover visualisation properly next week, but Jon made one point about barplots today. What was it? What should we use instead when n is small?”
Starting from the 40-row synthetic dataset in the shared data/ folder, practise the full pipeline.
🎯 ACTION POINTS
- Run the provided `groupby` cell to produce `summary` (mean duration per destination × time band).
- Use `.melt()` on `summary` to create `plot_df` with columns `destination`, `time_band`, `mean_duration_min`.
- Produce a strip + box + mean overlay plot of `duration_min` from `df_all`, split by destination and time band.
- Consider swapping the `x` axis with the `hue`. Does it make a difference whether destination or time band is on the x axis? Which do you prefer for this dataset, and why?
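A minimal sketch of the Part 3 pipeline follows. It assumes `df_all` has columns `destination`, `time_band` and `duration_min` (check the real CSV), and uses a handful of made-up rows so it runs on its own. The `summary` step here uses `pivot_table` to produce the wide destination × time band table that `.melt()` expects; the provided groupby cell in the notebook may build it differently. The plot uses plain matplotlib through a hypothetical helper, `strip_box_mean`; seaborn's `stripplot` and `boxplot` would be an equally valid route.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for df_all (the real one has 40 rows)
df_all = pd.DataFrame({
    "destination": ["Barking"] * 4 + ["Richmond"] * 4,
    "time_band": ["peak", "peak", "off-peak", "off-peak"] * 2,
    "duration_min": [60, 64, 56, 58, 42, 46, 48, 46],
})

# Wide summary: one row per destination, one column per time band
summary = (
    df_all.pivot_table(index="destination", columns="time_band",
                       values="duration_min", aggfunc="mean")
    .reset_index()
    .rename_axis(columns=None)
)

# Wide -> long: one row per (destination, time_band) pair
plot_df = summary.melt(id_vars="destination", var_name="time_band",
                       value_name="mean_duration_min")


def strip_box_mean(ax, raw, means, x, hue, value, mean_col):
    """Boxes and jittered points from `raw`; group means from `means` as black diamonds."""
    x_levels = sorted(raw[x].unique())
    hue_levels = sorted(raw[hue].unique())
    width = 0.8 / len(hue_levels)
    rng = np.random.default_rng(0)  # fixed seed so the jitter is reproducible
    for j, level in enumerate(hue_levels):
        # Shift each hue level sideways so the boxes sit next to each other
        positions = np.arange(len(x_levels)) + (j - (len(hue_levels) - 1) / 2) * width
        groups = [raw.loc[(raw[x] == xv) & (raw[hue] == level), value].to_numpy()
                  for xv in x_levels]
        ax.boxplot(groups, positions=positions, widths=width * 0.8,
                   showfliers=False, manage_ticks=False)
        xs = np.concatenate([np.full(len(g), p) for p, g in zip(positions, groups)])
        xs = xs + rng.uniform(-width * 0.15, width * 0.15, size=len(xs))
        ax.scatter(xs, np.concatenate(groups), s=14, alpha=0.6, color=f"C{j}", label=level)
        group_means = (means[means[hue] == level].set_index(x)
                       .reindex(x_levels)[mean_col])
        ax.scatter(positions, group_means.to_numpy(), marker="D", color="black",
                   zorder=3, label="group mean" if j == 0 else "_nolegend_")
    ax.set_xticks(np.arange(len(x_levels)))
    ax.set_xticklabels(x_levels)
    ax.set_xlabel(x)
    ax.set_ylabel(value)
    ax.legend(title=hue)


fig, ax = plt.subplots(figsize=(7, 4))
strip_box_mean(ax, df_all, plot_df, x="destination", hue="time_band",
               value="duration_min", mean_col="mean_duration_min")
fig.tight_layout()
```

Because `x` and `hue` are arguments, trying the swap is a one-line change: call `strip_box_mean(..., x="time_band", hue="destination", ...)` and compare which comparison the two layouts make easiest to read.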

While you settle into Part 4, could we ask a small but important favour? The LSE runs a course survey every term, and your feedback genuinely shapes how this module is taught next year. It takes about 3 minutes.
💡 Note: Please assess all the instructors you have interacted with (Jon counts as a teacher too!).
Last updated: 12 March 2026
Part 4: Work on your Mini-Project 2 (30 min)
Note to class teachers: Students now work in their own repos. Circulate and prompt each student to articulate their ONS decision out loud before they start coding. The common mistake is merging ONS mechanically without knowing why. Also check for `../data/raw/` paths in NB02.
You have loaded the ONS data, merged on a cleaned postcode key, and practised the reshape-to-plot pipeline. Before you open your own NB02, think through how the ONS dataset fits into your specific project.
A decision to make:
How could you use the ONS data in a way that demonstrates you’ve genuinely understood today’s material, rather than just run the code?
- Would you use it to select destination postcodes in `NB01`, browsing `oslaua`, `imd`, or `lsoa11` to justify where you looked?
- Or would you use it after collection in `NB02`, merging ONS attributes into your journey data so you can group or filter by geography in `NB03`?
- Could you do both, and does combining both actually strengthen your analysis or just add noise?
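To make the two options above concrete, here is a minimal sketch with tiny stand-in DataFrames (hypothetical names `df_ons_small` and `df_journeys`, with values taken from the Part 2 expected output). It shows the mechanics only; which option you choose, and why, is still your decision to document.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the ONS directory
df_ons_small = pd.DataFrame({
    "pcds": ["IG11 0AB", "TW9 1DN"],
    "oslaua": ["E09000002", "E09000027"],
    "lsoa11": ["E01000092", "E01003876"],
    "imd": [6348.0, 28654.0],
})

# Option A (NB01): browse ONS *before* collection to pick candidate
# destination postcodes, e.g. every postcode in one local authority
candidates = df_ons_small.loc[df_ons_small["oslaua"] == "E09000002", "pcds"].tolist()
print(candidates)  # ['IG11 0AB']

# Option B (NB02 -> NB03): merge ONS columns into journey data *after*
# collection, then group by geography
df_journeys = pd.DataFrame({
    "destination_postcode": ["IG11 0AB", "IG11 0AB", "TW9 1DN", "TW9 1DN", "NW8 9JW"],
    "duration_min": [62, 58, 44, 47, 38],
})
df_geo = df_journeys.merge(df_ons_small, left_on="destination_postcode",
                           right_on="pcds", how="left")

# groupby skips the unmatched NW8 9JW row (its oslaua is NaN) by default
by_borough = df_geo.groupby("oslaua")["duration_min"].mean()
print(by_borough)
```

Note how Option B quietly drops unmatched journeys from the grouped result; if you go that route, say in REPORT.md how many rows were lost and why.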
There is no single right answer. The key is that your decision is documented in REPORT.md and visible in your code.
Wrap-Up & Next Steps (10 min)
Note to class teachers: Close by running through the by-Thursday checklist out loud. Ask two or three students to share their ONS decision and why. Students should leave knowing what is still outstanding, not just what they ran today.
Before You Leave:
- You successfully loaded your ONS CSV using a relative path from a different folder
- You can explain why row 4 in the merge exercise produced NaN and what that means for your own data
- You know whether you are going to use ONS in NB01, NB02, or both, and you can say why
- Your by-Thursday checklist is realistic given where you are right now
Looking Ahead:
- Week 09 Lecture: EDA quality checks, mean vs median, correlation traps, and an introduction to closeread
- Week 09 Lab: refine your EDA, work on `REPORT.md`, peer review
- Week 10: ✍️ Mini-Project 2 deadline is Monday 23 March, 8 pm
Useful Resources
Getting Help
- Slack: Post questions to the `#help` channel
- Office Hours: Book via StudentHub
- Check staff availability on Contact Hours
