Week 07 Lab
Normalising Nested JSON for Analysis
Logistics
Time and Location: Friday, 14 November 2025. Check your timetable for your precise class time and location.
Today's lab applies yesterday's lecture content. You'll work with real OpenSanctions data, using pd.json_normalize() and related reshaping tools to rebuild a target visualisation through DataFrame engineering.
Preparation
Before coming to lab, ensure you have:
- Attended the W07 Lecture
- Reviewed JSON normalisation concepts from the lecture
- Basic familiarity with pandas groupby() operations
Roadmap
Part 1: Understand the Complete Pipeline (30-40 min)
We start with a complete code block that transforms nested OpenSanctions JSON into a tidy DataFrame. We'll step through it together to understand how pd.json_normalize() handles nested structures and how we clean list-like columns.
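As a rough illustration of the flattening step described above, the sketch below normalises a few made-up records. The field names (id, schema, properties.name, properties.country) are assumptions chosen to resemble nested target data; they are not taken from the lab notebook, whose actual pipeline may differ.

```python
import pandas as pd

# Hypothetical records loosely shaped like nested target data;
# the field names are assumptions, not the lab's actual schema.
records = [
    {"id": "t-1", "schema": "Person",
     "properties": {"name": ["Alice Example"], "country": ["ru", "by"]}},
    {"id": "t-2", "schema": "Company",
     "properties": {"name": ["Example Ltd"], "country": ["ir"]}},
    {"id": "t-3", "schema": "Person",
     "properties": {"name": ["Bob Example"]}},  # no country recorded
]

# Nested dict keys become dotted column names such as "properties.country".
df = pd.json_normalize(records)

# Lists stay intact inside cells, so explode() yields one row per list element.
# A missing value (t-3) is kept as a single row with NaN.
tidy = df.explode("properties.country").reset_index(drop=True)

# Single-element lists can be unwrapped by taking their first item.
tidy["name"] = tidy["properties.name"].str[0]
```

Here json_normalize() flattens the nested dictionary into columns, while explode() deals with the list-valued cells that flattening leaves behind.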
Part 2: Build plot_df (40-50 min)
Your main task: engineer a plot_df DataFrame with two columns (sanction_country and num_targets) that powers the plotting code. You'll aggregate the data using groupby(), then create your visualisation.
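One possible shape for that aggregation is sketched below on a tiny made-up table. It assumes a tidy DataFrame with one row per (target, sanctioning country) pair and an id column identifying each target; both the input layout and the choice to count unique ids are assumptions, so treat this as a starting point for your own solution.

```python
import pandas as pd

# Hypothetical tidy table: one row per (target, sanctioning country) pair.
# The input column names are assumptions for illustration.
tidy = pd.DataFrame({
    "id": ["t-1", "t-1", "t-2", "t-3", "t-4"],
    "sanction_country": ["us", "gb", "us", "us", "gb"],
})

plot_df = (
    tidy.groupby("sanction_country")
        .agg(num_targets=("id", "nunique"))  # count distinct targets per country
        .reset_index()                       # turn the group key back into a column
        .sort_values("num_targets", ascending=False, ignore_index=True)
)
```

Using named aggregation keeps the output column name (num_targets) explicit, and reset_index() leaves exactly the two columns the plotting code expects.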
Optional sub-step: Convert ISO country codes to full country names using pycountry to improve plot readability.
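A minimal sketch of the optional conversion follows. It assumes the codes are two-letter ISO 3166-1 alpha-2 strings (possibly lowercase) and that pycountry is installed; code_to_name is a hypothetical helper name, and it returns the input unchanged whenever no match is found.

```python
try:
    import pycountry  # third-party package: pip install pycountry
except ImportError:  # keep the helper usable even if pycountry is missing
    pycountry = None


def code_to_name(code):
    """Return the country name for a two-letter ISO code, or the code itself if unknown."""
    if pycountry is None or not isinstance(code, str):
        return code
    try:
        match = pycountry.countries.get(alpha_2=code.upper())
    except LookupError:  # some older pycountry versions raise instead of returning None
        match = None
    return match.name if match is not None else code
```

With a DataFrame like plot_df, the helper could be applied with plot_df["sanction_country"].map(code_to_name). Falling back to the original code means unusual or non-standard codes still appear on the plot rather than turning into missing values.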
Part 3: Work on Mini Project 2 (0-10 min)
Review the Mini Project 2 requirements and check its appendix for tactical planning tips. Use this time to ask TAs questions or start planning your approach.
Note: The skills you practise today (pd.json_normalize() and .explode()) will be essential for handling TfL Journey Planner API responses in Mini Project 2.
Getting Started
Option 1: Using Nuvolos (Recommended)
On Nuvolos, navigate to the Week 07 materials in your week07 folder. The notebook and relevant JSON file will be there:
week07/
├── data/
│   └── opensanctions/
│       └── targets_sample_4000.jsonl
└── W07-NB03-Lab-OpenSanctions.ipynb
Open the W07-NB03-Lab-OpenSanctions.ipynb file in VSCode to begin.
If you followed my guidance in W03 Extra advice, you can add this notebook to your my-ds105a-notes repository (for example under week07/).
Option 2: Download Lab Files Directly
Download the lab files to work on your own machine:
Data Specification
We're using data from the OpenSanctions project, which includes information about individuals and entities sanctioned by governments and international organisations worldwide.
Key points about the dataset:
Focus: Targets. Individuals and entities that have been sanctioned, including information about names, countries, and other properties.
Sample size: 4,000 records. This provides manageable data while showing meaningful patterns.
Format: JSON Lines. Each line is a complete JSON object representing one sanctioned target.
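To make the JSON Lines format concrete, the sketch below parses two made-up lines held in memory. The records are invented for illustration and are not real OpenSanctions data; the point is only that each line is parsed independently.

```python
import io
import json

# Two made-up lines standing in for a JSON Lines file; not real OpenSanctions data.
sample = io.StringIO(
    '{"id": "t-1", "properties": {"name": ["Alice Example"]}}\n'
    '{"id": "t-2", "properties": {"name": ["Example Ltd"]}}\n'
)

# Each non-empty line is a complete JSON object, so it can be parsed on its own.
records = [json.loads(line) for line in sample if line.strip()]
```

For the lab file, the in-memory sample would be replaced by an opened file handle for targets_sample_4000.jsonl. pandas can also read this format directly with pd.read_json(path, lines=True), although that does not flatten nested objects by itself.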
References
Useful references for lab techniques:
- The pd.json_normalize() function to convert JSON data into tabular format
- The DataFrame.explode() method to handle columns containing lists
- The DataFrame.groupby() method, combined with agg() to aggregate data
