💻 Week 07 Lab

Normalising Nested JSON for Analysis

Author

Dr Jon Cardoso-Silva

Published

14 November 2025

📍 Logistics

Time and Location: Friday, 14 November 2025. Check your timetable for your precise class time and location.

Today’s lab applies yesterday’s lecture content. You’ll work with real OpenSanctions data, using pd.json_normalize() and related reshaping tools to rebuild a target visualisation through DataFrame engineering.

📋 Preparation

Before coming to lab, ensure you have:

✅ Attended the 🖥️ W07 Lecture
✅ Reviewed JSON normalisation concepts from the lecture
✅ Basic familiarity with pandas groupby() operations

🛣️ Roadmap

Part 1: Understand the Complete Pipeline (30-40 min)

We start with a complete code block that transforms nested OpenSanctions JSON into a tidy DataFrame. We’ll step through it together to understand how pd.json_normalize() handles nested structures and how we clean list-like columns.

Part 2: Build `plot_df` (40-50 min)

Your main task: engineer a plot_df DataFrame with two columns (sanction_country and num_targets) that powers the plotting code. You’ll aggregate the data using groupby(), then create your visualisation.

Optional sub-step: Convert ISO country codes to full country names using pycountry to improve plot readability.

Part 3: Work on Mini Project 2 (0-10 min)

Review the ✍️ Mini Project 2 requirements and check the appendix in there for tactical planning tips. Use this time to ask TAs questions or start planning your approach.

Note: The skills you practice today (pd.json_normalize() and .explode()) will be essential for handling TfL Journey Planner API responses in Mini Project 2.

📚 Getting Started

Option 1: Using Nuvolos (Recommended)

On Nuvolos, navigate to the Week 07 materials in your week07 folder. The notebook and relevant JSON file will be there:

week07/
├── data/
│   └── opensanctions/
│       └── targets_sample_4000.jsonl
└── W07-NB03-Lab-OpenSanctions.ipynb

Open the W07-NB03-Lab-OpenSanctions.ipynb file in VSCode to begin.

If you followed my guidance in W03 Extra advice, you can add this notebook to your my-ds105a-notes repository (for example under week07/).

Option 2: Download Lab Files Directly

Download the lab files to work on your own machine:

💽 Data Specification

We’re using data from the OpenSanctions project, which includes information about individuals and entities sanctioned by governments and international organisations worldwide.

Key points about the dataset:

Focus: Targets. Individuals and entities that have been sanctioned, including information about names, countries, and other properties.
Sample size: 4,000 records. This provides manageable data while showing meaningful patterns.
Format: JSON Lines. Each line is a complete JSON object representing one sanctioned target.

📚 References

Useful references for lab techniques:

The pd.json_normalize() function to convert JSON data into tabular format
The DataFrame.explode() method to handle columns containing lists
The DataFrame.groupby() method, combined with agg() to aggregate data