DS105 2025-2026 Autumn Term Icon

πŸ’» Week 07 Lab

Normalising Nested JSON for Analysis

Author

Dr Jon Cardoso-Silva

Published

14 November 2025

πŸ“ Logistics

Time and Location: Friday, 14 November 2025. Check your timetable for your precise class time and location.

Today’s lab applies yesterday’s lecture content. You’ll work with real OpenSanctions data, using pd.json_normalize() and related reshaping tools to rebuild a target visualisation through DataFrame engineering.

πŸ“‹ Preparation

Before coming to lab, ensure you have:

  • βœ… Attended the πŸ–₯️ W07 Lecture
  • βœ… Reviewed JSON normalisation concepts from the lecture
  • βœ… Basic familiarity with pandas groupby() operations

πŸ›£οΈ Roadmap

Part 1: Understand the Complete Pipeline (30-40 min)

We start with a complete code block that transforms nested OpenSanctions JSON into a tidy DataFrame. We’ll step through it together to understand how pd.json_normalize() handles nested structures and how we clean list-like columns.

Part 2: Build plot_df (40-50 min)

Your main task: engineer a plot_df DataFrame with two columns (sanction_country and num_targets) that powers the plotting code. You’ll aggregate the data using groupby(), then create your visualisation.

Optional sub-step: Convert ISO country codes to full country names using pycountry to improve plot readability.

Part 3: Work on Mini Project 2 (0-10 min)

Review the ✍️ Mini Project 2 requirements and check the appendix in there for tactical planning tips. Use this time to ask TAs questions or start planning your approach.

Note: The skills you practice today (pd.json_normalize() and .explode()) will be essential for handling TfL Journey Planner API responses in Mini Project 2.

πŸ“š Getting Started

Option 2: Download Lab Files Directly

Download the lab files to work on your own machine:

πŸ’½ Data Specification

We’re using data from the OpenSanctions project, which includes information about individuals and entities sanctioned by governments and international organisations worldwide.

Key points about the dataset:

  • Focus: Targets. Individuals and entities that have been sanctioned, including information about names, countries, and other properties.

  • Sample size: 4,000 records. This provides manageable data while showing meaningful patterns.

  • Format: JSON Lines. Each line is a complete JSON object representing one sanctioned target.

πŸ“š References

Useful references for lab techniques: