🧪 Week 07 Lab
Practice normalising JSON data and using the groupby -> apply -> combine strategy

Last Updated: 6 March 2025, 19:00 GMT
Time and Location: Friday, 7 March 2025. Check your timetable for the precise time and location of your class.
Preparation
To come prepared to this lab, make sure you have:
- Attended the 🗣️ Week 07 lecture
- Reviewed the JSON normalization concepts covered in the lecture
- Basic familiarity with pandas groupby operations
🛣️ Roadmap
Here is how we will achieve the goal for this lab:
Part I: βοΈ Set Up (10 min)
Option 1: Using Nuvolos (Recommended)
If you are working on Nuvolos, follow these steps:
Navigate to the Week 7 materials in your my-ds105w-notes folder. The notebook is in the Week 07 folder, while the data and figures are in the root directories:
my-ds105w-notes/                  # Root directory
├── data/                         # Data directory (at root level)
│   ├── opensanctions/            # Contains the data files for this lab
│   └── ... (other data folders)
│
├── figures/                      # Figures directory (at root level)
│   ├── w07-lab/                  # Contains reference figures for this lab
│   └── ... (other figure folders)
│
├── Week 01 - ...
├── Week 02 - ...
└── Week 07 - JSON Normalization and Data Reshaping/
    └── W07-Lab-Notebook.ipynb    # Open this notebook to begin
Note: The exact order these folders appear in your file explorer may differ depending on your sorting preferences. The important thing is to locate the W07-Lab-Notebook.ipynb file in the Week 07 folder, and ensure you can access the data in the data/opensanctions/ directory.
Simply open the W07-Lab-Notebook.ipynb file in VS Code to begin working on the lab exercises.
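If you want to confirm that your notebook can see the data before you start, a quick check along the following lines may help. This is only a sketch that assumes the folder layout shown above: find_data_dir is a hypothetical helper (it is not part of the lab materials) that walks up from the current working directory until it finds a data/opensanctions folder.

```python
from pathlib import Path
from typing import Optional


def find_data_dir(start: Path, relative: str = "data/opensanctions") -> Optional[Path]:
    """Walk up from `start` until a folder containing `relative` is found.

    Hypothetical helper for illustration; returns None if nothing is found.
    """
    start = start.resolve()
    for folder in [start, *start.parents]:
        candidate = folder / relative
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    data_dir = find_data_dir(Path.cwd())
    if data_dir is None:
        print("Could not find data/opensanctions above the current folder.")
    else:
        # List whatever is in the folder; no file names are assumed here.
        for path in sorted(data_dir.iterdir()):
            print(path.name)
```

Searching upwards, rather than hard-coding a relative path, means the same cell should work whether the notebook runs from the Week 07 folder or from the root directory.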
Option 2: Download the Lab Files Directly
If you prefer to work on your own machine, you can download the lab files:
After downloading, extract the files and open the W07-Lab-Notebook.ipynb file in your preferred environment.
Part II: π Practice (70-80 min)
💽 DATA SPECIFICATION CARD:
We're going to use data from the OpenSanctions project. This dataset includes information about individuals and entities that governments and international organizations have sanctioned worldwide. OpenSanctions is operated by a German company, OpenSanctions Datenbanken GmbH, and has received funding from the German Federal Ministry for Education and Research. They offer a paid API for accessing the data, but you can also download the data in bulk for free, for academic and research purposes.
A few things to know about the dataset:
- We are focusing on Targets. These are the individuals and entities that have been sanctioned. This dataset includes information about the name, country, and other "properties" of the targets.
- We have filtered for Russian Targets. This is partly because Alex, who provided us with the data sample for this lab, is doing a PhD focused on Russia, and partly because the full dataset is large and we want to make it more manageable for this lab.
- We are using a small random sample. Again, this is to make the dataset more manageable for this lab. The full dataset is much larger.
Follow the instructions in the lab notebook to complete the exercises.
Notes:
- You can work alone or in small groups for this.
- If you want, feel free to play a game of 🧑‍✈️ Pilot and Copilot(s) like we've done in the past.
What the exercises will cover:
- Using pd.json_normalize to flatten nested JSON data.
- Using DataFrame.explode to expand lists in a column.
- Using pd.merge to combine dataframes.
- Using the groupby method in pandas.
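As a warm-up, here is a minimal sketch of how the first three techniques can chain together. The records below are made up for illustration and are not the real OpenSanctions schema; the field names (id, caption, properties, country, alias) and the country_names lookup table are assumptions, so expect the actual data in the notebook to look different.

```python
import pandas as pd

# Made-up records for illustration only; NOT the real OpenSanctions schema.
records = [
    {
        "id": "t1",
        "caption": "Example Person",
        "properties": {"country": ["ru"], "alias": ["E. Person"]},
    },
    {
        "id": "t2",
        "caption": "Example Company",
        "properties": {"country": ["ru", "cy"], "alias": []},
    },
]

# 1. Flatten: nested dictionary keys become dotted column names,
#    e.g. "properties.country". Lists stay as lists inside the cells.
targets = pd.json_normalize(records)

# 2. Explode: produce one row per element of the list in the column.
countries = (
    targets[["id", "properties.country"]]
    .explode("properties.country")
    .rename(columns={"properties.country": "country"})
)

# 3. Merge: attach extra information from a (hypothetical) lookup table.
country_names = pd.DataFrame(
    {"country": ["ru", "cy"], "country_name": ["Russia", "Cyprus"]}
)
merged = pd.merge(countries, country_names, on="country", how="left")

print(merged)
```

Note that explode repeats the original index for each expanded row; pd.merge returns a fresh index, but if you stop after explode you may want to call reset_index(drop=True).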
References
Here are some useful references for the techniques we'll be using in this lab:
- The pd.json_normalize() function to convert JSON data more easily into tabular format
- The DataFrame.explode() function to handle cases when columns are made out of lists
- The DataFrame.groupby() function, combined with apply() and agg() to aggregate data
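To make the groupby -> apply -> combine idea concrete, here is a small sketch on made-up rows (the schema, id and country columns are assumptions, not the lab data). It shows agg() with named aggregations for standard summaries, and apply() with a custom function (join_unique, a hypothetical helper) for anything agg() does not cover out of the box.

```python
import pandas as pd

# Made-up rows: one row per (target, country) pair. Not the lab data.
df = pd.DataFrame(
    {
        "id": ["t1", "t2", "t2", "t3"],
        "schema": ["Person", "Company", "Company", "Person"],
        "country": ["ru", "ru", "cy", "ru"],
    }
)

# Split by "schema", then aggregate each group with built-in functions.
# Named aggregation: new_column=(source_column, function).
summary = df.groupby("schema").agg(
    n_targets=("id", "nunique"),
    n_countries=("country", "nunique"),
)


def join_unique(values: pd.Series) -> str:
    """Custom per-group function: sorted unique values as one string."""
    return ", ".join(sorted(values.unique()))


# Split by "schema", apply the custom function to each group's "country"
# values, and let pandas combine the results into a single Series.
countries_per_schema = df.groupby("schema")["country"].apply(join_unique)

print(summary)
print(countries_per_schema)
```

In both cases pandas does the "combine" step for you: the per-group results are stitched back together into one DataFrame or Series indexed by the grouping key.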