🧪 Week 07 Lab

Practice normalising JSON data and using the split -> apply -> combine strategy with groupby

Published: 07 March 2025

🥅 Learning Goals

By the end of this lab, you should be able to:

  • Use pd.json_normalize() to flatten nested JSON data
  • Handle columns containing lists using DataFrame.explode()
  • Apply the split-apply-combine pattern with groupby()
  • Transform complex data structures into analysis-ready formats

Last Updated: 6 March 2025, 19:00 GMT

πŸ“Time and Location: Friday, 7 March 2025. Check your timetable for the precise time and location of your class.

📋 Preparation

To come prepared to this lab, make sure you have:

  • Attended the 🗣️ Week 07 lecture
  • Reviewed the JSON normalisation concepts covered in the lecture
  • Gained basic familiarity with pandas groupby operations

πŸ›£οΈ Roadmap

Here is how we will achieve the goal for this lab:

Part I: ⚙️ Set Up (10 min)

Option 2: Download the Lab Files Directly

If you prefer to work on your own machine, you can download the lab files:

After downloading, extract the files and open the W07-Lab-Notebook.ipynb file in your preferred environment.

Part II: 📚 Practice (70-80 min)

💽 DATA SPECIFICATION CARD:

We’re going to use data from the OpenSanctions project. This dataset includes information about individuals and entities that governments and international organizations have sanctioned worldwide. OpenSanctions is operated by a German company, OpenSanctions Datenbanken GmbH, and has received funding from the German Federal Ministry for Education and Research. They offer a paid API for accessing the data, but you can also download the data in bulk for free, for academic and research purposes.

A few things to know about the dataset:

  • We are focusing on Targets. These are the individuals and entities that have been sanctioned. This dataset includes information about the name, country, and other ‘properties’ of the targets.

  • We have filtered for Russian Targets. This is partly because Alex, who provided us with the data sample for this lab, is doing a PhD focused on Russia, and partly because the full dataset is large and we want to make it more manageable for this lab.

  • We are using a small random sample. Again, this is to make the dataset more manageable for this lab. The full dataset is much larger.
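
To get a feel for why this kind of data needs flattening, here is a minimal sketch. The two records are invented for illustration: the field names (id, schema, properties) and all values are assumptions, not the actual contents or schema of the lab sample. It shows how pd.json_normalize() turns nested keys into dotted column names, and how DataFrame.explode() gives each list element its own row.

```python
import pandas as pd

# Hypothetical records, invented for illustration only.
# The real lab data may use different field names and structure.
records = [
    {
        "id": "t-001",
        "schema": "Person",
        "properties": {"name": ["Ivan Example", "I. Example"], "country": ["ru"]},
    },
    {
        "id": "t-002",
        "schema": "Company",
        "properties": {"name": ["Example Holdings"], "country": ["ru", "cy"]},
    },
]

# Flatten the nested dictionaries: nested keys become dotted column names.
# Lists inside the records are kept as lists in a single cell.
flat = pd.json_normalize(records)

# Give each list element its own row; the other columns are repeated.
names = (
    flat[["id", "schema", "properties.name"]]
    .explode("properties.name")
    .rename(columns={"properties.name": "name"})
    .reset_index(drop=True)
)

print(flat.columns.tolist())
print(names)
```

Selecting only the columns of interest before calling explode() keeps the result narrow; exploding several list columns one after another multiplies the rows, which is rarely what you want.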

Follow the instructions in the lab notebook to complete the exercises.

Notes:

  • You can work alone or in small groups for this.
  • If you want, feel free to play a game of 🧑‍✈️ Pilot and 🙋 Copilot(s) like we’ve done in the past.

What the exercises will cover:

  • Using pd.json_normalize to flatten nested JSON data.
  • Using DataFrame.explode to expand lists in a column.
  • Using pd.merge to combine dataframes.
  • Using the groupby method in pandas.
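
As a rough sketch of the last two items, the example below merges two small tables and then summarises them with groupby. The tables, column names and values are made up for illustration and are not taken from the lab notebook, so treat this as one possible shape of the workflow and not as the lab solution.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
targets = pd.DataFrame(
    {
        "id": ["t-001", "t-002", "t-003"],
        "schema": ["Person", "Company", "Person"],
    }
)
countries = pd.DataFrame(
    {
        "id": ["t-001", "t-002", "t-002", "t-003"],
        "country": ["ru", "ru", "cy", "ru"],
    }
)

# Combine the two tables on their shared key column.
# A left join keeps every target, even one with no country row.
merged = pd.merge(targets, countries, on="id", how="left")

# Split the rows by schema, apply an aggregation to each group,
# and combine the results into one summary table.
summary = (
    merged.groupby("schema")
    .agg(n_targets=("id", "nunique"), n_countries=("country", "nunique"))
    .reset_index()
)

print(summary)
```

Counting with nunique avoids double-counting a target that appears on several rows after the merge, which happens here because one target is linked to two countries.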

📚 References

Here are some useful references for the techniques we’ll be using in this lab: