🧑‍🏫 Week 07 Lecture
Normalising JSON data + the Group → Apply → Combine strategy
📃 Schedule
📍Location: Thursday 14 November 2024, 4 pm - 6 pm at CLM.5.02
Very often, the data we get from APIs is not conveniently tabular like a CSV file but comes as nested JSON structures (see the response sample on the Spotify API for an example). Our focus this week is on disentangling complex JSON data.
This week, you will learn a few new pandas tricks:
- The pd.json_normalize() function to convert JSON data more easily into tabular format
- The DataFrame.explode() function to handle columns whose values are lists
- The DataFrame.groupby() function, combined with apply() and agg() to aggregate data
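To give a flavour of how these three tools fit together, here is a minimal sketch. The records below are made up for illustration (they only loosely imitate a music API response and are not real Spotify data), and the column names `artists`, `duration_s` and `album` are assumptions chosen for this example, not part of the lecture notebook.

```python
import pandas as pd

# Hypothetical API-style records: a nested dict ("album") and a list field ("artists").
records = [
    {"id": 1, "name": "Track A", "album": {"name": "Album X", "year": 2020},
     "artists": ["Ann", "Bob"], "duration_s": 200},
    {"id": 2, "name": "Track B", "album": {"name": "Album X", "year": 2020},
     "artists": ["Ann"], "duration_s": 180},
    {"id": 3, "name": "Track C", "album": {"name": "Album Y", "year": 2022},
     "artists": ["Cat", "Bob"], "duration_s": 240},
]

# 1. Flatten the nested dicts: "album" becomes "album.name" and "album.year".
#    The list-valued "artists" field stays as a single column of lists.
df = pd.json_normalize(records)

# 2. Turn each list element into its own row (one row per track-artist pair).
df_long = df.explode("artists").reset_index(drop=True)

# 3. Group -> apply -> combine with agg(): one summary row per artist.
per_artist = (
    df_long.groupby("artists")
    .agg(n_tracks=("id", "count"), mean_duration_s=("duration_s", "mean"))
    .reset_index()
)

# The same strategy with apply(): run a custom function on each group.
longest_per_album = df.groupby("album.name")["duration_s"].apply(lambda s: s.max())

print(df.columns.tolist())
print(per_artist)
print(longest_per_album)
```

Note that the explode step happens before grouping: without it, each list of artists would be treated as a single value, and grouping by individual artist would not work.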
I will add new notebooks to the lse-ds105/ds105a-2024 repository. You should be able to find the Jupyter Notebook in the W07-material folder around Wednesday afternoon.
📋 Preparation
- Create a fork of our shared repository, lse-ds105/ds105a-2024, by following the instructions on its README
- Go back to your fork page on Wednesday afternoon and update it to get the new materials