DS105A – Data for Data Science
🗓️ 13 Nov 2025
- pd.json_normalize() to flatten nested JSON structures
- New reshaping tools (concat(), explode(), melt()) that you’ll use again in W08 and W09

Why this matters: These skills directly support your ✍️ Mini-Project 2 work and make your MP1 code simpler.

I will share a link to a Mentimeter poll to gather your experience with the ✍️ Mini-Project 1.

Let’s look at excellent work from one of your classmates who compared two London boroughs affected by ULEZ expansion.
👉 I will share the model solution on Nuvolos after the lecture.
London introduced the Ultra Low Emission Zone (ULEZ) to reduce air pollution from road traffic. The policy has been in place for some time; it was expanded in 2021 and again in 2023, eventually reaching the outer London boroughs.
Aden Grandcourt, one of your classmates, framed his project as a comparison of the impact of ULEZ on air quality (\(NO_2\) concentration) in two neighbouring boroughs:
This solution addresses the main question we posed: “Is London’s air getting better or worse?” in a creative and interesting way.

Credit: Aden Grandcourt (✍️ Mini-Project 1 submission) - with permission.
Photo source: National Park City.
| Location | Pre to 1st Expansion % | 1st to 2nd Expansion % | Overall % Change |
|---|---|---|---|
| Haringey | -13.10% | -7.08% | -19.26% |
| Enfield | -7.56% | -6.41% | -13.48% |
Credit: Aden Grandcourt (✍️ Mini-Project 1 submission) - with permission.
Here’s how many of you solved the JSON parsing problem in MP1:
✅ This approach is fully aligned with what we have taught in this course so far!
The student understood the data structure, navigated nested dictionaries correctly, and successfully extracted the needed values.
Code adapted from Aden Grandcourt (✍️ Mini-Project 1 submission) - with permission.
I did a Ctrl + F search for json_normalize in your submitted notebooks and found that 44/81 of you discovered this simpler approach.
Before (for loop):
After (json_normalize()):
What json_normalize() Does

Input: Nested JSON structure
Output: Clean DataFrame
| dt | components.no2 | components.pm2_5 |
|---|---|---|
| 1606435200 | 15.2 | 8.5 |
📋 Note: Nested dictionaries become columns with dot notation (components.no2). Lists of dictionaries become rows.
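As a minimal sketch of this behaviour (the record below is an illustrative stand-in for the real API response):

```python
import pandas as pd

# A tiny record mirroring the structure shown above (values are illustrative)
record = {
    "dt": 1606435200,
    "components": {"no2": 15.2, "pm2_5": 8.5},
}

# Nested dictionaries become dot-notation columns in a single-row DataFrame
df = pd.json_normalize(record)
print(df.columns.tolist())  # ['dt', 'components.no2', 'components.pm2_5']
```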
You could also get rid of that import json and use pandas directly to parse the JSON file into a DataFrame:
If that was the same JSON we saw in the previous slide, the output would look like this:
This produces a DataFrame with a single row (index: 0) and a single column called list - not the best name for a column.
📋 Note: This is not yet a very useful DataFrame, though. The data is still nested inside the list column; we will need to normalise it (also using json_normalize()) to get a tidy DataFrame.
You could also have passed typ='series' to get a Series object instead of a DataFrame.
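Here is a self-contained sketch: the sample data and the temporary file stand in for the JSON response you saved in MP1.

```python
import json
import tempfile

import pandas as pd

# Hypothetical sample mirroring the MP1 API response (values are illustrative)
sample = {
    "coord": {"lon": -0.1, "lat": 51.6},
    "list": [
        {"main": {"aqi": 2},
         "components": {"no2": 41.13, "pm2_5": 18.81},
         "dt": 1606435200},
    ],
}

# Write it to a temporary file so the example runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)

# typ='series' turns the top-level JSON keys into the Series index
borough_data = pd.read_json(f.name, typ="series")
print(borough_data.index.tolist())
```

The top-level keys (coord and list) become the index of the Series, so you can access each part with borough_data['list'].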
📋 Remember: pandas Series are like one-dimensional numpy arrays with labelled indices. That is, kind of like a dictionary but with more functionality. You can do things like series.apply(), series.value_counts(), series.sort_values(), etc.
📋 Also remember: Each column in a pandas DataFrame is a Series.
json_normalize() Deep Dive
Let’s explore the powerful parameters that make json_normalize() handle complex structures.
What it does: flattens nested dictionaries into columns, using dot notation for the names (e.g. components.no2).

Full example
Here is code you could have used in your ✍️ Mini Project 1 if you knew about json_normalize():
I will completely ignore the coord key and focus just on list:
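As a sketch (the borough_data below is an illustrative stand-in for the Series produced by pd.read_json(filename, typ='series')):

```python
import pandas as pd

# Hypothetical borough_data, as produced by pd.read_json(filename, typ='series')
borough_data = pd.Series({
    "coord": {"lon": -0.1, "lat": 51.6},
    "list": [
        {"main": {"aqi": 2},
         "components": {"co": 347.14, "no": 33.53, "no2": 41.13, "o3": 0.01,
                        "so2": 7.51, "pm2_5": 18.81, "pm10": 21.35, "nh3": 0.25},
         "dt": 1606435200},
    ],
})

# Ignore 'coord' entirely; 'list' is a plain Python list of dictionaries
records = borough_data["list"]
print(records[0]["components"]["no2"])  # 41.13
```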
Which looks like this:
I can convert this list of dictionaries into a (nested) DataFrame using pd.DataFrame():
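A sketch of this step, with an abbreviated stand-in for the list of dictionaries:

```python
import pandas as pd

# Hypothetical list of dictionaries, as stored under the 'list' key
records = [
    {"main": {"aqi": 2},
     "components": {"co": 347.14, "no2": 41.13},
     "dt": 1606435200},
    {"main": {"aqi": 2},
     "components": {"co": 293.73, "no2": 42.16},
     "dt": 1606438800},
]

# Each dictionary becomes a row; nested dicts stay nested inside the cells
df = pd.DataFrame(records)
print(df.columns.tolist())  # ['main', 'components', 'dt']
```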
Which looks like this:
|  | main | components | dt |
|---|---|---|---|
| 0 | {'aqi': 2} | {'co': 347.14, 'no': 33.53, 'no2': 41.13, 'o3': 0.01, 'so2': 7.51, 'pm2_5': 18.81, 'pm10': 21.35, 'nh3': 0.25} | 1606435200 |
| 1 | {'aqi': 2} | {'co': 293.73, 'no': 11.18, 'no2': 42.16, 'o3': 0.21, 'so2': 7.27, 'pm2_5': 15.68, 'pm10': 18.17, 'nh3': 0.01} | 1606438800 |
| 2 | {'aqi': 2} | {'co': 277.04, 'no': 5.64, 'no2': 41.81, 'o3': 0.32, 'so2': 7.33, 'pm2_5': 15.31, 'pm10': 17.65, 'nh3': 0.01} | 1606442400 |
I can clean it up with the knowledge we have gained so far:
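A sketch of the clean-up step, using an abbreviated stand-in for the nested DataFrame (the column names and the Borough label follow the MP1 example):

```python
import pandas as pd

# Hypothetical nested DataFrame, as built with pd.DataFrame(borough_data['list'])
df = pd.DataFrame([
    {"main": {"aqi": 2}, "components": {"co": 347.14, "no2": 41.13}, "dt": 1606435200},
    {"main": {"aqi": 2}, "components": {"co": 293.73, "no2": 42.16}, "dt": 1606438800},
])

df = (
    df.assign(
        dt=pd.to_datetime(df["dt"], unit="s"),     # Unix seconds -> timestamps
        Borough="Haringey",                        # label the borough
        aqi=df["main"].apply(lambda d: d["aqi"]),  # pull aqi out of the 'main' dicts
    )
    .drop(columns=["main"])                        # 'main' is no longer needed
)

print(df.loc[0, "dt"])  # 2020-11-27 00:00:00
```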
Producing this DataFrame:
|  | components | dt | Borough | aqi |
|---|---|---|---|---|
| 0 | {'co': 347.14, 'no': 33.53, 'no2': 41.13, 'o3': 0.01, 'so2': 7.51, 'pm2_5': 18.81, 'pm10': 21.35, 'nh3': 0.25} | 2020-11-27 00:00:00 | Haringey | 2 |
| 1 | {'co': 293.73, 'no': 11.18, 'no2': 42.16, 'o3': 0.21, 'so2': 7.27, 'pm2_5': 15.68, 'pm10': 18.17, 'nh3': 0.01} | 2020-11-27 01:00:00 | Haringey | 2 |
| 2 | {'co': 277.04, 'no': 5.64, 'no2': 41.81, 'o3': 0.32, 'so2': 7.33, 'pm2_5': 15.31, 'pm10': 17.65, 'nh3': 0.01} | 2020-11-27 02:00:00 | Haringey | 2 |
Now I need to flatten the components dictionary into columns:
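A sketch of this step, with an abbreviated stand-in for the components column (a Series of dictionaries):

```python
import pandas as pd

# Hypothetical 'components' column: a Series of dictionaries
df = pd.DataFrame({
    "components": [
        {"co": 347.14, "no": 33.53, "no2": 41.13},
        {"co": 293.73, "no": 11.18, "no2": 42.16},
    ],
})

# json_normalize accepts the Series directly: one row per dictionary,
# one column per pollutant
components = pd.json_normalize(df["components"])
print(components.columns.tolist())  # ['co', 'no', 'no2']
```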
which looks like this:
|  | co | no | no2 | o3 | so2 | pm2_5 | pm10 | nh3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 347.14 | 33.53 | 41.13 | 0.01 | 7.51 | 18.81 | 21.35 | 0.25 |
| 1 | 293.73 | 11.18 | 42.16 | 0.21 | 7.27 | 15.68 | 18.17 | 0.01 |
| 2 | 277.04 | 5.64 | 41.81 | 0.32 | 7.33 | 15.31 | 17.65 | 0.01 |
It’s starting to look nice! 😀
To ‘connect’ both DataFrames, we can use a 🆕 new function: pd.concat() to combine them:
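A sketch of the side-by-side concatenation, abbreviated to two pollutants (the two stand-in DataFrames below share the same row index, which is what axis=1 relies on):

```python
import pandas as pd

# Hypothetical cleaned metadata and flattened pollutant readings
meta = pd.DataFrame({
    "dt": pd.to_datetime([1606435200, 1606438800], unit="s"),
    "Borough": ["Haringey", "Haringey"],
    "aqi": [2, 2],
})
components = pd.DataFrame({"no2": [41.13, 42.16], "pm2_5": [18.81, 15.68]})

# axis=1 glues the DataFrames side by side, matching rows by index
df = pd.concat([meta, components], axis=1)
print(df.columns.tolist())  # ['dt', 'Borough', 'aqi', 'no2', 'pm2_5']
```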
which would look like this:
|  | dt | Borough | aqi | co | no | no2 | o3 | so2 | pm2_5 | pm10 | nh3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-11-27 00:00:00 | Haringey | 2 | 347.14 | 33.53 | 41.13 | 0.01 | 7.51 | 18.81 | 21.35 | 0.25 |
| 1 | 2020-11-27 01:00:00 | Haringey | 2 | 293.73 | 11.18 | 42.16 | 0.21 | 7.27 | 15.68 | 18.17 | 0.01 |
| 2 | 2020-11-27 02:00:00 | Haringey | 2 | 277.04 | 5.64 | 41.81 | 0.32 | 7.33 | 15.31 | 17.65 | 0.01 |
If you had multiple boroughs, you could create a custom function to do that whole procedure for you:
```python
def get_air_quality_data(borough_name):
    borough_data = pd.read_json(filename, typ='series')
    ... # same as before
    components = pd.json_normalize(borough_data['components'])
    output = pd.concat([borough_data.drop(columns=['components']), components], axis=1)
    output = output.assign(Borough=borough_name)  # .assign() returns a new DataFrame
    return output
```

And then you could call it like this:
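For instance, the call site might look like this. The stub below is a hypothetical stand-in for get_air_quality_data() (returning toy data so the snippet runs on its own); in your notebook you would use the real function:

```python
import pandas as pd

# Hypothetical stand-in so the call pattern below is runnable
def get_air_quality_data(borough_name):
    return pd.DataFrame({"no2": [41.13, 42.16]}).assign(Borough=borough_name)

# One DataFrame per borough, stacked into a single tidy table
boroughs = ["Haringey", "Enfield"]
df_all = pd.concat(
    [get_air_quality_data(b) for b in boroughs],
    ignore_index=True,  # rebuild the index as 0, 1, 2, ...
)
print(df_all["Borough"].tolist())  # ['Haringey', 'Haringey', 'Enfield', 'Enfield']
```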
Using record_path to Expand Nested Lists

Sometimes you want to keep the parent information when expanding a nested list.
- data: The nested list to expand/normalise.
- record_path: Path to a key whose value is a list. Each item in that list becomes one row.
- meta: Which fields from the parent object to keep (added to each row).
- meta_prefix: Prefix to add to meta columns (prevents name conflicts).
Critical point: record_path must point to a key whose value is a list. The list is what creates multiple rows. If the value isn’t a list, you’ll get an error.
Think of it this way: record_path says “make one row per item in this list”, and meta says “but also include this information from the parent”.
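A sketch of record_path and meta together, using an illustrative API-style payload (the field names mimic the MP1 response):

```python
import pandas as pd

# Hypothetical API-style response: 'list' holds the records, 'coord' is parent info
data = {
    "coord": {"lon": -0.1, "lat": 51.6},
    "list": [
        {"dt": 1606435200, "components": {"no2": 41.13}},
        {"dt": 1606438800, "components": {"no2": 42.16}},
    ],
}

df = pd.json_normalize(
    data,
    record_path="list",                         # one row per item in data['list']
    meta=[["coord", "lat"], ["coord", "lon"]],  # carry parent fields onto each row
)
print(df.columns.tolist())
```

Each row gets its own dt and components.no2, plus the parent's coord.lat and coord.lon repeated on every row.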
max_level

Just so you know: you can limit how many levels of nesting get flattened into columns.
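A sketch with a hypothetical doubly nested record:

```python
import pandas as pd

# Hypothetical doubly nested record
record = {"station": {"info": {"id": 7, "name": "Haringey"}}}

flat = pd.json_normalize(record)                  # flattens every level
shallow = pd.json_normalize(record, max_level=1)  # stops after one level

print(flat.columns.tolist())     # ['station.info.id', 'station.info.name']
print(shallow.columns.tolist())  # ['station.info']
```

With max_level=1 the innermost dictionary stays as a value inside the station.info column.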
Change the dot notation separator:
When to use: If your column names already contain dots, or you prefer underscores.
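For example:

```python
import pandas as pd

record = {"dt": 1606435200, "components": {"no2": 41.13}}

# sep='_' produces components_no2 instead of components.no2
df = pd.json_normalize(record, sep="_")
print(df.columns.tolist())  # ['dt', 'components_no2']
```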
What Does json_normalize() Accept?

Works with:
- A pd.Series containing dictionaries or lists

Key point: If you have a DataFrame column containing nested JSON, pass that Series to json_normalize().
When to Use json_normalize()

Use it when:
Don’t use it when:
- The JSON is already flat (just use pd.DataFrame())

Here’s what the JSON from the TfL Journey Planner API looks like (you’ll see this in your MP2 NB01):
Challenge: The transport mode is buried inside legs, which is a list inside each journey. You’ll need to normalize journeys first, then handle the legs list.

Time to put your skills to the test!
Task: Normalize the JSON structure I’m giving you right now into an informative, analysis-ready DataFrame. Share your solution on Slack with a screenshot of your DataFrame.
Time: 15 minutes
Prize: Most elegant solution wins a tote bag! 🎁
Criteria for “elegant”:
- Effective use of json_normalize() parameters

After the break:
- pd.concat()
- .explode()
- .melt()
Beyond json_normalize(), here are other reshaping tools for your data-wrangling toolkit.
🫨 DON’T PANIC! We’re showing you these now so you can recognise them. You’ll see more examples in 🖥️ W08 Lecture and 🖥️ W09 Lecture.
pd.concat()

When you have multiple DataFrames (e.g., from different locations or time periods):
Before:
df_location1:
| location | NO2 |
|---|---|
| Haringey | 20.5 |
df_location2:
| location | NO2 |
|---|---|
| Enfield | 15.3 |
After:
df_all:
| location | NO2 |
|---|---|
| Haringey | 20.5 |
| Enfield | 15.3 |
Key parameter: ignore_index=True resets the index (0, 1, 2… instead of keeping original indices)
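The before/after tables above correspond to this sketch:

```python
import pandas as pd

# The two small DataFrames from the slide above
df_location1 = pd.DataFrame({"location": ["Haringey"], "NO2": [20.5]})
df_location2 = pd.DataFrame({"location": ["Enfield"], "NO2": [15.3]})

# Stack the rows and rebuild the index as 0, 1, 2, ...
df_all = pd.concat([df_location1, df_location2], ignore_index=True)
print(df_all["location"].tolist())  # ['Haringey', 'Enfield']
```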
When to Use concat()

Use it when:
Example from MP1: Combining Haringey and Enfield data into one DataFrame
.explode()

Sometimes after normalisation, columns contain lists. .explode() creates one row per list element.
Before:
| name | emails |
|---|---|
| Alice | [‘a@example.com’, ‘b@example.com’] |
| Bob | [‘c@example.com’] |
After:
| name | emails |
|---|---|
| Alice | a@example.com |
| Alice | b@example.com |
| Bob | c@example.com |
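The before/after tables above correspond to this sketch:

```python
import pandas as pd

# The example table from the slide above
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "emails": [["a@example.com", "b@example.com"], ["c@example.com"]],
})

# One row per email; other columns are repeated for each list element
exploded = df.explode("emails", ignore_index=True)
print(exploded["emails"].tolist())  # ['a@example.com', 'b@example.com', 'c@example.com']
```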
Combining json_normalize() and .explode()

Real-world scenario: normalise first, then explode any remaining list columns.
.melt(): Wide to Long

Transform from “wide” format (many columns) to “long” format (fewer columns, more rows).
Wide format:
| date | NO2 | PM2.5 |
|---|---|---|
| 2020-01-01 | 20.5 | 8.5 |
Long format:
| date | pollutant | concentration |
|---|---|---|
| 2020-01-01 | NO2 | 20.5 |
| 2020-01-01 | PM2.5 | 8.5 |
The long format is useful for plotting with seaborn.
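The wide-to-long transformation above corresponds to this sketch:

```python
import pandas as pd

# The wide-format table from the slide above
wide = pd.DataFrame({"date": ["2020-01-01"], "NO2": [20.5], "PM2.5": [8.5]})

long = wide.melt(
    id_vars="date",              # column(s) to keep as identifiers
    var_name="pollutant",        # name for the former column headers
    value_name="concentration",  # name for the values
)
print(long["pollutant"].tolist())  # ['NO2', 'PM2.5']
```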
Why Use .melt()?

Useful for seaborn plots (we’ll see more in W09):
Use it when:
| Function | Use When |
|---|---|
| json_normalize() | Flattening nested JSON from APIs |
| pd.concat() | Combining multiple DataFrames |
| .explode() | Columns contain lists that need separate rows |
| .melt() | Converting wide format to long for plotting |

Today you learned:
- pd.json_normalize() - Automatically flattens nested JSON
- record_path and meta - Control how nested lists expand
- pd.concat(), .explode(), .melt() - Reshaping tools you’ll see again in W08 and W09

Remember: Your for loop solutions from MP1 were correct! These tools make your code simpler, but understanding the data structure (which you demonstrated) is the most important skill.
In tomorrow’s 💻 W07 Lab, you’ll:
Released Week 07, due Week 10 (30% of final grade)
You’ll need these skills to:
json_normalize()

The full model solution from ✍️ Mini-Project 1 (with excellent reflections) is now available on Nuvolos.
Location: mp1-model-solution/ on Nuvolos
Resources:
- #help on Slack

Looking ahead: 🖥️ W08 Lecture introduces databases and SQL, a natural next step after mastering data reshaping!
LSE DS105A (2025/26)