🗣️ Week 07 Lecture
JSON Normalization & Data Reshaping

📍 Time and Location: Thursday, 06 March 2025 from 4-6 pm at MAR.1.04
This week’s lecture focuses on techniques for handling complex, nested data structures, particularly JSON data, and transforming them into analysis-ready formats. These skills are essential for real-world data science work, where data rarely arrives in the format you need.
📓 Interactive Puzzle-Based Learning
This lecture will follow a unique format:
Puzzle-Solve-Learn Cycle: For each data challenge, you will:
- Work in groups of 3-4 to solve a real-world data puzzle using only the concepts and techniques you’ve learned so far
- Share and compare your solutions with other groups
- Learn a powerful new pandas technique that elegantly solves the problem
Competitive Element: The members of the group whose solutions are the strongest and most aligned with the DS105 coding philosophy will win Data Science Institute tote bags! 🎁
Hands-On Learning: The focus is on solving problems creatively and getting used to that ambiguous “cozy vs frustrating” feeling that comes with learning new coding techniques. Hopefully, this will help you build intuition for, and a deeper understanding of, the new techniques.
📋 Preparation
🎬 Lecture Material
The lecture will be structured around four data puzzles, each introducing a key technique for handling complex data.
📥 Lecture Notebooks
Download the notebooks for today’s lecture:
🧩 The Puzzles
Puzzle 1: “The Split-Apply-Combine Strategy”
- Understanding the fundamental pattern for data aggregation
- Learning to use groupby(), apply(), and agg() methods
- Creating summary statistics by group
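As a rough illustration of the split-apply-combine pattern named in this puzzle, here is a minimal sketch. The DataFrame, its column names (genre, streams) and the numbers are invented for illustration and are not the lecture's actual puzzle data.

```python
import pandas as pd

# Hypothetical data: one row per track (not the lecture dataset)
tracks = pd.DataFrame({
    "genre": ["pop", "pop", "rock", "rock", "rock"],
    "streams": [100, 300, 50, 150, 100],
})

# Split by genre, apply several summary functions, combine into one table.
# Named aggregation: new_column=(source_column, function)
summary = tracks.groupby("genre").agg(
    mean_streams=("streams", "mean"),
    max_streams=("streams", "max"),
    n_tracks=("streams", "count"),
)

# apply() runs an arbitrary function on each group; here, the range of streams
stream_range = tracks.groupby("genre")["streams"].apply(lambda s: s.max() - s.min())

print(summary)
print(stream_range)
```

agg() is usually the better fit when built-in summaries are enough; apply() is the more flexible fallback when the per-group logic does not map onto a named aggregation.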
Puzzle 2: “The Spotify Artist Network”
- Working with nested JSON data about artists and their collaborations
- Learning to use pd.json_normalize() to flatten nested structures
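To give a feel for what flattening looks like, the sketch below applies pd.json_normalize() to a small nested structure. The records, keys (stats, collaborations, with, track) and values are made up for illustration; the real puzzle data may be shaped differently.

```python
import pandas as pd

# Hypothetical nested JSON-like records (not the actual Spotify data)
artists = [
    {
        "name": "Artist A",
        "stats": {"followers": 1000, "popularity": 80},
        "collaborations": [
            {"with": "Artist B", "track": "Song 1"},
            {"with": "Artist C", "track": "Song 2"},
        ],
    },
    {
        "name": "Artist B",
        "stats": {"followers": 500, "popularity": 60},
        "collaborations": [{"with": "Artist A", "track": "Song 1"}],
    },
]

# Nested dictionaries become dotted columns such as "stats.followers";
# the list under "collaborations" is left as a single column of lists
flat = pd.json_normalize(artists)

# record_path picks the nested list to turn into rows (one per collaboration);
# meta carries fields from the parent record alongside each row
collabs = pd.json_normalize(artists, record_path="collaborations", meta=["name"])

print(flat.columns.tolist())
print(collabs)
```

The second call is the one that produces an edge list of who collaborated with whom, which is the natural shape for a network-style analysis.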
Puzzle 3: “The Netflix Binge”
- Handling nested lists within JSON objects
- Using DataFrame.explode() to expand list elements into separate rows
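A minimal sketch of DataFrame.explode() on a column that holds lists is shown below. The show titles and genres are placeholders, not the lecture's Netflix data.

```python
import pandas as pd

# Hypothetical data: each show has a list of genres in a single cell
shows = pd.DataFrame({
    "title": ["Show A", "Show B"],
    "genres": [["drama", "crime"], ["comedy"]],
})

# One row per list element; the other columns are repeated for each element.
# ignore_index=True gives a fresh 0..n-1 index instead of repeating the old one
long_shows = shows.explode("genres", ignore_index=True)

print(long_shows)
```

Once each genre sits on its own row, ordinary operations such as value_counts() or groupby() work on it directly.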
Puzzle 4: “Instagram Analytics”
- Transforming multi-level dictionaries with time periods
- Using DataFrame.melt() to reshape wide data into long format
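The wide-to-long reshape can be sketched as follows, assuming a two-level dictionary keyed by account and then by time period. The account names, periods and like counts are invented for illustration.

```python
import pandas as pd

# Hypothetical multi-level dictionary: account -> period -> likes
metrics = {
    "account_a": {"2025-01": 120, "2025-02": 150},
    "account_b": {"2025-01": 80, "2025-02": 95},
}

# Wide format: one row per account, one column per period
wide = (
    pd.DataFrame.from_dict(metrics, orient="index")
    .rename_axis("account")
    .reset_index()
)

# Long format: one row per (account, period) pair
long_metrics = wide.melt(id_vars="account", var_name="period", value_name="likes")

print(wide)
print(long_metrics)
```

In the long version every variable has its own column and every observation its own row, which is the layout most plotting and grouping tools expect.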
📋 TAKE NOTE:
- For each puzzle, we’ll start with the raw data structure and work toward a tidy, analysis-ready format
- The focus is on understanding the conceptual approach, not just memorizing functions
- These techniques will be directly applicable to your coursework and future data science projects
📥 Post-Lecture Actions
- Review the Jupyter notebooks from today’s lecture (will be shared after class)
- Practice with the sample solutions notebook
- Read the Tidy Data paper by Hadley Wickham
- Use the #help channel on Slack if you need clarification or help