DS105 2025-2026 Autumn Term Icon

πŸ’» Week 08 Lab

SQL Basics: From SELECT to JOIN

Author

Dr Jon Cardoso-Silva

Published

21 November 2025

πŸ“ Logistics

Time and Location: Friday, 21 November 2025. Check your timetable for your precise class time and location.

Today’s lab focuses on learning SQL by querying a pre-built SQLite database. You’ll progressively build SQL skills, from simple SELECT statements to complex aggregations, and recreate the visualisation from W07 Lab using SQL instead of pandas.

πŸ“‹ Preparation

Before coming to lab, ensure you have:

  • βœ… Attended the πŸ–₯️ W08 Lecture
  • βœ… Reviewed database concepts and SQL basics from the lecture
  • βœ… Completed the W07 Lab (you’ll see how SQL achieves the same results as pandas)

🎯 Learning Strategy

This lab teaches SQL through progressive skill building. Each task adds ONE new SQL concept. By the end, you’ll independently write a query combining SELECT, WHERE, COUNT DISTINCT, GROUP BY, ORDER BY, and LIMIT.

How to succeed:

  1. Do not skip skills - each builds on the previous one
  2. Write pandas translations by yourself and avoid over-relying on AI
  3. Ask questions immediately when SQL syntax confuses you
  4. Notice when SQL feels intuitive (SELECT, WHERE) vs when it requires more thought (GROUP BY, COUNT DISTINCT, JOIN)

Note to class teachers: Monitor student understanding at these critical points:
- Skill 3 (COUNT) vs Skill 4 (COUNT DISTINCT) - is the distinction between COUNT and COUNT DISTINCT clear to them?
- Skill 5 (GROUP BY) - is the notion of split-apply-combine clear (regardless of SQL)?
- Skill 6 (JOIN) - does the concept of JOINing tables make sense?
- Part 3 synthesis - can they combine pieces without help/over-reliance on AI?

πŸ›£οΈ Roadmap

Part 1: Database Schema Exploration (15 min)

Explore the OpenSanctions SQLite database structure. Use SQLite Viewer or PRAGMA queries to identify tables, examine schemas, and understand relationships between tables.

Part 2: Progressive SQL Skill Building (50 min)

Build SQL skills step by step:

  1. Skill 1: SELECT columns (5 min)
  2. Skill 2: WHERE filtering (8 min)
  3. Skill 3: COUNT aggregation (8 min)
  4. Skill 4: COUNT DISTINCT (8 min)
  5. Skill 5: GROUP BY (10 min)
  6. Skill 6: JOIN tables (10 min)

After each SQL skill, write the pandas equivalent to verify understanding.

Learning Objective: By Part 3, independently write a SQL query that combines SELECT, WHERE, COUNT DISTINCT, GROUP BY, ORDER BY, and LIMIT.

Part 3: Creating plot_df and Visualization (15 min)

Recreate the W07 Lab visualization (top 10 sanctioning countries) using SQL. Compare SQL approach with pandas to see how both achieve the same result.

Part 4: Reflection (classroom discussion)

Discuss when to use SQL vs pandas for data analysis, and whether SQLite would help your ✍️ Mini Project 2 work.

πŸ“š Getting Started

Option 2: Download Lab Files Directly

Download the lab files to work on your own machine:

πŸ’½ Data Specification

We’re using a pre-built SQLite database containing the same OpenSanctions data from W07 Lab, but stored in normalised tables instead of nested JSON.

Database structure:

  • targets table: Sanctioned individuals/entities with target_id, caption, country, first_seen, last_seen
  • sanctions table: Sanction events with sanction_id, target_id, country, status, start_date, end_date
  • target_aliases table: Alternative names for targets with target_id and alias

Key points:

  • Same data as W07 Lab, different structure (normalised vs nested)
  • ~4,000 target records with associated sanctions
  • Uses foreign keys to connect tables (e.g., target_id links targets and sanctions)

πŸ”§ Optional Extension

If you want to see how this database was built from the nested JSON, explore:

W08-NB03-Lab-OpenSanctions-SQLite.ipynb - Shows the complete process of transforming nested JSON into normalised tables and creating the SQLite database

This is useful if you need extra practice with pd.json_normalize() for ✍️ Mini Project 2.

πŸ“š References

Useful references for lab techniques: