🖥️ Week 02, Day 04 - Lecture
Data Reshaping: From pandas to SQL
By the end of this session, you should be able to: i) Explain the limitations of using multiple CSV files for related data. ii) Use pandas.merge()
to combine related DataFrames. iii) Understand the benefits of using a relational database (SQLite). iv) Design a simple database schema for posts and comments. v) Store pandas DataFrames in an SQLite database. vi) Write basic SQL queries to retrieve and aggregate data (SELECT
, GROUP BY
, JOIN
).
Welcome back! Yesterday, we mastered collecting rich data from authenticated APIs. Today, we tackle the next logical question: how do we organise, store, and connect related datasets effectively?
We’ll start with a familiar tool, pandas, and then see why databases are the professional standard for this kind of work.
⏰ Date and Time: Thursday, 24 July 2025 | 10.00am - 1.00pm 📍 Location: CKK.2.06 (see LSE’s 🗺️ campus map)
🗣️ Today’s Plan
- 🧩 Part 1: The Data Organisation Challenge (45 minutes)
- Group Activity: How would you design tables for Reddit posts and comments?
- The
pandas
solution: Usingpd.merge()
to combine our data.
- 💾 Part 2: Introduction to Databases with SQLite (50 minutes)
- Why databases? The limits of CSV files.
- Storing our pandas DataFrames in a simple SQLite database.
- 🔍 Part 3: Querying with SQL (25 minutes)
- The basics of SQL:
SELECT
,GROUP BY
, andJOIN
. - Answering the same questions, but now with the power of SQL.
- The basics of SQL:
🎬 Lecture Slides
Use your keyboard arrows to move through the slides below. You can also view them in fullscreen.
Prefer a PDF? Download the slides here:
(Sometimes the PDF export is a bit buggy and some text may appear with different formatting.)
📓 Lecture Notebook
We’ll be using a hands-on notebook for all parts of today’s session. Download it here:
Save it to your me204-study-notes
repository. You should also see it on Nuvolos.
After the Lecture
This afternoon’s lab is all about practice. You’ll take the Reddit data you’ve personally collected and apply the full workflow: merge it, design a database for it, store it, and query it. This will be a crucial skill for your upcoming assignments.
🔗 Extra Resources: Databases & SQL