๐ฅ๏ธ Week 03, Day 01 - Lecture
Designing Databases and Merging Data
pandas.merge()
to combine related DataFrames, ii) Explain the limitations of using flat files for relational data, iii) Design a simple, normalised database schema, iv) Store pandas DataFrames in a SQLite database, v) Write basic SQL JOIN
queries to retrieve and combine data.
Welcome back! We will start by checking in on your progress with the ๐ฆ Final Project - todayโs session is is directly relevant to your project! Weโll learn how to organise your collected data into proper database structures, design relationships between different parts of your data, and query your data efficiently using SQL. This content will help you with NB02 (database design) and NB03 (analysis) - the next steps after your data collection.
โฐ Monday, 28 July 2025 | 10:00am - 1:00pm ๐ Location: CKK.2.06 (see LSEโs ๐บ๏ธ campus map)
Project Check-in & Todayโs Roadmap
10:00 โ 10:15
๐ Project Status Check
We will start with a quick poll to see how everyone is doing with their ๐ฆ Final Project before we dive into todayโs content.
Part 1: From pandas to Databases
10:15 โ 10:45
Setting the Stage
Weโll combine Reddit posts and comments using pandas.merge()
. This shows how to join related data using tools you know.
Key Concepts
- Data Collection: We will reuse the data collection code from the ๐ฅ๏ธ Week 02 Day 04 Lecture.
- The pandas Bridge: Understanding
pd.merge()
and join types - Limitations: Why CSV files donโt work well at scale
Watch how we link posts and comments using post_id
. This shows the basics of relational data.
Quick Stretch Break & Check-in
10:45 โ 10:50
Part 2: Database Design Fundamentals
10:50 โ 11:30
Why Use Databases?
Weโll explore why organisations need databases instead of large CSV files, focusing on data integrity and scalability, as we explore how databases are structured in practice.
Database Schema Design
- Normalisation & Keys: How to structure data relationships
- SQLite Introduction: Creating tables with proper data types. A nod to our ๐ฅ๏ธ Week 01 Day 02 Lecture.
- Populating the Database: Moving data from pandas to SQLite
Key Learning Moment
Learn to design a two-table schema - an essential step for your ๐ฆ Final Project.
โ๏ธ Coffee Break
11:30 โ 11:45
Part 3: SQL Queries and Analysis
11:45 โ 13:00
SQL JOIN
Weโll recreate our pandas merge using SQL JOIN to show how databases handle data relationships.
Query Building Blocks
- Basic SELECT: Getting data
- WHERE clauses: Filtering
- JOIN operations: Combining tables
- GROUP BY: Summarizing data
Follow Along
Compare SQL and pandas solutions side-by-side.
๐ Lecture Notebook
Get the notebook for todayโs hands-on work:
Save it to your me204-study-notes
repository or find it in Nuvolos under lab-notebooks/
.
After the Lecture
This afternoon: we will recap and revisit some SQL and then help you organise your collected data into proper database structures and writing SQL queries to analyse it. Youโll apply todayโs concepts directly to your project data.
๐ Extra Resources: Databases & SQL