Week 08 Lecture
More Data Reshaping Techniques and Introduction to Databases
Response rate: 8%
(5 out of 63 DS105W students have completed the course evaluation survey)
Could you assist us in achieving the 75% mark? Course evaluation surveys are extremely important (https://moodle.lse.ac.uk/mod/forum/discuss.php?d=385499#course-evaluation-survey) for us, the instructors of DS105.
Click here to provide your official feedback on this course. Note: Please assess all the instructors you have interacted with (and Jon also counts as your teacher!).
Last updated: 13 March 2025, 15:45

Time and Location: Thursday, 13 March 2025 from 4-6 pm at MAR.1.04
This week's lecture will complete our exploration of data reshaping techniques and introduce you to databases. Both will be essential for your Mini-Project 2. We will build directly on last week's techniques while adding new tools to your data science toolkit.
Interactive Puzzle-Based Learning Continues
Following the success of last week's format, we will continue with our puzzle-based learning approach:
New Puzzles, Same Format: We will tackle two new puzzles focused on techniques that mirror the exact challenges you will face when working with Reddit data in your Mini-Project 2.
- Puzzle 3 (about explode()): "The Reddit Tags Challenge"
- Puzzle 4 (about melt()): "The Reddit Engagement Metrics Challenge"
Competitive Element Continues: The tote bag competition continues! Teams with the best solutions will earn points towards winning Data Science Institute's tote bags.
For this reason, I will not share the puzzles with you until the lecture starts.
Preparation
Before the lecture
- Review the techniques we covered in Week 07 (Split-Apply-Combine and JSON Normalisation)
- Ensure you can access Nuvolos
- Bring your laptop to participate in the interactive puzzles
- Get a head start on Mini-Project 2: Consider creating a Reddit account and setting up API credentials (instructions on Moodle)
Lecture Material
The lecture will be structured in two main parts:
Part 1: Completing Data Reshaping Techniques
We will finish our exploration of data reshaping techniques with two powerful pandas functions:
DataFrame.explode(): Expanding list elements into separate rows.
DataFrame.melt(): Transforming wide data into long format.
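To make the two functions concrete before the puzzles, here is a minimal sketch on a tiny, made-up table. The column names (`post_id`, `tags`, `upvotes`, `num_comments`) and all values are assumptions for illustration only; they are not taken from the puzzles or from the Reddit API.

```python
import pandas as pd

# Hypothetical Reddit-like posts: names and values are invented for illustration.
posts = pd.DataFrame(
    {
        "post_id": ["a1", "b2"],
        "tags": [["python", "pandas"], ["sql"]],
        "upvotes": [10, 3],
        "num_comments": [4, 1],
    }
)

# explode(): each element of a list becomes its own row;
# the values in the other columns are repeated for every new row.
tags_long = posts[["post_id", "tags"]].explode("tags").reset_index(drop=True)

# melt(): the wide metric columns are stacked into (metric, value) pairs,
# giving one row per post per metric (long format).
metrics_long = posts.melt(
    id_vars="post_id",
    value_vars=["upvotes", "num_comments"],
    var_name="metric",
    value_name="value",
)

print(tags_long)
print(metrics_long)
```

Here `tags_long` has three rows (two tags for post `a1`, one for `b2`) and `metrics_long` has four rows (two posts times two metrics). Long format like this is usually easier to group, filter, and plot than the original wide table.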
Part 2: Introduction to Databases
After a short break, we will dive into databases:
Database Fundamentals: Understanding relational databases and their advantages. Why use databases instead of CSV/JSON files?
Working with SQLite: A lightweight database perfect for your projects. Creating database connections, storing pandas DataFrames in SQLite, and querying data from SQLite.
Database Design for Reddit Data: Practical examples of how to structure your data. Creating appropriate tables for posts, comments, and subreddits, and establishing relationships between tables.
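The database topics above can be sketched end to end as follows. This is one possible layout, not the schema used in the lecture notebooks: the table names, column names, and sample rows are all hypothetical, and an in-memory database stands in for a real file.

```python
import sqlite3

import pandas as pd

# In-memory database for illustration; pass a file path instead to persist data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Hypothetical schema: one table per entity, linked through foreign keys.
conn.executescript(
    """
    CREATE TABLE subreddits (
        subreddit_id TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE posts (
        post_id TEXT PRIMARY KEY,
        subreddit_id TEXT NOT NULL REFERENCES subreddits(subreddit_id),
        title TEXT,
        upvotes INTEGER
    );
    CREATE TABLE comments (
        comment_id TEXT PRIMARY KEY,
        post_id TEXT NOT NULL REFERENCES posts(post_id),
        body TEXT
    );
    """
)

# Invented sample data held in pandas DataFrames.
subreddits = pd.DataFrame({"subreddit_id": ["s1"], "name": ["datascience"]})
posts = pd.DataFrame(
    {
        "post_id": ["p1", "p2"],
        "subreddit_id": ["s1", "s1"],
        "title": ["First post", "Second post"],
        "upvotes": [10, 3],
    }
)
comments = pd.DataFrame(
    {"comment_id": ["c1", "c2"], "post_id": ["p1", "p1"], "body": ["Nice", "Thanks"]}
)

# Store the DataFrames; parent tables go first so the foreign keys are satisfied.
subreddits.to_sql("subreddits", conn, if_exists="append", index=False)
posts.to_sql("posts", conn, if_exists="append", index=False)
comments.to_sql("comments", conn, if_exists="append", index=False)

# Query across the related tables and read the result back into pandas.
query = """
    SELECT s.name AS subreddit, p.title, COUNT(c.comment_id) AS n_comments
    FROM posts AS p
    JOIN subreddits AS s ON s.subreddit_id = p.subreddit_id
    LEFT JOIN comments AS c ON c.post_id = p.post_id
    GROUP BY p.post_id
    ORDER BY p.post_id
"""
summary = pd.read_sql(query, conn)
print(summary)

conn.close()
```

Keeping posts, comments, and subreddits in separate tables means each fact is stored once, and the shared key columns let a `JOIN` recombine them when needed. The `LEFT JOIN` keeps posts that have no comments, which then get a count of zero.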
Lecture Notebooks
The lecture notebooks will be available here and on Nuvolos at the start of the lecture.
Download the notebooks for today's lecture:
MINI-PROJECT 2 CONNECTION:
Today's puzzles are specifically designed to prepare you for the Reddit Engagement Analysis project:
- The techniques we will cover are exactly what you will need to process and analyse Reddit API data
- The database skills will help you efficiently store and query the data you collect
- The visualisation approaches will directly translate to creating compelling visuals for your project report
Post-Lecture Actions
- Review the Jupyter notebooks from today's lecture
- Set up your Reddit API access if you haven't already (instructions on Moodle)
- Start exploring potential subreddits for your Mini-Project 2
- Practise with the sample solutions notebook
- Use the #help channel on Slack if you need clarification or assistance