

🗣️ Week 08 Lecture
More Data Reshaping Techniques and Introduction to Databases
Last updated: 13 March 2025, 15:45
Time and Location: Thursday, 13 March 2025, 4-6 pm, MAR.1.04
This week's lecture will complete our exploration of data reshaping techniques and introduce you to databases. Both will be essential for your ✍️ Mini-Project 2. We will build directly on last week's techniques while adding new tools to your data science toolkit.
Interactive Puzzle-Based Learning Continues
Following the success of last week's format, we will continue with our puzzle-based learning approach:
New Puzzles, Same Format: We will tackle two new puzzles focused on techniques that mirror the exact challenges you will face when working with Reddit data in your ✍️ Mini-Project 2.
- Puzzle 3 (about explode()): "The Reddit Tags Challenge"
- Puzzle 4 (about melt()): "The Reddit Engagement Metrics Challenge"
Competitive Element Continues: The tote bag competition continues! Teams with the best solutions will earn points towards winning the Data Science Institute's tote bags.
For this reason, I will not share the puzzles with you until the lecture starts.
Preparation
Before the lecture
- Review the techniques we covered in Week 07 (Split-Apply-Combine and JSON Normalisation)
- Ensure you can access Nuvolos
- Bring your laptop to participate in the interactive puzzles
- Get a head start on Mini-Project 2: Consider creating a Reddit account and setting up API credentials (instructions on Moodle)
Lecture Material
The lecture will be structured in two main parts:
Part 1: Completing Data Reshaping Techniques
We will finish our exploration of data reshaping techniques with two powerful pandas functions:
DataFrame.explode(): Expanding list elements into separate rows.
DataFrame.melt(): Transforming wide data into long format.
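As a rough preview of what these two functions do, here is a minimal sketch. The DataFrame, its column names, and its values are all hypothetical and are not taken from the lecture puzzles or from the real Reddit API.

```python
import pandas as pd

# Hypothetical sample data: each post carries a list of tags and
# two engagement metrics stored as separate (wide) columns.
posts = pd.DataFrame(
    {
        "post_id": ["a1", "b2"],
        "tags": [["python", "pandas"], ["sql"]],
        "upvotes": [10, 3],
        "num_comments": [4, 1],
    }
)

# explode(): one row per (post, tag) pair; post_id is repeated as needed.
tags_long = posts[["post_id", "tags"]].explode("tags").reset_index(drop=True)

# melt(): the wide metric columns become (metric, value) pairs, one per row.
metrics_long = posts.melt(
    id_vars="post_id",
    value_vars=["upvotes", "num_comments"],
    var_name="metric",
    value_name="value",
)

print(tags_long)
print(metrics_long)
```

Here `tags_long` has three rows (two tags for the first post, one for the second), and `metrics_long` has four rows (two metrics for each of the two posts).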
Part 2: Introduction to Databases
After a short break, we will dive into databases:
Database Fundamentals: Understanding relational databases and their advantages. Why use databases instead of CSV/JSON files?
Working with SQLite: A lightweight database perfect for your projects. Creating database connections, storing pandas DataFrames in SQLite, and querying data from SQLite.
Database Design for Reddit Data: Practical examples of how to structure your data. Creating appropriate tables for posts, comments, and subreddits, and establishing relationships between tables.
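To make these ideas concrete, the sketch below shows one possible round trip between pandas and SQLite. The table names, column names, and sample rows are assumptions made for illustration, not the schema used in the lecture. For brevity it uses only two related tables (posts and comments) and an in-memory database.

```python
import sqlite3

import pandas as pd

# Hypothetical data standing in for what you might collect from Reddit.
posts = pd.DataFrame(
    {
        "post_id": ["a1", "b2"],
        "subreddit": ["datascience", "python"],
        "title": ["First post", "Second post"],
    }
)
comments = pd.DataFrame(
    {
        "comment_id": ["c1", "c2", "c3"],
        "post_id": ["a1", "a1", "b2"],
        "score": [5, 2, 7],
    }
)

# An in-memory database; pass a file path instead to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Each comment points at exactly one post through a foreign key.
conn.execute(
    """
    CREATE TABLE posts (
        post_id   TEXT PRIMARY KEY,
        subreddit TEXT NOT NULL,
        title     TEXT
    )
    """
)
conn.execute(
    """
    CREATE TABLE comments (
        comment_id TEXT PRIMARY KEY,
        post_id    TEXT NOT NULL REFERENCES posts (post_id),
        score      INTEGER
    )
    """
)

# Store the DataFrames in the tables created above.
posts.to_sql("posts", conn, if_exists="append", index=False)
comments.to_sql("comments", conn, if_exists="append", index=False)

# Query across both tables and read the result back into pandas.
summary = pd.read_sql_query(
    """
    SELECT p.post_id,
           p.subreddit,
           COUNT(c.comment_id) AS n_comments,
           SUM(c.score)        AS total_score
    FROM posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.post_id
    GROUP BY p.post_id, p.subreddit
    ORDER BY p.post_id
    """,
    conn,
)
print(summary)

conn.close()
```

Creating the tables explicitly before calling `to_sql` keeps the primary and foreign keys in place. If `to_sql` creates a table itself, it only infers column types and does not add those constraints.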
Lecture Notebooks
The lecture notebooks will be available here and on Nuvolos at the start of the lecture.
Download the notebooks for today's lecture:
MINI-PROJECT 2 CONNECTION:
Today's puzzles are specifically designed to prepare you for the Reddit Engagement Analysis project:
- The techniques we will cover are exactly what you will need to process and analyse Reddit API data
- The database skills will help you efficiently store and query the data you collect
- The visualisation approaches will directly translate to creating compelling visuals for your project report
Post-Lecture Actions
- Review the Jupyter notebooks from today's lecture
- Set up your Reddit API access if you haven't already (instructions on Moodle)
- Start exploring potential subreddits for your ✍️ Mini-Project 2
- Practise with the sample solutions notebook
- Use the #help channel on Slack if you need clarification or assistance