🗣️ Week 09 Lecture

Designing a Good Database Schema & Best Practices for Data Visualisation

Author

Dr Jon Cardoso-Silva

Published

19 March 2025

🥅 Learning Goals
By the end of this lecture, you should be able to: i) Design an effective relational database schema, ii) Apply normalisation principles to enhance data structure, iii) Recognise and implement best practices for data visualisation, iv) Create informative visualisations that effectively communicate insights.

📍 Time and Location: Thursday, 20 March 2025 from 4-6 pm at MAR.1.04

This week’s lecture builds directly on our previous introduction to databases and focuses on designing effective database schemas and best practices for data visualisation. These skills will be directly applicable to your ✍️ Mini-Project 2 as you work on creating structured databases and communicating insights through visualisations.

📋 Preparation

Before the lecture

  • Review the database concepts covered in Week 08
  • Ensure you can access Nuvolos (or that your own computer is fully set up)
  • Bring your laptop to participate in the interactive demonstrations
  • Be prepared to adapt the code you will see live in the lecture to your own data

🎬 Lecture Material

The lecture will be structured in two main parts:

Part 1: Designing a Good Database Schema

Building on last week’s introduction to databases, we will explore how to design effective schemas that properly organise your data:

  1. Database Fundamentals for Data Scientists

    Understanding tables as collections of related data with unique identifiers that connect to each other

  2. Tidy Data Principles for Databases

    Applying the same principles we use for DataFrames to create well-structured database tables

  3. Creating Efficient Database Tables

    Converting your Reddit data into a database that maintains relationships between posts, comments, and subreddits
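
As a rough illustration of the last two points, the sketch below takes a nested record and splits it into flat, tidy rows, one list per table, linked by identifiers. The field names (`post_id`, `comment_id`, `subreddit_name`) and the sample values are hypothetical; they are not taken from the Reddit API or from the lecture notebooks.

```python
# Hypothetical nested records, loosely shaped like an API response.
# These are made-up values, not real Reddit data.
raw_posts = [
    {
        "post_id": "p1",
        "subreddit": "datascience",
        "title": "Example post",
        "comments": [
            {"comment_id": "c1", "body": "First comment"},
            {"comment_id": "c2", "body": "Second comment"},
        ],
    },
    {
        "post_id": "p2",
        "subreddit": "datascience",
        "title": "Another post",
        "comments": [],
    },
]

# One row per subreddit: each subreddit is stored once, not once per post.
subreddits = [
    {"subreddit_name": name}
    for name in sorted({post["subreddit"] for post in raw_posts})
]

# One row per post: the nested comment list is dropped and the
# subreddit is kept only as a reference to the subreddits table.
posts = [
    {
        "post_id": post["post_id"],
        "subreddit_name": post["subreddit"],
        "title": post["title"],
    }
    for post in raw_posts
]

# One row per comment: post_id links each comment back to its parent post.
comments = [
    {
        "comment_id": comment["comment_id"],
        "post_id": post["post_id"],
        "body": comment["body"],
    }
    for post in raw_posts
    for comment in post["comments"]
]
```

Each of the three lists has one observation per row and one variable per key, so each could be loaded into its own DataFrame or database table.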

Here’s how the data structure might look for our Spotify example (which parallels what you’ll need for your Reddit data):

erDiagram
    ARTISTS ||--o{ ALBUMS : creates
    ALBUMS ||--o{ TRACKS : contains
    
    ARTISTS {
        string artist_id PK
        string name
        int popularity
        int followers
        string genres
    }
    
    ALBUMS {
        string album_id PK
        string artist_id FK
        string name
        string release_date
        int total_tracks
    }
    
    TRACKS {
        string track_id PK
        string album_id FK
        string name
        int track_number
        int duration_ms
    }

This diagram illustrates the relationships between our three main entities. Your Reddit database will follow a similar structure, with subreddits, posts, and comments instead.
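
One way the diagram could translate into actual tables is sketched below. It assumes SQLite through Python's built-in `sqlite3` module, which may differ from the tool used in the lecture notebooks. The column types are a guess at reasonable SQLite equivalents of the diagram's `string` and `int`, and all inserted rows are made-up placeholders.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE artists (
    artist_id    TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    popularity   INTEGER,
    followers    INTEGER,
    genres       TEXT
);
CREATE TABLE albums (
    album_id     TEXT PRIMARY KEY,
    artist_id    TEXT NOT NULL REFERENCES artists(artist_id),
    name         TEXT NOT NULL,
    release_date TEXT,
    total_tracks INTEGER
);
CREATE TABLE tracks (
    track_id     TEXT PRIMARY KEY,
    album_id     TEXT NOT NULL REFERENCES albums(album_id),
    name         TEXT NOT NULL,
    track_number INTEGER,
    duration_ms  INTEGER
);
""")

# Placeholder rows (hypothetical values, not real Spotify data).
conn.execute(
    "INSERT INTO artists VALUES (?, ?, ?, ?, ?)",
    ("a1", "Example Artist", 50, 1000, "pop"),
)
conn.execute(
    "INSERT INTO albums VALUES (?, ?, ?, ?, ?)",
    ("al1", "a1", "Example Album", "2024-01-01", 2),
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "al1", "First Track", 1, 180000),
        ("t2", "al1", "Second Track", 2, 200000),
    ],
)
conn.commit()

# Follow the foreign keys from tracks to albums to artists.
rows = conn.execute("""
    SELECT artists.name, albums.name, tracks.name
    FROM tracks
    JOIN albums  ON tracks.album_id  = albums.album_id
    JOIN artists ON albums.artist_id = artists.artist_id
    ORDER BY tracks.track_number
""").fetchall()

# A track pointing at a non-existent album should be rejected.
try:
    conn.execute(
        "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
        ("t3", "missing_album", "Orphan Track", 1, 1000),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The primary keys give each row a unique identifier, and the foreign keys are what let the join reassemble artist, album, and track names without storing the artist's details on every track.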

Part 2: Best Practices for Data Visualisation

In the second hour, we will analyse effective visualisations from the Office for National Statistics (ONS) and learn how to apply these principles to your own work:

  1. Analysing the ONS Educational Attainment Article (link)

    Learning from real-world examples of effective data storytelling in an official publication.

  2. Principles of Effective Data Visualisation

    Understanding how to maximise information while minimising visual clutter.

  3. Whose responsibility is it to extract insights from data?

    Exploring the dashboard-versus-analysis distinction we draw in this course and how the ONS article illustrates it.

📥 Lecture Notebooks

Download the notebooks and files for today’s lecture:

📋 MINI-PROJECT 2 CONNECTION:

Today’s content is directly applicable to your Mini-Project 2:

  • The database schema principles will help you organise your Reddit data effectively
  • The visualisation best practices will improve how you communicate your findings
  • Both elements are essential for creating a polished, professional final project

📥 Post-Lecture Actions

  1. Review the Jupyter notebooks from today’s lecture
  2. Apply the database schema principles to your Mini-Project 2
  3. Revise any visualisations you have created using the best practices discussed
  4. Finalise your Mini-Project 2 submission
  5. Use the #help channel on Slack if you need clarification or assistance

📚 Additional Resources

Database Schema Design

Data Visualisation