πŸ—“οΈ Week 08 - Data summarisation and more grammar-of-graphics

Theme: Cleaning and reshaping data

In this week’s lecture, we’re going to explore a different approach to collecting data from the web. Instead of scraping data from a single page, we’ll connect to something called an API. This will enable us to gather data from a website in a more organised manner, usually in JSON or XML formats.

Additionally, we’ll dive into a theoretical framework for data visualisation known as the grammar of graphics. We’ll use the plotnine library in Python to apply this framework and create more effective and engaging visualisations. Looking forward to an exciting week!

📃 Lecture Schedule

📍 Location: Thursday, 7 March 2024, 4 pm - 6 pm at MAR.1.04

👨‍🏫 Lecture Material

🎥 Looking for lecture recordings? You can only find those on Moodle, typically a day after the lecture. If you can’t find the recordings, please contact 📧 .

GitHub Repository

This week’s lecture is available, as usual, via GitHub. Click on the link below to access the repository and follow along with the lecture material.

LINK TO GITHUB REPOSITORY

Goals

By the end of this lecture, you should be able to:

  • Understand how to use APIs to collect data from the web, in particular Reddit’s API.
  • Use the pd.json_normalize() function to flatten JSON data into a DataFrame.
  • Reflect on how to pose questions to data and visualise data to answer those questions.
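
To make the second goal concrete, here is a minimal sketch of flattening nested JSON with pd.json_normalize(). The `response` dictionary is hand-written for illustration and only loosely shaped like an API listing; its field names (`children`, `title`, `score`, `author`) are assumptions, not real output from Reddit’s API.

```python
import pandas as pd

# Hypothetical, hand-written data: the structure and field names are
# assumptions for illustration, not a real API response.
response = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "First post", "score": 120, "author": "alice"}},
            {"kind": "t3", "data": {"title": "Second post", "score": 45, "author": "bob"}},
        ]
    },
}

# Each element of "children" is a nested dictionary. json_normalize turns
# every record into one row and joins nested keys with a dot,
# e.g. {"data": {"title": ...}} becomes the column "data.title".
posts = pd.json_normalize(response["data"]["children"])

# Keep only the columns of interest and give them shorter names.
tidy = posts[["data.title", "data.score", "data.author"]].rename(
    columns=lambda name: name.replace("data.", "")
)

print(tidy)
```

Once the data is in a flat DataFrame like `tidy`, the usual pandas tools for summarising and reshaping apply, and the result can be passed on to a plotting library.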