Week 08 - Data summarisation and more grammar-of-graphics
Theme: Cleaning and reshaping data
In this week's lecture, we're going to explore a different approach to collecting data from the web. Instead of scraping data from a single page, we'll connect to something called an API. This will enable us to gather data from a website in a more organised manner, usually in JSON or XML formats.
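To make the idea of an API request concrete, here is a minimal sketch that uses only the Python standard library. It builds a request without sending it, then parses a hand-written JSON string standing in for a response. The endpoint URL, the `User-Agent` string, the `build_request` helper and the sample payload are all assumptions made for illustration; the lecture may use different endpoints or an authenticated client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(subreddit: str, limit: int = 5) -> Request:
    """Build (but do not send) a GET request for a subreddit's top posts."""
    query = urlencode({"limit": limit})
    # Assumed endpoint pattern, used here for illustration only.
    url = f"https://www.reddit.com/r/{subreddit}/top.json?{query}"
    # Hypothetical User-Agent string identifying the script to the server.
    return Request(url, headers={"User-Agent": "ds-course-example/0.1"})


# Hand-written sample body (not real data) that imitates a nested JSON reply.
sample_body = """
{"kind": "Listing", "data": {"children": [
  {"kind": "t3", "data": {"title": "First post", "score": 120}},
  {"kind": "t3", "data": {"title": "Second post", "score": 45}}
]}}
"""

# Parse the JSON text into Python dicts and lists, then pull out one field.
payload = json.loads(sample_body)
titles = [child["data"]["title"] for child in payload["data"]["children"]]

request = build_request("datascience")
print(request.full_url)
print(titles)
```

Passing the request to `urllib.request.urlopen` would actually send it. That step is left out so the sketch runs without network access.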
Additionally, we'll dive into a theoretical framework for data visualisation known as the grammar of graphics. We'll use the plotnine library in Python to apply this framework and create more effective and engaging visualisations. Looking forward to an exciting week!
Lecture Schedule
Location: Thursday, 7 March 2024, 4 pm - 6 pm at MAR.1.04
👨‍🏫 Lecture Material
🎥 Looking for lecture recordings? You can only find those on Moodle, typically a day after the lecture. If you can't find the recordings, please contact 📧 .
GitHub Repository
This week's lecture is available, as usual, via GitHub. Click on the link below to access the repository and follow along with the lecture material.
Goals
By the end of this lecture, you should be able to:
- Understand how to use APIs to collect data from the web, in particular Reddit's API.
- Use the pd.json_normalize() function to flatten JSON data into a DataFrame.
- Reflect on how to pose questions to data and visualise data to answer those questions.
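The flattening step in the goals above can be sketched as follows. The records are hand-written stand-ins for API output, not real data, and the field names are assumptions chosen for illustration.

```python
import pandas as pd

# Hand-written nested records (hypothetical fields, not real API output).
posts = [
    {"title": "First post", "score": 120, "author": {"name": "alice", "karma": 900}},
    {"title": "Second post", "score": 45, "author": {"name": "bob", "karma": 150}},
]

# json_normalize expands nested dictionaries into dot-separated columns.
df = pd.json_normalize(posts)

print(df.columns.tolist())
print(df)
```

Each nested key becomes its own column, named by joining the path with a dot (for example `author.name`). The `sep` argument changes that separator if dots are inconvenient.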