Week 07 - Putting it all together: from web scraping to initial data cleaning
Theme: Cleaning and reshaping data
The original title of this lecture was "Data Summarisation and the Grammar of graphics". However, following your feedback via Slack and office hours, this week's class will instead work through a practical example of web scraping and initial data cleaning, putting together the skills we have learned so far.
Lecture Schedule
Location: Thursday 29 February 2024, 4 pm - 6 pm at MAR.1.04
Lecture Material
Looking for lecture recordings? You can only find those on Moodle, typically a day after the lecture. If you can't find the recordings, please get in touch.
GitHub Repository
We will create a GitHub repository during the lecture. Once it has been created, the link below will take you to it.
Goals
In this lecture, we revisit the core concepts of web scraping by compiling a list of the most recent UK general elections from Wikipedia.
The case study covers the following topics:
- Finding CSS/XPath selectors on a page
- Writing functions for web scraping tasks
- List comprehensions for data extraction
- Using the pandas apply() method (Series.apply() or DataFrame.apply()) for data manipulation
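As a rough preview of how these four topics fit together, here is a minimal sketch. It is not the code written in the lecture. It assumes a tiny hand-written HTML fragment (the `HTML` string) rather than the real Wikipedia page, and it uses the standard library's `xml.etree.ElementTree` as a stand-in for a scraping library. ElementTree only handles well-formed markup and a limited subset of XPath, so a real page would need a proper HTML parser. The names `HTML` and `parse_row` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hand-written, well-formed stand-in for a scraped page.
# This is NOT Wikipedia's real markup; it only mimics a simple table.
HTML = """
<div>
  <table class="elections">
    <tr><td><a href="/wiki/2019_United_Kingdom_general_election">2019</a></td><td>12 December 2019</td></tr>
    <tr><td><a href="/wiki/2017_United_Kingdom_general_election">2017</a></td><td>8 June 2017</td></tr>
    <tr><td><a href="/wiki/2015_United_Kingdom_general_election">2015</a></td><td>7 May 2015</td></tr>
  </table>
</div>
"""


def parse_row(row):
    """Turn one <tr> element into a dict (a small, reusable scraping function)."""
    link = row.find("./td/a")    # first link inside the row
    cells = row.findall("./td")  # all cells of the row
    return {
        "year": link.text,
        "url": "https://en.wikipedia.org" + link.get("href"),
        "date": cells[1].text,
    }


root = ET.fromstring(HTML)

# 1. Selector: XPath-style path to every row of the table with class "elections".
rows = root.findall(".//table[@class='elections']/tr")

# 2 and 3. Apply the function to each row with a list comprehension.
records = [parse_row(row) for row in rows]
df = pd.DataFrame(records)

# 4. Series.apply(): clean each column one value at a time.
df["year"] = df["year"].apply(int)
df["date"] = df["date"].apply(lambda s: pd.to_datetime(s, format="%d %B %Y"))
df["weekday"] = df["date"].apply(lambda d: d.day_name())

print(df[["year", "date", "weekday"]])
```

Keeping the per-row logic in one function means the selector can change (for example, if the page layout differs from this toy fragment) without touching the cleaning steps that follow.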
The repository will be created from scratch during the lecture, giving you hands-on practice with Git commands and web scraping techniques. This case study aims to reinforce the concepts learned and help students apply them to their W08 assignment.