🗓️ Week 05 - More web scraping: CSS Selectors, XPaths and Pagination

Theme: Collecting data

You only just started exploring web scraping in last week's lab. This week, we will go a bit deeper and learn how to use CSS Selectors and XPaths to extract data from websites. We will also learn how to scrape multiple pages of a website (pagination). While you might already have some intuition about how to do this, we will reinforce best practices and learn a more systematic approach.
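To give a rough flavour of these ideas before the lecture, here is a minimal sketch of selecting elements with XPath and following a "next" link across pages. It is only an illustration under stated assumptions: the `PAGES` dictionary, its URLs and the `scrape_all` function are made up for this example, and it uses the limited XPath support in Python's built-in `xml.etree.ElementTree` so that it runs without a network connection. The lecture notebook may well take a different approach, for example with `requests` and `scrapy`.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "website": keys are URL paths, values are
# well-formed pages. A real scraper would download these instead.
PAGES = {
    "/page/1": """<html><body>
        <div class="quote"><span class="text">First quote</span></div>
        <div class="quote"><span class="text">Second quote</span></div>
        <a class="next" href="/page/2">Next</a>
    </body></html>""",
    "/page/2": """<html><body>
        <div class="quote"><span class="text">Third quote</span></div>
    </body></html>""",
}


def scrape_all(start_url):
    """Follow 'next' links from start_url and collect every quote text."""
    quotes = []
    seen = set()  # guards against pages that link back to each other
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        root = ET.fromstring(PAGES[url])
        # XPath: each <span class="text"> directly inside <div class="quote">.
        # A roughly equivalent CSS selector would be: div.quote > span.text
        for span in root.findall(".//div[@class='quote']/span[@class='text']"):
            quotes.append(span.text)
        # XPath: the pagination link, if present (CSS equivalent: a.next)
        next_link = root.find(".//a[@class='next']")
        url = next_link.get("href") if next_link is not None else None
    return quotes


all_quotes = scrape_all("/page/1")
print(all_quotes)
```

The loop stops when a page has no "next" link, which is one common way to handle pagination. Real web pages are rarely well-formed XML, so in practice a dedicated HTML parser is usually a safer choice than `ElementTree`.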

Note: our lecture schedule has changed a bit since Week 03, when we all got stuck with the installation of Git. Since then, we have had to move some of the material around, and only now are we getting back on track. I will update the 📔 Syllabus page in a few days to reflect these changes.

📚 Preparation

If you feel like you didn’t fully grasp the idea behind HTML and CSS just from last week’s lab, I recommend you check out the recently published 🔖 Week 04 - Appendix page.

There is one notebook there that will help you understand the fundamentals, and we will soon (early W05) publish another one with some more programming tips.

📃 Lecture Schedule

📍Location: Thursday 26 October 2023, 4 pm - 6 pm at CKK.1.04

👨‍🏫 Lecture Material

🎥 Looking for lecture recordings? You can only find those on Moodle, typically a day after the lecture. If you can’t find the recordings, please contact 📧 .

Material

Before attending the lecture, you should have installed the following packages:

pip install pandas
pip install requests
pip install scrapy

I will use the following Jupyter notebook in the lecture: