🗓️ Week 09 - Conda environments, databases and join operations

Theme: Cleaning and reshaping data

We ended up not covering enough pandas last week! It was still useful to show you the details of Python packages and the requirements.txt file, though.

This means I will return to my original plan for Week 08 and show you the power of pandas' apply() method and how to use groupby() to group data.

To illustrate these concepts, I will still use the IMDb Non-Commercial Datasets this week. These datasets are a collection of data files that contain information about movies, actors, actresses, directors, producers, etc. The data is provided in TSV (tab-separated values) format, which is very similar to CSV. Look at the section below, where I explain how to download the data.
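As a rough sketch of what reading a TSV file with pandas can look like, the snippet below parses a tiny made-up sample held in memory instead of a real IMDb file. The rows and titles are invented for illustration, and only a few columns are shown. The IMDb files mark missing values with `\N`, so the sketch passes that marker to `na_values`.

```python
import io

import pandas as pd

# Made-up sample in a tab-separated layout (not real IMDb rows).
# "\\N" in Python source is the two-character marker \N used for missing values.
sample_tsv = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres\n"
    "tt0000001\tmovie\tExample One\t1999\tDrama,Romance\n"
    "tt0000002\tmovie\tExample Two\t\\N\tComedy\n"
)

# sep="\t" tells read_csv that columns are separated by tabs, not commas.
# na_values="\\N" turns the \N marker into a proper missing value (NaN).
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")

print(df)
print(df.dtypes)
```

With a downloaded file, you would pass the file path in place of the `io.StringIO(...)` object; the other arguments stay the same.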

🎯 Learning Objectives

Same as what I had planned for Week 08:

the concept of ‘tidy’ data
using the apply() method of a pandas Series or DataFrame to clean data
the notion of anonymous functions (lambda functions)
grouping data with groupby()
using our custom functions with groupby()
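To give a flavour of these ideas before the lecture, here is a minimal sketch on a small invented table. The column names, titles and ratings are assumptions made up for illustration, and `rating_range` is a hypothetical helper, not something from the lecture repository.

```python
import pandas as pd

# Invented toy data: each movie lists its genres in one comma-separated string.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "genres": ["Drama,Romance", "Comedy", "Drama", "Comedy,Drama"],
        "rating": [7.0, 6.0, 9.0, 5.0],
    }
)

# apply() with an anonymous (lambda) function: keep only the first listed genre.
movies["main_genre"] = movies["genres"].apply(lambda g: g.split(",")[0])

# One possible 'tidy' layout: one row per (movie, genre) pair.
tidy = movies.assign(genre=movies["genres"].str.split(",")).explode("genre")


def rating_range(ratings):
    """Hypothetical custom function: spread between best and worst rating."""
    return ratings.max() - ratings.min()


# groupby() with a built-in aggregation ...
mean_by_genre = movies.groupby("main_genre")["rating"].mean()

# ... and with our own function applied to each group.
range_by_genre = movies.groupby("main_genre")["rating"].apply(rating_range)

print(tidy[["title", "genre", "rating"]])
print(mean_by_genre)
print(range_by_genre)
```

The lambda is just a short function without a name; the same result could be obtained by defining a regular function with `def` and passing its name to apply().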

📚 PREPARATION

To come well prepared for the lecture, clone the following GitHub repository:

🖇️ LINK TO REPOSITORY

📃 Lecture Schedule

📍Location: Thursday 23 November 2023, 4 pm - 6 pm at CKK.1.04

👨‍🏫 Lecture Material

🎥 Looking for lecture recordings? You can only find those on Moodle, typically a day after the lecture. If you can’t find the recordings, please contact 📧 .

Material

This week’s lecture material is available under this dedicated GitHub repository:

🖇️ LINK TO REPOSITORY

Solutions to the exercises and live demos in the Jupyter Notebook of this lecture will NOT be posted here afterwards. We will create these solutions together during the lecture.