Week 09 - Conda environments, databases and join operations
Theme: Cleaning and reshaping data
We ended up not covering enough pandas last week! Still, it was useful to show you details about Python packages and the requirements.txt file.
This means I will return to my original plan for Week 08 and show you the power of pandas' apply() method and how to use groupby() to group data.
To illustrate these concepts, I will still use the IMDb Non-Commercial Datasets this week. These datasets are a collection of data files containing information about movies, actors, actresses, directors, producers, etc. The data is provided in TSV (tab-separated values) format, which works just like CSV except that columns are separated by tab characters instead of commas. Look at the section below, where I explain how to download the data.
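Because TSV only differs from CSV in its separator, pandas can read it with the same read_csv() function. The sketch below is a minimal illustration, not the lecture notebook: it uses a tiny made-up excerpt laid out like IMDb's title.basics file (held in memory with io.StringIO so it runs without downloading anything), and it assumes, as IMDb documents for these files, that missing values are written as \N. It then uses apply() with a lambda function to turn the comma-separated genres string into a Python list.

```python
import io

import pandas as pd

# A tiny, made-up excerpt in the same tab-separated layout as IMDb's
# title.basics file. The real files mark missing values with "\N".
sample_tsv = (
    "tconst\tprimaryTitle\tstartYear\tgenres\n"
    "tt0000001\tFirst Film\t1894\tDocumentary,Short\n"
    "tt0000002\tSecond Film\t\\N\tAnimation,Short\n"
    "tt0000003\tThird Film\t1892\t\\N\n"
)

# sep="\t" splits columns on tabs; na_values turns every "\N" into NaN.
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")

# apply() with an anonymous (lambda) function: split each genres string
# into a list, and use an empty list where the value is missing (NaN).
df["genre_list"] = df["genres"].apply(
    lambda g: g.split(",") if isinstance(g, str) else []
)

print(df[["primaryTitle", "startYear", "genre_list"]])
```

With a downloaded file, the same call would look like pd.read_csv("title.basics.tsv.gz", sep="\t", na_values="\\N"); the file name here is an assumption, and pandas infers gzip compression from the .gz extension by default. Note that startYear becomes a float column because NaN cannot be stored in a plain integer column.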
Learning Objectives
Same as what I had planned for Week 08:
- the concept of "tidy" data
- using the apply() method (on a pandas DataFrame or Series) to clean data
- the notion of anonymous functions (lambda functions)
- grouping data with groupby()
- using our custom functions with groupby()
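Here is a minimal sketch that touches each objective in turn, using made-up ratings rather than real IMDb values. The idea of tidy data is that each variable has its own column, each observation its own row, and each cell a single value; a genres column such as "Drama,Comedy" breaks the last rule. The sketch uses apply() with a lambda to split those strings, explode() to get one row per (title, genre) pair, and then groupby() with agg() to combine built-in summaries with a custom function. The function name rating_range and the column names are illustrative choices, not part of the lecture material.

```python
import pandas as pd

# Made-up data (not real IMDb ratings). The genres column packs several
# values into one cell, so this table is not tidy yet.
ratings = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film C", "Film D"],
        "genres": ["Drama,Comedy", "Drama", "Comedy,Horror", "Horror"],
        "averageRating": [7.0, 8.5, 6.0, 4.0],
    }
)

# apply() + lambda turns each genres string into a list; explode() then
# creates one row per (title, genre) pair, which is the tidy layout.
tidy = ratings.assign(genre=ratings["genres"].apply(lambda g: g.split(",")))
tidy = tidy.explode("genre").drop(columns="genres")


def rating_range(ratings_in_group):
    """Custom aggregation: gap between the best and worst rating in a group."""
    return ratings_in_group.max() - ratings_in_group.min()


# groupby() splits the rows by genre; agg() runs each function per group.
# Custom functions appear as columns named after the function.
summary = tidy.groupby("genre")["averageRating"].agg(["mean", "count", rating_range])
print(summary)
```

Here rating_range receives one Series per genre and returns a single number, which is what agg() expects. For custom functions that return more than a single value per group, groupby(...).apply(func) is the more flexible alternative.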
PREPARATION
To come well prepared for the lecture, clone the following GitHub repository:
LINK TO REPOSITORY
Lecture Schedule
Location: Thursday 23 November 2023, 4 pm to 6 pm at CKK.1.04
Lecture Material
Looking for lecture recordings? You can only find them on Moodle, typically a day after the lecture. If you can't find the recordings, please get in touch by email.
Material
This week's lecture material is available in this dedicated GitHub repository:
LINK TO REPOSITORY
Solutions to the exercises and live demos in the Jupyter Notebook of this lecture will NOT be posted here afterwards. We will create these solutions together during the lecture.