🗓️ Week 10 - Databases + Data reshaping + Basics of Text Mining

Theme: Cleaning and reshaping data

Yep, there’s still more to learn about data cleaning and reshaping! This week, we’ll look at combining data from multiple data frames, reshaping data so it is easier/faster to plot, and using regular expressions to extract information from text data. We’ll also look at databases and SQL and how to use them in Python.

Keeping up with the hands-on spirit of this course, the lecture material will be delivered via GitHub, and we’ll use Jupyter Notebooks for the live demos.

🎯 Learning Objectives

  • Combining data from multiple data frames (pd.merge())
  • Reshaping data so it is easier/faster to plot (pd.pivot(), pd.melt())
  • An introduction to databases and SQL - for when CSV files get too big (the basics of SQLite + pd.read_sql())
  • An introduction to regular expressions, using the re module in Python
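As a rough preview of how these four objectives fit together, here is a minimal sketch. It is not taken from the lecture notebooks: the toy data frames, column names, and the table name `marks` are all made up for illustration, and an in-memory SQLite database stands in for a real one.

```python
import re
import sqlite3

import pandas as pd

# Hypothetical toy data, invented purely for this sketch
students = pd.DataFrame(
    {"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Chloe"]}
)
marks = pd.DataFrame(
    {"student_id": [1, 2, 3], "week09": [62, 71, 58], "week10": [68, 65, 74]}
)

# 1. Combine two data frames on a shared key column
merged = pd.merge(students, marks, on="student_id", how="left")

# 2. Reshape wide -> long: one row per student per week (often easier to plot)
long_df = pd.melt(
    merged,
    id_vars=["student_id", "name"],
    var_name="week",
    value_name="mark",
)

# ... and long -> wide again: one row per student, one column per week
wide_df = pd.pivot(long_df, index="name", columns="week", values="mark")

# 3. Write the long table to an in-memory SQLite database and query it with SQL
conn = sqlite3.connect(":memory:")
long_df.to_sql("marks", conn, index=False)
avg_marks = pd.read_sql(
    "SELECT name, AVG(mark) AS avg_mark FROM marks GROUP BY name ORDER BY name",
    conn,
)
conn.close()

# 4. Use a regular expression to pull the week number out of labels like "week09"
week_numbers = [int(re.search(r"\d+", label).group()) for label in long_df["week"]]

print(merged)
print(long_df)
print(wide_df)
print(avg_marks)
print(week_numbers)
```

The long format produced by `pd.melt()` is the one most plotting libraries expect, which is why it shows up here both as the input to the SQL table and as the source for the regular expression step.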

📚 PREPARATION

To come well prepared for the lecture, clone the following GitHub repository:

🖇️ LINK TO REPOSITORY

📃 Lecture Schedule

📍 Location: Thursday 30 November 2023, 4 pm - 6 pm at CKK.1.04

👨‍🏫 Lecture Material

🎥 Looking for lecture recordings? You can only find those on Moodle, typically a day after the lecture. If you can’t find the recordings, please contact 📧 .

Material

This week’s lecture material is available in this dedicated GitHub repository:

🖇️ LINK TO REPOSITORY

Solutions to the exercises and live demos in the Jupyter Notebook of this lecture will NOT be posted here afterwards. Everything will be in the GitHub repository.