🧑‍🏫 Week 04 Lecture

Exploratory Data Analysis with pandas

Published: 23 October 2024

Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

📃 Schedule

πŸ“Location: Thursday 24 October 2024, 4 pm - 6 pm at CLM.5.02

  • 4:00 pm - 4:30 pm: I will create a repo on GitHub for today’s lecture and discuss the Git Rituals first introduced in the 📝 W04 Formative - Task 2

  • 4:30 pm - 5:00 pm: Exploring the capabilities of pandas, Python’s main data manipulation library, and comparing it to lists and dictionaries

  • 5:00 pm - 5:10 pm: 🏃‍♂️‍➡️ Quick break 🏃‍♂️

  • 5:10 pm - 6:00 pm: Curiosity-driven exploratory data analysis: I will write code to answer your data questions.

📋 Preparation

πŸ“ Lecture Notes

📋 TAKE NOTE:

  • You won’t find “slides for studying” in this course. I do use slides in my lectures, but they serve as a visual aid to help me organise my thoughts. I tend to post those slides after the lecture on Slack, along with other links and resources.

  • Let me know if you want me to add notes on any specific topic or expand on something you might want to revisit later.

I created the lse-ds105/ds105a-2024 repository on GitHub live in the lecture and added an initial folder with data and two Jupyter Notebooks.

How to clone this repository

  1. Go to the repository page

  2. Fork it! Click on the Fork button in the top right corner of the page. This will create a copy of the repository but under your account (so you can make mistakes as you learn without breaking the original).

Figure 1. Forking the repository.

  3. Configure the forked repository

Figure 2. Leave everything as is.

  4. You should see the repository under your account now.

Figure 3. You should see the repository under your account.

  5. Rename or delete the old repository

    If you had already cloned the original repository, you need to rename or delete it before cloning the new one, to avoid conflicts.

    Whether you are on Nuvolos or your own computer:

    • If you took notes in the repository yesterday that you want to keep, rename the folder from ds105a-2024 to something like ds105a-2024-old. To do this via the Terminal, run mv ds105a-2024 ds105a-2024-old in bash, zsh, or PowerShell.
    • Otherwise, it’s safe to delete the folder. To do this via the Terminal, run rm -rf ds105a-2024 in bash or zsh. In PowerShell, run rmdir ds105a-2024 (press ‘A’ when asked).
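If you would rather not remember which command belongs to which shell, the same rename-or-delete step can be sketched in Python, which behaves the same way on Nuvolos, macOS, Linux, and Windows. This is only an illustration, not something we wrote in the lecture: the function name set_aside_old_clone and its keep parameter are made up for this example, and the demo runs inside a temporary directory so it never touches your real files.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_old_clone(parent: Path, name: str = "ds105a-2024", keep: bool = True) -> str:
    """Rename (keep=True) or delete (keep=False) an existing clone inside `parent`.

    Hypothetical helper for illustration only.
    Returns "missing", "renamed" or "deleted" so you can see what happened.
    """
    old = parent / name
    if not old.is_dir():
        return "missing"  # no earlier clone here, nothing to do

    if keep:
        backup = parent / f"{name}-old"
        if backup.exists():
            # Refuse to overwrite an earlier backup
            raise FileExistsError(f"{backup} already exists; pick another name")
        old.rename(backup)  # same effect as: mv ds105a-2024 ds105a-2024-old
        return "renamed"

    shutil.rmtree(old)  # same effect as: rm -rf ds105a-2024
    return "deleted"


# Demo in a throwaway directory so nothing real is touched
with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp)
    (parent / "ds105a-2024").mkdir()
    (parent / "ds105a-2024" / "notes.txt").write_text("my notes")

    demo_result = set_aside_old_clone(parent, keep=True)
    demo_notes_kept = (parent / "ds105a-2024-old" / "notes.txt").exists()
    print(demo_result, demo_notes_kept)
```

To use it on a real folder, you would call set_aside_old_clone(Path.cwd()) from the directory that contains ds105a-2024. One caveat: on Windows, read-only files inside the hidden .git folder can make shutil.rmtree fail, in which case the Terminal commands above remain the simpler route.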

Just give me the files

If you prefer, you can download the files as they were before we started the lecture:

🚨 NOTE: NB02 is empty here, as it was created live in the lecture. For the up-to-date version of the notebook, refer to the GitHub repository.