💻 Week 03 Lab

Turning lists and dictionaries into dataframes

Published: 17 October 2024

Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

πŸ“Location: Friday, 18 October 2024. Check your timetable for the precise time of your lab.

πŸ₯… Learning Objectives

We want you to learn/practice the following goals in this lab:

πŸ“‹ Preparation

The real preparation required for this lab is to come willing to make mistakes in front of others (it can be a bit uncomfortable) and to help others when they make mistakes (even if the answers are super obvious to you).

📣 We will experiment with something called pair programming today.

One of you will be the 🧑‍✈️ Pilot, the person typing the code in the Jupyter Notebook. The other(s) will act as 🙋 Copilot(s) and will NOT write code. Instead, they will help guide the Pilot on what to do next.

💡 TIP 1: Paradoxically, this might work best if the 🧑‍✈️ Pilot is the person with less programming experience. But you can swap roles at any time if you want.

💡 TIP 2: If you are playing the role of 🙋 Copilot and you want to share a piece of code, send a Direct Message (DM) to the other(s) on Slack.

Although this is called pair programming, it is OK to do it in groups. Ask your class teacher for help if you need clarification.

Here are the course materials that are most closely related to this lab:

πŸ›£οΈ Roadmap

Here is how we will achieve the goal for this lab:

Part I: βš™οΈ Set Up (10 min)

Everyone should set up their environment for this lab, regardless of the role they will play.

🎯 INDIVIDUAL ACTION POINTS

Let’s assemble all the files we need for this lab.

  1. Create a folder W03-Lab for this lab and open it in VS Code. Add a code and a data folder inside it.

You can either use Nuvolos or VS Code on your own machine.

  1. Download the world_cities.csv file and place it in the data folder.

You already download this file for the πŸ“ W03 Formative Exercise. You can copy it from there.

Remember how we downloaded the world_cities.csv file

This file is available in the joelacus/world-cities repository on GitHub, and there are two ways you can download it:

  1. Go to that page, then click on the file named world_cities.csv. Once the file opens, right-click the “Raw” button and select “Save link as…” to save the file to your computer. If you are on Nuvolos, right-click on the data folder and select “Upload…” to upload the file.

  2. Go to that page, then click on the file named world_cities.csv. Once the file opens, right-click the “Raw” button and select “Copy Link”. This stores the URL of the file in your clipboard. You can then paste this URL into a curl command in the terminal (see the 💻 W02 Lab) to download the file directly to the data folder.

  3. Download the Jupyter Notebook template below, rename it to NB01 - Data Collection.ipynb, and place it in the W03-Lab/code folder.

  4. Create an empty Jupyter Notebook NB02 - Simple Data Analysis.ipynb under the W03-Lab/code folder.
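If you prefer to script this setup instead of creating the folders by hand in VS Code or Nuvolos, here is a minimal optional sketch using Python's standard pathlib module. The function name create_lab_folders is our own invention; run it from the directory where you want W03-Lab to live.

```python
from pathlib import Path


def create_lab_folders(root="W03-Lab"):
    """Create the lab folder with 'code' and 'data' subfolders inside it.

    Safe to re-run: exist_ok=True leaves folders that already exist alone.
    """
    lab = Path(root)
    for sub in ("code", "data"):
        # parents=True also creates the W03-Lab folder itself if it is missing
        (lab / sub).mkdir(parents=True, exist_ok=True)
    return lab


# Usage (hypothetical): creates ./W03-Lab/code and ./W03-Lab/data
# create_lab_folders()
```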

Part II: Coding in pairs (40 min)

Your goal here is to write Python code that uses the requests library to collect the specified data from OpenMeteo’s Historical Weather API, the same API we’ve been using in the course since last week.

💽 DATA SPECIFICATION CARD:

  • City: A selected city from the world_cities.csv file.
  • Date Period: Every single day of the year 2023.
  • Variables: Daily minimum and maximum temperatures.

🦸 Your class teacher is equipped with the superpower of conflict mediation. Ask them for help if you are struggling to collaborate or if you feel stuck as a team.

🎯 ACTION POINTS:

  1. 👥 Together: Negotiate the roles:

    • Ideally, the person less experienced or less confident with their programming skills should take the role of 🧑‍✈️ Pilot.

    • 🙋 Copilot(s) should not take over if the Pilot is struggling to understand or follow a suggestion. The whole point of this activity is to figure out how best to express what you know and don’t know to others.

    You can decide if and when you want to swap roles for the rest of the lab.

  2. 👥 Together: Choose a city from the world_cities.csv file.

  3. 🧑‍✈️ Pilot: Open the first notebook in VS Code.

    Then, edit section 1.2 to specify the city you chose.

  4. 🧑‍✈️ Pilot: Gather the latitude and longitude.

    Read and run the Python code contained inside section 1.3 of the notebook, then add the necessary code inside section 1.4.

    • 🧑‍✈️ Pilot: before you write any code/markdown, you should communicate what you want to do first and why.

    • 🙋 Copilot(s): help the Pilot understand how to use the pre-defined get_lat_lon() function and how to check that it all worked.

  5. 🧑‍✈️ Pilot: Build the URL.

    Add whatever code is necessary under section ‘2. Collecting Data’ so that you end up with a variable url containing the API URL that meets the requirements of the data specification card.

    • 👥 Together: Feel free to consult and reuse code we’ve written in the past.

    • 🧑‍✈️ Pilot: before you write any code/markdown, you should communicate what you want to do first and why.

    • 🙋 Copilot(s): You can help by consulting the API documentation or sharing pieces of code with the Pilot via Slack DMs. If you share pieces of code with the Pilot, explain why they are relevant.

How to find out the URL

It might take some investigative work to determine the URL you need to use to download the data.

When you go to the official page of the API, you will find a lot of parameters you can tweak to get the data you want.

Figure 1. On the OpenMeteo Historical Weather API page, you can tweak the latitude, longitude, timezone, start date, and end date to match your needs. If you keep scrolling down, you will find more variables: Hourly Variables, Daily Variables, etc.

👉 Set the latitude and longitude to those of a city of your choice (you can browse the world_cities.csv file), and set the start and end dates to the first and last days of 2023.

After changing the parameters, scroll down to the API Response section. You will find the full address (the URL) you need there:

Figure 2. It might not be fully visible here, but you will find the full address (the URL) you need under the API Response section on the page.

👉 This URL is unique to the specific parameters you chose.
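If you would rather assemble the URL in code than copy it from the page, here is a minimal sketch using Python's standard urllib.parse.urlencode. The endpoint and the parameter names (latitude, longitude, start_date, end_date, daily, timezone) and the variable names temperature_2m_max and temperature_2m_min reflect our reading of the OpenMeteo documentation, so compare the result with the URL shown in the API Response section before relying on it. The helper name build_url and the example coordinates (roughly central London) are illustrative only.

```python
from urllib.parse import urlencode

# Historical Weather API endpoint (assumption based on the OpenMeteo docs;
# check it against the "API Response" section of the page)
BASE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_url(lat, lon, start_date, end_date, daily_vars, timezone="auto"):
    """Return a Historical Weather API URL for one location and period."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start_date,  # format: YYYY-MM-DD
        "end_date": end_date,
        "daily": ",".join(daily_vars),  # comma-separated daily variables
        "timezone": timezone,
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


# Illustrative coordinates (roughly London): use the values from get_lat_lon()
url = build_url(
    lat=51.5074,
    lon=-0.1278,
    start_date="2023-01-01",
    end_date="2023-12-31",
    daily_vars=["temperature_2m_max", "temperature_2m_min"],
)
print(url)
```

Alternatively, requests can encode the query string for you if you pass a dictionary of parameters via requests.get(BASE_URL, params=...).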

  6. 🧑‍✈️ Pilot: Send a request to the URL and convert the response to a Python dictionary.

    • 🙋 Copilot(s): Again, if you end up sharing pieces of code, explain why they are relevant and how to adapt them to the current task.
  7. 🏆 Reshape the data.

    Your goal now is to manipulate the JSON response so that you end up with a Python dictionary like this:

    {
    "country": "Country Code",
    "city": "City Name",
    "date": ["2023-01-01", "2023-01-02", ..., "2023-12-31"],
    "min_temp": [float, float, ..., float],
    "max_temp": [float, float, ..., float]
    }

    In other words, you need to extract a list of all the dates, a list of all the minimum temperatures, and a list of all the maximum temperatures from the JSON response and then place them in a dictionary with the keys country, city, date, min_temp, and max_temp.

    • 🧑‍✈️ Pilot: before you write anything, explain what you would do first.

    • 🙋 Copilot(s): Resist the urge to take over if the Pilot does not understand your suggestion. Explain (or demonstrate) your thinking differently until you both understand what to do.

  8. Save the data to data/daily_temp.json. Open the file in VS Code to confirm that it has the correct structure.
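Once you have sent the request (for example, response = requests.get(url) followed by data = response.json()), reshaping and saving boil down to picking keys out of that dictionary and writing the result to disk. This sketch assumes the response contains a daily block with time, temperature_2m_min and temperature_2m_max lists, which is what we expect when those daily variables are requested; print data.keys() to confirm. The helper names reshape_response and save_json are hypothetical.

```python
import json
from pathlib import Path


def reshape_response(response, country, city):
    """Turn an OpenMeteo daily response dict into the lab's target structure.

    Assumes response["daily"] holds "time", "temperature_2m_min" and
    "temperature_2m_max" lists (the variables requested in the URL).
    """
    daily = response["daily"]
    return {
        "country": country,
        "city": city,
        "date": daily["time"],
        "min_temp": daily["temperature_2m_min"],
        "max_temp": daily["temperature_2m_max"],
    }


def save_json(data, path="data/daily_temp.json"):
    """Write the dictionary to disk as indented, human-readable JSON."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # make sure data/ exists
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)
    return path


# Example (hypothetical), after data = requests.get(url).json():
# daily_temp = reshape_response(data, country="GB", city="London")
# save_json(daily_temp)
```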

Part III: 📊 Data Analysis (30-40 min)

We saved the data to a file so that we don’t have to collect the same data over and over again every time we want to continue our analysis.

🎯 ACTION POINTS:

Keep playing the roles of Pilot and Copilot. We just won’t specify in detail how you should work together.

  1. Open the empty NB02 - Simple Data Analysis.ipynb notebook you created in Part I.

  2. Add minimal documentation to the notebook: who wrote it, when, and what it is about.

  3. Load the data from data/daily_temp.json into a Python dictionary.

  4. Convert the dictionary into a pandas DataFrame.

    Use the code below to create a DataFrame from the dictionary:

    df = pd.DataFrame(data)

    where data is the dictionary you loaded from the JSON file. Add import pandas as pd to the top of the notebook if you haven’t done so already.

  5. Take a look at the data. Life is easier when we work with tables.

    Use df.head() to see the first few rows of the DataFrame. Use df.tail() to see the last few rows.

  6. Plot the temperatures. The pandas library comes with a helpful .plot() method that gives us quick insights from a table.

    Use the code below to plot the minimum and maximum temperatures:

    df.plot(x='date', y=['min_temp', 'max_temp'], figsize=(12, 6))

    Change the dimensions of the plot if you feel the plot is too small or too big.

  7. Check the minimum and maximum temperatures for the whole year.

    Once you select a column in pandas, you can use the .min() and .max() methods to get its minimum and maximum values, respectively.

    What is the lowest minimum temperature of the year?

    df['min_temp'].min()

    What is the highest maximum temperature of the year?

    df['max_temp'].max()
  8. 🏆 When did the lowest and highest temperatures occur?

    We haven’t taught you how to do this yet. This task will require some online research and investigative work. (Remember to keep playing the role of Pilot and Copilot.)

    Here are a few tips:

    • The columns of a Pandas DataFrame can be converted to simple Python lists using the .tolist() method. For example, df['min_temp'].tolist() will give you a list of all the minimum temperatures.    

    • You can always work with pure Python lists and then use your knowledge of for loops and len() to find the minimum value of a list and its index (position in the list).

    • Alternatively, check out the official pandas documentation. Can you find a guide that explains how to filter data based on a condition?
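Following the tip about plain Python lists, here is one minimal sketch of finding an extreme value and its position with a for loop and len(). The helper names and the three sample values are hypothetical; in your notebook you would pass df['min_temp'].tolist() (or the max_temp column) and look up the date at the same position in df['date'].tolist().

```python
def min_with_index(values):
    """Return (smallest value, its position); assumes a non-empty list."""
    best = 0  # position of the smallest value seen so far
    for i in range(1, len(values)):
        if values[i] < values[best]:
            best = i
    return values[best], best


def max_with_index(values):
    """Return (largest value, its position); assumes a non-empty list."""
    best = 0  # position of the largest value seen so far
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return values[best], best


# Hypothetical sample data; in the notebook use df['date'].tolist()
# and df['min_temp'].tolist() instead.
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
min_temps = [2.1, -3.4, 0.5]

value, position = min_with_index(min_temps)
print(f"Lowest temperature: {value} on {dates[position]}")
```

Both helpers return the first position when the extreme value appears more than once. If you take the pandas route instead, a boolean filter such as df[df['min_temp'] == df['min_temp'].min()] returns every row that hits the minimum.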


⭐ Bonus Task

Are you a high-performing team? Here is a bonus task to challenge you further.

💽 DATA SPECIFICATION CARD:

  • City: A selected city from the world_cities.csv file.
  • Date Period: Every single day from 1st January 2004 to 31st December 2023.
  • Variables: Daily minimum and maximum temperatures and precipitation sum.
  1. Edit NB01 - Data Collection.ipynb to match the new data specification card.

  2. Collect the data for the new data specification card and overwrite the data/daily_temp.json file.

Extend the dictionary with a key for the new variable (the precipitation sum).

  3. In NB02 - Simple Data Analysis.ipynb, figure out how to plot the precipitation over the years.

  4. 🏆 Try to write code that answers the following questions:

  • What was the day with the highest precipitation in the last 20 years?

  • What was the month with the highest precipitation in the past 20 years? For example, was it January 2004? March 2023?

  • Is there a month that is consistently the wettest across the years?
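For the bonus questions, one possible approach is to group days by month using the first seven characters of each date string ("YYYY-MM"). This sketch assumes you have the dates and the daily precipitation as two parallel lists (for instance, from a hypothetical precipitation key you added to the dictionary); the sample values are made up. Missing values, which appear as null in JSON and None in Python, are skipped.

```python
from collections import Counter, defaultdict


def monthly_totals(dates, precip):
    """Sum daily precipitation into {"YYYY-MM": total}, skipping None values."""
    totals = defaultdict(float)
    for date, amount in zip(dates, precip):
        if amount is not None:
            totals[date[:7]] += amount  # "2004-03-15"[:7] -> "2004-03"
    return dict(totals)


def wettest_month_per_year(totals):
    """Return {year: wettest calendar month} from the monthly totals."""
    best = {}
    for year_month, total in totals.items():
        year, month = year_month.split("-")
        if year not in best or total > totals[f"{year}-{best[year]}"]:
            best[year] = month
    return best


# Hypothetical sample data (None stands for a missing value)
dates = ["2004-01-01", "2004-01-02", "2004-02-01",
         "2005-01-01", "2005-02-01", "2006-01-01", "2006-02-01"]
precip = [3.0, None, 1.5, 0.2, 4.0, 5.0, 0.0]

# Wettest single day: (amount, date) tuples compare by amount first
wettest_day = max((amount, date) for date, amount in zip(dates, precip)
                  if amount is not None)

totals = monthly_totals(dates, precip)
wettest_month = max(totals, key=totals.get)  # "YYYY-MM" with the largest total

per_year = wettest_month_per_year(totals)
counts = Counter(per_year.values())  # how many years each month was wettest
print(wettest_day, wettest_month, counts.most_common(1))
```

Counting how often each calendar month is the wettest of its year is only one reading of "consistently"; averaging each calendar month's totals across the 20 years is another reasonable choice.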


Need more of a challenge? Here’s a super bonus task for you and your team:

🚀 Challenge Task

Collect data for London and five other Western European capitals for the past twenty years. Then, answer: Has it rained more in London than in these other capitals over the past twenty years?
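Once you have collected daily precipitation for each capital over the same period, the comparison reduces to summing each list and ranking the totals. The dictionary layout (city name mapped to a list of daily values), the helper names, and the sample numbers below are all hypothetical placeholders.

```python
def total_precipitation(precip):
    """Sum daily precipitation, ignoring missing (None) values."""
    return sum(amount for amount in precip if amount is not None)


def rank_cities(precip_by_city):
    """Return [(city, total), ...] sorted from wettest to driest."""
    totals = {city: total_precipitation(p) for city, p in precip_by_city.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values: in practice each list holds about 7,300 daily values
# collected from the API for one city over the same 20 years.
precip_by_city = {
    "London": [1.0, 0.0, 2.5],
    "Paris": [0.5, 4.0, None],
    "Dublin": [0.0, 0.0, 0.4],
}
ranking = rank_cities(precip_by_city)
london_total = dict(ranking)["London"]
wetter_than_london = [city for city, total in ranking if total > london_total]
print(ranking)
print("Capitals wetter than London:", wetter_than_london)
```

Make sure every city covers exactly the same date range; otherwise the totals are not comparable.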