✅ Week 02 Lab Solution

Collecting data from APIs (Weather data)

Published: 12 October 2024

Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

On this page you will find tips and solutions to the 💻 W02 Lab.

1. Using VS Code

In the video below I (silently) demonstrate how to open a folder in VS Code, create files with a mix of shell commands and the drop-down menus, and how to interact with files in the editor. You can add comments and questions directly to the video if anything is unclear to you.

3. Solving the 🏆 Challenge

The challenge task asks:

Overall, how did the hourly temperature change from last year to this year? Compare each hour of today’s forecast and last year’s records and then take an average.

Assume I extracted the 24-hour forecast for today and the 24-hour historical data for the same day last year and saved them in the variables forecast_temp and historical_temp, respectively. These lists have the same length (24 elements) and the same order of hours (position 0 corresponds to midnight, position 1 to 1am, and so on). Watch the video above to see how to extract this data from the JSON files.

Before you jump to a solution, I want you to take 👼 baby steps¹. Think explicitly about what you know and what you don’t know, and take it from there.

Here’s the type of reasoning process I would like you to follow:

3.1. “What do I know?”

  • Lists are things that hold elements
  • I can access elements in a list by their position (also called index). For example, forecast_temp[0] gives me a number that is stored in the first position of the list forecast_temp.
  • The two lists are comparable because they have the same length and all the elements are numbers.
  • I can compare two numbers using the comparison operators (<, >, ==, !=, <=, >=) just like I do in real life. For example, 5 > 3 is True and 5 < 3 is False.
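To make these points concrete, here is a minimal sketch using two short lists. The values are made up for illustration (three elements instead of 24, not real weather data), and the variable names other than forecast_temp and historical_temp are hypothetical.

```python
# Made-up values for illustration only (not real forecasts or records)
forecast_temp = [12.5, 11.8, 11.2]
historical_temp = [10.1, 10.4, 11.9]

# Access an element by its position (index); counting starts at 0
first_forecast = forecast_temp[0]

# Both lists have the same length, so their positions line up
same_length = len(forecast_temp) == len(historical_temp)

# Comparison operators give back True or False
is_warmer_at_midnight = forecast_temp[0] > historical_temp[0]
is_warmer_at_2am = forecast_temp[2] > historical_temp[2]

print(first_forecast, same_length, is_warmer_at_midnight, is_warmer_at_2am)
```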

3.2 “Let me compare the elements manually first”

Doing manual comparisons is a good way to test ideas before writing more complex code. If I want to compare the first element of the two lists, I can do it like this:

forecast_temp[0] > historical_temp[0]

Then, I’d move on to the next position in both lists to see if the comparison changes:

forecast_temp[1] > historical_temp[1]

But what I need to do is store the difference, not simply compare them. Instead of > I should use - (subtraction) to calculate the difference:

# Difference between what's in 
# the forecast today vs
# the historical temperature at the same hour last year
forecast_temp[0] - historical_temp[0]

3.3 “I need a way to store a bunch of differences”

Storing multiple elements is a job for a list. I can create an empty list and append the differences to it:

differences = []

differences.append(forecast_temp[0] - historical_temp[0])
differences.append(forecast_temp[1] - historical_temp[1])
...

Great! If I keep repeating that until I reach the end of the lists, I will have all the differences stored in the list differences.

“Jon says it’s not a good idea to repeat code too much. I should use something else to avoid repeating the same code over and over.”

3.4 “A paradox: I need to repeat the same code without repeating the same code”

Ideally, you would revisit the material (Python pre-sessionals or the Control flow section of 🧑‍🏫 W02 Lecture Notes) until you come across the notion of for loops.

Loops are one of the building blocks of programming. You write the code once and ‘ask’ your computer to run it repeatedly for you. How do you specify the number of times? In Python, we use the range() function for this.

Type this in a Python shell or a Jupyter Notebook cell:

list(range(24))

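range(24) produces the sequence of numbers from 0 to 23, and wrapping it in list() lets you see those numbers as a list. This is perfect for our case because we have 24 hours in a day, and the positions in our lists also go from 0 to 23.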

differences = []

for i in range(24):
    differences.append(forecast_temp[i] - historical_temp[i])

The code above is equivalent to you writing the code 24 times, but in a much more elegant and concise way. The first time the loop runs, i receives the value 0, the second time 1, and so on until 23.

If you find the above too cluttered, you can split the code into two lines:


differences = []

for i in range(24):
    # Create a variable to store the difference
    diff = forecast_temp[i] - historical_temp[i]

    # Now it's clearer to me that 
    # I'm just adding a number to the list
    differences.append(diff)

Either way works. The second version is more verbose but might be easier to understand.
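If you want to see how i changes on each pass of the loop, you can add a print() inside it. The sketch below uses made-up three-element lists (so range(3) rather than range(24)) just to keep the output short.

```python
# Made-up values for illustration only
forecast_temp = [12.5, 11.0, 10.5]
historical_temp = [10.0, 11.5, 10.5]

differences = []

for i in range(3):
    diff = forecast_temp[i] - historical_temp[i]

    # Show which position the loop is currently looking at
    print(f"i = {i}: {forecast_temp[i]} - {historical_temp[i]} = {diff}")

    differences.append(diff)

print(differences)
```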

💭 Think about it: How could I replace the number 24 in there such that in the future I can reuse this same code with a list of any arbitrary length?

3.5 “I have all the differences. Now what?”

You have all the differences stored in the list differences. Now you need to calculate the average.

The average is calculated by adding all the numbers and dividing by the total number of elements. In Python, you can use the sum() function to add all the elements of a list and the len() function to get the total number of elements.
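As a quick check of how sum() and len() work together, here is a tiny sketch on a made-up list of three numbers (the names values, total and count are only for illustration).

```python
# Made-up numbers for illustration only
values = [2.5, -0.5, 1.0]

total = sum(values)      # adds up every element in the list
count = len(values)      # how many elements the list holds

average = total / count
print(average)
```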

The new version of the code would look like this:

differences = []

for i in range(24):
    diff = forecast_temp[i] - historical_temp[i]
    differences.append(diff)

# Calculate the average
average = sum(differences) / len(differences)

print(f"The average temperature difference is {average:.2f}°C")

And this is a solution to the challenge task. There are still things you can do to improve this code, but this is a good starting point.
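As one possible direction for improvement (a sketch, not the official solution), the same logic could be wrapped in a function so it can be reused. The function name average_difference and the choice to raise an error on mismatched or empty lists are my own assumptions, and zip() is a built-in that pairs up elements at the same position in both lists.

```python
def average_difference(forecast, historical):
    """Average of the element-wise differences (forecast - historical)."""
    if len(forecast) != len(historical):
        raise ValueError("Both lists must have the same length")
    if len(forecast) == 0:
        raise ValueError("The lists must not be empty")

    differences = []

    # zip() gives one pair per position: (forecast[0], historical[0]), ...
    for f, h in zip(forecast, historical):
        differences.append(f - h)

    return sum(differences) / len(differences)


# Made-up values for illustration only
example = average_difference([12.5, 11.0, 10.5], [10.0, 11.5, 10.5])
print(f"The average temperature difference is {example:.2f}°C")
```

Because the function never mentions a fixed number of hours, it works with lists of any length, as long as both lists are equally long.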

Footnotes

  1. If you are a beginner in coding, rushing to ChatGPT just to get a solution is the worst thing you can do for your learning. A generative AI tool will surely give you a working solution, but you won’t be in control of your learning and, as a consequence, you will struggle massively when we introduce more complex elements. Be in control!