💻 Week 01, Day 02 - Lab

Working with CSV/JSON and collecting weather data

Author

Dr Jon Cardoso-Silva

Last updated

15 July 2025

🥅 Learning Objectives

By the end of this lab, you should be able to:

  1. Explore API documentation to understand different endpoints and parameters.
  2. Construct targeted API requests for both forecast and historical data.
  3. Transform and clean a JSON response into a pandas DataFrame suitable for analysis.
  4. Formulate a quantitative definition to investigate an open-ended question.
  5. Write a well-reasoned conclusion that justifies your analytical choices.

In this morning’s lecture, we saw how to use APIs and pandas to structure data for analysis. Now, you’ll apply those skills to a classic question, one that will form the basis of your midterm assignment: Is London really as rainy as the movies make it out to be?

Instead of just following instructions, your task is to act as a data journalist. You will explore the data, decide how to define “rainy,” and build a small-scale analysis to support your conclusion.

Tuesday, 15 July 2025 | Either 2:00-3:30pm or 3:30-5:00pm 📍 Check your timetable for the location of your class

🛣️ Lab Roadmap

This lab document provides a roadmap for the exercises you will complete in your Jupyter Notebook. Open the ME204_W01D02_Lab.ipynb notebook, which should be in your lab-notebooks/ folder on Nuvolos, and follow along.

If necessary, you can also download the notebook from the link below:

Part I: Acquiring and Storing Data (50 min)

To answer our question, we first need to understand our data source, retrieve the data, and store it locally. We will use the Open-Meteo API and save our results to both JSON and CSV files.

Exploring API Endpoints

Your class teacher will start by demonstrating how to:

  1. Navigate the Open-Meteo documentation.
  2. Identify the different API endpoints for forecast vs. archive (historical) data.
  3. Compare the available hourly and daily variables (e.g., rain vs. rain_sum).
  4. Construct a request that combines these different parameters.
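To make the difference between the two endpoints concrete, here is a minimal sketch that only builds the request URLs, without sending them. The archive URL is the one used later in this lab. The forecast URL, the London coordinates, the timezone value, and the helper name build_url are assumptions added for illustration, so check them against the Open-Meteo documentation before relying on them.

```python
from urllib.parse import urlencode

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Approximate coordinates for central London (an assumption, not from the lab).
LONDON = {"latitude": 51.5074, "longitude": -0.1278}


def build_url(base_url, **params):
    """Hypothetical helper: append London's coordinates and params as a query string."""
    query = urlencode({**LONDON, **params}, safe=",")  # keep commas readable
    return f"{base_url}?{query}"


# Forecast endpoint: an hourly and a daily variable combined in one request.
forecast_url = build_url(
    FORECAST_URL,
    hourly="rain",
    daily="rain_sum",
    timezone="Europe/London",  # illustrative choice
)

# Archive endpoint: historical data needs an explicit date range.
archive_url = build_url(
    ARCHIVE_URL,
    start_date="2023-01-01",
    end_date="2023-12-31",
    daily="rain_sum",
    timezone="Europe/London",
)

print(forecast_url)
print(archive_url)
```

Pasting either printed URL into a browser is a quick way to inspect the raw JSON before writing any more code.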

🎯 ACTION POINTS

Now it’s your turn. In your notebook, you will fetch historical data and save it.

  1. Get Historical Data.

    In section 1.1 of your notebook, make a request to the historical API endpoint (https://archive-api.open-meteo.com/v1/archive). Fetch data for London for all of 2023. Request the following daily variables: weather_code,precipitation_sum,rain_sum.

  2. Save the Raw JSON.

    In section 1.2, take the JSON content from your response object and save it to a file named london_2023_raw.json. This is a crucial step for reproducibility, as it gives you a local copy of the exact data you received.

  3. Create and Clean a DataFrame.

    In section 1.3, take the historical JSON response and convert the daily data into a pandas DataFrame. Then, clean it up:

    • Use pd.to_datetime() to convert the time column to proper datetime objects.
    • Set the time column as the DataFrame’s index using df.set_index('time', inplace=True).
  4. Save the Clean DataFrame to CSV.

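    In section 1.4, save your cleaned DataFrame to a file named london_2023_processed.csv. Because time is now the index, make sure the dates still end up in the file: either keep the index when saving, or call df.reset_index() first and then save without the pandas index.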

  5. Verify by Loading the CSV.

    In section 1.5, to confirm everything worked, read london_2023_processed.csv back into a new DataFrame variable and display its first few rows.
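The five steps above can be sketched roughly as follows. This is one possible shape, not the notebook's solution. The function fetch_london_2023 shows what the request might look like (it needs network access and the requests package, and the coordinates and timezone are assumptions). The rest of the sketch runs offline on a tiny made-up payload that mimics the "daily" part of a response, and it writes to a temporary folder so that your real files are not overwritten.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def fetch_london_2023():
    """Step 1: request daily data for all of 2023 (requires network access)."""
    import requests  # imported here so the rest of the sketch runs offline

    params = {
        "latitude": 51.5074,   # approximate London coordinates (assumption)
        "longitude": -0.1278,
        "start_date": "2023-01-01",
        "end_date": "2023-12-31",
        "daily": "weather_code,precipitation_sum,rain_sum",
        "timezone": "Europe/London",  # illustrative choice
    }
    response = requests.get(ARCHIVE_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


# Made-up stand-in for the API response; swap in fetch_london_2023() for real data.
data = {
    "daily": {
        "time": ["2023-01-01", "2023-01-02", "2023-01-03"],
        "weather_code": [61, 3, 51],
        "precipitation_sum": [4.2, 0.0, 0.3],
        "rain_sum": [4.2, 0.0, 0.3],
    }
}

out_dir = Path(tempfile.mkdtemp())  # temporary folder for this demo
raw_path = out_dir / "london_2023_raw.json"
csv_path = out_dir / "london_2023_processed.csv"

# Step 2: save the raw JSON exactly as received.
with open(raw_path, "w") as f:
    json.dump(data, f, indent=2)

# Step 3: build the DataFrame, parse the dates, and index by time.
df = pd.DataFrame(data["daily"])
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Step 4: move time back to a column so the dates survive index=False.
df.reset_index().to_csv(csv_path, index=False)

# Step 5: read the CSV back and inspect the first rows.
check = pd.read_csv(csv_path, parse_dates=["time"])
print(check.head())
```

Calling reset_index() before saving is one way to keep the CSV free of a pandas index column without losing the dates.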

Part II: Designing Your Analysis (40 min)

Now for the creative part. You have the tools to get the data, but what data do you need? And how will you use it to answer the question?

Think First, Code Later

Before you write any more code, take 10 minutes to plan your analysis. In section 2.1 of your notebook, write down your answers to the following questions in a markdown cell:

  1. How will you define a “rainy day”? There’s no single right answer. Is it any day with any rain (rain_sum > 0)? A day with more than a certain amount (e.g., > 1mm) to exclude light drizzle? Or maybe a day with a specific weather_code?
  2. What time period will you analyse? One year? Five years? Thirty years? What are the pros and cons of choosing a shorter or longer period?
  3. How will you measure “raininess”? Will you count the total number of rainy days per year? Calculate the average rainfall? Find the longest stretch of consecutive rainy days?
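To see how much the definition matters, the sketch below applies three candidate definitions of a "rainy day" to the same handful of made-up daily values. The 1 mm threshold and the set of weather codes treated as rain are assumptions chosen for illustration. Check the weather code table in the Open-Meteo documentation before settling on your own set.

```python
import pandas as pd

# Made-up daily values for illustration only.
df = pd.DataFrame(
    {
        "rain_sum": [0.0, 0.2, 1.5, 0.0, 7.8, 0.9],
        "weather_code": [3, 51, 61, 1, 65, 53],
    },
    index=pd.date_range("2023-03-01", periods=6, freq="D"),
)

# Hypothetical choice: count only rain and rain-shower codes, not drizzle.
RAIN_CODES = {61, 63, 65, 80, 81, 82}

definitions = pd.DataFrame(
    {
        "any_rain": df["rain_sum"] > 0,           # any measurable rain
        "over_1mm": df["rain_sum"] > 1.0,         # excludes light drizzle
        "rain_code": df["weather_code"].isin(RAIN_CODES),
    }
)

# Number of days flagged as rainy under each definition.
summary = definitions.sum()
print(summary)
```

On this toy data the strictest definitions flag half as many days as the loosest one, which is exactly the kind of sensitivity your written plan should address.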

🎯 ACTION POINTS

This is the main challenge. You will now implement your own analytical strategy.

  1. Implement Your Strategy.

    In section 2.2, write the code to execute the plan you just designed. You will likely need to:

    • Make a new, targeted request to the historical API for the time frame you decided on.
    • Create a DataFrame.
    • Add a new column to your DataFrame that flags days as “rainy” based on your definition.
  2. Calculate and Conclude.

    In section 2.3, perform the final calculation based on your plan (e.g., counting the rainy days).

  3. Justify Your Findings.

    In section 2.4, write a markdown cell explaining your findings. Your conclusion should:

    • State your answer to the question: “Is London really that rainy?”
    • Crucially, explain how your definition and time frame support your conclusion.
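As a rough illustration of the flag-then-calculate pattern, the sketch below marks rainy days with a hypothetical 1 mm threshold, counts them per year, and finds the longest run of consecutive rainy days. The data are made up, and your own definition and measure may well differ.

```python
import pandas as pd

# Made-up daily rainfall spanning two calendar years.
dates = pd.to_datetime(
    [
        "2022-12-29", "2022-12-30", "2022-12-31",
        "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04",
    ]
)
df = pd.DataFrame({"rain_sum": [2.0, 0.0, 1.4, 3.1, 5.0, 0.4, 1.2]}, index=dates)

THRESHOLD_MM = 1.0  # hypothetical definition of a rainy day

# Flag each day as rainy or not.
df["rainy"] = df["rain_sum"] > THRESHOLD_MM

# Measure 1: number of rainy days in each year.
rainy_days_per_year = df.groupby(df.index.year)["rainy"].sum()

# Measure 2: longest stretch of consecutive rainy days.
# A new run starts whenever the flag differs from the previous day's flag.
run_id = df["rainy"].ne(df["rainy"].shift()).cumsum()
longest_streak = int(df["rainy"].groupby(run_id).sum().max())

print(rainy_days_per_year)
print("Longest rainy streak (days):", longest_streak)
```

Note that the streak here crosses the year boundary. Whether that is what you want depends on how you framed the question in your plan.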
🗣️ CLASSROOM DISCUSSION (15 min)

Towards the end of the session, your class teacher will lead a discussion. Be prepared to share your proposed definitions and time frames.

  • What definitions did others come up with?
  • What are the strengths and weaknesses of each approach?
  • How might different definitions lead to different conclusions?

The goal here is not to agree on one method, but to understand that data analysis involves making reasoned choices and being able to defend them.

✨ Wrap-up

Excellent work. You have just completed a miniature, end-to-end data analysis project. You started with a question, explored a data source, designed a methodology, and interpreted the results. This is the “way of thinking” that is essential for a data engineer, and it’s what separates a rigorous analysis from a casual guess.


🚀 Bonus Task: Comparative Analysis

This section is for students who are already comfortable with the basics and want a challenge.

In the final section of your notebook, try one of the following:

  1. Comparative Analysis: Is London rainy compared to what? Pick another major European capital (e.g., Rome: lat=41.9, lon=12.5 or Paris: lat=48.85, lon=2.35) and perform the same analysis. How does London stack up?
  2. Seasonal Analysis: Does London’s “raininess” change with the seasons? Group your data by month or season and see if you can spot a pattern. A bar chart would be a great way to visualise this.
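Both bonus ideas come down to a groupby on the date index. The sketch below uses made-up values for two cities, a hypothetical 1 mm threshold, and a month-to-season mapping based on meteorological seasons, all of which are assumptions for illustration.

```python
import pandas as pd

# Made-up daily rainfall (mm) for two cities on the same dates.
dates = pd.to_datetime(
    ["2023-01-05", "2023-01-20", "2023-04-10", "2023-07-15", "2023-07-16", "2023-10-30"]
)
london = pd.Series([3.0, 0.0, 1.2, 0.0, 0.0, 6.5], index=dates)
rome = pd.Series([0.0, 2.5, 0.0, 0.0, 0.0, 9.0], index=dates)

# One column per city makes side-by-side comparison straightforward.
combined = pd.concat({"London": london, "Rome": rome}, axis=1)
rainy = combined > 1.0  # hypothetical definition of a rainy day

# Rainy days per calendar month (1 = January, ..., 12 = December).
monthly_rainy_days = rainy.groupby(rainy.index.month).sum()

# Rainy days per meteorological season.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
seasonal_rainy_days = rainy.groupby(rainy.index.month.map(SEASON_OF_MONTH)).sum()

print(monthly_rainy_days)
print(seasonal_rainy_days)
```

If matplotlib is installed, calling monthly_rainy_days.plot.bar() on the result should give a grouped bar chart with one bar per city for each month.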