✍️ Mini-Project I: Summer Heat Study (20%)

2024/25 Winter Term

Published: 13 February 2025

🥅 Learning Goals

By the end of this assignment, you will: i) design and implement a complete data analysis workflow, ii) apply API data collection techniques at scale, iii) transform raw JSON data into analytical insights, and iv) practise professional documentation standards.

Last updated: 18 February 2025, to reword the marking guide. The substance is unchanged, but the previous wording was too “checklisty”. The distinction between a strong (~70%) and an excellent submission should also be clearer now.

This assignment will use OpenMeteo’s API for data collection and analysis, with a focus on creativity within structured constraints.

Overview

This assignment builds on the skills developed in the 📝 W04 Formative Exercise, extending from single-city analysis to a multi-city comparison. While the Week 04 practice exercise focused on basic API usage and data organisation, this project requires more sophisticated analysis and visualisation techniques.

📚 Preparation

You must click on the GitHub Classroom link¹ to create your designated repository. Do not create a separate repository.
Clone the repository to Nuvolos (or your local machine, if you are brave) and create the necessary folders and files according to the rest of the instructions.

Submission

📅 Due Date: 27 February 2025, 8pm UK time

📤 Submission Method: Push your work to your allocated GitHub repository. DO NOT submit via Moodle.

🤖 AI Usage: You are allowed to use AI tools for whatever you want in this assignment. You won’t lose any marks for copy-pasting code from AI, but you might lose marks if your coding choices deviate from the course material without proper justification.

Remember what we’ve been discussing in lectures: those who rely too much on AI learn less, because instead of consulting their own notes or asking instructors for help, they paste AI’s output into their code without any understanding of it. You might not realise it, but this is very apparent to us markers 😬.

If you submit after this date without an authorised extension, you will receive a late submission penalty.

Need an Extension?

If you have extenuating circumstances that require an extension:

Email 📧 (Kevin) with details of your situation
Include the extensions form
Submit your request before the cut-off time. You will typically get an answer within 24 hours.

⚠️ Note: Extensions are granted only for valid extenuating circumstances, not for technical difficulties with Git or Nuvolos, or for time management issues. Start early and use our support resources (Slack, drop-in sessions, office hours) if you need help.

📝 Research Question

You’ve been commissioned by the Office of Curious Analytics (OCA) to investigate summer heat stress in world capitals. Your task is to answer:

“Which world capital experienced the most extreme summer conditions for outdoor activities in their most recent meteorological summer?”

💡 Note: Your analysis should focus on quantifying and comparing the severity of heat stress conditions across your chosen cities. You will need to justify both your choice of cities and your method for measuring “extreme” conditions.

🤔 Key Decisions

Here are some decisions you will need to make:

City Selection

Choose exactly 4 national capitals for analysis (e.g., London, Paris, Berlin, Rome)
All cities must be from the same hemisphere (Northern or Southern)
Explain why these cities make a good comparison set
A subjective explanation is acceptable (e.g., “I think these cities represent diverse climates within Europe”)

🚨 Please stick to 4 cities; don’t add more. Because everyone will be using Nuvolos, which is essentially a single shared machine, we need to be mindful of API usage.

Define Extreme Summer

Propose a definition of what constitutes an "extreme summer" for outdoor activities.

Here you can either:

Use the wet-bulb temperature variable and one of the thresholds indicated by the organisations mentioned in this article from The Atlantic ².
This variable is available on OpenMeteo in the "Additional Variables And Options" section; it’s just not listed prominently.
(We won’t tell you what the variable is called, but if you read the above carefully and search for it on the OpenMeteo website, you’ll easily find its name.)
Propose your own variable and thresholds. Just make sure your rationale is grounded in reputable sources (e.g., academic publications or guidelines from professional bodies).
(This time you can’t simply come up with a variable and thresholds out of thin air. You will need to do a little research to justify your choice.)

Time granularity

Choose how to aggregate your data (hourly, daily, weekly, monthly or something else)
Explain why this granularity is appropriate for your analysis
Example: “I chose daily averages because extreme heat’s impact on outdoor activities is best assessed over full days rather than specific hours”
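As an illustration of moving from hourly readings to a daily granularity, the sketch below collapses hourly values into a daily mean and a daily maximum per city with a single vectorised `groupby`. The column names (`city`, `time`, `value`) and all numbers are hypothetical placeholders, not real OpenMeteo output; `value` stands in for whichever variable you choose.

```python
import pandas as pd

# Hypothetical hourly data in "long" format: one row per city per hour (made-up values)
hourly = pd.DataFrame({
    "city": ["Rome"] * 4 + ["Berlin"] * 4,
    "time": pd.to_datetime([
        "2024-07-01 12:00", "2024-07-01 13:00",
        "2024-07-02 12:00", "2024-07-02 13:00",
    ] * 2),
    "value": [24.0, 26.0, 22.0, 23.0, 18.0, 19.0, 20.0, 21.0],
})

# Derive the calendar date from each timestamp (vectorised via the .dt accessor)
hourly["date"] = hourly["time"].dt.date

# One row per city per day; both daily statistics are computed in one groupby
daily = (
    hourly.groupby(["city", "date"], as_index=False)
    .agg(daily_mean=("value", "mean"), daily_max=("value", "max"))
)
print(daily)
```

Whether the mean or the maximum is the better daily summary is exactly the kind of choice you are asked to justify.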

Define how you will compare cities

Will you count the number of hours exceeding the threshold per day or will you calculate averages? The type of analysis is up to you. Beyond the technical implementation and the quality of your code, we will assess how reasonable your choice is.
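To make the "count hours above a threshold" option concrete, here is a minimal sketch that uses only vectorised pandas operations. `THRESHOLD = 25.0` is a placeholder, not a recommended value: your threshold must come from the sources you cite. The cities and readings are made up.

```python
import pandas as pd

# Placeholder threshold: replace with the value justified by your chosen source
THRESHOLD = 25.0

# Hypothetical hourly readings for two cities over two days (made-up values)
hourly = pd.DataFrame({
    "city": ["Rome"] * 4 + ["Berlin"] * 4,
    "time": pd.to_datetime([
        "2024-07-01 12:00", "2024-07-01 13:00",
        "2024-07-02 12:00", "2024-07-02 13:00",
    ] * 2),
    "value": [24.0, 26.0, 27.0, 28.0, 18.0, 26.0, 20.0, 21.0],
})

# Boolean column: True where the hourly value exceeds the threshold (no loop needed)
hourly["above_threshold"] = hourly["value"] > THRESHOLD
hourly["date"] = hourly["time"].dt.date

# Summing booleans counts the True values: hours above threshold per city per day
hours_per_day = (
    hourly.groupby(["city", "date"], as_index=False)["above_threshold"].sum()
    .rename(columns={"above_threshold": "hours_above"})
)

# One summary row per city: total exceedance hours next to the plain average
city_summary = (
    hourly.groupby("city")
    .agg(total_hours_above=("above_threshold", "sum"), mean_value=("value", "mean"))
    .sort_values("total_hours_above", ascending=False)
)
print(city_summary)
```

The boolean mask turns "how often" into a simple sum. Taking the `mean` of `above_threshold` instead would give the share of hours above the threshold, which is another defensible way to compare cities.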

📋 Other Requirements

Here are a few other things you must adhere to in your work:

Timeframe

Although the time granularity is up to you, the period of analysis must be the last meteorological summer:

Northern Hemisphere: June 1st - August 31st, 2024
Southern Hemisphere: December 1st, 2023 - February 29th, 2024 (2024 is a leap year)

💡 Note: You must collect data for the entire period, even if you later decide to focus on specific weeks or days for your analysis.
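One way to keep the timeframe explicit in code is a small lookup of start and end dates per hemisphere, plus a quick check that the range covers whole months. This is a sketch: the dictionary and helper names are hypothetical. February 2024 has 29 days because 2024 is a leap year, so a full Southern summer ends on the 29th.

```python
import pandas as pd

# Most recent complete meteorological summer per hemisphere.
# Dates are YYYY-MM-DD strings, the format OpenMeteo's start_date/end_date expect.
SUMMER_PERIODS = {
    "Northern": ("2024-06-01", "2024-08-31"),
    "Southern": ("2023-12-01", "2024-02-29"),  # 2024 is a leap year
}


def summer_days(hemisphere):
    """Return every calendar day (start and end inclusive) in the hemisphere's summer."""
    start, end = SUMMER_PERIODS[hemisphere]
    return pd.date_range(start=start, end=end, freq="D")


# Sanity check: three whole months should give 92 (Northern) or 91 (Southern) days
for hemisphere in SUMMER_PERIODS:
    print(hemisphere, len(summer_days(hemisphere)), "days")
```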

Coding Standards

You must use the requests library to send API requests.
(Building on W02 & W03 foundations)
You must use the json library to parse and store the API responses to file.
(Extending W02 & W03 skills)
You must use relative paths to read/write data to the data/raw and data/processed folders.
(A W03 concept)
All pre-processing of the data must be done using vectorised operations with pandas (preferably) or numpy.
(W04 & W05 concepts)
All visualisations must be done exclusively with the lets-plot library.
(A W05 concept)
Any use of functions or programming concepts that did not feature on Dataquest, in lectures or in classes MUST be explicitly justified: what made this situation unique, forcing you to use something we haven’t covered in the course so far?
Code must aspire to be readable and self-explanatory. Use meaningful variable names and include comments to explain complex operations or key decisions.
(This will help future-you and others understand your code later)
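The requirements above (`requests` for the call, `json` for parsing and storage, relative paths into `data/raw`) could fit together roughly as follows. This is a sketch under assumptions: it targets OpenMeteo's historical archive endpoint, assumes the notebook runs from the `notebooks/` folder (hence `..`), and the function names are hypothetical.

```python
import json
from pathlib import Path

import requests

# Assumption: the historical (archive) endpoint of OpenMeteo's API
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Relative path from the notebooks/ folder (where NB01 lives) to data/raw
RAW_DIR = Path("..") / "data" / "raw"


def fetch_city_weather(latitude, longitude, start_date, end_date, variables):
    """Request hourly data for one location and return the parsed JSON as a dict."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # YYYY-MM-DD
        "end_date": end_date,
        "hourly": ",".join(variables),  # several variables as one comma-separated string
        "timezone": "auto",  # return timestamps in the location's local time
    }
    response = requests.get(ARCHIVE_URL, params=params, timeout=30)
    response.raise_for_status()  # stop with an exception on HTTP errors (4xx/5xx)
    return json.loads(response.text)  # parse with the json library, as the brief requires


def save_raw_json(data, city_name, raw_dir=RAW_DIR):
    """Write one city's response to <raw_dir>/<city name>.json and return the file path."""
    raw_dir.mkdir(parents=True, exist_ok=True)  # create data/raw if it does not exist yet
    file_path = raw_dir / f"{city_name.lower()}.json"
    with open(file_path, "w") as f:
        json.dump(data, f, indent=2)
    return file_path
```

Hypothetical usage for one city: `data = fetch_city_weather(51.51, -0.13, "2024-06-01", "2024-08-31", ["temperature_2m"])`, then `save_raw_json(data, "London")`. Here `temperature_2m` is only a stand-in for whichever variable you choose.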

📂 Repository Structure

Ensure your repository adheres to the following structure:

<github-repo-folder>/
|-- data/
|   |-- raw/
|   |-- processed/
|-- notebooks/
|   |-- NB01 - Data Collection.ipynb
|   |-- NB02 - Analysis.ipynb
|-- README.md

What goes in each notebook?

The suggested structure for each notebook is given below. While you have flexibility in how you organise your work, following these structures will help ensure your analysis is clear and complete.

NB01 - Data Collection.ipynb

This notebook focuses on gathering and storing the weather data. Its purpose is to document your data collection process, from API requests to file storage, ensuring your analysis is reproducible.

Section	Details	Suggested Level
Title and Overview	First Markdown cell that includes: i) Your name and LSE candidate number, ii) The notebook’s purpose, iii) A high-level summary of your approach	H1
Imports	Section where necessary Python packages are imported. All imports must be here and should not appear anywhere else in the notebook.	Not a heading
City Selection	Document and justify your choice of cities, including any relevant background information. Aim for a concise yet well-grounded explanation.	H2
Data Collection	Code and explanation for fetching data from OpenMeteo API.	H2
Data Storage	Code for saving the JSON files to the `data/raw` folder.	H2
Next Steps	Preview of your analysis approach and how this data will help answer your research question.	H2

NB02 - Analysis.ipynb

This notebook transforms your raw data into insights. It should tell a clear story about summer heat conditions in your chosen cities, supported by data and visualisations.

Section	Details	Suggested Level
Title and Overview	First Markdown cell that includes: i) Your name and LSE candidate number, ii) The notebook’s purpose, iii) A high-level summary of your approach	H1
Imports	Section where necessary Python packages are imported.	H2
Data Loading	Load the JSON files from the `data/raw` folder.	H2
Data Processing	Transform your data into a tabular format that is suitable for the subsequent analysis (exploratory data analysis and plots). Save that cleaned data to the `data/processed` folder as a CSV file.	H2
Analysis	Systematic investigation of your research question, with clear yet concise explanations of your methodology and findings.	H2
Visualisations	Carefully designed plots that support your analysis, with meaningful titles and clear interpretations.	H2
Conclusions	Synthesise your findings, acknowledge limitations, and suggest potential areas for further investigation.	H2
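For the Data Loading and Data Processing steps, a minimal sketch might read each raw JSON file, turn its `hourly` block into a DataFrame, and stack all cities into one long table saved as CSV. It assumes one JSON file per city named after the city (e.g. `london.json`), that the notebook runs from `notebooks/`, and that each response has OpenMeteo's usual `hourly` dictionary of equal-length lists. The function names and the output file name `hourly_all_cities.csv` are hypothetical.

```python
import json
from pathlib import Path

import pandas as pd

RAW_DIR = Path("..") / "data" / "raw"
PROCESSED_DIR = Path("..") / "data" / "processed"


def raw_to_frame(data, city):
    """Convert one parsed OpenMeteo response (a dict) into a tidy DataFrame."""
    # The "hourly" block maps each variable name (plus "time") to a list of values
    df = pd.DataFrame(data["hourly"])
    df["time"] = pd.to_datetime(df["time"])  # vectorised string-to-datetime conversion
    df["city"] = city
    return df


def build_processed_table(raw_dir=RAW_DIR, processed_dir=PROCESSED_DIR):
    """Stack every city's hourly data into one long table and save it as CSV."""
    frames = []
    # This loop only reads files; the data transformation itself stays vectorised
    for path in sorted(raw_dir.glob("*.json")):
        with open(path) as f:
            frames.append(raw_to_frame(json.load(f), city=path.stem))
    combined = pd.concat(frames, ignore_index=True)
    processed_dir.mkdir(parents=True, exist_ok=True)
    combined.to_csv(processed_dir / "hourly_all_cities.csv", index=False)
    return combined
```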

README

(Do this one last, after you’ve completed the notebooks.)

The point of a README is to act as a high-level overview of the project. It should be concise and to the point.

Section	Details	Heading Level
<Name of your project>	Give a name to your project and give us an overview of what it is about. (a few sentences)	H2
Methodology	Explain choices you’ve made and how you went about answering your research question.	H2
Usage	Tell us how to run the code and access the notebooks. Which packages do I need to install? How should I run the code?	H2
Results	Summarise your findings in a few sentences and include the plots here.	H2
(optional) AI Acknowledgment	A transparent statement on the use of AI tools, if applicable, including their impact on the project.	H2

✔️ Marking Guide

In line with the unwritten but widely used UK marking conventions, grades must be awarded as follows:

40-49: Basic implementation with significant room for improvement
50-59: Working implementation meeting basic requirements
60-69: Good implementation demonstrating solid understanding
70+: Excellent implementation going beyond expectations, showing creativity and depth without over-engineering

Documentation and Repository Structure (0-25 marks)

A strong submission (~17-18 marks) will demonstrate:

Professional repository organisation with the correct structure of folders and files
Comprehensive yet ‘concise enough’ README that gives readers a good overview of the project and how to run the code
Good documentation principles in the notebooks, with meaningful section headings and comments in places where code is not self-explanatory
Thoughtful commit messages that tell a coherent story of the evolution of the project

Excellence in this category (beyond the basic requirements explicitly taught in the course) comes from:

Documentation that shows deep understanding of software engineering principles
Repository structure that demonstrates forward thinking about maintainability: the project is organised so that it scales to many more cities and can easily be extended to other types of analysis

Data Collection Code (0-25 marks)

A strong submission (~17-18 marks) will demonstrate:

Well-structured notebooks where each notebook has a single, focused overall purpose
Efficient and reliable interaction with the API, with code that handles errors and checks that the data is as expected
Clear code organisation with meaningful variable and function names
Thoughtful comments that explain “why” not just “what”
Data is stored in the appropriate place, with the correct file name and extension
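"Checks that the data is as expected" could be as simple as the sketch below, which validates one parsed response before it is saved. It assumes OpenMeteo's usual response shape (an `hourly` dictionary of equal-length lists, with missing readings as JSON `null`) and that error responses carry a `reason` field; the function name is hypothetical. `expected_hours` would be the number of days times 24, e.g. 92 × 24 = 2,208 for June to August.

```python
def check_hourly_response(data, variables, expected_hours):
    """Validate one parsed OpenMeteo response; return missing-value counts per variable.

    Raises ValueError if the response is an error message or has the wrong shape.
    """
    if "hourly" not in data:
        # Assumption: error responses include a human-readable "reason" field
        raise ValueError(f"No hourly data returned: {data.get('reason', 'no reason given')}")
    hourly = data["hourly"]
    for name in ["time"] + list(variables):
        if name not in hourly:
            raise ValueError(f"Variable missing from response: {name}")
        if len(hourly[name]) != expected_hours:
            raise ValueError(f"{name} has {len(hourly[name])} values; expected {expected_hours}")
    # Missing readings arrive as JSON null, which the json library turns into None
    return {name: hourly[name].count(None) for name in variables}
```

Raising early keeps a malformed file out of `data/raw`, and the returned counts tell you whether missing values need attention later in NB02.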

Excellence in this category (beyond the basic requirements explicitly taught in the course) comes from:

Elegant solutions that anticipate and properly handle unexpected situations (like missing data or unusual API errors)
Code that is both efficient and highly readable
Creative yet practical approaches to data collection challenges without over-engineering the solution

Analysis and Visualisations (0-35 marks)

A strong submission (~25 marks) will demonstrate:

Clear methodology that defines what constitutes “extreme” conditions. Even if you use the wet-bulb temperature variable, you must explain what it represents, justify your choice of threshold, and explain what it means for the analysis
Appropriate use of pandas (preferably) or numpy operations: use vectorised operations instead of explicit for loops
Effective use of lets-plot for visualisation, with meaningful titles that convey a key takeaway of the plot rather than just describing its axes. We will penalise the use of other data viz tools harshly (unless you have a really good justification for why you couldn’t make that plot in lets-plot)
Thoughtful interpretation of results

Excellence in this category comes from:

Sophisticated analysis that shows deep understanding of the data
Visualisations that effectively communicate complex patterns
Critical examination of assumptions and limitations
Novel approaches to answering the research question

Creativity and Originality (0-15 marks)

A strong submission (~10 marks) will demonstrate:

Creative yet justified methodology choices
Innovative use of course concepts
Clear and engaging narrative
Professional presentation

Excellence in this category (beyond the basic requirements explicitly taught in the course) comes from:

Original approaches that enhance rather than complicate. Always remember: doing more is not better.
Thoughtful innovations that demonstrate deep understanding of the course material.
Engaging presentation that maintains professional standards
Creative solutions that could be applied in real-world scenarios

⚠️ Important Note: While we encourage exploration and creativity, using concepts or libraries not covered in the course without proper justification will result in mark deductions. If you need to use something we haven’t covered, you must explain why it was necessary and demonstrate your understanding of it.

💡 Note: This assignment contributes 20% to your final grade. It builds on concepts from Weeks 01-05 and prepares you for the more complex group project later in the term.

Feedback

Feedback will include:

Strengths and areas for improvement.
Suggestions for enhancing your approach in future assignments.

💡 Tip: The OpenMeteo API has rate limits. You should implement a delay between requests (e.g., using time.sleep()) to avoid being blocked.
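A minimal sketch of pausing between requests: loop over a dictionary of cities and call `time.sleep()` before every request except the first. `fetch_one` is a hypothetical callable you would supply (for example, a function wrapping your own request code), the coordinates are approximate, and the one-second pause is an assumed polite delay, not an official OpenMeteo figure.

```python
import time

# Example cities with approximate city-centre coordinates (latitude, longitude)
CITIES = {
    "London": (51.51, -0.13),
    "Paris": (48.86, 2.35),
    "Berlin": (52.52, 13.41),
    "Rome": (41.90, 12.50),
}

PAUSE_SECONDS = 1  # assumed polite delay between requests


def fetch_all_cities(cities, fetch_one, pause_seconds=PAUSE_SECONDS):
    """Call fetch_one(name, lat, lon) for each city, pausing between requests."""
    results = {}
    for i, (name, (lat, lon)) in enumerate(cities.items()):
        if i > 0:
            time.sleep(pause_seconds)  # wait before every request except the first
        results[name] = fetch_one(name, lat, lon)
    return results
```

Keeping the cities in one dictionary also means adding or swapping a city is a one-line change.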

Footnotes

¹ Visit the Moodle version of this page to get the link. The link is private and only available for formally enrolled students.
² Wet-bulb temperature is a key metric for understanding heat stress, as it accounts for both temperature and humidity. High wet-bulb values reduce the human body’s ability to cool down and can make outdoor activities unsafe. If you choose this option, you will analyse wet-bulb temperature data and benchmark it against established thresholds for heat stress.