📝 W04 Practice: Your First Full Data Science Project

Heatwave Analysis with Git Workflows

Author

Dr Jon Cardoso-Silva

Published

15 October 2025

🎯 Learning Goals

By the end of this exercise, you will: i) Separate data collection from analysis using professional notebook workflows, ii) Design algorithms using loops and conditionals to detect multi-day temperature patterns, iii) Make and document technical decisions about data structures and risk mitigation, iv) Practice complete Git workflows with meaningful commit messages, v) Reflect on your problem-solving process and document AI usage transparently

Briefing

⏳	DEADLINE	Thursday, 23 October 2025, 12:00 GMT
📂	Repository Setup	GitHub Classroom Repository (link on Moodle)
💎	Key Learning Concept	Complete data workflow: collect → store → analyse → document

💡 Your First Formal Submission:

This exercise continues directly from the 💻 W03 Lab. You’ve already set up the repository and practised the Git ceremony. Now you’ll complete a full data analysis project.

You will receive individual feedback on your submission. This prepares you for the first graded assignment (Mini-Project 1), which will be released in W04 Lecture (worth 20% of your final grade).

📚 Essential Guides:

3️⃣ Data Science Workflow - Understand where each step fits in the bigger picture
4️⃣ Git & GitHub Guide - Reference for Git commands and workflows

⚠️ Important: We can only provide feedback on submissions received before the deadline. Late submissions will not receive feedback.

💡 This is Formative Practice: You don’t need to complete all sections to receive feedback. Submit whatever you’ve accomplished by the deadline, and we’ll provide individual feedback on your work. For General Course students and everyone else, this will count as ‘done’ even if you haven’t finished everything. The goal is learning, not perfection.

The Question

“How many heatwaves were there per year in London since 1990?”

You learned the answer to this in 📝 W01 Practice but can you write code to collect real data and produce the CSV file you saw back then?

🌡️ Heatwave Definition

UK Met Office Definition:

A heatwave occurs when there are at least 3 consecutive days with maximum temperature ≥ 28°C.

This is the official definition used by the UK Met Office for heatwave classification in the UK. You will use this definition consistently throughout your analysis.

Source: UK Met Office - Heatwave Definition

📁 Expected Repository Structure

Your repository should follow this exact structure:

w04-practice-your-username/
├── data/
│   ├── london_temperatures_1990_2025.json
│   └── london_heatwaves_by_year_1990_2025.csv
├── NB01-Data-Download.ipynb
├── NB02-Data-Analysis.ipynb
├── README.md
└── .gitignore

📊 Why this structure? Professional data science projects separate data, code, and documentation. This follows best practices explained in the 3️⃣ Data Science Workflow guide.

Step-by-Step Instructions

Part 0: AI Usage and Chat Log Documentation (5 min)

🤖 AI Policy Reminder: The use of AI tools is fully authorised in this course. You are not penalised for using AI in your DS105A course work.

Why we ask for your AI chat logs:

Together with the feedback on your actual project, we might be able to provide you with some tips on how to improve your use of AI for this assignment.

General-Purpose Generative AI Tools that you can use:

The following are the most popular general-purpose generative AI tools out there and which allows you to share a link to your chat log.

ChatGPT (OpenAI)
Gemini (Google)
Claude (Anthropic)
⭐️ My recommendation: I would, of course, prefer if you used the DS105A (2025/2026) Claude Project. I have been curating information for this custom Claude bot to help you with your DS105A coursework. Your chat log will also help me check if the bot is helping you or not.

How to Document Your AI Usage:

Keep dedicated chats about this assignment.

One thing I like to do is to start a fresh new chat and type “I will use this chat for the 📝 W04 Practice assignment, as part of the LSE DS105A (2025/2026) Data for Data Science course.”
Share the link to your chat log in Section 5 of NB02

Each chatbot has a different way to allow you to share your chat with others. Once you have completed the assignment, go back to the chat log windows and find the share functionality. Copy the link that will be provided and paste it in the appropriate section of your notebook (Section 5 of NB02).

We can assist you with that. Just post a question on the #help channel on Slack.

Part 1: Setting Up Your Repository (15 min)

The instructions here will assume you will be working on Nuvolos.

🎯 ACTION POINTS:

Accept the GitHub Classroom assignment at the link provided on Moodle. You will be taken to a page where you will have to Accept the assignment. After accepting, a personalised GitHub repository will be created for you. Grab the SSH URL from there.

🚫 For security reasons, I cannot post the invitation link here on the public website. Please click here to view the uncensored version of this page on Moodle. The invitation link is available there.
Open a terminal window in VS Code.
Navigate to /files/ and clone your assigned repository:
```
git clone <your-github-classroom-repo-url>
```
Remove the < and > symbols and replace the whole placeholder with the URL provided by GitHub Classroom.

📋 NOTE: This means that you will be working on a different GitHub repository than the one you created in the 🖥️ W03 Lecture. You are still encouraged to use your my-ds105a-notes repository for your private notes, but we will only mark your formative based on what you have on this new repository.
Navigate inside the cloned repository:
```
cd <repo-folder-name>
```
Confirm you are inside the correct directory using pwd.
Run ls to check that a README.md and .gitignore files exist.

Commit your initial setup:

git add .
git commit -m "Initial repository setup"
git push

Part 2: Create NB01-Data-Download.ipynb (30-45 min)

📊 Workflow Stage: This is the Data Collection stage. See the 3️⃣ Data Science Workflow guide to understand how this fits into the complete workflow.

Purpose of NB01: This notebook is dedicated to collecting historical temperature data from the Open-Meteo API and saving it to a JSON file. You will NOT do any analysis here - that happens in NB02.

Here are the things you’ll need to do in this notebook:

Make API requests
Work with JSON data
Save files to JSON

🎯 ACTION POINTS:

Create your first notebook (NB01-Data-Download.ipynb in VS Code)

Add a professional header (Markdown cell):

# 📝 W04 Practice: London Heatwave Analysis (1990-2025)

**LSE DS105A – Data for Data Science (2025/26)**

**Author:** [Your Name]  
**LSE Candidate Number:** [Your Candidate Number]
**Date:** [Today's Date]

**Purpose of this notebook:** [...].

Create your imports cell

Add a Python cell with ALL the imports you will need for this notebook (and only those you need).

💡 Hint: Refer to the 🖥️ W02 Lecture and 🖥️ W03 Lecture for import examples.

From here on, it is up to you to decide how to organise the rest of NB01. Feel free to create a mixture of Markdown and Python cells as you wish, as long as it is logical and easy to follow.
Collect 35 years of temperature data

Use the Open-Meteo API to collect daily maximum temperatures for London from 1990-2025.

💡 Reference Materials: You will find useful material in the slides and code shared in 🖥️ W02 Lecture, 🖥️ W03 Lecture and the 💻 W03 Lab.

You need to:
- Set up the API endpoint and parameters
- Make the request
- Handle the response
- Print confirmation of data collection
Save the data to JSON

Save your collected data to data/london_temperatures_1990_2025.json.

🤔 Decision: You can save the JSON exactly as received from the API, or you can ‘clip’ it to only include the data you need for analysis. Choose your approach and be ready to explain why.

💡 Reference: See the 💻 W03 Lab for JSON file saving examples. You will also find useful material in the slides and code shared in 🖥️ W02 Lecture, 🖥️ W03 Lecture.
Add your reflection note

Add a Markdown cell at the end of your notebook with this template:
```
💭 **Personal Reflection Note:**

[Your reflection here...]
```
In it, explain any decisions you made with regards to the API request and the JSON file saving. There is no need to explain the code itself, just focus on any decisions you might have made.
Commit your progress

Throughout this document, I make suggestions for when to commit your progress but feel free to commit more often.
```
git add .
git commit -m "Complete NB01: Collect temperature data from Open-Meteo API"
git push
```

Part 3: Create NB02-Data-Analysis.ipynb (45-60 min)

📊 Workflow Stage: This is the Data Analysis stage. You’ll use loops and conditionals to process the raw data and identify patterns (heatwaves). See the 3️⃣ Data Science Workflow guide for context.

Purpose of NB02: This notebook is dedicated to analysing the temperature data you collected in NB01. You will use pure Python (no pandas) to identify heatwaves and create summary statistics.

Here are the things you’ll need to do in this notebook:

Read JSON files
Work with dates and temperatures
Create CSV files
Use loops and conditionals

⚠️ Important:

You are NOT allowed to use pandas in this assignment!

Even if you are an advanced Python user, you must work with pure Python objects (lists, dictionaries) as you learned in the 🖥️ W02 Lecture and through your DataQuest lessons (📝 W02 Practice and 📝 W03 Practice). This builds your understanding of data structures before we introduce more advanced tools.

🎯 ACTION POINTS:

Create your second notebook (NB02-Data-Analysis.ipynb in VS Code)

Add a professional header (Markdown cell):

# 📊 W04 Practice: London Heatwave Analysis (1990-2025)

**LSE DS105A – Data for Data Science (2025/26)**

**Author:** [Your Name]  
**LSE Candidate Number:** [Your Candidate Number]
**Date:** [Today's Date]

**Purpose of this notebook:** [...].

Create your imports cell

Add a Python cell with ALL the imports you will need for this notebook (and only those you need).

Section 1: Data Loading

Create a new section in your notebook. Call it “Section 1: Data Loading”.

Use a H2 heading for it, ## Section 1: Data Loading.

Inside this section, feel free to create a mixture of Markdown and Python cells as you wish, as long as it is logical and easy to follow.
Read the JSON file

Write code to load the temperature data you collected in NB01.

💡 Reference Materials: You will find useful material in the slides and code shared in 🖥️ W02 Lecture, 🖥️ W03 Lecture and the 💻 W03 Lab.

🤔 Decision Point: You can work with the data as either:
- A dictionary of lists, or
- Two separate lists
Choose your approach and be ready to explain any risks associated with your choice and how you’ll mitigate them. There is no need to explain the code itself, just focus on any decisions you might have made.

💡 Hint: Think about what could go wrong further down the line with each approach. How will you ensure dates and temperatures stay aligned?

Add a reflection note somewhere in this section to explain your decision of data structure and any risks associated with it.
```
💭 **Personal Reflection Note:**

[Your reflection here...]
```
Commit your progress:
```
git add .
git commit -m "Complete NB02 Section 1: Data Loading with reflection notes"
git push
```

Section 2: Identifying Hot Days

Create a new section in your notebook. Call it “Section 2: Identifying Hot Days”.

Just as before, use a H2 heading and feel free to organise the content inside this section in a way that makes it easy to follow.
Create a list of hot days

Write code to create a boolean list, of the same length as the temperature data, to indicate which days had maximum temperature ≥ 28°C.

💡 Reference: This connects to what you learned in your DataQuest lessons about loops and conditionals (📝 W03 Practice) and the work you have done in the 💻 W03 Lab.

Commit your progress:

Throughout this document, I make suggestions for when to commit your progress but feel free to commit more often.
```
git add .
git commit -m "Complete NB02 Section 2: Identifying Hot Days"
git push
```

Section 3: Subset of Hot Days

Create a new section in your notebook. Call it “Section 3: Subset of Hot Days”.

Just as before, use a H2 heading and feel free to organise the content inside this section in a way that makes it easy to follow.
Create a subset of hot days only

Using for loops and if-else statements only (no pandas), create a new data structure that contains ONLY the hot days from your dataset. Make sure to maintain the date and temperature data for each hot day.

🤔 Decision Point: Choose your data structure:
- Dictionary with multiple keys
- Two separate lists
- List of dictionaries
Explain your choice and why it’s appropriate for your analysis. There is no need to explain the code itself, just focus on any decisions you might have made and why you made them.
```
💭 **Personal Reflection Note:**

[Your reflection here...]
```
Commit your progress:

Throughout this document, I make suggestions for when to commit your progress but feel free to commit more often.
```
git add .
git commit -m "Complete NB02 Section 3: Subset of Hot Days with reflection notes"
git push
```

Section 4: Recreating the CSV

Create a new section in your notebook. Call it “Section 4: Recreating the CSV”.

Just as before, use a H2 heading and feel free to organise the content inside this section in a way that makes it easy to follow.
Count heatwaves per year

Write code to use your hot days collection to count heatwaves per year. Remember: a heatwave is 3+ consecutive days ≥ 28°C.

💡 Reference: You’ll need to use loops to track consecutive hot days and identify when a heatwave occurs.
Create the summary CSV

Save your results to data/london_heatwaves_by_year_1990_2025.csv with the header year,heatwave_count and one row per year from 1990-2025.

💡 Reference: See the 💻 W03 Lab for CSV writing examples. Revisit the 📝 W01 Practice to remember how this CSV should look like.

Add a reflection note somewhere in this section to explain any challenges you faced in counting heatwaves and how you solved them. There is no need to explain the code itself, just focus on any decisions you might have made and why you made them.
```
💭 **Personal Reflection Note:**

[Your reflection here...]
```
Commit your progress:
```
git add .
git commit -m "Complete NB02 Section 4: Recreating the CSV with reflection notes"
git push
```

Section 5: AI Usage Documentation

Create a new section in your notebook. Call it “Section 5: AI Usage Documentation”.

Just as before, use a H2 heading and feel free to organise the content inside this section in a way that makes it easy to follow.
Document your AI usage

This can be as simple as adding a Markdown cell with the links to your chat logs. But if you want, you can also explain your approach to using AI to help you with this assignment.
Commit your completed analysis

Throughout this document, I make suggestions for when to commit your progress but feel free to commit more often.

git add .
git commit -m "Complete NB02 Section 5: AI Usage Documentation and finalise NB02"
git push

Part 4: Update Your README (15 min)

📊 Workflow Stage: This is the Documentation stage. Good documentation makes your project reproducible and understandable to others (including your future self!). See the 3️⃣ Data Science Workflow guide for why this matters.

🎯 ACTION POINTS:

Edit the README.md file to include:
- What this project is about
- A summary of your technical decisions and any challenges you faced
- Instructions on how someone (who is not involved in the course) should do to run your code
💡 Keep it simple: A good README explains your work clearly and concisely and makes good use of markdown formatting.
Commit your README

Throughout this document, I make suggestions for when to commit your progress but feel free to commit more often.
```
git add README.md
git commit -m "Add comprehensive README documentation"
git push
```
💡 Need help with Git? See the 4️⃣ Git & GitHub Guide for detailed Git workflow reference.

✅ Submission Checklist

Before the deadline (Thursday, 23 October 2025, 12:00 GMT), confirm:

Your repository follows the required structure
Both NB01-Data-Download.ipynb and NB02-Data-Analysis.ipynb include clear documentation and working code (if we reset and ‘Run All’ your notebooks, they should run without errors)
data/ folder contains both JSON and CSV files
london_heatwaves_by_year_1990_2025.csv has the correct format (year, heatwave_count)
README.md answers the required questions clearly
Both notebooks include reflection notes explaining key decisions
NB02 Section 5 documents AI usage (or states “None used”)
No pandas usage in either notebook
All changes are committed and pushed to GitHub

🚨 Critical: Everything must be pushed to GitHub before 12:00 on Thursday. We can only provide feedback on what exists in your repository at the deadline.

📊 How You’ll Be Assessed

Understanding the Marking Standards:

I don’t enjoy this but, unfortunately, UK Higher Education institutions are asked to be strict when grading assignments to mitigate fears over grade inflation.

Higher marks are to be awarded only for exceptional work that demonstrates both technical competence and deep engagement with the learning process. What this means for us is that excellence should come from elegant solutions within the assignment constraints, not from ‘doing more’ or over-complicating the code.

The good news is that if you’ve been attentive to the teaching materials and actively engaged with the practice exercises, it should be feasible to achieve a ‘Strong performance’ level (70-79 marks). I hope you’ve been focusing on understanding the concepts rather than chasing perfect marks!

You’ll receive a “practice mark” (not graded, for feedback only) out of 100 based on the criteria listed below.

Part A: Submission Requirements (0-40 marks)

This is essentially the same checklist above. You either meet the requirement or you don’t.

Repository Structure (8 marks): Correct folder hierarchy and file naming as specified
Notebooks Present & Functional (10 marks): Both NB01-Data-Download.ipynb and NB02-Data-Analysis.ipynb exist and run without errors when “Run All” is executed
Data Files Created (10 marks): The JSON and CSV files exist in the data/ folder and have the correct format
README Documentation (7 marks): Clear explanation of project purpose, technical decisions, and instructions for running the code
Pure Python Usage (5 marks): Only used the Python features expected in this assignment (lists, dictionaries, loops, conditionals, etc.) and no advanced programming features (we do not want to see pandas, custom functions, classes, etc.)

We can deduct a few marks if some of these things are almost there but not quite.

Part B: Quality of Thinking Documentation (0-30 marks)

This is assessed holistically (it’s not a checklist). We’re looking for evidence of authentic engagement with the problem-solving process.

What a 50% solution looks like (~15 marks): Your reflections are mostly generic but show some personal engagement and/or we find valid explanations of your choices but those do not have much depth.

What a 70% solution looks like (~21 marks): Your reflection notes show genuine personal voice with specific challenges mentioned. You explain your technical decisions concisely and with clear reasoning and demonstrate awareness of trade-offs. You document actual problems you faced and how you solved them.

What a 90% solution looks like (22+ marks): Exceptionally authentic reflection with detailed problem-solving narratives. Sophisticated understanding of technical trade-offs with clear risk mitigation strategies. If you are a coding beginner, we see evidence of how you have learned from the process and if you are an advanced coding user, we see how your prior experience has informed your choices.

We’re looking for evidence that you’ve engaged authentically with the problem-solving process, not generic responses.

Part C: Technical Excellence Within Constraints (0-30 marks)

This is assessed holistically (it’s not a checklist). We’re looking for elegant technical solutions within the assignment constraints.

Typical signs of a 50% solution (~15 marks): in summary, we don’t see evidence of you putting into practice your coding lessons from 📝 W02 Practice and 📝 W03 Practice.

Your code is not efficient or, worse, it is unnecessarily overly complex.
The notebook is not well-organised and/or is not easy to follow.
Your if/else and loops do not have a good logical flow.
Your code does not follow the coding principles taught in the lectures and labs.
There are signs that you just copy-pasted code from the internet/AI in a silly way, without thinking much about it.

Typical signs of a 70% solution (~21 marks): in summary, we see evidence that you have engaged with all the practice exercises and have been paying close attention to everything we taught you in the lectures and labs.

Your code is efficient and as simple as it can be without sacrificing the quality of the solution.
The notebook is well-organised and is easy to follow.
Your if/else and loops have a good logical flow.
It is evident that you have been paying attention to the coding principles taught in the lectures and labs.

Typical signs of a TOP solution (22+ marks): you showed excellent attention to detail and have a deep understanding of the coding principles taught in the lectures and labs without deviating from the assignment constraints (no pandas, no custom functions, etc.).

I am not able to prescribe what this might look like here. The best description is that this should be a submission that makes us go “WOW, guys! Stop what you are doing and look at what this person did in their submission here! It’s so creative and cool!”

Excellence is measured against W04 competency expectations, not advanced programming standards.

Grade Boundaries:

<40%: Really poor work with major issues (typically missing many core requirements)
40-55%: Basic implementation with significant room for improvement (typically missing some core requirements)
56-69%: Good implementation demonstrating solid understanding with small caveats and minor improvements possible
70-79%: Strong performance, demonstrating course learning effectively with authentic reflection
80+%: Fantastically good work! Sophisticated solutions and deep reflection within assignment scope

🔗 Useful Resources

📊 Essential Guides

3️⃣ Data Science Workflow: Complete workflow stages and best practices
4️⃣ Git & GitHub Guide: Version control commands and workflows

💻 Course Materials

🖥️ W03 Lecture: File I/O demonstration and API patterns
📝 W03 Practice: DataQuest loops and conditionals practice
💻 W03 Lab: Git workflows and control flow practice

🆘 Getting Help

Slack: Post questions to #help channel
Office Hours: Book via StudentHub
Self-Guided Practice: Fire up a new chat with our Custom AI assistant and type “Time for a challenge!”

Check staff availability on the ✋ Contact Hours page.

🌐 External Resources

Open-Meteo API Documentation: Weather data API reference

📢 Remember: This is practice for Mini-Project 1 (worth 20%), which will be released next week. Doing this well now makes that assessment much easier!