✍️ Midterm Project (25%)

ME204 (2025) – Data Engineering for the Social World

Last updated: 17 July 2025

This is your first graded assessment, worth 25% of your final grade for ME204. It is an individual project where you will have creative freedom, but you must apply the skills and principles we have covered in Week 1.

Accept the assignment via the link provided.


📚 The Commission

You’ve been commissioned by the fictitiously famous Office of Quirky Inquiries (OQI) to answer an age-old question. A secret source tells us that the OQI is not that interested in the actual answer, but rather in the process you took to arrive at it.

“Is London really as rainy as the movies make it out to be?”

📌 DECISIONS, DECISIONS, DECISIONS

The OQI requires you to use the OpenMeteo API but leaves all the decisions on how to go about answering the question up to you. You must creatively decide on the following:

  • The time period to consider.
  • The other cities or regions to compare with London.
  • How you define and measure ‘raininess’.
  • Which variables from the OpenMeteo API to use.

📝 Project Requirements

Your submission must be a well-organised GitHub repository containing two Jupyter Notebooks and a detailed README file.

1. NB01 - Data Collection.ipynb

This notebook should focus exclusively on collecting and saving your data.

  • Collect Data: Use the OpenMeteo API to fetch the data you need for your chosen cities and time period.
  • Volume of Data: Collect daily data only; do not collect hourly data, or you will hit the API's request limits much sooner.

☣️ IMPORTANT: If a single person sends TOO MANY REQUESTS to the OpenMeteo API from Nuvolos, the API will rate-limit EVERYONE ELSE using it from Nuvolos for the rest of the day. So do not overdo it.

  • Justify Your Choices: In a Markdown cell, concisely explain why you chose your specific cities, time period, and variables. This justification is crucial. (Max 250 words)
  • Save the Data: At the end of the notebook, save your final, combined dataset as a single .csv file inside a data/ folder (e.g., data/rain_data.csv).
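The collection step above could be sketched as follows. This is a minimal illustration, not a required design: it assumes Open-Meteo's historical archive endpoint and its `precipitation_sum` daily variable, and the city coordinates, date range, and `data/rain_data.csv` filename are placeholders you would replace with your own choices.

```python
import pandas as pd
import requests

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def daily_to_frame(city, payload):
    """Turn the 'daily' block of an Open-Meteo JSON response into a tidy DataFrame."""
    daily = payload["daily"]
    return pd.DataFrame({
        "city": city,
        "date": pd.to_datetime(daily["time"]),
        "precipitation_sum": daily["precipitation_sum"],
    })


def fetch_daily_rain(city, lat, lon, start, end):
    """Request daily precipitation totals for one city (daily data only, no hourly)."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,     # e.g. "2015-01-01"
        "end_date": end,         # e.g. "2024-12-31"
        "daily": "precipitation_sum",
        "timezone": "auto",
    }
    response = requests.get(ARCHIVE_URL, params=params, timeout=30)
    response.raise_for_status()
    return daily_to_frame(city, response.json())


# Usage sketch: one request per city, then combine and save once.
# frames = [fetch_daily_rain(name, lat, lon, "2015-01-01", "2024-12-31")
#           for name, lat, lon in [("London", 51.51, -0.13), ("Madrid", 40.42, -3.70)]]
# pd.concat(frames).to_csv("data/rain_data.csv", index=False)
```

Keeping the request logic in one function makes it easy to stay well under the API limits: one call per city for the whole period, and the combined result is written to disk exactly once.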

2. NB02 - Data Analysis.ipynb

This notebook is for your analysis and visualisation.

  • Load Data: Start by loading the .csv file you created in NB01. Do not use the OpenMeteo API to load the data again.

  • Pre-process & Explore: Clean and prepare your data. You can create as many intermediary charts, tables, and plots as you need to explore the data and refine your thinking. These visualisations will not be graded directly as part of the ‘data visualisation’ criterion but will count towards your data exploration score.

☣️ IMPORTANT: You may start with loops, and you can even complete a first draft of the analysis that way. After Week 02 Day 01's lecture, however, you must rewrite your code to use vectorised operations. Your final submission must not use any loops!

  • Produce the Final Chart: The very last section of your notebook must be the code that generates your single, final visualisation. This chart should:
    • Use matplotlib or seaborn.
    • Have a clear, narrative title and subtitle that helps interpret the findings.
    • Be designed to answer the core question on its own, without requiring any extra explanatory text.
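One way to make the pre-processing fully vectorised is sketched below. The 1 mm threshold, the column names, and the toy numbers are all illustrative assumptions; the point is the pattern — a boolean comparison on a whole column, then a grouped mean, with no loops.

```python
import pandas as pd

# Hypothetical tidy dataset in the shape NB01 might produce:
# one row per city per day, daily precipitation total in mm.
df = pd.DataFrame({
    "city": ["London", "London", "Madrid", "Madrid"],
    "precipitation_sum": [3.1, 0.0, 0.0, 0.4],
})

# One possible definition of 'raininess': the share of days with at
# least 1 mm of rain. Both steps are vectorised pandas operations.
df["rainy_day"] = df["precipitation_sum"] >= 1.0
rainy_share = df.groupby("city")["rainy_day"].mean()

print(rainy_share)  # London: 0.5, Madrid: 0.0 for this toy data
```

The same two-step pattern (vectorised comparison, then `groupby` aggregation) extends to other definitions of raininess, such as total millimetres per year or the length of the longest rainy streak.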

💡 TIP: Think Comparatively: The question of whether London is “really that rainy” could be handled as a comparison between London and other cities.

Whilst you could focus solely on London’s patterns over time, the strongest analyses will compare London against other cities or regions. This doesn’t mean you need dozens of locations: even comparing London to 2-3 strategically chosen cities (perhaps a famously dry place like Madrid, a notoriously wet place like Bergen, and a “normal” European city) can create a compelling narrative.

Remember, you’re testing whether London’s reputation as a rainy city holds up.

3. README.md

This is the front page of your project. It must:

  • Briefly explain the project’s goal.
  • Summarise your methodology (your definition of ‘raininess’, cities chosen, etc.).
  • Embed the final visualisation from NB02.
  • Conclude with the line: “The result is in the plot:” right before the embedded image.
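The closing lines of the README might look like this, assuming the final chart is exported from NB02 to a hypothetical `figures/final_chart.png` (the folder and filename are your choice):

```markdown
The result is in the plot:

![Final chart: is London really that rainy?](figures/final_chart.png)
```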

🤖 Use of Generative AI

This course adopts LSE’s “Position 3: Full authorised use of generative AI.” You are allowed to use AI tools, but you must do so responsibly, as a way to support your learning, not replace it. Remember the discussions we had in class about the limitations of GenAI tools.

  • Process over Output: We grade your work based on the evidence of your learning and your thought process. Simply generating a working solution with AI that doesn’t align with the methods and principles taught in class is likely to score poorly, as we can smell uncritical use of AI when reading your code/report.

✔️ Grading Rubric

Your project is worth 25% of your final grade. We will assess it based on the following criteria.

🧐 Repository & Documentation (30 marks)

<15 (Poor / Fair)
  • README.md is missing or fails to explain the project and methodology.
  • Final plot is not embedded in the README.md.
  • Repository is disorganised (e.g., no data/ folder).
  • Notebooks lack clear structure or comments.
  • Git commit history is minimal or unhelpful.

~23 (Good!)
  • README.md clearly explains the project and methodology, and correctly embeds the final plot.
  • The repository is well-organised with a logical structure.
  • Git commit history shows regular commits with meaningful messages.
  • Notebooks are clean and well-commented with clear Markdown explanations.

25+ (🏆 WOW)
  • The README.md is exceptionally clear, professional, and provides a perfect entry point to the project.
  • The project structure could serve as a template for others.
  • Git history demonstrates a thoughtful development process with descriptive commit messages.
📥 Data Collection & Justification (35 marks)

<18 (Poor / Fair)
  • Code in NB01 fails to run or collect the necessary data.
  • The justification for data choices is missing, unclear, or exceeds the word count.
  • The final data is not saved correctly to a .csv file.
  • Data collection is inefficient or shows poor understanding of API usage.

~27 (Good!)
  • NB01 contains clean, working code to fetch data from the API.
  • The justification is concise, clear, and within the word limit, showing thoughtful consideration of choices.
  • The code is well-structured (e.g., uses functions appropriately).
  • Data is correctly saved to a .csv file in the data/ folder.
  • Shows evidence of comparative thinking in data selection.

30+ (🏆 WOW)
  • The data collection script is notably efficient, robust, or demonstrates advanced understanding.
  • The justification for data choices is particularly insightful and well-argued.
  • Code demonstrates excellent practices (error handling, clear variable names, logical flow).
📊 Data Analysis & Visualisation (35 marks)

<18 (Poor / Fair)
  • The final chart in NB02 is missing, unclear, or does not use matplotlib/seaborn.
  • The plot fails to address the project question.
  • Title and labels are missing or uninformative.
  • The data pre-processing contains significant errors or uses inefficient loops throughout.
  • Shows limited understanding of pandas operations.

~27 (Good!)
  • The notebook shows a clear data pre-processing workflow using vectorised pandas operations.
  • Code demonstrates understanding of efficient data manipulation (no unnecessary loops).
  • The final chart is well-designed, clear, and effectively communicates a finding related to the core question.
  • The plot has a strong narrative title and subtitle that help interpret the findings.
  • The choice of plot type is appropriate for the data and the question.

30+ (🏆 WOW)
  • The final visualisation is exceptionally insightful and well-executed, demonstrating strong command of matplotlib or seaborn customisation.
  • Code shows sophisticated use of vectorised operations and pandas best practices.
  • The plot tells a compelling story on its own and demonstrates creative analytical thinking.
  • Analysis goes beyond basic comparisons to reveal nuanced insights about London’s rainfall patterns.