📝 W06 Summative (20%)

2024/25 Autumn Term

Published: 22 October 2024

Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

This is your first graded summative assignment, worth 20% of your final grade in this course. You will have a lot of freedom in how you approach this assignment, but you must use the skills and the coding philosophy emphasised in the teaching materials and exercises throughout Weeks 01 to 05.

⏲️ Due Date:

⬆️ Submission:

How to get an extension

👉 Fill out the extension request form and 📧 before the deadline.

💡 On top of the usual reasons that merit an extension at LSE, I am willing to be lenient on this first submission and allow extensions for those who are finding this coursework difficult, provided that you’ve been making use of the support sessions available to you.

This means you can also send an extension request before the deadline if:

  1. you are new to programming in general OR
  2. you have already spent an unreasonable amount of time on this summative without much progress OR
  3. you joined this course late, around Week 03

When sending the form, please mention the office hours or drop-in sessions you’ve attended or mention how you’ve been posting to #help on Slack. I will check our logs and grant extensions accordingly.

📚 Preparation

Read carefully. This summative builds on the skills you’ve developed in the past weeks.

  1. Attempt the 📝 W04 Formative Exercise first. This will give you a more solid grasp of Git/GitHub.

    Even if the W04 deadline has passed, you can still show us your work and ask questions on Slack or in support sessions. You just won’t receive the structured individual feedback given to those who submitted on time.

  2. Find the GitHub assignment link:

    Click here to go to the Slack post in the #announcements channel where you will find the link to the GitHub assignment or click here to go to the Moodle version of these instructions where the link is available.

    We don’t share the link publicly. It is a private repository for enrolled students only.

  3. Accept the assignment: Click the link, sign in to GitHub, and click the button that says .

  4. Access your private repository: You’ll be redirected to a new private repository named ds105a-2024-w06-summative-<yourusername>.

    The repository will be totally empty, except perhaps for a README.md file that is constantly edited by the GitHub bot. It is up to you to create the structure of the repository.

  5. Clone your repository: Clone your repository to your computer (or Nuvolos). A folder ds105a-2024-w06-summative-<yourusername> will appear on your computer. We will refer to this folder as <github-repo-folder> for the rest of this document.

  6. (Optional) Keep an AI chat window for this assignment.

    If you like to use OpenAI’s ChatGPT or Google’s Gemini for coding assistance, it might be a good idea to create a separate chat window just for this assignment (see footnote 2). Then, when you’re done, export the chatlog and add the link to it in your repository.

Why, though?
  • We’ve been finding from the GENIAL research that AI chatbots can sometimes ‘hijack’ our learning process without us ever realising it. There is a risk that we miss out on key learning goals in an attempt to be more productive.

  • AI chatbots often suggest code solutions that work but are overly complex, odd, or against our ‘coding philosophy’. This is not necessarily an issue in itself, since there are many ways to achieve the same thing in Python, but you will likely struggle a lot in the future as complexity increases.

👉 If we suspect this happened to you, more than simply describing what you did wrong, we can point to where the AI chatbot might have led you astray so you can improve for next time.

📝 Instructions

Question

You’ve been commissioned by the fictitiously famous Office of Quirky Inquiries (OQI) to answer the age-old question (see footnote 3):

“Is London really as rainy as the movies make it out to be?”

📌 DECISIONS, DECISIONS, DECISIONS

The OQI did not make any great demands other than that you must use OpenMeteo’s free API to answer this question. Therefore, it is up to you to creatively decide on the following:

  • The time period to consider.
  • The number and selection of other cities or regions to compare with London.
  • How you want to define and measure ‘raininess.’
  • Which variables from the OpenMeteo API to use. (Feel free to use multiple variables or create your own ‘raininess’ index.)
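To make these decisions concrete, here is a minimal sketch of how a chosen location, time period, and set of variables could be turned into a request URL. It assumes Open-Meteo's historical weather endpoint and the `precipitation_sum` daily variable as I understand them from the public documentation, so do verify both there. The function name `build_archive_url`, the coordinates, and the dates are illustrative choices, not requirements of this assignment.

```python
from urllib.parse import urlencode

# Assumption: Open-Meteo's historical weather endpoint (check the official docs).
ARCHIVE_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date,
                      daily_variables=("precipitation_sum",)):
    """Return a request URL for daily historical data at one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO format, e.g. "2023-01-01"
        "end_date": end_date,
        "daily": ",".join(daily_variables),
        "timezone": "auto",  # assumption: let the API pick the local timezone
    }
    # safe="," keeps the comma-separated variable list readable in the URL
    return f"{ARCHIVE_ENDPOINT}?{urlencode(params, safe=',')}"


# Illustrative values only: approximate coordinates for London and one year.
london_url = build_archive_url(51.5074, -0.1278, "2023-01-01", "2023-12-31")
print(london_url)
```

Keeping the URL construction in one small function means that changing the time period or the variables later only requires changing the arguments, not the code.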

You can also choose to add static data sources to complement your analysis. For instance, you can include the world_cities.csv file we’ve used in the exercises of Weeks 03 and 04 of this course to make it easier to obtain the latitude and longitude of other cities.
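As a rough sketch of that idea, the snippet below looks up coordinates for a handful of cities from a CSV source. The column names `city`, `lat`, and `lng` and the inline sample rows are hypothetical stand-ins; the real world_cities.csv may use different names, and the coordinates shown are approximate, for illustration only.

```python
import csv
import io

# Hypothetical stand-in for world_cities.csv: the real file's column names
# may differ, so adjust "city", "lat" and "lng" to match what you actually have.
SAMPLE_CSV = """city,country,lat,lng
London,United Kingdom,51.5072,-0.1275
Bergen,Norway,60.3925,5.3233
Singapore,Singapore,1.3000,103.8000
"""


def load_coordinates(csv_file, wanted_cities):
    """Return {city: (latitude, longitude)} for the requested cities."""
    coordinates = {}
    for row in csv.DictReader(csv_file):
        if row["city"] in wanted_cities:
            coordinates[row["city"]] = (float(row["lat"]), float(row["lng"]))
    return coordinates


# io.StringIO lets the sample text behave like an opened file.
coords = load_coordinates(io.StringIO(SAMPLE_CSV), {"London", "Bergen"})
print(coords)
```

With the real file, you would pass an opened file object instead of the `io.StringIO` sample.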

Requirements

What we want to see in your submission:

  • An organised GitHub repository with a clear README.md file that not only briefly explains the project and how to run the code in your repository, but also prepares the reader for what to expect.

  • Organised Jupyter Notebook(s) with a good mix of markdown and code cells showing your data collection, exploratory data analysis, methodology explanation, and results.

  • Nice and neat Python code that is easy to follow, with comments where necessary.

  • At least two visualisations that help answer the question. The visualisations must be created with the lets-plot library.

  • Use of the pandas library as much as possible to prepare the data for the visualisations, although you can use your knowledge of lists and dictionaries if you find that easier (keep it tidy with functions though!).
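If you do take the lists-and-dictionaries route, the sketch below shows one way of keeping it tidy with functions. It assumes one possible definition of ‘raininess’ (the share of days with at least 1.0 mm of precipitation); the threshold, the function names, and the numbers are all made up for illustration, and you remain free to define raininess differently.

```python
def share_of_rainy_days(daily_precipitation_mm, threshold_mm=1.0):
    """Fraction of days with at least `threshold_mm` of precipitation."""
    if not daily_precipitation_mm:
        return 0.0
    rainy_days = [mm for mm in daily_precipitation_mm if mm >= threshold_mm]
    return len(rainy_days) / len(daily_precipitation_mm)


def summarise_cities(precipitation_by_city, threshold_mm=1.0):
    """Apply the same raininess measure to every city in a dictionary."""
    return {
        city: share_of_rainy_days(values, threshold_mm)
        for city, values in precipitation_by_city.items()
    }


# Made-up numbers purely to show the shape of the data, not real measurements.
example = {
    "City A": [0.0, 2.5, 1.0, 0.2],
    "City B": [0.0, 0.0, 5.1, 0.0],
}
print(summarise_cities(example))
```

Because the measure lives in a single function, changing the definition of a rainy day later means editing one place only.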

✔️ How we will grade your work

I don’t enjoy this but, unfortunately, I must be strict when grading summative assignments to mitigate fears over grade inflation. Higher marks are reserved for those who demonstrate exceptional talent or effort, but in a way that aligns with the learning objectives and coding philosophy of this course. (Simply adding more analysis or complicating the code is not sufficient!) The good news is that, if you have been attentive to the teaching materials and actively engaged with the exercises and asked clarifying questions whenever you got stuck, it should still be feasible to achieve a ‘Good!’ level (70-75 marks).

Here is a rough rubric of how we will grade your work.

🧐 Documentation (0-30 marks)
Marks awarded | Level | Description

<10 marks | Poor | If it matches one or more of the following:
- The README.md file is missing or incomplete
- The README.md file is not clear on how to run the code
- The README.md file does not prepare the reader for what to expect of this project
- The code and/or Jupyter Notebook(s) are not well-organised or were not provided
- The Jupyter Notebook(s) are not clear on what is being done
- The Jupyter Notebook(s) are not clear on the results
- The Jupyter Notebook(s) are not clear on the methodology

10-19 marks | Weak/Fair | If the README.md file is clear on how to run the code and what to expect of this project, but we see some of the following:
- There is not a lot of justification for the decisions made in the project
- The code and/or Jupyter Notebook(s) are not well-organised or were not provided
- The Jupyter Notebook(s) consist mostly of Python cells with little markdown
- The Jupyter Notebook(s) are not clear enough on the methodology
- There is not a lot of interpretation of the results in the Jupyter Notebook(s)

~23 marks | Good! | Great use of Markdown (not overused either) in the Jupyter Notebook(s) and the README.md file. The README.md file is clear on how to run the code and what to expect of this project. The Jupyter Notebook(s) are well-organised and clear on what is being done, the results, and the methodology. There is a good amount of justification for the decisions made in the project.

25+ marks | 🏆 WOW | The repository and notebooks look so professional! It looks like one of the software repos published in the Journal of Open Source Software!
📥 Data Collection Code (0-30 marks)
Marks awarded | Level | Description

<10 marks | Poor | If it matches one or more of the following:
- No data was collected
- Data is collected but it only exists in the notebook (it was never saved to any file for reuse)
- The code is overly complex for no good reason
- The data collected is unrelated to the question.
- The code is full of errors.

<15 marks | Weak | If the data collected was saved to a file(s) but we see some of the following:
- The data collected does not relate to the justification provided in the project
- The directory structure is not great. There is no clear separation of data files and code files
- The code is not split into smaller, reusable functions
- There are several errors in the code
- Some parts of the code were a bit difficult to follow or used things we haven’t covered in the course, and there was no justification for it.

<18 marks | Fair | If markers were undecided between Weak and Good, i.e., there was some good stuff, but there is a lot of room for improvement.

~23 marks | Good! | If we see all of the good stuff:
- The data collected is saved to a file or files
- The variables collected are in line with the justification provided in the project
- The directory structure is good. There is a clear separation of data files and code files
- The code is split into smaller, reusable functions
- There are few errors in the code
- The code is easy to follow and adopts the coding philosophy we’ve covered in the course.
- Your git commit history and accompanying messages clearly reveal how you changed your mind and made improvements to this code

25+ marks | 🏆 WOW | If your data collection code exceeded our expectations (in a positive way).
This includes but is not limited to:
- Moving your functions to a .py file and importing them from within the Jupyter Notebook in a truly professional way!
- Creating a Python script (instead of a data collection notebook) to collect all the data which can be run from the Terminal. The script uses the argparse library in Python to process the parameters supplied by the user. And, of course, the README.md explains how to run the script beautifully yet concisely. We couldn’t be more impressed!
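To give a flavour of what such a script might look like, here is a minimal sketch that reads parameters with argparse and saves the result to a data folder kept separate from the code. The argument names, the `data/raw` folder, and the function names are hypothetical choices, and the API call itself is deliberately left as a placeholder.

```python
import argparse
import json
from pathlib import Path


def parse_arguments(argv=None):
    """Read the collection parameters supplied on the command line."""
    parser = argparse.ArgumentParser(
        description="Collect daily weather data for one location."
    )
    parser.add_argument("--city", required=True, help="Label used in the file name")
    parser.add_argument("--latitude", type=float, required=True)
    parser.add_argument("--longitude", type=float, required=True)
    parser.add_argument("--output-dir", default="data/raw",
                        help="Folder where the JSON file is saved")
    return parser.parse_args(argv)


def save_json(records, output_dir, city):
    """Write the records to <output_dir>/<city>.json and return the path."""
    folder = Path(output_dir)
    folder.mkdir(parents=True, exist_ok=True)
    file_path = folder / f"{city.lower()}.json"
    file_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return file_path


def main(argv=None):
    """Parse the arguments, collect the data, and save it to a file."""
    args = parse_arguments(argv)
    # Placeholder: a real script would request the data from the API here.
    records = {"city": args.city, "latitude": args.latitude,
               "longitude": args.longitude, "daily": {}}
    saved_to = save_json(records, args.output_dir, args.city)
    print(f"Saved {saved_to}")
    return saved_to
```

In a real script file, you would end with the usual `if __name__ == "__main__": main()` guard, so that something like `python collect_data.py --city London --latitude 51.5074 --longitude=-0.1278` works from the Terminal (the file name `collect_data.py` is hypothetical). The same functions could also be imported into a notebook from that .py file.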
📊 Exploratory Data Analysis (0-40 marks)

Feel free to use knowledge of statistics you might have acquired in other courses, but this, per se, won’t get you a higher mark. We are looking for the application of the skills and coding philosophy we’ve covered in this course.

Marks awarded | Level | Description

<15 marks | Poor | If it matches one or more of the following:
- Effectively no exploratory data analysis was done
- The code is full of errors.
- There are no visualisations, or they were not created with the lets-plot library (were created with matplotlib, seaborn, etc.)
- The visualisations are not clear or do not even remotely help answer the question.
- What the X and Y axes represent in the visualisations is unclear.
- The interpretation of the visualisations is very misguided.
- The pre-processing of the data for the visualisations was done in an overly complex way for no good reason.
- There was no attempt to use the pandas library to prepare the data for the visualisations.

<20 marks | Weak | If the exploratory data analysis was done but we see some of the following:
- The code contains several errors that weren’t addressed at the time of submission
- The visualisations are not properly labelled, or the axes are not well-formatted
- The visualisations lack a good title
- The choice of visualisations is not the best to answer the question
- The interpretation of the visualisations is inaccurate
- The pre-processing of the data for the visualisations was done in a way that is overly complex for no good reason.
- The use of pandas was very minimal or not used efficiently.
- The visualisations were not created with the lets-plot library.
- If using functions, they did not work well with pandas or were used in a way that made the final code harder to follow.

<25 marks | Fair | If markers were undecided between Weak and Good, i.e., there was some good stuff, but there is a lot of room for improvement.

~30 marks | Good! | If we see all of the good stuff:
- The exploratory data analysis was done well
- The code is error-free
- The visualisations are properly labelled and the axes are well-formatted
- The visualisations have a good title (and subtitle)
- The choice of visualisations is the best to answer the question
- Even if we can’t provide a conclusive Yes or No answer to the question, the interpretation of the visualisations is accurate.
- The pre-processing of the data for the visualisations was done in a way that is efficient and easy to follow.
- We observe good use of the group-by -> apply -> combine strategy in pandas to prepare the data for the visualisations.
- The visualisations were created using the lets-plot library.
- If using functions, they worked well with pandas and were used in a way that made the final code easier to follow.
- Your git commit history and accompanying messages clearly reveal how your project evolved as you made improvements to the EDA code

35+ marks | 🏆 WOW | If your exploratory data analysis exceeded our expectations (in a positive way).
This includes but is not limited to:
- The visualisations are so beautiful and insightful that we can’t help but be impressed! Impressive use of colour schemes and theme customisation!
- Honestly, your visualisations are so cool they could be published by The Pudding people!
- Every time you used pandas, you used method chaining. You are a true data scientist! You will love to learn about Polars and Julia and R and SQL!
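As a hedged illustration of the group-by strategy and method chaining mentioned in the rubric, the sketch below summarises daily precipitation per city and month in a single chain. The column names, the 1.0 mm rainy-day threshold, and the numbers are made up for illustration; it also uses `.agg` with named aggregations for the "apply" step, which is only one of several ways to do this in pandas.

```python
import pandas as pd

# Made-up numbers purely to show the shape of the data, not real measurements.
daily = pd.DataFrame({
    "city": ["City A"] * 4 + ["City B"] * 4,
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-02-01", "2024-02-02"] * 2
    ),
    "precipitation_sum": [0.0, 2.5, 1.0, 0.2, 0.0, 0.0, 5.1, 0.0],
})

RAINY_DAY_THRESHOLD_MM = 1.0  # assumption: one possible definition of a rainy day

monthly = (
    daily
    # Derive the grouping key and a rainy-day flag without mutating `daily`
    .assign(
        month=lambda df: df["date"].dt.to_period("M").astype(str),
        is_rainy=lambda df: df["precipitation_sum"] >= RAINY_DAY_THRESHOLD_MM,
    )
    # Split by city and month, then summarise each group
    .groupby(["city", "month"], as_index=False)
    .agg(
        total_mm=("precipitation_sum", "sum"),
        rainy_days=("is_rainy", "sum"),
    )
)
print(monthly)
```

The resulting data frame has one row per city and month, which is a convenient shape to pass on to a plotting library.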

Footnotes

  1. Not sure what I mean by git push and GitHub repository?! Check out the 📝 W04 Formative Exercise and don’t miss the 👨🏻‍🏫 Week 04 Lecture.

  2. I don’t think you can share logs of tools like Google’s NotebookLM, but if you want, you can add a screenshot or describe the interactions somewhere if you think it will help us understand your thought process better.

  3. A secret source, who asked to remain anonymous, told us that OQI are not that interested in the actual answer, but rather the process you took to answer it.