Step 4: The final part of the process

2023/24 Autumn Term

It is sad, but all good things must come to an end.

Author

Dr. Jon Cardoso-Silva

What we want to see in the final project.

What we want to see

A web page that tells the story of your project. The text on your web page must not exceed 4,000 words.
- What made you curious about this kind of data in the first place?
- How did you gather the data?
- What is in the data? What does it look like in general?
- What did you find out about the data? (Exploratory Data Analysis)
A GitHub repository that contains your source code.
- We want to see that you used the tools and best practices taught in class throughout all eleven weeks of the course.
- We will check the repository history to ensure that all team members have contributed commits to the project (contributions do not have to be equal).
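The EDA questions above ("What is in the data? What does it look like in general?") can be made more concrete with a short pandas sketch. It shows the kind of summaries a webpage could report: number of rows and columns, data types, summary statistics, category counts and missing values. This is only an illustration. It assumes a small hypothetical dataset whose columns (`city`, `date`, `rainfall_mm`) are made up. It is not a required format.

```python
import pandas as pd

# Hypothetical toy dataset standing in for your project's data
df = pd.DataFrame({
    "city": ["London", "London", "Paris", "Paris", "Berlin"],
    "date": pd.to_datetime(
        ["2023-10-01", "2023-10-02", "2023-10-01", "2023-10-02", "2023-10-01"]
    ),
    "rainfall_mm": [3.2, 0.0, 1.5, None, 4.1],
})

# How many data points (rows) and variables (columns) are there?
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# What type is each column?
print(df.dtypes)

# Summary statistics for a numeric column (count ignores missing values)
print(df["rainfall_mm"].describe())

# How are observations distributed across categories?
print(df["city"].value_counts())

# How many values are missing in each column?
print(df.isna().sum())
```

With real data, the same handful of calls usually gives enough material for a paragraph or two of EDA on the webpage.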

How we will mark your final project

The final project is worth 25% of your final grade, but it will be marked on a 0-100 scale.

When you submit your final project, make sure your web page addresses the questions/requirements below:

Source	Criteria	Marks	Description
Webpage	Motivation	5	The webpage explains what made the group curious about this data.
Webpage	Data	5	The webpage succinctly lists the data sources and the challenges of collecting the data.
Webpage	Exploratory Data Analysis (EDA)	10	The webpage paints a vivid picture of the data, for example: the number of data points, the data types and most relevant columns, and summaries and distributions.
Webpage	Visualisation	10	The plots are polished and look good. All labels are clear and visible. All variables are clearly identified. The plots and tables paint a vivid picture of what the data looks like. The group used ggplot (R) or plotnine (Python) to generate the plots.
Webpage	Storytelling	15	The text is engaging and clear, with no fluff. The group describes the relevant technical steps without going into excessive detail. The project ends with a good conclusion.
Source code	Organisation	10	The source code is available in the group’s GitHub repository and is replicable. Files and directories are well structured.
Source code	Collaboration	5	Everyone’s contributions to the project are listed somewhere on the project’s webpage or in the README file. All members have contributed at least one commit to the group’s GitHub repository. Note: we do not expect all group members to do the same thing; each person can contribute differently. For example, one person could focus on data collection, another on the visualisations, and a third on documentation.
Source code	Data cleaning	20	We see good use of pandas (Python) or tidyverse (R) to clean the data. The data types of the variables are consistent and make sense. Missing values are identified and dealt with.
Source code	Data wrangling	20	We see evidence of good use of pandas and/or tidyverse to filter, merge, reshape and pivot the data as needed for the analysis and plots.
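The data cleaning and data wrangling criteria above can be sketched in pandas as follows. This is a minimal illustration, not a template you must follow. It assumes two hypothetical tables (`readings` and `stations`) with made-up column names. The right choices, such as whether to drop or impute missing values, depend on your data.

```python
import pandas as pd

# Hypothetical raw data: temperatures stored as text, one value missing ("n/a")
readings = pd.DataFrame({
    "station_id": ["A", "A", "B", "B", "C"],
    "month": ["2023-10", "2023-11", "2023-10", "2023-11", "2023-10"],
    "temp_c": ["12.5", "9.1", "n/a", "8.7", "11.0"],
})
stations = pd.DataFrame({
    "station_id": ["A", "B", "C"],
    "city": ["London", "Paris", "Berlin"],
})

# --- Data cleaning ---
# Make data types consistent: text to numbers ("n/a" becomes NaN), text to dates
readings["temp_c"] = pd.to_numeric(readings["temp_c"], errors="coerce")
readings["month"] = pd.to_datetime(readings["month"], format="%Y-%m")

# Identify missing values, then deal with them (here: drop those rows)
print(readings.isna().sum())
clean = readings.dropna(subset=["temp_c"])

# --- Data wrangling ---
# Filter: keep only the stations of interest
subset = clean[clean["station_id"].isin(["A", "B"])]

# Merge: attach the city name using the shared key
merged = subset.merge(stations, on="station_id", how="left")

# Reshape/pivot: long to wide, one row per city and one column per month.
# Combinations with no reading (the dropped row) show up as NaN here.
wide = merged.pivot(index="city", columns="month", values="temp_c")
print(wide)
```

Note that the pivot reintroduces a gap where a row was dropped. A short sentence on the webpage explaining how such gaps were handled is the kind of detail the cleaning criterion looks for.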