🧐 Comments on the W07 Summative and model solutions

2023/24 Autumn Term

Published: 30 Nov 2023

This page contains an analysis of your submissions for the ✏️ W07 Summative, worth 20% of your final grade.

📊 Submission statistics

| Total | Accepted Assignment | Not pushed to GitHub (empty repo) | % of enrolled students who submitted |
|-------|---------------------|-----------------------------------|--------------------------------------|
| 60    | 57                  | 0                                 | 93.4%                                |

Three students did not submit the assignment. I hope you are all OK!

πŸ“ Comments

Model solutions

Could this be you? I haven't selected model solutions yet.

πŸ… Practices worth distinction

Here, I highlight the best of what I saw in your submissions. I hope you can learn from each other!

Practice 1: Showing reasonable skepticism

Part 1 wasn't too challenging in terms of coding, but it required some investigative work to understand the dataset's variables. You might have noticed several variables containing the string bmr, which were key to answering the question. These could be discovered through the codebook or by using tidyverse code like:

... %>% 
  select(contains("bmr"))  # keep only the columns whose names contain "bmr"

Many of you reached a satisfactory solution, such as:

... %>% 
  mutate(bmr_transition_type = democracy_bmr > bmr_new)  # logical comparison of the two columns

This approach will be accepted, and you will typically score 15/20 marks if you also answered Question 4 correctly.

Some submissions exhibited exceptional reasoning by employing healthy skepticism, questioning the data's accuracy with thoughts like, "What if the data is wrong?". They verified whether bmr_new was indeed the new BMR by comparing it with a lagged version of democracy_bmr. Their submissions highlighted a few discrepancies in the dataset and included neat, well-designed visualisations that confirmed or complemented their suspicions. Those who adopted this approach (without neglecting the other questions in Part 1) are in strong contention for a total 20/20 score in this section.
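For illustration, here is a minimal sketch of that check, assuming a tidy country-year panel; df, country, and year are hypothetical names and may differ from the dataset's actual columns:

library(dplyr)

# Flag rows where bmr_new disagrees with the previous year's
# democracy_bmr within each country (a possible data inconsistency)
df %>% 
  group_by(country) %>% 
  arrange(year, .by_group = TRUE) %>% 
  mutate(democracy_bmr_lag = lag(democracy_bmr)) %>% 
  filter(!is.na(democracy_bmr_lag), bmr_new != democracy_bmr_lag) %>% 
  ungroup()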

Similarly, others questioned whether our target variable (in Parts 2 & 3) was a good indicator of female empowerment. If well-grounded in the academic literature or in analysis of the data, and well-argued, this criticism of our own definitions would also be a valid and impressive approach.

Practice 2: Proving arguments and conclusions with a good plot

Let me underline this once more: simply adding a plot to show extra effort doesn't automatically lead to a distinction. The submissions that really stood out were those whose plots effectively complemented or confirmed the author's assertions.

I intend to update this page with examples from outstanding submissions, pending approval from their authors.

It's worth noting that while we didn't explicitly ask for plots in the assignment, creating some form of results summary, like a confusion matrix or a ROC curve, is almost inevitable and expected. This aligns with how we've been guiding you in interpreting classification models.
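As a sketch of what this might look like with the yardstick package (preds, outcome, .pred_class, and .pred_yes are hypothetical names for a tibble of model predictions; your own code may differ):

library(yardstick)
library(ggplot2)

# Confusion matrix as a heatmap, from predicted classes vs the truth
preds %>% 
  conf_mat(truth = outcome, estimate = .pred_class) %>% 
  autoplot(type = "heatmap")

# ROC curve, from the predicted probability of the positive class
preds %>% 
  roc_curve(truth = outcome, .pred_yes) %>% 
  autoplot()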

If your work includes clear, well-designed plots that confirm or nicely illustrate your findings, you can expect to be rewarded for your thorough approach 🙂.

Practice 3: Doing some extra research

Some impressed us by referencing other suitable academic papers or official reports from well-known organisations. Where this was done well, it strengthened the author's arguments and conclusions and made for a very impressive submission.

📌 Common Mistakes

CM1: Ignoring the Time Series Nature of the Data

The dataset is a time series, similar to the UK House Prices dataset we used at the start of the course. Therefore, you shouldn't use future data to build a model predicting the past. This aligns with our consistent approach throughout this course, particularly in scenarios involving forecasting.
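In practice, a temporally valid split keeps all test observations strictly after the training observations. A minimal sketch, assuming a year column in a data frame df (the cutoff year is purely illustrative):

library(dplyr)

# Train only on observations that precede the test period,
# so the model never sees the future it is asked to predict
train <- df %>% filter(year <= 2000)
test  <- df %>% filter(year >  2000)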

However, we understand the challenge posed by the numerous variables available for your models, so we may be more forgiving if your data usage wasn't entirely temporally accurate. Our decision will be context-dependent: it will matter whether it's evident to markers that you worked diligently on optimising your model, interpreted it through the lens of the metric you selected, and assessed how well your model generalised. After all, these were some of the assignment's main objectives.

CM2: Formatting and Excessive Print Statements

Many submissions still contained numerous print statements. These are not only distracting but also complicate our assessment of your work. Please ensure they are removed before submission!

Additionally, some of you neglected to use the self-contained option in your notebooks, leading to reports that didn't quite look right. Since this issue was highlighted in the feedback for the previous assignment, we will be more rigorous in penalising this oversight.
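For reference, if you render with Quarto, a sketch of the relevant YAML header option (recent Quarto versions call it embed-resources; older versions and R Markdown used self-contained / self_contained):

---
format:
  html:
    embed-resources: true  # bundle images, CSS and JS into a single HTML file
---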

CM3: Superficial Analysis of Classification Models

Several submissions only offered descriptions of the results from confusion matrices or ROC curves. This is not enough to demonstrate your understanding of a model's performance; it's something we can discern directly from the plots. You should delve deeper, interpreting and contextualising the results within the problem you are trying to solve via machine learning. Reflect on what the results suggest. Is the model effective or inadequate? How do these findings relate to the issue we're addressing? What are the implications of these results?

We expect these interpretations to be brief, but they should be present. Otherwise, you risk losing marks.