Comments on the W07 Summative and model solutions
2023/24 Autumn Term
This page contains an analysis of your submissions for the W07 Summative, worth 20% of your final grade.
Submission statistics
| Total | Accepted Assignment | Not pushed to GitHub (empty repo) | % of enrolled students who submitted |
|---|---|---|---|
| 60 | 57 | 0 | 93.4% |
Three students did not submit the assignment. I hope you are all OK!
Practices worth distinction
Here, I highlight the best of what I saw in your submissions. I hope you can learn from each other!
Practice 1: Showing reasonable skepticism
Part 1 wasn't too challenging in terms of coding, but it required some investigative work to understand the dataset's variables. You might have noticed several variables containing the string `bmr`, which were key to answering the question. This could be discovered through the codebook or by using tidyverse code like:

```r
df %>%                        # df: the assignment data frame
  select(contains("bmr"))
```
Many of you reached a satisfactory solution, such as:

```r
df %>%                        # df: the assignment data frame
  mutate(bmr_transition_type = democracy_bmr > bmr_new)
```
This approach will be accepted, and you will typically score 15/20 marks if you also answered Question 4 correctly.
Some submissions exhibited exceptional reasoning by employing healthy skepticism, questioning the data's accuracy with thoughts like, "What if the data is wrong?". They verified whether `bmr_new` was indeed the new BMR by comparing it with a lagged version of `democracy_bmr`. Their submissions highlighted a few discrepancies in the dataset and included clear, neat visualisations that confirmed or complemented their suspicions. Those who adopted this approach (without neglecting the other questions in Part 1) are in strong contention for a full 20/20 in this section.
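As a sketch of that check, assuming a country–year panel (the toy data, the column names `country` and `year`, and the cut-down values below are all assumptions for illustration), the comparison against a lagged `democracy_bmr` could look like:

```r
library(dplyr)

# Toy country-year panel (values are illustrative, not the real dataset)
df <- tibble::tibble(
  country       = c("A", "A", "A", "B", "B"),
  year          = c(2000, 2001, 2002, 2000, 2001),
  democracy_bmr = c(0, 1, 1, 1, 1),
  bmr_new       = c(NA, 0, 1, NA, 0)  # expected to equal last year's democracy_bmr
)

# Flag rows where bmr_new disagrees with the lagged democracy_bmr
mismatches <- df %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(bmr_lagged = lag(democracy_bmr)) %>%
  ungroup() %>%
  filter(!is.na(bmr_new), bmr_new != bmr_lagged)

mismatches
```

In this toy panel, country B's 2001 row is flagged: `bmr_new` is 0 but `democracy_bmr` was 1 the year before. Plotting or tabulating such mismatches is exactly the kind of evidence the strongest submissions provided.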
Similarly, others have questioned whether our target variable (in Parts 2 & 3) was a good indicator of female empowerment. If well-grounded in academic literature or in the analysis of the data and well-argued, this criticism of our own definitions would also be a valid and impressive approach.
Practice 2: Proving arguments and conclusions with a good plot
Let me underline this once more. Simply adding a plot to show extra effort doesn't automatically lead to a distinction. The submissions that really stood out were those where the plots effectively complemented or confirmed the author's assertions.
I intend to update this page with examples from outstanding submissions, pending approval from their authors.
It's worth noting that while we didn't explicitly ask for plots in the assignment, creating some form of results summary, like a confusion matrix or a ROC curve, is almost inevitable and expected. This aligns with how we've been guiding you in interpreting classification models.
If your work includes clear, well-designed plots that either confirm or illustrate your findings very nicely, you can expect to be rewarded for your thorough approach.
Practice 3: Doing some extra research
Some impressed us by referencing other suitable academic papers or official reports from well-known organisations. Where these were used well, they strengthened the author's arguments and conclusions and made for a very impressive submission.
Common Mistakes
CM1: Ignoring the Time Series Nature of the Data
The dataset is a time series, similar to the UK House Prices dataset we used at the start of the course. Therefore, you shouldn't use future data to build a model predicting the past. This aligns with our consistent approach throughout this course, particularly in scenarios involving forecasting.
However, we understand the challenge posed by the numerous variables available for your models. Therefore, we might be more forgiving if your data usage wasn't entirely temporally accurate. Our decision will be context-dependent: if it is evident to markers that you worked diligently to optimise your model, interpreted it through the lens of the metric you selected, and assessed how well it generalised, we will be more lenient. After all, these were some of the assignment's main objectives.
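A minimal sketch of a temporally honest split, assuming the data has a `year` column (the toy data and the cut-off year below are assumptions for illustration, not the assignment's actual setup):

```r
library(dplyr)

# Toy time-series data (years and values are illustrative)
panel <- tibble::tibble(
  year  = 1990:2010,
  value = rnorm(21)
)

cutoff <- 2005  # hypothetical split year

# Train only on the past; evaluate on the future
train <- panel %>% filter(year <= cutoff)
test  <- panel %>% filter(year >  cutoff)
```

The point is that a random `sample()`-style split would leak future observations into the training set, which is exactly the mistake described above.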
CM2: Formatting and Excessive Print Statements
Many submissions still contained numerous print statements. These are not only distracting but also complicate our assessment of your work. Please ensure they are removed before submission!
Additionally, some of you neglected to use the `self-contained` option in your notebooks, leading to reports that didn't quite look right. Since this issue was highlighted in feedback for the previous assignment, we will be more rigorous in penalising this oversight.
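For reference, a sketch of where that option lives, depending on your toolchain (check your own notebook's header; the exact field differs between R Markdown and Quarto):

```yaml
# R Markdown (.Rmd) YAML header
output:
  html_document:
    self_contained: true

# Quarto (.qmd) YAML header: newer versions use embed-resources,
# with self-contained kept as a deprecated alias
format:
  html:
    embed-resources: true
```

With this set, images, CSS, and other assets are bundled into a single HTML file, so the report renders correctly wherever it is opened.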
CM3: Superficial Analysis of Classification Models
Several submissions only offered descriptions of the results from confusion matrices or ROC curves. This is not enough to demonstrate your understanding of a model's performance; it is something we can discern directly from the plots. You should delve deeper, interpreting and contextualising the results with the problem you are trying to solve via machine learning. Reflect on what the results suggest. Is the model effective or inadequate? How do these findings relate to the issue we're addressing? What are the implications of these results?
We expect these interpretations to be brief, but they should be present. Otherwise, you risk losing marks.
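To make the expectation concrete, here is a minimal sketch (the labels and predictions are toy values, assumed purely for illustration) of computing a confusion matrix with base R; the description ends where the code ends, and the marks come from what you say about it afterwards:

```r
# Toy true labels and model predictions (illustrative only)
truth <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
pred  <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Confusion matrix: rows = predictions, columns = truth
cm <- table(Predicted = pred, Truth = truth)
print(cm)

# Describing cm is not interpretation. Interpretation asks:
# which class does the model miss, and does that matter
# for the substantive problem you are modelling?
accuracy <- sum(diag(cm)) / sum(cm)
```

Here the model misclassifies one observation of each class; whether that is acceptable depends entirely on the problem, which is precisely the discussion we expect from you.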
Comments
Model solutions
Could this be you? I haven't selected model solutions yet.