📝 Coursework re-sits

2024/25 Summer Term

Author

Dr Ghita Berrada and Dr Stuart Bramwell

Published

21 May 2025

⏲️ Due Date: 10 June at 5pm London time

If you update your files on GitHub after this date without an authorised extension, you will receive a late submission penalty.

Did you have an extenuating circumstance and need an extension? Send an e-mail to 📧

⚖️ Assignment Weight:

This assignment is worth 60% of your final grade in this course.

Do you know your CANDIDATE NUMBER? You will need it.

“Your candidate number is a unique five digit number that ensures that your work is marked anonymously. It is different to your student number and will change every year. Candidate numbers can be accessed using LSE for You.”

Source: LSE

📝 Instructions

👉 Read these instructions carefully, as some details might change from one assignment to another.

Go to the #resits channel in our Slack workspace (i.e. DS202W’s) to find a GitHub Classroom link entitled 📝 Coursework resits. Do not share this link with anyone outside this course!
Click on the link, sign in to GitHub, and then click on the green Accept this assignment button.
You will be redirected to a new private repository created just for you. The repository will be named ds202a_w-2025-resits-coursework-yourusername, where yourusername is your GitHub username. It will contain a README.md file with a copy of these instructions.
Recall your LSE CANDIDATE NUMBER. You will need it in the next step.
Create a <CANDIDATE_NUMBER>.qmd file with your answers, replacing the text <CANDIDATE_NUMBER> with your actual LSE candidate number.

For example, if your candidate number is 12345, then your file should be named 12345.qmd.
Then, replace whatever is between the --- lines at the top of your newly created .qmd file with the following:
```
---
title: "DS202A/W - Coursework re-sits"
author: <CANDIDATE_NUMBER>
output: html
self-contained: true
---
```
Once again, replace the text <CANDIDATE_NUMBER> with your actual LSE CANDIDATE NUMBER. For example, if your candidate number is 12345, then your .qmd file should start with:
```
---
title: "DS202A/W - Coursework re-sits"
author: 12345
output: html
self-contained: true
---
```
Fill out the .qmd file with your answers. Make sure you provide a nicely formatted notebook.
- Use headers and code chunks to keep your work organised. This will make it easier for us to grade your work. Learn more about the basics of markdown formatting here.
Once done, click on the Render button at the top of the .qmd file. This will create an .html file with the same name as your .qmd file. For example, if your .qmd file is named 12345.qmd, then the .html file will be named 12345.html.
- If you added any code, ensure your .qmd code is reproducible. If we were to restart R and RStudio and run your notebook, it should run without errors, and we should get the same results as you did.
- If you choose to add code, please ensure your code confirms, reinforces, or complements your answers. Adding code just for the sake of it will not help you get a higher grade.
Push both files to your GitHub repository. You can push your changes as many times as you want before the deadline. We will only grade the last version of your assignment. Not sure how to use Git on your computer? You can always add the files via the GitHub web interface.
Read the section How to get help and collaborate with others at the end of this document.

“What do I submit?”

You will submit two files:

A Quarto markdown file with the following naming convention: <CANDIDATE_NUMBER>.qmd, where <CANDIDATE_NUMBER> is your candidate number. For example, if your candidate number is 12345, then your file should be named 12345.qmd.
A self-contained HTML file render of the Quarto markdown file.

You don’t need to click to submit anything. Your assignment will be automatically submitted when you commit AND push your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only grade the last version of your assignment. Not sure how to use Git on your computer? You can always add the files via the GitHub web interface.
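If you prefer the command line to the GitHub web interface, the commit-and-push step can be sketched roughly as follows. This is a minimal sketch, not an official course script: `submit_coursework` is a hypothetical helper name, the commit message is only an example, and it assumes you run it from inside your cloned assignment repository, where both files have already been created and Git already knows which remote branch to push to.

```shell
# Hypothetical helper, not part of the course materials.
# Usage: submit_coursework <CANDIDATE_NUMBER>
submit_coursework() {
  candidate="$1"

  # Candidate numbers are five digits.
  case "$candidate" in
    [0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "Candidate number must be exactly five digits" >&2; return 1 ;;
  esac

  # Both the Quarto source and its rendered HTML must exist.
  for f in "$candidate.qmd" "$candidate.html"; do
    if [ ! -f "$f" ]; then
      echo "Missing file: $f" >&2
      return 1
    fi
  done

  # Stage, commit, and push both files (the commit message is an example).
  git add "$candidate.qmd" "$candidate.html" &&
    git commit -m "Add coursework re-sit answers" &&
    git push
}
```

For a candidate number of 12345, you would call `submit_coursework 12345`. If neither file has changed since your last commit, `git commit` reports that there is nothing to commit and the helper stops without pushing.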

🗄️ Get the data

What data will you be using?

You will be using three distinct datasets for this assignment.

Part 1

Your dataset comes from round 8 of the European Social Survey (ESS). The ESS was created to better understand social attitudes across Europe. The dataset in question consists of respondents from the United Kingdom, who were surveyed in 2016 following the Brexit referendum, in which 52% of voters voted to leave the European Union. We have trimmed the dataset to include the following variables:

vtleave: Whether or not a participant voted to leave the European Union.
female: Whether a participant identifies as female (reference category = male).
eduuni: Whether or not a participant has a university education.
prtvukip: Whether or not a participant voted for UKIP (the UK’s leading Eurosceptic party) in the last election.
atchctr: 0-10 attachment rating to the United Kingdom.
atcherp: 0-10 attachment rating to Europe.
imbgeco: 0-10 rating of how strongly a participant believes that immigration benefits the national economy.
imueclt: 0-10 rating of how strongly a participant believes that immigration benefits the national culture.
lrscale: 0-10 left- (0) to right- (10) wing scale.
hinctnta: Household income scale (categorical).

Preparation

Download the data by clicking on the button below.

Part 2

In this part, you will be relying on Spotify data on 1990s hits. Although the dataset contains information on duration, time, and key signatures, please disregard these features and focus instead on the features from danceability up to and including valence.

Preparation

Click on the button below to download the dataset:

Part 3

In this part, you will be relying on MHMisinfo. This dataset contains the text of mental health information videos, along with an expert judgement on whether or not the video in question contains misinformation.

Preparation

Click on the button below to download the dataset:

📋 Your Tasks

What do we actually want from you?

Context

While we provide data, we will not specify the insights we seek in some questions. Instead, we will task you with proposing your approach to the data/question. This mirrors real-world scenarios in data science and academic research, where you are often given a dataset and asked to derive insights or address a problem.

💡 Remember: if you decide to write R code, please ensure your code confirms, reinforces, or complements your answers and that it aligns with the style of code we practiced throughout the course. Adding code just for the sake of it will not help you get a higher grade.

Always focus on the quality of your explanations/justification of your modeling choices over the quantity of code: code by itself is never enough!

Part 1: Gaining insights about Brexit… (35 marks)

Your dataset in this part is the Brexit dataset.

What factor contributed most to UK citizens voting for Brexit? Select and fine-tune a supervised learning algorithm (other than logistic regression). Does your model do well at predicting Brexiters / Remainers?

Part 2: Hits of the 1990s… (35 marks)

Your dataset in this part is the 1990s hits dataset.

How can we identify 1990s pop hits that (a) typify the decade or (b) stand out as unusual or distinctive based on their musical characteristics?

Part 3: Mental health misinformation… (30 marks)

Your dataset in this part is the MHMisinfo dataset.

In what ways do the themes in mental health advice videos differ when the videos contain misinformation compared to when they do not?

How to get help and collaborate with others

🙋 What if I am confused?

This is a test. Certain questions are intentionally open-ended and somewhat vague. Part of the assignment involves deciphering what we want from you.
We will assess your ability to identify which of the concepts you learned in class are relevant to a given problem at hand and how to apply them to solve it. Strive to achieve that neat balance of conciseness and completeness.
However, if you feel that a question is too ambiguous, please send an e-mail or a private Slack message to . If we deem your question valid, we will post a clarification on the public Slack channel. If you don’t get a response, assume that the question is not ambiguous and you should proceed with your best judgement.

👯 Can I get help from others?

You are allowed to discuss the assignment with others, work alongside each other, and help each other. However, you cannot share or copy code from others.
You are also allowed to use the internet, refer to course materials, and even use Generative AI tools when working on your answers. Again, aim for originality; do not let these resources dictate your answers.
Consult 🤖 Our Generative AI policy to see examples of how to report the use of Generative AI tools in your work.

🤖 Using AI help?

You can use Generative AI tools such as ChatGPT and search online for help when working on this assignment. If you use such a tool, however minimally, you are asked to report which AI tool you used and add an extra section to your notebook explaining how much you used it.

Note that while these tools can be helpful, they tend to generate responses that sound convincing but are not necessarily correct. They also tend to produce formulaic and repetitive responses, which limits your chances of getting a high mark. When it comes to coding, these tools tend to generate code that is inefficient or outdated and that does not follow the principles we teach in this course.

To see examples of how to report the use of AI tools, see 🤖 Our Generative AI policy.