Group project
2024/25 Winter Term
- You should only provide code if it adds anything to your storytelling: make sure your code confirms, reinforces, or complements your storytelling. Adding code just for the sake of it will not help you get a higher grade.
- You should prioritize methods seen in the course. But if they are not suitable for the problem at hand (i.e. the dataset you have been given or the problem you're trying to solve), you are allowed to use methods not seen in the course. If you use any method not seen in the course, justify why you're using it and explain how it works: we simply need to confirm that you know what you're doing.
- You're writing a technical report for a scientific audience. Your main goal is to convince that audience that your analysis methods are sound and that you derive solid insights from your analysis.
- Be mindful of the balance between detail (do you need everything in the main text? Do you need all the code blocks to be visible?) and explanation/storytelling. The message you're trying to convey needs to be clear to your audience.
Due Date: Tuesday, May 13th, 5pm
If you update your files on GitHub after this date without an authorised extension, you will receive a late submission penalty.
Did you have an extenuating circumstance and need an extension? Send us an e-mail.
Assignment Weight:
This assignment is worth 40% of your final grade in this course.
Instructions
Read these instructions carefully, as some details might change from one assignment to another.
Step 1 on Day 1 (April 28th): Choice of datasets/research questions and group formation
- Get acquainted with the dataset/research question pairs and rank them in order of preference by 5pm on April 28th, in a document shared on Slack's `#announcements` channel.
The research questions for the group projects are as follows:
Project | Dataset | Research Question | Notes |
---|---|---|---|
1 | Varieties of Democracy | Do democracies have different regime support coalitions than autocracies? | Potential features to look at: `v2regsupgroups_0` to `v2regsupgroups_13` |
2 | Sambanis data imputed by Muchlinski et al | How best can we predict civil war? Choose three (and only three) algorithms to compare. | |
3 | World Values Survey, Wave 7 | What factors shape attitudes towards economic redistribution? | Potential features of interest: `Q106` to `Q108` |
4 | Bank of England Inflation Attitudes Survey data + Bank of England/NMG household survey data | How does the perception of inflation influence spending behaviour/patterns? | |
5 | World Bank World Development Indicators (WDI) + World Bank Infant Mortality Data | What factors shape variation in infant mortality? | |
6 | European Social Survey, Wave 8 | Is there variation in support regarding government assistance for different social groups? | Potential features of interest: `gvslvol`, `gvslvue` and `gvcldcr` |
7 | European Social Survey, Wave 11 / The PopuList | Are economic conditions primarily responsible for the support of radical right-wing populist parties? | |
8 | World Bank World Development Indicators (WDI) | By which criteria should we classify economies? Does division based on GNI per capita levels make sense? | |
9 | MHMisinfo | Do videos promoting health information and those spreading health misinformation differ systematically from each other? | Potential feature of interest: `label` |
10 | Political Apologies Database | Why do political leaders apologise? | Potential feature of interest: `description` |
In the same document, you'll also be asked to indicate your availability for the planned mentoring slots on April 30th and May 7th. Being available for a slot means you can join either in person or online: when you book a slot, we'll let you choose between the two.
The final group composition will be announced by 6.30pm on April 28th.
Step 2: Book mentoring session slots
- Check Slack's `#announcements` channel for a document in which you'll be able to book mentoring session slots. Book the slots by April 29th at 5pm.
Step 3: Create the group project repository on GitHub Classroom
Go to our Slack workspace's `#announcements` channel to find a GitHub Classroom link entitled Group project. Do not share this link with anyone outside this course!
Click on the link, sign in to GitHub, and then click on the green `Accept this assignment` button. The first student from the team will create a new team (and give it a name), while the others will join that existing team. Coordinate between yourselves on the team name so that you all join the correct GitHub repository.
You will be redirected to a new private repository created just for your team. The repository will be named `ds202w-2024-2025-group-project-name-of-your-team`, where `name-of-your-team` is the team name you've chosen. Unlike in previous assignments, the repository will be blank, and it'll be up to you to populate it. In particular, add:
- a `README.md` file: this should document the content of your repository and give instructions on how to use it. See more details about README files here.
- a `.qmd` file as well as a rendered `HTML` file that correspond to your final group project report. These files should only contain the amount of analysis needed to answer the original research question. Don't try every single machine learning method under the sun to solve the research questions, avoid explanations that are too verbose and only provide code if it adds anything to your storytelling. The `.qmd` file should be named after your team.
- individual contribution reflection files (500 words max) (see Section 1.4 for this part)
Fill out the `.qmd` file with your analysis. Only add code chunks if required for your storytelling. Still, you should provide a nicely formatted notebook.
- Use headers (in particular section/subsection headers) and code chunks to keep your work organised. This will make it easier for us to grade your work. Learn more about the basics of markdown formatting here.
- Don't forget to reference your work properly if you use ideas that are not your own.
Once done, click on the `Render` button at the top of the `.qmd` file. This will create an `.html` file with the same name as your `.qmd` file. For example, if your `.qmd` file is named `12345.qmd`, then the `.html` file will be named `12345.html`.
If you added any code, ensure your `.qmd` code is reproducible. If we were to restart VSCode and run your notebook, it should run without errors, and we should get the same results as you did. If you choose to add code, please ensure your code confirms, reinforces, or complements your storytelling. Adding code just for the sake of it will not help you get a higher grade.
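One common way to meet the reproducibility requirement is to fix the seed of every random step. The snippet below is only a minimal sketch, assuming your code chunks are written in Python; the seed value `SEED` and the helper `train_test_split_ids` are hypothetical names chosen for illustration, not part of the course material.

```python
import random

SEED = 202  # hypothetical value; any fixed integer works


def train_test_split_ids(ids, test_share=0.2, seed=SEED):
    """Shuffle row identifiers with a seeded generator and split them."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    shuffled = list(ids)       # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


# Two independent runs with the same seed give the same split.
train_a, test_a = train_test_split_ids(range(100))
train_b, test_b = train_test_split_ids(range(100))
print(train_a == train_b and test_a == test_b)
```

If you rely on a library with its own random number generator, it will likely need its own seed as well; check that library's documentation rather than assuming the seed above covers it.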
"What do I submit?"
You will submit:
- A Quarto markdown file with the following naming convention: `<TEAM_NAME>.qmd`, where `<TEAM_NAME>` is your team name. For example, if your team name is `team_alpha`, then your file should be named `team_alpha.qmd`.
- An HTML file rendered from the Quarto markdown file. Your HTML file needs to be self-contained (this was a requirement for every previous assignment and still applies).
In addition to these two files, each team member will submit an individual contribution reflection file of 500 words maximum. This should be submitted as a Markdown file, say, `reflections/<username>.md`, where you replace `<username>` with your GitHub username (don't forget to send us your GitHub username if you haven't already done so!). In this file, you should outline:
- your technical contribution, e.g. which parts of the analysis you contributed to, which models you implemented, which code you wrote
- your role in the team collaboration, e.g. examples of how you supported your team members, coordinated, or played a role in defusing conflicts
- what you learned from this project, e.g. any skills you developed, challenges you overcame, or areas you want to work on further in the future
Provide some evidence to back up your reflection file (e.g. meeting notes, Slack discussion screenshots, links to GitHub Classroom commits or pull requests).
You don't need to click to submit anything. Your assignment will be automatically submitted when you `commit` AND `push` your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only grade the last version of your assignment. Not sure how to use Git on your computer? You can always add the files via the GitHub web interface.
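Before your final push, it can help to check the repository against the list of deliverables. The script below is a rough sketch of such a check, not an official grading tool: `check_submission` is a hypothetical helper, the `reflections/` folder follows the path suggested above, and counting words by splitting on whitespace is an assumption about how the 500-word limit is measured.

```python
from pathlib import Path


def check_submission(repo_dir, team_name, max_words=500):
    """Return a list of problems found; an empty list means the checks passed."""
    repo = Path(repo_dir)
    problems = []

    # Files named in the instructions: README, report source, rendered report.
    for name in ("README.md", f"{team_name}.qmd", f"{team_name}.html"):
        if not (repo / name).is_file():
            problems.append(f"missing file: {name}")

    # One reflection file per team member, each within the word limit.
    refl_dir = repo / "reflections"
    reflections = sorted(refl_dir.glob("*.md")) if refl_dir.is_dir() else []
    if not reflections:
        problems.append("no reflection files found in reflections/")
    for path in reflections:
        n_words = len(path.read_text(encoding="utf-8").split())
        if n_words > max_words:
            problems.append(f"{path.name}: {n_words} words (limit {max_words})")

    return problems


# Example call from the repository root, using the team name from the example above.
for problem in check_submission(".", "team_alpha"):
    print(problem)
```

The check only looks at file names and word counts; it does not verify that the HTML file is self-contained or that the notebook runs from a clean session.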
Your Task
What do we need from you?
Context
While we provide the data and a general research question, we will not be prescriptive about the choice of methods. Instead, we task you with proposing your own approach to the data.
Unlike in other assignments, the data is provided as is, so you will have to choose your own features and do some data cleaning before proceeding with your analysis.
This mirrors real-world scenarios in data science and academic research, where you are often given a dataset and asked to derive insights or address a problem.
Some things we want you to consider when tackling your research questions are as follows:
What are the characteristics specific to our dataset? What does that entail for our subsequent analyses/modeling?
Which features do we need and why? Do we need any pre-processing to make them usable?
Should we go for a supervised or unsupervised model or both?
How do we define our target variable (if you need one)? Is our definition sound? Does it have any limits?
Assessment criteria
Here is a rough rubric for how we'll grade this project.
Component | Weight | Things that influence your grade |
---|---|---|
Report organisation and logic | 15% | |
Clarity of the presentation | 15% | |
Appropriateness of the methods chosen with respect to the problem at hand | 30% | |
Quality of the interpretations | 30% | |
Team coordination and project documentation | 10% | |
A 70% or above score is considered a distinction in the typical LSE expectation. This means that you can expect to score around 70% if you provide adequate answers to all questions, in line with the learning outcomes of the course and the instructions provided.
Only if you go above and beyond what is asked of you in a meaningful way will you get a higher score. Simply adding more code or text will not get you a higher score. You need to add unique insights or analyses to get a distinction. We cannot tell you what these are, but they should be things that make us go, "wow, that's a great idea! I hadn't thought of that".
- DO NOT TRY EVERY SINGLE MODEL UNDER THE SUN to tackle the research question. State your modeling hypotheses clearly, justify your choices and only choose a couple of models to try and answer your question.
- The goal is also not to answer the question entirely but to get as close as possible to an answer.
Getting help
Mentoring sessions
There will be two mentoring sessions before the submission of your final group project report on May 13th.
A document will be circulated on Slack's `#announcements` channel for each group to book 2 mentoring session slots: one on April 30th and one on May 7th. Each team is expected to book a slot on both the 30th and the 7th (not all team members need to be present). Book the slots by April 29th at 5pm and indicate whether you'd like the session to be in person or online.
The aim of the first slot on April 30th is simply to check the feasibility of your project ideas (you won't have had time to start any real analysis by the time of the first mentoring session).
The aim of the second slot on May 7th is to check on the state of advancement of your analysis and to address any potential bottlenecks.
Asking for help on Slack
You can post general clarifying questions on Slack.
For example, you can ask:
- "Where do I find material that compares different clustering techniques?"
- "I came across the term 'loadings' when reading about PCA in the textbook, but I don't fully understand it. Does anyone have a good alternative resource about it?"
You won't be penalized for posting something on Slack that violates this principle without realizing it. Don't worry; we will delete your message and let you know.
Collaborating with others
You are allowed to discuss the assignment with other teams, work alongside each other, and help each other. However, you cannot share or copy code from others: pretty much the same rules as above.
Using AI help?
You can use Generative AI tools such as ChatGPT when doing this research, and you can search online for help. If you use such a tool, however minimally, you are asked to report which tool you used and to add an extra section to your notebook explaining how much you used it.
Note that while these tools can be helpful, they tend to generate responses that sound convincing but are not necessarily correct. Another problem is that they tend to create formulaic and repetitive responses, thus limiting your chances of getting a high mark. When it comes to coding, these tools tend to generate code that is inefficient or outdated and does not follow the principles we teach in this course.
To see examples of how to report the use of AI tools, see Our Generative AI policy.