Group project
2024/25 Autumn Term
NOTE: This time, you are not asked to write code as part of the assignment. If you choose to do so, please ensure your code confirms, reinforces, or complements your answers. Adding code just for the sake of it will not help you get a higher grade.
Due Date: Wednesday, February 12th, 5pm
If you update your files on GitHub after this date without an authorised extension, you will receive a late submission penalty.
Did you have an extenuating circumstance and need an extension? Send an e-mail to 📧
Assignment Weight:
This assignment is worth 40% of your final grade in this course.
Instructions
Read it carefully, as some details might change from one assignment to another.
Step 1 on Day 1 (January 29th): Choice of datasets/research questions and group formation
- Get acquainted with the dataset/research question pairs and rank them by order of preference by 5pm on January 29th in a document shared on Slack’s `#announcements` channel.
We have chosen three datasets to base the research questions for the group projects on:
And download the full World Values survey questionnaire description below:
And download the full Wellcome Trust Global Monitor survey questionnaire description below:
The research questions for the group projects are as follows:
Project | Dataset | Research question |
---|---|---|
1 | Wellcome Trust Global Monitor, 2020 | What factors determine the public’s trust in science (question `W6`)? |
2 | Wellcome Trust Global Monitor, 2020 | What factors would determine one’s opinion and outlook on science (questions `W11A` and `W11B`)? |
3 | Wellcome Trust Global Monitor, 2020 | Would science increase or decrease jobs (question `W10`)? |
4 | Wellcome Trust Global Monitor, 2020 | What factors make it more (or less) likely to be climatosceptic (question `W15`)? |
5 | World Values Survey, Wave 7 | What factors influence the perception that a country is democratic (question `Q251`)? |
6 | World Values Survey, Wave 7 | What factors influence feelings of security (question `Q131`)? |
7 | World Values Survey, Wave 7 | What factors drive an interest in politics (question `Q199`)? |
8 | World Values Survey, Wave 7 | What actions can be justified (questions `Q177` to `Q195`)? Are there regional differences when it comes to this? |
9 | World Values Survey, Wave 7 | What factors drive the perception of freedom of choice (question `Q48`)? |
10 | European Social Survey, Round 11 | Can you predict emotional attachment to country (`atchctr`)? What factors drive it? |
11 | European Social Survey, Round 11 | What factors make it more likely to divide parliament (more) equally between men and women (`eqparep`)? |
12 | World Values Survey, Wave 7 | Can you predict trust in media (e.g. `Q66`, `Q67`, `Q117`)? |
13 | European Social Survey, Round 11 | Can you predict trust in political institutions? |
In the same document, you’ll also be asked to indicate your availability for the planned mentoring slots on January 31st and February 5th.
The final group composition will be announced by 6.30pm on January 29th.
Step 2: Book mentoring session slots
- Check Slack’s `#announcements` channel for a document in which you’ll be able to book mentoring session slots. Book the slots by January 30th at 6pm.
Step 3: Create the group project repository on GitHub Classroom
- Go to our Slack workspace’s `#announcements` channel to find a GitHub Classroom link entitled “Group project”. Do not share this link with anyone outside this course!
- Click on the link, sign in to GitHub, and then click on the green `Accept this assignment` button. The first student from the team will create a new team (and give it a name), while the others will join that existing team. Coordinate between yourselves on the team name so that you all join the correct GitHub repository.
- You will be redirected to a new private repository created just for you. The repository will be named `ds202a-2024-group-project-name-of-your-team`, where `name-of-your-team` is the team name you’ve chosen. The repository will be private and, unlike in previous assignments, blank. It’ll be up to you to populate it. In particular, add:
  - a `README.md` file: this should document the content of your repository and give instructions on how to use it. See more details about README files here.
  - a `.qmd` file as well as a rendered `HTML` file that correspond to your final group project report. These files should only contain the amount of analysis needed to answer the original research question. Don’t try every single machine learning method under the sun to solve the research question, avoid overly verbose explanations, and only provide code if it adds something to your storytelling. The `.qmd` file should be named after your team.
  - individual contribution reflection files (500 words max) (see Section 1.4 for this part)
- Fill out the `.qmd` file with your analysis. Only add code chunks if required for your storytelling. Still, you should provide a nicely formatted notebook.
  - Use headers (in particular section/subsection headers) and code chunks to keep your work organised. This will make it easier for us to grade your work. Learn more about the basics of markdown formatting here.
  - Don’t forget to reference your work properly if you use ideas that are not your own.
- Once done, click on the `Render` button at the top of the `.qmd` file. This will create an `.html` file with the same name as your `.qmd` file. For example, if your `.qmd` file is named `12345.qmd`, then the `.html` file will be named `12345.html`.
- If you added any code, ensure your `.qmd` code is reproducible. If we were to restart R and RStudio and run your notebook, it should run without errors, and we should get the same results as you did.
- If you choose to add code, please ensure your code confirms, reinforces, or complements your storytelling. Adding code just for the sake of it will not help you get a higher grade.
“What do I submit?”
You will submit:
- A Quarto markdown file with the following naming convention: `<TEAM_NAME>.qmd`, where `<TEAM_NAME>` is your team name. For example, if your team name is `team_alpha`, then your file should be named `team_alpha.qmd`.
- An HTML file render of the Quarto markdown file.
In addition to these two files, each team member will submit an individual contribution reflection file of 500 words maximum. This should be submitted as a Markdown file, say, `reflections/<username>.md`, where you replace `<username>` with your GitHub username (don’t forget to send it to us if you haven’t already done so!). In this file, you should outline:
- your technical contribution, e.g. which parts of the analysis you contributed to, which models you implemented, which code you wrote
- your role in the team collaboration, e.g. examples of how you supported your team members, coordinated, or played a role in defusing conflicts
- what you learned from this project, e.g. any skills you developed, challenges you overcame, or areas you want to work on further in the future
Provide some evidence to back up your reflection file (e.g. meeting notes, Slack discussion screenshots, links to GitHub Classroom commits or pull requests).
You don’t need to click to submit anything. Your assignment will be automatically submitted when you `commit` AND `push` your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only grade the last version of your assignment. Not sure how to use Git on your computer? You can always add the files via the GitHub web interface.
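If it helps, the list of files above can be turned into a quick check to run before pushing. This is only a sketch, not an official course tool: the function name `check_submission` is made up for illustration, and it assumes that the reflection files live in a `reflections/` folder (the instructions only suggest that location) and that the report files sit at the root of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-push check (not an official course tool).
# Usage: check_submission <team_name> [repo_dir]
check_submission() {
  local team="$1"
  local dir="${2:-.}"
  local status=0
  local found=0
  local f words

  # README, report source, and rendered report (assumed at the repo root)
  for f in "README.md" "$team.qmd" "$team.html"; do
    if [ ! -f "$dir/$f" ]; then
      echo "MISSING: $f"
      status=1
    fi
  done

  # Reflection files: assumed location reflections/<username>.md, 500 words max
  for f in "$dir"/reflections/*.md; do
    [ -f "$f" ] || continue
    found=1
    words=$(wc -w < "$f" | tr -d '[:space:]')
    if [ "$words" -gt 500 ]; then
      echo "TOO LONG: ${f#"$dir"/} ($words words)"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "MISSING: reflections/<username>.md"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "OK"
  fi
  return "$status"
}
```

For a team called `team_alpha`, you could run `check_submission team_alpha` from the root of the repository, and then `git add`, `git commit`, and `git push` once it prints `OK`. The check only looks at file names and word counts; it says nothing about whether the report renders or whether the code is reproducible.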
Your Task
What do we need from you?
Context
While we provide the data and a general research question, we will not be prescriptive in terms of choice of methods. Instead, we will task you with proposing your own approach to the data.
Unlike in other assignments, the data is provided as is, so you will have to choose your own features and do some amount of data cleaning before proceeding with your analysis.
This mirrors real-world scenarios in data science and academic research, where you are often given a dataset and asked to derive insights or address a problem.
Some things we want you to consider when tackling your research questions are as follows:
Which features do we need and why? Do we need any pre-processing to make them usable?
Should we go for a supervised or unsupervised model or both?
How do we define our target variable (if you need one)? Is our definition sound? Does it have any limits?
Assessment criteria
Here is a rough rubric for how we’ll grade this project.
Component | Weight | Things that influence your grade |
---|---|---|
Report organisation and logic | 15% | |
Clarity of the presentation | 15% | |
Appropriateness of the methods chosen with respect to the problem at hand | 30% | |
Quality of the interpretations | 30% | |
Team coordination and project documentation | 10% | |
A 70% or above score is considered a distinction in the typical LSE expectation. This means that you can expect to score around 70% if you provide adequate answers to all questions, in line with the learning outcomes of the course and the instructions provided.
Only if you go above and beyond what is asked of you in a meaningful way will you get a higher score. Simply adding more code or text will not get you a higher score. You need to add unique insights or analyses to get a distinction. We cannot tell you what these are, but they should be things that make us go, “wow, that’s a great idea! I hadn’t thought of that”.
- DO NOT TRY EVERY SINGLE MODEL UNDER THE SUN to tackle the research question. State your modelling hypotheses clearly, justify your choices, and only choose a couple of models to try to answer your question.
- The goal is also not to answer the question entirely but to get as close as possible to an answer.
Getting help
Mentoring sessions
There will be two mentoring sessions before the submission of your final group project report on February 12th.
A document will be circulated on Slack on the `#announcements` channel for each group to book 2 mentoring session slots: one on January 31st and one on February 5th. Each team is supposed to book a slot on both the 31st and the 5th (not all team members need to be present). Book the slots by January 30th at 6pm.
The aim of the first slot on January 31st is simply to check the feasibility of your project ideas (you won’t have had time to start any real analysis by the time of the first mentoring sessions).
The aim of the second slot on February 5th is to check on the state of advancement of your analysis and to address any potential bottlenecks.
Asking for help on Slack
You can post general clarifying questions on Slack.
For example, you can ask:
- “Where do I find material that compares different clustering techniques?”
- “I came across the term ‘loadings’ when reading about PCA in the textbook, but I don’t fully understand it. Does anyone have a good alternative resource about it?”
You won’t be penalized for posting something on Slack that violates this principle without realizing it. Don’t worry; we will delete your message and let you know.
Collaborating with others
You are allowed to discuss the assignment with other teams, work alongside each other, and help each other. However, you cannot share or copy code from others; pretty much the same rules as above apply.
Using AI help?
You can use Generative AI tools such as ChatGPT when doing this research and search online for help. If you use such a tool, however minimally, you are asked to report which AI tool you used and add an extra section to your notebook to explain how much you used it.
Note that while these tools can be helpful, they tend to generate responses that sound convincing but are not necessarily correct. Another problem is that they tend to create formulaic and repetitive responses, thus limiting your chances of getting a high mark. When it comes to coding, these tools tend to generate code that is inefficient or outdated and does not follow the principles we teach in this course.
To see examples of how to report the use of AI tools, see Our Generative AI policy.