✍️ Coursework (Formative)
2024/25 Autumn Term
🎯 OBJECTIVES:
- Discover what p-hacking is and reflect on how to avoid it.
- Discover the limits of p-values and statistical tests.
- Learn about the assumptions behind linear regression and the limits of this type of modeling.
⌛ DEADLINE: 14 November 2024 at 5pm.
📚 Preparation: Quarto/Zotero Tutorial
Prepare for this formative by working through the Quarto/Zotero tutorial first.
Step 1: Install the necessary software and test your installation (see the installation guide).
Step 2: Work through the Quarto/Zotero tutorial.
Step 3: Check your work against the tutorial solutions.
You’re now ready to start your formative.
Instructions
- You can choose to write this formative as an individual or in pairs.
- This coursework is not graded, but it is good practice for your Quarto skills, and I will provide feedback on your notes if you submit them via Moodle.
- Before working through this formative, take some time to practice Quarto/Zotero with the help of the dedicated tutorial.
- Create a Quarto Markdown file and name it `LSE_DS101A_2024_25_W05_formative.qmd`.
- Because this is a formative assessment, submission is not anonymous. Therefore, please include your name(s) in the document.
- Feel free to format this document however you like. You will learn how to work better with Quarto Markdown in the coming weeks.
Submission
- After writing, render your Quarto Markdown document to HTML and submit it on Moodle. See this guide for a quick tutorial on how to preview and render your documents to HTML in VSCode.
- The preferred submission is a single HTML document (rendered from Quarto Markdown) that contains the answers to all questions in this formative, including the code-related ones. However, if you are unsure how to embed Python code in a Quarto Markdown file, you may instead save the code in a Jupyter notebook (`.ipynb` extension), put both the HTML file with your non-code answers and the notebook into a `.zip` archive, and upload that archive to Moodle.
As shown in the Quarto/Zotero tutorial, you can use the VSCode terminal to preview or render your `.qmd` document:
- To open the VSCode terminal, go to the VSCode menu and click Terminal > New Terminal. A new terminal will open.
- In the new terminal, check the contents of your current folder by typing the `ls` command. If your `.qmd` file does not appear in your current folder, check which folder you are in by typing the command `pwd`. Use the `cd` command to change folders. For example, if `pwd` shows that you are currently in `/home/users/Downloads` but your `.qmd` file is in `/home/users/Documents/DS101A`, you could type `cd /home/users/Documents/DS101A` to go to the correct directory, or alternatively you could type `cd ../Documents/DS101A` (`../` is a special path that refers to the parent of the folder you are currently in; in this example, it takes you from `/home/users/Downloads` to `/home/users/`).
- Once you are in the correct folder (you can type `ls` or `pwd` again to check), you can:
  - preview your document by typing the command `quarto preview name_of_quarto.qmd --no-browser`
  - render your document to HTML by typing the command `quarto render name_of_quarto.qmd`. If you want to produce a single HTML file (and not a folder of files), add the line `self-contained: true` to the YAML header of your Quarto document, i.e. the YAML header of your document should be similar to this:
---
title: Quarto document title
author: Your name
format:
  html:
    self-contained: true
bibliography: references.bib
---
Tasks
Task 1:
- Read the following articles: (Aschwanden 2015) and (Greenland et al. 2016).
- Now answer the following questions:
  - How are p-values misused/misinterpreted?
  - What is p-hacking? What are the consequences of p-hacking?
  - How should researchers avoid p-hacking?
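To see why p-hacking is a problem, it can help to simulate it. The sketch below is an illustration and is not taken from either article. It assumes a hypothetical researcher who compares two groups on 20 independent outcomes where there is truly no effect, uses a z-test with a known standard deviation of 1 to keep things simple, and reports the study as a "success" as soon as any outcome gives p < 0.05. The function names `z_test_p_value` and `one_hacked_study` are made up for this example.

```python
import math
import random


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known common sigma."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def one_hacked_study(rng, n_outcomes=20, n_per_group=30, alpha=0.05):
    """Test several outcomes with NO true effect; 'succeed' if any p < alpha."""
    for _ in range(n_outcomes):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(treated, control) < alpha:
            return True  # stop at the first 'significant' result and report it
    return False


rng = random.Random(42)
n_studies = 1000
share = sum(one_hacked_study(rng) for _ in range(n_studies)) / n_studies
print(f"Share of null studies reporting p < 0.05: {share:.2f}")
print(f"Theoretical value, 1 - 0.95**20: {1 - 0.95 ** 20:.2f}")
```

With 20 independent tests at α = 0.05, the probability of at least one false positive is 1 - 0.95^20 ≈ 0.64, so the simulated share should land close to that value even though no outcome has a real effect.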
Task 2:
- Now, think back to the data on countries of the world from the 🗓️ Week 05 lecture.
- Suppose you were to create a linear model to predict the dependent variable `GDP per capita`.
- Now answer the following questions:
  - What would you use as independent variables?
  - How would you handle missing data and outliers?
  - What would be your null and alternative hypotheses?
  - How would you avoid p-hacking?
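A minimal sketch of one possible workflow, under stated assumptions: a single hypothetical predictor `schooling_years`, and simulated (not real) country data in which two values are set to missing. Rows with missing values are dropped (complete-case analysis, one option among several, such as imputation), GDP per capita is log-transformed to reduce the influence of its long right tail, and the slope's t-statistic is computed for H0: slope = 0 against H1: slope ≠ 0. Writing such choices down before looking at the results is one commonly suggested safeguard against p-hacking.

```python
import math
import random


def ols_simple(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Standard error of the slope, used for a t-test of H0: b1 = 0
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, b1 / se_b1


# SIMULATED stand-in for a country table: (schooling_years, gdp_per_capita).
# These are NOT real data; None marks a missing value.
rng = random.Random(0)
rows = []
for _ in range(150):
    schooling = rng.uniform(2, 14)
    gdp = math.exp(6.5 + 0.25 * schooling + rng.gauss(0, 0.5))
    rows.append((schooling, gdp))
rows[3] = (None, rows[3][1])  # simulate a missing predictor value
rows[7] = (rows[7][0], None)  # simulate a missing outcome value

# Missing data: complete-case analysis (drop any row with a missing value)
complete = [(s, g) for s, g in rows if s is not None and g is not None]

# Skewed outcome / outliers: model log(GDP per capita) instead of the raw value
x = [s for s, _ in complete]
y = [math.log(g) for _, g in complete]

b0, b1, t_stat = ols_simple(x, y)
print(f"n = {len(complete)}, intercept = {b0:.2f}, slope = {b1:.3f}, t = {t_stat:.1f}")
```

With real data you would probably include several predictors and use a library such as statsmodels; the single-predictor closed form is used here only to keep the sketch dependency-free.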
Task 3:
- Read the following article: (Hohl 2009)
- Now answer the following questions:
  - According to the article, should linear regression be used as a matter of routine? Why or why not?
  - What does the article suggest as an alternative to linear regression? Why?
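The sketch below is a generic illustration of why the assumptions behind linear regression are worth checking before using it routinely; it does not reproduce Hohl's argument or the alternative the article proposes. It fits a straight line to simulated data whose true relationship is quadratic. R² comes out high, yet the residuals are systematically positive at both ends of the x range and negative in the middle, a pattern that signals a violated linearity assumption.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def mean(values):
    return sum(values) / len(values)


# Simulated data: the true relationship is quadratic, plus Gaussian noise
rng = random.Random(1)
x = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0
y = [xi ** 2 + rng.gauss(0, 2) for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# R^2 = 1 - SS_res / SS_tot
mean_y = mean(y)
r_squared = 1 - sum(r ** 2 for r in residuals) / sum((yi - mean_y) ** 2 for yi in y)

# Average residual in the lower, middle and upper thirds of the x range
low, mid, high = residuals[:34], residuals[34:67], residuals[67:]
print(f"R^2 = {r_squared:.2f}")
print(f"mean residual: low {mean(low):.1f}, middle {mean(mid):.1f}, high {mean(high):.1f}")
```

In practice the same check is usually done visually, by plotting residuals against fitted values.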