Week 05 - Statistical Inference II
2023/24 Autumn Term
We are continuing our exploration of statistical inference this week, and the content is getting heavier! Try not to skip lectures, or you will start missing out on essential concepts.
Lecture Slides
Either click on the slide area below or click here to view it in fullscreen. Use your keyboard to navigate the slides. You can also find a PDF version on Moodle.
Looking for lecture recordings? You can only find those on Moodle.
Full code for the lecture case study
You can find the full code for the lecture case study here.
Formative Coursework
- You can complete it individually or in pairs.
- This coursework is not graded, but it is good practice for your Quarto skills (and your modeling/coding skills). I will provide feedback on your notes if you submit them via Moodle.
- Create a Quarto Markdown file and name it `LSE_DS101A_2023_24_W05_formative.qmd`.
- Because this is a formative assessment, submission is not anonymous. Therefore, please include your name(s) in the document.
- Feel free to format this document however you like. You will learn how to work better with Quarto Markdown in the coming weeks.
Task 1:
- Read the article listed in this week's indicative readings (Aschwanden 2015).
- Now answer the following questions:
  - What is p-hacking? What are the consequences of p-hacking?
  - How should researchers avoid p-hacking?
Task 2:
- Now, think back to the small but tidy data you collected in Week 02.
- Suppose you were to create a linear model that predicts a dependent variable called `number_of_people_affected` for Wikipedia's ongoing events.
- Now answer the following questions:
  - What would you use as independent variables?
  - How would you handle missing data?
  - What would be your null and alternative hypotheses?
  - How would you avoid p-hacking?
Task 3: Coding/Modeling practice
In the Week 05 lecture, we noted that developing and developed countries have different life expectancy distributions, and that they also seem to differ in the distributions of the independent variables.
Building on this observation, can you explore the life expectancy dataset and select appropriate independent variables to build two linear regression models (one for developing countries and one for developed countries)? Write the code that would help you with this task. (Hint: have a look at the full code for the Week 05 case study.)
(Bonus task) How would you go about dealing with outliers in the data? You can illustrate your approach on either one of the models (developing or developed countries) you've built previously. (Hint: look at this link or this one for help.)
Submission
- After you are done writing, render your document to HTML and submit it on Moodle. See this guide for a quick tutorial on how to preview and render your documents to HTML in VSCode.
- The preferred submission is a single HTML document (rendered from Quarto Markdown) that contains the answers to all questions in this formative, including the code-related ones. However, if you are unsure how to embed Python code in a Quarto Markdown file, you may save the code in a Jupyter notebook (.ipynb extension), put both the HTML file with your non-code answers and the notebook into a .zip archive, and upload that archive to Moodle.
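If you go down the archive route, any zip tool will do. As one illustration, here is a small sketch using Python's standard-library `zipfile` module; the function name `bundle_submission` is hypothetical, not something provided by the course.

```python
import zipfile
from pathlib import Path


def bundle_submission(files, archive_path):
    """Write the given files into a .zip archive, storing each under its bare file name."""
    files = [Path(f) for f in files]
    # Check every input first so no half-written archive is left behind.
    missing = [str(f) for f in files if not f.is_file()]
    if missing:
        raise FileNotFoundError(f"missing file(s): {', '.join(missing)}")
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)  # drop folder paths inside the archive
    return archive_path
```

For example, `bundle_submission(["LSE_DS101A_2023_24_W05_formative.html", "LSE_DS101A_2023_24_W05_formative.ipynb"], "LSE_DS101A_2023_24_W05_formative.zip")`, where the `.html`, `.ipynb` and `.zip` names are hypothetical ones that simply follow the naming scheme given for the `.qmd` file.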
As shown in the Week 4 class, you can use the VSCode terminal to preview or render your `.qmd` document:
- To open the VSCode terminal, go to the VSCode menu and click on Terminal > New Terminal. A new terminal will open.
- In the new terminal, check the contents of the folder you are currently in by typing the `ls` command. If your `.qmd` file does not appear in your current folder, check which folder you are in by typing the command `pwd`. Use the `cd` command to change folders. For example, if `pwd` shows that you are currently in `/home/users/Downloads` but your `.qmd` file is in `/home/users/Documents/DS101A`, you could type `cd /home/users/Documents/DS101A` to go to the correct directory, or alternatively `cd ../Documents/DS101A` (`../` is a special path that takes you to the parent of the folder you are currently in; in this example, it would take you from `/home/users/Downloads` to `/home/users/`).
- Once you are in the correct folder (you can type `ls` or `pwd` again to check), you can:
  - preview your document by typing the command `quarto preview name_of_quarto.qmd --no-browser`
  - render your document to HTML by typing the command `quarto render name_of_quarto.qmd`. If you want to produce a single HTML file (and not a folder of files), add the line `self-contained: true` to the YAML header of your Quarto document, i.e. the YAML header of your document should be similar to this:
```yaml
---
title: Quarto document title
author: Your name
format:
  html:
    self-contained: true
bibliography: references.bib
jupyter: python3
engine: jupyter
---
```
- Only add the lines `jupyter: python3` and `engine: jupyter` to your YAML header if you are inserting Python code into your `.qmd` file.
- Deadline: ~~10~~ 15 November 2023.
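To double-check how a relative path such as `../Documents/DS101A` (from the `cd` example in the terminal instructions above) resolves, you can reproduce the path arithmetic in Python with the standard-library `posixpath` module. This is only an illustration: `normpath` manipulates the path string and does not check that the folders exist or follow symbolic links, and the folder names are simply those from the example.

```python
import posixpath

# Folder reported by `pwd` and the relative path typed after `cd` (from the example).
current_dir = "/home/users/Downloads"
relative_target = "../Documents/DS101A"

# ".." removes the last component ("Downloads"), i.e. it moves up to the parent folder.
parent = posixpath.normpath(posixpath.join(current_dir, ".."))
resolved = posixpath.normpath(posixpath.join(current_dir, relative_target))

print(parent)    # /home/users
print(resolved)  # /home/users/Documents/DS101A
```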
Recommended Reading
- Check the end of the slides for the list of references cited in the lecture.
- Check the Syllabus for this week's complete list of indicative and recommended readings.
Communication
- Post your reflections, questions, and links on Slack.