๐Ÿ—“๏ธ Week 05 - Statistical Inference II

2023/24 Autumn Term

We continue our exploration of statistical inference this week, and the content is getting heavier. Try not to skip lectures, or you will start missing essential concepts.

👨‍🏫 Lecture Slides

Either click on the slide area below or click here to view it in fullscreen. Use your keypad to navigate the slides. You can also find a PDF version on Moodle.

🎥 Looking for lecture recordings? You can only find those on Moodle.

๐Ÿ Full code for the lecture case study

You can find the full code for the lecture case study here.

โœ๏ธ Formative Coursework

Note
  • You can choose to write it as an individual or in pairs.
  • This coursework is not graded, but it is good practice for your Quarto skills (and your modeling/coding skills). I will provide feedback on your notes if you submit them via Moodle.
  • Create a Quarto Markdown file and name it LSE_DS101A_2023_24_W05_formative.qmd.
  • Because this is a formative assessment, submission is not anonymous. Therefore, please include your name(s) in the document.
  • Feel free to format this document however you like. You will learn how to work better with Quarto Markdown in the coming weeks.

Task 1:

  • Read the article referenced in the indicative readings of the week (Aschwanden 2015).
  • Now answer the following questions:
    1. What is p-hacking? What are the consequences of p-hacking?
    2. How should researchers avoid p-hacking?

Task 2:

  • Now, think back to the small, but tidy, data you collected in 🗓️ Week 02.
  • Suppose you were to create a linear model to predict a dependent variable called number_of_people_affected for Wikipedia’s ongoing events.
  • Now answer the following questions:
    1. What would you use as independent variables?
    2. How would you handle missing data?
    3. What would be your null and alternative hypotheses?
    4. How would you avoid p-hacking?
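As a reminder of the general shape such hypotheses usually take, here is a generic sketch for a linear model with $k$ independent variables. The symbols $x_1, \dots, x_k$, $\beta_j$ and $\varepsilon$ are placeholders introduced here, not variables from your Week 02 data, and your own hypotheses should name the specific variables you chose.

```latex
\begin{aligned}
\text{Model:}\quad & \texttt{number\_of\_people\_affected}
    = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon \\
H_0:\quad & \beta_j = 0
    \quad \text{(variable } x_j \text{ has no linear association with the outcome)} \\
H_1:\quad & \beta_j \neq 0
    \quad \text{(variable } x_j \text{ has some linear association with the outcome)}
\end{aligned}
```

Stating these before looking at the data is one safeguard against p-hacking.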

Task 3: Coding/Modeling practice

In the Week 05 lecture, we alluded to the fact that developing and developed countries had different life expectancy distributions and also seemed to have different distributions when it came to the independent variables.

  1. Relying on this statement, can you explore the life expectancy dataset to select appropriate independent variables and build two linear regression models (one for developing countries and one for developed countries)? Write the code that would help you with this task. (Hint: have a look at the full code for the case study from Week 05.)

  2. (Bonus task) How would you go about dealing with outliers in the data? You can illustrate your approach on either one of the models (developing or developed countries) you’ve built previously. (Hint: Look at this link or this one for help.)

Submission

  • After you are done writing, render your document to HTML and submit it on Moodle. See this guide for a quick tutorial on how to preview and render your documents to HTML in VSCode.
  • The preferred submission is a single HTML document (rendered from Quarto Markdown) that contains the answers to all questions for this formative, including the code-related ones. However, if you are unsure how to embed Python code into a Quarto Markdown file, you may save the code in a Jupyter notebook (.ipynb extension) instead. In that case, put both the HTML file with your non-code answers and the notebook into a single .zip archive and upload that archive to Moodle.
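If you do want to embed Python in your .qmd file, the body of an executable cell might look like the sketch below. In the .qmd file, the cell opens with three backticks followed by {python} and closes with three backticks. The lines starting with `#|` are Quarto cell options written as Python comments. The label, the variable names and the numbers are hypothetical placeholders chosen for this example.

```python
#| label: summary-stats
#| echo: true

from statistics import mean

# Hypothetical values, only here to give the cell something to compute.
life_expectancy = [71.2, 64.8, 80.1]
average = mean(life_expectancy)
print(f"Average life expectancy: {average:.1f}")
```

With `echo: true`, both the code and its printed output appear in the rendered HTML. Setting it to `false` would show only the output.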
Tip

As shown in the Week 4 class, you can use the VSCode terminal to preview or render your .qmd document:

  • To open the VSCode terminal, go to the VSCode menu and click on Terminal>New Terminal. A new terminal will open.

  • In the new terminal, check the contents of your current folder by typing the ls command. If your .qmd file does not appear, check which folder you are in by typing pwd, then use the cd command to change folders. For example, if pwd shows that you are in /home/users/Downloads but your .qmd file is in /home/users/Documents/DS101A, you could type cd /home/users/Documents/DS101A to go to the correct directory. Alternatively, you could type cd ../Documents/DS101A (../ is a special path that points to the parent of the folder you are currently in; in this example, it takes you from /home/users/Downloads to /home/users/).

  • Once you are in the correct folder (you can type ls or pwd again to check), you can:

    • preview your document by typing the command quarto preview name_of_quarto.qmd --no-browser
    • render your document to HTML by typing the command quarto render name_of_quarto.qmd. If you want to produce a single HTML file (and not a folder of files), add the line self-contained: true to the YAML header of your Quarto document, i.e., the YAML header of your document should look similar to this:
---
title: Quarto document title
author: Your name
format:
  html:
    self-contained: true
bibliography: references.bib
jupyter: python3
engine: jupyter
---
  • Only add the lines jupyter: python3 and engine: jupyter to your YAML header if you are inserting Python code into your .qmd file.
  • Deadline: 15 November 2023.
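If the relative path in the tip above (../Documents/DS101A) feels opaque, you can check how it resolves without leaving Python. The sketch below uses the example folders from the tip. The variable names are placeholders chosen here, and `posixpath` only manipulates the path strings, so the folders do not need to exist on your machine.

```python
import posixpath

current_folder = "/home/users/Downloads"   # what pwd printed in the example
relative_target = "../Documents/DS101A"    # what you would pass to cd

# Join the two paths, then collapse the ".." step into the parent folder.
resolved = posixpath.normpath(posixpath.join(current_folder, relative_target))
parent = posixpath.normpath(posixpath.join(current_folder, ".."))

print("cd ../Documents/DS101A lands in:", resolved)
print("../ alone points to:", parent)
```

Both forms of the cd command in the tip therefore end up in the same folder. The relative form is just shorter to type.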

📟 Communication

  • Post your reflections, questions, and links on Slack.

References

Aschwanden, Christie. 2015. “Science Isn’t Broken.” FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/.