๐Ÿ—“๏ธ Week 05 - Exploratory data analysis and visualisation

2024/25 Autumn Term

Note

The lecture slides have been updated to include some more explanations and clarifications.

This week, we’ll dive a bit deeper into exploratory data analysis. Class time will be used for your group presentations (the first summative for this course!).

👩🏻‍🏫 Lecture Slides

Either click on the slide area below or click here to view it in fullscreen. Use your keypad to navigate the slides. You can also find a PDF version on Moodle.

🎥 Looking for lecture recordings? You can only find those on Moodle.

📚 Quarto/Zotero tutorial (Preparation for the formative coursework)

Step 1: Install the necessary software and test your installation. See the installation guide
Step 2: Work through the Quarto/Zotero tutorial
Step 3: Check the solutions to the tutorial here

You’re now ready to continue on to your formative.

โœ๏ธ Formative Coursework

Note
  • You can choose to write it individually or in pairs.
  • This coursework is not graded, but it is good practice for your Quarto skills, and I will provide feedback on your notes if you submit them via Moodle.
  • Create a Quarto Markdown file and name it LSE_DS101A_2024_25_W05_formative.qmd.
  • Because this is a formative assessment, submission is not anonymous. Therefore, please include your name(s) in the document.
  • Try to format this document with Quarto Markdown for a bit of practice.

Task 1:

  • Read the following articles: (Aschwanden 2015) and (Greenland et al. 2016).
  • Now answer the following questions:
    1. How are p-values misused/misinterpreted?
    2. What is p-hacking? What are the consequences of p-hacking?
    3. How should researchers avoid p-hacking?

Task 2:

  • Now, think back to the countries of the world data from the 🗓️ Week 05 lecture.
  • Suppose you were to create a linear model that would predict the dependent variable called GDP per capita.
  • Now answer the following questions:
    1. What would you use as independent variables?
    2. How would you handle missing data and outliers?
    3. What would be your null and alternative hypotheses?
    4. How would you avoid p-hacking?

Task 3:

  • Read the following article: (Hohl 2009)
  • Now answer the following questions:
    1. According to the article, should linear regression be used as a matter of routine? Why or why not?
    2. What does the article suggest as an alternative to linear regression? Why?

Submission

  • After you are done writing, render your document to HTML and submit it on Moodle. See this guide for a quick tutorial on how to preview and render your documents to HTML in VSCode.
  • The preferred submission is a single HTML document (rendered from Quarto Markdown) that contains the answers to all questions for this formative (including the code-related ones). However, if you are unsure how to embed Python code into a Quarto Markdown file, you may instead save the code in a Jupyter notebook (.ipynb extension), put both the HTML file with your non-code answers and the notebook into a single archive (.zip), and upload that archive to Moodle.
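
If you go down the archive route, the bundling step could look roughly like the sketch below. It assumes a Unix-like shell with the zip command installed. The notebook and archive names are hypothetical (only the .qmd name is prescribed above), and empty stand-in files are created in a scratch folder so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
set -eu

# File names: the HTML name follows from the prescribed .qmd name;
# the notebook and archive names are hypothetical, so use your own.
html_file="LSE_DS101A_2024_25_W05_formative.html"
notebook="LSE_DS101A_2024_25_W05_formative.ipynb"
archive="LSE_DS101A_2024_25_W05_formative.zip"

# Work in a scratch folder with empty stand-in files.
# In practice, cd into the folder that holds your real files instead.
cd "$(mktemp -d)"
touch "$html_file" "$notebook"

# Bundle both files into one archive (only if zip is available).
if command -v zip >/dev/null 2>&1; then
  zip "$archive" "$html_file" "$notebook"
  ls -l "$archive"          # confirm the archive was created
else
  echo "zip not found: use your file manager's compress option instead"
fi
```

The resulting .zip file is the single file you would then upload to Moodle.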
Tip

As shown in the Quarto/Zotero tutorial, you can use the VSCode terminal to preview or render your .qmd document:

  • To open the VSCode terminal, go to the VSCode menu and click on Terminal > New Terminal. A new terminal will open.

  • In the new terminal, check the contents of your current folder by typing the ls command. If your .qmd file does not appear, check which folder you are in by typing pwd, then use the cd command to change folders. For example, if pwd shows that you are in /home/users/Downloads but your .qmd file is in /home/users/Documents/DS101A, you could type cd /home/users/Documents/DS101A to go to the correct directory. Alternatively, you could type cd ../Documents/DS101A (../ is a special path that refers to the parent of the folder you are currently in; in this example, it takes you from /home/users/Downloads to /home/users/).

  • Once you are in the correct folder (you can type ls or pwd again to check), you can:

    • preview your document by typing the command quarto preview name_of_quarto.qmd --no-browser
    • render your document to HTML by typing the command quarto render name_of_quarto.qmd. If you want to produce a single HTML file (and not a folder of files), add the line self-contained: true to the YAML header of your Quarto document, i.e. the YAML header of your document should be similar to this:
---
title: Quarto document title
author: Your name
format:
   html:
     self-contained: true
bibliography: references.bib

---
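
The terminal steps in this tip can be tried end to end in a scratch folder. The following is a minimal sketch, assuming a Unix-like shell (bash). It uses a temporary directory as a hypothetical stand-in for /home/users, leaves out the bibliography line because the sketch has no references.bib, and skips the render step if quarto is not installed.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway folder standing in for /home/users (hypothetical layout).
workdir="$(mktemp -d)"
mkdir -p "$workdir/Downloads" "$workdir/Documents/DS101A"

# Minimal Quarto document using the YAML header from the tip
# (without the bibliography line, since there is no references.bib here).
cat > "$workdir/Documents/DS101A/name_of_quarto.qmd" <<'EOF'
---
title: Quarto document title
author: Your name
format:
  html:
    self-contained: true
---

Some text.
EOF

# Start in the "wrong" folder, as in the example from the tip.
cd "$workdir/Downloads"
pwd                        # shows which folder you are in
ls                         # empty: the .qmd file is not here

# ../ goes up to the parent folder, then down into Documents/DS101A.
cd ../Documents/DS101A
ls                         # name_of_quarto.qmd is listed

# Render only if the quarto command is available on this machine.
if command -v quarto >/dev/null 2>&1; then
  quarto render name_of_quarto.qmd || echo "render failed: check your installation"
  ls                       # after a successful render, the .html file appears too
else
  echo "quarto not found: install it first (see the installation guide)"
fi
```

With your own document, only the cd and quarto render lines are needed; the rest just sets up a safe place to practise.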
  • Deadline: 14 November 2024 at 5pm.

📟 Communication

  • Post your reflections, questions, and links on Slack.

References

Aschwanden, Christie. 2015. “Science Isn’t Broken.” FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/.
Greenland, Sander, Stephen J Senn, Kenneth J Rothman, John B Carlin, Charles Poole, Steven N Goodman, and Douglas G Altman. 2016. “Statistical Tests, p Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.
Hohl, Katrin. 2009. “Beyond the Average Case: The Mean Focus Fallacy of Standard Linear Regression and the Use of Quantile Regression for the Social Sciences.” Available at SSRN 1434418. http://dx.doi.org/10.2139/ssrn.1434418.