✍️ Coursework (Formative)

2025/26 Autumn Term

Author

Dr. Ghita Berrada

Published

21 October 2025

🎯 OBJECTIVES:

By the end of this activity, you should be able to:

Identify and critique common pitfalls in interpreting and communicating statistical results.
Evaluate the strengths and weaknesses of p-values, regression models, and visualization choices.
Practise critical reflection on how research methods, assumptions, and communication shape evidence.
Suggest alternatives and improvements to common practices in data analysis.

⌛ DEADLINE: 11 November 2025 at 5pm.

📚 Preparation: Quarto/Zotero Tutorial

Prepare for this formative by working through the Quarto/Zotero tutorial first.

Step 1: Install the necessary software and test your installation. See the installation guide.
Step 2: Work through the Quarto/Zotero tutorial.
Step 3: Check the solutions to the tutorial here.

You’re now ready to continue on to your formative.

Instructions

Note

You can choose to write this formative as an individual or in pairs.
This coursework is not graded, but it is good practice for your Quarto skills, and I will provide feedback on your notes if you submit them via Moodle.
Before working through this formative, take some time to practise Quarto/Zotero with the help of the dedicated tutorial.

Create a Quarto Markdown file and name it LSE_DS101A_2025_26_W05_formative.qmd.
Because this is a formative assessment, submission is not anonymous. Therefore, please include your name(s) in the document.
Feel free to format this document however you like. You will learn how to work better with Quarto Markdown in the coming weeks.

Submission

After writing, render your Quarto Markdown document to HTML and submit it on Moodle. See this guide for a quick tutorial on how to preview and render your documents to HTML in VSCode.
The preferred submission is a single HTML document (rendered from Quarto Markdown) that contains the answers to all questions for this formative.

Tip

As shown in the Quarto/Zotero tutorial, you can use the VSCode terminal to preview or render your .qmd document:

To open the VSCode terminal, go to the VSCode menu and click Terminal > New Terminal. A new terminal will open.
In the new terminal, list the contents of your current folder by typing the ls command. If your .qmd file does not appear, check which folder you are in by typing pwd. Use the cd command to change folders. For example, if pwd shows that you are in /home/users/Downloads but your .qmd file is in /home/users/Documents/DS101A, you can type cd /home/users/Documents/DS101A to go to the correct folder, or alternatively cd ../Documents/DS101A. Here ../ is a special path that refers to the parent of your current folder: in this example, it takes you from /home/users/Downloads to /home/users/.
Once you are in the correct folder (you can type ls or pwd again to check), you can:
- preview your document by typing the command quarto preview name_of_quarto.qmd --no-browser
- render your document to HTML by typing the command quarto render name_of_quarto.qmd. If you want to produce a single HTML file (rather than an HTML file accompanied by a folder of supporting files) with all the figures correctly embedded, add the option self-contained: true under html: in the YAML header of your Quarto document (recent Quarto versions also offer the newer option embed-resources: true for the same purpose). The YAML header of your document should then look similar to this:

---
title: Quarto document title
author: Your name
format:
   html:
     self-contained: true
bibliography: references.bib
---
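The ../ shortcut described in the tip is resolved lexically as long as no symbolic links are involved. Purely as an illustration (in the terminal you would simply type cd), the short Python sketch below uses the standard-library posixpath module to resolve the tip's example paths. The folder names are the hypothetical ones from the tip, and resolve() is a helper made up for this example.

```python
import posixpath

# Hypothetical folders taken from the terminal tip.
cwd = "/home/users/Downloads"
target = "/home/users/Documents/DS101A"

def resolve(current: str, path: str) -> str:
    """Mimic how 'cd path' resolves from 'current' (ignores symbolic links)."""
    # join() returns an absolute 'path' unchanged; normpath() collapses '.' and '..'.
    return posixpath.normpath(posixpath.join(current, path))

print(resolve(cwd, ".."))                   # /home/users
print(resolve(cwd, "../Documents/DS101A"))  # /home/users/Documents/DS101A
print(resolve(cwd, target))                 # /home/users/Documents/DS101A
```

The relative and absolute routes land in the same folder; which one you type is a matter of convenience.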

Tasks

✨ Task 1: When numbers mislead

Readings:

Regina Nuzzo (2014). “Scientific method: statistical errors”. Nature, 506(7487), 150-152. – (Nuzzo 2014)
Regina Nuzzo (2015). “How scientists fool themselves–and how they can stop”. Nature, 526(7572), 182-185. – (Nuzzo 2015)
John Bohannon (2015). “I fooled millions into thinking chocolate helps weight loss. Here’s how”. Gizmodo. – (Bohannon 2015)
Sander Greenland et al. (2016). “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations”. European Journal of Epidemiology, 31(4), 337–350. – (Greenland et al. 2016)
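As background for the readings, a quick Bayes' rule calculation shows why "p < 0.05" is not the same as "95% likely to be a real effect": the share of significant results that are false alarms depends on how many tested hypotheses were true nulls to begin with and on the power of the tests. The prior share of true nulls, the threshold and the power values below are illustrative assumptions, not figures taken from the readings.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Share of 'significant' results that are false alarms, via Bayes' rule.

    prior_null: assumed share of tested hypotheses with truly no effect
    alpha:      significance threshold (false-positive rate when the null is true)
    power:      assumed probability of p < alpha when a real effect exists
    """
    false_pos = alpha * prior_null       # no effect, but p < alpha anyway
    true_pos = power * (1 - prior_null)  # real effect, correctly detected
    return false_pos / (false_pos + true_pos)

# Illustrative scenarios (assumed numbers, not estimates from any study).
for prior_null, power in [(0.5, 0.8), (0.9, 0.8), (0.9, 0.3)]:
    share = prob_null_given_significant(prior_null, alpha=0.05, power=power)
    print(f"prior_null={prior_null}, power={power}: "
          f"{share:.0%} of significant results are false alarms")
```

With 90% of tested hypotheses truly null and 80% power, about 36% of significant results are false alarms; with only 30% power, the figure rises to 60%.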

Questions:

According to Greenland et al. (2016), what are two common misinterpretations of p-values? Why do these misunderstandings persist in practice?
Nuzzo (2014) argues that researchers often treat p-values as a “gold standard.” How does this mindset connect to the problems Greenland et al. (2016) describe?
Nuzzo (2015) identifies cognitive biases that can mislead researchers. Which of these biases might encourage the misuse of p-values?
Bohannon’s chocolate hoax (Bohannon 2015) highlights failures in study design and media reporting.
- Identify two major red flags in the study (beyond just “a bad p-value”).
- For each red flag, explain:
  - Why it is a problem for scientific validity.
  - What consequences it had for how the findings were interpreted by the public and media.
  - How the issue could have been avoided or corrected through better study design, analysis, or reporting.
Imagine you are a journal editor: what one policy change would you implement to reduce both statistical misuse (Greenland et al. 2016) and self-deception and hype (Nuzzo 2015; Bohannon 2015) in published research?
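One mechanism behind results like Bohannon's (2015) is measuring many outcomes and reporting whichever one crosses p < 0.05. The sketch below assumes 18 outcomes (an illustrative number) with no true effect and treats them as independent, which is a simplification: real outcomes such as weight and cholesterol are correlated, which changes the exact figure. Because a well-behaved p-value is uniformly distributed under a true null hypothesis, the simulation draws p-values directly instead of running real tests.

```python
import random

def chance_of_any_significant(n_outcomes: int, alpha: float = 0.05) -> float:
    """Exact probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_outcomes

def simulate_any_significant(n_outcomes: int, alpha: float = 0.05,
                             n_studies: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: share of simulated null 'studies' with at least one p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null hypothesis, each p-value is Uniform(0, 1).
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_studies

n = 18  # illustrative number of outcomes measured
print(f"exact:     {chance_of_any_significant(n):.3f}")
print(f"simulated: {simulate_any_significant(n):.3f}")
```

With 18 independent outcomes the exact probability is about 0.60, so at least one spurious "significant" finding is more likely than not.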

🌍 Task 2: Modelling a complex world

Now think back to the data on the world's countries from the 🗓️ Week 02 class. You want to explain differences in GDP per capita across countries.

Questions:

Which independent variables might you include in your regression, and what’s your rationale for including/excluding them?
Regression rests on several assumptions (linearity, independence, etc.). Which assumption is most likely to be violated in cross-country data, and why?
Suppose your model finds a statistically significant relationship between education and GDP per capita. Policymakers claim this “proves” education causes economic growth. How would you challenge or qualify that claim?
Instead of only regression, how could exploratory visualizations (scatterplots, grouping, comparisons, etc.) give complementary insights into the same question?
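To see why a significant education coefficient need not be causal, consider simulated data in which a hypothetical unobserved common cause z (standing in for something like institutional quality) drives both education and GDP per capita, while education itself has no effect at all. All variables and coefficients are invented for illustration, and the code needs Python 3.9 or later. The adjusted estimate uses the Frisch-Waugh-Lovell result: the education coefficient in a regression that also includes z equals the slope from regressing one set of residuals on the other.

```python
import random

def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y: list[float], x: list[float]) -> list[float]:
    """Residuals of y after a simple regression on x."""
    b = ols_slope(x, y)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(42)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]           # hypothetical common cause
education = [zi + rng.gauss(0, 0.5) for zi in z]  # depends on z only
gdp = [2 * zi + rng.gauss(0, 0.5) for zi in z]    # depends on z only, NOT on education

naive = ols_slope(education, gdp)
# Frisch-Waugh-Lovell: regress gdp-residuals on education-residuals (both net of z).
adjusted = ols_slope(residuals(education, z), residuals(gdp, z))

print(f"naive slope:    {naive:.2f}")     # large, although education has no effect here
print(f"adjusted slope: {adjusted:.2f}")  # close to zero once z is accounted for
```

In real cross-country data the confounder is rarely observed, which is exactly why a significant coefficient alone cannot settle the causal question.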

📊 Task 3: Beyond the average

Reading:

Roger Koenker and Kevin F. Hallock (2001). “Quantile regression”. Journal of Economic Perspectives, 15(4), 143-156. – (Koenker and Hallock 2001)

Questions:

Why is it problematic to only focus on the “average case” in social science data?
Give a real-world example where focusing on the extremes (e.g., the poorest, most at-risk, or highest-performing) reveals insights that the average hides.
Compare: how might linear regression vs. quantile regression tell different stories about the same dataset?
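A small, self-contained illustration of how the two methods can tell different stories. The toy data are constructed, not real: at each x the points sit symmetrically around 2 + 0.5x, and their spread widens as x grows. For a line with an intercept and one slope (and x values that are not all equal), the quantile-regression linear program has an optimal solution passing through two data points, so this sketch simply tries every pair of points and keeps the line with the lowest check loss. This brute-force search only suits tiny datasets; real analyses use dedicated solvers such as R's quantreg package or QuantReg in Python's statsmodels.

```python
from itertools import combinations

def check_loss(residual: float, tau: float) -> float:
    """Quantile regression's check (pinball) loss for one residual."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def quantile_line(xs: list[float], ys: list[float], tau: float) -> tuple[float, float]:
    """Exact tau-quantile regression line (a, b) found by searching lines through pairs."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(check_loss(y - (a + b * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

def ols_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares line (a, b): models the conditional mean."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Constructed toy data: the spread around 2 + 0.5x grows in proportion to x.
offsets = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
xs = [x for x in range(1, 11) for _ in offsets]
ys = [2 + 0.5 * x + 0.3 * x * o for x in range(1, 11) for o in offsets]

print(f"OLS (mean) slope:    {ols_line(xs, ys)[1]:.2f}")
for tau in (0.1, 0.5, 0.9):
    print(f"quantile {tau} slope: {quantile_line(xs, ys, tau)[1]:.2f}")
```

Here the mean and median slopes are both 0.5, while the 0.1 and 0.9 quantile slopes are about 0.05 and 0.95: the average relationship hides how quickly the gap between the bottom and the top of the distribution widens as x grows.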

🎨 Task 4: When averages and visualizations mislead

Readings:

Sparsh Gupta (2022). “Anscombe’s Quartet: What Is It and Why Do We Care?”. Built In – (Sparsh Gupta 2022)
Maria Mouschoutzi (2025). “Water Cooler Small Talk Ep. 7: Anscombe’s Quartet and the Datasaurus”. Towards Data Science – (Mouschoutzi 2025)
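Anscombe's quartet is easy to reproduce yourself. The sketch below types in the standard published values of the four datasets and recomputes their headline summary statistics with the Python standard library only. summarise() is a helper written for this example.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarise(x, y):
    """Means, sample variances, Pearson r, R² and OLS line for one dataset."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    return {"mean_x": mx, "mean_y": my, "var_x": variance(x), "var_y": variance(y),
            "r": r, "r2": r ** 2, "intercept": my - slope * mx, "slope": slope}

for name, (x, y) in quartet.items():
    s = summarise(x, y)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} "
          f"r={s['r']:.3f} R2={s['r2']:.2f} y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

All four datasets give almost identical means, variances, correlations (about 0.816) and fitted lines (about y = 3 + 0.5x), yet their scatterplots look completely different.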

Questions:

Think about your exploration of GDP per capita from Task 2. Both Anscombe’s quartet and GDP per capita highlight how summary numbers can hide important patterns.

What lesson does Anscombe’s quartet teach about relying only on summary statistics like the mean or R²?
Imagine two countries with the same GDP per capita. Give two different ways their economic situations could still be very different (justify your answer).
How might a regression model using GDP per capita as a dependent variable miss important internal variation within countries?
Suggest an alternative or complementary indicator to GDP per capita. How could including this indicator change the story told by the data?
Sketch or describe a visualization of country-level GDP data. How could a poorly chosen visualization mislead a policymaker or audience?
Redesign your visualization so it communicates the uncertainty and internal variation more honestly.
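A per-capita figure is a mean, so very different distributions can share it. The sketch below uses two hypothetical ten-person "countries" with invented incomes (arbitrary units) that have exactly the same mean; the median and the share of income held by the richest tenth (here, one person) tell a different story. top_share() is a helper written for this example.

```python
from statistics import mean, median

# Hypothetical incomes for two ten-person "countries" (arbitrary units, invented).
country_a = [25, 26, 27, 28, 29, 31, 32, 33, 34, 35]
country_b = [5, 6, 7, 8, 9, 10, 12, 15, 28, 200]

def top_share(incomes: list[float], fraction: float = 0.1) -> float:
    """Share of total income held by the richest 'fraction' of people."""
    k = max(1, round(len(incomes) * fraction))
    return sum(sorted(incomes)[-k:]) / sum(incomes)

for name, incomes in [("A", country_a), ("B", country_b)]:
    print(f"Country {name}: mean={mean(incomes):.1f}, median={median(incomes):.1f}, "
          f"top 10% share={top_share(incomes):.0%}")
```

Both countries report a per-capita income of 30, yet the typical person in B has less than a third of that, and one person holds two thirds of all income.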

🔍 Task 5: Reflection

In a short paragraph:

What principle(s) will you adopt in your own work to keep your analysis honest, transparent, and critical?
Which part of this assignment challenged your thinking the most, and why?

Feedback rubrics

This is how we’ll be evaluating and providing feedback on your work:

Criteria	Needs Support	Developing	Good	Excellent	Questions to Guide You
Understanding of Readings	Misunderstands readings; few/no connections.	Understands some points; limited connection to tasks.	Understands key ideas; applies to tasks.	Clear understanding; integrates readings; applies thoughtfully.	Did I identify main arguments from each reading? Can I explain them in my own words? How do they connect to the tasks? Are my explanations concise, clear, and accessible? Are my citations accurate and clearly linked to the ideas I discuss?
Critical Thinking	Summarizes only; does not justify assertions; ignores source bias or implications.	Identifies some issues; reasoning limited; partial justification; limited consideration of source reliability or bias.	Identifies key issues; justifies assertions; considers source reliability and potential bias; explains consequences.	Goes beyond identifying issues; synthesizes across sources; justifies all assertions; critically evaluates bias, reliability, and implications for evidence.	Did I spot flaws or biases? Can I explain why they matter? Did I think about alternative approaches? Did I justify my assertions? Did I consider whether sources might be biased or limited? Are my explanations concise, clear, and accessible? Are my citations accurate and clearly linked to the ideas I discuss?
Application to Data & Visualization	Minimal/incorrect application; plots or analysis misleading.	Some correct application; partial attention to assumptions; basic plots.	Thoughtful application; acknowledges assumptions; visualizations show trends/variation.	Thoughtful, accurate application; visualizations communicate patterns, uncertainty, and hidden variation; reasoning is clear.	Does my plot clearly show trends or variation? Am I considering assumptions, outliers, or uncertainty? How could I improve the visualization? Are my explanations concise, clear, and accessible? Are my citations accurate and clearly linked to the ideas I discuss?
Reflection & Synthesis	No reflection; no connection to practice.	Limited reflection; identifies 1-2 lessons.	Reflection shows insight; connects learning to analysis; identifies principles for future work.	Deep, self-aware reflection; links lessons across tasks; identifies principles and implications for future work.	What challenged my thinking most? What did I learn? How will I apply it in future work? Are my explanations concise, clear, and accessible? Are my citations accurate and clearly linked to the ideas I discuss?
Formative Formatting (Quarto HTML)	HTML does not render; outputs missing; no clear structure; figures/tables unclear; code (if any)/output poorly integrated; hard to read.	HTML renders but some outputs/figures broken; sections partially labeled; figures/tables or code (if any)/output partially clear; minor readability issues.	HTML mostly renders; most outputs/figures visible; sections mostly clear; figures/tables readable and code (if any)/output mostly integrated; styling mostly consistent.	HTML fully renders; all outputs/figures visible and reproducible; sections clearly labeled; figures/tables captioned and clear; code (if any)/output integrated; layout readable and visually consistent; optionally uses a non-default Quarto theme to enhance presentation.	Does my HTML render correctly? Are all outputs and figures present and reproducible? Is the structure clear? Are figures/tables readable and captioned? Is code (if any) integrated with outputs? Is the layout consistent and readable? Did I use a theme thoughtfully to improve readability?

Across all tasks, keep your explanations concise, clear, and accessible to a general audience. Ensure that all ideas are accurately cited and each citation clearly supports the point it is linked to. Avoid copying text verbatim; explain concepts in your own words.

References

Bohannon, John. 2015. “I Fooled Millions into Thinking Chocolate Helps Weight Loss. Here’s How.” Gizmodo, May. https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800.

Greenland, Sander, Stephen J Senn, Kenneth J Rothman, John B Carlin, Charles Poole, Steven N Goodman, and Douglas G Altman. 2016. “Statistical Tests, p Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.

Koenker, Roger, and Kevin F Hallock. 2001. “Quantile Regression.” Journal of Economic Perspectives 15 (4): 143–56. https://pubs.aeaweb.org/doi/pdf/10.1257/jep.15.4.143.

Mouschoutzi, Maria. 2025. “Water Cooler Small Talk, Ep 7: Anscombe’s Quartet and the Datasaurus.” Towards Data Science. https://towardsdatascience.com/water-cooler-small-talk-ep-7-anscombes-quartet-and-the-datasaurus-09a143400320/.

Nuzzo, Regina. 2014. “Scientific Method: Statistical Errors.” Nature 506 (7487): 150–52. https://www.nature.com/articles/506150a.

———. 2015. “How Scientists Fool Themselves–and How They Can Stop.” Nature 526 (7572): 182–85. https://www.nature.com/articles/526182a.

Sparsh Gupta. 2022. “Anscombe’s Quartet: What Is It and Why Do We Need It?” Built In. https://builtin.com/data-science/anscombes-quartet.