Week 09 Lecture
Exploratory Data Analysis & Data Visualisation
By the end of this lecture, you should be able to: i) explain why summary statistics alone can mislead (Anscombe's Quartet, Datasaurus Dozen), ii) check data completeness and identify systematic missingness, iii) choose between mean and median based on distribution shape and justify your choice, iv) identify and investigate outliers using domain reasoning, v) recognise common data visualisation sins and apply evidence-based alternatives, vi) use sns.FacetGrid to compare distributions across groups, vii) distinguish correlation from causation in your own analysis.
Update announced in this lecture: Mini-Project 2 is now due on Wednesday 1 April 2026 at 8 pm UK time. That gives you an extra week to apply today's EDA and visualisation work to NB03 and REPORT.md. The W11 group pitch is now formative: you will still receive feedback on the day, but it will no longer count towards your grade.
Logistics
Location: Thursday, 19 March 2026, 4-6 pm at CKK.LG.03
Today's lecture covers two connected topics. We'll talk about how to freely (but not really) explore your data using EDA techniques. Then we will cover how to communicate your insights effectively with data visualisation. The second part includes a live critique activity on Slack, so make sure you have it open and notifications on!
Note: This week's Friday lab has three parts: open EDA work, group formation for the Group Project, and GitHub Pages setup. If you already have friends you'd like to work with, let them know. Groups should ideally have 3 people, with up to 4 if needed.
Preparation
- You went to the W08 Lecture and W08 Lab
- Your NB01 and NB02 for Mini-Project 2 should be complete (or very close), so you can focus on analysis and NB03 this week

The LSE runs a course survey every term, and your feedback genuinely shapes how this module is taught next year. It takes about 3 minutes.
Note: Please assess all the instructors you have interacted with (Jon counts as a teacher too!).
Last updated: 18 March 2026
Lecture Overview
Hour 1: Exploratory Data Analysis
- Why summary statistics alone can mislead: Anscombe's Quartet and the Datasaurus Dozen
- Introducing the IMDb dataset: multiple tables connected by shared keys (same pd.merge() logic as W08)
- Checking data completeness and systematic missingness with .notna() and .groupby()
- Understanding distributions: .describe(), sns.histplot(), skewness
- Mean vs median: why they disagree, when each is appropriate, and what the gap tells you
- Investigating outliers with domain reasoning
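As a rough illustration of the Hour 1 steps listed above, here is a minimal sketch on a toy table. The column names and values are made up for the example and are not the lecture's actual IMDb schema. The 1.5 × IQR rule at the end is only a common heuristic for flagging candidates. It is an assumption of this sketch, not necessarily the lecture's method, and anything it flags still needs domain reasoning before you decide what to do with it.

```python
import pandas as pd

# Toy stand-in for a titles table. Column names and values are illustrative
# assumptions, NOT the lecture's actual IMDb schema.
titles = pd.DataFrame({
    "title_type": ["movie", "movie", "movie", "movie",
                   "short", "short", "short", "short"],
    "runtime_minutes": [95, 102, 110, 600, 12, None, None, None],
})

# 1. Completeness: share of non-missing runtimes, overall and per group.
#    A large gap between groups hints at systematic (not random) missingness.
has_runtime = titles["runtime_minutes"].notna()
overall_complete = has_runtime.mean()
complete_by_type = has_runtime.groupby(titles["title_type"]).mean()

# 2. Mean vs median for one group: a mean far above the median
#    suggests a right-skewed distribution.
movies = titles.loc[titles["title_type"] == "movie", "runtime_minutes"]
mean_runtime = movies.mean()
median_runtime = movies.median()
skewness = movies.skew()

# 3. Flag candidate outliers with the 1.5 * IQR heuristic (an assumption of
#    this sketch). Flagging is a prompt to investigate, not to delete.
q1, q3 = movies.quantile([0.25, 0.75])
iqr = q3 - q1
flagged = movies[(movies < q1 - 1.5 * iqr) | (movies > q3 + 1.5 * iqr)]

print(complete_by_type)
print(f"mean={mean_runtime:.2f}, median={median_runtime:.1f}, skew={skewness:.2f}")
print(flagged)
```

In this toy data, every movie has a runtime but most shorts do not, and the single 600-minute title pulls the movie mean well above the median. Whether that title is a data-entry error or a genuine long-form release is a domain question that the code cannot answer.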
Hour 2: Data Visualisation & Communication
- The Seven Sins of data visualisation (truncated axes, pie chart overload, bar plots hiding distributions, and more)
- Weissgerber et al. (2015): the research behind "show the data, not just the summary"
- Good dataviz examples: CJR, The Pudding, Visual Cinnamon, Closeread Prize winners
- Hall of Fame / Hall of Shame: live critique activity on Slack
- Static insights vs interactive dashboards (and what your REPORT.md should do)
- sns.FacetGrid for multi-group comparison
- Correlation vs causation: what language to use in your REPORT.md
- Closeread: an optional scrollytelling upgrade for REPORT.md
Lecture Materials
Today we use facilitation slides plus one lecture notebook demonstrating the EDA workflow on IMDb data. The notebook will be shared on Nuvolos, and you can also download a zip bundle with all Week 09 files.
Facilitation Slides
Use keyboard arrows to navigate. Select the slides below or view fullscreen.
Or download the slides directly as a PDF:
Appendix
Key References
- Weissgerber et al. (2015): Beyond bar and line graphs
- Same Stats, Different Graphs (Autodesk): Datasaurus Dozen
- Friends Don't Let Friends Make Bad Graphs
- Tyler Vigen's Spurious Correlations
- Closeread documentation
- seaborn FacetGrid docs
Useful Links
- Mini-Project 2
- W09 Lab
- Syllabus
- Contact Hours
Looking Ahead
- Friday W09 Lab: NB03 working session + group formation for the Group Project
- Week 10 Lecture: Git collaboration for teams (git fetch, git pull, merge conflicts) and introduction to SQL with the same IMDb database
- Monday W11: Group pitch presentations for the Group Project
