DS105 2025-2026 Winter Term

💻 Week 09 Lab

EDA in practice and group formation

Author

Dr Jon Cardoso-Silva

Published

19 March 2026

🥅 Learning Goals

By the end of this lab, you should be able to:

  • download a CSV from GitHub correctly (the raw file, not the HTML page)
  • apply the EDA checklist from 🖥️ W09 Lecture to a dataset you haven’t seen before
  • produce a clear, honest visualisation in small groups and explain your choices
  • articulate your EDA plan for ✍️ Mini-Project 2

You should also have formed your group for the 📦 Group Project and have a live project page.

This lab puts the EDA principles from 🖥️ W09 Lecture into practice on a dataset you have not worked with before. You will work in small groups from the start, explore the data freely, produce one visualisation that reflects your own question, and then set up your 📦 Group Project repository and GitHub Pages site with your class teacher.

📋 Preparation

  • Attend the 🖥️ W09 Lecture in person
  • Have Slack open with notifications on
  • Think about who you’d like to work with for the 📦 Group Project (ideally 3 people per group, up to 4)

🛣️ Lab Roadmap

| Part | Activity Type | Focus | Time | Outcome |
|---|---|---|---|---|
| Part 1 | 🎯 SMALL-GROUP ACTION POINTS | Open EDA exploration + one plot | 40 min | Your group explores freely and produces one clear plot with a takeaway |
| Part 2 | 🗣️ GUIDED ACTIVITY | Form teams and accept GitHub Classroom assignment | 20 min | You have a team name and everyone is in the same repository |
| Part 3 | 🗣️ TEACHING MOMENT | Publish docs/index.md as a live site and inspect HTML output | 25 min | Your group website is live and you can connect Markdown headings to HTML tags |
| Wrap-Up | 🗣️ TEACHING MOMENT | What to finish before W10 | 5 min | Clear next steps |

Part 1: Open EDA exploration in small groups (40 min)

Note to class teachers: Put students in small groups from the start (ideally 2-3 people). Begin by projecting the GitHub page and showing the difference between downloading the raw CSV vs saving the HTML page by mistake. Then let groups explore freely. Do not prescribe one single research question. Circulate and ask prompts like “what did you notice first?”, “which variable pairing is worth plotting?”, and “what would make this plot more honest?”.

Getting the data

Today you’ll work with the Gapminder dataset: country-level data on life expectancy, GDP per capita, and population across five continents.

The CSV is hosted on GitHub:

🔗 https://github.com/plotly/datasets/blob/master/gapminder_with_codes.csv

🎯 ACTION POINTS

  1. Go to the link above. You should see a preview of the CSV on GitHub.

  2. Download the raw CSV file. Click the download button (⬇️) or click “Raw” and then save the page.

    ⚠️ Be careful: if you right-click the main page and choose “Save as”, you’ll save an HTML file that has a .csv filename but does not contain CSV data. If pd.read_csv() throws a parsing error, this is likely why. Open your downloaded file in a text editor to check.

  3. On Nuvolos, create the folder week09/data/ if it doesn’t exist yet

  4. Upload the CSV to week09/data/gapminder_with_codes.csv

  5. Open the lab notebook at /files/week09/W09-NB01-Lab-Open-EDA.ipynb
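If you are unsure whether you saved the raw CSV or the HTML page, a quick check like the one below can help. This is a minimal sketch: the helper name `looks_like_html` is hypothetical, and the relative path assumes your code runs from the folder that contains `week09/`, so adjust it to wherever your notebook lives.

```python
from pathlib import Path


def looks_like_html(path):
    """Return True if the file starts like an HTML page rather than CSV text."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    # Only the first few hundred characters are inspected.
    head = text[:500].lstrip("\ufeff").lstrip().lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Assumed location of the uploaded file; change it if your folder layout differs.
csv_path = Path("week09/data/gapminder_with_codes.csv")

if csv_path.exists():
    if looks_like_html(csv_path):
        print("This looks like an HTML page, not the raw CSV. Download it again.")
    else:
        first_line = csv_path.read_text(encoding="utf-8").splitlines()[0]
        print("First line of the file:", first_line)
else:
    print("File not found at", csv_path)
```

A real CSV should start with a header row of comma-separated column names, not with HTML tags.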

Explore first, then plot

The notebook already has the imports and the pd.read_csv() call ready. Once you’ve loaded the data, spend some time getting to know it.

🎯 ACTION POINTS

Use code cells and Markdown cells to document your group exploration:

  1. Check the structure: rows, columns, and what each row represents.
  2. Run quick diagnostics (.head(), .dtypes, .describe(), .isna().sum()).
  3. Decide what your group wants to investigate.
  4. Prepare plot_df if you need filtering, grouping, or reshaping.
  5. Produce one plot with a narrative title that states your takeaway.
  6. Export the plot as .png and be ready to share it on Slack.
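As an illustration of steps 1, 2 and 4, here is a minimal sketch of the diagnostics and the plot_df preparation. It uses a tiny stand-in DataFrame with made-up values so it runs on its own. The column names (`country`, `continent`, `year`, `lifeExp`, `gdpPercap`) are an assumption about the CSV, so check `df.columns` on the real file, and the question asked here is just one of many your group could pick.

```python
import pandas as pd

# Stand-in rows with MADE-UP values, not the real Gapminder numbers.
# Column names are an assumption; verify them with df.columns after pd.read_csv().
df = pd.DataFrame({
    "country":   ["A", "A", "B", "B", "C", "C"],
    "continent": ["Europe", "Europe", "Europe", "Europe", "Asia", "Asia"],
    "year":      [1952, 2007, 1952, 2007, 1952, 2007],
    "lifeExp":   [60.0, 78.0, 64.0, 80.0, 40.0, 70.0],
    "gdpPercap": [3000.0, 30000.0, 5000.0, 35000.0, 500.0, 6000.0],
})

# Steps 1-2: structure and quick diagnostics
print(df.shape)         # (rows, columns)
print(df.head())        # what does one row represent?
print(df.dtypes)        # are numbers stored as numbers?
print(df.describe())    # ranges and central values
print(df.isna().sum())  # missing values per column

# Step 4: reshape for one question, here
# "How did median life expectancy change in each continent?"
plot_df = (
    df.groupby(["continent", "year"], as_index=False)["lifeExp"]
      .median()
      .rename(columns={"lifeExp": "median_lifeExp"})
)
print(plot_df)  # inspect the shape before plotting
```

In your notebook, `df` comes from the existing pd.read_csv() call instead of the stand-in rows.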

Your class teacher will bring the class together for a short discussion once most groups have produced a first plot.

💡 Seaborn plot types you might find useful later

You’ve seen these in 💻 W05 Lab, 🖥️ W08 Lecture, and 🖥️ W09 Lecture. Here’s a quick reminder of the typical plot types:

  • sns.histplot(data=df, x="col") for distributions
  • sns.boxplot(data=df, x="group_col", y="value_col") for comparing group spreads
  • sns.stripplot(data=df, x="group_col", y="value_col") for showing every data point
  • sns.scatterplot(data=df, x="col_a", y="col_b", hue="group_col") for relationships between two variables
  • sns.FacetGrid(data=df, col="group_col", col_wrap=3) then .map_dataframe(sns.histplot, x="col") for comparing distributions across groups
  • sns.lineplot(data=df, x="time_col", y="value_col", hue="group_col") for trends over time

Remember the plot_df pattern: prepare the data shape first, inspect it with print(plot_df), then plot.

Class discussion (inside Part 1)

Your class teacher will ask a few groups to present. For each plot:

  • What does this plot tell you about wealth and life expectancy?
  • Is the chart type a good fit?
  • Does the title state the finding?
  • Could anything about it mislead a reader?

Then, shifting to your own ✍️ Mini-Project 2:

  • What does your journey time distribution look like?
  • Are you planning to report mean, median, or both? Why?
  • Have you found any systematic missingness in your data?
  • Which technique from Thursday’s lecture will you use in NB03?

The goal is to hear how others are approaching similar problems.
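To make the mean-versus-median question concrete, here is a small sketch with hypothetical journey times (made-up values, not from any real dataset). One unusually long journey pulls the mean upwards while the median barely moves, which is one reason you might report both.

```python
from statistics import mean, median

# Hypothetical journey times in minutes (made-up values).
# The last value is one unusually long trip that skews the distribution.
journey_minutes = [22, 24, 25, 26, 27, 28, 30, 95]

mean_minutes = mean(journey_minutes)
median_minutes = median(journey_minutes)

print("mean:  ", round(mean_minutes, 1))
print("median:", median_minutes)
print("mean is above median:", mean_minutes > median_minutes)
```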

💡 If your group plotted GDP per capita against life expectancy, you saw a positive association. Does that mean wealth causes longer lives? This is the same correlation-vs-causation question from the lecture, and the same kind of claim you might make in your ✍️ Mini-Project 2 about deprivation and journey times.

🔗 For a much richer version of this kind of analysis, explore the Gapminder interactive tools. Hans Rosling’s TED talk is also worth watching.

Fill out the survey


Tell the LSE about your experience in this course!
(only 13 out of 103 of you have completed the course survey)

The LSE runs a course survey every term, and your feedback genuinely shapes how this module is taught next year. It takes about 3 minutes. 🐼

💡 Note: Please assess all the instructors you have interacted with
(Jon counts as a teacher too!).

Last updated: 19 March 2026

Part 2: Guided team formation and GitHub Classroom setup (20 min)

Note to class teachers: Run this as a guided activity. Announce constraints, give students a short window to self-organise, then help match anyone still without a team. Once teams are settled, pause the room and walk everyone through GitHub Classroom together so each student joins the correct repository.

Form your team and join GitHub Classroom

🎯 ACTION POINTS

Constraints:

  • Groups of 3 people (up to 4 if needed)
  • It’s OK to form groups with people from other Friday classes, as long as you coordinate with them.
  • Choose a creative team name (the name of your team will be publicly visible on the Web)

Use your creativity and try to avoid names that previous DS105 cohorts have already used.

Names already taken in past years

altf4, any-ideas, api-animals-rawr, bamboo, chatgpteam, codingcats, ctr-alt-defeat, ctraltelite, data-scientists, deepsleep, definitely-struggling-105, dev-ai, ds-404, ds-one-oh-py-thon, dsos, forks, global-grooves-lab, hapi-hour, i-m-proficient-in-python, json_derulo, json-derulo, newteam, pandas-express, snack-overflow, stargazers, the-bosses, the-outliers, the-quants-of-monte-carlo, the-standard-deviants, curry-bunnies, data_scientists, data-acc, data-alchemists, data-detectives, data-wars, faust-is-mouse, kung-fu-pandas, lucky_dolphin, matthias-and-king-david-please-join, musketeers, naturalstupidity, octotastic_octopi, pandas-of-the-opera, populists, profiting-pandas, pulls, rush-hour, super-panda

🎯 ACTION POINTS

  1. Finalise your group of 3 people (up to 4 if needed).
  2. Confirm your team name before anyone clicks the assignment link.
  3. One person clicks the button below and creates the team in GitHub Classroom.

  4. All other members click the same button and join the existing team. Do not create a duplicate team.

  5. Everyone clones the repository locally:

    git clone <repository-url>

Part 3: Create and publish your group website (25 min)

Note to class teachers: This section is a live teaching moment. Please demo with The Bosses repository so students can follow a concrete example. Show each step on the projector: create docs/index.md, push, check the Actions tab while GitHub builds the site, open the final URL, then right-click and use View page source to connect Markdown to HTML tags. Ask: “What do you think is an <h2>? Which Markdown line mapped to it?”.

Live demo flow (class teacher)

  1. (David?), (Tabby?) and (Sara?): work with our shared The Bosses repository.
  2. Create docs/index.md with a tiny page.
  3. Commit and push to main.
  4. Open the Actions tab and show the build running.
  5. Wait until the workflow completes, then open the published site.
  6. Right-click on the page, choose View page source, and compare Markdown headings to HTML tags (# -> <h1>, ## -> <h2>).
  7. Ask the class: “What do you think is an <h2>? What did it map to in your Markdown?”

🎯 ACTION POINTS

  1. Create docs/index.md with this starter content:

    # Team [Your Team Name]
    
    ## Members
    
    - [Name 1]
    - [Name 2]
    - [Name 3]
    
    ## Project ideas
    
    We're still deciding! Check back soon.

  2. Commit and push your changes to main.

  3. Enable GitHub Pages step-by-step:

    • Open your repository on GitHub.
    • Click Settings.
    • In the left sidebar, click Pages.
    • Under Build and deployment, set Source to Deploy from a branch.
    • Set Branch to main and Folder to /docs.
    • Click Save.
  4. Click the Actions tab and wait for the pages build workflow to finish.

  5. Open your site URL and confirm the page loads.

  6. Right-click the page, choose View page source, and identify where your ## Members heading appears as an HTML tag.
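To see the heading mapping (# -> <h1>, ## -> <h2>) outside the browser, here is a deliberately simplified sketch. The function `heading_to_html` is hypothetical and handles heading lines only. It is not how GitHub Pages builds your site, and the real page source may include extra attributes (such as an id) on each heading tag.

```python
import re


def heading_to_html(line):
    """Convert one Markdown heading line (e.g. '## Members') to an HTML tag.

    Returns None if the line is not a heading.
    """
    match = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
    if match is None:
        return None
    level = len(match.group(1))  # number of '#' characters gives the heading level
    text = match.group(2)
    return f"<h{level}>{text}</h{level}>"


starter_lines = ["# Team [Your Team Name]", "## Members", "- [Name 1]"]
for line in starter_lines:
    print(repr(line), "->", heading_to_html(line))
```

List items such as `- [Name 1]` return None here because they map to different HTML tags, which you can look for in the page source too.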

You’ll use this page for your group pitch on Monday W11 and for the final project submission in May.

💡 If you can’t find a group today, post in #help on Slack and your class teacher will help match you before W10.

Wrap-Up (5 min)

Note to class teachers: Quick check that every student is in a group and every group has pushed their docs/index.md. Anyone without a group should be flagged for follow-up on Slack.

Before you leave:

  • You downloaded a CSV from GitHub (the raw file, not the HTML page)
  • You explored the Gapminder data and produced at least one visualisation in small groups
  • You have a concrete idea of what your ✍️ Mini-Project 2 NB03 will contain
  • You are in a group for the 📦 Group Project and your docs/index.md is live

What’s coming:

  • W10 Lecture (Thursday): Git collaboration for teams (git fetch, git pull, merge conflicts) and an introduction to SQL with an IMDb database
  • W10 Lab (Friday): Practice syncing changes with your team and pitch preparation
  • W11 (Monday): Group pitch presentations

🔗 Useful Resources

💻 Course Materials

🆘 Getting Help

  • Slack: Post questions to the #help channel
  • Office Hours: Book via StudentHub
  • Check staff availability on ✋ Contact Hours