DS101 – Fundamentals of Data Science
13 Oct 2025
Excel is great for:
But dangerous for:
Let’s see why with some real examples…

Source: (Vincent 2020)
What happened:
The scale:
The fix: 27 genes officially renamed (MARCH1 → MARCHF1)

Source: (Hern 2020)
The incident:
Impact: Lives at risk

Source: (“The Excel Depression” 2013)
Reinhart-Rogoff spreadsheet error:
Lesson: Always check your formulas!
Source: (Kwak 2013)

JP Morgan VaR Model (2012)
Lesson:
Even sophisticated financial models fail on fragile, manually managed software. Critical calculations need robust, auditable systems.

Source: (Beales 2013)
A pattern of Excel-related failures:
2007 CPDOs (i.e Constant proportion debt obligations): Moody’s coding error inflated structured finance ratings (pre-crisis)
2012 London Whale: JPMorgan’s Excel-based risk model failures ($6.2B loss)
2013 Reinhart-Rogoff: Spreadsheet error undermined influential austerity research
The pattern: Office tools used as substitutes for proper systems and critical thinking

Excel is NOT for:
Excel IS fine for:
Just know its limitations!
“Excel doesn’t make mistakes. People do.”
These stories aren’t really about spreadsheets.
They’re about thinking — or rather, the lack of computational thinking.
| Spreadsheet World | Computational Thinking |
|---|---|
| 🧍♀️ Manual formulas | 🤖 Explicit procedures |
| ❌ Hidden logic | ✅ Transparent logic |
| 😬 Fragile systems | 💪 Reproducible steps |
| 🌀 Chaos | 🧩 Clarity |
A mindset for solving problems so precisely that a computer could follow your logic — even if it’s terrible coffee-fuelled logic ☕💻
More formally, it involves:
1️⃣ Designing computations that get computers to do jobs for us
2️⃣ Explaining and interpreting the world as a complex information process
1️⃣ Designing computations (thinking like a computer)
2️⃣ Explaining and interpreting the world (modeling complexity)

💬 Discussion Prompt:
- “What patterns do you think emerge if residents move when unhappy?”
Input: grid of neighbouring households (two colors to distinguish type), tolerance threshold (e.g 30% of neighbours of different type)
Output: reordered grid
💬 Observation:
- Mild individual preferences → strong segregation patterns
- No one intended overall segregation → emergent behavior
“Next, we’ll focus on designing precise rules — algorithms — to solve tasks.”
Imagine a robot has never seen tea.
Task: Write the first three instructions to make a cup of tea.
💬 Discuss at your table: what’s tricky about giving precise instructions?
One possible precise algorithm:
💬 Step-by-step, unambiguous, finite, feasible — now let’s formalize algorithm properties.
Definition:
- A recipe for solving a problem in a finite, step-by-step, unambiguous way

Properties:

Input:
[10, 20, 42, 29, 50]
Output:
[10, 20, 29, 42, 50]
Computation:
How would you solve it❓
Group up with the people of your respective tables and try to come up with a recipe that solves the problem, regardless of the list size.
Test that your recipe works for the following list:
[237, 153, 311, 33, 854, 212, 368, 42, 892, 755]

After the break:
Scenario: Campus walk from Marshall Building to other locations
Graph (distances in meters):
Discovery:
Step 0 – Initialization:
| MAR | FAW | COL | SSC | LRB | |
|---|---|---|---|---|---|
| Distance | 0 | \(\infty\) | \(\infty\) | \(\infty\) | \(\infty\) |
Step 1 – Current node = MAR
| MAR | FAW | COL | SSC | LRB | |
|---|---|---|---|---|---|
| Distance | 0 | 200 | 300 | 350 | 400 |
Step 2 – Current node = FAW (200)
| MAR | FAW | COL | SSC | LRB | |
|---|---|---|---|---|---|
| Distance | 0 | 200 | 300 | 350 | 400 |
Step 3 – Current node = COL (300)
| MAR | FAW | COL | SSC | LRB | |
|---|---|---|---|---|---|
| Distance | 0 | 200 | 300 | 350 | 400 |
Step 4 – Current node = SSC (350)
| MAR | FAW | COL | SSC | LRB | |
|---|---|---|---|---|---|
| Distance | 0 | 200 | 300 | 350 | 400 |
Step 5 – Current node = LRB (400)
| MAR | FAW | COL | SSC | LRB | |
|---|---|---|---|---|---|
| Distance | 0 | 200 | 300 | 350 | 400 |
Final shortest distances & paths:
| Node | Distance | Path |
|---|---|---|
| MAR | 0 | MAR |
| FAW | 200 | MAR → FAW |
| COL | 300 | MAR → COL |
| SSC | 350 | MAR → SSC |
| LRB | 400 | MAR → LRB |
Step 1: Mark the source node with a current distance of 0 and the rest with infinity
Step 2: Set the non-visited node with the smallest current distance, as the current node, let’s say c (initially c is set to the source node).
Step 3: For each neighbour N of the current node c: add the current distance of c with the weight of the edge c-N. If it is smaller than the current distance of N, set it as the new current distance of N.
Step 4: Mark c as visited.
Step 5: Go to Step 2 if there are any unvisited nodes.
Scenario: Let’s votes for favorite café/tea shop!
Here are the voters’ preferences?
Preferences:
| Student | Rank 1 | Rank 2 | Rank 3 |
|---|---|---|---|
| Vansh | Pürcha | Garrick | Blank Street Café |
| Unai | Blank Street Café | Pürcha | Garrick |
| Manuela | Garrick | Pürcha | Blank Street Café |
| Ria | Garrick | Blank Street Café | Pürcha |
| Luka | Pürcha | Garrick | Blank Street Café |
Discovery: How would you go about the vote?
Majority Vote
Count first-place votes:
No majority (>50%) → need tie-break rules or runoff
Observation: Borda count can break ties where majority fails
Borda Count
Assign points to each rank: 1st place = 3 pts, 2nd = 2 pts, 3rd = 1 pt
Sum points:
✅ Winner (Borda): Garrick (11 pts)
Borda count
Majority voting
Suppose you had the following scenario:
Students: Vansh, Unai, Manuela Tasks: Prepare slides, Book meeting room room, Bring snacks
Original costs (time in minutes):
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | 10 | 15 | 9 |
| Unai | 9 | 18 | 5 |
| Manuela | 6 | 14 | 3 |
How would you go about assigning the tasks?
Step 1 — Subtract the minimum from every element of each row
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | 10 | 15 | 9 |
| Unai | 9 | 18 | 5 |
| Manuela | 6 | 14 | 3 |
\(->\)
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | 1 | 6 | 0 |
| Unai | 4 | 13 | 0 |
| Manuela | 3 | 11 | 0 |
Step 2 — Subtract the minimum of each column
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | 1 | 6 | 0 |
| Unai | 4 | 13 | 0 |
| Manuela | 3 | 11 | 0 |
\(->\)
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | 0 | 0 | 0 |
| Unai | 3 | 7 | 0 |
| Manuela | 2 | 5 | 0 |
Step 3 — Cross 0s with the minimum number of lines needed
❌
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | |||
| Unai | |||
| Manuela |
✅
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | |||
| Unai | 3 | 7 | |
| Manuela | 2 | 5 |
If the number of lines is < \(n\), the number of rows and columns in the matrix, reduce the matrix again (i.e step 4). In our example, we have 2 lines, but we have a 3x3 matrix, so we’ll reduce again. If we didn’t have this case, we’d directly jump to step 5.
Step 4 — Find the smallest entry not covered by any line, and subtract this entry from the entire non-crossed matrix (and add it to the elements crossed twice).
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | |||
| Unai | 3 | 7 | |
| Manuela | 2 | 5 |
\(->\)
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | |||
| Unai | 1 | 5 | |
| Manuela | 0 | 3 |
then go back to step 3: cross the 0s!
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | |||
| Unai | 1 | 5 | |
| Manuela |
This time, we have three lines so we can go to step 5.
Step 5 — Assign Tasks and Students starting with the line with only one zero!
| Slides | Room | Snacks | |
|---|---|---|---|
| Vansh | 0 | 0 | 2 |
| Unai | 1 | 5 | 0 |
| Manuela | 0 | 3 | 0 |

LSE DS101 2025/26 Autumn Term