🗓️ Week 03
Computational Thinking and Programming

DS101 – Fundamentals of Data Science

Dr. Ghita Berrada

LSE Data Science Institute

09 Oct 2023

The old data science Venn diagram

This diagram (Conway 2010) is quite outdated, but today, it will prove helpful to think about the computer skills required in data science projects.

We will focus on elements of these “hacking” skills today.
Hands-on in-class exercises

Computational Thinking

What is computational thinking?

The mental skills and practices related to the following aspects of computing:

1️⃣ Designing computations that get computers to do the jobs for us

2️⃣ Explaining and interpreting the world as a complex information process

1️⃣ Designing computations

We usually have to think like computers to design the proper computations for the job.

Input: numbers, symbols, lists
Output: a solution
Computation: deterministic calculations & symbolic manipulation

2️⃣ Explaining and interpreting the world

We can use computers to create models of the world.

Let us look at a simple, old example of a computational model for urban segregation (McCown 2014) ➡️

What is computational thinking?

Today our focus is on the first aspect:

1️⃣ Designing computations that get computers to do the jobs for us

2️⃣ Explaining and interpreting the world as a complex information process

Algorithms

Recipes for computers

flowchart LR
  A(Input) --> B{Set of instructions/rules</br>to obtain the expected</br> output from the given input}
  B --> C(Output)

Algorithms

Clear and unambiguous: Each step of the algorithm should have one unambiguous meaning
Finiteness: The algorithm is finite i.e it terminates after a finite time
Language independent: the algorithm is composed of a set of instructions that can be implemented in any programming language and still lead to the same outcome
Feasible: The algorithm must be simple, generic and practical so to be executed with the resources available. It cannot depend on future or non-existing technologies.

graph LR
  B((Algorithm characteristics))-->A(Well-defined inputs)
  B--> C(Well-defined outputs)
  B-->D(Clear and unambiguous)
  B-->E(Finiteness)  
  B-->F(Language-independent)
  B-->G(Feasible)

Let’s look at an example

Problem Definition:

Whenever I receive a list of numbers, I want to ensure this list is ordered.

Breakdown

Input:

[10, 20, 42, 29, 50]
Output:

[10, 20, 29, 42, 50]
Computation:

How would you solve it❓

🎯 Action Point

Group up with the people of your respective tables and try to come up with a recipe that solves the problem, regardless of the list size.
Test that your recipe works for the following list:

[237, 153, 311, 33, 854, 212, 368, 42, 892, 755]

Time for a break 🍵

After the break:

An another algorithm example
R vs Python
Computational Notebooks
Example from Kaggle

(no slides, all live demos)

Time to look at another example of algorithm

graph LR
A(A) ---|2| B(B)
B ---|5| D(D)
A ---|6| C(C)
C ---|8| D
D ---|10| E(E)
D ---|15| F(F)
F ---|6| E
E ---|2| G(G)
F ---|6| G

Graph:

Nodes (A, B,C,…)
Edges (Undirected, weighted)

🎯 How do we get the shortest path between A and all the other nodes in this graph (i.e between A and B, A and D, A and G, etc…?
Pair with the people in your respective tables and discuss.

One possible solution: The Dijkstra algorithm

Step 1: Mark the source node with a current distance of 0 and the rest with infinity

Step 2: Set the non-visited node with the smallest current distance, as the current node, let’s say c (initially c is set to the source node).

Step 3: For each neighbour N of the current node c: add the current distance of c with the weight of the edge c-N. If it is smaller than the current distance of N, set it as the new current distance of N.

Step 4: Mark c as visited.

Step 5: Go to Step 2 if there are any unvisited nodes.

Applying Dijkstra on our example graph

graph LR
A(A) ---|2| B(B)
B ---|5| D(D)
A ---|6| C(C)
C ---|8| D
D ---|10| E(E)
D ---|15| F(F)
F ---|6| E
E ---|2| G(G)
F ---|6| G

Source node: A

Initially :

	A	B	C	D	E	F	G
Distance	0	\(\infty\)	\(\infty\)	\(\infty\)	\(\infty\)	\(\infty\)	\(\infty\)