🗓️ Week 03
Computational Thinking and Programming

DS101 – Fundamentals of Data Science

09 Oct 2023

The old data science Venn diagram

  • This diagram (Conway 2010) is quite outdated, but today, it will prove helpful to think about the computer skills required in data science projects.
  • We will focus on elements of these “hacking” skills today.
  • Hands-on in-class exercises

Computational Thinking

What is computational thinking?

The mental skills and practices related to the following aspects of computing:

1️⃣ Designing computations that get computers to do the jobs for us

2️⃣ Explaining and interpreting the world as a complex information process

1️⃣ Designing computations

We usually have to think like computers to design the proper computations for the job.

  • Input: numbers, symbols, lists
  • Output: a solution
  • Computation: deterministic calculations & symbolic manipulation

2️⃣ Explaining and interpreting the world

We can use computers to create models of the world.

Let us look at a simple, old example of a computational model for urban segregation (McCown 2014) ➡️

What is computational thinking?

Today our focus is on the first aspect:

1️⃣ Designing computations that get computers to do the jobs for us

2️⃣ Explaining and interpreting the world as a complex information process

Algorithms

Recipes for computers

flowchart LR
  A(Input) --> B{Set of instructions/rules</br>to obtain the expected</br> output from the given input}
  B --> C(Output)

Algorithms

  • Clear and unambiguous: Each step of the algorithm should have one unambiguous meaning
  • Finiteness: The algorithm is finite i.e it terminates after a finite time
  • Language independent: the algorithm is composed of a set of instructions that can be implemented in any programming language and still lead to the same outcome
  • Feasible: The algorithm must be simple, generic and practical so to be executed with the resources available. It cannot depend on future or non-existing technologies.

graph LR
  B((Algorithm characteristics))-->A(Well-defined inputs)
  B--> C(Well-defined outputs)
  B-->D(Clear and unambiguous)
  B-->E(Finiteness)  
  B-->F(Language-independent)
  B-->G(Feasible)

Let’s look at an example

Problem Definition:

Whenever I receive a list of numbers, I want to ensure this list is ordered.

Breakdown

  • Input:

    [10, 20, 42, 29, 50]

  • Output:

    [10, 20, 29, 42, 50]

  • Computation:

    How would you solve it❓

🎯 Action Point

  • Group up with the people of your respective tables and try to come up with a recipe that solves the problem, regardless of the list size.

  • Test that your recipe works for the following list:

[237, 153, 311, 33, 854, 212, 368, 42, 892, 755]

Time for a break 🍵

After the break:

  • An another algorithm example
  • R vs Python
  • Computational Notebooks
  • Example from Kaggle

(no slides, all live demos)

Time to look at another example of algorithm

graph LR
A(A) ---|2| B(B)
B ---|5| D(D)
A ---|6| C(C)
C ---|8| D
D ---|10| E(E)
D ---|15| F(F)
F ---|6| E
E ---|2| G(G)
F ---|6| G

Graph:

  • Nodes (A, B,C,…)
  • Edges (Undirected, weighted)

🎯 How do we get the shortest path between A and all the other nodes in this graph (i.e between A and B, A and D, A and G, etc…?
Pair with the people in your respective tables and discuss.

One possible solution: The Dijkstra algorithm

Step 1: Mark the source node with a current distance of 0 and the rest with infinity

Step 2: Set the non-visited node with the smallest current distance, as the current node, let’s say c (initially c is set to the source node).

Step 3: For each neighbour N of the current node c: add the current distance of c with the weight of the edge c-N. If it is smaller than the current distance of N, set it as the new current distance of N.

Step 4: Mark c as visited.

Step 5: Go to Step 2 if there are any unvisited nodes.

Applying Dijkstra on our example graph

graph LR
A(A) ---|2| B(B)
B ---|5| D(D)
A ---|6| C(C)
C ---|8| D
D ---|10| E(E)
D ---|15| F(F)
F ---|6| E
E ---|2| G(G)
F ---|6| G

Source node: A

  • Initially :
A B C D E F G
Distance 0 \(\infty\) \(\infty\) \(\infty\) \(\infty\) \(\infty\) \(\infty\)

Unvisited nodes: {A,B,C,D,E,F,G}

  • Updating distance of nodes adjacent to A i.e B and C
A B C D E F G
Distance 0 2 6 \(\infty\) \(\infty\) \(\infty\) \(\infty\)

Closest node to source node (based on current distances): B

nodes on path: {A, B}

Unvisited nodes: {A,B,C,D,E,F,G}

  • Next : C and D (adjacent to A and B)
A B C D E F G
Distance 0 2 6 7 \(\infty\) \(\infty\) \(\infty\)

Unvisited nodes: {A,B,C,D,E,F,G}

Applying Dijkstra on our example graph

graph LR
A(A) ---|2| B(B)
B ---|5| D(D)
A ---|6| C(C)
C ---|8| D
D ---|10| E(E)
D ---|15| F(F)
F ---|6| E
E ---|2| G(G)
F ---|6| G

  • Next node : D (adjacent to B and C)
A B C D E F G
Distance 0 2 6 7 (via B not C 14) \(\infty\) \(\infty\) \(\infty\)

Unvisited nodes: {A,B,C,D,E,F,G}

  • Next : nodes E and F
A B C D E F G
Distance 0 2 6 7 (via B not C 14) 17 (A–B–D–E) 22 (A–B–D–F) \(\infty\)

Unvisited nodes: {A,B,C,D,E,F,G}

Applying Dijkstra to our graph example

graph LR
A(A) ---|2| B(B)
B ---|5| D(D)
A ---|6| C(C)
C ---|8| D
D ---|10| E(E)
D ---|15| F(F)
F ---|6| E
E ---|2| G(G)
F ---|6| G

  • Next : nodes F and G
A B C D E F G
Distance 0 2 6 7 (via B not C 14) 17 (A–B–D–F) 22 (A–B–D–E) 19 (A–B–D–F–G)

For F: distance unchanged since second option (A–B–D–F–E) more costly (23 instead of 22)

Unvisited nodes: {A,B,C,D,E,F,G}

  • Last unvisited node: F
A B C D E F G
Distance 0 2 6 7 (via B not C 14) 17 (A–B–D–E) 22 (A–B–D–F) 19 (A–B–D–F–G)

Possibilities: 22 (A–B–D–F), 23 (A–B–D–E–F), 25 (A–B–D–E–G–F)

Previous distance from (A–B–D–F) unchanged since lowest

No more unvisited nodes

Final shortest distance calculation from A to other nodes

A B C D E F G
Distance 0 2 6 7 17 22 19

Use cases for shortest path algorithms (and Dijkstra algorithm)

  • find optimal directions between locations (e.g Google Maps-see here for a further explanation)/GPS Navigation: considering real-traffic conditions, calculate the shortest path between a source and destination in navigation systems
  • determine the shortest path between network nodes, helping in efficiently routing data packets (used in computer networks but also telephone networks)
  • recommendation systems: find the most relevant and shortest path between items or users
  • social networks: suggest a list of friends that a user may know

For other applications, see also Schulz, Wagner, and Weihe (2000), Rosyidi et al. (2014) and Ahdan and Setiawansyah (2021)

References

Ahdan, Syaiful, and Setiawansyah Setiawansyah. 2021. “Android-Based Geolocation Technology on a Blood Donation System (BDS) Using the Dijkstra Algorithm.” IJAIT (International Journal of Applied Information Technology), 1–15.
Conway, Drew. 2010. “The Data Science Venn Diagram.” Drew Conway. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram.
Denning, Peter J., and Matti Tedre. 2019. Computational Thinking. The MIT Press Essential Knowledge Series. Cambridge, Massachusetts: The MIT Press.
HackerEarth. 2023. “Bubble Sort Visualize Sorting Algorithms.” HackerEarth. https://www.hackerearth.com/practice/algorithms/sorting/bubble-sort/visualize/.
McCown, Frank. 2014. “Schelling’s Model of Segregation.” Schelling’s Model of Segregation. http://nifty.stanford.edu/2014/mccown-schelling-model-segregation/.
Rosyidi, Lukman, Hening Pram Pradityo, Dedi Gunawan, and Riri Fitri Sari. 2014. “Timebase Dynamic Weight for Dijkstra Algorithm Implementation in Route Planning Software.” In 2014 International Conference on Intelligent Green Building and Smart Grid (IGBSG), 1–4. IEEE.
Schulz, Frank, Dorothea Wagner, and Karsten Weihe. 2000. “Dijkstra’s Algorithm on-Line: An Empirical Case Study from Public Railroad Transport.” Journal of Experimental Algorithmics (JEA) 5: 12–es.