πŸ‘¨β€πŸ« Week 04 - Lecture

DS202 - Data Science for Social Scientists

Published

21 October 2022

Topic: Resampling methods

There are no slides this week! Instead, you will form groups (of five people) and we will run a couple of live experiments together.

  • 💻 Bring your laptop or team up with someone who will bring one!

How this is going to work

  1. Head over to 📥 W04 Lecture Files (on Moodle) to download the files you will need for this lecture.

  2. You will form groups of five people to work on getting the “best” model for your specific dataset by following a set of tasks marked as 🎯 ACTION POINT in the RMarkdown.

  3. As you work on the solutions, you will send me the responses on Slack and I will update the tables below live during the lecture. The final outcome will be available on the website later.

  • You can read all about resampling methods in our textbook, more specifically (James et al. 2021, chap. 5).
  • The RMarkdown you will use during the lecture will help you revise later, too.

The cross-validation setup

How did I create the datasets you are using in this lecture?

Step 1:

First, I selected the dataset Wage from the ISLR2 package in R and randomly split it into two subsets:

  • \(10\%\) for the external validation set: this is a portion of the data that I have kept hidden from everyone.
  • \(90\%\) available for model training, which I further distributed to the groups. I call this the internal validation set.
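A minimal sketch of how such a split could be produced in R. The seed and object names are illustrative choices, not necessarily the exact code I ran:

library(ISLR2)  # provides the Wage dataset

set.seed(1)  # illustrative seed

n <- nrow(Wage)
external_idx <- sample(seq_len(n), size = round(0.10 * n))

external_validation <- Wage[external_idx, ]   # 10%, kept hidden from everyone
internal_validation <- Wage[-external_idx, ]  # 90%, distributed to the groups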

Step 2:

I then took the internal validation set and further split it into five random subsets:

Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5

We call each of these subsets a fold. Therefore, we have 5 folds of data.
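One simple way to create such folds in R is to shuffle fold labels across the rows of the internal validation set. A minimal sketch, continuing from the split above (the seed is again illustrative):

set.seed(2)  # illustrative seed
k <- 5
fold_id <- sample(rep(1:k, length.out = nrow(internal_validation)))

table(fold_id)  # roughly equal fold sizes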

Step 3:

The point of doing this is to perform cross-validation. The goal is to answer the following question:

How well does our model perform on data it has not seen yet?

Notice that this goes beyond assessing goodness-of-fit. Instead of focusing on how well our model fits the current data, we ask whether it would still generalise should we receive new data.

How do we do that? We can fit the same model on different subsets of the data, holding out one of the folds for testing.

For example:

Train | Train | Train | Train | Test

Step 4:

Since I have split the data into five folds, I can train and test algorithms using five different splits of my data. This is what 5-fold cross-validation looks like:

Split 1:  Test  | Train | Train | Train | Train
Split 2:  Train | Test  | Train | Train | Train
Split 3:  Train | Train | Test  | Train | Train
Split 4:  Train | Train | Train | Test  | Train
Split 5:  Train | Train | Train | Train | Test
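Putting the pieces together, the loop below sketches this 5-fold procedure for an illustrative logistic regression. The outcome above150k (assumed here to be derived as wage > 150) and the predictors are my own examples; internal_validation and fold_id come from the sketches above:

# derive the binary outcome (assumption: wage above 150 thousand dollars)
internal_validation$above150k <- factor(
  ifelse(internal_validation$wage > 150, "Yes", "No")
)

cv_accuracy <- numeric(k)
for (i in 1:k) {
  # fold i is held out for testing; the other four folds are used for training
  train_folds <- internal_validation[fold_id != i, ]
  test_fold   <- internal_validation[fold_id == i, ]

  # an illustrative model; your group's task is to find a better one
  model <- glm(above150k ~ age + education, data = train_folds, family = binomial)

  prob <- predict(model, newdata = test_fold, type = "response")
  pred <- ifelse(prob > 0.5, "Yes", "No")

  cv_accuracy[i] <- mean(pred == test_fold$above150k)
}

cv_accuracy        # one test-set accuracy per split
mean(cv_accuracy)  # the cross-validated estimate of out-of-sample accuracy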

Each group will work on a separate split of this data. You will train your models on 80% of the data, evaluate goodness-of-fit on the training data, then assess performance on the test data. Once you find a model whose performance is good and balanced between training and test data, we will see how well it performs on the external validation set.

The action points below will be updated during the lecture:

🎯 ACTION POINT 1: Check if your numbers match
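If you want to verify the counts in R, a table() call on the outcome column is all you need. The file names below are placeholders; use whatever your group downloaded from Moodle:

training_data <- read.csv("dataset1_training.csv")  # hypothetical file name
test_data     <- read.csv("dataset1_test.csv")      # hypothetical file name

table(training_data$above150k)  # should match the training counts below
table(test_data$above150k)      # should match the test counts below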


πŸ—„οΈ DATASET 1

Distribution of above150k

(Training)

  No  Yes 
1910  250 

(Test)

 No Yes 
473  67 
πŸ—„οΈ DATASET 2

Distribution of above150k

(Training)

  No  Yes 
1900  260 

(Test)

 No Yes 
483  57 
πŸ—„οΈ DATASET 3

Distribution of above150k

(Training)

  No  Yes 
1904  256 

(Test)

 No Yes 
479  61 
πŸ—„οΈ DATASET 4

Distribution of above150k

(Training)

  No  Yes 
1915  245 

(Test)

 No Yes 
468  72  
πŸ—„οΈ DATASET 5

Distribution of above150k

(Training)

  No  Yes 
1903  257 

(Test)

 No Yes 
480  60 
πŸ—„οΈ EXTERNAL DATA

Distribution of above150k

Just for your knowledge:

 No Yes 
274  26 

🎯 ACTION POINT 2 & 3: Tell us your best threshold (training data)
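For reference, the sketch below shows one way to compute these training statistics at a candidate threshold, reusing the illustrative model and training data from the cross-validation sketch above; the threshold value is only an example:

threshold <- 0.25  # an example; your task is to search for a better value

prob <- predict(model, newdata = train_folds, type = "response")
pred <- factor(ifelse(prob > threshold, "Yes", "No"), levels = c("No", "Yes"))

cm <- table(Predicted = pred, Actual = train_folds$above150k)
TP <- cm["Yes", "Yes"]; TN <- cm["No", "No"]
FP <- cm["Yes", "No"];  FN <- cm["No", "Yes"]

accuracy  <- (TP + TN) / sum(cm)
TNR       <- TN / (TN + FP)  # true negative rate (specificity)
TPR       <- TP / (TP + FN)  # true positive rate, i.e. recall
precision <- TP / (TP + FP)
f1        <- 2 * precision * TPR / (precision + TPR)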

DATASET 1

πŸ—£οΈ Sofie

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

πŸ—£οΈ Yujia

  • Best threshold = \(0.245\)

(Training Stats)

  • Accuracy = \(82.41 \%\)
  • TNR = \(84.97 \%\)
  • TPR = \(62.80 \%\)
  • Precision = \(35.36 \%\)
  • F1-score = \(0.4524496\)

DATASET 2

πŸ—£οΈ Vansh

  • Best threshold = \(0.2\)

(Training Stats)

  • Accuracy = \(79.44 \%\)
  • TNR = \(80.58 \%\)
  • TPR = \(71.15 \%\)
  • Precision = \(33.39 \%\)
  • F1-score = \(0.4545\)

πŸ—£οΈ Ekki

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

DATASET 3

πŸ—£οΈ Yoyo

  • Best threshold = \(0.23\)

(Training Stats)

  • Accuracy = \(81.85 \%\)
  • TNR = \(84.40 \%\)
  • TPR = \(62.89 \%\)
  • Precision = \(35.15 \%\)
  • F1-score = \(0.4509804\)

πŸ—£οΈ Diljot

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

DATASET 4

πŸ—£οΈ Ashley

  • Best threshold = \(0.23\)

(Training Stats)

  • Accuracy = \(83.94 \%\)
  • TNR = \(86.95 \%\)
  • TPR = \(60.41 \%\)
  • Precision = \(37.19 \%\)
  • F1-score = \(0.4603421\)

πŸ—£οΈ Lisa

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

DATASET 5

πŸ—£οΈ Paul Keenan

  • Best threshold = \(0.245\)

(Training Stats)

  • Accuracy = \(82.59259 \%\)
  • TNR = \(85.54913 \%\)
  • TPR (Recall) = \(60.70039 \%\)
  • Precision = \(36.1949 \%\)
  • F1-score = \(0.4534884\)

πŸ—£οΈ Andres

  • Best threshold = \(0.25\)

(Training Stats)

  • Accuracy = \(83.01 \%\)
  • TNR = \(86.50 \%\)
  • TPR (Recall) = \(57.20 \%\)
  • Precision = \(36.39 \%\)
  • F1-score = \(0.4447806\)
🎯 ACTION POINT 4 & 5: What is the best threshold for both training and test data?

DATASET 1

πŸ—£οΈ <Person>

  • Best threshold = $$

(Training vs Test)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 2

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 3

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 4

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 5

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
🎯 ACTION POINT 6 & 7: What about the external dataset?


You will be asked to upload your model to Slack. I will then run your model on the external data and report back the results!
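A minimal sketch of how this hand-over could work using saveRDS()/readRDS(); the file name and threshold are illustrative:

# your side: save the fitted model and upload the file to Slack
saveRDS(model, "group1_model.rds")

# my side, roughly: load the model and score the hidden external data
# (assuming above150k has been derived for the external set as well)
your_model <- readRDS("group1_model.rds")
prob <- predict(your_model, newdata = external_validation, type = "response")
pred <- ifelse(prob > 0.25, "Yes", "No")  # evaluated at your chosen threshold
mean(pred == external_validation$above150k)  # accuracy on the external set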

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Second edition. Springer Texts in Statistics. New York NY: Springer. https://www.statlearning.com/.