πŸ‘¨β€πŸ« Week 04 - Lecture

DS202 - Data Science for Social Scientists

Published

21 October 2022

Topic: Resampling methods

There are no slides this week! Instead, you will form groups (of five people) and we will run a couple of live experiments together.

  • 💻 Bring your laptop or team up with someone who will bring one!

How this is going to work

  1. Head over to 📥 W04 Lecture Files (on Moodle) to download the files you will need for this lecture.

  2. You will form groups of five people to work on getting the “best” model for your specific dataset by following a set of tasks marked as 🎯 ACTION POINT in the RMarkdown.

  3. As you work on the solutions, you will send me the responses on Slack and I will update the tables below live during the lecture. The final outcome will be available on the website later.

  • You can read all about resampling methods in our textbook, more specifically (James et al. 2021, chap. 5).
  • The RMarkdown you will use during the lecture will help you revise later, too.

The cross-validation setup

How did I create the datasets you are using in this lecture?

Step 1:

First, I selected the dataset Wage from the ISLR2 package in R and randomly split it into two subsets:

  • \(10\%\) for the external validation set: this is a portion of the data that I have kept hidden from everyone.
  • \(90\%\) available for model training, which I further distributed to the groups. I call this the internal validation set.
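A minimal sketch of how such a split could be produced in R. The seed and object names are illustrative choices, not necessarily the exact code I ran:

library(ISLR2)  # provides the Wage dataset

set.seed(1)  # illustrative seed

n <- nrow(Wage)
external_idx <- sample(seq_len(n), size = round(0.10 * n))

external_validation <- Wage[external_idx, ]   # 10%, kept hidden from everyone
internal_validation <- Wage[-external_idx, ]  # 90%, distributed to the groups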

Step 2:

I then took the internal validation set and further split it into five random subsets:

Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5

We call each of these subsets a fold. Therefore, we have 5 folds of data.
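One simple way to create such folds in R is to shuffle fold labels across the rows of the internal validation set. A minimal sketch, continuing from the split above (the seed is again illustrative):

set.seed(2)  # illustrative seed
k <- 5
fold_id <- sample(rep(1:k, length.out = nrow(internal_validation)))

table(fold_id)  # roughly equal fold sizes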

Step 3:

The point of doing this is to perform cross-validation. The goal is to answer the following question:

How well does our model perform on data it has not seen yet?

Notice that this goes beyond assessing goodness-of-fit. Instead of focusing on how well our model fits the current data, we ask whether it would still generalise should we receive new data.

How do we do that? We can fit the same model on different subsets of the data, holding out one of the folds for testing.

For example:

Train | Train | Train | Train | Test

Step 4:

Since I have split the data into five folds, I can train and test algorithms using five different splits of my data. This is what 5-fold cross-validation looks like:

Split 1:  Test  | Train | Train | Train | Train
Split 2:  Train | Test  | Train | Train | Train
Split 3:  Train | Train | Test  | Train | Train
Split 4:  Train | Train | Train | Test  | Train
Split 5:  Train | Train | Train | Train | Test
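Putting the pieces together, the loop below sketches this 5-fold procedure for an illustrative logistic regression. The outcome above150k (assumed here to be derived as wage > 150) and the predictors are my own examples; internal_validation and fold_id come from the sketches above:

# derive the binary outcome (assumption: wage above 150 thousand dollars)
internal_validation$above150k <- factor(
  ifelse(internal_validation$wage > 150, "Yes", "No")
)

cv_accuracy <- numeric(k)
for (i in 1:k) {
  # fold i is held out for testing; the other four folds are used for training
  train_folds <- internal_validation[fold_id != i, ]
  test_fold   <- internal_validation[fold_id == i, ]

  # an illustrative model; your group's task is to find a better one
  model <- glm(above150k ~ age + education, data = train_folds, family = binomial)

  prob <- predict(model, newdata = test_fold, type = "response")
  pred <- ifelse(prob > 0.5, "Yes", "No")

  cv_accuracy[i] <- mean(pred == test_fold$above150k)
}

cv_accuracy        # one test-set accuracy per split
mean(cv_accuracy)  # the cross-validated estimate of out-of-sample accuracy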

Each group will work on a separate split of this data. You will train your models on 80% of the data, evaluate goodness-of-fit on the training data, then assess performance on the test data. Once you find a model whose performance is good and balanced between training and test data, we will see how well it performs on the external validation set.

The action points below will be updated during the lecture:

🎯 ACTION POINT 1: Check if your numbers match
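If you want to verify the counts in R, a table() call on the outcome column is all you need. The file names below are placeholders; use whatever your group downloaded from Moodle:

training_data <- read.csv("dataset1_training.csv")  # hypothetical file name
test_data     <- read.csv("dataset1_test.csv")      # hypothetical file name

table(training_data$above150k)  # should match the training counts below
table(test_data$above150k)      # should match the test counts below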


πŸ—„οΈ DATASET 1

Distribution of above150k

(Training)

  No  Yes 
1910  250 

(Test)

 No Yes 
473  67 
πŸ—„οΈ DATASET 2

Distribution of above150k

(Training)

  No  Yes 
1900  260 

(Test)

 No Yes 
483  57 
πŸ—„οΈ DATASET 3

Distribution of above150k

(Training)

  No  Yes 
1904  256 

(Test)

 No Yes 
479  61 
πŸ—„οΈ DATASET 4

Distribution of above150k

(Training)

  No  Yes 
1915  245 

(Test)

 No Yes 
468  72  
πŸ—„οΈ DATASET 5

Distribution of above150k

(Training)

  No  Yes 
1903  257 

(Test)

 No Yes 
480  60 
πŸ—„οΈ EXTERNAL DATA

Distribution of above150k

Just for your knowledge:

 No Yes 
274  26 

🎯 ACTION POINT 2 & 3: Tell us your best threshold (training data)
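For reference, the sketch below shows one way to compute these training statistics at a candidate threshold, reusing the illustrative model and training data from the cross-validation sketch above; the threshold value is only an example:

threshold <- 0.25  # an example; your task is to search for a better value

prob <- predict(model, newdata = train_folds, type = "response")
pred <- factor(ifelse(prob > threshold, "Yes", "No"), levels = c("No", "Yes"))

cm <- table(Predicted = pred, Actual = train_folds$above150k)
TP <- cm["Yes", "Yes"]; TN <- cm["No", "No"]
FP <- cm["Yes", "No"];  FN <- cm["No", "Yes"]

accuracy  <- (TP + TN) / sum(cm)
TNR       <- TN / (TN + FP)  # true negative rate (specificity)
TPR       <- TP / (TP + FN)  # true positive rate, i.e. recall
precision <- TP / (TP + FP)
f1        <- 2 * precision * TPR / (precision + TPR)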

DATASET 1

πŸ—£οΈ Sofie

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

πŸ—£οΈ Yujia

  • Best threshold = \(0.245\)

(Training Stats)

  • Accuracy = \(82.41 \%\)
  • TNR = \(84.97 \%\)
  • TPR = \(62.80 \%\)
  • Precision = \(35.36 \%\)
  • F1-score = \(0.4524496\)

DATASET 2

πŸ—£οΈ Vansh

  • Best threshold = \(0.2\)

(Training Stats)

  • Accuracy = \(79.44 \%\)
  • TNR = \(80.58 \%\)
  • TPR = \(71.15 \%\)
  • Precision = \(33.39 \%\)
  • F1-score = \(0.4545\)

πŸ—£οΈ Ekki

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

DATASET 3

πŸ—£οΈ Yoyo

  • Best threshold = \(0.23\)

(Training Stats)

  • Accuracy = \(81.85 \%\)
  • TNR = \(84.40 \%\)
  • TPR = \(62.89 \%\)
  • Precision = \(35.15 \%\)
  • F1-score = \(0.4509804\)

πŸ—£οΈ Diljot

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

DATASET 4

πŸ—£οΈ Ashley

  • Best threshold = \(0.23\)

(Training Stats)

  • Accuracy = \(83.94 \%\)
  • TNR = \(86.95 \%\)
  • TPR = \(60.41 \%\)
  • Precision = \(37.19 \%\)
  • F1-score = \(0.4603421\)

πŸ—£οΈ Lisa

  • Best threshold = $$

(Training Stats)

  • Accuracy = $$
  • TNR = $$
  • TPR = $$
  • Precision = $$
  • Recall = $$
  • F1-score = $$

DATASET 5

πŸ—£οΈ Paul Keenan

  • Best threshold = \(0.245\)

(Training Stats)

  • Accuracy = \(82.59259 \%\)
  • TNR = \(85.54913 \%\)
  • TPR (Recall) = \(60.70039 \%\)
  • Precision = \(36.1949 \%\)
  • F1-score = \(0.4534884\)

πŸ—£οΈ Andres

  • Best threshold = \(0.25\)

(Training Stats)

  • Accuracy = \(83.01 \%\)
  • TNR = \(86.50 \%\)
  • TPR (Recall) = \(57.20 \%\)
  • Precision = \(36.39 \%\)
  • F1-score = \(0.4447806\)
🎯 ACTION POINT 4 & 5: What is the best threshold for both training and test data?

DATASET 1

πŸ—£οΈ <Person>

  • Best threshold = $$

(Training vs Test)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 2

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 3

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 4

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
DATASET 5

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $

πŸ—£οΈ <Person>

  • Best threshold = $ $ vs $ $

(Training Stats)

  • Accuracy = $ $ vs $ $
  • TNR = $ $ vs $ $
  • TPR = $ $ vs $ $
  • Precision = $ $ vs $ $
  • Recall = $ $ vs $ $
  • F1-score = $ $ vs $ $
🎯 ACTION POINT 6 & 7: What about the external dataset?


You will be asked to upload your model to Slack. I will then run your model on the external data and report back the results!
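A minimal sketch of how this hand-over could work using saveRDS()/readRDS(); the file name and threshold are illustrative:

# your side: save the fitted model and upload the file to Slack
saveRDS(model, "group1_model.rds")

# my side, roughly: load the model and score the hidden external data
# (assuming above150k has been derived for the external set as well)
your_model <- readRDS("group1_model.rds")
prob <- predict(your_model, newdata = external_validation, type = "response")
pred <- ifelse(prob > 0.25, "Yes", "No")  # evaluated at your chosen threshold
mean(pred == external_validation$above150k)  # accuracy on the external set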

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Second edition. Springer Texts in Statistics. New York NY: Springer. https://www.statlearning.com/.