👨‍🏫 Week 04 - Lecture
DS202 - Data Science for Social Scientists
Topic: Resampling methods
There are no slides this week! Instead, you will form groups (of five people) and we will run a couple of live experiments together.
- 💻 Bring your laptop or team up with someone who will bring one!
How this is going to work
Head over to 📥 W04 Lecture Files (on Moodle) to download the files you will need for this lecture.
You will form groups of five people to work on getting the "best" model for your specific dataset by following a set of tasks marked as 🎯 ACTION POINT in the RMarkdown.
As you work through the tasks, you will send me your responses on Slack and I will update the tables below live during the lecture. The final outcome will be published on the website afterwards.
The cross-validation setup
How did I create the datasets you are using in this lecture?
Step 1:
First, I selected the Wage dataset from the ISLR2 package in R and randomly split it into two subsets:
- \(10\%\) for an external validation set: a portion of the data that I have kept hidden from everyone.
- \(90\%\) available for model training, which I further distributed among the groups. I call this the internal validation set.
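In R, this first split can be sketched as follows. The seed and object names below are my own illustration, not the ones used to build the class files:

```r
# Step 1 sketch: split Wage into a hidden 10% and a shared 90%.
# Falls back to simulated data if the ISLR2 package is not installed.
if (requireNamespace("ISLR2", quietly = TRUE)) {
  wage_df <- ISLR2::Wage
} else {
  wage_df <- data.frame(wage = rnorm(3000))  # stand-in with the same row count
}

set.seed(42)  # illustrative seed; the one used in class is not shared
n <- nrow(wage_df)
external_idx <- sample(n, size = round(0.10 * n))

external <- wage_df[external_idx, , drop = FALSE]   # 300 rows, kept hidden
internal <- wage_df[-external_idx, , drop = FALSE]  # 2700 rows, given to groups
```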
Step 2:
I then took the internal validation set and further split it into five random subsets:
Fold1
Fold2
Fold3
Fold4
Fold5
We call each of these subsets a fold. Therefore, we have 5 folds of data.
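One way to assign rows to folds in base R (illustrative; not necessarily how the class files were generated):

```r
# Step 2 sketch: assign each of the 2700 internal rows to one of five folds.
set.seed(42)                                        # illustrative seed
n_internal <- 2700
fold <- sample(rep(1:5, length.out = n_internal))   # shuffled fold labels
table(fold)                                         # 540 rows per fold
```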
Step 3:
The point of doing this is to perform cross-validation. The goal is to answer the following question:
How well does our model perform on data it has not seen yet?
Notice that this goes beyond assessing goodness-of-fit. Instead of focusing on how well our model fits the current data, we ask whether it would still generalise should we receive new data.
How do we do that? We run the same model on different subsets of the data, each time holding out one of the folds for testing.
For example:
Train
Train
Train
Train
Test
Step 4:
Since I have split the data into five folds, I could train and test algorithms using five different splits of my data. This 5-fold cross-validation looks like this:
Split 1:
Test
Train
Train
Train
Train
Split 2:
Train
Test
Train
Train
Train
Split 3:
Train
Train
Test
Train
Train
Split 4:
Train
Train
Train
Test
Train
Split 5:
Train
Train
Train
Train
Test
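The whole rotation can be written as a single loop. The sketch below uses simulated stand-in data and a logistic regression with a made-up predictor; in the lecture you would use your group's split of the internal Wage data instead:

```r
# 5-fold cross-validation sketch with simulated stand-in data.
set.seed(42)
internal <- data.frame(
  age = rnorm(2700, mean = 45, sd = 10),
  above150k = factor(ifelse(runif(2700) < 0.12, "Yes", "No"),
                     levels = c("No", "Yes"))
)
fold <- sample(rep(1:5, length.out = nrow(internal)))

accuracy <- numeric(5)
for (k in 1:5) {
  train <- internal[fold != k, ]  # four folds (~2160 rows) for training
  test  <- internal[fold == k, ]  # the held-out fold (~540 rows) for testing
  model <- glm(above150k ~ age, data = train, family = binomial)
  prob  <- predict(model, newdata = test, type = "response")
  pred  <- ifelse(prob > 0.5, "Yes", "No")
  accuracy[k] <- mean(pred == test$above150k)
}
accuracy        # one estimate per split
mean(accuracy)  # the cross-validated estimate
```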
Each group will work on a separate split of this data. You will train your models on 80% of the data, evaluate the goodness-of-fit on the training data, and then assess performance on the test data. Once you find a model whose performance is good and balanced between training and test data, we will see how well it performs on the external validation set.
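All the statistics requested in the action points below can be derived from a single confusion matrix at a chosen threshold. A sketch (the function and argument names are my own, not from the class files):

```r
# Compute the classification statistics reported in the tables below.
# `prob` = predicted probabilities of "Yes"; `truth` = observed labels.
classification_stats <- function(prob, truth, threshold) {
  pred  <- factor(ifelse(prob >= threshold, "Yes", "No"), levels = c("No", "Yes"))
  truth <- factor(truth, levels = c("No", "Yes"))
  cm <- table(Predicted = pred, Actual = truth)
  TP <- cm["Yes", "Yes"]; TN <- cm["No", "No"]
  FP <- cm["Yes", "No"];  FN <- cm["No", "Yes"]
  precision <- TP / (TP + FP)
  recall    <- TP / (TP + FN)   # recall is the same quantity as TPR
  c(accuracy  = (TP + TN) / sum(cm),
    TNR       = TN / (TN + FP),
    TPR       = recall,
    precision = precision,
    F1        = 2 * precision * recall / (precision + recall))
}
```

Evaluating this function over a grid of thresholds (say, `seq(0.05, 0.95, by = 0.005)`) and keeping the one that maximises the F1-score is one way to approach the "best threshold" action points.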
The action points below will be updated during the lecture:
🎯 ACTION POINT 1: Check if your numbers match
🗂️ DATASET 1
Distribution of above150k:
- Training: No = 1910, Yes = 250
- Test: No = 473, Yes = 67
🗂️ DATASET 2
Distribution of above150k:
- Training: No = 1900, Yes = 260
- Test: No = 483, Yes = 57
🗂️ DATASET 3
Distribution of above150k:
- Training: No = 1904, Yes = 256
- Test: No = 479, Yes = 61
🗂️ DATASET 4
Distribution of above150k:
- Training: No = 1915, Yes = 245
- Test: No = 468, Yes = 72
🗂️ DATASET 5
Distribution of above150k:
- Training: No = 1903, Yes = 257
- Test: No = 480, Yes = 60
🗂️ EXTERNAL DATA
Distribution of above150k (just for your knowledge):
- No = 274, Yes = 26
🎯 ACTION POINT 2 & 3: Tell us your best threshold (training data)
DATASET 1
🗣️ Sofie
- Best threshold = $$
(Training Stats)
- Accuracy = $$
- TNR = $$
- TPR = $$
- Precision = $$
- Recall = $$
- F1-score = $$
🗣️ Yujia
- Best threshold = \(0.245\)
(Training Stats)
- Accuracy = \(82.41 \%\)
- TNR = \(84.97 \%\)
- TPR = \(62.80 \%\)
- Precision = \(35.36 \%\)
- F1-score = \(0.4524496\)
DATASET 2
🗣️ Vansh
- Best threshold = \(0.2\)
(Training Stats)
- Accuracy = \(79.44 \%\)
- TNR = \(80.58 \%\)
- TPR = \(71.15 \%\)
- Precision = \(33.39 \%\)
- F1-score = \(0.4545\)
🗣️ Ekki
- Best threshold = $$
(Training Stats)
- Accuracy = $$
- TNR = $$
- TPR = $$
- Precision = $$
- Recall = $$
- F1-score = $$
DATASET 3
🗣️ Yoyo
- Best threshold = \(0.23\)
(Training Stats)
- Accuracy = \(81.85 \%\)
- TNR = \(84.40 \%\)
- TPR = \(62.89 \%\)
- Precision = \(35.15 \%\)
- F1-score = \(0.4509804\)
🗣️ Diljot
- Best threshold = $$
(Training Stats)
- Accuracy = $$
- TNR = $$
- TPR = $$
- Precision = $$
- Recall = $$
- F1-score = $$
DATASET 4
🗣️ Ashley
- Best threshold = \(0.23\)
(Training Stats)
- Accuracy = \(83.94 \%\)
- TNR = \(86.95 \%\)
- TPR = \(60.41 \%\)
- Precision = \(37.19 \%\)
- F1-score = \(0.4603421\)
🗣️ Lisa
- Best threshold = $$
(Training Stats)
- Accuracy = $$
- TNR = $$
- TPR = $$
- Precision = $$
- Recall = $$
- F1-score = $$
DATASET 5
🗣️ Paul Keenan
- Best threshold = \(0.245\)
(Training Stats)
- Accuracy = \(82.59259 \%\)
- TNR = \(85.54913 \%\)
- TPR (Recall) = \(60.70039 \%\)
- Precision = \(36.1949 \%\)
- F1-score = \(0.4534884\)
🗣️ Andres
- Best threshold = \(0.25\)
(Training Stats)
- Accuracy = \(83.01 \%\)
- TNR = \(86.50 \%\)
- TPR (Recall) = \(57.20 \%\)
- Precision = \(36.39 \%\)
- F1-score = \(0.4447806\)
🎯 ACTION POINT 4 & 5: What is the best threshold for both training and test data?
DATASET 1
🗣️ <Person>
- Best threshold = $$
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
DATASET 2
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
DATASET 3
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
DATASET 4
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
DATASET 5
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🗣️ <Person>
- Best threshold = $ $ vs $ $
(Training vs Test)
- Accuracy = $ $ vs $ $
- TNR = $ $ vs $ $
- TPR = $ $ vs $ $
- Precision = $ $ vs $ $
- Recall = $ $ vs $ $
- F1-score = $ $ vs $ $
🎯 ACTION POINT 6 & 7: What about the external dataset?
You will be asked to upload your model to Slack. I will then run your model on the external data and report back the results!