Week 08 - Lab Roadmap (90 min)
Comparing models and model evaluation
Learning Objectives
By the end of this lab, you will be able to:
- Evaluate models using the bias-variance tradeoff
- Use the tune_grid() function to do a grid search
- Fit support vector machines
- Use ensemble methods

This week you can still use ChatGPT if you like, but you will not be asked to do so. Go about the lab as usual, freely interacting with others, the lab material, the Web, or any other resource you may find useful.
The only thing we ask is that you fill out the brief survey at the end of the lab: link, and when asked whether you were asked to use ChatGPT, please answer no.
Thanks for being a GENIAL participant!
Preparation
We will be using the same World Values Survey dataset as in last week's lab.
Use the link below to download the lab materials:
We will post solutions to Part III on Tuesday afternoon, only after all labs have ended.
Lab Tasks
Here are the instructions for this lab:
Import required libraries:
# Tidyverse packages we will use
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
# Tidymodels packages we will use
library(rsample)
library(yardstick)
library(parsnip)
library(recipes)
library(workflows)
library(rpart)
library(tune)
# New packages for SVM
library(LiblineaR)
library(kernlab)
Read the data set:
It is the dataset you downloaded last week.
# Modify the filepath if needed
<- "data/WVS_Wave7_modified.csv"
filepath <- read_csv(filepath) wvs_data
Part I - Explore decision trees, overfitting, and the bias-variance tradeoff (20 min)
In this lab, we'll learn how to assess overfitting in decision trees, compare different models, and evaluate a model with respect to the bias-variance tradeoff. Let's begin by going back to decision trees.
TEACHING MOMENT:
(Your class teacher will guide you through this section. Just run all the code chunks below together with your class teacher.)
Our goal in this lab is to explore diverse models and different methods that help evaluate whether a model is over- or under-fitting a dataset. We will do this with models that predict the same variable we created in last week's lab: someone's trust in institutions (i.e., police, army, justice courts, press and television, labor unions, civil services, political parties, parliament, and government), represented in our dataset by the variable wvs_data["TRUST_INSTITUTIONS"]. See last week's lab for a refresher on the logic behind how we created this variable.
ACTION POINTS:
- Create the TRUST_INSTITUTIONS column by running the following code.
wvs_data <-
  wvs_data %>%
  rowwise() %>%
  mutate(MEAN_I_TRUST_INSTITUTIONS = mean(c(I_TRUSTARMY, I_TRUSTCIVILSERVICES, I_TRUSTPOLICE, I_TRUSTCOURTS,
                                            I_TRUSTPRESS, I_TRUSTTELEVISION, I_TRUSTUNIONS, I_TRUSTGVT,
                                            I_TRUSTPARTIES, I_TRUSTPARLIAMENT), na.rm = TRUE),
         TRUST_INSTITUTIONS = (1 - MEAN_I_TRUST_INSTITUTIONS) > 0.5,
         TRUST_INSTITUTIONS = factor(TRUST_INSTITUTIONS,
                                     labels = c("No", "Yes"),
                                     levels = c(FALSE, TRUE))) %>%
  ungroup()
- Remove the columns we used to compute the TRUST_INSTITUTIONS column.
<- c("I_TRUSTARMY", "I_TRUSTCIVILSERVICES", "I_TRUSTPOLICE", "I_TRUSTCOURTS",
cols_to_remove "I_TRUSTPRESS", "I_TRUSTTELEVISION", "I_TRUSTUNIONS", "I_TRUSTGVT",
"I_TRUSTPARTIES", "I_TRUSTPARLIAMENT", "MEAN_I_TRUST_INSTITUTIONS",
"D_INTERVIEW", "W_WEIGHT", "S018", "Q_MODE", "K_DURATION",
"Q65", "Q67", "Q68", "Q69", "Q70", "Q71", "Q72", "Q73", "Q74",
"Q275","Q276","Q277","Q278","Q275A","Q276A","Q277A","Q278A")
# Filter data to remove unnecessary columns
<-
wvs_data %>%
wvs_data select(-all_of(cols_to_remove))
- Let's again randomly split our dataset into a training set (containing 70% of the rows in our data) and a test set (containing the remaining 30%), and retrieve the resulting training and testing sets.
set.seed(123)
# Randomly split the initial data frame into training and testing sets (70% and 30% of rows, respectively)
split <- initial_split(wvs_data, prop = 0.7)
training_data <- training(split)
testing_data <- testing(split)
- Create a recipe, a decision tree model specification, and wrap them up into a workflow.
# Create a recipe
wvs_rec <-
  recipe(TRUST_INSTITUTIONS ~ .,
         data = training_data)
# (the recipe is left untrained: the workflow will prep it when it is fitted)

# Create the specification of a model
dt_spec <-
  decision_tree(mode = "classification",
                # You must specify the parameter you want to tune
                min_n = tune()) %>%
  set_engine("rpart")

wflow <-
  workflow() %>%
  add_recipe(wvs_rec) %>%
  add_model(dt_spec)
TEACHING MOMENT: Your class teacher will briefly explain the concept of the bias-variance tradeoff.
- The bias-variance tradeoff is an essential concept to consider when choosing a model. Bias describes the difference between the model's average prediction and the true value. A model with high bias is said to underfit the data: it makes simplistic assumptions about the training data, which makes it difficult to learn the underlying pattern. Variance captures how much the model's predictions change from one training set to another, and so relates to how well the model generalises. High variance typically means that we are overfitting to our training data, finding patterns and complexity that are a product of randomness rather than a real trend. Ideally, we are looking for a model with both low bias and low variance.
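For intuition, it can help to see the standard decomposition of expected prediction error (written here for regression with squared error; our task is classification, but the same intuition carries over):

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \mathrm{Bias}\big[\hat{f}(x)\big]^2 \;+\; \mathrm{Var}\big[\hat{f}(x)\big] \;+\; \sigma^2
$$

Here \(\sigma^2\) is irreducible noise that no model can remove. Making a model more flexible typically lowers the bias term but raises the variance term, which is exactly the tradeoff we explore below by varying min_n.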
In order to understand the bias-variance tradeoff, we will vary a hyperparameter of the decision tree to produce more (or less) complex decision tree models for comparison. We will do this using the tune_grid() function from the tune package, which uses grid search to train different models based on the parameter values you have chosen. The hyperparameter we are varying here is min_n: the minimum number of observations a node must contain for it to be split further.
set.seed(234)
folds <- vfold_cv(training_data, v = 5)

ctrl <- control_grid(verbose = FALSE, save_pred = TRUE)

# Create a grid specifying the min_n values we want to try
grid_search <- expand_grid(
  min_n = seq(1, 500, length.out = 5)
)

# This will take a little while
dt_res <- tune_grid(
  wflow,
  # This computes k-fold CV during tuning
  resamples = folds,
  grid = grid_search,
  # Making sure we keep the out-of-sample predictions for each resample during tuning
  control = ctrl
)
If we want to find out which parameter value produced the best model, we can run the following command:
show_best(dt_res, metric = "roc_auc")
- Plot the values of the roc_auc metric for each value of min_n we tried with the grid search.
dt_res %>%
  autoplot(metric = "roc_auc")
- Now visualise the out-of-sample predictions for each value of min_n we tried - e.g. the average validation-set predictions.
collect_predictions(dt_res) %>%
  group_by(min_n) %>%
  roc_auc(truth = TRUST_INSTITUTIONS, .pred_Yes, event_level = "second") %>%
  ggplot() +
  geom_point(aes(x = min_n, y = .estimate)) +
  geom_line(aes(x = min_n, y = .estimate))
DISCUSSION:
We trained different models using a variety of values for the min_n parameter. Looking at the training vs testing performance at each value, how do you think this parameter changes whether the model overfits or underfits?
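One possible way to follow up on this discussion is to finalise the workflow with the best min_n value and check its performance on the held-out test set. Below is a minimal sketch of that idea; the object names (best_params, final_dt_res) are just illustrative.

# Pick the best min_n value according to ROC AUC
best_params <- select_best(dt_res, metric = "roc_auc")

# Finalise the workflow with that value, fit it on the training set,
# and evaluate it on the test set defined by our initial split
final_dt_res <-
  wflow %>%
  finalize_workflow(best_params) %>%
  last_fit(split)

# Metrics (e.g. accuracy and ROC AUC) computed on the held-out test set
collect_metrics(final_dt_res)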
Part II - Comparing models (30 min)
TEACHING MOMENT: Your class teacher will briefly explain the concept of decision boundaries and how different models produce different decision boundaries.
So far we have looked at decision trees. But how well do other models do at predicting someone's trust in institutions?
Create a new recipe, model specification, and workflow for the support vector machine model. Note that this uses a linear kernel; SVM models can also be used with polynomial kernels.
# SVMs do not accept NAs or categorical variables
# Create a recipe that removes categorical predictors and rows with missing values
wvs_rec <-
  recipe(TRUST_INSTITUTIONS ~ .,
         data = training_data) %>%
  step_rm(all_nominal_predictors()) %>%
  step_naomit(everything(), skip = FALSE)

# Create the specification of a support vector machine (SVM) model
svm_spec <-
  svm_linear(mode = "classification") %>%
  set_engine("LiblineaR")

wflow_svm <-
  workflow() %>%
  add_recipe(wvs_rec) %>%
  add_model(svm_spec)
Now that you have a workflow to fit an SVM model:
- Fit the model to the training set and evaluate it on the test set, calculating an appropriate metric (e.g. a confusion matrix or ROC/AUC). How does it compare to the decision tree? (One possible starting point is sketched after this list.)
- Change a parameter of the SVM model to find out whether it improves the predictions (e.g. confusion matrix, AUC/ROC curve).
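As a starting point for the first task, here is a minimal sketch of one possible approach, assuming the wflow_svm workflow and the training/testing split defined above (the object names svm_fit, testing_complete, and svm_preds are just illustrative). It sticks to class-based metrics, which work regardless of whether the engine returns class probabilities.

# Fit the SVM workflow on the training set
svm_fit <- fit(wflow_svm, data = training_data)

# Drop incomplete rows from the test set so predictions line up with the true labels
# (the recipe's step_naomit() would otherwise drop those rows during prediction)
testing_complete <- testing_data %>% drop_na()

# Collect class predictions alongside the true labels
svm_preds <-
  testing_complete %>%
  select(TRUST_INSTITUTIONS) %>%
  bind_cols(predict(svm_fit, new_data = testing_complete))

# Simple class-based metrics
svm_preds %>% conf_mat(truth = TRUST_INSTITUTIONS, estimate = .pred_class)
svm_preds %>% accuracy(truth = TRUST_INSTITUTIONS, estimate = .pred_class)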
DISCUSSION:
You will have noticed a difference in the pre-processing of our dataframe for the SVM model. What do you think the pros and cons of the various algorithms are, and what assumptions do they make? Given the different pre-processing requirements of different algorithms, does this influence your view on the cases in which they are best used?
(Bonus)
- Train an SVM model using a polynomial kernel. (One possible specification is sketched below.)
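If you attempt this bonus, one possible starting point is parsnip's svm_poly() with the kernlab engine; the degree value below is purely illustrative, and fitting may take a while on this dataset.

# A possible polynomial-kernel SVM specification (degree chosen only for illustration)
svm_poly_spec <-
  svm_poly(mode = "classification", degree = 2) %>%
  set_engine("kernlab")

# Reuse the existing SVM workflow, swapping in the new model specification
wflow_svm_poly <-
  wflow_svm %>%
  update_model(svm_poly_spec)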
Part III - Ensemble methods (40 min)
TEACHING MOMENT: Your class teacher will briefly explain the concept of ensemble methods.
While we can tweak hyperparameters to reduce overfitting and underfitting and so improve the bias-variance tradeoff in decision trees, we also have techniques such as "ensemble methods" which can help to improve modelling results. Ensemble learning improves predictions by combining several models, which can lead to better predictive performance than a single model. The basic idea is to learn a set of classifiers (experts) and to allow them to vote.
Ensemble algorithms such as bootstrap aggregation (bagging) and boosting aim to reduce variance at a small cost in bias for decision trees.
We suggest you have these tabs open in your browser:
- The tidymodels documentation page (you can open tabs with documentation pages for each package if you need to)
- The tidyverse documentation page (you can open tabs with documentation pages for each package if you need to)
This is a model specification for boosting decision trees.
boost_spec <-
  boost_tree(trees = 200, tree_depth = 4) %>%
  set_engine("xgboost") %>%
  # TRUST_INSTITUTIONS is a factor, so we use classification mode
  set_mode("classification")
- Apply boosting to our dataset using a workflow (one possible setup is sketched after this list).
- How does this compare to decision trees?
- Tune the hyperparameters using a grid search to improve your model.
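Here is a minimal sketch of one possible setup for these tasks. It reuses the Part II recipe (wvs_rec), the cross-validation folds and the control object from Part I, and assumes the xgboost package is installed; the grid values and object names are purely illustrative.

# A boosted-tree specification with tunable hyperparameters
boost_tune_spec <-
  boost_tree(trees = tune(), tree_depth = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

wflow_boost <-
  workflow() %>%
  add_recipe(wvs_rec) %>%
  add_model(boost_tune_spec)

# A small grid of candidate values (purely illustrative)
boost_grid <- expand_grid(trees = c(100, 200, 400),
                          tree_depth = c(2, 4, 6))

# This can take a while: 9 parameter combinations x 5 folds
boost_res <- tune_grid(
  wflow_boost,
  resamples = folds,
  grid = boost_grid,
  control = ctrl
)

show_best(boost_res, metric = "roc_auc")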
(Bonus)
- Apply bagging to decision trees to try and reduce overfitting. (One possible setup is sketched below.)
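For this bonus, one option is the baguette package, which provides bagged versions of tree-based models (an assumption: baguette is not loaded in the setup above and may need to be installed first).

# A possible bagged-tree setup using baguette
library(baguette)

bag_spec <-
  bag_tree(mode = "classification") %>%
  set_engine("rpart", times = 25)   # 25 bootstrap resamples of the training data

wflow_bag <-
  workflow() %>%
  add_recipe(wvs_rec) %>%
  add_model(bag_spec)

bag_fit <- fit(wflow_bag, data = training_data)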
Finally, if you are part of GENIAL, don't forget to fill out the brief survey at the end of the lab: link (requires LSE login).