πŸ›£οΈ Week 02 Lab - Roadmap (90 min)

2023/24 Winter Term

Author

Tabtim Duenger

Welcome to the second DS202 lab!

We’ll kick things off with the UK House Prices dataset, updated through to June 2023 (UK Office for National Statistics (ONS) 2023). The main goal is to practice reshaping the original data with tidyr and dplyr and visualising it with ggplot.

🥅 Learning Objectives

  • Configure your working environment for the course, including R, RStudio or VS Code, and Quarto documents.
  • Start using markdown to write your notes and code for lab activities.
  • Learn how to read data from a .csv file into a data frame.
  • Practice consulting and reading the documentation to understand how a function works.
  • Practice using dplyr and tidyselect to manipulate and structure data.
  • Practice using ggplot to create plots to visualise your data.

📋 Lab Tasks

No need to wait! Start reading the tasks and tackling the action points below as soon as you arrive in the classroom.

Part 0: Export your chat logs (~ 3 min)

As part of the GENIAL project, we ask that you fill out the following form as soon as you come to the lab:

🎯 ACTION POINTS

  1. 🔗 CLICK HERE to export your chat log.

    Thanks for being GENIAL! You are now one step closer to earning some prizes! 🎟️

👉 NOTE: You MUST complete the initial form.

If you really don’t want to participate in GENIAL¹, just answer ‘No’ to the Terms & Conditions question; your e-mail address will be deleted from GENIAL’s database the following week.

🛠 Part 1: Data manipulation with dplyr (30 min)

Your goal is to transform the data to look like this:

Table 1. Example of the data frame we want to create. Sorted by date in descending order (most recent month first).

date        region            yearly_change
2023-06-01  England           1.9
2023-06-01  Northern Ireland  2.7
2023-06-01  Scotland          0.0
2023-06-01  United Kingdom    1.7
2023-06-01  Wales             0.6
2023-05-01  England           1.7
2023-05-01  Northern Ireland  2.7
2023-05-01  Scotland          1.9

The steps below will help you think about the problem and guide you through the process of creating the data frame above, but you will have to figure out – by reading the documentation, searching online and experimenting – which functions to use and how to use them.


🎯 ACTION POINTS

The action points below will help you get there. Work in pairs or groups of three. When stuck, try to look at the documentation of the functions you are using. If that doesn’t help, ask your class instructor.

  1. Click on the link below to download the .qmd file for this lab. Save it in the DS202W folder you created last week. If you need a refresher on the setup, refer back to Part II of last week’s lab.

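  2. In your .qmd file, insert a new code chunk and load the required libraries. Either load each package individually:
library(dplyr)       # for data manipulation
library(tidyr)       # for data reshaping
library(readr)       # for reading data
library(lubridate)   # for working with dates
library(tidyselect)  # for selecting columns
library(ggplot2)     # for plotting (Part 2)

or load the core tidyverse, which attaches dplyr, tidyr, readr, ggplot2, lubridate and more in one go. tidyselect is not among them (dplyr re-exports its helpers, such as starts_with()), so load it separately if you want it attached:
library(tidyverse)   # core tidyverse packages
library(tidyselect)  # for selecting columns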
  3. Create a folder called data inside your project folder. This is where you will save the data file you download in the next step.

  4. The following code downloads the UK House Prices data up to June 2023 as a .csv file and places it in the data folder you created in the previous step. Copy it into your .qmd file and run it.

url <- "http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2023-06.csv"
download.file(url, "data/UK-HPI-full-file-2023-06.csv")

Alternatively, you can click here to download the file directly from the website.
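Every time you re-render the whole .qmd file, the download chunk fetches the file again. A small wrapper such as the hypothetical `download_if_missing()` below (our own helper, not part of any package; base R only) creates the destination folder if needed and skips the download when the file is already there. It is a sketch, not a required step.

```r
# Hypothetical helper (not from any package): download `url` to `dest` only if
# `dest` does not exist yet, creating the destination folder when needed.
download_if_missing <- function(url, dest) {
  # recursive = TRUE also creates parent folders; no warning if it already exists
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    download.file(url, dest)
  } else {
    message("Skipping download: ", dest, " already exists")
  }
  invisible(dest)
}
```

With `url` defined as in the chunk above, you would call `download_if_missing(url, "data/UK-HPI-full-file-2023-06.csv")`.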

  5. In a separate code chunk, read the data from the .csv file into a data frame, using the read_csv() function from the readr package.
uk_hpi <- readr::read_csv("data/UK-HPI-full-file-2023-06.csv")
  6. Inspect the data frame using the glimpse() function from the dplyr package.
glimpse(uk_hpi)
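If glimpse() shows a date column stored as text (`<chr>`) rather than as a date, sorting by it will not be chronological. The sketch below assumes a column called `Date` holding day/month/year strings such as 01/06/2023; both are assumptions, so check your own glimpse() output first. It reads a tiny made-up CSV so that it runs on its own, and uses lubridate's dmy() for the conversion.

```r
library(readr)
library(dplyr)
library(lubridate)

# Write a tiny made-up CSV to a temporary file (values are illustrative only).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,RegionName,AveragePrice",
  "01/05/2023,Wales,100",
  "01/06/2023,Wales,101"
), toy_path)

# Read Date as plain text on purpose, then convert it explicitly.
toy <- read_csv(toy_path,
                col_types = cols(Date = col_character(), .default = col_guess()))

toy <- toy %>%
  mutate(Date = dmy(Date))  # "01/06/2023" becomes 2023-06-01 (day/month/year)

glimpse(toy)
```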
  7. 👥 DISCUSS IN PAIRS/GROUPS: Which columns are relevant to our analysis, and how can you select them?

  8. Write code using the select() function from the dplyr package to create a new data frame with the selected columns.

  9. Add another step to your code that renames the columns with the rename() function from the dplyr package, using the template below:
  10. Now use the filter() function from dplyr to keep only the rows for the United Kingdom and its four countries (England, Northern Ireland, Scotland and Wales).
  11. 👥 DISCUSS IN PAIRS/GROUPS: Look at the first 8 rows of your data frame. Do they look exactly like Table 1? If not, why not? Can you spot the problem and fix it?
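To see how select(), rename(), filter() and arrange() chain together, here is a sketch on a tiny made-up data frame. The column names `Date`, `RegionName` and `12m%Change` are assumptions meant to resemble the real file (check them against your glimpse() output), and the numbers are invented. Note the backticks: a name that starts with a digit or contains `%` is not a syntactic R name, so it has to be quoted this way.

```r
library(dplyr)

# Made-up rows standing in for the real data; column names are assumptions.
toy_hpi <- tibble(
  Date         = as.Date(c("2023-05-01", "2023-06-01", "2023-06-01", "2023-06-01")),
  RegionName   = c("Wales", "Wales", "London", "England"),
  AveragePrice = c(100, 101, 250, 150),
  `12m%Change` = c(1.0, 0.5, -1.0, 2.0)
)

uk_countries <- c("United Kingdom", "England", "Scotland", "Wales", "Northern Ireland")

toy_tidy <- toy_hpi %>%
  select(Date, RegionName, `12m%Change`) %>%  # keep only the columns we need
  rename(date = Date,                         # new_name = old_name
         region = RegionName,
         yearly_change = `12m%Change`) %>%
  filter(region %in% uk_countries) %>%        # drops sub-national rows such as London
  arrange(desc(date), region)                 # most recent month first, then A to Z

toy_tidy
```

Sorting on `desc(date)` and then `region` gives the ordering used in Table 1: newest month first, countries alphabetical within each month.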

🧑🏻‍🏫 TEACHING MOMENT: At the end of this section, your class teacher will guide you through a solution before you proceed to the next task.


📋 Part 2: Data visualisation with ggplot (55 min)

In this part, you’ll continue the same investigative work, this time by checking the ggplot2 documentation. See if you can figure out how to create a plot like the one below from the data frame you have just created.

Annual house price change by UK country. Source: (UK HM Land Registry 2023)
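One possible starting point, assuming the figure is a line chart with one coloured line per country; the actual figure may use different styling, so compare and adjust. The data frame here is a made-up stand-in with the same three columns as Table 1 so the sketch runs on its own; swap in your own data frame from Part 1. The dashed zero line and the axis labels are our own choices.

```r
library(ggplot2)

# Made-up stand-in with the same columns as Table 1 (values are invented).
toy_tidy <- data.frame(
  date          = as.Date(rep(c("2023-04-01", "2023-05-01", "2023-06-01"), times = 2)),
  region        = rep(c("England", "Wales"), each = 3),
  yearly_change = c(3, 2, 1, 2, 1, 0)
)

p <- ggplot(toy_tidy, aes(x = date, y = yearly_change, colour = region)) +
  geom_line() +                                      # one line per region
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at no change
  labs(x = NULL, y = "Annual change (%)", colour = NULL,
       title = "Annual house price change by UK country")

p
```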

📋 Bonus task

If you finish early, give the following task a go. The plot below requires a bit more manipulation of the original dataset with dplyr before you can visualise it. The aim is to plot the difference in house prices between the regions shown (Inner London and Outer London) and the United Kingdom as a whole.

Monthly average house price difference in London against the United Kingdom. Source: (UK HM Land Registry 2023)

Hint: Obtaining a table similar to the one below is the first step to reproducing the plot.

Table 2. Example of the data frame we want to create. Sorted by date in descending order (most recent month first).

date        PriceDiffInnerLdn  PriceDiffOuterLdn
2023-06-01  2.15               1.9
2023-05-01  2.15               2.7
2023-04-01  2.15               0.0
2023-03-01  2.17               1.7
2023-02-01  2.17               0.6
2023-01-01  2.18               1.7
2022-12-01  2.16               2.7
2022-11-01  2.15               1.9
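A sketch of the reshaping step on made-up prices. It assumes the raw data has one row per date and region, with regions named `Inner London`, `Outer London` and `United Kingdom` (check the real names with `distinct(uk_hpi, RegionName)`). It also reads the PriceDiff values as each London area's average price divided by the UK average; that interpretation is our assumption, so swap `/` for `-` if you want an absolute difference instead. tidyr's pivot_wider() turns each region into its own column so the comparison becomes a simple mutate().

```r
library(dplyr)
library(tidyr)

# Made-up average prices (not real figures), one row per date and region.
toy_prices <- tibble(
  Date         = as.Date(rep(c("2023-05-01", "2023-06-01"), each = 3)),
  RegionName   = rep(c("Inner London", "Outer London", "United Kingdom"), times = 2),
  AveragePrice = c(600, 500, 300, 630, 480, 300)
)

price_diff <- toy_prices %>%
  pivot_wider(names_from = RegionName, values_from = AveragePrice) %>%  # one column per region
  mutate(
    PriceDiffInnerLdn = `Inner London` / `United Kingdom`,  # assumed: ratio to UK average
    PriceDiffOuterLdn = `Outer London` / `United Kingdom`
  ) %>%
  select(date = Date, PriceDiffInnerLdn, PriceDiffOuterLdn) %>%  # select() can rename too
  arrange(desc(date))                                            # most recent month first

price_diff
```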

References

UK HM Land Registry. 2023. “UK House Price Index Summary: June 2023.” London: UK HM Land Registry. https://www.gov.uk/government/statistics/uk-house-price-index-for-june-2023/uk-house-price-index-summary-june-2023.
UK Office for National Statistics (ONS). 2023. “UK House Price Index: Data Downloads June 2023.” Statistical data set. GOV.UK Statistical Datasets. https://www.gov.uk/government/statistical-data-sets/uk-house-price-index-data-downloads-june-2023.

Footnotes

  1. We’re gonna cry a little bit, not gonna lie. But no hard feelings. We’ll get over it.