Week 02 Lab - Roadmap (90 min)
2023/24 Winter Term
Welcome to the second DS202 lab!
We'll kick things off with the UK House Prices dataset, updated through June 2023 (UK Office for National Statistics (ONS) 2023). The main goal is to practise reshaping the original data with tidyr and dplyr, and then plotting it with ggplot.
Learning Objectives
- Configure your working environment for the course, including R, RStudio or VS Code, and Quarto documents.
- Start using markdown to write your notes and code for lab activities.
- Learn how to read data from a `.csv` file into a data frame.
- Practice consulting and reading the documentation to understand how a function works.
- Practice with `dplyr` and `tidyselect` to manipulate and structure data.
- Practice using `ggplot` to create plots to visualise your data.
Lab Tasks
No need to wait! Start reading the tasks and tackling the action points below when you come to the classroom.
Part 0: Export your chat logs (~ 3 min)
As part of the GENIAL project, we ask that you fill out the following form as soon as you come to the lab:
ACTION POINTS
CLICK HERE to export your chat log.
Thanks for being GENIAL! You are now one step closer to earning some prizes!
NOTE: You MUST complete the initial form.
If you really don't want to participate in GENIAL¹, just answer "No" to the Terms & Conditions question; your e-mail address will be deleted from GENIAL's database the following week.
Part 1: Data manipulation with `dplyr` (30 min)
Your goal is to transform the data to look like this:
date | region | yearly_change |
---|---|---|
2023-06-01 | England | 1.9 |
2023-06-01 | Northern Ireland | 2.7 |
2023-06-01 | Scotland | 0.0 |
2023-06-01 | United Kingdom | 1.7 |
2023-06-01 | Wales | 0.6 |
2023-05-01 | England | 1.7 |
2023-05-01 | Northern Ireland | 2.7 |
2023-05-01 | Scotland | 1.9 |
The steps below will help you think about the problem and guide you through the process of creating the data frame above, but you will have to figure out (by reading the documentation, searching online and experimenting) which functions to use and how to use them.
ACTION POINTS
The action points below will help you get there. Work in pairs or groups of three. When stuck, try to look at the documentation of the functions you are using. If that doesn't help, ask your class instructor.
- Click on the link below to download the `.qmd` file for this lab. Save it in the `DS202W` folder you created last week. If you need a refresher on the setup, refer back to Part II of last week's lab.
- In your `.qmd` file, insert a new code chunk and import the required libraries:
library(dplyr)      # for data manipulation
library(tidyr)      # for data reshaping
library(readr)      # for reading data
library(lubridate)  # for working with dates
library(tidyselect) # for selecting columns
Alternatively, load most of these packages in one go:
library(tidyverse)  # imports dplyr, readr, lubridate and more
library(tidyselect) # for selecting columns
- Create a folder `data` inside your project folder. This is where you will save the data file you will download in the next step.
- The following code downloads the UK House Prices data up to June 2023 as a `.csv` file and places it in the `data` folder you created in the previous step. Copy it to your `.qmd` file and run it.
url <- "http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2023-06.csv"
download.file(url, "data/UK-HPI-full-file-2023-06.csv")
Alternatively, you can click here to download the file directly from the website.
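Re-running the download chunk fetches the file again every time, and `download.file()` typically fails if the `data` folder does not exist yet. A minimal sketch of a guard is shown below. The helper name `download_if_missing()` is hypothetical and not part of the lab materials.

```r
# Hypothetical helper (not from the lab materials): create the target
# folder if needed and download the file only when it is not on disk yet.
download_if_missing <- function(url, destfile) {
  target_dir <- dirname(destfile)
  if (!dir.exists(target_dir)) {
    dir.create(target_dir, recursive = TRUE)
  }
  if (!file.exists(destfile)) {
    download.file(url, destfile)
  }
  invisible(destfile)
}

url <- "http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2023-06.csv"
# Uncomment to run the download:
# download_if_missing(url, "data/UK-HPI-full-file-2023-06.csv")
```

Because the function returns early when the file already exists, rendering the `.qmd` document repeatedly should not hit the network again.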
- In a separate code chunk, read the data from the `.csv` file into a data frame, using the `read_csv()` function from the `readr` package.
uk_hpi <- readr::read_csv("data/UK-HPI-full-file-2023-06.csv")
- Inspect the data frame using the `glimpse()` function from the `dplyr` package.
glimpse(uk_hpi)
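To get a feel for what `read_csv()` and `glimpse()` return without the full file, here is a small sketch on inline data. The column names `Date`, `RegionName` and `12m%Change`, and the `dd/mm/yyyy` date format, are assumptions for illustration only; compare them with what `glimpse(uk_hpi)` actually prints.

```r
library(readr)
library(dplyr)

# Toy CSV text standing in for the real file.
# Column names and the date format are assumptions, not taken from the dataset.
toy_csv <- I("Date,RegionName,12m%Change
01/06/2023,England,1.9
01/06/2023,Wales,0.6")

toy <- read_csv(toy_csv, show_col_types = FALSE)

# glimpse() prints one line per column: its name, type and first values
glimpse(toy)
```

In this sketch the date column is read as text rather than as a date, and a column name such as `12m%Change` has to be wrapped in backticks whenever it is referenced in code.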
DISCUSS IN PAIRS/GROUPS: Which columns are relevant for our analysis, and how would you select them?
- Write code using the `select()` function from the `dplyr` package to create a new data frame with the selected columns.
- Add another step to your code to rename the columns with the `rename()` function from the `dplyr` package, following the template below:
- Now use the `filter()` function from `dplyr` to keep only the rows for regions that are UK countries.
- DISCUSS IN PAIRS/GROUPS: If you look at the first 8 rows of your data frame, do they look exactly like the ones shown in Table 1? If not, why not? Can you spot the problem and fix it?
TEACHING MOMENT: At the end of this section, your class teacher will guide you through a solution before you proceed to the next task.
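The action points above chain several `dplyr` verbs. The sketch below shows one possible shape of such a pipeline on a toy data frame; it is not the official solution. The raw column names (`Date`, `RegionName`, `12m%Change`, `AveragePrice`) and the `dd/mm/yyyy` date format are assumptions. The England and Wales figures are copied from the target table, and everything else is made up.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the raw data (assumed column names, partly made-up values)
toy <- tibble(
  Date = c("01/05/2023", "01/06/2023", "01/06/2023", "01/06/2023"),
  RegionName = c("England", "England", "Wales", "London"),
  `12m%Change` = c(1.7, 1.9, 0.6, NA),
  AveragePrice = c(1, 2, 3, 4)
)

uk_countries <- c("England", "Northern Ireland", "Scotland", "Wales", "United Kingdom")

tidy <- toy %>%
  select(Date, RegionName, `12m%Change`) %>%            # keep relevant columns
  rename(date = Date, region = RegionName,
         yearly_change = `12m%Change`) %>%              # new_name = old_name
  mutate(date = dmy(date)) %>%                          # text -> Date
  filter(region %in% uk_countries) %>%                  # drop other regions
  arrange(desc(date), region)                           # most recent rows first

print(tidy)
```

Converting the text dates before sorting matters here: sorting `dd/mm/yyyy` strings orders them alphabetically, not chronologically.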
Part 2: Data visualisation with `ggplot` (55 min)
In this part you'll continue the same investigative work, this time by checking the `ggplot2` documentation. See if you can figure out how to create a plot like the one below using the data frame you have just created.
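As a starting point for exploring the `ggplot2` documentation, here is a generic line chart on four rows copied from the target table in Part 1. It is only a sketch of how a data frame maps onto aesthetics and a geom. The choice of `geom_line()` and the axis labels are assumptions, and the result is not meant to match the target figure.

```r
library(ggplot2)

# Four rows taken from the target table in Part 1
plot_df <- data.frame(
  date = as.Date(c("2023-05-01", "2023-06-01", "2023-05-01", "2023-06-01")),
  region = c("England", "England", "Northern Ireland", "Northern Ireland"),
  yearly_change = c(1.7, 1.9, 2.7, 2.7)
)

# Map columns to aesthetics, then add a geom and labels layer by layer
p <- ggplot(plot_df, aes(x = date, y = yearly_change, colour = region)) +
  geom_line() +
  labs(x = "Date", y = "Yearly change", colour = "Region")

print(p)
```

Swapping `geom_line()` for another geom, or adding `facet_wrap(~ region)`, is a quick way to experiment with alternatives.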
Bonus task
If you finished early, you can give the following task a go. Here's a further plot that requires a bit more manipulation of the original dataset with `dplyr` before you can visualise it. The aim is to plot the difference in house prices between the regions illustrated and the UK.
Hint: Obtaining a table similar to the one below is the first step to reproducing the plot.
date | PriceDiffInnerLdn | PriceDiffOuterLdn |
---|---|---|
2023-06-01 | 2.15 | 1.9 |
2023-05-01 | 2.15 | 2.7 |
2023-04-01 | 2.15 | 0.0 |
2023-03-01 | 2.17 | 1.7 |
2023-02-01 | 2.17 | 0.6 |
2023-01-01 | 2.18 | 1.7 |
2022-12-01 | 2.16 | 2.7 |
2022-11-01 | 2.15 | 1.9 |
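One way to reach a table of this shape is to spread the regions into columns and then compare each one with the UK column. The sketch below uses made-up prices, and the region labels and the `avg_price` column name are assumptions. The hint table does not say how the difference is measured, so the sketch uses a plain subtraction; a ratio would only need `/` in place of `-`.

```r
library(dplyr)
library(tidyr)

# Made-up average prices; region labels and column names are assumptions
prices <- tibble(
  date = rep(as.Date(c("2023-06-01", "2023-05-01")), each = 3),
  region = rep(c("United Kingdom", "Inner London", "Outer London"), times = 2),
  avg_price = c(100, 215, 150, 100, 210, 140)
)

price_diff <- prices %>%
  # one row per date, one column per region
  pivot_wider(names_from = region, values_from = avg_price) %>%
  mutate(
    PriceDiffInnerLdn = `Inner London` - `United Kingdom`,
    PriceDiffOuterLdn = `Outer London` - `United Kingdom`
  ) %>%
  select(date, PriceDiffInnerLdn, PriceDiffOuterLdn)

print(price_diff)
```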
References
Footnotes
1. We're gonna cry a little bit, not gonna lie. But no hard feelings. We'll get over it.