🛣️ Week 02 Lab - Roadmap (90 min)

2023/24 Autumn Term

Welcome to the first DS202 lab!

We’ll kick things off with the UK House Prices dataset, updated through June 2023 (UK Office for National Statistics (ONS) 2023). The main goal is to practice reshaping the original data. If time allows, you will also see how to create a plot from the resulting data frame.

🥅 Learning Objectives

  • Configure your working environment for the course, including R, RStudio or VS Code, and Quarto documents.
  • Start using markdown to write your notes and code for lab activities.
  • Learn how to read data from a .csv file into a data frame.
  • Practice consulting and reading the documentation to understand how a function works.
  • Practice with dplyr and tidyselect to manipulate and structure data.

About Using ChatGPT and Similar AI Tools

We’re not against you using AI assistants like ChatGPT in this course. However, for this particular lab, we recommend doing it on your own (searching online is fine, though). This task is your chance to see where you stand with your R programming skills. If you use an AI assistant now, you won’t get a clear picture of your abilities.

If you get stuck, consider asking your instructor or classmates for help. You can also consult the documentation for the functions and packages you’re using, which is a valuable skill in this field. Don’t worry, you’ll have plenty of chances to use AI help in later labs.

📋 Lab Tasks

🤝 Part I: Introductions (15 min)

🧑🏻‍🏫 TEACHING MOMENT: Your chance to get to know your classmates and instructors.

⚙️ Part II: Setup (15 min)

We will be working with Quarto documents throughout this course. These are .qmd files that combine code, text, and output in a single document, which makes them a great way to keep your code, notes, and output organised and reproducible. (Assignments will also require that you work with .qmd files.)

🎯 ACTION POINTS

Follow the steps below according to your preferred IDE.

RStudio

Using RStudio:

  1. Open RStudio.

  2. Create a new project (File > New Project… > New Directory > New Project) and save it in an appropriate folder on your computer. Consider naming it DS202A.

  3. Click on the link below to download the .qmd file for this lab. Save it in the DS202A folder you created in step 2.

  4. Type your responses to the tasks in this lab directly in the .qmd file.
  • Don’t limit yourself to editing code chunks; feel free to add comments and notes in Markdown as well.
  • Try ‘rendering’ the .qmd file to HTML (click the ‘Knit’ button in the top-right corner of the Source pane) to see what your code and notes look like in a nice report.

VS Code

Using VS Code:

  1. Create a new folder in an appropriate folder on your computer. Consider naming it DS202A.

  2. Open VS Code and open the folder you have just created (File > Open Folder). Let’s call this your ‘project’ folder.

  3. Click on the link below to download the .qmd file for this lab. Save it in the DS202A folder you created in step 1.

  4. Type your responses to the tasks in this lab directly in the .qmd file.
  • Don’t limit yourself to editing code chunks; feel free to add comments and notes in Markdown as well.
  • Try ‘rendering’ the .qmd file to HTML (click the ‘Render’ button in the top-right corner of the tab) to see what your code and notes look like in a nice report.

(Optional) How to create a .qmd file from scratch

We will begin by creating a .qmd file for this lab. You will write your solutions to the tasks below in this file.

🎯 ACTION POINTS

RStudio

Using RStudio:

  1. Open RStudio.

  2. Create a new project (File > New Project… > New Directory > New Project) and save it in an appropriate folder on your computer. Consider naming it DS202A.

  3. In this project, create a new Quarto markdown document by clicking on the File menu in the top-left corner of the RStudio window, then selecting Quarto Document from the drop-down menu.

Screenshot showing where the File > Quarto Document option is located in RStudio (Mac OS)

Figure 1. How to create a new Quarto markdown (RStudio version)

This will create an empty but pre-configured .qmd file in the Source pane of RStudio.

  4. Save the new file as LSE_DS202A_W02_lab.qmd somewhere inside the project folder you created in step 2.
  • You can choose to create sub-folders inside your project folder to keep your files organised. For example, you could create a folder called labs and save this file inside it.
  5. Keep adding your solutions to the tasks below to this file as you work through the lab. Don’t limit yourself to code chunks; feel free to add comments and notes in Markdown as well.

Screenshot showing the content of a new .qmd file in RStudio (Mac OS)

Figure 2. What a Quarto markdown document looks like (RStudio version)

VS Code

Using VS Code:

  1. Create a new folder in an appropriate folder on your computer. Consider naming it DS202A.

  2. Open VS Code and open the folder you have just created (File > Open Folder). Let’s call this your ‘project’ folder.

  3. In this project, create a new Quarto markdown document by clicking File > New File in the top-left corner of the VS Code window, then selecting Quarto Document from the drop-down menu.

Screenshot showing what shows up when you click File > New File in VS Code (Mac OS)

Figure 1. How to create a new Quarto markdown (VS Code version)

This will create an empty but pre-configured .qmd file in a new editor tab in VS Code.

  4. Save the new file as LSE_DS202A_W02_lab.qmd somewhere inside the project folder you created in step 1.
  • You can choose to create sub-folders inside your project folder to keep your files organised. For example, you could create a folder called labs and save this file inside it.
  5. Keep adding your solutions to the tasks below to this file as you work through the lab. Don’t limit yourself to code chunks; feel free to add comments and notes in Markdown as well.

Screenshot showing the content of a new .qmd file in VS Code (Mac OS)

Figure 2. What a Quarto markdown document looks like (VS Code version)

🧑🏻‍🏫 TEACHING MOMENT: Your class teacher will inform you of the next steps to take.

🛠️ Part III: Data manipulation with dplyr (55 min)

Your goal is to transform the data to look like this:

Table 1. Example of the data frame we want to create. Sorted by date in descending order (most recent month first).
date        region            yearly_change
2023-06-01  England           1.9
2023-06-01  Northern Ireland  2.7
2023-06-01  Scotland          0.0
2023-06-01  United Kingdom    1.7
2023-06-01  Wales             0.6
2023-05-01  England           1.7
2023-05-01  Northern Ireland  2.7
2023-05-01  Scotland          1.9

The steps below will help you think about the problem and guide you through the process of creating the data frame above, but you will have to figure out – by reading the documentation, searching online and experimenting – which functions to use and how to use them.

🎯 ACTION POINTS

The action points below will help you get there. Work in pairs or groups of three. When stuck, try to look at the documentation of the functions you are using. If that doesn’t help, ask your class instructor.

  1. In your .qmd file, insert a new code chunk and import the required libraries:
library(dplyr)       # for data manipulation
library(tidyr)       # for data reshaping
library(readr)       # for reading data
library(lubridate)   # for working with dates
library(tidyselect)  # for selecting columns

Alternatively, load the tidyverse meta-package instead of the individual packages:

library(tidyverse)   # imports dplyr, readr, lubridate and more
library(tidyselect)  # for selecting columns
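
If any of the library() calls above fails with a "there is no package called ..." error, the package is probably not installed yet. The snippet below is a minimal sketch for checking this, assuming you are allowed to install packages from CRAN on your machine. The names `required`, `is_installed` and `missing_pkgs` are illustrative only.

```r
# Packages used in this lab (taken from the library() calls above)
required <- c("dplyr", "tidyr", "readr", "lubridate", "tidyselect")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment the next line to install them from CRAN:
  # install.packages(missing_pkgs)
}
```

Run this once in the Console rather than keeping it in your .qmd file, so that rendering the document never triggers an installation.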
  2. Create a folder called data inside your project folder. This is where you will save the data file you will download in the next step.

  3. The following code downloads the UK House Prices data up to June 2023 as a .csv file and places it in the data folder you created in the previous step. Copy it into your .qmd file and run it.

url <- "http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2023-06.csv"
download.file(url, "data/UK-HPI-full-file-2023-06.csv")

Alternatively, you could click here to download the file directly from the website – select the UK HPI Full File option.
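
Because a .qmd file re-runs all of its code every time it is rendered, the download above would be repeated on each render. The sketch below shows one possible way to avoid that. `download_if_missing()` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Hypothetical helper (not part of any package): download a file only once
download_if_missing <- function(url, dest) {
  # Create the destination folder (e.g. "data") if it does not exist yet
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)

  if (!file.exists(dest)) {
    # mode = "wb" writes the file byte for byte, which matters on Windows
    download.file(url, dest, mode = "wb")
  }

  # Return the path invisibly so the call can be chained or ignored
  invisible(dest)
}

# Usage with the url object defined in the step above:
# download_if_missing(url, "data/UK-HPI-full-file-2023-06.csv")
```

The helper also creates the data folder when it is missing, so it works even if you skipped the folder-creation step.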

  4. In a separate code chunk, read the data from the .csv file into a data frame, using the read_csv() function from the readr package.
uk_hpi <- readr::read_csv("data/UK-HPI-full-file-2023-06.csv")
  5. Inspect the data frame using the glimpse() function from the dplyr package.
glimpse(uk_hpi)
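
To get a feel for what glimpse() reports, it can help to try it on a tiny data frame first. The sketch below uses a toy stand-in for uk_hpi. The column names (Date, RegionName, 12m%Change) and the day/month/year text format of the dates are assumptions about the real file, so compare them with your own glimpse(uk_hpi) output. The values are copied from Table 1.

```r
library(dplyr)

# Toy stand-in for uk_hpi; column names and date format are assumptions
toy <- tibble(
  Date = c("01/05/2023", "01/06/2023"),
  RegionName = c("England", "England"),
  `12m%Change` = c(1.7, 1.9)
)

glimpse(toy)        # one line per column: name, type and first values
names(toy)          # just the column names
sapply(toy, class)  # the type of each column
```

Pay attention to the type shown next to each column (for example <chr> for text and <dbl> for numbers), because it determines how sorting and filtering behave later on.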
  6. 👥 DISCUSS IN PAIRS/GROUPS: Which columns are relevant for our analysis, and how can you select them?

  7. Write code using the select() function from the dplyr package to create a new data frame with the selected columns.

  8. Add another step to your code to rename the columns, using the rename() function from the dplyr package, so that the column names match those in Table 1.
  9. Now use the filter() function from dplyr to keep only the rows for the regions shown in Table 1 (the UK countries and the United Kingdom as a whole).
  10. 👥 DISCUSS IN PAIRS/GROUPS: If you look at the first 8 rows of your data frame, do they look exactly like the ones shown in Table 1? If not, why not? Can you spot the problem and fix it?
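
The steps above can be sketched on a toy data frame as follows. This is one possible approach, not the official solution. The original column names (Date, RegionName, SalesVolume, 12m%Change), the day/month/year text dates and the extra "London" row are assumptions made for illustration, and the yearly changes are copied from Table 1 where available. If the dates really are stored as text, they sort alphabetically by day first, which is one reason the row order might not match Table 1.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for uk_hpi (assumed column names; values from Table 1)
toy <- tibble(
  Date = c("01/05/2023", "01/06/2023", "01/06/2023", "01/06/2023"),
  RegionName = c("England", "England", "Wales", "London"),
  SalesVolume = rep(NA_integer_, 4),      # values omitted on purpose
  `12m%Change` = c(1.7, 1.9, 0.6, NA)     # London is not shown in Table 1
)

# Regions that appear in Table 1
uk_countries <- c("England", "Northern Ireland", "Scotland",
                  "Wales", "United Kingdom")

result <- toy %>%
  select(Date, RegionName, `12m%Change`) %>%   # keep relevant columns
  rename(date = Date,                          # new_name = old_name
         region = RegionName,
         yearly_change = `12m%Change`) %>%
  filter(region %in% uk_countries) %>%         # drop other regions
  mutate(date = dmy(date)) %>%                 # text to Date (day/month/year)
  arrange(desc(date), region)                  # most recent month first

result
```

Backticks are needed around 12m%Change because it is not a syntactically valid R name. Replace toy with uk_hpi, and adjust the column names to whatever glimpse(uk_hpi) shows, to apply the same idea to the real data.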

🧑🏻‍🏫 TEACHING MOMENT: At the end of this section, your class teacher will guide you through a solution before proceeding to the bonus task (if time allows).

📋 Bonus Task

If you finished early, keep doing the same investigative work by checking the ggplot2 documentation. See if you can figure out how to create a plot like the one below using the data frame you have just created.

Annual price change by country

Annual house price change by UK country. Source: (UK HM Land Registry 2023)
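
As a starting point, here is a minimal ggplot2 sketch. It assumes a line chart with one line per country, which may differ from the plot shown in the lab, and it uses a small toy data frame shaped like Table 1 (values copied from Table 1) in place of your own result. The axis and legend labels are illustrative choices.

```r
library(ggplot2)

# Toy data shaped like Table 1 (values copied from Table 1)
plot_df <- data.frame(
  date = as.Date(c("2023-06-01", "2023-05-01", "2023-06-01", "2023-05-01")),
  region = c("England", "England", "Northern Ireland", "Northern Ireland"),
  yearly_change = c(1.9, 1.7, 2.7, 2.7)
)

# One line per region, with dates on the x axis
p <- ggplot(plot_df, aes(x = date, y = yearly_change, colour = region)) +
  geom_line() +
  labs(
    title = "Annual house price change by UK country",
    x = NULL,
    y = "Annual change (%)",
    colour = "Country"
  ) +
  theme_minimal()

print(p)
```

The date column needs to be a proper Date (not text) for ggplot2 to place the points in chronological order along the x axis.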

References

UK HM Land Registry. 2023. “UK House Price Index Summary: June 2023.” London: UK HM Land Registry. https://www.gov.uk/government/statistics/uk-house-price-index-for-june-2023/uk-house-price-index-summary-june-2023.
UK Office for National Statistics (ONS). 2023. “UK House Price Index: Data Downloads June 2023.” Statistical data set. GOV.UK Statistical Datasets. https://www.gov.uk/government/statistical-data-sets/uk-house-price-index-data-downloads-june-2023.