Week 02 Lab - Roadmap (90 min)
2023/24 Autumn Term
Welcome to the first DS202 lab!
We'll kick things off with the UK House Prices dataset, updated through June 2023 (UK Office for National Statistics (ONS) 2023). The main goal is to practice reshaping the original data. If time allows, you will also see how to create a plot with the resulting data frame.
Learning Objectives
- Configure your working environment for the course, including R, RStudio or VS Code, and Quarto documents.
- Start using markdown to write your notes and code for lab activities.
- Learn how to read data from a `.csv` file into a data frame.
- Practice consulting and reading the documentation to understand how a function works.
- Practice with `dplyr` and `tidyselect` to manipulate and structure data.
We're not against you using AI assistants like ChatGPT in this course. However, for this particular lab, we recommend doing it on your own (searching online is fine, though). This task is your chance to see where you stand with your R programming skills. If you use an AI assistant now, you won't get a clear picture of your abilities.
If you get stuck, consider asking your instructor or classmates for help. You can also consult the documentation for the functions and packages you're using, which is a valuable skill in this field. Don't worry, you'll have plenty of chances to use AI help in later labs.
Lab Tasks
Part I: Introductions (15 min)
TEACHING MOMENT: Your chance to get to know your classmates and instructors.
Part II: Setup (15 min)
We will be working with Quarto documents throughout this course. These are `.qmd` files that combine code, text, and output in a single document. Quarto documents are a great way to keep your code, notes, and output organised and reproducible. (Assignments will also require that you work with `.qmd` files.)
ACTION POINTS
Follow the steps below according to your preferred IDE.
Using RStudio:
1. Open RStudio.
2. Create a new project (File > New Project... > New Directory > New Project) and save it in an appropriate folder on your computer. Consider naming it `DS202A`.
3. Click on the link below to download the `.qmd` file for this lab. Save it in the `DS202A` folder you created in step 2.
- Type your responses to the tasks in this lab directly in the `.qmd` file.
- Don't limit yourself to editing code chunks; feel free to add comments and notes in Markdown as well.
- Try "rendering" the `.qmd` file to HTML (click the "Render" button in the toolbar at the top of the Source pane) to see what your code and notes look like in a nice report.
Using VS Code:
1. Create a new folder in an appropriate location on your computer. Consider naming it `DS202A`.
2. Open VS Code and open the folder you have just created (File > Open Folder). Let's call this your "project" folder.
3. Click on the link below to download the `.qmd` file for this lab. Save it in the `DS202A` folder you created in step 1.
- Type your responses to the tasks in this lab directly in the `.qmd` file.
- Don't limit yourself to editing code chunks; feel free to add comments and notes in Markdown as well.
- Try "rendering" the `.qmd` file to HTML (click the "Render" button in the top-right corner of the tab) to see what your code and notes look like in a nice report.
(Optional) How to create a `.qmd` file from scratch
We will begin by creating a `.qmd` file for this lab. You will write your solutions to the tasks below in this file.
ACTION POINTS
Using RStudio:
1. Open RStudio.
2. Create a new project (File > New Project... > New Directory > New Project) and save it in an appropriate folder on your computer. Consider naming it `DS202A`.
3. In this project, create a new Quarto markdown document by clicking on the File menu in the top-left corner of the RStudio window, then selecting Quarto Document from the drop-down menu. This will create an empty but pre-configured `.qmd` file in the Source pane of RStudio.
- Save the new file as `LSE_DS202A_W02_lab.qmd` somewhere inside the project folder you created in step 2.
- You can choose to create sub-folders inside your project folder to keep your files organised. For example, you could create a folder called `labs` and save this file inside it.
- Keep adding your solutions to the tasks below to this file as you work through the lab. Don't limit yourself to code chunks; feel free to add comments and notes in Markdown as well.
Using VS Code:
1. Create a new folder in an appropriate location on your computer. Consider naming it `DS202A`.
2. Open VS Code and open the folder you have just created (File > Open Folder). Let's call this your "project" folder.
3. In this project, create a new Quarto markdown document by clicking File > New File in the top-left corner of the VS Code window, then selecting Quarto Document from the drop-down menu. This will create an empty but pre-configured `.qmd` file in the VS Code editor.
- Save the new file as `LSE_DS202A_W02_lab.qmd` somewhere inside the project folder you created in step 1.
- You can choose to create sub-folders inside your project folder to keep your files organised. For example, you could create a folder called `labs` and save this file inside it.
- Keep adding your solutions to the tasks below to this file as you work through the lab. Don't limit yourself to code chunks; feel free to add comments and notes in Markdown as well.
TEACHING MOMENT: Your class teacher will inform you of the next steps to take.
Part III: Data manipulation with `dplyr` (55 min)
Your goal is to transform the data to look like Table 1:

| date | region | yearly_change |
|---|---|---|
| 2023-06-01 | England | 1.9 |
| 2023-06-01 | Northern Ireland | 2.7 |
| 2023-06-01 | Scotland | 0.0 |
| 2023-06-01 | United Kingdom | 1.7 |
| 2023-06-01 | Wales | 0.6 |
| 2023-05-01 | England | 1.7 |
| 2023-05-01 | Northern Ireland | 2.7 |
| 2023-05-01 | Scotland | 1.9 |

The steps below will help you think about the problem and guide you through the process of creating the data frame above, but you will have to figure out (by reading the documentation, searching online and experimenting) which functions to use and how to use them.
ACTION POINTS
The action points below will help you get there. Work in pairs or groups of three. When stuck, try to look at the documentation of the functions you are using. If that doesn't help, ask your class instructor.
- In your `.qmd` file, insert a new code chunk and import the required libraries:
library(dplyr)      # for data manipulation
library(tidyr)      # for data reshaping
library(readr)      # for reading data
library(lubridate)  # for working with dates
library(tidyselect) # for selecting columns
Alternatively, load the core packages in one go with `tidyverse`:
library(tidyverse)  # imports dplyr, readr, lubridate and more
library(tidyselect) # for selecting columns
- Create a folder `data` inside your project folder. This is where you will save the data file you will download in the next step.
- The following code downloads the UK House Prices data up to June 2023 as a `.csv` file and places it in the `data` folder you created in the previous step. Copy it to your `.qmd` file and run it.
url <- "http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2023-06.csv"
download.file(url, "data/UK-HPI-full-file-2023-06.csv")
Alternatively, you could download the file directly from the website (select the UK HPI Full File option).
- In a separate code chunk, read the data from the `.csv` file into a data frame, using the `read_csv()` function from the `readr` package.
uk_hpi <- readr::read_csv("data/UK-HPI-full-file-2023-06.csv")
- Inspect the data frame using the `glimpse()` function from the `dplyr` package.
glimpse(uk_hpi)
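The download and read steps above can be sketched a little more defensively. This is a minimal illustration, not part of the original lab: `download_if_missing()` is a hypothetical helper, and the toy CSV text uses made-up values with column names (`Date`, `RegionName`, `AveragePrice`, `12m%Change`) that are only assumed to resemble the real file. Check the real names with `glimpse(uk_hpi)`.

```r
library(readr)
library(dplyr)

# Hypothetical helper (not part of the lab): create the destination folder
# if needed and download the file only when it is not already on disk.
download_if_missing <- function(url, dest) {
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(dest)) {
    download.file(url, dest)
  }
  invisible(dest)
}

# Toy CSV text standing in for the real file (made-up values).
# The column names and the day/month/year dates are assumptions.
toy_csv <- I("Date,RegionName,AveragePrice,12m%Change
01/05/2023,England,303000,1.7
01/06/2023,England,306000,1.9
01/06/2023,London,528000,-0.6")

# I() tells read_csv() to treat the string as literal data, not a file path.
toy_hpi <- read_csv(toy_csv, show_col_types = FALSE)

# glimpse() prints one line per column: its name, its type and the first values.
glimpse(toy_hpi)
```

With the real data, a call such as `download_if_missing(url, "data/UK-HPI-full-file-2023-06.csv")` would replace the plain `download.file()` call, so re-rendering the document does not fetch the file again. Column names that are not valid R names (such as the assumed `12m%Change`) need backticks when you refer to them in code.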
DISCUSS IN PAIRS/GROUPS: Which columns are relevant for our analysis, and how would you select them?
- Write code using the `select()` function from the `dplyr` package to create a new data frame with the selected columns.
- Add another step to your code to rename the columns using the `rename()` function from the `dplyr` package, using the template below:
- Now use the `filter()` function from `dplyr` to keep only the rows for regions that are UK countries.
- DISCUSS IN PAIRS/GROUPS: If you look at the first 8 rows of your data frame, do they look exactly like the ones shown in Table 1? If not, why not? Can you spot the problem and fix it?
TEACHING MOMENT: At the end of this section, your class teacher will guide you through a solution before proceeding to the bonus task (if time allows).
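Once you have attempted the steps yourself, you may find it useful to compare with a sketch of the same pipeline on toy data. This is one possible approach under stated assumptions, not the official solution: the column names (`Date`, `RegionName`, `12m%Change`) and the day/month/year text dates are assumptions about the real file, and the `London` and `AveragePrice` values are made up.

```r
library(dplyr)
library(lubridate)

# Toy rows standing in for the real data (partly made-up values).
toy_hpi <- tibble(
  Date = c("01/05/2023", "01/06/2023", "01/06/2023", "01/06/2023"),
  RegionName = c("England", "England", "Wales", "London"),
  AveragePrice = c(303000, 306000, 214000, 528000),
  `12m%Change` = c(1.7, 1.9, 0.6, -0.6)
)

# Regions to keep, matching the regions shown in the target table.
uk_countries <- c("England", "Northern Ireland", "Scotland",
                  "Wales", "United Kingdom")

toy_result <- toy_hpi %>%
  # keep only the columns needed for the target table
  select(Date, RegionName, `12m%Change`) %>%
  # rename() takes new_name = old_name pairs
  rename(date = Date, region = RegionName, yearly_change = `12m%Change`) %>%
  # keep rows whose region is in the list above
  filter(region %in% uk_countries) %>%
  # parse day/month/year text into Date values
  mutate(date = dmy(date)) %>%
  # most recent month first, regions alphabetical within each month
  arrange(desc(date), region)

toy_result
```

The last two steps address one plausible reason why the first rows might not match the target table: dates stored as text, and rows ordered oldest first. Your real data may differ, so inspect it before copying any step.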
Bonus Task
If you finished early, keep doing the same investigative work by checking the `ggplot2` documentation. See if you can figure out how to create a plot like the one below using the data frame you have just created.
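As a starting point, here is a minimal sketch of a line plot built with `ggplot2`. The intended plot is not reproduced here, so the choice of a line chart with one line per region is an assumption, as is the axis labelling. The toy data frame reuses a few values from the table in Part III and has the same three columns as the target data frame.

```r
library(ggplot2)

# Small stand-in for the data frame built in Part III.
toy_result <- data.frame(
  date = as.Date(c("2023-05-01", "2023-06-01", "2023-05-01", "2023-06-01")),
  region = c("England", "England", "Northern Ireland", "Northern Ireland"),
  yearly_change = c(1.7, 1.9, 2.7, 2.7)
)

# Map date to x, yearly change to y, and draw one coloured line per region.
p <- ggplot(toy_result, aes(x = date, y = yearly_change, colour = region)) +
  geom_line() +
  labs(x = "Date", y = "Yearly change (%)", colour = "Region")

p
```

Swapping `toy_result` for your own data frame should give one line per UK country. From there, the `ggplot2` documentation on geoms, scales and themes is the place to look for ways to get closer to the target plot.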