💻 Lab 01 - Recap of base R and tidyverse fundamentals
Lab roadmap (90 min)
LAB DIFFICULTY: EASY (assumes only basic experience with R)
Learning Objectives
- Refresh your R skills
- Compare and contrast base R and tidyverse solutions to the same problem
- Practice loading, manipulating, and visualizing data using R and tidyverse
Lab Tasks
In our first lab, you will work through practical exercises in loading, manipulating, and visualizing data using R and tidyverse. We'll dive into a real dataset called Tesco Grocery 1.0, curated by researchers from Nokia Bell Labs, King's College London, University of Turin, and Tesco Labs (Aiello et al. 2020).
You can find a detailed description of this dataset in the Data Dictionary: Tesco Grocery 1.0 webpage.
Now, let's get started!
Part 1: ⚙️ Setup (15 minutes)
🎯 ACTION POINT
(Whenever you come across this text "🎯 ACTION POINT", it means you have a set of tasks to complete)
Before you dive into coding, take a moment to complete the following steps:
Give a warm hello to your instructor! And don't forget to high-five the two classmates sitting closest to you!
- Yes, we're serious about this one!
Ensure that R is installed on your computer.
If you haven't already, we also suggest using an integrated development environment (IDE) like RStudio, which is available for free download.
Open RStudio and create two new R scripts.
- Save the first script as lab01.R.
- Save the second as lab01-tidyverse.R.
Start writing your code in the first script. Then, after you have completed the base R exercises, copy and paste your code into the second script and modify it to use tidyverse functions instead.
Head to the dataset page and download the file named Dec_lsoa_grocery.csv.
Save the file in the same folder as your R scripts.
Part 2: Let's view our data! (20 minutes)
👩🏻‍🏫 TEACHING MOMENT
(Whenever you come across this text "👩🏻‍🏫 TEACHING MOMENT", it means your instructor deserves your full attention)
Your instructor will load the dataset into R and name it df. She will run View(df) so that you can all explore the dataset's structure and variables together.
Your instructor will filter df to show only the row(s) corresponding to the region of London we are currently in. The LSOA code for the Aldwych area surrounding LSE is E01004735.
- She will show the base R and the tidyverse ways of doing this.
Your instructor will open the Open Geography portal, made available by the Office for National Statistics of the UK, to show you how you can highlight a region on the map by its LSOA code. Keep a tab open on this page, as we will use it later in the lab.
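As a rough illustration of the load-and-filter steps in this part, the sketch below writes a tiny made-up CSV to a temporary file, reads it back, and keeps the row for one area code. Only area_id and the code E01004735 come from this lab; the other columns and all values are invented for illustration. With the real data you would pass the path to Dec_lsoa_grocery.csv instead of the temporary file.

```r
# Toy stand-in for the real file: three made-up rows (all values are invented).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "area_id,population,sugar",
  "E01004735,1500,10.2",
  "E01000001,1300,9.8",
  "E01000002,1700,11.5"
), toy_path)

# Base R: read the CSV, then keep the rows whose area_id matches the code.
df <- read.csv(toy_path)
aldwych_base <- df[df$area_id == "E01004735", ]
print(aldwych_base)

# Tidyverse: the same row filter with dplyr (runs only if dplyr is installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  aldwych_tidy <- dplyr::filter(df, area_id == "E01004735")
  print(aldwych_tidy)
}
```

Note the trailing comma in the base R version: df[rows, ] selects rows and keeps every column.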
🗣️ CLASSROOM-WIDE DISCUSSION: Why do you think the authors gave us the dataset in this format instead of, say, simply a list of all the products purchased by customers in that area?
Part 3: Now you are the data analyst! (55 minutes)
Try to complete the action points below using base R first (type your solutions in lab01.R). After you've finished, convert your solutions to tidyverse (type them in lab01-tidyverse.R). If you get stuck, ask your instructor for help.
Feel free to 👥 pair up with a classmate to work on the exercises together.
🎯 ACTION POINT
Subset the dataset to keep only the following columns:
- The identifier column (area_id)
- Columns with demographic data (population, age, area, etc.)
- Columns that represent the average consumption of nutrients (check the data dictionary for examples) across all LSOA regions; ignore the columns with suffixes.
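If you are unsure how to keep a subset of columns, here is a minimal sketch on a toy data frame. Apart from area_id, the column names (including the suffixed sugar_std) are hypothetical placeholders, so check the data dictionary for the real ones before adapting this.

```r
# Toy data frame; every column name except area_id is a hypothetical placeholder.
df <- data.frame(
  area_id    = c("A", "B", "C"),
  population = c(1500, 1300, 1700),
  sugar      = c(10.2, 9.8, 11.5),
  sugar_std  = c(0.4, 0.5, 0.3)  # stands in for a "suffixed" column to drop
)

# Names of the columns we want to keep.
keep <- c("area_id", "population", "sugar")

# Base R: index the columns by name.
selected_base <- df[, keep]
print(names(selected_base))

# Tidyverse: dplyr::select() with the same columns (runs only if dplyr is installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  selected_tidy <- dplyr::select(df, area_id, population, sugar)
  print(names(selected_tidy))
}
```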
Identify the three regions with the highest average alcohol consumption and print them out. Then find the three regions with the lowest average alcohol consumption. Repeat the process for sugar consumption.
- Can you also find out where these regions are located?
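One common pattern for "highest n" and "lowest n" questions is to sort the rows and then take the first few. The sketch below shows it on toy data; the alcohol column name and all values are invented for illustration and may not match the real dataset.

```r
# Toy data; the alcohol column name and all values are invented.
df <- data.frame(
  area_id = c("A", "B", "C", "D", "E"),
  alcohol = c(0.30, 0.10, 0.50, 0.20, 0.40)
)

# Base R: order() returns the row positions in sorted order; head() keeps the first 3.
top3_base    <- head(df[order(df$alcohol, decreasing = TRUE), ], 3)
bottom3_base <- head(df[order(df$alcohol), ], 3)
print(top3_base)
print(bottom3_base)

# Tidyverse: arrange() sorts the rows (runs only if dplyr is installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  top3_tidy    <- head(dplyr::arrange(df, dplyr::desc(alcohol)), 3)
  bottom3_tidy <- head(dplyr::arrange(df, alcohol), 3)
  print(top3_tidy)
  print(bottom3_tidy)
}
```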
Calculate the average and standard deviation of the population sizes across all LSOA regions. Save the results in a single data frame. Print out the data frame.
Choose two nutrients (carbs, sugar, fat, saturated fat, protein, or fibre) and create a scatterplot to visualize their relationship. What observations can you make?
- Please note that for the base R solution, you should not use the ggplot2 package. You should use the plot() function instead.
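The last two tasks can be sketched on toy data as follows. This is only one possible approach: the fat and protein column names and all values are made up, and the names mean_population and sd_population are our own choice.

```r
# Toy data; nutrient column names and all values are hypothetical.
df <- data.frame(
  population = c(1500, 1300, 1700, 1600),
  fat        = c(8.9, 9.4, 8.1, 9.0),
  protein    = c(5.2, 5.0, 5.6, 5.3)
)

# Base R: collect the mean and standard deviation in a one-row data frame.
pop_summary <- data.frame(
  mean_population = mean(df$population),
  sd_population   = sd(df$population)
)
print(pop_summary)

# Tidyverse: summarise() builds the same one-row result (runs only if dplyr is installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  pop_summary_tidy <- dplyr::summarise(
    df,
    mean_population = mean(population),
    sd_population   = sd(population)
  )
  print(pop_summary_tidy)
}

# Base R scatterplot of one nutrient against another, using plot() rather than ggplot2.
plot(df$fat, df$protein,
     xlab = "Fat", ylab = "Protein",
     main = "Toy example: fat vs protein")
```

In the tidyverse script, the plot() call would typically be replaced by ggplot() with geom_point().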
👩🏻‍🏫 TEACHING MOMENT
Just before you wrap up, your instructor will assess everyone's progress with base R and tidyverse. Make sure to jot down any areas that are still unclear to you after the lab, as you'll have the opportunity to discuss them in tomorrow's lecture.
Additionally, she might ask you to complete a brief poll on how easy you found it to produce the base R and tidyverse solutions.