✏️ W03 Formative

2023/24 Autumn Term

Author

⏲️ Due Date:

🎯 Main Objectives:

Important

Please submit your work even if you didn’t manage to go very far with the R code. As this is a formative assignment, it won’t be graded, and the main point is for you to get used to submitting your work through GitHub Classroom.

👉 Note: This assignment will count towards your final class grade if you are a General Course or Exchange student.

📚 Preparation

You will use GitHub Classroom to submit your work. You will need to have a GitHub account to do this.

  1. Create an account on GitHub if you don’t have one already. It’s free!

Never heard of GitHub? Or maybe you have heard of it but never used it? Then, follow the instructions below to get started.

  1. Go to our Slack workspace’s #announcements channel to find the link entitled ‘Intro to Git & GitHub’. You will be taken to a page with instructions on how to get started with Git and GitHub.
  2. Follow those instructions and complete the exercises.
  3. Ask any questions about the exercise above on the #help-assessments channel on Slack.

📝 Instructions

  1. Go to our Slack workspace’s #announcements channel to find a GitHub Classroom link. Do not share this link with anyone outside this course!

  2. Click on the link, sign in to GitHub and then click on the green button Accept this assignment.

  3. You will be redirected to a new private repository created just for you. The repository will be named ds202a-2023-formative--yourusername, where yourusername is your GitHub username. The repository will be private and will contain a README.md file with a copy of these instructions.

  4. Many of you might still be catching up with R and GitHub, so it’s okay if you can only complete a few questions. You will still get feedback on your answers, which will still count as completed (important for General Course and Exchange students).

  5. Create your own .qmd file with your answers. You can use the .qmd file you used in the W02 lab as a template. Just remove anything that is not relevant to this assignment.

  6. Try to create separate headers and code chunks for each question. This will make it easier for us to give feedback on your work. Learn more about the basics of markdown formatting here.

  7. Use the #help-assessments channel on Slack liberally if you get stuck. You can also use the #help-r channel if your questions are more about R programming.

“What do I submit?”

Do you know your CANDIDATE NUMBER? You will need it.

“Your candidate number is a unique five-digit number that ensures that your work is marked anonymously. It is different to your student number and will change every year. Candidate numbers can be accessed using LSE for You.”

Source: LSE

  • A Quarto markdown file with the following naming convention: <CANDIDATE_NUMBER>.qmd, where <CANDIDATE_NUMBER> is your candidate number. For example, if your candidate number is 12345, then your file should be named 12345.qmd.

  • An HTML file rendered from the Quarto markdown file.

You don’t need to click to submit anything. Your assignment will be automatically submitted when you commit AND push your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only look at the last version of your assignment.

✔️ How we will grade your work

We won’t! This is formative. But you will get feedback on your answers. It won’t be super detailed at this stage, but it should give you an idea of how you are doing.

👉 Note: Completing this assignment will count towards your final class grade if you are a General Course or Exchange student. It will still count as submitted even if you submit just a few coding responses.

📚 Tasks

The questions below will build on your code from 💻 W02 Lab.

Part 1: Creating dummy variables

Q1

Create a new column called month that contains the month of the year. Ensure the month is a three-letter abbreviation encoded as a factor.
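As a rough illustration of what Q1 asks for, the sketch below builds the column on a small made-up data frame. The name `df`, the `date`, `region` and `yearly_change` columns, and all values are assumptions standing in for the W02 lab data, so adapt them to your own code. It uses base R's built-in `month.abb` constant so that the labels do not depend on your computer's locale.

```r
library(dplyr)

# Toy stand-in for the W02 lab data (hypothetical names and values)
df <- tibble(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 24),
  region = "England",
  yearly_change = rep(c(1.0, 1.5, 2.0, 2.5), times = 6)
)

df <- df %>%
  mutate(
    # Extract the month number (1 to 12) from the date ...
    month_num = as.integer(format(date, "%m")),
    # ... then map it to "Jan", "Feb", ... as a factor in calendar order
    month = factor(month.abb[month_num], levels = month.abb)
  ) %>%
  select(-month_num)

levels(df$month)
```

Setting `levels = month.abb` explicitly keeps the months in calendar order rather than alphabetical order, which matters once the column is used on a plot axis.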

Q2

You are to create five plots, one for each selected region (‘England’, ‘Scotland’, ‘Wales’, ‘Northern Ireland’, ‘United Kingdom’).

Each plot should be a boxplot with the month of the year on the x-axis and the yearly change in house prices on the y-axis.

Alternatively (and preferred), you can create a single plot with five facets, one for each region.

The dataset contains various time scales, which can be tricky to navigate.

To clarify, ‘yearly change’ here means the change in the average price at a given point in time relative to the same month one year earlier. For instance, the yearly change in house prices for England in January 2000 compares the average price in that month to its counterpart in January 1999. The dataset has a dedicated column that captures this yearly change; it’s the same one you used in the W02 lab. Each data point is associated with the month in which the observation was taken, which in this example is January 2000.

This is why, despite it being a yearly change, we’re plotting it against the month of the year.
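One possible shape for the faceted version of the Q2 plot is sketched below. It runs on made-up data: `df`, the column names and the values are all assumptions, not the real W02 lab data, so treat it as a starting point rather than the expected answer.

```r
library(dplyr)
library(ggplot2)

selected_regions <- c("England", "Scotland", "Wales",
                      "Northern Ireland", "United Kingdom")

# Toy stand-in for the W02 lab data: 3 years of made-up monthly values per region
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 36)
df <- tibble(
  date   = rep(dates, times = length(selected_regions)),
  region = rep(selected_regions, each = length(dates))
) %>%
  mutate(
    month_num = as.integer(format(date, "%m")),
    year_num  = as.integer(format(date, "%Y")),
    # Made-up values: a seasonal wave plus a year offset, so each box has some spread
    yearly_change = sin(2 * pi * month_num / 12) + (year_num - 2000),
    month = factor(month.abb[month_num], levels = month.abb)
  )

# One boxplot per month, one facet per region
p <- df %>%
  filter(region %in% selected_regions) %>%
  ggplot(aes(x = month, y = yearly_change)) +
  geom_boxplot() +
  facet_wrap(~ region) +
  labs(x = "Month of the year", y = "Yearly change in house prices")

p
```

Each box summarises all the yearly-change values observed in that calendar month across the years in the data, which is what lets you look for a seasonal pattern in Q3.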

Q3

Do you sense that there is a seasonal pattern in the data? If so, what is it?

Part 2: Creating lagged variables (more advanced)

Lagged variables were covered in the 🧑‍🏫 Week 02 lecture.

Q4

Create a new column called yearly_change_lag1 that contains the yearly change from the previous month.
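A minimal sketch of a one-month lag using `dplyr::lag()` is shown below, on made-up data. The column names and values are hypothetical, and grouping by `region` is an assumption made here so that one region's values do not leak into another's; check that it matches how your own data is organised.

```r
library(dplyr)

# Toy data: 4 months for 2 regions (hypothetical values)
df <- tibble(
  date = rep(seq(as.Date("2000-01-01"), by = "month", length.out = 4), times = 2),
  region = rep(c("England", "Wales"), each = 4),
  yearly_change = c(1.0, 1.5, 2.0, 2.5, 0.5, 0.7, 0.9, 1.1)
)

df <- df %>%
  group_by(region) %>%
  # lag() shifts by row position, so sort by date within each region first
  arrange(date, .by_group = TRUE) %>%
  mutate(yearly_change_lag1 = lag(yearly_change, n = 1)) %>%
  ungroup()

df
```

The first row of each region has no previous month to look at, so its lagged value is `NA`.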

Q5

Add 11 more lagged variables to the dataset, called yearly_change_lag2 to yearly_change_lag12.
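Writing out twelve nearly identical `mutate()` calls by hand works, but a loop is less error-prone. The sketch below is one way to do it, assuming the same hypothetical `df` layout (`date`, `region`, `yearly_change`) and grouping by region as an assumption. For the sake of being self-contained it creates all twelve lags, including lag 1.

```r
library(dplyr)

# Toy data: 15 months for 2 regions (hypothetical values)
df <- tibble(
  date = rep(seq(as.Date("2000-01-01"), by = "month", length.out = 15), times = 2),
  region = rep(c("England", "Wales"), each = 15),
  yearly_change = c(1:15, 101:115)
) %>%
  arrange(region, date)

for (k in 1:12) {
  lag_name <- paste0("yearly_change_lag", k)
  df <- df %>%
    group_by(region) %>%
    # "{lag_name}" := builds the new column name from the string
    mutate("{lag_name}" := lag(yearly_change, n = k)) %>%
    ungroup()
}

names(df)
```

If you already created `yearly_change_lag1` in Q4, you could loop over `2:12` instead.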

Q6

Drop the rows with missing values.
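One common way to do this is `tidyr::drop_na()`, sketched below on a tiny made-up data frame with only two lags. The names and values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data with two lagged columns (hypothetical values)
df <- tibble(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 4),
  region = "England",
  yearly_change = c(1.0, 1.5, 2.0, 2.5)
) %>%
  mutate(
    yearly_change_lag1 = lag(yearly_change, n = 1),
    yearly_change_lag2 = lag(yearly_change, n = 2)
  )

# Keep only rows where every column has a value
df_complete <- df %>% drop_na()

df_complete
```

Here the first two rows are dropped because their lags are `NA`. With twelve lags you should expect to lose the earliest twelve months of each series in the same way.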

Q7

Reorder the rows by date in descending order. Reorder the columns so that they follow the order below:

  • date
  • month
  • region
  • yearly_change
  • yearly_change_lag1
  • yearly_change_lag2
  • … (yearly_change_lag3 to yearly_change_lag11, in order)
  • yearly_change_lag12
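A possible sketch of the reordering, using `arrange()` for rows and `select()` for columns, is shown below. The toy data frame is made up and has only two lags, in a deliberately scrambled column order. On the full dataset you would presumably use `1:12` in the `num_range()` helper instead of `1:2`.

```r
library(dplyr)

# Toy data with columns in a scrambled order (hypothetical values)
df <- tibble(
  yearly_change_lag2 = c(NA, NA, 1.0),
  region = "England",
  yearly_change = c(1.0, 1.5, 2.0),
  yearly_change_lag1 = c(NA, 1.0, 1.5),
  date = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")),
  month = factor(c("Jan", "Feb", "Mar"), levels = month.abb)
)

df_ordered <- df %>%
  # Most recent date first
  arrange(desc(date)) %>%
  # num_range() picks yearly_change_lag1, yearly_change_lag2, ... in numeric order
  select(date, month, region, yearly_change,
         num_range("yearly_change_lag", 1:2))

names(df_ordered)
```

Using `num_range()` avoids typing every lag column by hand and keeps lag 10 to lag 12 after lag 9, which a plain alphabetical sort would not.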

If you have been paying close attention, you will have noticed that these newly created columns could be used to model the yearly change in house prices, with the lagged yearly changes as predictors.