✏️ W03 Formative

2023/24 Winter Term

Author

Andreas Stöffelbauer

⏲️ Due Date:

🎯 Main Objectives:

Please submit your work even if you didn’t manage to get very far with the R code. As this is a formative assignment, it won’t be graded, and the main point is for you to get used to submitting your work through GitHub Classroom.

👉 Note: Completing this assignment will count towards your final class grade if you are a General Course or Exchange student. It will still count as submitted even if you submit just a few coding responses.

📚 Preparation (if you are new to GitHub)

You will use GitHub Classroom to submit your work. You will need to have a GitHub account to do this.

  1. Create an account on GitHub

  2. Go to our Slack workspace’s #announcements channel to find the link to ‘Intro to Git & GitHub’. You will be taken to a page with instructions on how to get started with Git and GitHub.

  3. Read the instructions in the README.md and complete the exercises.

  4. Ask any questions about the exercise above on the #help-assessments channel on Slack.

📝 Instructions

  1. Go to our Slack workspace’s #announcements channel to find a GitHub Classroom link. Do not share this link with anyone outside this course!

  2. Click on the link, sign in to GitHub, and then click on the green button Accept this assignment.

  3. You will be redirected to a new private repository created just for you. The repository will be named ds202w-2024-w03-formative-yourusername, where yourusername is your GitHub username. The repository will be private and will contain a README.md file with a copy of these instructions.

  4. Many of you might still be catching up with R and GitHub, so it’s okay if you can only complete a few questions. You will still get feedback on your answers, and your submission will still count as completed (important for General Course and Exchange students).

  5. Create your own .qmd file with your answers. You can use the .qmd file you used in the W02 lab as a template. Just remove anything that is not relevant to this assignment.

  6. Try to create separate headers and code chunks for each question. This will make it easier for us to give feedback on your work. Learn more about the basics of markdown formatting here.

  7. Use the #help-assessments channel on Slack liberally if you get stuck. You can also use the #help-r channel if your questions are more about R programming.

“What do I submit?”

⚠️ Do you know your CANDIDATE NUMBER? You will need it.

“Your candidate number is a unique five digit number that ensures that your work is marked anonymously. It is different to your student number and will change every year. Candidate numbers can be accessed using LSE for You.”

Source: LSE

  • A Quarto markdown file with the following naming convention: <CANDIDATE_NUMBER>.qmd, where <CANDIDATE_NUMBER> is your candidate number. For example, if your candidate number is 12345, then your file should be named 12345.qmd.

  • An HTML file render of the Quarto markdown file.

You don’t need to click to submit anything. Your assignment will be automatically submitted when you commit AND push your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only look at the last version of your assignment.

✔️ How we will grade your work

We won’t! This is formative. But you will get feedback on your answers. It won’t be super detailed at this stage, but it should give you an idea of how you are doing.

📚 Tasks

The questions below will build on your code from 💻 W02 Lab.

About the Data

We will use a data set from the Office for National Statistics for this assignment. It’s a filtered time series data set about the UK’s Consumer Prices Index, including owner occupiers’ housing costs (CPIH). You can download the data set by clicking on the button below:

See here if you would like to know more about the data.

Question 1

Load the data and explore the columns’ data types. What different data types does the data set contain? Are there any that you would consider converting, and if so, why?

💡 Answer briefly. No need to write any code here just yet (other than for loading the data of course).
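
As a rough illustration of inspecting column types, here is a minimal sketch on a small made-up data frame rather than the ONS file. The column names `date`, `category` and `CPIH` mirror those mentioned in Question 2; the values, and the assumption that dates arrive as text, are hypothetical and may not match the real data.

```r
# Toy data with hypothetical values; the date is stored as text here,
# which is an assumption about how a CSV might be read in.
toy <- data.frame(
  date     = c("2023-11-01", "2023-12-01"),
  category = c("Food", "Food"),
  CPIH     = c(130.1, 131.4),
  stringsAsFactors = FALSE
)

# One class per column
col_types <- sapply(toy, class)
print(col_types)

# str() gives a compact overview of types and the first few values
str(toy)

# Converting text to Date lets R treat the column chronologically
toy$date <- as.Date(toy$date)
class(toy$date)
```

Whether a conversion like the last step is needed for the real file depends on what you find when you inspect it.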

Question 2

Create a new data frame with columns date, category and CPIH. Sort it by date (descending) and category (ascending). Note that you may also need to apply your insights from Question 1 to get this right. Ideally, implement all these steps seamlessly using the pipe operator.

Finally, show the first 10 rows of this data frame as output.
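
The select-sort-preview pattern described in this question could be sketched as follows. This assumes the dplyr package is installed and uses made-up data; the extra `note` column and all values are hypothetical, and the real file may need different preparation.

```r
library(dplyr)

# Toy data with hypothetical values and one column we do not need
toy <- data.frame(
  date     = c("2023-11-01", "2023-12-01", "2023-12-01", "2023-11-01"),
  category = c("Food", "Transport", "Food", "Transport"),
  CPIH     = c(130.1, 120.5, 131.4, 119.8),
  note     = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

tidy <- toy %>%
  mutate(date = as.Date(date)) %>%      # text to Date so sorting is chronological
  select(date, category, CPIH) %>%      # keep only the three columns of interest
  arrange(desc(date), category)         # newest first, then categories A to Z

# head(10) returns at most the first 10 rows (the toy data has only 4)
tidy %>% head(10)
```

Chaining the steps with the pipe keeps each transformation on its own line, which makes the order of operations easy to read.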

Question 3

Use the data frame you created in Q2 and plot a chart to show how the different CPIH price indices developed over time. Set an appropriate title and place the legend on the left-hand side of the plot. Any additional visual improvements are welcome, too, but entirely optional.

Based on the plot, which is the only category whose price index in December 2023 was lower than it had been in January 2000?
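
For the plotting step, a minimal sketch of a line chart with a title and a left-hand legend might look like this. It assumes the ggplot2 package is installed; the data, title and axis labels are made up for illustration.

```r
library(ggplot2)

# Toy data with hypothetical values: two categories over three months
toy <- data.frame(
  date     = as.Date(c("2023-10-01", "2023-11-01", "2023-12-01",
                       "2023-10-01", "2023-11-01", "2023-12-01")),
  category = rep(c("Food", "Transport"), each = 3),
  CPIH     = c(129.0, 130.1, 131.4, 119.2, 119.8, 120.5)
)

p <- ggplot(toy, aes(x = date, y = CPIH, colour = category)) +
  geom_line() +                                  # one line per category
  labs(title = "Toy CPIH indices over time",     # placeholder title
       x = "Date", y = "CPIH") +
  theme(legend.position = "left")                # move the legend to the left

# In a Quarto code chunk, a final line containing just `p` renders the plot.
```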

Question 4

Create a new column called CPIH_lag1 that contains the CPIH of the previous month.

Once you have created CPIH_lag1, also create the variables CPIH_lag3, CPIH_lag6, and CPIH_lag12. Why might we specifically choose 1, 3, 6, and 12 here?
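
One possible way to build lagged columns is sketched below on made-up data, assuming the dplyr package is installed. It also assumes one row per month per category with no missing months, so that "previous row" really means "previous month"; check whether that holds for the real data. Lags of 6 and 12 follow the same pattern with a different `n`.

```r
library(dplyr)

# Toy data with hypothetical values: two categories over four months
toy <- data.frame(
  date     = as.Date(rep(c("2023-09-01", "2023-10-01",
                           "2023-11-01", "2023-12-01"), times = 2)),
  category = rep(c("Food", "Transport"), each = 4),
  CPIH     = c(128.2, 129.0, 130.1, 131.4, 118.9, 119.2, 119.8, 120.5)
)

lagged <- toy %>%
  group_by(category) %>%                 # lag within a category, never across
  arrange(date, .by_group = TRUE) %>%    # oldest month first inside each group
  mutate(
    CPIH_lag1 = lag(CPIH, n = 1),        # value one month earlier
    CPIH_lag3 = lag(CPIH, n = 3)         # value three months earlier
  ) %>%
  ungroup()

lagged
```

The first rows of each category contain `NA` in the lag columns because no earlier observation exists for them.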

Question 5

What was the largest monthly jump in the CPIH, and when and within what category did it occur?

If you feel motivated, answer this slightly more advanced question instead: Within each category, what were the largest jumps, and when did they occur?
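
A sketch of finding the largest month-on-month change, overall and per category, is shown below on made-up data, assuming dplyr (version 1.0.0 or later for `slice_max()`). Here a "jump" is taken to be the absolute difference from the previous month; that is one interpretation, and a percentage change would be another.

```r
library(dplyr)

# Toy data with hypothetical values: two categories over four months
toy <- data.frame(
  date     = as.Date(rep(c("2023-09-01", "2023-10-01",
                           "2023-11-01", "2023-12-01"), times = 2)),
  category = rep(c("Food", "Transport"), each = 4),
  CPIH     = c(128.2, 129.0, 130.1, 131.4, 118.9, 119.2, 120.4, 120.5)
)

jumps <- toy %>%
  group_by(category) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(jump = CPIH - lag(CPIH)) %>%    # change relative to the previous month
  ungroup() %>%
  filter(!is.na(jump))                   # drop the first month of each category

# Largest single jump across all categories
overall <- jumps %>% slice_max(jump, n = 1)

# Largest jump within each category
per_category <- jumps %>%
  group_by(category) %>%
  slice_max(jump, n = 1) %>%
  ungroup()

overall
per_category
```

Note that `slice_max()` keeps ties by default, so more than one row can come back if several months share the same maximum.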