✏️ W04 Formative

2024/25 Autumn Term

Author

Tabtim Duenger

⏲️ Due Date:

🎯 Main Objectives:

Please submit your work even if you didn’t get very far with the R code. As this is a formative assignment, it won’t be graded; the main point is for you to get used to submitting your work through GitHub Classroom.

👉 Note: Completing this assignment will count towards your final class grade if you are a General Course or Exchange student. It will still count as submitted even if you submit just a few coding responses.

📚 Preparation (if you are new to GitHub)

You will use GitHub Classroom to submit your work. You will need to have a GitHub account to do this.

  1. Create an account on GitHub.

Never heard of GitHub? Or maybe you have heard of it but never used it? Then, follow the instructions below to get started.

  1. Go to our Slack workspace’s #announcements channel to find the link to ‘Intro to Git and GitHub’. You will be taken to a page with instructions on how to get started with Git and GitHub.

  2. Read the instructions in the README.md and complete the exercises.

  3. Ask any questions about the exercise above on the #help channel on Slack.

📝 Instructions

  1. Go to our Slack workspace’s #announcements channel to find a GitHub Classroom link. Do not share this link with anyone outside this course!

  2. Click on the link, sign in to GitHub and then click on the green button Accept this assignment.

  3. You will be redirected to a new private repository created just for you. The repository will be named ds202a-2024-w04-formative--yourusername, where yourusername is your GitHub username. The repository will be private and will contain a README.md file with a copy of these instructions.

  4. Many of you might still be catching up with R and GitHub, so it’s okay if you can only complete a few questions. You will still get feedback on your answers, and the assignment will still count as completed (important for General Course and Exchange students).

  5. Create your own .qmd file with your answers. You can use the .qmd file you used in the W02 lab as a template. Just remove anything that is not relevant to this assignment.

  6. Try to create separate headers and code chunks for each question. This will make it easier for us to review your work and give feedback. Learn more about the basics of markdown formatting here.

  7. Use the #help channel on Slack liberally if you get stuck.

“What do I submit?”

⚠️ Do you know your CANDIDATE NUMBER? You will need it.

“Your candidate number is a unique five digit number that ensures that your work is marked anonymously. It is different to your student number and will change every year. Candidate numbers can be accessed using LSE for You.”

Source: LSE

  • A Quarto markdown file with the following naming convention: <CANDIDATE_NUMBER>.qmd, where <CANDIDATE_NUMBER> is your candidate number. For example, if your candidate number is 12345, then your file should be named 12345.qmd.

  • An HTML file render of the Quarto markdown file.

You don’t need to click anything to submit. Your assignment will be automatically submitted when you commit AND push your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only look at the last version of your assignment.

✔️ How we will grade your work

We won’t! This is formative. But you will get feedback on your answers. It won’t be super detailed at this stage, but it should give you an idea of how you are doing.

📚 Tasks

The questions below build on your code from the 💻 W02 and W03 labs.

About the Data

We will use a dataset from the UC Irvine Machine Learning Repository for this assignment. It covers student achievement in mathematics at two Portuguese secondary schools. You can download the dataset by clicking on the button below:

See here to find out more about the data.

Question 1

Load the data and look through the documentation in the link above. Rename the variable that represents the final grade to finalgrade.
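If you are unsure how loading and renaming fit together, here is a minimal sketch in base R. It does not use the real file: the inline text is a toy stand-in with made-up values, `grade_col` is a hypothetical column name (check the documentation for the real one), and the semicolon separator is an assumption you should verify against your downloaded file.

```r
# Toy stand-in for the downloaded file: made-up values, hypothetical column names.
csv_text <- "school;age;grade_col
GP;16;12
MS;17;9
GP;15;14"

# Assumption: the file is semicolon-separated; change `sep` if yours is not.
# With the real data you would pass the file path instead of `text = ...`.
students <- read.csv(text = csv_text, sep = ";")

# Rename one column by name, leaving every other column untouched.
names(students)[names(students) == "grade_col"] <- "finalgrade"

# Quick check of column names and types.
str(students)
```

Renaming by name rather than by position keeps the code working even if the column order changes.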

Question 2

Create a new data frame, keeping only students from GP school, and with the columns age, freetime, goout, traveltime, studytime, health, and finalgrade. Ideally, implement all these steps seamlessly using the pipe operator.

Finally, show the first 10 rows of this data frame as output.
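The filter-then-select pattern can be sketched as follows. This is only an illustration on a toy data frame with made-up values; it uses base R and the native pipe `|>` (available from R 4.1), so it runs without extra packages. If your labs used a different pipe or package, the same steps carry over.

```r
# Toy data with made-up values; the column names follow the question text.
students <- data.frame(
  school     = c("GP", "MS", "GP", "GP", "MS"),
  sex        = c("F", "M", "M", "F", "F"),
  age        = c(16, 17, 15, 18, 16),
  freetime   = c(3, 2, 4, 3, 5),
  goout      = c(2, 3, 4, 1, 2),
  traveltime = c(1, 2, 1, 3, 1),
  studytime  = c(2, 1, 3, 2, 2),
  health     = c(5, 3, 4, 2, 5),
  finalgrade = c(12, 9, 14, 10, 11)
)

# Keep only GP rows and the requested columns in one piped step.
gp_students <- students |>
  subset(
    school == "GP",
    select = c(age, freetime, goout, traveltime, studytime, health, finalgrade)
  )

# Show up to the first 10 rows (the toy data has fewer).
gp_students |> head(10)
```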

Question 3

What is the average final grade for each combination of travel time and study time?

Which combination of the two has the highest average final grade?
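One way to think about this question is as a grouped average. The sketch below uses base R's `aggregate()` on a toy data frame with made-up values, so the numbers it produces say nothing about the real data.

```r
# Toy data with made-up values.
grades <- data.frame(
  traveltime = c(1, 1, 2, 2, 1, 2),
  studytime  = c(1, 2, 1, 2, 2, 1),
  finalgrade = c(10, 14, 8, 12, 16, 10)
)

# Mean final grade for every observed (traveltime, studytime) pair.
avg_grades <- aggregate(
  finalgrade ~ traveltime + studytime,
  data = grades,
  FUN  = mean
)

# Sort from highest to lowest average and pick the top row.
avg_grades <- avg_grades[order(-avg_grades$finalgrade), ]
best <- avg_grades[1, ]

avg_grades
best
```

On the real data, it is worth also counting how many students fall into each combination, since an average based on a handful of students is not very reliable.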

Question 4

Use the data frame you created in Q2 and plot a chart to show how the final grade changes with study time. Set an appropriate title for both the plot and the axes. Any additional visual improvements are welcome, too, but entirely optional.

Based on the plot, is there an obvious relationship between these variables?
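As a starting point, here is a minimal sketch using base R graphics on a toy data frame with made-up values. It assumes study time is recorded as a small set of ordered categories, which is why a box plot per category is used; a different chart type may suit your data better.

```r
# Toy data with made-up values.
toy <- data.frame(
  studytime  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  finalgrade = c(8, 10, 9, 11, 12, 10, 13, 15, 12)
)

# One box per study-time category, with a plot title and axis titles.
bp <- boxplot(
  finalgrade ~ factor(studytime),
  data = toy,
  main = "Final grade by study time (toy data)",
  xlab = "Study time category",
  ylab = "Final grade"
)
```

When run non-interactively (for example with `Rscript`), base R writes the plot to a file instead of opening a window.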

Question 5

Split your data frame from Q2 into training and testing sets. Build a multivariate linear model on the training set to predict final grades, and evaluate the model on the test set using RMSE (root mean squared error). How well does this model predict students’ final grades?

If you feel motivated, answer this slightly more advanced question instead: Take a closer look at the error terms (residuals) of your model by plotting them. What insights can you gain from this plot?
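The split, fit, and evaluate workflow can be sketched in base R as below. Everything here is an assumption for illustration: the data are simulated, the 75/25 split is an arbitrary choice, and the predictors are just three of the Q2 columns. Your labs may have used different tools for the same steps.

```r
set.seed(42)  # make the random split reproducible

# Simulated toy data; the coefficients and noise level are made up.
n <- 100
toy <- data.frame(
  studytime  = sample(1:4, n, replace = TRUE),
  traveltime = sample(1:4, n, replace = TRUE),
  health     = sample(1:5, n, replace = TRUE)
)
toy$finalgrade <- 8 + 1.5 * toy$studytime - 0.5 * toy$traveltime +
  rnorm(n, sd = 2)

# Random 75/25 split into training and test sets.
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

# Fit the linear model on the training set only.
model <- lm(finalgrade ~ studytime + traveltime + health, data = train_set)

# Predict on the held-out test set and compute RMSE.
pred <- predict(model, newdata = test_set)
rmse <- sqrt(mean((test_set$finalgrade - pred)^2))
rmse

# Optional: test-set residuals against predictions, to look for patterns.
test_resid <- test_set$finalgrade - pred
plot(pred, test_resid,
     main = "Residuals vs predicted (toy data)",
     xlab = "Predicted final grade", ylab = "Residual")
abline(h = 0, lty = 2)
```

RMSE is in the same units as the final grade, so it can be read as a typical size of the prediction error. In the residual plot, a cloud scattered evenly around zero is what you would hope to see; curves or funnels suggest the model is missing something.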