✔️ Week 02 - Lab Solutions
DS202 - Data Science for Social Scientists
Solutions to exercises
Use the function View() to identify the type of each variable (quantitative or qualitative):
View(Auto)
The variables mpg, cylinders, displacement, horsepower, weight, acceleration and year are quantitative. The variables origin and name are qualitative.
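As a complement to inspecting the data by eye, the storage type of each column can also be checked in code. The sketch below assumes that Auto is the data set shipped with the ISLR package (the lab does not show how it was loaded), and the names col_types and origin_factor are only illustrative. Note that a variable can be stored as a number and still be qualitative, which is the case for origin.

```r
# Assumption: Auto comes from the ISLR package (install.packages("ISLR") if needed).
library(ISLR)

# str() prints the storage type of every column next to its first few values.
str(Auto)

# Collect the class of each column in a named character vector.
col_types <- sapply(Auto, class)
print(col_types)

# origin is stored as a number but encodes a category,
# so converting it to a factor makes that explicit.
origin_factor <- as.factor(Auto$origin)
print(levels(origin_factor))
```

The storage type is therefore only a hint: deciding whether a variable is quantitative or qualitative still requires knowing what its values mean.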
Use the function range() to check the range of each quantitative predictor:
range(Auto$mpg)
[1]  9.0 46.6
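Rather than calling range() once per variable, the same check can be applied to all quantitative columns in one step. This is a minimal sketch, assuming Auto comes from the ISLR package; quant_vars and ranges are illustrative names, not part of the lab.

```r
# Assumption: Auto comes from the ISLR package.
library(ISLR)

# Names of the quantitative variables in Auto.
quant_vars <- c("mpg", "cylinders", "displacement", "horsepower",
                "weight", "acceleration", "year")

# sapply() applies range() to every selected column and
# returns a matrix with one column per variable.
ranges <- sapply(Auto[, quant_vars], range)
rownames(ranges) <- c("min", "max")
print(ranges)
```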
To refer to a variable, we must type the data set and the variable name joined with a $ symbol.
Use summary() for an overall look at all variables. Its output includes statistics such as the mean and the quartiles, but not the standard deviation:
summary(Auto)
To get the mean and standard deviation of a single variable, use mean() and sd():
mean(Auto$mpg)
sd(Auto$mpg)
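The mean and standard deviation of every quantitative variable can be computed together instead of one variable at a time. The sketch below assumes Auto comes from the ISLR package; quant_vars and mean_sd are illustrative names.

```r
# Assumption: Auto comes from the ISLR package.
library(ISLR)

quant_vars <- c("mpg", "cylinders", "displacement", "horsepower",
                "weight", "acceleration", "year")

# For each column, return a named vector holding its mean and standard deviation.
# The result is a matrix with one row per statistic and one column per variable.
mean_sd <- sapply(Auto[, quant_vars], function(x) c(mean = mean(x), sd = sd(x)))
print(round(mean_sd, 2))
```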
Remove the 10th through 85th observations from the original data frame and store the result as a new data frame:
Auto_tmp = Auto[-c(10:85), ]
summary(Auto_tmp)
mean(Auto_tmp$mpg)
sd(Auto_tmp$mpg)
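It is worth confirming that negative indexing dropped exactly the intended rows. A short check, assuming Auto comes from the ISLR package (n_removed and mpg_ranges are illustrative names):

```r
# Assumption: Auto comes from the ISLR package.
library(ISLR)

# A negative index drops the listed rows and keeps all others.
Auto_tmp <- Auto[-c(10:85), ]

# Rows 10 to 85 inclusive are 85 - 10 + 1 = 76 observations.
n_removed <- nrow(Auto) - nrow(Auto_tmp)
print(n_removed)

# Compare the range of mpg before and after removing the rows.
mpg_ranges <- rbind(full = range(Auto$mpg), subset = range(Auto_tmp$mpg))
print(mpg_ranges)
```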
Create a scatterplot matrix using the function pairs():
pairs(~ mpg + displacement + horsepower + weight + acceleration + year + origin + cylinders, data = Auto)
Notice the linear or non-linear trends in the scatterplots. Then create a histogram of the variable mpg:
hist(Auto$mpg, col = 2, breaks = 15)
Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables. You may find the command par(mfrow = c(2, 2)) useful: it will divide the print window into four regions so that four plots can be made simultaneously. Modifying the arguments to this function will divide the screen in other ways.
After observing the first row of the scatterplot matrix, which shows the relationship between gas mileage (mpg) and the other variables, you will find that evident linear or non-linear trends exist in the scatterplots with the variables displacement, horsepower, weight, year and origin. Therefore, these variables might be useful in predicting mpg.
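The histogram exercise with par(mfrow = c(2, 2)) could be sketched as follows. This assumes Auto comes from the ISLR package; the choice of variables and bin counts is arbitrary, and h_coarse, h_fine and old_par are illustrative names. Note that hist() treats a single number passed to breaks as a suggestion, so the number of bins actually drawn may differ slightly.

```r
# Assumption: Auto comes from the ISLR package.
library(ISLR)

# Split the plotting window into a 2 x 2 grid.
# par() returns the previous settings so they can be restored later.
old_par <- par(mfrow = c(2, 2))

# Same variable, coarse versus fine binning.
h_coarse <- hist(Auto$mpg, breaks = 5, col = 2,
                 main = "mpg, breaks = 5", xlab = "mpg")
h_fine <- hist(Auto$mpg, breaks = 30, col = 2,
               main = "mpg, breaks = 30", xlab = "mpg")

# Two other quantitative variables.
hist(Auto$horsepower, breaks = 10, col = 3,
     main = "horsepower, breaks = 10", xlab = "horsepower")
hist(Auto$weight, breaks = 20, col = 4,
     main = "weight, breaks = 20", xlab = "weight")

# Restore the original single-plot layout.
par(old_par)
```

Comparing the two mpg panels shows how the bin count changes the apparent shape of the distribution: too few bins hide detail, too many make the plot noisy.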
For statistically robust conclusions about the relationships between variables, you need to calculate some statistics (such as the correlation, using the function cor()) and conduct statistical tests. This will be further illustrated in Week 03.
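As a brief preview, the correlation between mpg and each quantitative variable can be computed with cor(), and a single pair can be tested with cor.test(). This is only a sketch, assuming Auto comes from the ISLR package; it is not the Week 03 material itself, and quant_vars, cor_with_mpg and wt_test are illustrative names.

```r
# Assumption: Auto comes from the ISLR package.
library(ISLR)

quant_vars <- c("mpg", "cylinders", "displacement", "horsepower",
                "weight", "acceleration", "year")

# cor() returns the full correlation matrix (Pearson by default);
# keep only the row that relates mpg to every variable.
cor_with_mpg <- cor(Auto[, quant_vars])["mpg", ]
print(round(sort(cor_with_mpg), 2))

# cor.test() tests whether the correlation between two variables differs from zero.
wt_test <- cor.test(Auto$mpg, Auto$weight)
print(wt_test)
```

The correlation coefficient only measures linear association, so a curved trend in a scatterplot can be stronger than its correlation suggests.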