✔️ Week 02 - Lab Solutions

DS202 - Data Science for Social Scientists

Author

Yijun Wang

Published

07 October 2022

Solutions to exercises

  1. Use the function View()to identify the type of a variable (quantitative or qualitative):

    View(Auto)

    Variables mpg, cylinders, horsepower, weight, acceleration, year are quantitative variable. Variables origin, name are qualitative variable

  2. Use the function range() to check the range of each quantitative predictor:

    range(Auto$mpg)
    [1]  9.0 46.6

    To refer to a variable, we must type the data set and the variable name joined with a $ symbol.

  3. Using summary() to have an overall look at all variables and statistical features (like mean and standard deviation) are included in the outputs:

    summary(Auto)

    or

    mean(Auto$mpg)
    sd(Auto$mpg)
  4. Remove the 10th through 85th observations from the original data frame and store it as another new data frame:

    Auto_tmp = Auto[-c(10:85), ]
    summary(Auto_tmp)
    mean(Auto_tmp$mpg)
    sd(Auto_tmp$mpg)
  5. Create a scatterplot matrix using the function pairs():

    pairs( ~ mpg + displacement + horsepower + weight + 
            acceleration + year + origin + cylinders, 
            data = Auto)

    Notice the linear or non-linear trends in the scatterplots.Then create a histogram of the variable mpg:

    hist (Auto$mpg , col = 2, breaks = 15)

    Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables. You may find the command par(mfrow = c(2, 2)) useful: it will divide the print window into four regions so that four plots can be made simultaneously. Modifying the arguments to this function will divide the screen in other ways.

  6. After observing the first row of the scatterplot matrix which indicates the relationship between gas mileage (mpg) and other variables, you will find evident linear or non-linear trends exist in the scatterplots with variables displacement, horsepower, weight, year and origin. Therefore, these varibles might be useful in predicting mpg.

If you want to achieve ststistical robust when exploring the relationship between variables, you need to culculate some statistics (like the correlation using the function cor()) and conduct statistical tests. This will be further illustrated in Week 03.