🗓️ Week 01 – Day 03: Summarizing and Visualizing Data

The basics of Exploratory Data Analysis

Published: 10 July 2024

We have seen what tidy data looks like, and we have already warmed up to the idea of data wrangling. Today, we will explore what we can do with our data once it is tidy. Freely exploring data in this way is called Exploratory Data Analysis (EDA). It involves summarizing the main characteristics of the data, often with visual methods.

[Diagram: Start -> Gather data -> Store it somewhere -> Clean & pre-process -> Build a dataset -> Exploratory data analysis -> Machine learning -> Obtain insights -> Communicate results -> End]

🥅 Learning Objectives

Review the goals for today

At the end of the day you should be able to:

  • Use computational notebooks to document your data analysis
  • Write Markdown to format your computational notebooks
  • Articulate the relevance of summarizing data
  • Use the groupby -> apply -> combine pattern to summarize data
  • Create scatterplots, bar charts, histograms, and box plots
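To make the groupby -> apply -> combine objective concrete, here is a minimal sketch using pandas. The DataFrame, its column names (species, weight_kg) and its values are invented for illustration and are not the dataset used in the lecture notebook.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "species": ["cat", "cat", "dog", "dog", "dog"],
    "weight_kg": [4.0, 5.0, 20.0, 30.0, 10.0],
})

# groupby: split the rows into one group per species
groups = df.groupby("species")["weight_kg"]

# apply + combine: compute summaries per group; pandas stitches the
# per-group results back together into a single table
summary = groups.agg(["count", "mean", "max"]).reset_index()

# A custom function can also be applied to each group
weight_range = groups.apply(lambda s: s.max() - s.min())

print(summary)
print(weight_range)
```

Here `agg` covers the common built-in summaries, while `apply` accepts any function that takes the values of one group and returns a result.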

Part 1: 🧑‍💻 Live Coding: Summarizing Data

For today’s lecture, we will not be using slides. Instead, we will be working on a computational notebook (Jupyter) to explore the basics of Exploratory Data Analysis (EDA). Download the notebook by clicking the button below:

👉 Save the notebook in the ME204/code folder.

📚 PREPARATION

💡 Pro-Tip: For an easier start, I recommend using the base miniconda environment today. However, if you have previous experience with conda environments, feel free to switch to a custom one.

  1. In VS Code, select Terminal -> New Terminal and type the following command to activate the (base) miniconda environment:

    conda activate

    This will activate the base environment. You should see the (base) prefix in your terminal. Raise your hand if nothing happens.

  2. Check that pip is installed within your environment by running:

    Windows users

    where.exe pip

    macOS/Linux users

    which pip

    Either command should print the path to the pip executable, and that path must be inside the miniconda environment.

    If you see more than one line and the first line is not inside the miniconda environment, or if no path is printed at all, please let me know.

  3. Now, install the required packages for today’s session:

    pip install pandas matplotlib lets-plot ipykernel jupyterlab

    It’s likely that you already have most of these packages installed. If you see a message saying that the packages are already installed, that’s fine.

  4. Expand the ME204/code folder in the left sidebar of VS Code to reveal the Jupyter Notebook file you downloaded earlier. Double-click on the file to open it.

  5. At the top-right corner of the notebook window (within VS Code), click on “Select Kernel” and choose the miniconda kernel (base).

  6. Run the notebook cells as we go through the lecture.
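Once the kernel is selected, one optional way to confirm that the notebook is using the intended environment is to run a short check in the first cell. This is a sketch that uses only the Python standard library; `check_environment` is a hypothetical helper name, and the module list assumes the usual import names for the packages from step 3 (note that the lets-plot package is imported as `lets_plot`).

```python
import importlib.util
import shutil
import sys

# Import names for the packages installed in step 3.
# Assumption: the PyPI package "lets-plot" is imported as "lets_plot".
REQUIRED = ["pandas", "matplotlib", "lets_plot", "ipykernel", "jupyterlab"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# The interpreter path should point inside the miniconda installation
print("Python interpreter:", sys.executable)
# First pip found on PATH, or None if there is none
print("pip on PATH:", shutil.which("pip"))

for name, ok in check_environment().items():
    print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

If any package is reported as MISSING, re-running the pip install command from step 3 in the activated terminal is a reasonable first thing to try.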