Week 01 - Day 03: Summarizing and Visualizing Data
The basics of Exploratory Data Analysis
We have seen what tidy data looks like, and we have already warmed up to the idea of data wrangling. Today, we will explore what we can do with our data once it is tidy. This open-ended process of exploring data is called Exploratory Data Analysis (EDA). It involves summarizing the main characteristics of the data, quite often with visual methods.
Learning Objectives
Review the goals for today
At the end of the day you should be able to:
- Use computational notebooks to document your data analysis
- Write Markdown to format your computational notebooks
- Articulate the relevance of summarizing data
- Use the groupby -> apply -> combine pattern to summarize data
- Create scatterplots, bar charts, histograms, and box plots
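To make the groupby -> apply -> combine pattern concrete before the live coding, here is a minimal pandas sketch. The DataFrame, its column names (city, temperature) and its values are made up for illustration; they do not come from today's notebook.

```python
import pandas as pd

# Hypothetical toy data: one row per temperature reading
df = pd.DataFrame(
    {
        "city": ["London", "London", "Paris", "Paris", "Paris"],
        "temperature": [12.0, 14.0, 18.0, 20.0, 22.0],
    }
)

# 1. groupby: split the rows into one group per city
groups = df.groupby("city")

# 2. apply: run a function on each group's temperature values
temp_range = groups["temperature"].apply(lambda s: s.max() - s.min())

# 3. combine: pandas stitches the per-group results back together,
#    here into a single DataFrame with one row per city
summary = groups["temperature"].agg(["count", "mean", "max"])

print(temp_range)
print(summary)
```

Both results are indexed by city, so each row summarizes one group. In pandas, .agg is a convenient shortcut when you only need standard summaries; .apply lets you plug in any custom function.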
Part 1: Live Coding: Summarizing Data
For today's lecture, we will not be using slides. Instead, we will work in a computational notebook (Jupyter) to explore the basics of Exploratory Data Analysis (EDA). Download the notebook and save it in the ME204/code folder.
PREPARATION
Pro-Tip: For an easier start, I recommend using the base miniconda environment today. If you already have experience with conda environments, feel free to switch to a custom environment.
In VS Code, select Terminal -> New Terminal and type the following command to activate the (base) miniconda environment:

conda activate
This will activate the base environment. You should see the (base) prefix in your terminal prompt. Raise your hand if nothing happens.

Check that pip is installed within your environment by running the command for your operating system:

Windows users
where.exe pip
This should show you the path to the pip executable, and that path must be inside the miniconda environment. If you see more than one line and the first line is not inside the miniconda environment, please let me know.
macOS/Linux users
which pip
This should show you the path to the pip executable, and that path must be inside the miniconda environment. If you see more than one line and the first line is not inside the miniconda environment, please let me know.
If you don't see the path to pip, please let me know.

Now, install the required packages for today's session:
pip install pandas matplotlib lets-plot ipykernel jupyterlab
It's likely that you already have most of these packages installed. If pip reports that some of them are already installed ("Requirement already satisfied"), that's fine.
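If you want to double-check the installation from Python itself, the sketch below uses only the standard library: sys.executable shows which interpreter is running, and importlib.util.find_spec reports whether each package can be imported. Note that import names can differ from pip names: the pip package lets-plot is imported as lets_plot. The helper name missing_packages is hypothetical, chosen for this example.

```python
import importlib.util
import sys

# Import names for the packages installed with pip above.
# Note: the pip name "lets-plot" is imported as "lets_plot".
REQUIRED = ["pandas", "matplotlib", "lets_plot", "ipykernel", "jupyterlab"]


def missing_packages(names):
    """Return the names that the current interpreter cannot find (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# This path should point inside your miniconda installation.
print("Python interpreter:", sys.executable)

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

You can paste this into a Python session started from the same terminal (with the (base) prefix visible). If the interpreter path is not inside miniconda, pip most likely installed the packages into a different Python.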
Expand the ME204/code folder in the left sidebar of VS Code to reveal the Jupyter Notebook file you downloaded earlier. Double-click the file to open it.

At the top-right corner of the notebook window (within VS Code), click "Select Kernel" and choose the (base) miniconda kernel.

Run the notebook cells as we go through the lecture.
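One way to confirm that the notebook is really using the miniconda kernel is to run a first cell like the one below. It relies on the fact that conda environments, including base, keep a conda-meta folder inside their prefix. This is a heuristic check, not an official conda API, and the function name running_in_conda_env is hypothetical.

```python
import os
import sys


def running_in_conda_env(prefix=sys.prefix):
    """Heuristic: conda environments store package metadata in <prefix>/conda-meta."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


# sys.prefix is the root folder of the environment the kernel runs in
print("Kernel prefix:", sys.prefix)
print("Looks like a conda environment:", running_in_conda_env())
```

If the second line prints False, the notebook is probably attached to a different Python installation; reopen "Select Kernel" and pick the (base) kernel.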