πŸ—“οΈ Week 03 – Day 01: Intro to Databases and SQL

Moving beyond loose data files

Author
Published

22 July 2024

πŸ₯… Learning Objectives

Review the goals for today

At the end of the day you should be able to:

  • Understand the similarities and differences between SQL and Python/R’s data manipulation libraries
  • Write basic SQL commands: SELECT, WHERE, GROUP BY, ORDER BY, JOIN
  • Translate your data analysis from Python/R to SQL
  • Use SQL to query databases

πŸ“š Today’s Lecture

πŸ“– Preparation

You can use your base environment or create a new one for this notebook.

These are the libraries you need to install:

pip install pandas numpy lets-plot numerize tqdm sqlalchemy jupysql

πŸ’‘ NOTE: There are a few new ones today: the jupysql and sqlalchemy packages.

πŸ“‹ Lecture notebook

Today’s lecture material is all contained in the following Jupyter notebook:

πŸ’½ What is a database?

At its core, a database is a collection of data stored in a computer system. It’s like a structured storage space that allows us to perform various data-related operations, including:

  • storing
  • retrieving
  • modifying, and
  • deleting data.

Relational databases

The most common type of database is a relational database, where data is stored in tables that are related to one another. This allows for easy organization and access to data. Additionally, distributed databases can be replicated across multiple points in a network.

A key feature of relational databases is that they are persistent, meaning the data remains even after a computer crash or power outage.

Database management systems

We use database management systems (DBMS) to interact with databases. These systems serve as intermediaries between users and the databases, enabling operations on the data.

Some popular examples of DBMS include:

Database administrators (DBAs) are responsible for managing databases. They handle tasks like installing or upgrading the DBMS, troubleshooting issues, and monitoring database performance and security. DBAs also contribute to planning and developing the database structure to cater to users’ needs.

Database structure

A database consists of tables, each containing rows and columns. Every row represents a set of related data, and all rows in a table have the same structure. For instance, a table about people might have rows for each person, with columns for name, address, phone number, etc.

Sounds familiar? That is because the tidy data principles were inspired by relational databases, and pandas embraces this relational approachin several ways.

Now, while diving deep into database theory could be a course on its own, we’ll focus on hands-on applications. We will set up a simple database, populate it with data, and query it using SQL – all while taking notes of similarities and differences with pandas.