ποΈ Week 03 β Day 01: Intro to Databases and SQL
Moving beyond loose data files
π₯ Learning Objectives
Review the goals for today
At the end of the day you should be able to:
- Understand the similarities and differences between SQL and Python/Rβs data manipulation libraries
- Write basic SQL commands: SELECT, WHERE, GROUP BY, ORDER BY, JOIN
- Translate your data analysis from Python/R to SQL
- Use SQL to query databases
π Todayβs Lecture
π Preparation
You can use your base
environment or create a new one for this notebook.
These are the libraries you need to install:
pip install pandas numpy lets-plot numerize tqdm sqlalchemy jupysql
π‘ NOTE: There are a few new ones today: the jupysql
and sqlalchemy
packages.
π Lecture notebook
Todayβs lecture material is all contained in the following Jupyter notebook:
π½ What is a database?
At its core, a database is a collection of data stored in a computer system. Itβs like a structured storage space that allows us to perform various data-related operations, including:
- storing
- retrieving
- modifying, and
- deleting data.
Relational databases
The most common type of database is a relational database, where data is stored in tables that are related to one another. This allows for easy organization and access to data. Additionally, distributed databases can be replicated across multiple points in a network.
A key feature of relational databases is that they are persistent, meaning the data remains even after a computer crash or power outage.
Database management systems
We use database management systems (DBMS) to interact with databases. These systems serve as intermediaries between users and the databases, enabling operations on the data.
Some popular examples of DBMS include:
Database administrators (DBAs) are responsible for managing databases. They handle tasks like installing or upgrading the DBMS, troubleshooting issues, and monitoring database performance and security. DBAs also contribute to planning and developing the database structure to cater to usersβ needs.
Database structure
A database consists of tables, each containing rows and columns. Every row represents a set of related data, and all rows in a table have the same structure. For instance, a table about people might have rows for each person, with columns for name, address, phone number, etc.
Sounds familiar? That is because the tidy data principles were inspired by relational databases, and pandas
embraces this relational approachin several ways.
Now, while diving deep into database theory could be a course on its own, weβll focus on hands-on applications. We will set up a simple database, populate it with data, and query it using SQL β all while taking notes of similarities and differences with pandas
.