👨‍🏫 Week 09 - Managing your data science workflow
DS105 - Data for Data Science
🖥️ Part I - Conda environments (45-50 min)
In the first part of the lecture, we will explore conda environments together (bring your 💻 laptops, and perhaps also coffee!).
In the labs, you will practice sharing these environments via GitHub and using GitHub more effectively as a team, including the Pull Request process.
Lecture Notes
This is an interactive lecture. I will teach you about conda environments, and we will compare how everyone's conda and Python settings differ.
🤔 What is in my conda?
Step 1.
Which version of Python do you have? Open your terminal and type:
python --version
Compare your version to those of your colleagues.
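The same check can be made from inside Python, which helps when your terminal and your notebooks might be using different interpreters. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import platform
import sys

# Version of the interpreter that is running this script,
# as a "major.minor.patch" string.
python_version = platform.python_version()

# Full path of that interpreter, useful for spotting which installation is in use.
python_path = sys.executable

print(f"Python {python_version} at {python_path}")
```

If the path printed here is not the one you expected, the terminal is probably picking up a different Python installation.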
Step 2.
Which version of conda do you have? Open your terminal and type:
conda list
You should see a list of all the packages in your default conda environment, whether you installed them yourself or they came installed by default.
What version of jupyterlab do you have installed? What about pandas? Compare the versions of your packages to those of your colleagues. Are there any differences?
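If you prefer to look up a couple of packages without scrolling through the whole list, something like the sketch below may help. It relies on the standard library module importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "not installed"


# Look up the two packages mentioned in this step.
for name in ["jupyterlab", "pandas"]:
    print(f"{name}: {installed_version(name)}")
```

This reports what the current interpreter can see, so the answer may differ from conda list if a different environment is active.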
Step 3.
Let's generate some data! Run the command below in the terminal to save the output of conda list to a text file. Replace my username with yours:
conda list >> conda_list_jonjoncardoso.txt
Step 4.
I will ask you to upload the file you created to Slack.
Step 5.
I will then combine all of our data and we will explore the discrepancies in the versions of packages we are all likely to use.
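To give a rough idea of how such files could be combined, here is a small sketch. The two file contents are hypothetical and shortened (the student names, paths, versions and build strings are invented), and the helper names parse_conda_list and find_discrepancies are assumptions for this example, not the code used in the lecture.

```python
from collections import defaultdict

# Hypothetical, shortened `conda list` outputs from two made-up students.
SAMPLE_FILES = {
    "conda_list_alice.txt": """\
# packages in environment at /home/alice/anaconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.21.5           py39h6c91a56_3
pandas                    1.4.4            py39h6a678d5_0
""",
    "conda_list_bob.txt": """\
# packages in environment at /home/bob/anaconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.23.4          py310hd5efca6_0
pandas                    1.4.4           py310h6a678d5_0
tqdm                      4.64.1          py310h06a4308_0
""",
}


def parse_conda_list(text: str) -> dict[str, str]:
    """Map package name -> version, skipping comment and blank lines."""
    versions = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # The first two whitespace-separated columns are name and version.
        name, version, *_ = line.split()
        versions[name] = version
    return versions


def find_discrepancies(files: dict[str, str]) -> dict[str, dict[str, str]]:
    """Return packages whose version differs, or is missing, across files."""
    by_package = defaultdict(dict)
    for filename, text in files.items():
        for name, version in parse_conda_list(text).items():
            by_package[name][filename] = version
    return {
        name: seen
        for name, seen in by_package.items()
        if len(set(seen.values())) > 1 or len(seen) < len(files)
    }


for package, seen in find_discrepancies(SAMPLE_FILES).items():
    print(package, seen)
```

With real files you would read each uploaded text file from disk instead of using the inline strings.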
How do we "fix" everyone's environment?
Step 6.
In the terminal, cd to the directory where you keep all the files of your project.
Step 7.
Create a conda environment:
💡 Useful link: Managing conda environments
conda create --prefix .\venv python=3.10
Step 8.
Activate the environment. On macOS/Linux:
source activate .\venv
On Windows:
activate .\venv
Step 9.
What's different about this conda environment?
conda list
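One way to double-check that the new environment is really the one in use is to ask Python where it lives. This is a minimal sketch, assuming conda sets the CONDA_PREFIX environment variable on activation (which it normally does); the variable names are illustrative.

```python
import os
import sys
from pathlib import Path

# Root folder of the environment the running interpreter belongs to.
environment_root = Path(sys.prefix)

# conda sets CONDA_PREFIX when an environment is activated; it is absent otherwise.
conda_prefix = os.environ.get("CONDA_PREFIX")

print(f"Interpreter environment: {environment_root}")
print(f"CONDA_PREFIX: {conda_prefix}")

if conda_prefix is None:
    print("No conda environment appears to be active in this shell.")
elif Path(conda_prefix).resolve() == environment_root.resolve():
    print("This interpreter belongs to the active conda environment.")
else:
    print("The interpreter and the active conda environment differ.")
```

After activating the project environment, the first path should point inside your project directory rather than at the default conda installation.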
Step 10.
Create a new file called requirements.txt and paste the following there:
matplotlib==3.5.3 # version required for plotnine
plotnine==0.10.1 # Python version of ggplot2
numpy>=1.22
pandas==1.4.2
scikit-learn==1.1.3
### UTILS
jupyterlab==3.4.2
tqdm==4.62.0
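To see how each line of this file breaks down into a package name, an operator and a version, here is a simplified sketch. It only handles the == and >= operators and the # comments used in this file, not the full requirements syntax that pip supports, and the function name parse_requirements is made up for this example.

```python
import re

# The same requirements as in the file described in this step.
REQUIREMENTS_TEXT = """\
matplotlib==3.5.3 # version required for plotnine
plotnine==0.10.1 # Python version of ggplot2
numpy>=1.22
pandas==1.4.2
scikit-learn==1.1.3
### UTILS
jupyterlab==3.4.2
tqdm==4.62.0
"""

# Package name, then == or >=, then a version such as 1.4.2.
PATTERN = re.compile(r"^([A-Za-z0-9_.-]+)\s*(==|>=)\s*([0-9][0-9A-Za-z.]*)$")


def parse_requirements(text: str) -> list[tuple[str, str, str]]:
    """Return (name, operator, version) for every requirement line."""
    pins = []
    for raw_line in text.splitlines():
        # Everything after '#' is a comment; comment-only lines become empty.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        match = PATTERN.match(line)
        if match is None:
            raise ValueError(f"Unsupported requirement line: {raw_line!r}")
        pins.append(match.groups())
    return pins


for name, operator, version in parse_requirements(REQUIREMENTS_TEXT):
    print(f"{name:<14}{operator} {version}")
```

Note the difference between == (exactly this version) and >= (this version or newer): only numpy is allowed to float upwards here.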
Step 11.
Try to install it with conda:
conda install --file requirements.txt
Why can't we install all packages?
Step 12.
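Install it with pip. First make sure pip is installed in the environment and check which pip your terminal is using. On macOS/Linux:
conda install pip
which pip
On Windows:
conda install pip
where.exe pip
Then:
pip install -r requirements.txt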
Step 12.
What does the conda environment look like now?
conda list
☕ Coffee Break (10 min)
Use this time to chat, stretch, drink some coffee or just relax for a bit by yourself.
🖥️ Part II - Databases (45-50 min)
Databases: what are they? What is SQL? And how do we connect to a database directly through pandas? This content was originally planned for 🗓️ Week 08, but we didn't have time for it.
Useful links
- Relational Database Management System (RDBMS)
- Famous Open-source RDBMS:
- SQL:
- Good step-by-step SQL Tutorial
Follow the steps
Step 13.
Download and install DB Browser for SQLite
Step 14.
Download this sample data called ChinookDatabase
Step 15.
Import the Chinook database using DB Browser
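As a preview of connecting to a SQLite database from Python, here is a minimal sketch. To keep it self-contained it builds a tiny in-memory database instead of opening the Chinook file; the artist table and its rows are hypothetical stand-ins. The pandas part is optional and only runs if pandas is installed.

```python
import sqlite3

# In-memory database with a tiny hypothetical table, so the example runs
# without any file. With a real SQLite file you would pass its path instead.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT)"
)
connection.executemany(
    "INSERT INTO artist (name) VALUES (?)",
    [("AC/DC",), ("Accept",), ("Aerosmith",)],
)
connection.commit()

# A plain SQL query, run through the standard library driver.
query = "SELECT artist_id, name FROM artist ORDER BY artist_id"
rows = connection.execute(query).fetchall()
print(rows)

# The same query loaded straight into a DataFrame, if pandas is available.
try:
    import pandas as pd
except ImportError:
    pd = None

if pd is not None:
    artists = pd.read_sql_query(query, connection)
    print(artists)

connection.close()
```

The design point is that pandas does not need its own database driver here: it reuses the sqlite3 connection and returns the query result as a DataFrame.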