🧑‍🏫 Week 02 Lecture

A Python crash course

Published: 07 October 2024

Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

Last updated: 9 October 2024 8pm

In our second lecture, we switch our attention to Python programming! 🐍 We will still be using data we have been producing as our raw resource.

📋 Preparation

The best way to come prepared for this lecture is to have worked on the 📝 W02 Formative Exercise beforehand.

At DS105, we don’t think teaching is restricted to the classroom. We believe learning happens anytime, anywhere, and that frequent human-to-human communication is key to a successful learning experience. Our lectures are one place where that communication happens. More than just getting new information, this is the place to consolidate your understanding of the self-study material we shared with you by asking questions and engaging in discussions.

📃 Schedule

📍Location: Thursday 10 October 2024, 4 pm - 6 pm at CLM.5.02

This lecture will have two parts:

  1. Python basics (4 pm - 4.45 pm). Let’s take a step back and look at the building blocks of Python. We will also reflect on where we type Python code and how we run it.

🍵 Quick break w/ data collection: (4.45 pm - 5 pm) I will open a Menti Q&A to collect your questions about what you’ve seen so far.

  2. Making the most of lists, dicts & sets in Python (5 pm - 6 pm). Here we will go through the 📝 W02 Formative Exercise, making sure we understand how we are using Python to reach our goals!

Throughout the session, I will do an analysis of the data you produced on Nuvolos during the week. I will use this as a pretext to show you how to make the most of Python’s data structures for data manipulation.

📝 Lecture Notes

📋 TAKE NOTE:

  • You won’t find “slides for studying” in this course. I do use slides in my lectures, but they serve as a visual aid to help me organise my thoughts. I tend to post those slides after the lecture on Slack, along with other links and resources.

  • The study material will be available here on this page one day before the lecture.

  • Let me know if you want me to add notes on any specific topic or expand on something you might want to revisit later.

TOPIC 1: Questions about Terminal commands, nano, and the Python shell

I hope that, when you worked on the 📝 W02 Formative Exercise, you were surrounded by many questions. Below are some of the questions I received (paraphrased) from you this past week.

“Why are you forcing me to use this Terminal app? I am not a hacker from the 70s!”

“When I open the Terminal app on Nuvolos I see a prompt that ends with $:

(base) 16:43:27 - nuvolos:/files$ 

What even is that?? Why is it different to what I see on my computer?”

“When I type python in the Terminal and hit Enter, the $ prompt changes to >>>. Why?!”

“When I type this nano command you forced me to use, the screen changes completely. What is happening there?”

“Why is it that when I’m inside the Python shell, the Terminal commands no longer work (cd, ls, etc.)?”

“Why don’t I see the >>> prompt when I type python followed by the name of a Python script? For example:

python code/commands_analysis.py

Why does 'nothing happen' when I run the above?”


Q: “Why bother learning Terminal commands?”

A: Our modern computers abstract away a lot of the complexity of the underlying mechanisms that make all the apps we love work. In DS105, we want to give you a peek behind the curtain before you start writing code for data analysis. This will make you a better coder and a more efficient data scientist in the long run.


Q: “What is that line that ends with $?”

A: Let me explain each one of these pieces:

  • (base): ignore this one for now! This means we have a base Python environment activated. We will talk about this in the future around Week 7.

  • 16:43:27: this is just the current time in 24-hour format in the timezone of the Nuvolos machine (not necessarily your local timezone).

  • nuvolos: this is the name of your user on the Nuvolos machine. Everyone is called nuvolos on this machine 🙃

  • /files: this is the current working directory 1 you are in. It is where you are “standing” in the file system.

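  • $: by convention, the bash prompt ends with a $ character for regular users (or a # for the administrator user, called root). It simply indicates that the shell is ready to receive your commands. Other shells use other symbols: zsh typically shows % and PowerShell shows >.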

    By default, when you open the Terminal app, it runs a shell program. Each Operating System has its own default shell program. The line above is the prompt of the shell program running on the Nuvolos machine, which is called bash. Your computer might be running a different shell program, like zsh (if on a Mac) or PowerShell (if on Windows).


Q: “Why does the prompt change to >>> when I type python?”

A: When you type python and hit Enter, you leave the shell program and enter the Python shell, a different app. It is as if you changed from ‘Instagram’ to ‘WhatsApp’ on your phone.

Just like how the bash shell program awaits your commands with a $, the Python shell awaits your Python code with a >>>.


Q: “What is happening when I type nano?”

A: Just like python, nano is another app that you can run from the shell. nano is a text editor that allows you to edit plain text files 2. When you type nano and hit Enter, you enter the nano app and you now have to abide by its rules.


Q: “Why can’t I use Terminal commands when I’m inside the Python shell?”

A: Because you are in a completely different program! Running Terminal commands inside the Python shell would be like trying to order an Uber from inside the Instagram app. You need to exit the Python shell with exit() to go back to the bash shell program.

When inside the Python shell, you can only run Python code. You would need to learn a bit of Python first to understand how to work with files the Python way.


Q: “Why don’t I see the >>> prompt when I run a Python script?”

A: This can be a tricky one! The python command, on its own, opens the Python shell. However, when you specify a Python script to the python command, it understands that you want to run Python code that is stored inside that file. The Python shell is not needed in this case, so it doesn’t open.

When you are in the Python shell, you can run Python code directly. If you type the name of a variable, you see what it contains. But when you run a script, only the output of print() commands will be shown. Nothing else will be displayed.
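
To make this difference concrete, here is a minimal sketch. It assumes a hypothetical script saved as code/show_name.py (the file name is only an illustration, not part of the exercise):

```python
# Hypothetical script: code/show_name.py
name = 'Jon'

# A bare expression: the Python shell would echo 'Jon' back to you here,
# but when the file is run as a script this line displays nothing.
name

# print() is what makes output appear when the script runs.
print(name)
```

Running this file with the python command should display a single line, Jon, which comes from the print() call. The bare name line is evaluated silently.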

TOPIC 2: Python basics

What does it mean to write code?

  • Programming languages (like Python, R, C, C#, Java, Javascript) are a way to send instructions to a computer.
  • More than moving files around, we can make the computer do calculations, represent abstract concepts, and even interact with the world around us.
  • Computers follow instructions to the letter from top to bottom. You have to be very precise and follow the rules of the language you are using.

Python building blocks

You can’t start coding without understanding the basic building blocks of Python. Here are some of the most important ones:

Primitive Types

Python comes with a few basic data types that are so essential we call them primitive types 3. These are the building blocks of all the more complex data types you will encounter in Python. Here are the most important ones:

  • Booleans: used to represent True or False values. This is very useful when you want to make decisions in your code like ‘if this is true, do this, otherwise do that’ but is also used in data analysis in a myriad of ways (e.g., to decide which data to keep or discard, to represent whether someone is a customer or not, etc.).

  • Integers: used to represent whole numbers (positive or negative ones). For example, 2, 0 and -20. Integers are used in data analysis to represent counts, quantities, and identifiers.

  • Floats: used to represent numbers with a decimal point. For example, 3.14, 0.0, -1.0 and 2.0. Floats are used in data analysis to represent measurements, percentages, and probabilities.

  • Strings: used to represent text. To make sure Python understands you are writing a string, you need to enclose your text in single or double quotes. For example, 'hello', "hello", '2', "2", '3.14', "3.14" are all strings. Strings are used in data analysis to represent names, addresses, and any other text-based data.

  • None: used to represent the absence of a value. This is very useful when you want to say ‘this data point is missing’ or ‘this data point is not applicable’. In data analysis, you will often encounter missing data, and None is the way Python represents it.
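
As a quick illustration of the types listed above, the sketch below stores one value of each type and asks Python what it is, using the built-in type() function. The variable names are made up for this example:

```python
is_customer = True      # Boolean
dish_count = 3          # Integer
price = 12.5            # Float
dish = 'carbonara'      # String
allergy_note = None     # None: no value recorded

print(type(is_customer))   # <class 'bool'>
print(type(dish_count))    # <class 'int'>
print(type(price))         # <class 'float'>
print(type(dish))          # <class 'str'>
print(type(allergy_note))  # <class 'NoneType'>

# Quotes matter: '2' is text, whereas 2 is a number.
print(type('2'))           # <class 'str'>
print(type(2))             # <class 'int'>
```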

Variables

If you go to the Python shell and type a number or a string, the shell will show it back to you. This is not very useful, though. What if I want to reuse that same number or string later on?

This is where variables come in. They are like boxes where you can store values for later use. Variables are created by giving them a name and assigning them a value using the = operator. For example, if I want to store my name in a variable, I can do this:

name = 'Jon'

I can forget about it, do other stuff, create other variables and this value will remain stored in the variable name. That is, until I change it, delete it, or close the Python shell.

💡 Variables only exist while the Python shell is open or inside a script when it’s running. If you need to use a variable after, say, closing your computer, you would need to save the variable to a file so you can read it back.

How to name variables?

  • Variable names can contain letters, numbers, and underscores.
  • Variable names cannot start with a number.
  • Variable names cannot contain spaces.
  • Variable names are case-sensitive. This means that name, Name, and NAME are all different variables.

Strive for descriptive variable names. It is better to use name than n or x!
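
The naming rules above can be tried out with a short sketch. All the names and values here are illustrative only:

```python
# Case-sensitivity: these are three different variables.
name = 'Jon'
Name = 'Jane'
NAME = 'JACK'
print(name, Name, NAME)   # Jon Jane JACK

# Letters, numbers and underscores are fine, as long as
# the name does not start with a number.
total_bill_2024 = 20.0

# The two lines below are commented out because each one
# would stop the script with a SyntaxError:
# 2024_total_bill = 20.0   # starts with a number
# total bill = 20.0        # contains a space
```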

How to update variables?

Simply assign a new value to the variable. Whatever value was stored in the variable before will be replaced by the new value.

For example, if I want to update the variable name to store my full name, I can do this:

name = 'Jon Cardoso-Silva'

Can I update a variable using other variables?

Yes! Say you have a variable called bill that stores the amount of a bill at a restaurant. You start with 0.0 but then you add the prices of the dishes you ordered. You can do this:

# List of dishes
carbonara = 12.5
tiramisu = 6.0
espresso = 1.5

# Start with the bill at 0.0
bill = 0.0

# Add the dishes to the bill
bill = bill + carbonara
bill = bill + tiramisu

# Print the bill
print(bill)

In the code above, we initialised several variables to represent the prices of dishes available at a restaurant. We then initialised the variable bill to 0.0 and added the prices of the dishes as we ordered them. Finally, we printed the total bill.

String operations

Notice that I used the operator + above to add the price of the dishes to the bill. In that situation, the + operator acts as a mathematical operator. However, when you use the + operator with strings, it acts as a concatenation operator.

For example, I could have a first_name and a last_name variable and I could concatenate them to form a full name like this:

first_name = 'Jon'
last_name = 'Cardoso-Silva'

full_name = first_name + ' ' + last_name

print(full_name)
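
One thing to watch out for: the + operator will not mix strings and numbers for you. A minimal sketch, assuming you want to build a receipt line from a dish name and a price (both variables are made up for this example):

```python
dish = 'carbonara'
price = 12.5

# dish + ': ' + price would raise a TypeError,
# because Python refuses to add a string and a float.

# Converting the number to a string first makes concatenation work.
label = dish + ': ' + str(price)
print(label)   # carbonara: 12.5
```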

What if I don’t know the type of a variable?

You can use the type() function to find out the type of a variable. For example, if you want to know the type of the variable name, you can do this:

type(name)

This will return <class 'str'>, which means that the variable name is a string.

How to delete variables?

We don’t normally need to delete variables, but if you want to, you can use the del command. For example, to delete the variables first_name and last_name, you can do this:

del first_name
del last_name

If you type first_name in the Python shell from now on, you will get an error saying “NameError: name 'first_name' is not defined”.

Operators

Operators are symbols that represent computations. You saw the + operator in action above. Here are some of the most important operators in Python:

  • Arithmetic operators: used to perform mathematical operations. The most common ones are +, -, *, /, etc.
  • Comparison operators: used to compare two values. The most common ones are ==, !=, >, <, >=, <=. For example, 2 == 2 will return True (a boolean primitive type) because 2 is indeed equal to 2.
  • Logical operators: used to combine multiple conditions. The most common ones are and, or, and not. For example, I might want to ensure that a number is greater than 0 and less than 10. I can do this with number > 0 and number < 10.
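
Here is a small sketch of the three families of operators in action. The variable number and its value are just an example:

```python
number = 7

# Arithmetic operators produce numbers.
half = number / 2                 # 3.5

# Comparison operators produce booleans.
is_seven = number == 7            # True
is_not_seven = number != 7        # False

# Logical operators combine booleans.
in_range = number > 0 and number < 10        # True
out_of_range = not in_range                  # False
small_or_large = number < 2 or number > 5    # True

print(half, is_seven, is_not_seven)
print(in_range, out_of_range, small_or_large)
```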

Collections

Most frequently, you need to work with more than one value at a time. Python has a few data types that allow you to store multiple values in a single variable. I will refer to these as collections 4. Here are the most important ones:

Lists

Lists are data structures that contain multiple values. Lists are essential for data science! Most of the time, we don’t work with a single data point, but with many.

You can store elements of any primitive type in a list, and you can even store lists inside lists! Lists are created by enclosing the elements in square brackets [] and separating them with commas. Examples of lists are:

  • ['apple', 'banana', 'cherry']
  • [1, 2, 3, 4, 5]
  • ['Jon', 'DS105A', '2024-10-10', 'Week 02']

There are so many things we can do with lists! We can add elements to them, remove elements from them, access elements by their position, and even sort them.

Creating an empty list

orders = []

Checking the size (length) of a list

How many elements are in the list orders?

len(orders)

We will get 0 because we haven’t added any elements to the list yet.

Adding elements to a list

There are several ways to grow a list. You can use the append() method to add an element to the end of the list. For example:

orders.append('carbonara')

Because the list was empty, the list orders now contains simply ['carbonara'] and its length is now 1. If I keep adding elements to the list, it will grow.

orders.append('tiramisu')
orders.append('espresso')

The list orders now contains ['carbonara', 'tiramisu', 'espresso'] and has a length of 3.

Accessing elements in a list

If you want to see the first element in the list orders, you can do this:

orders[0]

Confusingly, Python starts counting from 0! So the first element in the list is at position 0, the second element is at position 1, and so on.
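
A short sketch of what zero-based counting means in practice, using the three-element orders list from above. The negative index on the last line is a Python shortcut not covered in the text, included here only as an aside:

```python
orders = ['carbonara', 'tiramisu', 'espresso']

first = orders[0]                  # 'carbonara'
second = orders[1]                 # 'tiramisu'

# The last valid position is always the length minus one.
last = orders[len(orders) - 1]     # 'espresso'

# orders[3] would raise an IndexError: there is no fourth element.

# Aside: Python also accepts -1 as a shortcut for the last element.
also_last = orders[-1]             # 'espresso'

print(first, second, last, also_last)
```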

Updating elements in a list

If you want to change the second element in the list orders to affogato, you can do this:

orders[1] = 'affogato'

That’s it. The list orders now contains ['carbonara', 'affogato', 'espresso'].

You could use the same principles you saw in the variables section to update elements in a list using other variables. For example, if I want to clarify that the espresso was a double shot, I could do this:

orders[2] = orders[2] + ' (double)'

The list orders now contains ['carbonara', 'affogato', 'espresso (double)'].

Removing elements from a list

Say you didn’t order affogato after all. The server made a mistake and you want to remove it from the list. You can do this:

orders.remove('affogato')

The list orders now contains ['carbonara', 'espresso (double)'].

Dictionaries

Another very important collection in Python that we will use extensively in data analysis is the dictionary. Dictionaries are data structures that store key-value pairs. These are very useful when you want to give a name to a value. For example, if you want to store the price of a dish, you can use a dictionary.

Dictionaries are created by enclosing the key-value pairs in curly brackets {} and separating them with commas. The key and the value are separated by a colon :. Examples of dictionaries are:

  • {'apple': 2.0, 'banana': 1.5, 'cherry': 3.0}
  • {'name': 'Jon', 'course': 'DS105A', 'date': '2024-10-10', 'week': 2}

Creating an empty dictionary

You can start with an empty dictionary like this and add key-value pairs to it later:

prices = {}

Adding key-value pairs to a dictionary

prices['carbonara'] = 12.5
prices['tiramisu'] = 6.0
prices['espresso'] = 1.5

The dictionary prices now contains {'carbonara': 12.5, 'tiramisu': 6.0, 'espresso': 1.5}.

Checking the size (length) of a dictionary

How many key-value pairs are in the dictionary prices?

len(prices)

This will return 3.

Checking the keys in a dictionary

If you want to know what dishes are available in the dictionary prices, you can do this:

prices.keys()

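This will return dict_keys(['carbonara', 'tiramisu', 'espresso']). This is not quite a list but a ‘view’ of the dictionary’s keys that you can loop over just like a list. Wrap it in list() if you need an actual list.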

Checking the values in a dictionary

Say I don’t care about the dishes, I just want to know the prices. You can do this:

prices.values()

This will return dict_values([12.5, 6.0, 1.5]). As with the keys, this is a ‘view’ of the dictionary’s values that you can loop over just like a list.
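
If you ever need to pick a key or a value by position, one option is to convert the result of keys() or values() with the built-in list() function first. A minimal sketch, reusing the prices dictionary from above:

```python
prices = {'carbonara': 12.5, 'tiramisu': 6.0, 'espresso': 1.5}

# Convert the keys and values into regular lists.
dishes = list(prices.keys())
amounts = list(prices.values())

print(dishes)       # ['carbonara', 'tiramisu', 'espresso']
print(amounts)      # [12.5, 6.0, 1.5]

# Regular lists support access by position.
print(dishes[0])    # carbonara
print(amounts[0])   # 12.5
```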

Accessing values in a dictionary

If I want to check the price of the carbonara, I can do this:

prices['carbonara']

This will return 12.5.
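
What happens if you ask for a dish that is not in the dictionary? The sketch below shows two common ways to check safely. The key 'lasagne' is a made-up example of a missing dish:

```python
prices = {'carbonara': 12.5, 'tiramisu': 6.0, 'espresso': 1.5}

# prices['lasagne'] would raise a KeyError, because the key does not exist.

# Option 1: ask whether the key is there before using it.
has_lasagne = 'lasagne' in prices          # False

# Option 2: use the get() method, which returns None for a missing key.
lasagne_price = prices.get('lasagne')      # None
espresso_price = prices.get('espresso')    # 1.5

print(has_lasagne, lasagne_price, espresso_price)
```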

Updating values in a dictionary

Oops, inflation!

prices['carbonara'] = 13.0

The dictionary prices now contains {'carbonara': 13.0, 'tiramisu': 6.0, 'espresso': 1.5}.

Removing key-value pairs from a dictionary

We don’t serve tiramisu anymore. Let’s remove it from the dictionary:

del prices['tiramisu']

The dictionary prices now contains {'carbonara': 13.0, 'espresso': 1.5}.

Nested lists and dictionaries

Most of the time, you will need to store more complex data structures. You can store lists inside lists, dictionaries inside dictionaries, lists inside dictionaries, and dictionaries inside lists. This is for example the reality of JSON files, one of the most common ways to store data in the wild 5.
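
To see how closely JSON mirrors nested Python lists and dictionaries, here is a minimal sketch using the json module from Python's standard library. The receipt variable is a made-up example. Nothing is written to disk: the data is converted to JSON text and back again.

```python
import json

# A dictionary that contains a list of dictionaries.
receipt = {'name': 'Jon', 'orders': [{'carbonara': 13.0}, {'espresso': 1.5}]}

# Convert the nested structure into a JSON-formatted string.
text = json.dumps(receipt)
print(type(text))               # <class 'str'>

# Convert the JSON string back into Python dictionaries and lists.
restored = json.loads(text)
print(restored == receipt)      # True
print(restored['orders'][0])    # {'carbonara': 13.0}
```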

Take a look at the following nested dictionary I created to represent an order:

order = {
    "name": "Jon",
    "course": "DS105A",
    "date": "2024-10-10",
    "week": 2,
    "orders": [
         {"carbonara": 13.0},
         {"espresso": 1.5}
    ]
}

We created a variable order that is a dictionary with several key-value pairs. How many?

We can find out by using the len() function:

len(order)

This will return 5.

What are the keys in the dictionary order?

order.keys()

This will return dict_keys(['name', 'course', 'date', 'week', 'orders']).

Accessing values in a nested dictionary

Note that orders is a list inside the dictionary order. I could focus my attention on the orders list like this:

order['orders']

which would return:

[{'carbonara': 13.0}, {'espresso': 1.5}]

Because order['orders'] is a list, I can access the first element in the list like this:

order['orders'][0]

This will return {'carbonara': 13.0}.
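
You can keep going one level deeper. Since order['orders'][0] is itself a dictionary, you can look up a key inside it. A short sketch using the same order dictionary:

```python
order = {
    "name": "Jon",
    "course": "DS105A",
    "date": "2024-10-10",
    "week": 2,
    "orders": [
        {"carbonara": 13.0},
        {"espresso": 1.5}
    ]
}

# Step by step: first grab the inner dictionary, then look up its key.
first_item = order['orders'][0]
carbonara_price = first_item['carbonara']

# Or chain everything in a single expression.
espresso_price = order['orders'][1]['espresso']

print(carbonara_price)   # 13.0
print(espresso_price)    # 1.5
```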

You might think this is just overcomplicating things. Why not just store everything in flat lists and dictionaries? The answer is that the real world is messy. You will often need to store complex data structures to represent the complexity of the world around you. There’s no way around it!

Control flow

When you write code, you often need to make decisions based on the data you have. Sometimes, you also need to repeat the same operation many times. Python has a few structures that allow you to control the flow of your code. Here are the most important ones:

If/Else statements

These are things like ‘if this is true, do this, otherwise do that’. You can use if-else statements to make decisions in your code. For example, say you want to check if a number that you read from a file is greater than zero:

if number > 0:
    print('The number is greater than 0')
else:
    print('The number is not greater than 0')

You can also chain multiple conditions together using elif:

if number > 0:
    print('The number is greater than 0')
elif number == 0:
    print('The number is equal to 0')
else:
    print('The number is less than 0')
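
The snippets above assume that a variable called number already exists. Here is a self-contained sketch with an example value, which stores the message in a variable before printing it:

```python
number = -3   # example value; in practice this might come from a file

if number > 0:
    message = 'The number is greater than 0'
elif number == 0:
    message = 'The number is equal to 0'
else:
    message = 'The number is less than 0'

print(message)   # The number is less than 0
```

Only one branch ever runs: Python checks the conditions from top to bottom and stops at the first one that is True.
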

For loops

For loops are used to repeat the same operation many times. For example, say you have a list of dishes and you want to print each one of them as a separate line as part of a receipt:

dishes = ['carbonara', 'tiramisu', 'espresso']

print('Your order:')
for dish in dishes:
    print(dish)

This will print:

Your order:
carbonara
tiramisu
espresso

The for loop will go through each element in the list dishes, each time assigning the current element to the variable dish. You can then use the variable dish inside the loop to refer to the current element.

The variable dish is reassigned on every iteration. Note that it does not disappear when the loop finishes: after the loop, dish still holds the last element it was assigned ('espresso' here) 6.
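
For loops become more useful when combined with the other building blocks. As a sketch, the code below uses a loop, a list and a dictionary to compute a bill. The particular dishes ordered are an assumption made for this example:

```python
prices = {'carbonara': 13.0, 'tiramisu': 6.0, 'espresso': 1.5}
dishes = ['carbonara', 'espresso']   # what was ordered

bill = 0.0
for dish in dishes:
    # Look up the price of the current dish and add it to the bill.
    bill = bill + prices[dish]

print(bill)   # 14.5
```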

Looping a list by index (tricky to understand at first)

It is common to want to loop over a list by index. For example, say you have a list of prices and you want to print each price with its index:

prices = [13.0, 6.0, 1.5]

print('Your order:')
for i in range(len(prices)):
    print('Dish ' + str(i) + ': ' + str(prices[i]))

What is going on here? The range() function produces a sequence of numbers from 0 to n-1, where n is the number you pass to the function. In this case, len(prices) returns 3, so range(len(prices)) produces the numbers 0, 1 and 2. At every loop iteration, the variable i takes the next number in that sequence. We can then use this variable to refer to the current index of the list prices.
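
To look at the numbers that range() produces, you can wrap it in list(). The sketch below also shows enumerate(), a built-in function not covered in the text above, which hands you the index and the element together and is a common alternative to range(len(...)):

```python
prices = [13.0, 6.0, 1.5]

# Wrap range() in list() to see the numbers it produces.
indices = list(range(len(prices)))
print(indices)   # [0, 1, 2]

# Alternative: enumerate() yields (index, element) pairs.
lines = []
for i, price in enumerate(prices):
    lines.append('Dish ' + str(i) + ': ' + str(price))

print(lines[0])   # Dish 0: 13.0
```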

Looping over dictionaries

You can also loop over dictionaries. For example, say you have a list of dictionaries, each one mapping a dish to its price, and you want to print each key-value pair:

orders = [{'carbonara': 13.0}, {'tiramisu': 6.0}, {'espresso': 1.5}]

print('Your order:')
for order in orders:
    for dish, price in order.items():
        print(dish + ': ' + str(price))

This will print:

Your order:
carbonara: 13.0
tiramisu: 6.0
espresso: 1.5

Note that the syntax is a little different. We use the items() method to unpack the key and value of each key-value pair in the dictionary order. We then assign the key to the variable dish and the value to the variable price. Inside the loop, we can then use these variables to refer to the current key and value.
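
The same items() pattern works when all the prices live in a single dictionary, in which case only one loop is needed. A minimal sketch (the receipt_lines list is introduced just for this example):

```python
prices = {'carbonara': 13.0, 'tiramisu': 6.0, 'espresso': 1.5}

receipt_lines = []
for dish, price in prices.items():
    # dish is the key and price is the value of the current pair.
    receipt_lines.append(dish + ': ' + str(price))

print('Your order:')
for line in receipt_lines:
    print(line)
```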

While loops

Sometimes we want to repeat an operation until a certain condition is met.

For example, say I want to keep ordering ‘espresso’ until I reach my daily caffeine limit of 29 shots:

daily_limit = 29

shots = 0

while shots < daily_limit:
    print('Ordering espresso...')
    shots = shots + 1

print('Daily limit reached!')
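
A while loop is most useful when you cannot tell in advance how many repetitions you will need. As a sketch, suppose I keep ordering espresso for as long as I can afford another one. The budget value is an assumption made for this example:

```python
budget = 10.0
espresso_price = 1.5

spent = 0.0
shots = 0

# Keep going only while one more espresso still fits in the budget.
while spent + espresso_price <= budget:
    spent = spent + espresso_price
    shots = shots + 1

print(shots)   # 6
print(spent)   # 9.0
```

If the condition never becomes False, the loop never stops, so always make sure something inside the loop moves you towards the exit.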

Functions

Loops are great at iterating over data, but sometimes I want to run a specific piece of code many times with different inputs. This is where functions come in. Functions are like recipes. You give them some ingredients (inputs) and they give you a dish (output).

Defining functions

Here is how you can define a function in Python:

def greet(name):
    print('Hello, ' + name + '!')

Notice that I need to indent (add spaces) the code inside the function. This is how Python knows that the code inside the function is different from the code outside the function.

If I forget about the indentation, Python will throw an error.

Once I have defined the function, I can use it with different inputs:

greet('Jon')
greet('DS105A')

Both calls will print the string 'Hello, ...!' with the name I passed to the function.

Returning values from functions

Maybe you want to call a function and get a value back instead of just printing something. You can do this by using the return statement:

def calculate_total_bill(prices):
    total = 0.0
    for price in prices:
        total = total + price
    return total

This function takes a list of prices and returns the total bill. You can use it like this:

prices = [13.0, 6.0, 1.5]

total = calculate_total_bill(prices)

# Now I can do whatever I want with the total
print('The total bill is: ' + str(total))
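
The difference between printing and returning is easy to miss. In the sketch below, greet only prints, so it gives nothing back, whereas make_greeting (a hypothetical function written for this comparison) returns a string you can store and reuse:

```python
def greet(name):
    # Prints a message but does not return anything.
    print('Hello, ' + name + '!')

def make_greeting(name):
    # Returns the message instead of printing it.
    return 'Hello, ' + name + '!'

result = greet('Jon')             # prints: Hello, Jon!
print(result)                     # None

greeting = make_greeting('Jon')   # prints nothing
print(greeting)                   # Hello, Jon!
```
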

Arguments and keyword arguments

When you define a function, you can give it arguments. These are the inputs the function needs to run. For example, in the function greet above, the argument is name. You can also give a function keyword arguments. These are arguments that have a default value. For example, say you want to greet someone in a different language:

def greet(name, language='english'):
    if language == 'english':
        print('Hello, ' + name + '!')
    elif language == 'spanish':
        print('Hola, ' + name + '!')
    elif language == 'french':
        print('Bonjour, ' + name + '!')
    else:
        print('Hello, ' + name + '!')

You can use this function like this:

greet('Jon', 'spanish')
greet('Jon', 'french')
greet('Jon') # This will greet Jon in English as it is the default language
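
You can also pass an argument by writing its name in the call, which makes the code easier to read. The sketch below uses build_greeting, a hypothetical variant of greet that returns the message instead of printing it:

```python
def build_greeting(name, language='english'):
    if language == 'spanish':
        return 'Hola, ' + name + '!'
    elif language == 'french':
        return 'Bonjour, ' + name + '!'
    # English is used for 'english' and for any unknown language.
    return 'Hello, ' + name + '!'

print(build_greeting('Jon'))                          # Hello, Jon!
print(build_greeting('Jon', 'spanish'))               # Hola, Jon!
print(build_greeting('Jon', language='french'))       # Bonjour, Jon!

# When every argument is named, the order no longer matters.
print(build_greeting(language='french', name='Jon'))  # Bonjour, Jon!
```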

This is not the end of our journey with functions. There are still many many things to learn about them, but this should give you a good start.

Putting it all together

In the lecture, I will combine all of these concepts and tie them to the 📝 W02 Formative Exercise. I will share the code I produced during the lecture with you via Nuvolos and Slack later.

Footnotes

  1. You first came across this term on Task 2 of 📝 W01 Exercise↩︎

  2. You first encountered this term in Task 4 of 📝 W01 Formative Exercise and in the ‘How we gather & store data’ section of 🧑‍🏫 Week 01 Lecture↩︎

  3. Primitive types were the subject of Part II of the 💻 W01 Lab.↩︎

  4. Lists and dictionaries were the subject of Part V of the 💻 W01 Lab.↩︎

  5. We have been throwing JSON files at you since the beginning of the course! Most recently, we asked you to save your data in a JSON file in Task 3 of the 📝 W02 Formative Exercise.↩︎

  6. More keen-eyed students might have noticed that this is similar to the way we opened a connection to a file with the with statement in Task 3 of the 📝 W02 Formative Exercise.↩︎