DS105A – Data for Data Science
🗓️ 15 Oct 2025
16:00 – 16:20
Let’s start by seeing where you are with the DataQuest lessons and then connect them to what you experienced in Week 01.
Through Mentimeter we’ll see how far you got with the 📝 W02 Formative DataQuest lessons:
💡 You may feel behind in tomorrow’s lab if you haven’t done all of these!
We use Markdown to format Slack messages, our 📚 Jupyter Notebooks, these slides, and all the course webpages you see on Moodle.
This is a **bold** text.
This is an _italic_ text.
[This is a link](https://lse.ac.uk/dsi)
`print("Hello, World!")`
```python
# This is a code block
print("Hello, World!")
```
Code blocks only display code; they do not run it.
There are also headings in Markdown. They help structure content—not just make text big! (Those won’t work in Slack, by the way.)
# Title (H1)
## Section (H2)
### Sub-section (H3)
#### Sub-sub-section (H4)
Title (H1)
Section (H2)
Sub-section (H3)
Sub-sub-section (H4)
⚠️ Do not use `#` just to make text bigger! That is not what it is for.
Use it to mark the hierarchy of sections instead.
Let’s connect what you learned in DataQuest to the data work you’ve done. Form groups of 3-4 people around you.
Your task (5 minutes total):
Share your “aha moment” (2 minutes): Each person briefly shares one concept from DataQuest that clicked for them this week. Could be variables, data types, lists, or anything else.
Connect to Week 01 (2 minutes): Discuss together: How might these Python concepts relate to the DataFrame work you did last week?
Pick one insight (1 minute): Choose one connection your group found interesting to share in Slack.
💡 Post in Slack: Drop your group’s key insight in the thread on the `#social` channel. I’ll synthesise common patterns in a moment.
16:20 – 16:40
To understand why `my_list[0]` works, we need to understand how computers actually think.
Numbers, text, images, and sounds are all stored as sequences of 0s and 1s in your computer’s memory. Each 0 or 1 is called a bit.
Think of a bit as a tiny box:
\[ \require{color} \fcolorbox{black}{white}{$\phantom{0}$} \phantom{\leftarrow \text{a bit can have a value of $0$}} \]
\[ \require{color} \begin{array}{ccc} \fcolorbox{black}{#eeeeee}{0} & \leftarrow & \text{a bit can have a value of $0$} \end{array} \]
\[ \begin{array}{ccc} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \leftarrow & \text{OR it can have a value of $1$} \end{array} \]
but nothing else!
With more bits, we can represent more numbers. Here’s how 4 bits can represent 16 different numbers:
\[\begin{array}{ccccccccccccccc} \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 0 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 8 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 1 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 9 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 2 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 10 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 3 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 11 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 4 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 12 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 5 & & 
\fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 13 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 6 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#eeeeee}{0} & \rightarrow 14 \\ \fcolorbox{black}{#eeeeee}{0} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 7 & & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} & \rightarrow 15 \\ \end{array}\]
Connection to the DataQuest lessons (📝 W02 Practice): when you create an integer in Python, you are telling your computer to reserve a fixed-size space in memory for a series of 0s and 1s.
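If you want to check the 4-bit table yourself, here is a minimal sketch using only built-in Python. The variable names are our own choice, and the `for` loop is a preview of something we cover properly next week.

```python
# Print every 4-bit pattern next to the number it represents.
for number in range(16):
    bits = format(number, "04b")  # e.g. 5 becomes '0101'
    print(bits, "->", number)

# Going the other way: read a string of bits as a base-2 number.
thirteen = int("1101", 2)
print(thirteen)  # 13

# With n bits there are 2 ** n distinct patterns.
patterns_with_4_bits = 2 ** 4
print(patterns_with_4_bits)  # 16
```

Each extra bit doubles the number of patterns, which is why 4 bits give 16 different numbers.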
In the early days of computing, text was represented using the ASCII table. ASCII uses 7 bits for each individual character, usually stored in one 8-bit byte. Here are some examples:
The letter ‘A’ is represented by the number 65 encoded in binary as:
\[ \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \]
The letter ‘a’ (lowercase) is represented by the number 97 encoded in binary as:
\[ \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \]
The linebreak character ‘\n’ is represented by the number 10 encoded in binary as:
\[ \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \fcolorbox{black}{#eeeeee}{0} \fcolorbox{black}{#111111}{$\textcolor{white}{1}$} \fcolorbox{black}{#eeeeee}{0} \]
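You can look up these codes in Python with the built-in `ord()` and `chr()` functions. This is a small sketch for exploration; the variable names are ours.

```python
# ord() gives the number behind a character; format(..., "08b") shows it as 8 bits.
for character in ["A", "a", "\n"]:
    code = ord(character)
    bits = format(code, "08b")
    print(repr(character), code, bits)
# 'A' 65 01000001
# 'a' 97 01100001
# '\n' 10 00001010

# chr() goes the other way: from number back to character.
letter = chr(65)
print(letter)  # A
```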
When you create a variable in Python, you’re not creating a container for data. You’re creating a reference (a pointer) to a location in memory where the data lives.
What actually happens:
1. Python converts `25.3` to binary and stores it in memory
2. Python creates the name `temperature` pointing to that location

When you reassign:
Python doesn’t change the old value. It stores `28.7` in a new location and updates the reference.
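One way to see this for yourself is to keep a second name pointing at the old value. This is a minimal sketch: `first_reading` is a name we made up for illustration, and the `is` operator checks whether two names refer to the very same object in memory.

```python
temperature = 25.3
first_reading = temperature  # a second name for the same object

print(temperature is first_reading)  # True: both names refer to one object

temperature = 28.7  # rebinds the name; the 25.3 object itself is untouched

print(temperature)                   # 28.7
print(first_reading)                 # 25.3
print(temperature is first_reading)  # False: the names now refer to different objects
```

The old value is still there, reachable through `first_reading`. Only the reference held by `temperature` changed.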
💡 Although you can’t choose the size of a data type in ‘pure Python’, you will be able to once we start working with numpy and pandas in the next few weeks.
Integer Storage:
`int32`: Uses 32 bits (4 bytes)
Can store: -2,147,483,648 to 2,147,483,647
`int64`: Uses 64 bits (8 bytes)
Can store: much larger numbers!
Why this matters:
A DataFrame with 1 million integers:
`int32`: ~4 MB of memory
`int64`: ~8 MB of memory
Choosing the right type saves memory!
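The numbers above follow from simple arithmetic, sketched below in plain Python. The two helper functions are hypothetical names for illustration only (they are not part of numpy or pandas), and the estimate counts only the raw values, ignoring any overhead a real DataFrame adds.

```python
def signed_range(bits):
    """Smallest and largest whole number a signed integer with this many bits can hold."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def megabytes_needed(n_values, bits):
    """Rough memory for n_values integers, counting 1 MB as 1,000,000 bytes."""
    bytes_per_value = bits // 8
    return n_values * bytes_per_value / 1_000_000


int32_min, int32_max = signed_range(32)
print(int32_min, int32_max)  # -2147483648 2147483647

print(megabytes_needed(1_000_000, 32))  # 4.0
print(megabytes_needed(1_000_000, 64))  # 8.0
```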
Float Precision:
Python floats use 64 bits by default.
This is why sometimes:
Oddly, the binary representation of numbers in programming isn’t always exact.
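A classic illustration of this, which you can try in any notebook cell (the choice of 0.1 and 0.2 is ours; many other decimals behave the same way):

```python
import math

total = 0.1 + 0.2

print(total)         # 0.30000000000000004
print(total == 0.3)  # False: neither side is stored exactly in binary

# When comparing floats, check that they are close rather than identical.
print(math.isclose(total, 0.3))  # True
```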
🤯 Click here to read more about this.
Lists and dictionaries don’t store the actual data. They, too, store references to where the data lives in memory.
Why this matters:
Collections are containers of references, not containers of actual values.
Why bother? Understanding this memory model helps you understand why some operations are fast (just changing a reference) and others are slow (copying actual data), and why some data structures work better for certain tasks.
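A minimal sketch of what “containers of references” means in practice. The list and its names are made up for illustration.

```python
readings = [18, 22, 19]

same_list = readings  # copies the reference, not the data
same_list.append(23)
print(readings)  # [18, 22, 19, 23]: the change shows up under both names

independent = readings.copy()  # builds a new list with its own references
independent.append(20)
print(readings)     # [18, 22, 19, 23]: unchanged this time
print(independent)  # [18, 22, 19, 23, 20]
```

Assigning a list to a new name is fast because nothing is copied, but it also means both names change together. Use `.copy()` when you need an independent list.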
16:40 – 17:05
Most of the time in data science, we work with collections of values.
In ‘pure Python’, the two most important collections are lists and dictionaries. They are the foundation for the more complex data structures we will learn about later in Weeks 03 and 04.
Think about the data you worked with at the end of the 📝 W01 Practice. You saw DataFrames (tables) with multiple columns. What if we tried to store that data using separate lists?
This is why we need better data structures. More complex structures will let us connect related data with meaningful names instead of fragile positions.
Three critical problems:
Coordination nightmare: If you sort `temps`, the other lists don’t follow. Your data is now corrupted.
Fragile connections: The relationship between `dates[0]`, `temps[0]`, and `conditions[0]` exists only in your mind, not in the code.
Error-prone: Add a temperature but forget to add a date? Your lists are now different lengths.
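Here is a small sketch of the first and third problems, using three separate lists. The values are illustrative, chosen to match the weather example used later in this lecture.

```python
dates = ['2025-10-13', '2025-10-14', '2025-10-15']
temps = [18, 22, 19]
conditions = ['cloudy', 'sunny', 'rainy']

# Position 1 describes 14 October: 22 degrees and sunny.
print(dates[1], temps[1], conditions[1])  # 2025-10-14 22 sunny

# Sorting one list silently breaks the link with the others.
temps.sort()
print(dates[1], temps[1], conditions[1])  # 2025-10-14 19 sunny (19 belonged to 15 October!)

# Adding to one list but not the others leaves them with different lengths.
temps.append(21)
print(len(dates), len(temps), len(conditions))  # 3 4 3
```

Python raises no error in either case, which is exactly what makes this approach risky.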
Still, it’s important to understand how lists and dictionaries organise those references differently as these are the foundations for everything that is to come.
Lists: Sequential Memory
Lists store references in order. Python can quickly access any position because it knows exactly where each item lives in memory.
Fast: Access by position (`temps[0]`)
Limitation: No meaningful names
Dictionaries: Named Memory
Dictionaries use a hash table to map keys to memory locations. The key becomes a meaningful label for the data.
Fast: Access by name (`weather['temp']`)
Advantage: Self-documenting code
🤓 If you want to know more: Python dicts and memory usage
Most of the data you will work with will already come in a collection, but if you need to create one, here’s how you do it.
List
💡 You can use multiple lines for readability.
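A minimal sketch of creating both kinds of collection. The names and values are illustrative.

```python
# A list: square brackets, items separated by commas.
temps = [18, 22, 19]

# A dictionary: curly braces, with key: value pairs.
# Spreading it over several lines makes it easier to read.
weather = {
    'date': '2025-10-15',
    'temp': 19,
    'conditions': 'rainy',
}

# Both can also start empty and be filled in later.
empty_list = []
empty_dict = {}

print(len(temps), len(weather))  # 3 3
```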
Accessing data differs between lists and dictionaries.
List
Lists are indexed by integers, starting at 0.
To get an element, you need to know its position in the list.
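The contrast can be sketched as follows: lists are accessed by position, dictionaries by key. The data here is made up for illustration.

```python
temps = [18, 22, 19]
first = temps[0]   # positions start at 0
last = temps[-1]   # negative positions count from the end
print(first, last)  # 18 19

weather = {'temp': 19, 'conditions': 'rainy'}
current_temp = weather['temp']  # look up by key, not by position
print(current_temp)  # 19

# Asking for a missing key with square brackets raises a KeyError.
# The .get() method returns None instead.
humidity = weather.get('humidity')
print(humidity)  # None
```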
Let’s see how real weather data might look using a dictionary of lists:
```python
# A dictionary of lists (like a DataFrame!)
weekly_weather = {
    'date': ['2025-10-13', '2025-10-14', '2025-10-15'],
    'temp': [18, 22, 19],
    'humidity': [65, 58, 72],
    'conditions': ['cloudy', 'sunny', 'rainy']
}

# Accessing data - column first, then row
all_temps = weekly_weather['temp']                    # Gets [18, 22, 19]
monday_temp = weekly_weather['temp'][0]               # Gets 18
tuesday_conditions = weekly_weather['conditions'][1]  # Gets 'sunny'
```
This is very similar to how pandas DataFrames work! Each key is a column name, and each value is a list of data for that column. Notice the access pattern: `dict['column'][row]`, the same as `df['column'][0]` from Week 01.
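Because every column is indexed by the same row number, you can rebuild a whole “row” by using one position across all columns. This is only a sketch: the `row` and `tuesday` names are ours, and the data is repeated here so the example runs on its own.

```python
weekly_weather = {
    'date': ['2025-10-13', '2025-10-14', '2025-10-15'],
    'temp': [18, 22, 19],
    'humidity': [65, 58, 72],
    'conditions': ['cloudy', 'sunny', 'rainy']
}

row = 1  # the second entry in every column

tuesday = {
    'date': weekly_weather['date'][row],
    'temp': weekly_weather['temp'][row],
    'humidity': weekly_weather['humidity'][row],
    'conditions': weekly_weather['conditions'][row],
}

print(tuesday)
# {'date': '2025-10-14', 'temp': 22, 'humidity': 58, 'conditions': 'sunny'}
```

This only works while all the lists have the same length, which is one of the guarantees a DataFrame gives you for free.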
Let’s check your understanding. Go to the `#social` channel on Slack and vote:
Scenario: You’re storing hourly temperature readings for London, and you want to look up the temperature at a specific time (like “14:00”).
Which structure would you use?
A. A simple list: [18, 22, 19, 23, 20]
B. A dictionary: {'09:00': 18, '12:00': 22, '15:00': 19, ...}
C. Separate lists for times and temperatures
D. A list of dictionaries
17:05 – 17:15
After the break:
17:15 – 17:50
So far, we’ve created our own lists and dictionaries manually, mostly to demonstrate the concepts. But in practice, we will collect the data from external sources.
❌ Problem:
Typing weather data manually is tedious and error-prone.
✅ Solution:
In this course, we will use APIs (Application Programming Interfaces) to fetch live real data dynamically.
An API is like a vending machine:
In Python, we use a package called `requests` to “talk” to APIs.
The `requests` package does not come pre-installed with Python. You need to install it using `pip`.
1. In VS Code, click on the Menu icon, then navigate to Terminal > New Terminal.
2. A window will pop up at the bottom of the screen.
3. In the terminal window, type `pip install requests` and press Enter.
4. Wait for the installation to complete.

The `requests` package is now installed and ready to use in Jupyter Notebooks.
Explore their API documentation.
🚀 We will:
✅ We now have real-time weather data!
⏭️ Let’s inspect the response (live demo).
The `response` you get from the API is just pure text, just a string that looks like a Python dictionary.
💡 TIP: Just because something looks like a dictionary or a list, it doesn’t mean it is.
To convert it into a Python dictionary, we use the `json()` method:
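To see the difference between “text that looks like a dictionary” and an actual dictionary without calling any API, here is a sketch using Python’s built-in `json` module. The `response_text` string is a made-up payload, not the output of the API used in the demo. With `requests`, calling `response.json()` on a real response does this parsing step for you.

```python
import json

# A hypothetical API reply: it LOOKS like a dictionary, but it is one long string.
response_text = '{"city": "London", "temp": 19, "hourly": [18, 22, 19]}'
print(type(response_text))  # <class 'str'>

# Parsing turns the text into real Python dictionaries and lists.
data = json.loads(response_text)
print(type(data))         # <class 'dict'>
print(data["temp"])       # 19
print(data["hourly"][1])  # 22
```

Trying `response_text["temp"]` on the raw string would raise a `TypeError`, because strings can only be indexed by position.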
Let me show you this API in action. We’ll:
During this demo, think about: How is this different from the clean CSV files you worked with in Week 01? What are the advantages and challenges of live data?
17:50 – 18:00
Here’s what is coming next. Next week, code-wise, we will be focused on transforming JSON data into clean and nicely tabular DataFrames.
Your knowledge of dictionaries and lists will be put to the test!
Dictionary structure:
Access a “column”:
JSON objects are a mixture of dictionaries and lists, nested inside each other. Your goal will be to transform them into nice tables!
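As a preview, here is a sketch of how you might navigate a nested structure one level at a time. The `data` dictionary and its keys are invented for illustration; the real API response will have its own keys, so always inspect it first.

```python
# A hypothetical parsed JSON response: a dictionary containing a dictionary of lists.
data = {
    "latitude": 51.5,
    "hourly": {
        "time": ["2025-10-15T09:00", "2025-10-15T12:00", "2025-10-15T15:00"],
        "temperature": [18, 22, 19],
    },
}

# Step 1: use a key to get the inner dictionary.
hourly = data["hourly"]

# Step 2: use a key to get a "column" (a list).
temps = hourly["temperature"]

# Step 3: use a position to get a single value.
noon_temp = temps[1]

print(type(data).__name__, type(hourly).__name__, type(temps).__name__)  # dict dict list
print(noon_temp)  # 22
```

At each step, ask yourself: is this a dictionary (use a key) or a list (use a position)?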
Practice your lists and dictionaries skills! You need to be able to understand if a structure is a list or a dictionary, and how to manipulate them.
You will practise some of it in the 💻 W02 Lab tomorrow, but keep practising beyond that!
💻 W02 Lab (Friday)
📝 W03 Formative (next week)
Lots of new cool stuff awaits you!
`for` loops and `if-else` conditionals
🎯 The Big Picture
Key Questions to Reflect On:
How do the Python fundamentals from DataQuest connect to the data analysis you did in the (📝 W02 Practice)?
What’s the relationship between memory and DataFrame operations?
How might APIs change the way you think about data collection?
Next week: We’ll transform complex JSON data into the clean DataFrames you’re already comfortable working with.
💬 Remember: Use the `#help` channel on Slack for any questions that come up as you work through the lab tomorrow!
LSE DS105A (2025/26)