DS105A – Data for Data Science
🗓️ 16 Oct 2025
16:00 – 16:15
In your 📝 W03 Practice, you navigated through files and folders using the Terminal. This might have felt quite different from clicking around with your mouse!
Let’s understand what’s happening under the hood.
A computer has four key components:
Examples of Operating Systems:
Ubuntu is a Linux distribution. Windows and macOS are proprietary operating systems.
Each operating system has its own history and philosophy:
UNIX & Linux
Philosophy: Open, portable, simple
Windows
Philosophy: User-friendly, commercial
💡 Why this matters: Different OSes organise files differently and use different Terminal commands. Understanding this prevents frustration when tutorials use OS-specific syntax.
16:15 – 16:30
Each OS organises files and directories differently. Every file has a path (its address in the system).
The directory structure starts from a single root directory called `/`:
💡 macOS uses `/Users/` instead of `/home/`, but the structure is similar.
Windows uses drive letters (C:, D:, E:) where each drive is a separate filesystem:
⚠️ Windows uses `\` (backslash) for paths, Mac/Linux use `/` (forward slash).
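To see the separator difference in action, here is a minimal sketch using Python's standard `pathlib` module. The two example paths are illustrative only; the "pure" path classes just parse text and never touch the real filesystem, so the sketch should behave the same on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only parse text, so this runs the same on Mac, Linux and Windows.
linux_path = PurePosixPath("/home/user/Documents/data.csv")
windows_path = PureWindowsPath(r"C:\Users\Username\Documents\data.csv")

# Each path is split into parts using its own separator (/ or \).
print(linux_path.parts)
print(windows_path.parts)

# The file name is the last part either way.
print(linux_path.name)     # data.csv
print(windows_path.name)   # data.csv

# Only the Windows path carries a drive letter.
print(repr(linux_path.drive))
print(windows_path.drive)  # C:
```

Letting `pathlib` handle the separators is one way to avoid hard-coding `/` or `\` in your own scripts.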
We can specify file locations in two ways:
Absolute Path:
/home/user/Documents/data.csv
C:\Users\Username\Documents\data.csv
Relative Path:
`./data.csv` → file in current directory
`../data.csv` → go up one level first
`data/weather.json` → file in data subfolder
💡 Tips
Use `pwd` in Terminal to find where you are.
16:30 – 16:35
Environment variables are system settings that programs use to find what they need.
Special variables that store system-wide settings:
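Where the OS looks for programs (`PATH`)
User’s home directory (`HOME`)
Current directory (`PWD`)
Temporary files (`TEMP`)
🤔 Think about it: When you type `python` in Terminal, how does your computer know where the `python` program is?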
Variable | Mac/Linux | Windows | Purpose |
---|---|---|---|
`$HOME` | `/home/username/` | `%USERPROFILE%` | User’s home directory |
`$PATH` | `/usr/local/bin:/usr/bin/` | `C:\Windows\System32\;...` | Where OS looks for programs |
`$PWD` | `/home/username/Documents` | `%CD%` | Current directory |
`$TEMP` | `/tmp/` | `%TEMP%` | Temporary files |
Try it now:
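If you would rather explore from Python than from the Terminal, here is a minimal sketch using the standard `os` and `shutil` modules. It assumes nothing about your machine: a variable that is not set simply comes back as `None`, and the program name `python` is only an example.

```python
import os
import shutil

# HOME is set on Mac/Linux, USERPROFILE on Windows; .get() returns None if unset.
home = os.environ.get("HOME") or os.environ.get("USERPROFILE")
print("Home directory:", home)

# PATH is one long string; os.pathsep is ":" on Mac/Linux and ";" on Windows.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print("First few PATH entries:", path_dirs[:3])

# The current working directory (what pwd prints in the Terminal).
print("Current directory:", os.getcwd())

# shutil.which() searches the PATH directories for a program,
# much like the shell does; it returns None if nothing is found.
print("python found at:", shutil.which("python"))
```

The last line hints at the answer to the question above: the OS looks through each directory listed in `PATH` until it finds a program with that name.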
16:35 – 16:50
Great notes from your 📝 W03 Practice! Let’s build on what you discovered.
UNIX/Mac/Linux (Nuvolos too!)
💡 Same goals, different commands. In this course, we use UNIX commands (Mac/Linux/Nuvolos).
Not all files are the same! Understanding the difference helps you know which tools to use.
Plain Text Files:
Examples: Python scripts (`.py`), Markdown (`.md`)
View with `cat`, edit with `nano` or VS Code
Binary Files:
Examples: images (`.png`, `.jpg`), PDFs, databases, executables
The two most common data formats you’ll work with:
CSV (Comma-Separated Values)
date,max_temp_c,conditions
2025-10-13,18,cloudy
2025-10-14,22,sunny
2025-10-15,19,rainy
Let me show you how to save and read data files so you don’t need to re-download from APIs every time.
Demo Notebook: week03/W03-NB02-File-Formats.ipynb
We’ll cover:
💡 This is critical for your 📝 W04 Practice: collect data once, save it, work from the saved file!
Save API data so you don’t need to download it repeatedly:
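A minimal sketch of this step, assuming the API response is already a Python dictionary. The `weather_data` values and the file name `data/july_2024_weather.json` are hypothetical stand-ins, and the notebook's exact code may differ.

```python
import json
from pathlib import Path

# Stand-in for an API response (hypothetical values, not real measurements).
weather_data = {
    "daily": {
        "time": ["2024-07-01", "2024-07-02", "2024-07-03"],
        "temperature_2m_max": [21.4, 24.9, 19.8],
    }
}

# Make sure the data folder exists before writing into it.
Path("data").mkdir(exist_ok=True)

# Hypothetical file name; choose whatever suits your project.
with open("data/july_2024_weather.json", "w") as f:
    json.dump(weather_data, f, indent=2)

print("Data saved!")
```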
What’s happening:
`with open()` creates a connection to the file
`'w'` means write mode
`json.dump()` writes the dictionary as JSON
`indent=2` makes it human-readable
Read saved data without making another API call:
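A minimal sketch of reading the file back, assuming a JSON file saved earlier. So that it runs on its own, the sketch first writes a small stand-in file; the file name and values are hypothetical.

```python
import json
from pathlib import Path

# Setup so the sketch runs on its own: write a small stand-in file first.
# (File name and values are hypothetical.)
Path("data").mkdir(exist_ok=True)
json_path = Path("data/july_2024_weather.json")
json_path.write_text(json.dumps(
    {"daily": {"time": ["2024-07-01", "2024-07-02"],
               "temperature_2m_max": [21.4, 24.9]}},
    indent=2,
))

# 'r' means read mode; json.load() turns the file back into a dictionary.
with open(json_path, "r") as f:
    loaded_data = json.load(f)

print(loaded_data["daily"]["time"])
print(loaded_data["daily"]["temperature_2m_max"])
```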
Benefits:
Transform structured data into simple tabular format:
import csv

dates = loaded_data['daily']['time']
temps = loaded_data['daily']['temperature_2m_max']

# Write to CSV
with open('data/july_2024_temps.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Header row
    writer.writerow(['date', 'max_temp_c'])
    # Data rows
    for i in range(len(dates)):
        writer.writerow([dates[i], temps[i]])

print("CSV created!")
💡 We use `range(len(dates))` to iterate through indices, accessing both lists at the same position.
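As an aside, Python's built-in `zip()` is another common way to walk through two lists in step. This is an alternative sketch, not the approach used in the demo, and the values are hypothetical.

```python
# Hypothetical values standing in for the two lists.
dates = ["2024-07-01", "2024-07-02", "2024-07-03"]
temps = [21.4, 24.9, 19.8]

rows = []
# zip() pairs up items at the same position, so no index is needed.
for date, temp in zip(dates, temps):
    rows.append([date, temp])

print(rows)
```

One thing to keep in mind: `zip()` stops at the end of the shorter list, so a length mismatch passes silently, whereas indexing with `range(len(dates))` would raise an error if `temps` were shorter.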
Read CSV data back into Python:
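A minimal sketch of one way to do this, assuming the standard-library `csv.reader`; the notebook's exact code may differ. So that it runs on its own, the sketch first recreates the small sample table shown earlier under a hypothetical file name.

```python
import csv
from pathlib import Path

# Setup so the sketch runs on its own: recreate the small sample table.
# (The file name is hypothetical.)
Path("data").mkdir(exist_ok=True)
csv_path = Path("data/sample_weather.csv")
csv_path.write_text(
    "date,max_temp_c,conditions\n"
    "2025-10-13,18,cloudy\n"
    "2025-10-14,22,sunny\n"
    "2025-10-15,19,rainy\n"
)

# Read every row into a list of lists.
with open(csv_path, "r", newline="") as f:
    rows = list(csv.reader(f))

header = rows[0]      # first row holds the column names
data_rows = rows[1:]  # everything after that is data

print(header)
for row in data_rows:
    # Every value arrives as a string, so convert numbers explicitly.
    print(row[0], int(row[1]), row[2])
```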
This pattern matches what you learned in DataQuest this week!
Use JSON when:
Example: Weather data with location info, multiple measurements, metadata
Use CSV when:
Example: Final heatwave counts per year (simple table)
💡 W04 Practice workflow: Collect from API → Save as JSON → Analyze → Export summary as CSV
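The four steps can be sketched end to end as follows. Everything specific here is an assumption for illustration: the `collected` dictionary stands in for a real API response, the file names are hypothetical, and the 24 °C threshold is an arbitrary placeholder rather than a definition of a heatwave.

```python
import csv
import json
from pathlib import Path

Path("data").mkdir(exist_ok=True)

# 1. Collect: stand-in for an API response (hypothetical values).
collected = {
    "daily": {
        "time": ["2024-07-01", "2024-07-02", "2024-07-03", "2024-07-04"],
        "temperature_2m_max": [21.4, 24.9, 19.8, 26.1],
    }
}

# 2. Save as JSON so the API is only called once.
raw_path = Path("data/workflow_raw.json")  # hypothetical file name
with open(raw_path, "w") as f:
    json.dump(collected, f, indent=2)

# 3. Analyze: work from the saved file, not from the API.
with open(raw_path, "r") as f:
    loaded_data = json.load(f)

THRESHOLD_C = 24  # arbitrary placeholder threshold
temps = loaded_data["daily"]["temperature_2m_max"]
hot_days = sum(1 for t in temps if t >= THRESHOLD_C)

# 4. Export the summary as a simple one-row CSV table.
summary_path = Path("data/workflow_summary.csv")  # hypothetical file name
with open(summary_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["days_recorded", "hot_days", "max_temp_c"])
    writer.writerow([len(temps), hot_days, max(temps)])

print("Hot days:", hot_days)
```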
Before our coffee break, let’s connect what you learned in 📝 W03 Practice to what we’ve covered today. Turn to someone near you (2-3 people).
Your task (3 minutes total):
Share one discovery (1.5 minutes): Each person briefly shares one thing that clicked from the Terminal work or file format exploration. Could be about `pwd`/`ls`/`cd`, plain text vs binary files, or anything else.
Share one challenge (1.5 minutes): What is something that you found difficult or confusing this week?
💡 Post in Slack: Drop your pair’s key insight in a thread on the `#social` channel. What was the most interesting connection you made?
16:50 – 17:00
After the break:
17:00 – 18:00
This hour runs like a lab session. Follow along on your laptop!
Git:
GitHub:
💡 Think of it this way: Git is the engine, GitHub is the parking garage where you store your car.
By the end of this hour, you will have:
my-ds105a-notes
Everything you need is in the guide on Moodle/website. We’ll work through it together!
This workflow pattern will become muscle memory:
We’ll practice this ceremony multiple times today. By W04, it’ll feel natural!
Open the guide: Using GitHub & Git for Version Control
We’ll work through each step together.
💡 Don’t rush! Git feels overwhelming at first, but we’ll practice repeatedly over the coming weeks. Today is just exposure.
See you in Friday’s lab where you’ll practice Git workflows and start your 📝 W04 Practice!
💬 Questions? Post in the `#help` channel on Slack.
LSE DS105A (2025/26)