DS105W - Data for Data Science
06 Feb 2025
16:03 - 16:15
In your W03 Formative, you navigated through the files and folders of a computer (on Nuvolos, but still a computer) in a way that might have felt quite different from your usual experience.
A few examples of operating systems (some are vintage): Ubuntu, Windows XP and macOS.
Ubuntu is a Linux distribution. Windows XP and macOS are proprietary operating systems.
A 2-minute history of Operating Systems.
A computer from the 1950s
(Computer History Museum n.d.)
Source: Wikimedia Commons - Rwoodsmall
This led to the birth of one of the most influential operating systems: GNU/Linux, or simply Linux.
Android, the most popular OS for phones worldwide, is based on Linux.
GNU stands for "GNU's Not Unix". Computer nerds love a recursive joke.
See (Silberschatz, Galvin, and Gagne 2005, Appendix B) for more on Windows.
16:15 - 16:30
The directory structure of a Linux computer typically looks like this:
The root directory is /.
This is also very similar to how macOS organises its files, with a few small differences. For example, instead of /home, macOS uses /Users.
Windows has a drive letter system, where each drive is a separate filesystem. Drives are named C:\, D:\, E:\, and so on; C:\ is usually the drive where Windows itself is installed. Here is what a typical Windows filesystem looks like (starting from the C:\ drive):
The root directory of the C: drive is C:\.
Each OS also imposes a standard structure for user directories.
macOS:
/
└── Users/
    └── username/
        ├── Documents/
        │   ├── project_notes.txt
        │   └── research_paper.pdf
        ├── Downloads/
        │   ├── dataset.csv
        │   └── software_installer.sh
        ├── Pictures/
        │   ├── vacation.jpg
        │   └── screenshots/
        │       ├── screenshot1.png
        │       └── screenshot2.png
        └── Desktop/
            ├── todo_list.txt
            └── shortcuts/
Windows:
C:\
└── Users\
    └── Username\
        ├── Documents\
        │   ├── project_notes.txt
        │   └── research_paper.pdf
        ├── Downloads\
        │   ├── dataset.csv
        │   └── software_installer.exe
        ├── Pictures\
        │   ├── vacation.jpg
        │   └── Screenshots\
        │       ├── screenshot1.png
        │       └── screenshot2.png
        └── Desktop\
            ├── todo_list.txt
            └── shortcuts\
💡 One annoying difference: Windows uses \ (backslash) to separate directories, while Mac/Linux use / (forward slash). There are OS-agnostic ways to specify file paths, so that the same code runs on both Windows and Unix-like systems (Mac/Linux)!
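A minimal sketch of one OS-agnostic approach, using Python's standard pathlib module (os.path.join is an older alternative). The folder and file names come from the example trees above; PurePosixPath and PureWindowsPath are used only so that both conventions can be printed side by side on any computer.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Build the path from its parts with "/"; pathlib picks the right
# separator for the operating system the code is running on.
notes = Path("Documents") / "project_notes.txt"
print(notes)  # Documents/project_notes.txt on Mac/Linux, Documents\project_notes.txt on Windows

# The "Pure" classes apply one OS's rules on any OS, so we can compare both:
posix_version = PurePosixPath("Documents") / "project_notes.txt"
windows_version = PureWindowsPath("Documents") / "project_notes.txt"
print(posix_version)    # Documents/project_notes.txt
print(windows_version)  # Documents\project_notes.txt
```

In your own scripts, plain Path(...) is usually all you need.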
We can specify the location of a file, its path, in two ways:
Absolute Path: the full location, starting from the root directory.
/home/user/Documents/project_notes.txt (Mac/Linux)
C:\Users\Username\Documents\project_notes.txt (Windows)
Relative Path: the location relative to the current (working) directory.
./project_notes.txt -> Refers to a file in the current directory.
../project_notes.txt -> Goes one level up before looking for the file.
💡 If you are on the Terminal, type pwd to find out where you are! When writing Python code, use os.getcwd().
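A small sketch of how Python resolves the relative paths above against the current working directory. It works wherever you run it, and project_notes.txt does not need to exist, because os.path.abspath only manipulates the path text.

```python
import os

# Where am I? (the Python equivalent of `pwd`)
cwd = os.getcwd()
print("Current directory:", cwd)

# Relative paths are interpreted starting from the current directory.
# os.path.abspath turns them into absolute paths without touching the disk.
here = os.path.abspath("./project_notes.txt")
one_level_up = os.path.abspath("../project_notes.txt")
print(here)          # <current directory>/project_notes.txt
print(one_level_up)  # <parent of current directory>/project_notes.txt
```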
Current directory: /home/user/Documents/ (Mac/Linux)
Absolute Path: /home/user/Documents/project_notes.txt
Relative Path: ./project_notes.txt
Current directory: C:\Users\Username\Documents\ (Windows)
Absolute Path: C:\Users\Username\Documents\project_notes.txt
Relative Path: .\project_notes.txt
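To see that a relative path depends on where you are standing, here is a sketch using relpath from the posixpath module (the Mac/Linux flavour of os.path, importable on any OS so the output is the same everywhere). The data sub-folder is hypothetical. Note that relpath writes project_notes.txt rather than ./project_notes.txt; both mean the same thing.

```python
import posixpath  # Mac/Linux path rules, usable on any OS

notes = "/home/user/Documents/project_notes.txt"

# The same file has different relative paths depending on the starting folder
from_documents = posixpath.relpath(notes, start="/home/user/Documents")
# "/home/user/Documents/data" is a hypothetical sub-folder
from_subfolder = posixpath.relpath(notes, start="/home/user/Documents/data")

print(from_documents)  # project_notes.txt
print(from_subfolder)  # ../project_notes.txt
```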
💡 ALWAYS use relative paths in this course. It makes your work reproducible (I don't have a /home/johnDoe/ folder on my computer).
16:30 - 16:35
One important concept will inevitably come up when you work with the Terminal, particularly when installing command-line tools or setting up your Python environment: environment variables.
These are special variables (much like Python variables) that store information about the operating system and user preferences.
They hold settings, some system-wide and some specific to a user or a session, that help programs find what they need.
Among many other things, they are used to configure software behaviour, file paths, execution paths and access credentials.
🤔 Think about it: when I install Python, how does my computer know where the python program is?
For reference, here are some common environment variables you might encounter:
Variable | Mac/Linux | Windows | Purpose |
---|---|---|---|
$HOME | /home/username/ (macOS: /Users/username/) | %USERPROFILE% → C:\Users\Username\ | User's home directory |
$PATH | /usr/local/bin:/usr/bin/ | C:\Windows\System32\;... | Where the OS looks for programs |
$PWD | /home/username/Documents | %CD% → C:\Users\Username\Documents | Current directory |
$SHELL | /bin/bash or /bin/zsh | N/A | Default shell |
$TMPDIR (Windows: %TEMP%) | /tmp/ (the default when $TMPDIR is not set) | %TEMP% → C:\Users\Username\AppData\Local\Temp\ | Temporary files location |
A note on notation: Mac/Linux use $VAR for environment variables. On Windows, the classic Command Prompt uses %VAR%, while PowerShell uses $env:VAR.
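The Terminal replaces $VAR or %VAR% with the variable's value before running a command. As a rough illustration of the two notations, Python's ntpath (Windows rules) and posixpath (Mac/Linux rules) modules each provide an expandvars function. DS105_GREETING is a made-up variable, set only for this demo; note that ntpath.expandvars also accepts $VAR, so this shows the notations rather than exactly what each shell does.

```python
import os
import ntpath     # Windows path rules (understands %VAR%)
import posixpath  # Mac/Linux path rules (understands $VAR)

# Made-up variable, set only inside this Python process
os.environ["DS105_GREETING"] = "hello"

print(posixpath.expandvars("$DS105_GREETING"))   # hello
print(ntpath.expandvars("%DS105_GREETING%"))     # hello
print(posixpath.expandvars("%DS105_GREETING%"))  # %DS105_GREETING% (left unexpanded)
```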
echo $HOME              # Mac/Linux (Terminal)
echo $env:USERPROFILE   # Windows (PowerShell)
💡 Try it now! What's your $HOME or %USERPROFILE% path?
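A cross-platform way to answer the same question from Python, sketched with the standard library's pathlib and os.path.expanduser, so you don't need to know whether to look up HOME or USERPROFILE:

```python
import os
from pathlib import Path

home = Path.home()  # works the same way on Mac, Linux and Windows
print(home)

# "~" is shorthand for the home directory; os.path.expanduser replaces it
print(os.path.expanduser("~/Documents"))
```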
How to set an environment variable (temporary change)
Changes disappear after you close the Terminal!
export MY_VAR="hello"   # Mac/Linux (MY_VAR is a made-up example name)
$env:MY_VAR = "hello"   # Windows (PowerShell)
💡 Useful for temporary configuration during a session.
When you type python or git, the OS searches for an executable with that name in each of the directories listed in $PATH, in order, and runs the first one it finds.
Check your PATH:
echo $PATH       # Mac/Linux
echo $env:PATH   # Windows (PowerShell)
Adding a new directory to PATH (temporary change)
export PATH=$PATH:/home/user/custom-bin # Mac/Linux
$env:PATH += ";C:\Users\Username\custom-bin" # Windows
💡 This is useful when installing software that doesn't set itself up automatically!
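The PATH search described above can be sketched in Python. find_program is a hypothetical helper written for illustration only: it is simplified (for example, it ignores the .exe-style extensions Windows adds via the PATHEXT variable), and in practice you would use the standard library's shutil.which, which handles those details.

```python
import os
import shutil

def find_program(name):
    """Rough sketch of how a shell finds `name` by searching PATH in order.

    Simplified: ignores Windows extensions such as .exe (see PATHEXT)
    and other details that real shells handle.
    """
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        # A match must be a regular file that we are allowed to execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # the first match wins
    return None  # not found in any PATH directory

print(find_program("python3"))
print(shutil.which("python3"))  # the standard library's more complete version
```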
Accessing environment variables in Python is easy with the os module:
import os
print(os.environ['HOME'])             # Mac/Linux; raises KeyError if HOME is not set (usual on Windows)
print(os.environ.get('USERPROFILE'))  # Windows; .get() returns None instead of raising an error elsewhere
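You can also set new environment variables from Python by assigning to os.environ, as if it were a dictionary. The change only lasts while your script runs, and only affects that script and the programs it starts.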
💡 This is often used for API keys, custom paths, and configurations in scripts.
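A minimal sketch of setting a variable from Python and checking that a program started from the script can see it. DS105_API_KEY and its value are hypothetical placeholders; real keys are usually set outside the code rather than written into the script itself.

```python
import os
import subprocess
import sys

# Hypothetical name and value, purely for illustration (values must be strings)
os.environ["DS105_API_KEY"] = "not-a-real-key"
print(os.environ["DS105_API_KEY"])

# Programs started from this script inherit its environment variables
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DS105_API_KEY'])"],
    capture_output=True,
    text=True,
)
print(child.stdout.strip())  # not-a-real-key

# Clean up: remove the variable from this process's environment
del os.environ["DS105_API_KEY"]
```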
If you decide to install Python or Git on your own computer instead of using Nuvolos, you will almost certainly have to deal with environment variables.
16:35 - 16:50
Here are some great summaries I spotted in your W03 Formative on Nuvolos. Can you spot your own notes there?
I really like how this person wrote it in their own words.
It clearly shows a good understanding of the material.
Here are the commands you might want to use when working on Nuvolos (or if you have a Mac or Linux machine):
cd [directory]: Change directory
ls: List files and directories
pwd: Print working directory
mkdir [directory]: Create a new directory
touch [file]: Create a new (empty) file
rm [file]: Remove a file
rm -r [directory]: Remove a directory and its contents
cat [file]: Display the contents of a file
cp [source] [destination]: Copy a file (add -r to copy a directory)
mv [source] [destination]: Move or rename a file or directory
grep [pattern] [file]: Search for a pattern in a file
chmod [permissions] [file]: Change file permissions
ps: List running processes
kill [PID]: Terminate a process
top: Display system activity and process information
If you have a Windows machine, you can do the same things in PowerShell! The commands are just a bit different:
cd [directory]: Change directory
dir: List files and directories
pwd: Print working directory
mkdir [directory]: Create a new directory
ni [file]: Create a new file
rm [file]: Remove a file
rm -r [directory]: Remove a directory and its contents
gc [file]: Get the contents of a file
cp [source] [destination]: Copy a file or directory
mv [source] [destination]: Move or rename a file or directory
sls [pattern] [file]: Search for a pattern in a file
icacls [file] /grant [permissions]: Change file permissions
ps: List running processes
kill [PID]: Terminate a process
tasklist: Display running tasks
The two most common file formats you will encounter in this course are JSON and CSV.
💡 Understanding these formats is crucial when working with data files in Python!
JSON is commonly used for API responses, data exchange between programs, and configuration files. And, as you know, it supports nested structures.
Example JSON file:
{
"name": "Alice",
"age": 25,
"languages": ["Python", "JavaScript"],
"location": {
"city": "London",
"country": "UK"
}
}
💡 As mentioned last week, when we parse JSON in Python, it becomes a mix of dictionaries and lists: JSON objects become dicts and JSON arrays become lists.
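A short sketch parsing the example above with Python's built-in json module, showing how the nested location object becomes a dictionary inside a dictionary:

```python
import json

# The example JSON from above, as a Python string
raw = """
{
  "name": "Alice",
  "age": 25,
  "languages": ["Python", "JavaScript"],
  "location": {"city": "London", "country": "UK"}
}
"""

person = json.loads(raw)           # JSON object -> dict, JSON array -> list
print(type(person))                # <class 'dict'>
print(person["languages"][0])      # Python
print(person["location"]["city"])  # London
```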
CSVs are great for tabular data like spreadsheets or database tables. They are simple and widely supported, but they don't support nested structures. Each line is a row, and the values in a row are separated by commas (,).
Example CSV file:
name,age,city,country
Alice,25,London,UK
Bob,30,New York,USA
Charlie,28,Berlin,Germany
💡 CSV files can easily be opened in Excel, Google Sheets, or Python!
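A matching sketch for the CSV example, using the standard csv module. io.StringIO is used here only so the example runs without a file on disk; with a real file you would pass the object returned by open(..., newline="") instead.

```python
import csv
import io

# The example CSV from above; StringIO lets us treat a string like an open file
raw = """name,age,city,country
Alice,25,London,UK
Bob,30,New York,USA
Charlie,28,Berlin,Germany
"""

rows = list(csv.DictReader(io.StringIO(raw)))  # one dict per row, keyed by the header
print(rows[1]["city"])  # New York
print(rows[0]["age"])   # 25 (read as the string "25", not a number)
```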
Despite their differences, both JSON and CSV files are stored as plain text!
This means you can read them yourself and open them in any text editor.
💡 Try opening a .csv or .json file in a text editor like nano or VS Code.
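One rough way to see what "plain text" means from Python: text files are bytes that decode cleanly into characters, whereas binary formats usually contain byte sequences that are not valid text. looks_like_utf8_text is a hypothetical helper and only a heuristic, not a reliable file-type detector; the PNG bytes below are the standard 8-byte signature that every PNG image starts with.

```python
def looks_like_utf8_text(data: bytes) -> bool:
    """Very rough check: can these bytes be decoded as UTF-8 text?"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

csv_bytes = "name,age\nAlice,25\n".encode("utf-8")  # what a CSV file stores on disk
png_start = b"\x89PNG\r\n\x1a\n"                     # first 8 bytes of every PNG image

print(looks_like_utf8_text(csv_bytes))  # True
print(looks_like_utf8_text(png_start))  # False
```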
Binary files are different. They store data in a format that is not human-readable. Examples include:
Images (.png, .jpg)
Audio (.mp3, .wav)
Video (.mp4, .avi)
Executables and applications (.exe, .app)
Databases (.db, .sqlite)
16:50 - 17:00
After the break:
17:00 - 18:00
This whole hour will be like a lab session.
Everything you need to know is in the Using GitHub & Git for Version Control guide, available on Moodle and on the course website.
THE END
LSE DS105W (2024/25)