🗓️ Week 03
Navigating Computer Filesystems with the Terminal and Intro to GitHub

DS105W – Data for Data Science

06 Feb 2025

1️⃣ Operating Systems

16:03 – 16:15

In your 📝 W03 Formative, you navigated through the files and folders of a computer (on Nuvolos, but still a computer) in a way that might have felt quite different from your usual experience.

What an Operating System (OS) does

  • A computer can be divided into four parts:
    • hardware: provides the basic computing resources for the system
    • application programs: define how these resources are used
    • operating system: controls the hardware and coordinates its use among the various application programs for the various users
    • user: a person or a bot (a computer script) that requests actions from the computer.

[Diagram: User → Application Programs (compilers, web browsers, development kits, etc.) → Operating System → Computer Hardware (CPU, memory, I/O devices, etc.)]

A few examples (some are vintage):

Why do different OSes exist?

A 2-minute history of Operating Systems.

A computer from the 1950s
(Computer History Museum n.d.)

  • In the early days of modern computing, when computers were not accessible to everyone, software (applications) typically came with its source code open.
  • Open source means you can read precisely which instructions the computer will follow when running the program.
  • As the industry grew, most software companies released only the binaries: files you can execute but cannot read as if they were text.
    • This includes Operating Systems! ⏭️

UNIX

  • UNIX was one of the first widely adopted Operating Systems, developed at Bell Labs (then part of AT&T)
  • It aimed to be simple* and easy to port to any hardware architecture
  • But it required a license
  • In the 1980s and early 1990s, a group of hackers and activists developed free & open source alternatives to UNIX.

What UNIX System III looks like.


GNU/Linux

  • This led to the birth of one of the most influential operating systems: GNU/Linux, or simply Linux.

  • Android, the most popular OS for phones worldwide, is based on Linux.

GNU stands for “GNU's Not Unix”. Computer nerds love a recursive joke.

macOS

  • macOS is the Operating System of Apple computers
  • It is a hybrid system. It has a free, open-source component called Darwin, but it also includes proprietary, closed-source components.
  • iOS, Apple's mobile operating system, is also based on Darwin
  • Darwin is based on BSD UNIX, a derivative of the original UNIX system.

Windows

  • Windows has its own history.
  • In the 1980s, Microsoft and IBM co-developed the OS/2 operating system.
  • But then Microsoft took its own path and developed its own line of operating systems: Windows NT, Windows 95, Windows 98, Windows 2000, Windows XP, Windows Vista*, Windows 7, etc.
  • Windows' popularity can be traced, at least in part, to the success of the Office suite

2️⃣ Filesystems

16:15 – 16:30

  • Each Operating System uses its own system and standard for organising files and directories 🗃️
  • Every file has a path (its address in the system).
  • The Operating System controls and enforces permissions (who can read/write files).

A filing cabinet full of shelves and folders

A folder full of files

Typical Filesystem Structure in UNIX-like Systems

The directory structure of a Linux computer typically looks like this:

/
├── bin/
├── dev/
├── etc/
├── home/
│   └── jonathan/
│       ├── Documents/
│       │   └── Workspace/
│       │       └── DS105/
│       ├── Images/
│       ├── Videos/
│       └── Downloads/
├── lib/
├── mnt/
├── proc/
├── root/
├── sbin/
├── tmp/
├── usr/
│   ├── lib/
│   ├── bin/
│   └── include/
└── var/
    ├── log/
    ├── mail/
    ├── spool/
    └── tmp/

The root directory is /.

This is also very similar to how macOS organises its files, with a few small differences. For example, instead of /home, macOS uses /Users.

Typical Filesystem Structure in Windows

Windows has a drive letter system, where each drive is a separate filesystem. The most common drives are C:\, D:\, E:\, etc. Here is what a typical Windows filesystem looks like (starting from the C:\ drive):

C:\
├── Program Files\
├── Program Files (x86)\
├── Users\
│   └── Jonathan\
│       ├── Documents\
│       ├── Workspace\
│       │   └── DS105\
│       ├── Images\
│       ├── Videos\
│       ├── Downloads\
│       └── AppData\
│           ├── Local\
│           ├── LocalLow\
│           └── Roaming\
├── Windows\
│   └── System32\
└── Temp\

The root directory is C:\.

How HOME directories are organised

Each OS also imposes a standard structure for user directories.

Mac

/
  Users/
    username/
    ├── Documents/
    │   ├── project_notes.txt
    │   └── research_paper.pdf
    ├── Downloads/
    │   ├── dataset.csv
    │   └── software_installer.sh
    ├── Pictures/
    │   ├── vacation.jpg
    │   └── screenshots/
    │       ├── screenshot1.png
    │       └── screenshot2.png
    └── Desktop/
        ├── todo_list.txt
        └── shortcuts/

Windows

C:\
  Users\
    Username\
    ├── Documents\
    │   ├── project_notes.txt
    │   └── research_paper.pdf
    ├── Downloads\
    │   ├── dataset.csv
    │   └── software_installer.exe
    ├── Pictures\
    │   ├── vacation.jpg
    │   └── Screenshots\
    │       ├── screenshot1.png
    │       └── screenshot2.png
    └── Desktop\
        ├── todo_list.txt
        └── shortcuts\

💡 One annoying difference: Windows uses \ (backslash) to separate directories, while Mac/Linux use / (forward slash). There are OS-agnostic ways to specify paths to files so that the same code runs on both Windows and Unix-like systems!
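
One OS-agnostic option in Python is the standard-library pathlib module. The sketch below is only an illustration: the folder name data and the file name dataset.csv are hypothetical, chosen to show how the same parts are rendered on each kind of system.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Build the path from its parts and let pathlib pick the separator.
# "data" and "dataset.csv" are hypothetical names used for illustration.
data_file = Path("data") / "dataset.csv"
print(data_file)  # uses / on Mac/Linux and \ on Windows

# The "Pure" classes show how each OS would render the same parts,
# regardless of the machine the code runs on.
posix_style = str(PurePosixPath("data", "dataset.csv"))
windows_style = str(PureWindowsPath("data", "dataset.csv"))
print(posix_style)
print(windows_style)
```

Because the path is built from parts rather than typed as one string, the script does not need to know which slash the current machine expects.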

Ways to specify paths

We can specify the location of a file, its path, in two ways:

Absolute Path:

  • The full location of a file/folder from the root directory.
  • Works no matter where you are in the system.
  • Examples:
    • Linux: /home/user/Documents/project_notes.txt (on a Mac: /Users/user/Documents/project_notes.txt)
    • Windows: C:\Users\Username\Documents\project_notes.txt


Relative Path:

  • The location relative to where you are in the filesystem.
  • More flexible for scripts and automation.
  • Examples:
    • ./project_notes.txt -> Refers to a file in the current directory.
    • ../project_notes.txt -> Goes one level up before looking for the file.

💡 If you are in the Terminal, type pwd to find out where you are! When writing Python code, use os.getcwd().
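
To see how a relative path depends on the current directory, here is a minimal Python sketch. It assumes nothing about your machine: project_notes.txt is just the example file name from above and does not need to exist for the code to run.

```python
import os
from pathlib import Path

# Where Python is currently "standing" (same idea as pwd in the Terminal)
current_dir = Path(os.getcwd())

# A relative path is interpreted starting from the current directory
relative = Path("project_notes.txt")
absolute = relative.resolve()

print(current_dir)
print(relative.is_absolute())
print(absolute.is_absolute())

# ".." moves one level up before looking for the file
parent_version = (Path("..") / "project_notes.txt").resolve()
print(parent_version)
```

Running the same script from a different folder prints different absolute paths, which is exactly why the relative form travels well between computers.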

Examples in Action

🖥 Current Directory: /home/user/Documents/ (Mac/Linux)

Absolute Path: /home/user/Documents/project_notes.txt
Relative Path: ./project_notes.txt

🖥 Current Directory: C:\Users\Username\Documents\ (Windows)

Absolute Path: C:\Users\Username\Documents\project_notes.txt
Relative Path: .\project_notes.txt

💡 ALWAYS use relative paths in this course. It makes your work reproducible
(I don't have a /home/johnDoe/ folder on my computer)

3️⃣ Environment Variables

16:30 – 16:35

One important concept will inevitably come up when you work with the Terminal, particularly when installing command-line tools or setting up your Python environment: Environment Variables.

What Are Environment Variables?

  • Environment variables are special variables (similar in spirit to Python variables) that store information about the operating system and user preferences.

  • They store system-wide settings that help programs find what they need.

  • They are used to configure and control, amongst many other things, software behaviour, file paths, execution paths, and access credentials.

🤔 Think about it: When I install Python, how does my computer know where the python program is?

Common Environment Variables Across OSes

Just as a reference, here are some common environment variables you might encounter:

Variable | Mac/Linux | Windows | Purpose
$HOME | /home/username/ | %USERPROFILE% → C:\Users\Username\ | User's home directory
$PATH | /usr/local/bin:/usr/bin/ | %PATH% → C:\Windows\System32\;... | Where the OS looks for programs
$PWD | /home/username/Documents | %CD% → C:\Users\Username\Documents | Current directory
$SHELL | /bin/bash or /bin/zsh | N/A | Default shell
$TMPDIR | /tmp/ (the usual default when unset) | %TEMP% → C:\Users\Username\AppData\Local\Temp\ | Temporary files location

🗒️ A note on notation: Mac/Linux shells use $VAR for environment variables. On Windows, Command Prompt uses %VAR% and PowerShell uses $env:VAR.

How to check an environment variable?

  • Mac/Linux (Terminal)

    echo $HOME
    echo $PATH
  • Windows (PowerShell)

    echo $env:USERPROFILE
    echo $env:PATH

💡 Try it now! What's your $HOME or %USERPROFILE% path?

Modifying Environment Variables

🛠 How to set an environment variable (Temporary Change)

🔹 Changes disappear after closing the Terminal!

  • Mac/Linux

    export MY_VAR="Hello World"
    echo $MY_VAR
  • Windows (PowerShell)

    $env:MY_VAR="Hello World"
    echo $env:MY_VAR

💡 Useful for temporary configuration during a session.

Why Is This Important?

🔍 When you type python or git, your shell searches for a matching executable inside the directories listed in $PATH, in the order they appear.

🖥 Check your PATH:

echo $PATH  # Mac/Linux
echo $env:PATH  # Windows (PowerShell)

🛠 Adding a new directory to PATH (Temporary Change)

export PATH="$PATH:/home/user/custom-bin"  # Mac/Linux
$env:PATH += ";C:\Users\Username\custom-bin"  # Windows (PowerShell)

💡 This is useful when installing software that doesn't set itself up automatically!
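
The PATH lookup can also be inspected from Python. This is a rough sketch using only the standard library; the program names it searches for are examples, and the results depend entirely on what is installed on the machine running it.

```python
import os
import shutil

# PATH is one long string; os.pathsep is ":" on Mac/Linux and ";" on Windows
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for directory in path_dirs[:5]:
    print(directory)

# shutil.which searches the directories in PATH and returns the full path
# of the first match, or None if no executable with that name is found.
for program in ["python", "python3", "git"]:
    print(program, "->", shutil.which(program))
```

If a program you just installed shows up as None here, its folder is probably missing from PATH.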

Using Environment Variables in Python

Accessing environment variables in Python is easy with the os module:

import os

# .get() returns None instead of raising a KeyError when the variable is not set
print(os.environ.get('HOME'))         # Set on Mac/Linux
print(os.environ.get('USERPROFILE'))  # Set on Windows

You can also set new environment variables from Python:

import os

# The change is visible only to this Python process (and any programs it starts)
os.environ['MY_VAR'] = 'Hello from Python'
print(os.environ['MY_VAR'])

💡 This is often used for API keys, custom paths, and configurations in scripts.
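
As an illustration of the API key use case, here is a minimal sketch. MY_API_KEY is a hypothetical variable name invented for this example; a real service will tell you which name it expects.

```python
import os

# MY_API_KEY is a hypothetical variable name used only for illustration.
api_key = os.environ.get("MY_API_KEY")

if api_key is None:
    print("MY_API_KEY is not set; export it in the Terminal before running this script.")
else:
    # Avoid printing secrets in full; showing the length is enough to confirm it loaded.
    print(f"Loaded an API key with {len(api_key)} characters.")
```

Reading the key from the environment keeps it out of the script itself, so the code can be shared without sharing the secret.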

If you decide to install Python or Git on your own computer, moving away from Nuvolos, you will definitely have to deal with environment variables.

4️⃣ Terminal & Files
(Plain Text vs Binary)

16:35 – 16:50

Here are some great summaries I spotted in your 📝 W03 Formative on Nuvolos. Can you spot your own notes there?

There were some very good and meaningful summaries!

A cheatsheet of UNIX commands (Mac or Linux)

Here are the commands you might want to use when working on Nuvolos (or if you have a Mac or Linux machine):

  • cd [directory]: Change directory
  • ls: List files and directories
  • pwd: Print working directory
  • mkdir [directory]: Create a new directory
  • touch [file]: Create a new file
  • rm [file]: Remove a file
  • rm -r [directory]: Remove a directory and its contents

File Operations

  • cat [file]: Display the contents of a file
  • cp [source] [destination]: Copy a file or directory
  • mv [source] [destination]: Move or rename a file or directory
  • grep [pattern] [file]: Search for a pattern in a file
  • chmod [permissions] [file]: Change file permissions

Process Management

  • ps: List running processes
  • kill [PID]: Terminate a process
  • top: Display system activity and process information
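
Several of the file commands above have close counterparts in Python's standard library. The sketch below runs them inside a temporary directory; the names project, notes.txt, notes_copy.txt and renamed.txt are hypothetical and exist only for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing on your machine is touched
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    project = base / "project"
    project.mkdir()                      # mkdir project
    notes = project / "notes.txt"
    notes.touch()                        # touch project/notes.txt
    notes.write_text("hello\n")
    print(notes.read_text())             # cat project/notes.txt

    copy = project / "notes_copy.txt"
    shutil.copy(notes, copy)             # cp notes.txt notes_copy.txt
    renamed = copy.rename(project / "renamed.txt")  # mv notes_copy.txt renamed.txt

    listing = sorted(p.name for p in project.iterdir())  # ls project
    print(listing)

    notes.unlink()                       # rm project/notes.txt
    shutil.rmtree(project)               # rm -r project
    still_there = project.exists()
```

The same Python code works on Nuvolos, Mac, Linux and Windows, which is one reason to prefer it over shell commands inside scripts.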

A cheatsheet of Windows PowerShell commands

If you have a Windows machine, you can do the same things! It just happens that the commands are a bit different:

  • cd [directory]: Change directory
  • dir: List files and directories
  • pwd: Print working directory
  • mkdir [directory]: Create a new directory
  • ni [file]: Create a new file
  • rm [file]: Remove a file
  • rm -r [directory]: Remove a directory and its contents

File Operations

  • gc [file]: Get the contents of a file
  • cp [source] [destination]: Copy a file or directory
  • mv [source] [destination]: Move or rename a file or directory
  • sls [pattern] [file]: Search for a pattern in a file
  • icacls [file] /grant [user]:[permissions]: Change file permissions

Process Management

  • ps: List running processes
  • kill [PID]: Terminate a process
  • tasklist: Display running tasks

There were also some good notes about reading JSON

There were also some good notes about writing JSON

Speaking of Files…

The two most common types of file formats you will encounter in this course are:

  • JSON (JavaScript Object Notation) → Best for structured data, commonly used in APIs.
  • CSV (Comma-Separated Values) → Simple format, often used for tabular data. Simpler than JSON but less flexible.

💡 Understanding these formats is crucial when working with data files in Python!

The use of JSON

JSON is commonly used for API responses, data exchange between programs, and configuration files. And, as you know, it supports nested structures.

🖥 Example JSON File:

{
    "name": "Alice",
    "age": 25,
    "languages": ["Python", "JavaScript"],
    "location": {
        "city": "London",
        "country": "UK"
    }
}

💡 As mentioned last week, when we parse JSON in Python, it becomes an object that can be treated as a mix of dictionaries and lists.
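
A short sketch of that parsing step, using the standard-library json module. The JSON text is the example above, held in a string so the code runs without needing a file.

```python
import json

# The same JSON text as in the example, held in a Python string
raw = """
{
    "name": "Alice",
    "age": 25,
    "languages": ["Python", "JavaScript"],
    "location": {"city": "London", "country": "UK"}
}
"""

person = json.loads(raw)           # JSON object -> Python dict

print(type(person))
print(person["languages"][0])      # JSON array -> Python list
print(person["location"]["city"])  # nested object -> nested dict
```

To read from a file instead, json.load takes an open file object in place of the string.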

The use of CSV

CSVs are great for tabular data like spreadsheets or database tables. They are simple and widely supported, but they don't support nested structures.

  • Each row represents a record.
  • Columns are separated by commas (,).
  • No nested structures, just flat rows and columns.

🖥 Example CSV File:

name,age,city,country
Alice,25,London,UK
Bob,30,New York,USA
Charlie,28,Berlin,Germany

💡 Easily opened in Excel, Google Sheets, or Python!
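
One way to read this CSV in Python is the standard-library csv module. The sketch below keeps the example text in a string so it runs without a file; with a real file you would pass the open file object to csv.DictReader instead.

```python
import csv
import io

# The same CSV text as in the example; io.StringIO lets us treat a string like a file
raw = """name,age,city,country
Alice,25,London,UK
Bob,30,New York,USA
Charlie,28,Berlin,Germany
"""

# Each row becomes a dictionary keyed by the header names
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    # Every value is read as a string, so convert numbers explicitly
    print(row["name"], int(row["age"]), row["city"])
```

Note that CSV carries no type information: the age column arrives as text until you convert it.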

Plain text files

Despite differences, both JSON & CSV are stored as plain text files!

This means:

  • You can open them in any text editor (VS Code, Notepad, Nano).
  • They are human-readable and editable.

💡 Try opening a .csv or .json file in a text editor like nano or VS Code.

Binary files

Binary files are different. They store data in a format that is not human-readable. Examples include:

  • Images (.png, .jpg)
  • Audio (.mp3, .wav)
  • Video (.mp4, .avi)
  • Executables (.exe, .app)
  • Databases (.db, .sqlite)
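
The difference shows up when you look at the raw bytes. The sketch below is a rough heuristic, not a reliable file-type detector: it simply checks whether a sequence of bytes decodes cleanly as UTF-8 text. The helper name looks_like_text is made up for this example.

```python
def looks_like_text(data: bytes) -> bool:
    """Rough heuristic: plain text should decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes as they would be stored in a small CSV file
csv_bytes = "name,age\nAlice,25\n".encode("utf-8")

# The first 8 bytes of every PNG file (its "signature")
png_signature = b"\x89PNG\r\n\x1a\n"

print(looks_like_text(csv_bytes))
print(looks_like_text(png_signature))
```

With a real file, opening it with open(path, "rb") gives you the same kind of bytes object to inspect.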

🍡 Quick Coffee Break

16:50 – 17:00

After the break:

  • Hands-On Introduction to Git & GitHub

5️⃣ Git & GitHub

17:00 – 18:00

This whole hour will be like a lab session.

Everything you need to know is in the Using GitHub & Git for Version Control guide, found on Moodle and on the website.

Thanks!

THE END

References

Computer History Museum. n.d. “1950 Timeline of Computer History.” 1950 Timeline of Computer History. Accessed September 16, 2022. https://www.computerhistory.org/timeline/1950/.
Silberschatz, Abraham, Peter B. Galvin, and Greg Gagne. 2005. Operating System Concepts. 7th ed. Hoboken, NJ: J. Wiley & Sons.