graph LR
    style EDIT color:#DC3445, fill:white, stroke:#DC3445;
    style STATUS color:white, fill:#DC3445;
    style ADD color:white, fill:#DC3445;
    style COMMIT color:white, fill:#DC3445;
    style PUSH color:white, fill:#DC3445;
    EDIT[Edit code] -- "'What have<br>I changed?'" --> STATUS[git status]
    STATUS --> EDIT
    EDIT -- "'I'm happy with<br>those changes'" --> ADD[git add]
    ADD -- "'I finished a significant <br> portion of my code' <br> OR 'I need to stop for today'" --> COMMIT[git commit]
    COMMIT -- "write a <br> commit message" --> PUSH[git push]
    COMMIT --> EDIT
    PUSH --> EDIT
Using Git & GitHub for Version Control
![Winter Term Icon for LSE DS105, representing the themes of data transformation and insight discovery.](../../../figures/visual-identity/2024_2025_DS105W_icon_200px.png)
Understanding Git & GitHub
We will be using Git and GitHub extensively in this course. Here’s a quick overview of what they are and why they are important.
What is Git?
Git is like a time machine for your files. It’s a tool that helps you:
- Track changes in your code over time
- Experiment with new features without breaking code that is working
- Collaborate with others without overwriting each other’s work
- Recover previous versions if something goes wrong (it often does)
Think of Git as taking snapshots of your project folder at different points in time. Each snapshot (called a “commit”) records exactly what all your files looked like at that moment.
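To make the snapshot idea concrete, here is a minimal sketch you could run in a Terminal. It works in a throwaway folder, and the file name, commit messages and identity are placeholders chosen for illustration only.

```shell
set -eu

# Work in a throwaway folder so nothing else on your machine is affected.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Placeholder identity, stored only in this demo repository.
git config user.name "Your Name"
git config user.email "you@example.com"

# Snapshot 1: the first version of a file.
echo "version 1" > notes.txt
git add notes.txt
git commit --quiet -m "Add first version of notes"

# Snapshot 2: the same file after an edit.
echo "version 2" > notes.txt
git add notes.txt
git commit --quiet -m "Update notes"

# List the snapshots, newest first.
git log --oneline

# Look inside the previous snapshot without changing the current file.
git show HEAD~1:notes.txt
```

The last command prints the file as it was one commit ago, which is the "time machine" part: older versions stay available even after you overwrite the file.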
What is GitHub?
While Git is the tool that tracks changes on your computer, GitHub is a website that hosts Git projects (called “repositories” or “repos”). It’s like a social network for code where you can:
- Store your code online
- Share your work with others
- Collaborate on projects
- Build a portfolio of your coding projects
Think of GitHub as Google Drive for code, but with superpowers for tracking changes and collaborating.
Alternatives to GitHub
GitHub is by far the most popular platform for hosting Git repositories, but there are other options available, such as GitLab.
All of these platforms support the same git commands and features, but they may have different user interfaces and terminology. For example, GitLab’s equivalent of ‘Pull Requests’ is called ‘Merge Requests’ (which is a more accurate name, to be honest).
Setting Up GitHub & Git
Here are the steps you need to follow to set up your GitHub account and configure Git on your computer.
1️⃣ Creating a GitHub Account
Visit GitHub and create an account if you don’t have one yet.
You will be asked to provide:
- an e-mail address. It doesn’t have to be your LSE e-mail address. Just use an e-mail that you actively check.
- a username. This is the most important part, as you will need to provide us with your username later on, for graded assignments.
A bit more on usernames:
You don’t need to create an obscure anonymous username, unless you want to. Assignments in this course, although private, will not be anonymous.
In fact, we suggest you use a professional username for your account.
In this course, we will encourage you to build a professional coding portfolio on GitHub. Your username is the first thing people will see when they look at your portfolio.
💡 For extra security, you might want to enable 2-Factor Authentication on your account, too.
2️⃣ Creating a private repository
A GitHub repository (or ‘repo’) is like a project folder that lives on the internet. Unlike Google Drive or Dropbox which automatically sync your files, GitHub requires you to explicitly tell it when to save changes using Git commands. This manual approach gives you more control over what changes are shared and when.
When should you create a repository? Any time you start a new coding or personal project or want to organise a collection of related files. For this course, you’ll create separate repositories for your assignments and projects.
Did you arrive here from the W03 Lecture or Lab?
In 🗣️ W03 Lecture and in the 💻 W03 Lab, we will encourage you to practise creating a private repo and moving your Nuvolos file to it. When it’s time, follow the steps from Creating a new repository from the GitHub website, while making sure to adhere to the following requirements:
- Repository name: my-ds105w-notes
- Visibility: Private
- Initialize this repository with: Add a README file (check this option)
- Add .gitignore: Python (select this option)
3️⃣ Set Up Git Authentication
Just like you need to log in to access your email or social media, you need to prove to GitHub that you’re allowed to make changes to your repositories. We’ll use the GitHub CLI (Command Line Interface) tool for this.
💡 In Nuvolos, the GitHub CLI is already installed.
If you want to set it up on your own computer (not recommended at this stage of the course, as it will distract you from the main learning objectives), you’ll first need to:
- Download Git from git-scm.com
- Install the GitHub CLI from cli.github.com
Once you have the GitHub CLI installed, you will then be able to use the gh command from the Terminal to authenticate with GitHub:
gh auth login
You will be asked:
? Where do you use GitHub? [Use arrows to move, type to filter]
> GitHub.com
Other
Select GitHub.com (with the arrow keys of your keyboard) and press Enter.
Next, you will be asked:
? What is your preferred protocol for Git operations on this host?
HTTPS
> SSH
Select SSH (it is the most standard and secure option) and press Enter.
When asked:
? Generate a new SSH key to add to your GitHub account? (Y/n)
Type the letter Y and press Enter.
For the next two options, don’t type anything. Just press Enter to accept the default values:
? Enter a passphrase for your new SSH key (Optional):
? Title for your SSH key: GitHub CLI
You will then be asked how you want to authenticate:
? How would you like to authenticate GitHub CLI?
> Login with a web browser
Paste an authentication token
Select Login with a web browser and press Enter.
A message like the following will appear:
! First copy your one-time code: D5C8-18E9
Press Enter to open https://github.com/login/device in your browser...
Do as it says: either click on the link or copy the URL and paste it into your browser. You will be asked to log in to GitHub in your browser (if you are not logged in already) and then to paste the one-time code. Do so and press Enter.
Finally, you will see a message like this:
✓ Authentication complete.
- gh config set -h github.com git_protocol ssh
✓ Configured git protocol
! Authentication credentials saved in plain text
✓ Uploaded the SSH key to your GitHub account: /home/datahub/.ssh/id_ed25519.pub
✓ Logged in as <your-username>
You are now authenticated with GitHub!
Final configuration steps
It might sound odd, but after authenticating with GitHub you still need to tell Git who you are. Git records this name and e-mail address in every commit you make.
These settings only need to be done once on each computer you use.
The first configuration you need to set is your name. It can be your full name, a shorter version of it or the same as your GitHub username. In the Terminal, adapt the command below, replacing Your Name with your actual name:
git config --global user.name "Your Name"
Next, you need to set your e-mail address. Use the same e-mail address you used to create your GitHub account:
git config --global user.email "Your E-mail"
Here are other settings you might want to configure:
# Set the default text editor for Git
git config --global core.editor "nano"
# Make sure git commands are colored because why not?
git config --global color.ui auto
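If you want to check that the settings were saved, you can read them back. The sketch below is one way to do it, under an assumption that is not part of the course setup: its first lines point HOME at a throwaway folder and set placeholder values, so that running the sketch does not touch your real Git settings. To inspect your own configuration, you would run only the last three commands, ideally after trying the sketch in a separate Terminal.

```shell
set -eu

# Assumption for this sketch only: use a throwaway HOME so your real
# Git settings are left alone.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Placeholder values; replace them with your own name and e-mail.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Read individual settings back.
git config --global --get user.name
git config --global --get user.email

# Or list everything stored in the global configuration file.
git config --global --list
```

If one of the `--get` commands prints nothing, that setting has not been stored yet and the corresponding `git config --global` command needs to be run again.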
4️⃣ Cloning a Repository
💡 Remember: you only clone a repository once, to create a local copy of it on your computer. After cloning, you can make changes to the files and push them back to the remote repository on GitHub. You don’t need to clone again.
To clone a repository, you need to know the URL of the repository. You can find this on the GitHub website by clicking on the green “Code” button and copying the URL. Make sure to copy the “SSH” URL, not the “HTTPS” one.
This URL will look something like this:
git@github.com:username/repository.git
To clone the repository, use the git clone command followed by the URL:
git clone <repository-url>
That’s it! Now, to interact with the repository, you can cd into the repository folder and start making changes or fetching updates from the remote repository.
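The clone-then-cd sequence can be rehearsed offline. The sketch below is only an illustration: it uses a local folder (the hypothetical remote-repo.git) as a stand-in for GitHub, and my-local-copy is a made-up folder name. With a real repository you would pass the SSH URL copied from GitHub to git clone instead.

```shell
set -eu

work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for GitHub: an empty "remote" repository in a local folder.
git init --quiet --bare remote-repo.git

# Clone once to get a local working copy. Git may warn that the
# repository is empty, which is expected for this stand-in.
git clone --quiet remote-repo.git my-local-copy

# Move into the cloned folder; Git commands now act on this repository.
cd my-local-copy

# "origin" is the name Git gives to the place you cloned from.
git remote -v
```

The output of the last command shows where git push and git fetch will send and look for changes. For a repository cloned from GitHub over SSH, it would show an address of the form git@github.com:username/repository.git.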
5️⃣ The Ceremony of Uploading Changes
When you make changes to the files in your repository, you need to tell Git to track those changes. There is a whole ceremony to this, and it will feel overly complicated at first but will become second nature after a few weeks.
In most DS105 assignments, you will submit your work by pushing your code to your GitHub repository, not by uploading files to Nuvolos or Moodle. This helps you get practice with industry-standard tools and workflows.
Explanation of the Git Workflow
Every time you make changes to your code, you will follow this workflow to ensure your changes are tracked and saved to your online repository:
Add the changes to the staging area:
git add /path/from/repo-folder/to/file
or simply add all changes:
git add .
This is like saying, “Hey Git, I might want to save these changes. Keep an eye on them.” You can add multiple files or folders at once and you can repeat this step as many times as you need.
Commit the changes to the repository once you are happy with the changes you’ve made:
git commit -m "A brief message describing the changes you made"
This is more like saying, “OK, Git, I’m sure I want to save these changes. Make a note of them.” You should commit changes whenever you finish a significant portion of your work or when you need to stop for the day.
You can keep adding and committing changes as many times as you like. Each commit is like a snapshot of your project at that moment in time.
Push the changes to the remote repository on GitHub:
git push
Now this is when you say to Git, “Remember all those changes I’ve made and am committed to? Send them to the online repository on GitHub.” This is how you share your work with others and how you submit your assignments.
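Putting the three steps together, the whole ceremony might look like the sketch below. It is an offline rehearsal under stated assumptions: a local folder (the hypothetical remote-repo.git) stands in for GitHub, and hello.py, the commit message and the identity are placeholders. It also uses git status, which answers the question "What have I changed?" at any point in the cycle.

```shell
set -eu

work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for GitHub so the sketch runs offline (an assumption of this
# example; your real remote is the repository you cloned from GitHub).
git init --quiet --bare remote-repo.git
git clone --quiet remote-repo.git my-local-copy
cd my-local-copy

# Placeholder identity for this demo repository only.
git config user.name "Your Name"
git config user.email "you@example.com"

# Edit code: create a (hypothetical) new file.
echo 'print("Hello, DS105")' > hello.py

# "What have I changed?" hello.py is listed as untracked (??).
git status --short

# Stage the file, then check again: it now shows as added (A).
git add hello.py
git status --short

# Commit the staged changes with a short message.
git commit --quiet -m "Add hello script"

# Push to the remote. In a repository cloned from GitHub, a plain
# `git push` is usually all you type; the explicit `-u origin HEAD` form
# is a cautious choice here because the stand-in remote starts out empty.
git push --quiet -u origin HEAD

# Nothing is left to commit, and the commit appears in the log.
git status --short
git log --oneline
```

Running git status before and after each step is a cheap habit: it shows which files are untracked, staged or already committed, so you know what the next git add, git commit or git push will actually include.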