🛣️ Week 03 - A tutorial of Git, GitHub and Markdown
Lab Roadmap (90 min)
Today, you will practise something every data science practitioner should know: how to use Git to keep track of your code, and how to run git commands on the Terminal so that you don’t have to rely on a GUI (graphical user interface) to do so.
📚 Week 03 Preparation
Before coming to the labs and lectures this week, you must install the software listed on the 📋 Getting Ready page. If you have any issues with the installation, please use the #help-installation channel on Slack to get help from your peers and the teaching team.
🥅 Learning Objectives
- Create a GitHub repository
- Edit markdown files on GitHub
- Familiarise yourself with reading official documentation pages
- Set up your SSH keys on GitHub
- Use git commands on the Terminal to manage your code:
  - git clone
  - git status
  - git add
  - git commit
  - git push
  - git pull
  - git log
📋 Lab Tasks
Here are the instructions for this lab. No need to wait for your class teacher. You can tackle them as soon as you come to the classroom.
Part 0: Export your chat logs (~ 3 min)
As part of the GENIAL project, we ask that you fill out the following form as soon as you come to the lab:
🎯 ACTION POINTS
🔗 CLICK HERE to export your chat log.
Thanks for being GENIAL! You are now one step closer to earning some prizes! 🎟️
👉 NOTE: You MUST complete the initial form.
If you really don’t want to participate in GENIAL [1], just answer ‘No’ to the Terms & Conditions question - your e-mail address will be deleted from GENIAL’s database the following week.
Part I - Introduction to Markdown (15 min)
For those aspiring to pursue a career in data science, having a well-maintained GitHub profile is something recruiters value a lot. GitHub is the most popular platform to showcase your work, coding, and data storytelling skills.
A GitHub profile is also an excellent way to learn and practice the Markdown language. Markdown is a lightweight markup language that you can use to add formatting elements to plain-text documents. It is widely used on GitHub, in Jupyter notebooks, and in many other places. We will use this language extensively throughout the course.
🎯 ACTION POINTS
Follow the steps below. Raise your hand if you get stuck. Your class teacher will be able to help you.
1. Create your first repository on GitHub. This is a special repository that will serve as your GitHub profile. Your class teacher will show you the same steps as in GitHub’s official documentation page: About a profile README.
2. After you click the pencil icon, you will be taken to an editor to edit your profile’s README.md file.
3. Copy the following text and paste it into the editor, replacing the gaps (< >) with the appropriate information:

   ## About me
   I am a student at the _LSE_ studying < >.
   I will use this GitHub profile to showcase my data science skills.
   ### Interests
   - Python
   - Data Science
   - < >

4. Click on the green button that says ‘Commit Changes’.
   - You will be asked to write a commit message.
   - Write something like “Add profile README”
   - Then click on Commit Changes.
Great! That was your first commit on this repo!
🗣️ CLASSROOM DISCUSSION
Your class teacher will guide a conversation about the following questions:
- What do you see on your GitHub profile now, and why is it different from the text you typed?
- What do you think the symbols you typed (#, -, _) mean?
- What is the purpose of the README.md file?
- What do you think is a commit message? Why is it important to write a good commit message?
Part II - SSH Key Setup on GitHub (20 min)
This section assumes that you’ve attended the W02 lecture and are thus familiar with SSH keys. We also assume that you’ve already created a GitHub account as instructed on the 📋 Getting Ready page.
🎯 ACTION POINTS
Go through the following steps and raise your hand if you get stuck. Your class teacher will be able to help you.
1. Let’s set up some configurations on your Git setup:

   git config --global user.name "<your_name>"
   git config --global user.email "<your_email>"

   These commands tell git to use your name and email address as the default values for your commits. If you don’t set them, you will get an error in later parts of this tutorial.
2. Create an SSH key on your machine using a key-generator program called ssh-keygen. Read the instructions from GitHub’s official website: Generating a new SSH key and adding it to the ssh-agent to find out how to do so. Remember to use the instructions appropriate for your operating system.
3. Let GitHub know about your SSH key by adding it to your GitHub account. Read the instructions from GitHub’s official website: Adding a new SSH key to your GitHub account to find out how to do so.
4. Test that your SSH key works by connecting to GitHub. Read the instructions from GitHub’s official website: Testing your SSH connection to find out how to do so.

Cool. You are now all set up to use git commands from your terminal.
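For orientation, here is a minimal sketch of what generating and inspecting a key looks like on the command line. It is not a replacement for GitHub’s instructions: the key is written to a throwaway folder with an empty passphrase so that the commands run without prompts, and the e-mail address is made up. For your real key, keep the default location and choose a passphrase as GitHub recommends.

```shell
# Throwaway folder so the demo never touches your real ~/.ssh keys.
key_dir="$(mktemp -d)"

# Generate an Ed25519 key pair without prompts.
#   -C  comment stored in the public key (hypothetical e-mail address)
#   -f  output file
#   -N  passphrase (empty here, for the demo only)
ssh-keygen -q -t ed25519 -C "demo@example.com" -f "$key_dir/id_ed25519" -N ""

# The .pub file is the half you paste into GitHub; the other file stays private.
cat "$key_dir/id_ed25519.pub"

# Print the fingerprint of the key.
ssh-keygen -l -f "$key_dir/id_ed25519.pub"

# Once your real key is added to GitHub, this tests the connection (needs network):
# ssh -T git@github.com
```

Only the public half of the pair is ever shared. If the connection test fails, check that the key you added to GitHub is the one your machine is offering.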
Part III - Clone your first Git repository (20 min)
The repository you created in Part I is a special repository that serves as your GitHub profile. It is not a good idea to use it to store your code. Instead, you should create a new repository for each project you work on.
🎯 ACTION POINTS
1. Create a new repository on GitHub. You can call it ds105 or whatever you like. Make sure you tick the box that says ‘Initialize this repository with a README’.
2. Once it is ready, go to your repository on GitHub and copy the SSH URL. You will find it under the green button that says ‘Code’.
3. On a Terminal, navigate to the folder where you want to store your repository. For example, if you want to store it in your home folder, type cd ~. Then, clone your repository by typing:

   git clone <url>

   Why <url> and not just url? This convention is used in documentation to indicate a placeholder, i.e., something you need to replace with the actual value. It is common to add < and > around placeholders to make them stand out. In this case, you must replace <url> with the URL you copied from GitHub.
4. Check that the repository was cloned successfully and has the same README.md file you see on the GitHub website.
5. Move into the cloned folder (for example, cd ds105) and run git status to see the status of your repository.
The output should look like this:
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
The message above means your local repository is in sync with the remote repository on GitHub! If you don’t see the exact same thing as above, ask your class teacher for help.
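If you want to rehearse the clone step without touching GitHub, the sketch below uses a local bare repository as a stand-in for the remote. The folder names (seed, remote.git), the demo identity and the local path used in place of the SSH URL are all assumptions made for this example. With your real repository, the only command you need is git clone followed by the URL you copied.

```shell
# Work in a temporary folder so nothing here touches your real files.
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for "a repository created on GitHub with a README".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main   # name the first branch main
echo "# ds105" > seed/README.md
git -C seed add README.md
git -C seed -c user.name="Demo Student" -c user.email="demo@example.com" \
    commit -q -m "Initial commit"
git clone -q --bare seed remote.git             # plays the role of GitHub

# The step from the lab: clone, move into the folder, inspect.
git clone -q remote.git ds105                   # real life: git clone <url>
cd ds105
ls                                              # README.md
git status
```

The last command should report that you are on branch main with a clean working tree.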
Part IV - Syncing your local repository with GitHub (45 min)
Time to learn the inner workings of git! 🤓 Pay extra attention to this section. These commands can be a bit tricky to understand at first.
💡 Git is more like a ‘Google Drive’ than an scp
Although not the perfect analogy, it helps to think of git as more like a cloud file storage. When you run Google Drive on your computer, the software automatically checks if your local files are in sync with those on the cloud. If they are not, it will automatically upload or download the files to make sure they are in sync.
Git is kind of like that, but you must manually keep track of the changes you make to your files.
🎯 ACTION POINTS
1. On the terminal, create a new folder called week03 and add a new file called waiting.py. You can do so by typing:

   mkdir week03
   touch week03/waiting.py

2. Add the following Python code to the file waiting.py:

   import time

   time.sleep(10)
   print('The waiting is complete.')

3. Run git status again. You should see a list of untracked files. The output should look like this:

   On branch main
   Your branch is up to date with 'origin/main'.

   Untracked files:
     (use "git add <file>..." to include in what will be committed)
           week03/

   nothing added to commit but untracked files present (use "git add" to track)

   This time, the message indicates that git has detected that you created a new folder called week03 and a new file called week03/waiting.py, but it is not tracking them yet.
4. Untracked files, as the name implies, are not tracked by git. Files will only be tracked once we have added them with git add. Let’s do that:

   git add week03/waiting.py

   This command tells git to add the file week03/waiting.py to the staging area. The staging area holds the changes that will go into your next commit. You can add as many files as you like to it before you commit and send them to GitHub.
5. Run git status again. You should see a list of files that are ready to be committed. The output should look like this:

   On branch main
   Your branch is up to date with 'origin/main'.

   Changes to be committed:
     (use "git restore --staged <file>..." to unstage)
           new file:   week03/waiting.py

   This time, the message indicates that git has detected that you created a new file called week03/waiting.py and that it is ready to be committed.
6. Say you are happy with the changes you made to your files - you don’t want to modify anything else in the waiting.py file. You can now commit your changes by typing:

   git commit

   - You will be taken to your terminal’s default text editor where you can write a commit message. You MUST write a commit message.

   This command tells git to commit the changes you made to the files in the staging area. 💡 TIP: Start your message with a verb in the imperative mood. For example, “Add waiting.py” is a good commit message. “Added waiting.py” is not a good commit message.
7. Run git status again. You should see a message that your branch is ahead of origin/main by 1 commit. IMPORTANT: Your changes have been committed but not sent to GitHub yet!
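The status, add, commit cycle above can be rehearsed safely in a scratch repository. This is a sketch under a few assumptions: the repository lives in a temporary folder with no remote (so the lines about origin/main do not appear), the identity is a made-up one set at repository level rather than with --global, and -m supplies the commit message on the command line instead of opening the editor.

```shell
# Scratch repository: nothing here touches your real ds105 repository.
scratch="$(mktemp -d)"
cd "$scratch"
git init -q
git config user.name "Demo Student"        # repository-level identity (no --global)
git config user.email "demo@example.com"

mkdir week03
printf 'import time\n\ntime.sleep(10)\nprint("The waiting is complete.")\n' > week03/waiting.py

git status --short                 # ?? week03/              (untracked)
git add week03/waiting.py
git status --short                 # A  week03/waiting.py    (staged)
git commit -q -m "Add waiting.py"
git status --short                 # no output: nothing left to commit
git log --oneline                  # one line: short hash, then the commit message
```

The --short flag prints a compact version of the same information as plain git status: ?? marks untracked paths and A marks files that are staged as new.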
👨🏫 TEACHING MOMENT
Your teacher will help you understand the output of step 8 below and guide you through the end of the lab.
8. Run git log to see your recent commits to your repository. You should see something like:

   commit <some-ugly-huge-number> (HEAD -> main)
   Author: <your_name> <your_email>
   Date:   <today's_date>

       Add waiting.py

   commit <another-ugly-huge-number> (origin/main, origin/HEAD)
   Author: <your_name> <your_email>
   Date:   <today's_date>

       Initial commit

   Depending on how you set up your repository, your history log might look a bit different.
   - The first commit is the one you just made. It is marked as (HEAD -> main): HEAD represents where you currently are in the history of your repository, and it points to your local main branch, which now includes this commit.
   - The huge number you see is the commit hash. It is a unique identifier for each commit. You can use this hash if you need to go back in time.
   - The second commit is the one GitHub created when you initialised the repository with a README in Part III. This commit can be identified by two names, origin/main and origin/HEAD:
     - origin represents the remote repository on GitHub.
     - origin/main indicates that the remote repository on GitHub has a branch called main, which (as far as your computer knows) still points at this commit. Your local main is one commit ahead of it, which is what git status told you in the previous step.
     - origin/HEAD indicates where the remote repository on GitHub is currently pointing to. If you go to your repository on GitHub, you will see that it is pointing to the main branch.
9. Now, let’s send your changes to GitHub. You can do so by typing:

   git push

   This command tells git to send the committed changes to GitHub (the origin).
10. Check that your changes have reached GitHub by going to your repository there. You should see the file week03/waiting.py. Run git status and git log to see how the output changed.
11. Now, on GitHub, edit the file week03/waiting.py. Do it the same way you did in Part I of this lab. Change the number of seconds from 10 to 20. Commit your changes with a message like “Wait for 20 seconds instead of 10”.
12. Back on your computer, run git pull. You should see a message like:

    Updating <old-hash>..<new-hash>
    Fast-forward
     week03/waiting.py | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

    This command tells git to ‘download’ the latest changes from GitHub and apply them to your local branch.
13. Practice, practice, practice. Dedicate whatever time you have left to create and edit files both on GitHub and on your computer, and work out the right git commands to keep your local repository in sync with GitHub. Your class teacher will be able to help you if you get stuck.

    If you get super stuck, use #help-git on Slack to get help from your peers and the teaching team.
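To practise the push and pull round trip without risking your real repository, you can simulate GitHub with a local bare repository. The sketch below makes several assumptions: github.git stands in for GitHub, the clone called laptop plays your computer, the clone called browser plays the GitHub web editor, and the identity is a made-up one passed through environment variables that git reads.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Made-up identity for the demo commits, read by git from the environment.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Student" GIT_COMMITTER_EMAIL="demo@example.com"

# "GitHub": a bare repository that starts with a README on branch main.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# ds105" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed github.git

# Two clones: "laptop" is your computer, "browser" plays the GitHub web editor.
git clone -q github.git laptop
git clone -q github.git browser

# On the laptop: create the file, commit it, push it.
mkdir laptop/week03
printf 'import time\n\ntime.sleep(10)\n' > laptop/week03/waiting.py
git -C laptop add week03/waiting.py
git -C laptop commit -q -m "Add waiting.py"
git -C laptop push -q

# In the "browser": get the file, change 10 to 20, commit, push.
git -C browser pull -q
printf 'import time\n\ntime.sleep(20)\n' > browser/week03/waiting.py
git -C browser commit -q -a -m "Wait for 20 seconds instead of 10"
git -C browser push -q

# Back on the laptop: pull fast-forwards to the new commit.
git -C laptop pull
git -C laptop log --oneline
```

The final pull should report a fast-forward, and the log should list three commits, the newest first. The -C option simply tells git which folder to run in, so you can follow both clones from one terminal.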
🏡 Take-Home Exercise
Only practice will make you a git master. Use the instructions below to help you practice git commands outside the classroom.
🎯 ACTION POINTS
1. Generate some fake data. Create a new file called generate_fake_data.py inside the folder <your-github-repository>/exercises/week03-take-home and paste the following Python code inside:

   import random
   from faker import Faker

   fake = Faker()

   def generate_company():
       """Return a dictionary describing one fake company."""
       return {
           "name": fake.company(),
           "mission": fake.bs(),
           "catch_phrase": fake.catch_phrase()
       }

   if __name__ == '__main__':
       print("Generating fake data...")

       # Generate a limited set of companies
       companies = [generate_company() for _ in range(5)]
       company_names = [company["name"] for company in companies]

       print("I have just generated the following companies:")
       print(company_names)

2. Test that the script works. You can do so by typing:

   python3 generate_fake_data.py

   You might need to install the faker package first. You can do so by typing on the Terminal:

   pip install faker

3. Commit and push your changes to GitHub. Add and commit your changes with a message like “Add script to create synthetic company data”.
Your first Git conflict
Let’s cause a conflict! 🤯
🎯 ACTION POINTS
1. Go back to the GitHub repository on your browser and edit the file generate_fake_data.py.
2. Edit the file so that you create 10 fake company names instead of 5. You can do so by changing the line:

   companies = [generate_company() for _ in range(5)]

   to look like this:

   companies = [generate_company() for _ in range(10)]

3. Commit your changes with a message like “Generate 10 companies instead of 5”.
4. Now, DON’T git pull JUST YET! Instead, on your computer, edit the file generate_fake_data.py to create 20 fake company names instead of 5.
5. Add and commit your changes with a message like “Generate 20 companies instead of 5”:

   git add generate_fake_data.py
   git commit -m "Generate 20 companies instead of 5"

6. Beautiful. Now, see things go wrong by trying to push your changes to GitHub:

   git push

   You should get an error message! Git rejects the push because GitHub has a commit that your computer does not have yet. You now have a conflict! 😱

   There are two committed versions of the file now. You need to make a decision: do you want to keep the change you made on GitHub or the change you made on your computer?
7. Let’s go for keeping the version you have on your computer. Do a bit of research online and try to figure out how to solve the conflict so that your GitHub repository has the version you have on your computer. It would be great if you were to share your solutions on the #help-git channel on Slack.
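If you would like to reproduce the rejected push in a safe place before experimenting on your real repository, the sketch below sets up the same situation with a local bare repository. It stops at the rejection and does not resolve anything, so the research is still yours to do. The names github.git, laptop and browser, the demo identity, and the one-line version of generate_fake_data.py are assumptions made for this example.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Made-up identity for the demo commits, read by git from the environment.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Student" GIT_COMMITTER_EMAIL="demo@example.com"

# "GitHub" starts with a one-line stand-in for the script.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo 'companies = [generate_company() for _ in range(5)]' > seed/generate_fake_data.py
git -C seed add generate_fake_data.py
git -C seed commit -q -m "Add script to create synthetic company data"
git clone -q --bare seed github.git
git clone -q github.git laptop
git clone -q github.git browser

# Edit "on GitHub": 10 companies, committed and pushed.
echo 'companies = [generate_company() for _ in range(10)]' > browser/generate_fake_data.py
git -C browser commit -q -a -m "Generate 10 companies instead of 5"
git -C browser push -q

# Edit on the laptop WITHOUT pulling first: 20 companies.
echo 'companies = [generate_company() for _ in range(20)]' > laptop/generate_fake_data.py
git -C laptop commit -q -a -m "Generate 20 companies instead of 5"

# The push is refused because the remote has a commit the laptop lacks.
if git -C laptop push -q 2>/dev/null; then
    echo "push accepted"
else
    echo "push rejected"
fi

# Look at the two diverging histories.
git -C laptop fetch -q
git -C laptop status --short --branch     # ## main...origin/main [ahead 1, behind 1]
git -C laptop log --oneline --graph --all
```

The graph shows one shared commit and two commits that branch off from it. Whatever solution you choose has to reconcile those two branches of history.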
Footnotes
[1] We’re gonna cry a little bit, not gonna lie. But no hard feelings. We’ll get over it.