🛣️ Week 03 - A tutorial on Git, GitHub and Markdown

Lab Roadmap (90 min)

Today, you will practise something every data science practitioner should know: how to use Git to keep track of your code, running git commands in the Terminal so that you don’t have to rely on a GUI (graphical user interface).

📚 Week 03 Preparation

Before coming to the labs and lectures this week, you must install the software listed on the 📋 Getting Ready page. If you have any issues with the installation, please use the #help-installation channel on Slack to get help from your peers and the teaching team.

🥅 Learning Objectives

  • Create a GitHub repository
  • Edit markdown files on GitHub
  • Get used to reading official documentation pages
  • Set up your SSH keys on GitHub
  • Use git commands on the Terminal to manage your code:
    • git clone
    • git status
    • git add
    • git commit
    • git push
    • git pull
    • git log

📋 Lab Tasks

Here are the instructions for this lab. No need to wait for your class teacher. You can tackle them as soon as you come to the classroom.

Part 0 - Export your chat logs (~3 min)

As part of the GENIAL project, we ask that you fill out the following form as soon as you come to the lab:

🎯 ACTION POINTS

  1. 🔗 CLICK HERE to export your chat log.

    Thanks for being GENIAL! You are now one step closer to earning some prizes! 🎟️

👉 NOTE: You MUST complete the initial form.

If you really don’t want to participate in GENIAL¹, just answer ‘No’ to the Terms & Conditions question - your e-mail address will be deleted from GENIAL’s database the following week.

Part I - Introduction to Markdown (15 min)

For those aspiring to pursue a career in data science, having a well-maintained GitHub profile is something recruiters value a lot. GitHub is the most popular platform to showcase your work, coding, and data storytelling skills.

A GitHub profile is also an excellent way to learn and practise the markdown language. Markdown is a lightweight markup language that you can use to add formatting elements to plain-text documents. It is widely used on GitHub, in Jupyter notebooks, and in many other places. We will use this language extensively throughout the course.

🎯 ACTION POINTS

Follow the steps below. Raise your hand if you get stuck. Your class teacher will be able to help you.

  1. Create your first repository on GitHub. This is a special repository that will serve as your GitHub profile.

    Your class teacher will show you the same steps as in GitHub’s official documentation page: About a profile README.

  2. After you click the pencil icon, you will be taken to an editor to edit your profile’s README.md file.

  3. Copy the following text and paste it into the editor, replacing the gaps (< >) with the appropriate information:

## About me

I am a student at the _LSE_ studying < >.

I will use this GitHub profile to showcase my data science skills.

### Interests

- Python 
- Data Science
- < >
  4. Click the green button that says ‘Commit Changes’.

    • You will be asked to write a commit message.
    • Write something like “Add profile README”
    • Then click on Commit Changes.

    Great! That was your first commit on this repo!

  5. 🗣️ CLASSROOM DISCUSSION

    Your class teacher will guide a conversation about the following questions:

    • What do you see on your GitHub profile now, and why is it different from the text you typed?
    • What do you think the symbols you typed (#, -, _) mean?
    • What is the purpose of the README.md file?
    • What do you think a commit message is? Why is it important to write a good one?

Part II - SSH Key Setup on GitHub (20 min)

This section assumes that you’ve attended the W02 lecture and are thus familiar with SSH keys. We also assume that you’ve already created a GitHub account as instructed on the 📋 Getting Ready page.

🎯 ACTION POINTS

Go through the following steps and raise your hand if you get stuck. Your class teacher will be able to help you.

  1. Let’s configure Git with your details:

    git config --global user.name "<your_name>"
    git config --global user.email "<your_email>"

    These commands tell git to use your name and email address as the author details for all your commits. If you skip this step, git will likely refuse to create commits in later parts of this tutorial and ask you who you are.

  2. Create an SSH key on your machine using a key-generator program called ssh-keygen.

    Read the instructions from GitHub’s official website: Generating a new SSH key and adding it to the ssh-agent to find out how to do so. Remember to follow the instructions for your operating system.

  3. Let GitHub know about your SSH key by adding it to your GitHub account.

    Read the instructions from GitHub’s official website: Adding a new SSH key to your GitHub account to find out how to do so.

  4. Test that your SSH key works by connecting to GitHub.

    Read the instructions from GitHub’s official website: Testing your SSH connection to find out how to do so.
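If you are unsure whether step 2 actually produced a key, the Python sketch below looks for the default public-key filenames that GitHub's "Checking for existing SSH keys" guide lists (id_ed25519.pub, id_ecdsa.pub, id_rsa.pub). It assumes you accepted ssh-keygen's default file location; the function name existing_public_keys is hypothetical. Only a .pub file is meant to be pasted into GitHub: the matching file without the extension is your private key and should never leave your machine.

```python
from __future__ import annotations

from pathlib import Path

# Default public-key filenames (assumes ssh-keygen's default location was accepted).
DEFAULT_PUBLIC_KEYS = ("id_ed25519.pub", "id_ecdsa.pub", "id_rsa.pub")


def existing_public_keys(ssh_dir: Path = Path.home() / ".ssh") -> list[str]:
    """Return the default public-key files that exist in `ssh_dir`."""
    return [name for name in DEFAULT_PUBLIC_KEYS if (ssh_dir / name).is_file()]


if __name__ == "__main__":
    found = existing_public_keys()
    if found:
        print("Public key(s) found:", ", ".join(found))
    else:
        print("No default public key found; revisit step 2.")
```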

Cool. You are now all set up to use git commands from your terminal.
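Before moving on, you may want to double-check the name and email you set in step 1: on the Terminal, git config --global --get user.name prints the stored value. The Python sketch below runs that same command through subprocess. To avoid reading or changing your real settings, its demo points HOME at a throwaway folder, which relies on the assumption that git stores global settings in $HOME/.gitconfig when no other location is configured. The name and email are placeholders.

```python
import os
import subprocess
import tempfile


def git_identity(env: dict) -> tuple:
    """Return (user.name, user.email) from the global git config; None marks an unset key."""
    values = []
    for key in ("user.name", "user.email"):
        # `git config --get` exits with status 1 when the key is not set.
        result = subprocess.run(["git", "config", "--global", "--get", key],
                                env=env, capture_output=True, text=True)
        values.append(result.stdout.strip() if result.returncode == 0 else None)
    return tuple(values)


# Demo in a throwaway HOME so your real ~/.gitconfig is never read or modified.
with tempfile.TemporaryDirectory() as fake_home:
    env = {k: v for k, v in os.environ.items()
           if k not in ("XDG_CONFIG_HOME", "GIT_CONFIG_GLOBAL")}
    env["HOME"] = fake_home
    before = git_identity(env)
    subprocess.run(["git", "config", "--global", "user.name", "Example Student"],
                   env=env, check=True)
    subprocess.run(["git", "config", "--global", "user.email", "student@example.com"],
                   env=env, check=True)
    after = git_identity(env)

print(before)  # (None, None)
print(after)   # ('Example Student', 'student@example.com')
```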

Part III - Clone your first Git repository (20 min)

The repository you created in Part I is a special repository that serves as your GitHub profile. It is not a good idea to use it to store your code. Instead, you should create a new repository for each project you work on.

🎯 ACTION POINTS

  1. Create a new repository on GitHub. You can call it ds105 or whatever you like. Make sure you tick the box that says, ‘Initialize this repository with a README’.

  2. Once it is ready, go to your repository on GitHub and copy the SSH URL. You will find it under the green button that says ‘Code’ (make sure the SSH tab is selected).

  3. In a Terminal, navigate to the folder where you want to store your repository. For example, to store it in your home folder, type cd ~. Then, clone your repository by typing:

    git clone <url>

The <url> notation is a documentation convention for a placeholder, i.e., something you need to replace with the actual value. Placeholders are commonly wrapped in < and > to make them stand out.

In this case, you must replace <url> with the URL you copied from GitHub.
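To make the placeholder concrete: GitHub's SSH URLs follow the pattern git@github.com:<username>/<repository>.git. The sketch below fills in the <url> placeholder for a repository called ds105 owned by a hypothetical account your-username; the helper names are made up for illustration, and you should still copy the exact URL from the ‘Code’ button. A URL starting with https:// would not use the SSH key you just set up.

```python
import shlex


def github_ssh_url(username: str, repository: str) -> str:
    """Build a GitHub SSH clone URL: git@github.com:<username>/<repository>.git."""
    return f"git@github.com:{username}/{repository}.git"


def clone_command(username: str, repository: str) -> str:
    """Return the `git clone <url>` command with the placeholder filled in."""
    return shlex.join(["git", "clone", github_ssh_url(username, repository)])


print(clone_command("your-username", "ds105"))
# git clone git@github.com:your-username/ds105.git
```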

  4. Check that the repository was cloned successfully and has the same README.md file you see on the GitHub website.

  5. Move into the repository folder (e.g. cd ds105) and run git status to see the status of your repository.

Expected output:
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

The message above means your local repository is in sync with the remote repository on GitHub! If you don’t see the exact same thing as above, ask your class teacher for help.

Part IV - Syncing your local repository with GitHub (45 min)

Time to learn the inner workings of git! 🤓 Pay extra attention to this section. These commands can be a bit tricky to understand at first.

💡 Git is more like ‘Google Drive’ than scp

Although it is not a perfect analogy, it helps to think of git as something like a cloud file storage service. When you run Google Drive on your computer, the software automatically checks whether your local files are in sync with those in the cloud. If they are not, it uploads or downloads files to bring them back in sync.

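Git works in a similar way, except that nothing happens automatically: you decide which changes to record (git add, git commit) and when to send them to GitHub (git push) or bring GitHub’s changes to your computer (git pull).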

🎯 ACTION POINTS

  1. On the Terminal, inside your cloned repository folder, create a new folder called week03 containing a new file called waiting.py. You can do so by typing:

    mkdir week03
    touch week03/waiting.py
  2. Add the following Python code to the file waiting.py:

    import time
    time.sleep(10)
    
    print('The waiting is complete.')
  3. Run git status again. You should see a list of untracked files.

Expected output:
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
(use "git add <file>..." to include in what will be committed)
    week03/

nothing added to commit but untracked files present (use "git add" to track)

This time, the message indicates that git has detected that you created a new folder called week03 and a new file called week03/waiting.py, but it is not tracking them yet.

  4. Untracked files, as the name implies, are not tracked by git. A file only becomes tracked once you git add it. Let’s do that:

    git add week03/waiting.py

    This command tells git to add the file week03/waiting.py to the staging area, the set of changes that will go into your next commit. You can stage as many files as you like before committing them.

  5. Run git status again. You should see a list of files that are ready to be committed.

Expected output:
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
(use "git restore --staged <file>..." to unstage)
    new file:   week03/waiting.py

This time, the message indicates that git has detected that you created a new file called week03/waiting.py and that it is ready to be committed.
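Both git status outputs above can be reproduced in a throwaway repository. The Python sketch below, which assumes git is installed and on your PATH, uses git status --porcelain, a condensed and script-friendly format in which ?? marks an untracked path and A marks a new file in the staging area. The temporary repository is illustrative and is not your ds105 clone.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args: str, cwd: str) -> str:
    """Run a git command inside `cwd` and return what it prints on stdout."""
    result = subprocess.run(["git", *args], cwd=cwd,
                            capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)  # a throwaway repository
    script = Path(repo, "week03", "waiting.py")
    script.parent.mkdir()  # same effect as `mkdir week03`
    script.write_text("import time\ntime.sleep(10)\n\nprint('The waiting is complete.')\n")

    # --porcelain prints one short line per path; --untracked-files=normal
    # reports a brand-new folder as a single entry.
    untracked = git("status", "--porcelain", "--untracked-files=normal", cwd=repo)
    git("add", "week03/waiting.py", cwd=repo)
    staged = git("status", "--porcelain", "--untracked-files=normal", cwd=repo)

print(untracked, end="")  # ?? week03/
print(staged, end="")     # A  week03/waiting.py
```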

  6. Say you are happy with your changes and don’t want to modify anything else in waiting.py. You can now commit them by typing:

    git commit
    • You will be taken to your terminal’s default text editor, where you write a commit message. You MUST write one: git aborts the commit if the message is empty.

    This command tells git to commit the changes you made to the files in the staging area. 💡 TIP: Start your message with a verb in the imperative mood. For example, “Add waiting.py” is a good commit message. “Added waiting.py” is not a good commit message.

  7. Run git status again. You should see a message that your branch is ahead of origin/main by 1 commit. IMPORTANT: Your changes have been committed but not sent to GitHub yet!
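To see where "ahead by 1 commit" comes from without touching GitHub, the Python sketch below uses a local bare repository (remote.git) as a stand-in for GitHub, clones it, and commits waiting.py. The folder names and the placeholder identity are assumptions of this sketch. git status --short --branch condenses the branch part of git status into one line comparing your local main with origin/main.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so the sketch does not depend on your own git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example Student", "GIT_AUTHOR_EMAIL": "student@example.com",
       "GIT_COMMITTER_NAME": "Example Student", "GIT_COMMITTER_EMAIL": "student@example.com"}


def git(*args: str, cwd) -> str:
    """Run a git command inside `cwd` and return its stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp_name:
    root = Path(tmp_name)
    seed, remote, work = (root / name for name in ("seed", "remote.git", "work"))
    # A local bare repository stands in for GitHub; give it one commit on `main`.
    git("init", str(seed), cwd=root)
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=seed)
    (seed / "README.md").write_text("# ds105\n")
    git("add", "README.md", cwd=seed)
    git("commit", "-m", "Initial commit", cwd=seed)
    git("clone", "--bare", str(seed), str(remote), cwd=root)

    # Clone the stand-in remote (as in Part III), then add and commit waiting.py.
    git("clone", str(remote), str(work), cwd=root)
    (work / "week03").mkdir()
    (work / "week03" / "waiting.py").write_text("import time\ntime.sleep(10)\n")
    git("add", "week03/waiting.py", cwd=work)
    git("commit", "-m", "Add waiting.py", cwd=work)

    branch_line = git("status", "--short", "--branch", cwd=work).splitlines()[0]

print(branch_line)  # ## main...origin/main [ahead 1]
```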

👨‍🏫 TEACHING MOMENT

Your teacher will help you understand the output of step 8 below and guide you through the end of the lab.

  8. Run git log to see the recent commits in your repository. You should see something like:

    commit <some-ugly-huge-number> (HEAD -> main)
    Author: <your_name> <your_email>
    Date:   <today's_date>
    
        Add waiting.py
    
    commit <another-ugly-huge-number> (origin/main, origin/HEAD)
    Author: <your_name> <your_email>
    Date:   <today's_date>
    
        Initial commit

    Depending on how you created the repository in Part III, your history log might look a bit different.

  • The first commit is the one you just made. It is marked (HEAD -> main) because HEAD represents where you currently are in the history of your repository, and it points to your local main branch, which now includes this commit.
  • The huge number you see is the commit hash. It is a unique identifier for each commit. You can use this hash if you need to go back in time.
  • The second commit is the one GitHub created when you initialised the repository with a README in Part III. This commit can be identified by two names, origin/main and origin/HEAD:
    • origin represents the remote repository on GitHub.

    • origin/main indicates that the remote repository on GitHub has a branch called main.

    • origin/HEAD indicates the default branch of the remote repository on GitHub. If you go to your repository on GitHub, you will see that this is the main branch.
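The Python sketch below builds a two-commit history in a throwaway repository and reads it back with git log --format="%H %s", which prints each commit's full hash followed by its message, newest first. In a default repository each hash is 40 hexadecimal characters (a SHA-1 digest); a short prefix, usually the first seven characters, is enough to refer to a commit, which is what git log --oneline shows. The repository, identity and messages are placeholders.

```python
import os
import subprocess
import tempfile

# Placeholder identity so the sketch does not depend on your own git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example Student", "GIT_AUTHOR_EMAIL": "student@example.com",
       "GIT_COMMITTER_NAME": "Example Student", "GIT_COMMITTER_EMAIL": "student@example.com"}


def git(*args: str, cwd: str) -> str:
    """Run a git command inside `cwd` and return its stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    # Two empty commits are enough to get a history with two entries.
    for message in ("Initial commit", "Add waiting.py"):
        git("commit", "--allow-empty", "-m", message, cwd=repo)
    # %H is the full commit hash and %s the first line of the commit message.
    log_output = git("log", "--format=%H %s", cwd=repo)
    history = [line.split(" ", 1) for line in log_output.splitlines()]

for commit_hash, subject in history:
    print(commit_hash[:7], subject)  # abbreviated hash, similar to `git log --oneline`
```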

  9. Now, let’s send your changes to GitHub. You can do so by typing:

    git push

    This command tells git to send the committed changes to GitHub (the origin).

  10. Check that your changes have been pushed by going to your repository on GitHub. You should see the file week03/waiting.py there. Run git status and git log to see how the output changed.

  11. Now, on GitHub, edit the file week03/waiting.py. Do it the same way you did in Part I of this lab. Change the number of seconds from 10 to 20. Commit your changes with a message like “Wait for 20 seconds instead of 10”.

  12. Back on your computer, run git pull. You should see a message like:

    Updating <old_hash>..<new_hash>
    Fast-forward
    week03/waiting.py | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

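    This command tells git to download the latest changes from GitHub and merge them into your local branch. Here, git could simply move your branch forward to GitHub’s latest commit, hence the ‘Fast-forward’ message.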

  13. Practice, practice, practice. Use whatever time you have left to create and edit files both on GitHub and on your computer, and work out which git commands keep your local repository in sync with GitHub. Your class teacher will be able to help you if you get stuck.

    If you get super stuck, use #help-git on Slack to get help from your peers and the teaching team.
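The edit-on-GitHub-then-pull cycle from steps 11 and 12 can also be replayed offline. In the Python sketch below, which assumes git is installed, a local bare repository (remote.git) stands in for GitHub and a second clone (web) stands in for editing in the browser; the folder names and the placeholder identity are illustrative only.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so the sketch does not depend on your own git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example Student", "GIT_AUTHOR_EMAIL": "student@example.com",
       "GIT_COMMITTER_NAME": "Example Student", "GIT_COMMITTER_EMAIL": "student@example.com"}


def git(*args: str, cwd) -> str:
    """Run a git command inside `cwd` and return its stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp_name:
    root = Path(tmp_name)
    seed, remote, laptop, web = (root / n for n in ("seed", "remote.git", "laptop", "web"))
    # Stand-in for GitHub: a bare repository whose `main` contains waiting.py.
    git("init", str(seed), cwd=root)
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=seed)
    (seed / "waiting.py").write_text("import time\ntime.sleep(10)\n")
    git("add", "waiting.py", cwd=seed)
    git("commit", "-m", "Add waiting.py", cwd=seed)
    git("clone", "--bare", str(seed), str(remote), cwd=root)

    # "laptop" is your computer; "web" plays the role of editing on GitHub.
    git("clone", str(remote), str(laptop), cwd=root)
    git("clone", str(remote), str(web), cwd=root)
    (web / "waiting.py").write_text("import time\ntime.sleep(20)\n")
    git("commit", "-am", "Wait for 20 seconds instead of 10", cwd=web)
    git("push", cwd=web)

    before = (laptop / "waiting.py").read_text()
    git("pull", cwd=laptop)  # fetch the new commit and fast-forward main
    after = (laptop / "waiting.py").read_text()

print("before:", before.splitlines()[-1])  # before: time.sleep(10)
print("after: ", after.splitlines()[-1])   # after:  time.sleep(20)
```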


🏡 Take-Home Exercise

Only practice will make you a git master. Use the instructions below to help you practice git commands outside the classroom.

🎯 ACTION POINTS

  1. Generate some fake data. Create a new file called generate_fake_data.py inside the folder <your-github-repository>/exercises/week03-take-home and paste the following Python code inside:
from faker import Faker

fake = Faker()

def generate_company():
    return {
        "name": fake.company(),
        "mission": fake.bs(),
        "catch_phrase": fake.catch_phrase()
    }


if __name__ == '__main__':

    print("Generating fake data...")

    # Generate a limited set of companies
    companies = [generate_company() for _ in range(5)]
    company_names = [company["name"] for company in companies]

    print("I have just generated the following companies:")
    print(company_names)
  2. Test that the script works. You can do so by typing:

    python3 generate_fake_data.py

    You might need to install the faker package first. You can do so by typing on the Terminal:

    pip install faker
  3. Add, commit and push your changes to GitHub. Use a commit message like “Add script to create synthetic company data”.

Your first Git conflict

Let’s cause a conflict! 🤯

🎯 ACTION POINTS

  1. Go back to the GitHub repository in your browser and edit the file generate_fake_data.py

  2. Edit the file so that you create 10 fake company names instead of 5. You can do so by changing the line:

    companies = [generate_company() for _ in range(5)]

    to look like this:

    companies = [generate_company() for _ in range(10)]
  3. Commit your changes with a message like “Generate 10 companies instead of 5”.

  4. Now, DON’T git pull JUST YET! Instead, on your computer, edit the file generate_fake_data.py to create 20 fake company names instead of 5.

  5. Add and commit your changes with a message like “Generate 20 companies instead of 5”:

    git add generate_fake_data.py
    git commit -m "Generate 20 companies instead of 5"
  6. Beautiful. Now, see things go wrong by trying to push your changes to GitHub:

    git push

    You should get an error message! GitHub rejects the push because it already has a commit that your computer does not. You now have a conflict! 😱

    There are two committed versions of the file now. You need to make a decision: do you want to keep the change you made on GitHub or the change you made on your computer?

  7. Let’s go for keeping the version you have on your computer. Do a bit of research online and try to figure out how to solve the conflict so that your GitHub repository has the version you have on your computer. It would be great if you shared your solution on the #help-git channel on Slack.
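If you want to reproduce the rejected push offline, the Python sketch below repeats steps 1 to 6 against a local bare repository standing in for GitHub. It stops at the failed push, so resolving the conflict is still up to you. The folder names, the placeholder identity and the one-line stand-in for generate_fake_data.py are assumptions of the sketch.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so the sketch does not depend on your own git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example Student", "GIT_AUTHOR_EMAIL": "student@example.com",
       "GIT_COMMITTER_NAME": "Example Student", "GIT_COMMITTER_EMAIL": "student@example.com"}
# One-line stand-in for generate_fake_data.py; only the range() argument changes.
LINE = "companies = [generate_company() for _ in range({})]\n"
FILE = "generate_fake_data.py"


def git(*args: str, cwd) -> str:
    """Run a git command inside `cwd`, raising an error if it fails."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp_name:
    root = Path(tmp_name)
    seed, remote, laptop, web = (root / n for n in ("seed", "remote.git", "laptop", "web"))
    # Stand-in for GitHub: a bare repository whose `main` has range(5).
    git("init", str(seed), cwd=root)
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=seed)
    (seed / FILE).write_text(LINE.format(5))
    git("add", FILE, cwd=seed)
    git("commit", "-m", "Add script to create synthetic company data", cwd=seed)
    git("clone", "--bare", str(seed), str(remote), cwd=root)
    git("clone", str(remote), str(laptop), cwd=root)
    git("clone", str(remote), str(web), cwd=root)

    # Steps 1-3: the "GitHub" side switches to 10 companies and publishes it.
    (web / FILE).write_text(LINE.format(10))
    git("commit", "-am", "Generate 10 companies instead of 5", cwd=web)
    git("push", cwd=web)

    # Steps 4-5: without pulling, your computer switches to 20 companies.
    (laptop / FILE).write_text(LINE.format(20))
    git("commit", "-am", "Generate 20 companies instead of 5", cwd=laptop)

    # Step 6: this push is expected to fail, so check=True is deliberately not used.
    rejected = subprocess.run(["git", "push"], cwd=laptop, env=ENV,
                              capture_output=True, text=True)

print("git push exit status:", rejected.returncode)
print(rejected.stderr)  # git's explanation of why the push was rejected
```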

Footnotes

  1. We’re gonna cry a little bit, not gonna lie. But no hard feelings. We’ll get over it.