📝 W04 Formative Exercise - Learn Git as you revise

2024/25 Autumn Term

Author
Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

Note (22 October 2024):

Your first graded assignment is now available. Head over to the ✍️ W06 Summative (20%) page to read the instructions.

However, unless you are an experienced coder, I highly recommend completing this formative first! It will equip you with the basic Git skills you need to complete the summative.

Context

How long will this take?

  • I won’t be able to give you a precise estimate, but I suggest reserving 2 hours to work on this exercise with clear focus and attention to detail.

⏲️ Due Date:

  • Friday, 25 October 2024, 8.00 PM (the day of the W04 labs)

📃 Submission

  • Submission is made entirely via the GitHub repository you created for this exercise. We will assess you based on the contents of your repository by the due date. We will IGNORE any commits made after the deadline.

  • This is not a graded exercise, but you will get individual feedback on your work. To receive feedback, you need to have added all instructors to your GitHub repository as collaborators and have pushed your changes to the repository by the due date.

    • This is not graded but your class teacher will consider your work when determining your class grade if you are a 🧳 General Course student.
  • Do you need an extension? Fill out the extension request form and 📧 .

🎯 Main Objectives:

The goal of this exercise is to help you practice the following skills:

  • Create a new GitHub repository.
  • Add files to the repository.
  • Use the basic Git commands to sync your files with the repository.
  • Create a summary of what you have learned in the course so far.

📋 NOTE: We will not provide any files directly. You are expected to create your own files and directories as needed through the Nuvolos Cloud terminal.

Task 1: Create a GitHub Repository

🎯 ACTION POINTS:

  1. Visit GitHub and create an account if you don’t have one yet.

    • 💡 Throughout this course we will encourage you to build a professional coding portfolio on GitHub, so it is important to choose a professional username for your account.
  2. Create a new private repository on GitHub and name it ds105a-w04-exercise.

    Follow the instructions in the Creating a new repository from the web UI section of the GitHub documentation, but also make sure to adhere to the following requirements:

    • Repository name: ds105a-w04-exercise
    • Visibility: Private
    • Initialize this repository with: Add a README file (check this option)
    • Add .gitignore: Python (select this option)

📋 What is a GitHub repository?

GitHub is the most popular platform used by developers around the world to store code and collaborate on coding projects with others. A repository is a place on GitHub where you can store your code.

It helps to think of a GitHub repository (a.k.a. a ‘repo’) as a folder that lives on the internet, similar to the folders you have on Google Drive, Dropbox or OneDrive. The difference is that those services sync your files automatically, whereas on GitHub you sync them manually using the git command line tool.

By the way, Git and GitHub are two different things! Git is a software tool you can run from the command line (your Terminal shell) to manage the different versions of your code while GitHub is a website that stores your code in the cloud and lets you manage the versions of your files using git commands or via their website. There are other Git providers out there: GitLab, Bitbucket, etc.
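💡 To make the Git vs GitHub distinction concrete, here is a minimal sketch that uses Git entirely on one machine, with no GitHub account or internet connection involved. The folder name demo-repo, the file notes.md and the name/email values are placeholders invented for this illustration.

```shell
# Work in a throwaway folder so this sketch does not touch any real project.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"

# `git init` turns a plain folder into a Git repository. GitHub is not involved.
git init --quiet demo-repo
cd demo-repo

# Placeholder identity, stored for this scratch repository only.
git config user.name "Example Student"
git config user.email "student@example.com"

# Record one version of a file.
echo "Hello, Git" > notes.md
git add notes.md
git commit --quiet -m "Add first note"

# The history lives in the hidden .git folder on this machine.
git log --oneline

# No remote (such as GitHub) has been configured, so this prints nothing.
git remote -v
```

GitHub only enters the picture once a repository is connected to a remote, which is what cloning does for you later in this task.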

  1. Add your instructors as collaborators to the repository.

    Click on the Settings tab of your repository, then on the left-hand side menu, click on the ‘Collaborators and teams’ option. Click on the ‘Add people’ button and type our GitHub usernames. Make sure to give me ‘Maintain’ access to the repository.

    Our GitHub usernames are:

    Figure 1. The location of the ‘Add People’ button in the GitHub repository settings.
  2. Authenticate your GitHub account using the Terminal.

    In VS Code within Nuvolos, open a new terminal and type the following command:

    gh auth login

    👉 When asked “Where do you use GitHub?”, select the option “GitHub.com”

    👉 When asked “What is your preferred protocol for Git operations on this host?”, select “SSH” (see footnote 1).

    • You might get asked for a ‘Title for your SSH key’. You can leave it blank and press Enter. You can also accept the default file location for the SSH key by pressing Enter.

    • If asked for a passphrase, you can leave it blank and press Enter.

    👉 When asked “How would you like to authenticate GitHub CLI?”, select “Login with a web browser”. Hit Enter afterwards to follow the link they provide, type the one-time code in the new window and then authorize the connection with your GitHub account. After you have successfully authenticated, you can close the browser window and return to the terminal.

    👉 You should see a message indicating that the authentication was complete. This means you can now use git commands from the Terminal to sync files to and from GitHub.
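💡 If you are unsure whether the login worked, one way to check is the `gh auth status` command. The sketch below is only a convenience wrapper around it. The variable name auth_state and the printed messages are made up for this example, and the exact text that gh itself prints may differ between versions.

```shell
# Check whether the GitHub CLI is installed and currently logged in.
if command -v gh >/dev/null 2>&1; then
    # `gh auth status` succeeds only when a login is active.
    if gh auth status; then
        auth_state="logged in"
    else
        auth_state="not logged in (run: gh auth login)"
    fi
else
    auth_state="gh is not installed on this machine"
fi

echo "GitHub CLI: $auth_state"
```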

What if I want to connect from my own computer too?

You can follow precisely the same steps above once you have installed the following software on your computer:

  • Install Git on your computer.
  • Install GitHub CLI on your computer to have access to the gh command.

You might need to close and reopen your terminal after installing the software to make sure the new commands are available.

  1. Create a clone of your repository. Right now, your repository only exists on GitHub’s servers. If you want to open it in VS Code, you will need to create a local copy of it. This is called a ‘clone’.

    In the terminal, navigate to the directory where you want to store your repository (I suggest /files on Nuvolos), and type:

    git clone <repository-ssh-url>

    Replace <repository-ssh-url> with the URL of your repository. See the image below to find the URL of your repository. Don’t forget to remove the < and > characters! We use them just to indicate that you should replace the text inside them.

    Figure 2. Where to find the SSH URL of your repository on GitHub.

    💡 You only need to clone the repository once to be able to sync your files. You will only need to type the command above again if you decide to clone your repo to another computer.
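💡 A minimal sketch of what cloning sets up, assuming a local folder called fake-github.git as a stand-in for GitHub (a hypothetical name) so that it runs without an internet connection. With your real repository you would pass the SSH URL from Figure 2 instead of that folder.

```shell
# Work in a throwaway folder.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"

# Stand-in for the repository on GitHub: a 'bare' repository in a local folder.
git init --quiet --bare fake-github.git

# Clone it. The last argument is the name of the folder to create.
# Git warns that the stand-in repository is empty, which is expected here.
git clone --quiet fake-github.git ds105a-w04-exercise

cd ds105a-w04-exercise

# 'origin' is the nickname Git gives to the place you cloned from.
git remote -v
```

After cloning, git push and git pull know where to send and fetch changes because origin is already recorded in the clone.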

  2. cd to the cloned folder and run this:

    git config --global core.editor "nano"

    The above will make sure that the app nano is used when you need to write a commit message (more on that later).
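💡 To confirm which editor Git will open, you can read the setting back with `git config --get`. The sketch below uses a scratch repository (config-demo, a made-up name) and sets the editor without --global, so it only affects that scratch repository. For the global setting from the step above, `git config --global --get core.editor` reads it back in the same way.

```shell
# Work in a throwaway repository.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
git init --quiet config-demo
cd config-demo

# Without --global, the setting is stored in this repository's .git/config only.
git config core.editor "nano"

# --get prints the value Git will use inside this repository.
editor_in_use="$(git config --get core.editor)"
echo "Git will open: $editor_in_use"
```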

📋 Advantages of having your code on a Git repository:

  • Automatic backups: Your code is stored in the cloud. If you lose your computer, you can always download your code again from GitHub.
  • Work from any device: You can work on your code from your own computer, from inside the Nuvolos Cloud, or from any other computer, as long as it has an internet connection and Git installed.
  • Version control: You can keep track of the changes you make to your code and if you need to go back to something you wrote but deleted a few days ago, you can do that easily. No more “final_final_final_v2.py” files!
  • Collaboration: You can work with others on the same codebase. You can see what changes they made, and they can see what changes you made. You can also work on different parts of the code at the same time without stepping on each other’s toes.

Task 2: The Git rituals

You created a repository on GitHub which will act as your ‘project folder’ for this exercise. Now, we need to learn the basic Git commands to sync your files with the repository.

graph LR
    style EDIT color:#e26a4f, fill:white, stroke:#e26a4f;
    style STATUS color:white, fill:#e26a4f;
    style ADD color:white, fill:#e26a4f;
    style COMMIT color:white, fill:#e26a4f;
    style PUSH color:white, fill:#e26a4f;


EDIT[Edit code] -- "'What have<br>I changed?'" --> STATUS[git status]
STATUS --> EDIT
EDIT -- "'I'm happy with<br>those changes'" --> ADD[git add]
ADD -- "'I finished a significant <br> portion of my code' <br> OR 'I need to stop for today'" --> COMMIT[git commit]
COMMIT -- "write a <br> commit message" --> PUSH[git push]

COMMIT --> EDIT
PUSH --> EDIT

Figure 3. The basic Git workflow you will follow in this exercise.

🎯 ACTION POINTS:

  1. Add a figure that represents you to your repository.

    Create a new directory called figures in your repository. Inside this directory, upload an image that represents you. If you are doing this from Nuvolos, not your computer, use curl to download an image from the internet or right-click on the VS Code file explorer and click on ‘Upload…’.

    You should end up with the following directory structure:

    ds105a-w04-exercise/
    ├── README.md
    ├── .gitignore
    └── figures/
        └── your-image.jpg

    If you go to the GitHub website and navigate to your repository, you will be able to confirm that NOTHING has changed yet. This is because you need to tell Git that you want to track this new file using the git add command.
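💡 A minimal sketch of creating the folder structure from the terminal. It runs in a scratch folder and uses an empty placeholder file in place of a real picture. The `<image-url>` in the comment is a placeholder that you would replace with the address of an actual image.

```shell
# Scratch copy of the project folder, so nothing real is modified.
scratch_dir="$(mktemp -d)"
mkdir -p "$scratch_dir/ds105a-w04-exercise"
cd "$scratch_dir/ds105a-w04-exercise"

# -p means: create the folder, and do not complain if it already exists.
mkdir -p figures

# With a real image you could download it with curl, for example:
#   curl -L -o figures/your-image.jpg "<image-url>"
# Here an empty file stands in for the image.
touch figures/your-image.jpg

# List the folder contents to confirm the structure.
ls -R
```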

  2. Check what files have changed.

    In the terminal, make sure you are inside the ds105a-w04-exercise directory and type:

    git status

    This command will show you what files have changed since the last time you synced your repository. You should see that the figures/ folder is listed under ‘untracked files’ in red.

    💡 The git status command tells you what you have changed compared to the last time you synced your repository. It won’t tell you if you’ve made changes to the same repo on another computer, for example.
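💡 If the full output of git status feels overwhelming, `git status --short` prints one line per change. The sketch below reproduces this step in a scratch repository (status-demo is a made-up name).

```shell
# Throwaway repository containing one new, untracked file.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
git init --quiet status-demo
cd status-demo
mkdir -p figures
touch figures/your-image.jpg

# Long form: lists figures/ under 'Untracked files'.
git status

# Compact form: '??' marks something Git is not tracking yet.
git status --short
# → ?? figures/
```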

  3. Add the new file to the staging area.

    To tell Git that you want to track the figures/your-image.jpg file, you need to add it to the staging area. This is done with the git add command. In the terminal, type:

    git add figures/your-image.jpg

    This would have also worked: git add figures/ (it would add all files inside the figures/ directory).

    If you run git status again, you should see that the figures/your-image.jpg file is now listed as a ‘new file’ in green. You can keep editing your files and adding them to the staging area as many times as you want. The git status command will always show you what you have changed since the last time you synced your repository.

    If you go to the GitHub website again you will notice that once again NOTHING has changed. There are still two more steps to go before the file is actually stored on GitHub’s servers.
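💡 A minimal sketch of the effect of git add, again in a scratch repository (add-demo is a made-up name). In the compact status output, the marker changes from `??` (untracked) to `A` (added to the staging area).

```shell
# Throwaway repository containing one new file.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
git init --quiet add-demo
cd add-demo
mkdir -p figures
touch figures/your-image.jpg

# Stage the file: tell Git to include it in the next commit.
git add figures/your-image.jpg

# 'A' means the file is staged as a new file.
git status --short
# → A  figures/your-image.jpg
```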

  4. Commit the changes.

    To tell Git to ‘make a record of this set of changes’, you use git commit. The commit is what you will see on the GitHub website when you look at the history of your repository.

    In the terminal, type:

    git commit -m "Add a figure that represents me"

    You must provide a string message after the -m flag. This message should be a short description of what you did in this commit (see footnote 2).

    If you forget to add the -m flag, Git will open the nano editor for you to write a commit message. If this happens, write your message, then press Ctrl + X, then Y and then Enter to save the message and exit the editor. Lines that start with # will be ignored by Git.

    If you run git status again, you will see that there are no changes to commit. This means that Git has recorded the changes you made to the figures/your-image.jpg file.

    💡 I recommend that you commit every time you make a change that is significant enough to be saved. For example, when you rework a piece of the README file, add a new section to your Jupyter Notebook, fix a problem in your code, reorganise your code, or add a new set of files to your repository.
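💡 A minimal sketch of a commit and of how to look at the history afterwards, using a scratch repository (commit-demo is a made-up name). The name and email are placeholders. On Nuvolos or your own computer, Git may ask you to set your real ones before your first commit.

```shell
# Throwaway repository with one staged file.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
git init --quiet commit-demo
cd commit-demo

# Placeholder identity, stored for this scratch repository only.
git config user.name "Example Student"
git config user.email "student@example.com"

mkdir -p figures
touch figures/your-image.jpg
git add figures/your-image.jpg

# Record the staged changes together with a short message.
git commit --quiet -m "Add a figure that represents me"

# One line per commit: a short ID followed by the message.
git log --oneline

# Nothing is left to commit, so the compact status prints nothing.
git status --short
```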

  5. Push the changes to GitHub.

    After one or multiple commits, you can push (upload) your changes to GitHub’s servers. This is done with the git push command. In the terminal, type:

    git push

    That’s it. If you have Internet access, your changes will be uploaded to GitHub’s servers. You can now go to the GitHub website and see the changes you made to your repository.

    At this point, everyone with access to the repository can see the changes you made. They can also download the files you uploaded to their own computers.

    💡 You don’t need to push your changes after every single commit, but it’s probably a good idea. This way, you don’t forget to back up your changes.
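💡 A minimal sketch of how to see how many commits are waiting to be pushed. It assumes a local folder, fake-github.git (a hypothetical name), as a stand-in for GitHub so that it runs offline. Because that stand-in starts out empty, the first push needs `-u origin HEAD` to link the local branch to the remote one. In a repository cloned from GitHub with a README already in it, a plain git push is enough.

```shell
# Stand-in for GitHub plus a working clone of it.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
git init --quiet --bare fake-github.git
git clone --quiet fake-github.git work
cd work

# Placeholder identity, stored for this scratch repository only.
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit and first push (-u links the local branch to the remote one).
echo "# My notes" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# A second commit that has not been pushed yet.
echo "More notes" >> README.md
git add README.md
git commit --quiet -m "Expand README"

# Count the commits that exist locally but not on the remote.
# '@{u}' is Git shorthand for the remote branch this branch is linked to.
unpushed_before="$(git rev-list --count '@{u}..HEAD')"
git push --quiet
unpushed_after="$(git rev-list --count '@{u}..HEAD')"

echo "Unpushed commits before push: $unpushed_before, after push: $unpushed_after"
```

The long-form git status reports the same information in words, with a message saying that your branch is ahead of the remote by some number of commits.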

How do I get changes I pushed to GitHub on another computer?

If you have made changes to your repository from another computer, you can sync those changes to your current computer by running:

git pull

This command will download the changes from GitHub’s servers to your computer. If you have made changes on both your computer and another computer, you might need to resolve conflicts. This is a more advanced topic that we will cover in future exercises.
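💡 A minimal sketch of the two-computer situation, assuming two local clones (laptop and nuvolos, made-up folder names) of a stand-in remote called fake-github.git, so that it runs offline. It covers only the simple case where one copy has new commits and the other does not, so no conflicts arise.

```shell
# Stand-in for GitHub.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
git init --quiet --bare fake-github.git

# First 'computer': create the initial commit and push it.
git clone --quiet fake-github.git laptop
cd laptop
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"
echo "# Notes" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD
cd ..

# Second 'computer': clone the repository once.
git clone --quiet fake-github.git nuvolos

# New work happens on the first computer and is pushed.
cd laptop
echo "Written on the laptop" >> README.md
git add README.md
git commit --quiet -m "Add a line from the laptop"
git push --quiet
cd ..

# Back on the second computer: download and apply the new commit.
cd nuvolos
git pull --quiet
cat README.md
```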

Task 3: The real exercise

Your goal is to use this repository to store a summary of what you have learned in the course so far that will remain useful to you in the future. You can use Markdown files (beyond the README.md), Jupyter Notebooks, images, etc. to demonstrate what you have learned. The precise format of your summary is up to you. We covered a lot of ground, so the challenge here is to synthesise this information in a way that will be useful to you later. Feel free to use AI tools like ChatGPT or NotebookLM to help you with that.

💡 Refer to the 🚀 Challenge Task of the 💻 Week 03 Lab for an idea of the type of problem sets you will have to solve in the near future. Once the instructions are released, you will find that the ✍️ W06 Summative (20%) will be very similar to that task.

Keep in mind that we care a lot more about your process than your final product. When giving feedback, we will look at the history of commits of your repository to understand how you grew the repository over time and how you changed your mind about what to include and what to exclude in your summary.

What are we looking for?

📋 SPECIFICATION CARD:

  • The README.md file includes a heading ‘About me’ that contains a picture. The picture is visible when we open the README file on the GitHub website.
  • The README.md file includes a list of topics you cover in your repository, with links to the files where you cover them.
  • The summary of what you have learned is clear and easy to read.
  • There is a demonstration of code in the repository, either inside Jupyter Notebooks (.ipynb), or as code highlighted in Markdown files (.md).
  • Whenever Markdown is used, it is formatted nicely and is easy to read.
  • If there are Jupyter Notebooks in the repository and they include code, the code runs without errors and produces the expected output when we Restart and Run All Cells.
  • If you used Generative AI tools (ChatGPT, NotebookLM, etc.), there is a demonstration of how they have been helpful to your learning process.

A README.md template

Here’s a starting point for what you could include in your README.md file:

# What do I know?

This repository contains my self-reflection notes on everything I've learned in the [DS105A (2024/25) course](https://lse-dsi.github.io/2024/autumn-term) so far.

## What you will find here

Here you will find my notes on the following topics:

- Operating Systems and file systems
- Terminal shells (e.g. the `bash` shell on Linux)
- The different data file formats
- Python essentials (with some code demonstrations)
- Python tricks for dealing with files
- Text formatting using Markdown
- Jupyter Notebooks

and more!

\#TODO: Shorten or expand on the topics above as needed and add links to the Notebook(s) throughout.

## How I used AI in this revision

\#TODO: If I end up using an AI tool for this, I will explain how I did it here or add a link to a separate `.md` file or Jupyter Notebook where I explain it.

## About me

\#TODO: Add a short bio + a picture that represents me.

💡 Remember to commit & push frequently!
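💡 For the ‘About me’ picture, Markdown's image syntax is `![description](path)`. When the path is relative to the README, as below, GitHub displays the image stored in your repository. This is a minimal sketch that appends such a section from the terminal in a scratch folder. The description text and file name are placeholders, and you can just as well type the same three lines into README.md using VS Code.

```shell
# Scratch folder with a placeholder image and the start of a README.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir"
mkdir -p figures
touch figures/your-image.jpg
printf '# What do I know?\n' > README.md

# Append an 'About me' section. The quoted 'EOF' keeps the text exactly as typed.
cat >> README.md <<'EOF'

## About me

![A picture that represents me](figures/your-image.jpg)
EOF

# Show the resulting file.
cat README.md
```

Remember that the image itself also has to be added, committed and pushed, otherwise the README on GitHub will point to a file that is not there.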

Footnotes

  1. Read about what SSH means here (or ask us about it on Slack).

  2. Do you want to impress markers? Write great Git commit messages.