💻 Week 07 - Lab Roadmap (90 min)

DS105 - Data for Data Science

Author

Anton Boichenko

Published

02 November 2022

Over the last few weeks, you have been exploring your computer and the Internet. You now know how to extract data from web pages and conduct robust data collection. Now it’s time to learn how to structure and document your data science projects.

For projects to run smoothly, they must be clear to every member of your team. Today we will explore two tools that help us achieve that:

  1. GitHub
  2. Markdown language

In this lab, we will build a small project that will help you understand both of these tools.

Part 1: Welcome to GitHub (45 min)

The first thing we will do is explore GitHub. GitHub is a collaboration platform for developers with an estimated 83 million users. It is built around Git, a version control system, and it allows developers to track changes in their code and collaborate in a straightforward and sustainable way.

When applying for jobs related to data science, you will most likely be asked about your GitHub skills. That is because they are crucial to how data science teams work.

Let’s get ourselves to GitHub.

🤝 WORKING TOGETHER

  1. Register on GitHub if you haven’t done so yet.
  2. Create your first repository (repo for short) and call it DS-105-playground. Make sure to make it private and add a README file.
  3. Generate your personal access token by following these instructions.
  4. Using the git clone command in the prompt, clone the whole repository to your Desktop. Git will ask for your username and password: enter your username, and your token in place of your password.

We now have a copy of the GitHub repo on our machine, so we can go ahead and change it.

🎯 ACTION POINTS

  1. In the prompt, navigate to the folder you have just cloned.
  2. Using vim or nano, edit the README file so that it contains your name and the name of your programme.

Let’s send those changes to the repo on GitHub.

🤝 WORKING TOGETHER

  1. cd to the folder of your GitHub repo on your local machine.
  2. Use git add * to stage all the changes for the next commit.
  3. Use git commit -m "message" to make a commit. Replace “message” with text that describes what you changed in the repo.
  4. Use git push to push the commit to the GitHub repo.
  5. Go to your GitHub repo page and check that the changes appear there.
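
If you would like to rehearse the clone, edit, add, commit and push cycle without touching your real repository, the sketch below may help. It is only an illustration under a few assumptions: a throwaway bare repository (remote.git) stands in for GitHub, the name and email are placeholders, and it stages the README by name rather than with git add *. With your real repo you would skip the scratch setup, clone from an address of the form https://github.com/USERNAME/DS-105-playground.git, and run only the last few commands inside the cloned folder.

```shell
set -eu

# Work in a throwaway directory so nothing here touches your real repository.
scratch="$(mktemp -d)"
cd "$scratch"

# A bare repository plays the role of GitHub (a stand-in, for practice only).
git init --quiet --bare remote.git

# Clone it, just as you cloned DS-105-playground from GitHub.
git clone --quiet remote.git DS-105-playground
cd DS-105-playground

# Identity recorded in commits made in this scratch clone (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.com"

# Edit the README, then stage, commit and push the change.
echo "Your Name, Your Programme" > README.md
git add README.md
git commit --quiet -m "Add my name and programme to the README"
git push --quiet origin HEAD

# Confirm that the commit reached the stand-in remote.
git --git-dir="$scratch/remote.git" log --oneline
```

The last command should list a single commit carrying the message you wrote, which is the offline equivalent of checking your repo page on GitHub.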

Part 2: Let’s make it prettier (45 min)

We have added some information to the README file. Usually, though, we want a README to tell readers quite a bit about the project, and to look good while doing so. For instance, this repository by a big coding team contains a lot of information.

Let’s try to make our README file just as beautiful. README files on GitHub are written in Markdown, a very simple markup language that you can start using straight away. Follow the steps below to make your README beautiful.

🎯 ACTION POINTS

  1. Download a code editor such as VS Code.
  2. Open the README file using the editor.
  3. Explore the Markdown cheatsheet.
  4. Create the following in your README file:
    • a first-level heading that says “Week 5 Lab”
    • your name in bold and the name of your programme in italics
    • a second-level heading that says “A little about me”

Now it’s time to go even further!

  1. Create a folder in your DS-105-playground folder called img.
  2. Add your picture to this folder.
  3. Go back to your README file and add that picture after the second heading. Use the cheatsheet to understand how.
  4. Add some information about yourself and format it in any way you want.
  5. Commit and push the changes to GitHub, then check that they appear there. Don’t forget to use a descriptive commit message.
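
The folder and image steps above could look roughly like this when run from inside your DS-105-playground folder. The file name img/me.jpg and the alt text are hypothetical, and the sketch assumes that the "A little about me" heading is currently the last thing in your README, so that appending a line places the picture right after it.

```shell
# Create the img folder inside the repository (no error if it already exists).
mkdir -p img

# Copy your picture into it. The source path below is a placeholder:
# cp /path/to/your-photo.jpg img/me.jpg

# Append a Markdown image line to the README: ![alt text](relative/path).
printf '\n![A photo of me](img/me.jpg)\n' >> README.md

# Show the end of the README to confirm the image line was added.
tail -n 2 README.md
```

Because the path is relative to the README, the same line works both on your machine and on GitHub once the img folder has been committed and pushed.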
🏠 Take-home exercise

🏠 Take-home exercise

The following exercises will help you prepare for your 20-minute group presentation next week (18 November 2022).

Next week, when you start collecting data for your group project, try to use all the different tools you have learned about in this course so far:

  • Inside the DS-105-playground folder, create either a Jupyter Notebook (if using Python) or an R Markdown file (if using R)
  • Write your web scraping or API code to collect the data
  • Convert the data you collected to a data frame
  • Compute summary statistics of your data: How many rows did you get? How many columns?
  • Add pieces of markdown here and there so you can understand your own code later.
  • Can you take it to the next level? Try to produce a couple of plots!
  • Now, save your notebook and go to the terminal to commit and push your notebook to your GitHub repository.
    • Open your GitHub repository in your browser and navigate to the notebook you just uploaded via git. You should be able to see the markdown, code and even plots rendered in your browser.