💻 Week 07 - Lab Roadmap (90 min)
DS105 - Data for Data Science
Over the past few weeks, you have been exploring your computer and the Internet. You now know how to extract data from web pages and conduct robust data collection. Now it's time to learn how to organise and format your data science projects.
For projects to run smoothly, they must be clear to every member of your team. Today we will explore two tools that help us achieve that:
- GitHub
- Markdown
In this lab, we will build a small project that will help you understand both of these tools.
Part 1: Welcome to GitHub (45 min)
The first thing we will do is explore GitHub. GitHub is a collaborative platform for developers, with an estimated 83 million users. It allows developers to track changes in their code and collaborate in a straightforward and sustainable way.
When applying for jobs related to data science, you will most likely be asked about your GitHub skills. This is because they are crucial for data science teams.
Let's get ourselves onto GitHub.
🤝 WORKING TOGETHER
- Register on GitHub if you haven't done so yet.
- Create your first repository (repo for short) and call it DS-105-playground. Make sure to make it private and add a README file.
- Generate your personal access token by following these instructions.
- Using the git clone command in the prompt, clone the whole repository folder to your Desktop. It will ask for your username and your password. Provide your username, and your token in place of your password.
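The clone step can be sketched in the shell as follows. This is a minimal sketch under stated assumptions, not the exact commands for your account: so that it runs without a network connection or a token, a local bare repository (the hypothetical remote.git) stands in for the repository hosted on GitHub, and everything happens in a temporary folder rather than on your Desktop. With your real repository, you would pass its HTTPS address (shown on the repo page) to git clone instead, and your clone would already contain the README file.

```shell
set -e

# Work in a scratch folder so nothing on your machine is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Assumption: a local bare repo named remote.git plays the role of the
# repository hosted on GitHub in this sketch.
git init --quiet --bare remote.git

# Clone it into a folder called DS-105-playground.
# (Git warns that the stand-in repo is empty; a real repo would hold a README.)
git clone --quiet remote.git DS-105-playground

# Move into the cloned folder and show which remote it is linked to.
cd DS-105-playground
git remote -v
```

Running git remote -v inside the cloned folder is a quick way to confirm that the folder is linked to the repository you expected.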
We now have a copy of the GitHub repo on our machine, so we can go ahead and change it.
🎯 ACTION POINTS
- Use the prompt to navigate to the folder you have just cloned.
- Using vim or nano, change the README file so that it contains your name and the name of your programme.
Let's add those changes to the file on GitHub.
🤝 WORKING TOGETHER
- cd to the folder of your GitHub repo on your local machine.
- Use git add * to add all the changes to the commit.
- Use git commit -m "message" to make a commit. Replace "message" with text that describes what you did in that repo.
- Use git push to push the changes to the GitHub repo.
- Go to your GitHub repo page and check that the changes have taken effect.
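The add, commit and push cycle above can be sketched end to end as follows. Several things here are assumptions made so the sketch runs offline: a local bare repository (the hypothetical remote.git) stands in for GitHub, the name and email are placeholders, the README is written with printf rather than edited in vim or nano, and the file is called README.md, which is the name GitHub gives a README it creates for you. Because the stand-in remote starts out empty, the sketch pushes with git push origin HEAD; in your real clone, plain git push should be enough.

```shell
set -e

# Scratch setup: a local bare repo stands in for GitHub (assumption).
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet --bare remote.git
git clone --quiet remote.git DS-105-playground

# cd to the folder of the repo on the local machine.
cd DS-105-playground

# Placeholder identity, stored for this repository only.
git config user.name "Your Name"
git config user.email "you@example.com"

# Stand-in for editing the README by hand.
printf '%s\n' 'Your Name' 'Your Programme' > README.md

# Stage the changes, record them with a descriptive message, then push.
git add *
git commit --quiet -m "Add my name and programme to the README"
git push --quiet origin HEAD

# List the commits now recorded in the repository.
git log --oneline
```

Note that git add * only picks up files the shell wildcard matches, so hidden files (names starting with a dot) are not staged by it.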
Part 2: Let's make it prettier (45 min)
We have added some information to the README file. Usually, however, we want a README to tell readers quite a bit about the project, and to look good while doing so. For instance, this repository by a large development team contains a lot of information.
Let's try to make our README file just as beautiful. README files on GitHub are written in Markdown, a very simple markup language that you can start using straight away. Follow the steps below to make your README beautiful.
🎯 ACTION POINTS
- Download a code editor such as VS Code.
- Open the README file using the editor.
- Explore the Markdown cheatsheet.
- Create the following in your README file:
  - a first-level heading that says "Week 5 Lab"
  - your name in bold and the name of your programme in italics
  - a second-level heading that says "A little about me"
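As a rough illustration of the Markdown these steps ask for, the sketch below writes an example README.md from the shell and prints it back. Your Name and Your Programme are placeholders, and the file is created in a temporary folder (an assumption, so that your real README is not overwritten). In the lab itself you would type the same Markdown lines into your editor instead.

```shell
set -e

# Scratch folder, so an existing README is left untouched.
workdir="$(mktemp -d)"
cd "$workdir"

# '#' starts a first-level heading and '##' a second-level heading;
# **text** renders as bold and *text* as italics.
cat > README.md <<'EOF'
# Week 5 Lab

**Your Name**, *Your Programme*

## A little about me
EOF

# Print the file to check what was written.
cat README.md
```

The blank lines between the headings and the paragraph are not strictly required everywhere, but they keep the source easy to read and avoid surprises across Markdown renderers.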
Now it's time to go even further!
- Create a folder called img in your DS-105-playground folder.
- Add your picture to this folder.
- Go back to your README file and add that picture after the second heading. Use the cheatsheet to work out how.
- Add some information about yourself and format it in any way you want.
- Commit and push the changes to GitHub and check that your changes have taken effect. Don't forget to commit with a descriptive message.
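The folder and picture steps might look like the sketch below. It is only an illustration under stated assumptions: it runs in a temporary folder rather than your real DS-105-playground folder, the file name me.png and the alt text are hypothetical, and an empty placeholder file stands in for your actual picture. The Markdown image syntax itself is an exclamation mark, the alt text in square brackets, then the path in parentheses.

```shell
set -e

# Scratch copy of the project folder (assumption: not your real repo).
workdir="$(mktemp -d)"
mkdir "$workdir/DS-105-playground"
cd "$workdir/DS-105-playground"

# A README that already has the two headings from the earlier steps.
printf '%s\n' '# Week 5 Lab' '' '## A little about me' > README.md

# Create the img folder and put a picture in it.
mkdir img
# Empty placeholder file; copy your real picture here instead.
touch img/me.png

# Add the picture after the second heading, using a path relative
# to the README so that it also resolves on GitHub.
printf '\n%s\n' '![A picture of me](img/me.png)' >> README.md

cat README.md
```

A relative path such as img/me.png keeps working after the repo is cloned elsewhere, which an absolute path on your own machine would not. After this, you would stage, commit and push the changes as in Part 1.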
🏠 Take-home exercise
The following exercises will help you prepare for your 20-minute group presentation next week (18 November 2022).
Next week, when you start collecting data for your group project, try to use all the different tools you have learned about in this course so far:
- Inside the DS-105-playground folder, create either a Jupyter Notebook (if using Python) or an R Markdown file (if using R).
- Write your web scraping or API code to collect the data.
- Convert the data you collected to a data frame.
- Compute summary statistics of your data: how many rows did you get? How many columns?
- Add pieces of Markdown here and there so you can understand your own code later.
- Can you take it to the next level? Try to produce a couple of plots!
- Now, save your notebook and go to the terminal to commit and push your notebook to your GitHub repository.
- Open your GitHub repository in your browser and navigate to the notebook you just uploaded via git. You should be able to see the Markdown, code and even plots rendered in your browser.