🗓️ Week 02 - Day 02: Web scraping tricks
+ live demo of Generative AI (GitHub Copilot) for debugging code
🥅 Learning Objectives
Review the goals for today
At the end of the day you should be able to:
- Create your own website with just Markdown (no HTML or CSS required)
- Choose between CSS and XPath selectors to scrape data from websites
- Write human-readable scraping code
- Store scraped data in a structured format
- Use Generative AI tools, such as GitHub Copilot, to debug code
🕸️ Web scraping tricks
Today's lecture material is all contained in the following Jupyter notebook:
Save the notebook in the ME204/code folder.
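As a rough illustration of two of today's objectives (selecting elements and storing them in a structured format), here is a minimal sketch that uses only Python's standard library. It is not taken from the notebook, which may well use a different library. The HTML snippet, the class names and the column names are all made up for this example. Note that xml.etree.ElementTree understands only a small subset of XPath and needs well-formed markup, so treat it as a stand-in for a proper scraping library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML snippet (not from the lecture notebook)
HTML = """
<html>
  <body>
    <ul id="books">
      <li class="book"><span class="title">Book A</span><span class="price">10.50</span></li>
      <li class="book"><span class="title">Book B</span><span class="price">7.25</span></li>
    </ul>
  </body>
</html>
"""


def extract_books(html):
    """Return one dictionary per <li class="book"> element."""
    root = ET.fromstring(html)
    rows = []
    # XPath: every <li> whose class attribute is exactly "book".
    # A CSS selector for the same elements would be: li.book
    for item in root.findall(".//li[@class='book']"):
        title = item.find("./span[@class='title']").text
        price = float(item.find("./span[@class='price']").text)
        rows.append({"title": title, "price": price})
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text, with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


books = extract_books(HTML)
csv_text = to_csv(books)
print(csv_text)
```

Keeping the extraction step (extract_books) separate from the storage step (to_csv) is one way to keep scraping code readable: each function does one thing and can be checked on its own.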
Take-Home Activity: Creating a website with Markdown
I suggest you take some time to complete this activity later today or tomorrow. It will help you to consolidate your knowledge of Markdown and start preparing you for the final project.
Yesterday, we experimented with creating a very simple HTML page. Your goal here is to recreate that page, only this time using Markdown, the same language we have been using within our Jupyter notebooks.
🎯 ACTION POINTS:
If you haven't done so already, head to GitHub and create an account. Choose a username that you like and that you will be happy to share with others.
💡 TIP: Choose a username that you wouldn't mind using in a more professional, serious setting. GitHub can be used for personal projects but it is perhaps more commonly used for professional reasons, such as for hosting a portfolio of your data science projects.
Create a personal website for yourself on GitHub. Follow all the steps from GitHub Pages' "Creating your website" tutorial.
- On Step 2, pay close attention to the instructions. For example, if your GitHub username is jonjoncardoso, then the name of your repository should be jonjoncardoso.github.io.
- On Step 6, choose the main branch.
Once you have created your website, go to <your-username>.github.io and see your website live!
- If you don't see anything, or you don't know how to follow this instruction, call me over and I will help you.
Now, edit the README.md file in your repository and add the following metadata to the top of the file:
    ---
    title: "YOUR NAME"
    subtitle: "YOUR OCCUPATION"
    ---
Click on the green button that says "Commit changes" and then go back to your website. After a couple of minutes, you should see the changes reflected on your website.
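If you would rather prepare the file on your own computer before pasting it into GitHub, the metadata step can be sketched in Python. This is only an illustration: add_front_matter is a hypothetical helper, the title and subtitle are the placeholders from the step above, and the sketch assumes that a file starting with --- already has its metadata block.

```python
def add_front_matter(body, title, subtitle):
    """Return `body` with a metadata block placed at the very top.

    Note: this sketch does not escape double quotes inside the values.
    """
    if body.lstrip().startswith("---"):
        # Assumption: a leading '---' means the metadata is already there
        return body
    header = "\n".join([
        "---",
        f'title: "{title}"',
        f'subtitle: "{subtitle}"',
        "---",
    ])
    # Blank line between the metadata and the Markdown content
    return header + "\n\n" + body.lstrip("\n")


readme = add_front_matter("# Hello\n\nSome text.\n", "YOUR NAME", "YOUR OCCUPATION")
print(readme)
```

Calling the helper a second time on its own output leaves the text unchanged, so running it repeatedly will not stack several metadata blocks on top of each other.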
Go back to editing your README.md file. After the metadata, add some Markdown code to create the same structure as the HTML code from yesterday.
- Before committing your changes, you can click on the "Preview" tab to see how your Markdown code will be rendered.
Another useful reference page: Markdown Cheatsheet.
Your goal now is to reproduce the structure of the website we created yesterday, using Markdown instead of HTML. You will have to figure out which Markdown syntax produces each part of yesterday's HTML structure!
💡 TIP: We're doing all this because, at the end of this course, you will have to produce a website just like this, using Markdown, to showcase your data manipulation skills in your final project.