πŸ—“οΈ Week 02 – Day 02: Web scraping tricks

+ live demo of Generative AI (GitHub Copilot) for debugging code

Published: 16 July 2024

🥅 Learning Objectives

Review the goals for today

At the end of the day you should be able to:

  • Create your own website with just Markdown (no HTML or CSS required)
  • Choose between CSS and XPath selectors to scrape data from websites
  • Write human-readable scraping code
  • Store scraped data in a structured format
  • Use Generative AI tools, such as GitHub Copilot, to debug code

πŸ•ΈοΈ Web scraping tricks

Today’s lecture material is all contained in the following Jupyter notebook:

👉 Save the notebook in the ME204/code folder.

📋 Take-Home Activity: Creating a website with Markdown

I suggest you take some time to complete this activity later today or tomorrow. It will help you consolidate your knowledge of Markdown and start preparing for the final project.

Yesterday, we experimented with creating a very simple HTML page. Your goal here is to recreate that page, this time using Markdown, the same language we have been using within our Jupyter notebooks.

🎯 ACTION POINTS:

  1. If you haven’t done so already, head to GitHub and create an account. Choose a username that you like and that you will be happy to share with others.

    💡 TIP: Choose a username that you wouldn't mind using in a more professional, serious setting. GitHub can be used for personal projects, but it is perhaps more commonly used for professional purposes, such as hosting a portfolio of your data science projects.

  2. Create a personal website for yourself on GitHub. Follow all the steps from GitHub Pages' "Creating your website" tutorial.

    • On Step 2, pay close attention to the instructions. For example, if your GitHub username is jonjoncardoso, then the name of your repository should be jonjoncardoso.github.io.

    • On Step 6, choose the main branch.

  3. Once you have created your website, go to <your-username>.github.io and see your website live!

    • If you don't see anything, or you aren't sure how to follow this instruction, call me over and I will help you.
  4. Now, edit the README.md file in your repository and add the following metadata to the top of the file:

    ---
    title: "YOUR NAME"
    subtitle: "YOUR OCCUPATION"
    ---
  5. Click on the green button that says "Commit changes" and then go back to your website. After a couple of minutes, you should see the changes reflected on your website.

  6. Go back to editing your README.md file. After the metadata, add some Markdown code to create the same structure as the HTML code from yesterday.

    • Before committing your changes, you can click on the 'Preview' tab to see how your Markdown code will be rendered.

    🔗 Another useful reference page: Markdown Cheatsheet.

    Your goal now is to emulate the structure of the website we created yesterday, but using Markdown instead of HTML. You will have to work out which Markdown syntax produces each element of yesterday's HTML structure!
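
For orientation, a README.md along the following lines would combine the metadata from step 4 with a few common Markdown elements. This is only a hypothetical sketch: the section names, list items and link below are placeholders, not the structure of yesterday's HTML page, which you should still work out and reproduce yourself.

```markdown
---
title: "YOUR NAME"
subtitle: "YOUR OCCUPATION"
---

## About me

A short paragraph about who you are. Use **bold** or *italics* for emphasis.

## Interests

- Data science
- Web scraping
- Another interest

## Links

- [My GitHub profile](https://github.com/<your-username>)
```

When the page is rendered, each Markdown element becomes an HTML element: a line starting with ## becomes an h2 heading, each line starting with - becomes an li item inside a ul list, **bold** becomes strong, *italics* becomes em, and [text](url) becomes an a link pointing to that URL.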

💡 TIP: We're doing all this because at the end of this course, you will have to produce a website just like this, using Markdown, to showcase your data manipulation skills in your final project.