💻 Week 08 Lab

Data Collection and Initial Transformation for Mini-Project 2

Published

14 March 2025

🥅 Learning Goals
By the end of this lab, you should be able to: i) set up Reddit API credentials securely, ii) collect data from multiple subreddits, iii) transform JSON API responses into structured DataFrames, and iv) prepare the three core DataFrames required for your Mini-Project 2 database.

Last Updated: 13 March 2025, 19:30 GMT

πŸ“Time and Location: Friday, 14 March 2025. Check your timetable for the exact time and location of your class.

📋 Preparation

To prepare for this lab, ensure you have:

  • Attended the 🗣️ Week 08 lecture
  • Reviewed the requirements for ✍️ Mini-Project 2
  • Created a Reddit account (if you do not already have one)
  • Considered which subreddits you might want to analyse for your project

πŸ›£οΈ Lab Roadmap (90 min)

Note to class teachers: This is a flexible support session. Assist students individually or give short demonstrations of common issues, and encourage peer support. Focus on helping students set up Reddit API credentials and collect initial data. The goal is for each student to have the three DataFrames ready by the end of the session.

🦸🏻 This lab is a SUPER TECH SUPPORT session!

Use the entire session to make progress on your ✍️ Mini-Project 2.

When seeking help:

  1. Clearly explain what you are trying to achieve
  2. Show what you have attempted so far
  3. Share any error messages you are encountering

🦾 Using AI tools instead of letting them use you: If you use generative AI tools such as ChatGPT (or, ideally, GitHub Copilot) for coding, ask your class teacher for tips on prompting them effectively. The aim is code that is easy to understand and not unnecessarily complex, which is often not the case with the current wave of AI-generated code.

Suggested Pace

(Feel free to proceed as you wish, but following this pace will make it easier for your class teacher to help you.)

🎯 ACTION POINTS:

  1. Review the 🗣️ Week 05 lecture notebooks, particularly NB00. It contains step-by-step guidance on setting up Reddit API credentials and collecting data, including important information on security and efficiency.

  2. Follow the instructions on the ✍️ Mini-Project 2 page and clone your designated GitHub repository.

  3. Create your NB01 and begin transferring the relevant code from the W05 lecture notebook to it.

  4. Start by collecting JSON data from a single subreddit. Always begin small, and break tasks into smaller components where needed. If necessary, review your notes on the fundamentals of lists, loops and dictionaries. This will save time in the long run.

  5. Check the 🗣️ Week 08 lecture for useful code to convert JSON into appropriate DataFrames.

  6. Revise your code, click ‘Restart’ on your notebook, then click ‘Clear all outputs’ and finally click ‘Run all’ to ensure everything runs smoothly from top to bottom. Enhance your code and Markdown before adding more data.

  7. Aim to reach the point where you can collect the comments on the Reddit posts you have selected.

  8. πŸ† If you feel confident, attempt to collect data from multiple subreddits.

  9. πŸ† If you feel very confident, try to collect data from multiple subreddits and their comments.

  10. πŸ† If you feel like a superstar, aim to get the data into a database!

Remember: The ✍️ Mini-Project 2 deadline is 26 March 2025, 8pm UK time. Make the most of this lab session!

📚 Resources

Here are some useful references for the techniques you will need: