Using the Reddit API for Data Collection

Author

Dr Jon Cardoso-Silva

Published

11 March 2025


You will need this for your ✍️ Mini Project 2.

Reddit is a rich source of data for social media analysis, sentiment analysis, and trend detection. This guide will walk you through the process of setting up access to the Reddit API and using it to collect data for your projects.

⚠️ IMPORTANT

When collecting data from Reddit or any social media platform:

  1. Always respect Reddit’s API Terms of Use
  2. Be mindful of privacy concerns when analysing user data
  3. Consider the ethical implications of your research
  4. Be aware of rate limits to avoid being temporarily blocked
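
One simple way to act on the rate-limit point above is to pause between consecutive requests. The sketch below is illustrative only: the `Throttle` class and its `min_interval_seconds` parameter are hypothetical names, not part of any Reddit library, and the one-second pace is an assumption rather than Reddit's actual limit (check the current API documentation for that).

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# Assumed pace for illustration only: at most one request per second.
throttle = Throttle(min_interval_seconds=1.0)
```

Calling `throttle.wait()` immediately before each API request would then space the requests out. The first call returns straight away; later calls sleep only for whatever part of the interval has not yet elapsed.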

Why Use the Reddit API?

The Reddit API allows you to:

  • Collect posts and comments from specific subreddits
  • Analyse trends and discussions across different communities
  • Track how content spreads and evolves over time
  • Gather data for sentiment analysis and opinion mining
  • Study community dynamics and user behaviour

1️⃣ Create a Reddit Account

Before you can use the Reddit API, you need to have a Reddit account.

  1. Visit Reddit’s registration page
  2. Create an account using your email address
  3. Verify your email by clicking the link sent to your inbox
  4. Take note of your username and password as you’ll need them later

2️⃣ Create a Reddit App

To access the API, you need to create a Reddit app that will provide you with the necessary credentials.

  1. Log in to your Reddit account
  2. Navigate to App Preferences
  3. Scroll down to the “developed applications” section
  4. Click the “create app” or “create another app” button

Figure 1. Creating a new Reddit app

  5. Fill in the required information:

    • Name: Give your app a descriptive name (e.g., “DS105W (2024/25) Data Collection”)
    • App type: Select “script”
    • Description: Briefly describe what your app will do
    • About URL: You can leave this blank for personal projects
    • Redirect URI: Use http://localhost:8000 for personal use

    Reddit may ask you to confirm that you are not a robot. If so, tick “I am not a robot” before clicking “create app”.

  6. Take note of your credentials:

    • Client ID: The string under your app name (looks like a random string of letters and numbers)
    • Client Secret: The string labelled “secret”

Figure 2. Locating your Reddit app credentials

💡 TIP: Keep your Client ID and Client Secret secure. NEVER share them publicly or add them to a commit, EVER! You will learn to use .env files to store your credentials in the Week 08 lecture.

3️⃣ Add your Reddit API Credentials to your Python Environment

In a project, say the ✍️ Mini Project 2, your code will need to read your Client ID and Client Secret, plus your Reddit username and password, in a secure way, so that NO ONE else has access to your API credentials.

This is very important!

Follow the steps below carefully:

  1. Create a .env file (it’s just a plain text file) under the root of your project.

    For example, if your project lives at /files/mini-project-2, then the .env file goes at /files/mini-project-2/.env.

  2. Add the following lines to the .env file:

    CLIENT_ID=your_client_id
    CLIENT_SECRET=your_client_secret
    USERNAME=your_reddit_username
    PASSWORD=your_reddit_password

  3. Save the file and close it.

  4. Install the python-dotenv package:

    pip install python-dotenv

    This package will allow you to load the .env file in your project securely.

  5. Test that it works. In a Python script or a notebook inside your project, write:

    import os

    from dotenv import load_dotenv

    # Read the variables defined in .env into the process environment
    load_dotenv()

    If you did everything correctly, the credentials are now loaded into your environment. To check, run:

    print(os.getenv("CLIENT_ID"))

    This should print your Client ID. (Remove this line before committing.)

  6. NEVER add the .env file to a public repository. Your repository should have a .gitignore file that includes the .env file.
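
To check steps 5 and 6 without printing any secret to the screen, something like the following could be used. It is a minimal sketch: the helper names `missing_credentials` and `env_file_is_ignored` are hypothetical, and the .gitignore check assumes the file contains a line that is exactly `.env` (other patterns that also match the file would not be detected).

```python
import os
from pathlib import Path

# The four variable names used in the .env file above
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "USERNAME", "PASSWORD")


def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def env_file_is_ignored(gitignore_path=".gitignore"):
    """Return True if .gitignore has a line that is exactly '.env'."""
    path = Path(gitignore_path)
    if not path.exists():
        return False
    lines = path.read_text(encoding="utf-8").splitlines()
    return any(line.strip() == ".env" for line in lines)
```

After calling `load_dotenv()`, `missing_credentials()` should return an empty list when all four variables are set, and `env_file_is_ignored()` should return `True` when run from the project root. One caveat: on Windows, `USERNAME` is already defined by the system, and `load_dotenv()` does not replace existing variables by default, so `os.getenv("USERNAME")` may return your Windows login name; `load_dotenv(override=True)` is one way around this.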