Using the Reddit API for Data Collection

You will need this for your ✍️ Mini Project 2.
Reddit is a rich source of data for social media analysis, sentiment analysis, and trend detection. This guide will walk you through the process of setting up access to the Reddit API and using it to collect data for your projects.
⚠️ IMPORTANT
When collecting data from Reddit or any social media platform:
- Always respect Reddit’s API Terms of Use
- Be mindful of privacy concerns when analysing user data
- Consider the ethical implications of your research
- Be aware of rate limits to avoid being temporarily blocked
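To make the rate-limit point concrete, here is a minimal sketch of pacing requests. It assumes Reddit's responses carry the X-Ratelimit-Remaining and X-Ratelimit-Reset headers described in Reddit's API rules. The helper names seconds_to_wait and polite_pause are hypothetical and not part of any library.

```python
import time


def seconds_to_wait(headers):
    """Hypothetical helper: suggest a pause before the next request.

    Assumes X-Ratelimit-Remaining (requests left in the current window) and
    X-Ratelimit-Reset (seconds until the window resets) are present.
    """
    # HTTP header names are case-insensitive, so normalise keys to lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = float(lowered.get("x-ratelimit-remaining", 1))
    reset = float(lowered.get("x-ratelimit-reset", 0))
    if remaining < 1:
        # Quota used up: wait until the window resets.
        return reset
    # Otherwise spread the remaining requests evenly over the rest of the window.
    return reset / remaining


def polite_pause(headers):
    """Sleep for the number of seconds suggested by seconds_to_wait."""
    time.sleep(seconds_to_wait(headers))
```

You would call polite_pause(response.headers) after each request. Spreading the remaining quota evenly is just one simple strategy, not the only reasonable one.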
Why Use the Reddit API?
The Reddit API allows you to:
- Collect posts and comments from specific subreddits
- Analyse trends and discussions across different communities
- Track how content spreads and evolves over time
- Gather data for sentiment analysis and opinion mining
- Study community dynamics and user behaviour
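Most of these tasks start by requesting pages of posts from a subreddit. The sketch below only builds the request URL. It assumes you call Reddit's listing endpoints directly over HTTP (rather than through a wrapper library such as PRAW) and send authenticated requests to oauth.reddit.com; the function name subreddit_listing_url is hypothetical.

```python
from urllib.parse import quote, urlencode

# Authenticated (OAuth) requests go to this host rather than www.reddit.com.
OAUTH_BASE = "https://oauth.reddit.com"


def subreddit_listing_url(subreddit, sort="new", limit=100, after=None):
    """Hypothetical helper: build the URL for one page of a subreddit listing.

    sort:  a listing order such as "new", "hot" or "top".
    limit: posts per page (Reddit caps this at 100).
    after: fullname of the last post on the previous page, used to paginate.
    """
    params = {"limit": min(limit, 100)}
    if after is not None:
        params["after"] = after
    return f"{OAUTH_BASE}/r/{quote(subreddit)}/{sort}?{urlencode(params)}"
```

To fetch the next page, pass the fullname of the last post you received (a string starting with t3_) as after.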
1️⃣ Create a Reddit Account
Before you can use the Reddit API, you need to have a Reddit account.
- Visit Reddit’s registration page
- Create an account using your email address
- Verify your email by clicking the link sent to your inbox
- Take note of your username and password as you’ll need them later
2️⃣ Create a Reddit App
To access the API, you need to create a Reddit app that will provide you with the necessary credentials.
- Log in to your Reddit account
- Navigate to App Preferences
- Scroll down to the “developed applications” section
- Click the “create app” (or “create another app”) button
Fill in the required information:
- Name: Give your app a descriptive name (e.g., “DS105W (2024/25) Data Collection”)
- App type: Select “script”
- Description: Briefly describe what your app will do
- About URL: You can leave this blank for personal projects
- Redirect URI: Use http://localhost:8000 for personal use
Complete the “I am not a robot” check and then click “create app”.
Take note of your credentials:
- Client ID: The string under your app name (looks like a random string of letters and numbers)
- Client Secret: The string labelled “secret”
💡 TIP: Keep your Client ID and Client Secret secure. NEVER share them publicly or include them in a commit. EVER! You will learn to use .env files to store your credentials in the Week 08 lecture.
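For a “script” app, the Client ID and Client Secret are typically exchanged for a short-lived access token using OAuth2's password grant. The sketch below only prepares that request with Python's standard library and does not send it. build_token_request is a hypothetical helper, and the User-Agent string is a placeholder following Reddit's recommended platform:app ID:version (by /u/username) format.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Reddit's OAuth2 token endpoint.
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id, client_secret, username, password, user_agent):
    """Hypothetical helper: prepare (but do not send) a password-grant token request."""
    # HTTP Basic auth: Client ID as the user name, Client Secret as the password.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    # Form-encoded body carrying your own Reddit username and password.
    body = urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode()
    headers = {
        "Authorization": f"Basic {basic}",
        # Reddit asks every client to send a unique, descriptive User-Agent.
        "User-Agent": user_agent,
    }
    return Request(TOKEN_URL, data=body, headers=headers, method="POST")


# Placeholder values only: never type real credentials into your code.
req = build_token_request(
    "my_client_id", "my_client_secret", "my_username", "my_password",
    "script:ds105w-data-collection:v0.1 (by /u/my_username)",
)
```

Sending it with urllib.request.urlopen(req) should return JSON containing an access_token, which later requests pass in an "Authorization: bearer <token>" header. In a real project, read the four credentials from environment variables rather than writing them in your code.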
3️⃣ Add your Reddit API Credentials to your Python Environment
In a project such as ✍️ Mini Project 2, your code will need to read your Client ID, Client Secret, Reddit username and password securely, so that NO ONE else gains access to your API credentials.
This is very important!
Follow the steps below carefully:
- Create a .env file (it’s just a plain text file) in the root of your project. That is, if you are working in /files/mini-project-2, the file should be at /files/mini-project-2/.env.
- Add the following lines to the .env file, one per line, replacing each placeholder with your own value:
    CLIENT_ID=your_client_id
    CLIENT_SECRET=your_client_secret
    USERNAME=your_reddit_username
    PASSWORD=your_reddit_password
- Save the file and close it.
- Install the python-dotenv package by running:
    pip install python-dotenv
  This package loads the variables defined in your .env file into your project’s environment, so the credentials never need to appear in your code.
- Test that it works. In a Python script or notebook inside your project, write:
    import os
    from dotenv import load_dotenv

    load_dotenv()  # finds your .env file and loads its variables into the environment
    print(os.getenv("CLIENT_ID"))
  If everything is set up correctly, this prints your Client ID. Remove the print line before committing.
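Once the print test works, you could wrap the lookup in a small helper that fails with a clear message when a variable is missing, instead of silently passing None to the API. This is a sketch with a hypothetical name, get_reddit_credentials. One caveat: load_dotenv() does not overwrite variables that already exist, and Windows already defines USERNAME as your Windows login name, so calling load_dotenv(override=True) makes sure the value from .env wins.

```python
import os

# Names of the variables defined in the .env file above.
REQUIRED = ("CLIENT_ID", "CLIENT_SECRET", "USERNAME", "PASSWORD")


def get_reddit_credentials():
    """Hypothetical helper: return the four credentials as a dict.

    Assumes load_dotenv(override=True) has already been called, so the
    values from .env are in os.environ. Raises RuntimeError if any are missing.
    """
    missing = [name for name in REQUIRED if not os.getenv(name)]
    if missing:
        # Report only the names, never the values, so nothing secret is printed.
        raise RuntimeError(f"Missing from environment: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```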
NEVER add the .env file to a public repository. Your repository should have a .gitignore file that includes the .env file.
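A quick way to double-check is a small script like the one below, a rough sketch with a hypothetical function name (env_is_ignored). It only looks for a literal .env or /.env line, so treat git check-ignore .env (which prints .env when the file is ignored) as the definitive check.

```python
from pathlib import Path


def env_is_ignored(project_root="."):
    """Hypothetical helper: rough check that .gitignore lists .env.

    Only matches the exact patterns ".env" or "/.env"; git's own pattern
    matching is richer, so `git check-ignore .env` remains the real test.
    """
    gitignore = Path(project_root) / ".gitignore"
    if not gitignore.exists():
        return False
    # Strip surrounding whitespace so "  .env  " still counts.
    patterns = {line.strip() for line in gitignore.read_text().splitlines()}
    return bool(patterns & {".env", "/.env"})
```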