🔐 Week 02 | Day 03
API Authentication Patterns

ME204 – Data Engineering for the Social World

Dr Jon Cardoso-Silva

23 July 2025

API Authentication: Your Reference Guide

This deck contains copy-pasteable patterns for common API authentication methods.

💡 Purpose: Use these slides as a reference when working with APIs that require authentication. Each pattern can be adapted for different services.

First Things First: What is a REST API?

Think of APIs as Official Data Access

Think of an API as a company’s developers giving you explicit permission to access their data, provided you follow their rules.

Another Real-World Analogy (Beyond the Restaurant One)

Library System:

  • You want a book → You ask the librarian
  • Librarian checks: Are you allowed? Which book? How many?
  • If approved → Librarian gets the book for you
  • You follow the library’s rules (return on time, handle carefully)

API System:

  • You want data → You make a request to the API
  • API checks: Valid credentials? Correct format? Within limits?
  • If approved → API returns the data you requested
  • You follow the API’s rules (rate limits, authentication, data usage)

REST API Visual Flow

REST API flow (diagram):

  1. Your Python code (requests.get) → The Internet: HTTP request + authentication
  2. The Internet → Company’s API server (Reddit, Twitter, etc.): validate & process
  3. Company’s API server → Company’s database: query data
  4. Company’s database → Company’s API server: return results
  5. Company’s API server → The Internet: JSON response
  6. The Internet → Your Python code: your data!

Why “REST” API?

REST = REpresentational State Transfer

What it means in practice:

  • Uses standard HTTP methods
    (GET, POST, PUT, DELETE)
  • URLs represent resources
    (/users/123, /posts/456)
  • Responses typically in JSON format
  • Stateless
    (each request is independent of previous ones)

Example REST URL patterns:

GET /users/johndoe          → Get user info
GET /users/johndoe/posts    → Get user's posts  
POST /posts                 → Create new post
PUT /posts/123              → Update post 123
DELETE /posts/123           → Delete post 123
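
The URL patterns above pair an HTTP method with a resource path. As a minimal sketch, the snippet below builds (but does not send) one request per pattern using only the standard library, so it runs offline. The base URL `https://api.example.com` and the helper name `build_requests` are placeholders for illustration; with the `requests` library the sending calls would be `requests.get`, `requests.post`, `requests.put` and `requests.delete`.

```python
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder, not a real API

# (method, path) pairs copied from the patterns above
PATTERNS = [
    ("GET", "/users/johndoe"),
    ("GET", "/users/johndoe/posts"),
    ("POST", "/posts"),
    ("PUT", "/posts/123"),
    ("DELETE", "/posts/123"),
]


def build_requests(base_url, patterns):
    """Create unsent Request objects: the method is the verb, the URL is the resource."""
    return [Request(base_url + path, method=method) for method, path in patterns]


rest_requests = build_requests(BASE_URL, PATTERNS)
for req in rest_requests:
    print(f"{req.get_method():<6} {req.full_url}")
```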

HTTP Headers vs Parameters: The Fundamentals

Every HTTP Request Has Two Main Parts

🗂️ Headers (Metadata)

  • What they are: Instructions about the request
  • Where they go: Sent alongside the request, outside the URL
  • Purpose: Authentication, content type, user agent
  • Analogy: The envelope of a letter

Examples:

Authorization: Bearer abc123
Content-Type: application/json
User-Agent: MyApp/1.0

📋 Parameters (Query String)

  • What they are: Details that specify which data you’re requesting
  • Where they go: Visible in the URL after ?
  • Purpose: Filtering, sorting, limiting results
  • Analogy: The address on the envelope

Examples:

?limit=10&sort=date&category=news
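
To make the query string less abstract, here is a small sketch that builds the example above from a dictionary and then parses it back. It uses only the standard library; the endpoint `https://api.example.com/data` and the helper name `build_url` are placeholders. Passing `params=` to `requests.get` produces a query string in roughly the same way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base_url, params):
    """Append params to base_url as a query string."""
    return f"{base_url}?{urlencode(params)}"


url = build_url(
    "https://api.example.com/data",  # placeholder endpoint
    {"limit": 10, "sort": "date", "category": "news"},
)
print(url)

# Going the other way: recover the parameters from a URL (values come back as lists of strings)
query = parse_qs(urlsplit(url).query)
print(query)
```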

Visual: Headers vs Parameters in Action

HTTP request (diagram):

  • URL: https://api.reddit.com/r/python/hot?limit=10
  • Headers (hidden): Authorization: Bearer token123, User-Agent: MyApp/1.0
  • Parameters (visible, from the query string): limit=10, sort=hot

Code Example: Headers vs Parameters

Wrong: Putting auth in parameters

import requests

# DON'T DO THIS - API key visible in URL!
url = "https://api.example.com/data"
params = {
    "api_key": "secret123",  # ❌ Exposed!
    "limit": 10
}
response = requests.get(url, params=params)

# URL becomes:
# https://api.example.com/data?api_key=secret123&limit=10

Correct: Using headers for auth

# DO THIS - API key sent in a header, kept out of the URL
url = "https://api.example.com/data"
headers = {
    "X-API-Key": "secret123"  # ✅ Not in the URL (HTTPS still needed)
}
params = {
    "limit": 10  # ✅ Data filtering only
}
response = requests.get(url, headers=headers, params=params)

Obviously: read the docs of the API you’re using to understand it!

Beyond Query Parameters: Other Types

So far we’ve only seen query parameters (in the URL). APIs use different parameter types:

1. Query Parameters (GET requests)

# Visible in URL after ?
params = {"limit": 10, "sort": "date"}
response = requests.get(url, params=params)
# Results in: https://api.example.com/data?limit=10&sort=date

2. Form Data (POST requests)

# Sent in request body, like a web form
data = {"username": "john", "message": "Hello world"}
response = requests.post(url, data=data)

3. JSON Data (POST/PUT requests)

# Structured data in request body
json_data = {"title": "New Post", "content": "Post content"}
response = requests.post(url, json=json_data)
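
The difference between form data and JSON data is how the request body is encoded. The sketch below reproduces both encodings with the standard library so the bytes can be inspected without sending anything; the helper names `encode_form` and `encode_json` are made up for this illustration. With `requests`, `data=` and `json=` perform the equivalent encoding and set the matching Content-Type header.

```python
import json
from urllib.parse import urlencode


def encode_form(data):
    """Form body: key=value pairs joined by &, as an HTML form would send."""
    body = urlencode(data).encode("utf-8")
    return body, "application/x-www-form-urlencoded"


def encode_json(payload):
    """JSON body: the whole structure serialised as one JSON document."""
    body = json.dumps(payload).encode("utf-8")
    return body, "application/json"


form_body, form_type = encode_form({"username": "john", "message": "Hello world"})
json_body, json_type = encode_json({"title": "New Post", "content": "Post content"})

print(form_type, form_body)
print(json_type, json_body)
```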

What is a User-Agent Header?

User-Agent tells the API what application (WHO) is making the request.

Examples of User-Agent strings:

Chrome browser:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/91.0.4472.124

Python requests:
python-requests/2.28.1

Your custom app:
MyWeatherApp/1.0 (contact@example.com)

Why APIs care:

  • Rate limiting by application type
  • Analytics about API usage
  • Blocking bots if needed
  • Support - easier to help users

What you should do:

headers = {
    "User-Agent": "ME204-Project/1.0"
}

Some APIs require a User-Agent header

Some APIs may require you to identify yourself more explicitly in the user agent header.

From Wikimedia API (the organisation behind Wikipedia):

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header, may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.

User-Agent required by Wikimedia

They ask people to use:

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

That is:

<client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]

“Parts that are not applicable can be omitted.”
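
The format above can be assembled programmatically. This is a minimal sketch, with `build_user_agent` as a hypothetical helper name; it only joins the parts in the order Wikimedia describes and leaves out any part that is not supplied.

```python
def build_user_agent(client, version, contact=None, libraries=()):
    """Assemble '<client>/<version> (<contact>) <library>/<version> ...'."""
    parts = [f"{client}/{version}"]
    if contact:  # parts that are not applicable can be omitted
        parts.append(f"({contact})")
    parts.extend(f"{name}/{lib_version}" for name, lib_version in libraries)
    return " ".join(parts)


user_agent = build_user_agent(
    "CoolBot",
    "0.0",
    contact="https://example.org/coolbot/; coolbot@example.org",
    libraries=[("generic-library", "0.0")],
)
headers = {"User-Agent": user_agent}
print(headers)
```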

Inspecting HTTP Requests in Chrome

You can see the actual HTTP requests your browser makes:

Step 1: Open Chrome DevTools

  1. Right-click on any webpage → “Inspect”
  2. Or press F12 (Windows/Linux) or Cmd+Option+I (Mac)

Step 2: Go to Network Tab

  1. Click “Network” tab in DevTools
  2. Refresh the page (F5 or Cmd+R)
  3. Click on any request to see details

Inspecting HTTP Requests in Chrome (continued)

Step 3: Examine Request Details

You’ll see:

  • Headers tab: All HTTP headers (Request & Response)
  • Preview tab: Formatted response data
  • Response tab: Raw response body
  • Cookies tab: Session information

Try This Now!

  1. Open Chrome DevTools (F12)
  2. Go to Network tab
  3. Load an Open-Meteo API URL in the browser (any other API or website works too)
  4. Click on the request to see headers
  5. Look for User-Agent in Request Headers!

Setting Headers in Python

Setting custom headers

headers = {
    "Authorization": "Bearer token123",
    "User-Agent": "ME204-Project/1.0",
    "Accept": "application/json"
}
response = requests.get(url, headers=headers)

Inspecting what was actually sent

# After making the request, you can see what headers were sent
print(response.request.headers)

It could look like this:

# This shows you the actual headers that went out:
{'User-Agent': 'ME204-Project/1.0', 'Accept-Encoding': 'gzip, deflate', 
 'Accept': 'application/json', 'Connection': 'keep-alive', 
 'Authorization': 'Bearer token123'}

Understanding API “Apps”

What is an “App” in API Context?

Once we start connecting to new APIs, we will need to register our “app” with the API provider.

📱 “App” = Your Registered Application/Client

When you sign up for API access, you create an “app” which is just:

  • A name for your project (e.g., “ME204 Weather Analysis”)
  • A set of credentials (client ID, client secret, API key)
  • Permission settings for what data you can access

Think of it as registering your code project with the API service, not building a mobile app!

API Registration Process

API registration process (diagram):

  1. You (developer) → Developer portal (https://www.reddit.com/dev/api/): sign up & create an “app”
  2. Developer portal → API credentials (keys, IDs, secrets): generate credentials
  3. API credentials → Your Python code: use in requests
  4. Your Python code → Developer portal: make API calls

Hiding Secrets: Security Fundamentals

🕵️‍♂️ I will ask you to be good at hiding your API secrets!
(from unintended eyes)

Why We Hide API Credentials

What Happens If You Don’t Hide Secrets?

If you commit API keys to GitHub:

  1. Bots scan GitHub for exposed credentials within minutes
  2. Your API key gets stolen and used for malicious purposes
  3. You get charged for usage you didn’t make
  4. Your access gets revoked by the API provider
  5. Your project stops working

Real example: AWS bills for thousands of dollars from stolen keys!

Leaked API keys in the wild

The .env File Solution

Problem: You need credentials in your code, but can’t commit them to GitHub.

Solution: Store them in a .env file that stays on your computer only.

.env file (never commit this!)

# Reddit API credentials  
REDDIT_CLIENT_ID=abc123def456
REDDIT_CLIENT_SECRET=xyz789uvw012
REDDIT_USERNAME=your_username
REDDIT_PASSWORD=your_password

# OpenWeather API key
OPENWEATHER_API_KEY=1234567890abcdef

Your Python code

import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Access them safely
client_id = os.getenv("REDDIT_CLIENT_ID")
api_key = os.getenv("OPENWEATHER_API_KEY")

# Use in your requests
headers = {"X-API-Key": api_key}

Essential Security Setup

Step 1: Install python-dotenv

pip install python-dotenv

Step 2: Create .gitignore file

# Never commit these files
.env
*.env
.env.local
.env.production

# Also ignore common sensitive files
config.json
secrets.txt
credentials.csv

Step 3: Load in your Python code

import os
from dotenv import load_dotenv

load_dotenv()  # This loads variables from .env file
api_key = os.getenv("YOUR_API_KEY")  # Access them safely
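
One caveat with `os.getenv()` is that it silently returns `None` when a variable is missing, which tends to surface later as a confusing authentication error. A possible guard, sketched below, fails early with a clear message. The helper name `require_env` and the variable `DEMO_API_KEY` are invented for this example; the value is set by hand here only to stand in for what `load_dotenv()` would normally load.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable {name}; check your .env file")
    return value


# Stand-in for a value that load_dotenv() would normally provide
os.environ["DEMO_API_KEY"] = "1234567890abcdef"

api_key = require_env("DEMO_API_KEY")
print("Key loaded, ending in", api_key[-4:])  # never print the full secret
```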

Security Checklist

Before Every Commit

  1. Check your .env file exists and contains your secrets
  2. Verify .gitignore includes .env
  3. Run git status - make sure .env is NOT listed
  4. Double-check your code uses os.getenv(), not hardcoded keys
  5. Test locally first before committing anything

Remember: Once something is on GitHub, it’s potentially public forever!
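
Item 2 of the checklist can be partly automated. The sketch below is a deliberately simplified textual check, with `gitignore_mentions_env` as a hypothetical helper: it only looks for a line that is exactly `.env`, `*.env` or `/.env`, and does not implement full .gitignore semantics such as negation rules. Inside a repository, running `git check-ignore -v .env` gives the authoritative answer.

```python
def gitignore_mentions_env(gitignore_text):
    """Return True if some non-comment line is exactly '.env', '*.env' or '/.env'."""
    for raw_line in gitignore_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in {".env", "*.env", "/.env"}:
            return True
    return False


sample = """\
# Never commit these files
.env
config.json
"""
print(gitignore_mentions_env(sample))
print(gitignore_mentions_env("__pycache__/\n*.pyc\n"))
```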

Let’s Put This Into Practice: Reddit Setup

Now that you understand the security principles, let’s set up Reddit API access together.

Reddit Account & App Creation

🎯 Follow along with these steps:

  1. Create a Reddit account (if you don’t have one): reddit.com/register

  2. Create a Reddit “app” for API access: reddit.com/prefs/apps

  3. Follow these setup steps:

    • Click “Create App” or “Create Another App”
    • Name: ME204_2025_YourName (e.g., ME204_2025_JonCardoso)
    • App type: Script
    • Description: LSE ME204 Course - Reddit API Practice
    • About URL: Leave blank
    • Redirect URI: http://localhost:8000 (required, but won’t be used)

Your Reddit App Credentials

After creating your app, you’ll see a screen like this:

📝 Write down these 4 pieces of information:

  1. Reddit Username: Your Reddit username
  2. Reddit Password: Your Reddit password
  3. Client ID: The string under your app name (red box in image)
  4. Client Secret: The “secret” string (blue box in image)

Critical Security Reminder

NEVER put these credentials directly in your code or notebooks!

We’ll create a .env file in the next step to store them securely.

Moving to the Notebook

Now we’ll switch to hands-on coding!

What We’ll Do Next

  1. Open the lecture notebook: ME204_W02D03_lecture.ipynb
  2. Create a .env file with your Reddit credentials
  3. Test the authentication step by step
  4. Move the notebook to your me204-study-notes repository
  5. Verify your .gitignore protects your .env file

Let’s code together! 💻

🔚 Back to Theory: Authentication Mechanisms

[This section will be covered after the hands-on work]

The Authentication Spectrum

No Auth → API Key → Basic Auth → Bearer Token → OAuth 2.0 → Mutual TLS

Simple → Secure

Increasing Security & Complexity

Pattern 1: API Key Authentication

What You’ll Encounter

  • Public APIs: OpenWeather, NewsAPI, many government APIs
  • Simple services: When OAuth would be overkill
  • Development/testing: Many APIs provide test keys

Real-World Examples

Flow Diagram

  1. Your App → API Server: requests.get(...) with headers={'X-API-Key': 'your_key'}
  2. API Server → Your App: JSON response

Advantages

  • Inherent simplicity - developers are familiar with API keys, making adoption quick
  • Easy to implement and revoke
  • Perfect for rate limiting and usage tracking

Limitations

  • Less secure than authentication tokens - if stolen, can be used indefinitely until revoked
  • Susceptible to interception if connection isn’t encrypted
  • Lack fine-grained access controls found in more advanced methods like OAuth

Copy-Pasteable Code: API Key

If you need to work with an API that requires an API key, you will likely need code that looks like this:

import os

import requests
from dotenv import load_dotenv

# STEP 1: Load the API key securely from the .env file
load_dotenv()
api_key = os.getenv("API_KEY")

# STEP 2: Make the API request (pick the option your API documents)
# Option 1: Header approach (most common)
headers = {"X-API-Key": api_key}
response = requests.get("https://api.example.com/data", headers=headers)

# Option 2: Query parameter (only if the API requires it; the key appears in the URL)
params = {"api_key": api_key, "limit": 10}
response = requests.get("https://api.example.com/data", params=params)

Pattern 2: Basic Authentication

What You’ll Encounter

  • Legacy systems: Older enterprise APIs
  • Internal tools: Company-specific services with HTTPS
  • Simple enterprise APIs: When OAuth would be overkill

Real-World Examples

  • Greenhouse Harvest API: API to access data from hiring platform Greenhouse.

They also warn about API keys on their page!

Flow Diagram

  1. Your App → API Server: Authorization: Basic base64(user:pass)
  2. API Server → Your App: JSON response

Advantages

  • Extremely simple - no cookies, session management, or login pages required
  • Universal browser support
  • Easy to implement and debug

Limitations

  • Provides no confidentiality protection - credentials are merely Base64 encoded, not encrypted
  • Not a secure method of user authentication and doesn’t protect transmitted entities
  • Vulnerable to man-in-the-middle attacks if not used with HTTPS
  • No built-in token expiration

Copy-Pasteable Code: Basic Auth

If you encounter an API that requires Basic Authentication, you will likely need code that looks like this:

import os

import requests
from dotenv import load_dotenv
from requests.auth import HTTPBasicAuth

# STEP 1: Set up your credentials (from .env file or API docs)
load_dotenv()
username = os.getenv("API_USERNAME")
password = os.getenv("API_PASSWORD")

# STEP 2: Make the API request using HTTPBasicAuth
response = requests.get(
    "https://api.example.com/data",
    auth=HTTPBasicAuth(username, password),
)

# STEP 3: Handle the response
response.raise_for_status()  # raises an exception for 4xx and 5xx status codes
data = response.json()

Pattern 3: Bearer Token Authentication

What You’ll Encounter

  • Modern REST APIs: Most current web services
  • Cloud services: AWS, Azure, Google Cloud APIs
  • Social media APIs: After OAuth flow completion
  • Enterprise APIs: Microsoft Graph, Salesforce, etc.

Flow Diagram (Two steps)

  • Step 1 (get a token): Your App → Auth Server: login request; Auth Server → Your App: access token
  • Step 2 (use the token): Your App → Resource Server: Authorization: Bearer <token>; Resource Server → Your App: JSON response

Advantages

  • Stateless authentication - no need to store user sessions on the server
  • Compact and fast - efficiently transmitted in HTTP headers, ideal for APIs
  • Can be used for both authentication and authorization with embedded claims

Limitations

  • Bearer tokens are often JWTs (JSON Web Tokens); once issued, a JWT cannot easily be revoked before it expires
  • A JWT’s contents are readable by anyone who holds it - its security comes from the cryptographic signature, not from secrecy
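
The readability of token contents can be demonstrated in a few lines. The sketch below assumes the token is a JWT, that is, three base64url segments separated by dots. It builds a toy token with a dummy signature (so it is not a valid token for any service) and then reads the claims back without any secret. The helper names and the claims are invented for illustration.

```python
import base64
import json


def b64url_encode(raw):
    """Base64url-encode bytes without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jwt_payload(token):
    """Read the claims from a JWT without verifying its signature."""
    _header, payload_b64, _signature = token.split(".")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Build a toy token; the signature is a dummy string, so this token is NOT valid
claims = {"sub": "student42", "scope": "read"}
toy_token = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8")),
    b64url_encode(json.dumps(claims).encode("utf-8")),
    "dummy-signature",
])

decoded_claims = decode_jwt_payload(toy_token)
print(decoded_claims)
```

Decoding is not the same as trusting: a server must still verify the signature before relying on the claims.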

Copy-Pasteable Code: Bearer Token

If you encounter an API that requires Bearer Token authentication, you will likely need code that looks like this:

import os
import requests
from dotenv import load_dotenv

load_dotenv()  # credentials come from .env, never hardcoded
# STEP 1: Get the token (process varies by service)
auth_response = requests.post("https://auth.example.com/token", data={
    "username": os.getenv("API_USERNAME"),
    "password": os.getenv("API_PASSWORD"),
})
auth_response.raise_for_status()
token = auth_response.json()["access_token"]
# STEP 2: Use the token in your API requests
headers = {"Authorization": f"Bearer {token}"}
response = requests.get("https://api.example.com/data", headers=headers)
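
Tokens obtained in STEP 1 usually expire, so longer scripts tend to cache the token and fetch a new one only when needed. The sketch below is one possible way to do that, not a standard API: `TokenCache` is a hypothetical class, and it assumes the token response contains `access_token` and `expires_in` (seconds), which is common for OAuth 2.0 services but should be checked against the API's documentation. A fake fetch function and a fake clock keep the demo offline.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, margin=30):
        self._fetch_token = fetch_token  # callable returning a token response dict
        self._clock = clock
        self._margin = margin            # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one if none is cached or it expired."""
        if self._token is None or self._clock() >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = self._clock() + payload["expires_in"] - self._margin
        return self._token

    def headers(self):
        return {"Authorization": f"Bearer {self.get()}"}


# Demo with a fake token endpoint and a fake clock (no network involved)
calls = []


def fake_fetch():
    calls.append(1)
    return {"access_token": f"token{len(calls)}", "expires_in": 3600}


now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.headers()
second = cache.headers()   # served from the cache
now[0] = 4000.0            # pretend more than an hour has passed
third = cache.headers()    # triggers a new fetch
print(first, second, third)
```

In real use, `fetch_token` would wrap the `requests.post` call from STEP 1.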

Pattern 4: OAuth 2.0 Authentication

What You’ll Encounter

  • Major platforms: Google, GitHub, Reddit, Facebook APIs
  • When user consent required: Accessing personal data
  • Enterprise integrations: Connecting to company systems
  • Most social media APIs: Twitter, LinkedIn, Instagram

Real-World Examples

  • Google APIs: OAuth 2.0 for accessing user data like Google Drive
  • Facebook/Meta Platform APIs: Social login and data access
  • Reddit API: Comprehensive data gathering capabilities
  • GitHub API: Repository and user management

OAuth 2.0 Flow Diagram

  1. Your App → Authorization Server (e.g., Reddit): request authorization
  2. Authorization Server → User: login & consent
  3. User → Authorization Server: grant permission
  4. Authorization Server → Your App: authorization code
  5. Your App → Authorization Server: exchange code for token
  6. Authorization Server → Your App: access token
  7. Your App → Resource Server (API): API request with token
  8. Resource Server (API) → Your App: protected resource

Advantages

  • Enables limited access delegation without sharing passwords
  • Users can revoke access to individual applications without changing passwords
  • Fine-grained permission scopes
  • Industry standard with broad ecosystem support

Limitations

  • Steep learning curve for developers
  • Complex implementation compared to simpler methods
  • Requires careful handling of authorization codes and tokens
  • Potential for security vulnerabilities if incorrectly implemented

The Four Grant Types

  1. Authorization Code: Most secure, for web applications
  2. Client Credentials: For server-to-server communication
  3. Device Code: For devices without browsers
  4. Refresh Token: For obtaining new access tokens
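
For the Authorization Code grant, the first and fourth steps of the flow are plain URL handling. The sketch below builds the authorization URL and extracts the code from the redirect, using the standard OAuth 2.0 parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`). The endpoint `https://auth.example.com/authorize`, the client ID, the scope and the simulated callback are all placeholders; real services document their own endpoints and scopes.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_URL = "https://auth.example.com/authorize"  # placeholder endpoint


def build_authorization_url(client_id, redirect_uri, scope, state):
    """Step 1: the URL the user visits to log in and grant permission."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,           # random value to match the callback to this request
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def extract_code(callback_url, expected_state):
    """Step 4: pull the authorization code out of the redirect, checking state first."""
    query = parse_qs(urlsplit(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: the callback does not belong to this request")
    return query["code"][0]


state = secrets.token_urlsafe(16)
auth_url = build_authorization_url("my_client_id", "http://localhost:8000", "read", state)
print(auth_url)

# Simulated redirect back from the authorization server
callback = f"http://localhost:8000/?code=abc123&state={state}"
code = extract_code(callback, state)
print(code)
```

The code would then be exchanged for an access token with a POST request to the service's token endpoint.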

Secure Credential Management

The .env Pattern

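# .env file (never commit this!)
API_KEY=your_secret_key_here
CLIENT_ID=your_client_id
CLIENT_SECRET=your_client_secret
# Prefixed names: the operating system may already define USERNAME,
# and load_dotenv() does not override existing variables by default
API_USERNAME=your_username
API_PASSWORD=your_password

# Python code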
import os
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("API_KEY")
client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")

.gitignore Template

# Environment variables
.env
.env.local

# API keys and secrets
*.key
*.pem

# Python cache
__pycache__/
*.pyc

Error Handling Essentials

Always Check Response Status

response = requests.get(url, headers=headers)

# This will raise an exception for 4xx and 5xx status codes
response.raise_for_status()

data = response.json()

Handle Common Errors

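import requests


def fetch_json(url, headers):
    """Return the parsed JSON body, or None if the request fails."""
    try:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
        return None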
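
When a request does fail, the status code usually points to the cause. The sketch below maps a few codes that commonly appear with authenticated APIs to a suggested next step, using the standard library's `http.HTTPStatus`. The helper `describe_status` and the hint texts are illustrative, not part of any library.

```python
from http import HTTPStatus

# Suggested next steps for statuses that often show up with authenticated APIs
HINTS = {
    HTTPStatus.UNAUTHORIZED: "credentials missing or invalid; check your key or token",
    HTTPStatus.FORBIDDEN: "credentials accepted but not allowed to access this resource",
    HTTPStatus.NOT_FOUND: "check the URL and resource ID",
    HTTPStatus.TOO_MANY_REQUESTS: "rate limit hit; wait before retrying",
}


def describe_status(code):
    """Return a short human-readable explanation for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if status in HINTS:
        hint = HINTS[status]
    elif 200 <= code < 300:
        hint = "success"
    elif 400 <= code < 500:
        hint = "client error; the request needs changing"
    elif code >= 500:
        hint = "server error; usually worth retrying later"
    else:
        hint = "informational or redirect response"
    return f"{code} {status.phrase}: {hint}"


for code in (200, 401, 403, 429, 500):
    print(describe_status(code))
```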

Common Header Patterns

API Documentation Quick Reference

Service Type    Header Pattern      Example
API Key         X-API-Key           {"X-API-Key": "abc123"}
Bearer Token    Authorization       {"Authorization": "Bearer token123"}
Basic Auth      Authorization       {"Authorization": "Basic dXNlcjpwYXNz"}
Custom          Service-specific    {"X-RapidAPI-Key": "key123"}

User-Agent Requirements

Many APIs require a User-Agent header:

headers = {
    "Authorization": f"Bearer {token}",
    "User-Agent": "MyApp/1.0 (contact@example.com)"
}

Remember These Key Points

Essential Patterns

  • Store credentials in .env files
  • Always check response.raise_for_status()
  • Use headers dictionary for tokens
  • Handle pagination for large datasets
  • Test manually first, then automate
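
Pagination is mentioned above without an example, so here is a minimal sketch of the cursor-based variant. It assumes a hypothetical `fetch_page(cursor)` function that returns a page of items plus the cursor for the next page (or `None` when there are no more); real APIs name and shape this differently, so check the documentation. A fake three-page API stands in for the HTTP calls.

```python
def paginate(fetch_page):
    """Yield items from every page; fetch_page(cursor) returns (items, next_cursor)."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:  # no further pages
            break


# Fake three-page API keyed by cursor, standing in for real HTTP calls
FAKE_PAGES = {
    None: (["post1", "post2"], "page2"),
    "page2": (["post3", "post4"], "page3"),
    "page3": (["post5"], None),
}


def fake_fetch_page(cursor):
    return FAKE_PAGES[cursor]


all_posts = list(paginate(fake_fetch_page))
print(all_posts)
```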

Resources & References

Remember: These slides are your reference toolkit. Bookmark them! 🔖

LSE Summer School 2025 | ME204 Week 02 Day 03