💻 Week 04 Lab
Building APIs with FastAPI and Pydantic
By the end of this lab, you should be able to: i) Create Pydantic models with validation constraints, ii) Build FastAPI endpoints that serve structured data, iii) Filter API responses using query parameters, iv) Document your API effectively using FastAPI’s built-in tools.
This lab builds on the 🖥️ Week 04 Lecture where you explored how to validate products between Waitrose and OpenFoodFacts. Now you’ll build an API that serves the Waitrose data using FastAPI and Pydantic.
📋 Preparation
- If you are on Nuvolos, you should have access to the following files in your week04/ folder: waitrose.jsonl (scraped Waitrose products) and off_sample_responses.json (OpenFoodFacts responses). If you are not coding on Nuvolos, you can download the files from the 🖥️ Week 04 Lecture page.
- If you are on Nuvolos, open the regular VS Code application (not the Chromium + Selenium version). The Selenium app is slower and you don’t need a browser for today’s work.
- Create a new folder called week04/ for today’s work (not in your Problem Set 1 repository)
Update Your Environment
You need to add FastAPI and its dependencies to your food conda environment. Open the environment.yml file in your week04/ folder (or your project root) and add the following under dependencies:
# --- API development ---
- fastapi
- uvicorn
- pydantic
Then update the environment:
conda env update -f environment.yml --prune
conda activate food
The following environment.yml file has been tested on Nuvolos and on Windows.
name: food
channels:
  - conda-forge
  - defaults
dependencies:
  # Core
  - python>=3.12,<3.13
  - ipykernel>=7.0,<8.0
  - ipython>=8.20,<10.0
  - ipywidgets>=8.0,<9.0
  - tqdm>=4.66,<5.0
  - pip>=23.0
  # Data collection (Part A: Scraper)
  - requests>=2.32,<3.0
  - scrapy>=2.11,<2.15
  - selenium>=4.20,<5.0
  - pip:
      - webdriver-manager
  # Data manipulation (Both)
  - numpy>=2.0,<3.0
  - pandas>=2.2,<3.0
  # API development (Part B: API)
  - fastapi>=0.110,<1.0
  - pydantic>=2.8,<3.0
  - uvicorn>=0.30,<1.0
  - httpx>=0.27,<1.0
  # Testing (Both)
  - pytest>=8.0,<9.0
🛣️ Lab Roadmap
| Part | Activity Type | Focus | Time | Outcome |
|---|---|---|---|---|
| Intro | 👤 Teaching Moment | Schema concept recap | 10 min | Understand how schemas connect lecture to lab |
| Section 1 | 👤 Teaching (10 min) ⏸️ Action Points (15 min) | From Data to Schema | 25 min | Working Pydantic model with validation |
| Section 2 | 👤 Teaching Moment | Your First Endpoint | 10 min | Running FastAPI app with barcode lookup |
| Section 3 | ⏸️ Action Points | Expanding the API | 30 min | Multiple endpoints with filtering and documentation |
| Section 4 | 👤 Teaching Moment | Connecting to Problem Set 1 | 10 min | Understand how to apply these patterns to Part B |
| Bonus | ⏸️ Self-paced | Testing with pytest (optional) | Remaining time | Automated tests for your API |
👉 NOTE: Whenever you see a 👤 TEACHING MOMENT, this means your class teacher deserves your full attention!
Introduction: Schema Concept Recap (10 mins)
This section is a TEACHING MOMENT
Your class teacher will review the schema concept from Monday’s lecture and explain how it connects to today’s work.
In Monday’s lecture, you saw the schema concept appear in three contexts:
- OpenFoodFacts API docs describe the structure of product data (50+ fields returned per product)
- Scrapy’s items.py defines the structure of your scraped data as key-value pairs, like a dictionary whose keys are fixed in advance: using a key that was not declared raises an error. Read more in Scrapy’s documentation on items.
- Pydantic models define the structure of the data your API serves (with type validation). Read more in Pydantic’s documentation on models.
Today you’ll work with the Waitrose data from lecture and build an API that serves it. The waitrose.jsonl file contains products with fields like name, category, barcode, url, and food_type.
Your task: create a Pydantic model that matches this structure, then build FastAPI endpoints that serve the data.
⚠️ NOTE: Even though you already have a pairing with your partner for your ✍️ Problem Set 1, do not work on your partner’s repository during this lab. Otherwise, things will get confusing.
Throughout this lab, work inside your week04/ folder only.
Section 1: From Data to Schema (25 mins)
Part A: Inspect the Data (10 mins)
This section is a TEACHING MOMENT
Your class teacher will load the Waitrose data and demonstrate how to inspect its structure before designing the Pydantic model.
Your class teacher will show you how to:
- Load waitrose.jsonl into a pandas DataFrame
- Examine the data types and structure of each field
- Identify which fields are required vs optional
- Notice that barcode and food_type are lists (one product can have multiple values)
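The inspection steps above can be sketched roughly as follows. To keep the snippet runnable anywhere, it uses two made-up records in place of waitrose.jsonl; the values (and the contents of food_type) are illustrative assumptions, not rows from the real file. With the real data you would pass the file path to pd.read_json instead of the in-memory buffer.

```python
import io

import pandas as pd

# Two made-up records standing in for waitrose.jsonl (one JSON object per line).
# The second record has no food_type, to mimic an optional field.
sample_jsonl = "\n".join([
    '{"name": "Sourdough Loaf", "category": "bakery", "url": "https://example.com/1", '
    '"barcode": ["0000000000001", "0000000000002"], "food_type": [{"label": "bread"}]}',
    '{"name": "Frozen Peas", "category": "frozen", "url": "https://example.com/2", '
    '"barcode": ["0000000000003"]}',
])

# With the real file: df = pd.read_json("waitrose.jsonl", lines=True)
df = pd.read_json(io.StringIO(sample_jsonl), lines=True)

# Column dtypes: list-valued columns show up as generic "object"
print(df.dtypes)

# Look inside each column to see the actual Python type of its cells
cell_types = {col: df[col].map(type).unique().tolist() for col in df.columns}
print(cell_types)

# Fields with missing values are candidates for "optional" in the schema
missing_counts = df.isna().sum()
print(missing_counts)

# How many barcodes does each product carry?
barcodes_per_product = df["barcode"].map(len)
print(barcodes_per_product.tolist())
```

A column that reports missing values here is a hint that the matching model field should accept None.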
Part B: Build the Pydantic Model (15 mins)
🎯 ACTION POINTS
Create a new file week04/models.py and implement the following:
Import the necessary Pydantic components:
from pydantic import BaseModel, Field
Create a WaitroseProduct model that matches the data structure:
class WaitroseProduct(BaseModel):
    name: str
    category: str
    url: str
    # At least one barcode required. Pydantic will raise a `ValidationError` if you pass an empty list.
    barcode: list[str] = Field(min_length=1)
    # Complex nested structure, optional
    food_type: list[dict] | None = None
What if you wanted to allow an empty barcode list? You can do so by removing the min_length constraint. Update the barcode line to:
barcode: list[str] = Field(default_factory=list)
Note that default_factory=list also makes the field optional: if barcode is omitted altogether, it defaults to an empty list. To keep the field required while still accepting an empty list, write barcode: list[str] with no default.
What does list[dict] | None = None mean?
This syntax combines two Python features you may not have seen before:
The | (pipe) operator for types means “or”. Writing list[dict] | None tells Python (and Pydantic) that food_type can be either a list[dict] or None. This is called a union type. Before Python 3.10, you had to write Optional[list[dict]] instead, which means the same thing but is harder to read.
The = None at the end sets the default value. If you create a WaitroseProduct without specifying food_type, it defaults to None rather than raising an error.
Putting it together: food_type: list[dict] | None = None means “this field accepts a list of dictionaries or nothing, and defaults to nothing.”
You’ll see the same pattern used later for query parameters in FastAPI:
def get_products(category: str | None = None):
This tells FastAPI that category is an optional string. If the user doesn’t provide it in the URL, it defaults to None, and your code can check for that.
Test your model by creating valid and invalid instances.
For this, either create a Jupyter Notebook or open ipython in your terminal (make sure you are inside the week04/ folder).
Then, import your model class:
from models import WaitroseProduct
Then, the following code should work:
# This should work
product = WaitroseProduct(
    name="Waitrose Essential Bread",
    category="Bakery",
    url="https://www.waitrose.com/...",
    barcode=["5000128931830"]
)
print(product)
But this should fail:
# This should fail: empty barcode list violates min_length constraint
bad_product = WaitroseProduct(
    name="Test Product",
    category="Bakery",
    url="https://...",
    barcode=[]  # Error: must have at least 1 item
)
Do you see why?
🗒️ NOTE: Pydantic validation happens automatically when you try to create an instance. If the data doesn’t match the schema, Pydantic raises a clear error message. Visit Pydantic’s documentation for a comprehensive overview of Pydantic models and validation.
Experiment with the constraints:
- What happens if you omit the name field?
- What happens if you provide barcode as a single string instead of a list?
- What happens if you add an extra field that isn’t in the model?
While you could return a plain dictionary or a list of dictionaries from your API, it is good practice to return Pydantic model instances. Pydantic validates the data automatically, FastAPI serialises it to JSON automatically, and the users of your API can see your schema in the generated documentation.
Section 2: Your First Endpoint (10 mins)
This section is a TEACHING MOMENT
Your class teacher will walk you through creating a FastAPI application step by step.
Step 1: Create main.py
Create a new file called main.py inside your week04/ folder. Start by adding these imports and the app setup:
import os
import pandas as pd
from fastapi import FastAPI, HTTPException
from models import WaitroseProduct
def __is_running_on_nuvolos():
    """
    If we are running this script from Nuvolos Cloud,
    there will be an environment variable called HOSTNAME
    which starts with 'nv-'
    """
    hostname = os.getenv("HOSTNAME")
    return hostname is not None and hostname.startswith("nv-")

if __is_running_on_nuvolos():
    app = FastAPI(root_path="/proxy/8000/")
else:
    app = FastAPI()
What’s happening here? FastAPI is a Python framework for building web APIs. You write Python functions and FastAPI turns them into HTTP endpoints that any application can call. The line app = FastAPI() creates an application instance that will hold all your endpoints.
Why the Nuvolos check?
Nuvolos routes web traffic through a proxy, which changes the base URL of your API. Instead of https://some-ip:8000/, it becomes https://some-ip/proxy/8000/. Setting root_path="/proxy/8000/" tells FastAPI about this so the interactive documentation page works correctly.
If you’re running on your local machine, none of this matters and app = FastAPI() is all you need.
Step 2: Load the data
Add the following below the app setup. This loads the Waitrose data once when the application starts, not on every request:
df = pd.read_json("waitrose.jsonl", lines=True)
products_data = df.to_dict(orient="records")
print(f"✅ Loaded {len(products_data)} products")
At this point, your main.py has an app and data, but no endpoints. An endpoint is a URL path that your API responds to. Right now, if someone tried to visit your API, there would be nothing to see.
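One thing to watch for, stated here as an assumption about the data rather than a confirmed property of waitrose.jsonl: if some products lack a field such as food_type, pandas fills the gap with NaN, and to_dict(orient="records") passes that NaN through as a float. A float is neither a list nor None, so the WaitroseProduct model would reject it. A sketch of a small clean-up step (clean_record is a hypothetical helper name):

```python
import math


def clean_record(record: dict) -> dict:
    """Replace float NaN values (pandas' marker for missing data) with None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


# A made-up record shaped like one row of df.to_dict(orient="records")
# in which food_type was missing from the source file
raw = {
    "name": "Frozen Peas",
    "category": "frozen",
    "url": "https://example.com/2",
    "barcode": ["0000000000003"],
    "food_type": float("nan"),
}

cleaned = clean_record(raw)
print(cleaned["food_type"])

# Applied to the whole dataset it would look like:
# products_data = [clean_record(p) for p in df.to_dict(orient="records")]
```

If your data has no missing values, this step changes nothing and can be left out.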
Step 3: Add your first endpoint
Let’s create an endpoint that returns a single product by its barcode. Add this to main.py:
@app.get("/product/{barcode}")
def get_product(barcode: str) -> WaitroseProduct:
    # Keep only the rows whose barcode list contains the requested barcode
    matched = df[df["barcode"].apply(lambda codes: barcode in codes)]
    if matched.empty:
        raise HTTPException(status_code=404, detail="Product not found")
    product = matched.iloc[0].to_dict()
    return WaitroseProduct(**product)
The @app.get("/product/{barcode}") line is a decorator. In general, a decorator is a way to wrap a function and change or extend how it is used; here, it registers get_product with the app.
This particular decorator tells FastAPI: “When someone sends a GET request to /product/{barcode}, run this function.” The {barcode} part is a path parameter, meaning whatever value appears in that position in the URL gets passed to your function as the barcode argument.
So if someone visits /product/5010003065604, FastAPI calls get_product(barcode="5010003065604").
Step 4: Run it!
Your main.py now has an app, data, and an endpoint. But the code is just a Python file sitting on disk. To actually start serving requests, you need a server. That’s what uvicorn does: it listens for incoming HTTP requests and routes them to your FastAPI code.
Open a terminal, make sure you’re in the week04/ folder with your food environment active, and run:
uvicorn main:app --reload
What do the arguments to uvicorn mean?
uvicorn main:app --reload
- main = the Python file (main.py)
- app = the FastAPI instance inside that file (the variable you created with app = FastAPI())
- --reload = restart automatically when you save changes (useful during development, optional but recommended)
You should see output in your terminal telling you the server is running. Now open the Swagger UI documentation page:
- On Nuvolos: the URL shown in your terminal output, with /proxy/8000/docs at the end
- On your local machine: http://localhost:8000/docs
This is FastAPI’s auto-generated interactive documentation. You can see your /product/{barcode} endpoint listed. Click “Try it out”, type a barcode (try 5010003065604), and click “Execute” to see the response. FastAPI serialised your Pydantic model to JSON automatically.
Path parameters vs query parameters
In the lecture, we looked at how parameters appear in URLs. FastAPI supports two kinds:
Path parameters are part of the URL path itself, like the barcode endpoint you just built:
GET /product/5010003065604 → barcode = "5010003065604"
GET /product/5065009224333 → barcode = "5065009224333"
In FastAPI, you define them with curly braces in the path:
@app.get("/product/{barcode}")
def get_product(barcode: str) -> WaitroseProduct:
    ...
Query parameters come after a ? in the URL:
GET /products?category=bakery → category = "bakery"
GET /products?category=bakery&name_contains=sourdough → category = "bakery", name_contains = "sourdough"
In FastAPI, any function parameter that is not in the path is automatically treated as a query parameter:
@app.get("/products")
def get_products(category: str | None = None) -> list[WaitroseProduct]:
    ...
You’ll add query parameters in Section 3.
📖 Read more: FastAPI Path Parameters and FastAPI Query Parameters
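If the difference between the two kinds of parameter still feels abstract, the standard library can take a URL apart for you. This sketch is independent of FastAPI and only shows where each piece of the URL lives; the localhost URLs mirror the examples above.

```python
from urllib.parse import parse_qs, urlsplit

# A URL with query parameters
url = "http://localhost:8000/products?category=bakery&name_contains=sourdough"
parts = urlsplit(url)
print(parts.path)
print(parts.query)

# parse_qs returns a list per key because a key may repeat in a query string
query_params = {key: values[0] for key, values in parse_qs(parts.query).items()}
print(query_params)

# A URL with a path parameter: the value is the last segment of the path
path_url = urlsplit("http://localhost:8000/product/5010003065604")
barcode = path_url.path.rsplit("/", 1)[-1]
print(barcode, repr(path_url.query))
```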
Section 3: Expanding the API (30 mins)
🎯 ACTION POINTS
Your class teacher will be walking around to help you troubleshoot. Work through the following at your own pace.
Part A: Add a /products Endpoint
Your /product/{barcode} endpoint is useful when you know the exact barcode, but what if you want to browse products? Let’s add a /products endpoint that returns multiple products.
Start with a simple version that returns the first 10 products. Add this to your main.py:
@app.get("/products")
def get_products() -> list[WaitroseProduct]:
    return [WaitroseProduct(**p) for p in products_data[:10]]
Save the file (uvicorn should restart automatically), then test it in /docs. The [:10] slice is intentional: returning all 2,500+ products at once makes the response slow and the Swagger UI unresponsive. Start small while developing.
Now make it more useful by adding a category query parameter. Query parameters let users filter results without needing a separate endpoint for every combination. When someone visits /products?category=bakery, FastAPI extracts category = "bakery" from the URL and passes it to your function. Replace your endpoint with:
@app.get("/products")
def get_products(category: str | None = None) -> list[WaitroseProduct]:
    results = products_data
    if category:
        results = [p for p in results if p.get("category") == category]
    return [WaitroseProduct(**p) for p in results[:10]]
Test it in /docs: try with no parameters (returns first 10 from any category), then with category=bakery, then category=frozen. Observe how the results change.
Next, add a name_contains parameter for case-insensitive substring search:
@app.get("/products")
def get_products(
    category: str | None = None,
    name_contains: str | None = None
) -> list[WaitroseProduct]:
    results = products_data
    if category:
        results = [p for p in results if p.get("category") == category]
    if name_contains:
        results = [p for p in results if name_contains.lower() in p["name"].lower()]
    return [WaitroseProduct(**p) for p in results[:10]]
Notice how each filter narrows down results further. You can combine them: /products?category=bakery&name_contains=sourdough filters by category first, then searches within those results. Test each filter in /docs and try combining them together.
Part B: Improve API Documentation
FastAPI auto-generates documentation from your code, but you can make it more informative by adding descriptions directly to your endpoints. Try updating your /products endpoint with a summary and description:
@app.get(
    "/products",
    response_model=list[WaitroseProduct],
    summary="Get all Waitrose products",
    description="Returns a list of products, optionally filtered by category or name"
)
def get_products(
    category: str | None = None,
    name_contains: str | None = None
) -> list[WaitroseProduct]:
    # ... implementation
    pass
Section 4: Connecting to Problem Set 1 (10 mins)
This section is a TEACHING MOMENT
Your class teacher will explain how today’s practice transfers to ✍️ Problem Set 1 Part B.
🎯 ACTION POINTS
Think about how you’ll apply these patterns to ✍️ Problem Set 1 Part B:
- You’ll receive your partner’s scraper and data (ideally, you shouldn’t need to re-run their scraper, but just grab their data directly)
- You’ll inspect their data structure (it might differ slightly from today’s example or from the one you created yourself in your own repository)
- You’ll create Pydantic models matching their schema
- You’ll build FastAPI endpoints serving their data
The OpenFoodFacts enrichment (NOVA classification) can happen in several places. You could:
- Add a Scrapy pipeline: Enrich during scraping, before data is saved
- Add a post-scraping script: Load all the scraped data, enrich it, save the enriched version
- Enrich inside FastAPI: Enrich on the fly when serving data (very slow, but the output of your API is always up to date)
Choose based on the architecture of the code you inherited (or how it would look in your own repo). There’s no single correct answer. Document your choice in your README.
Bonus Section: Testing with pytest (Optional)
If you finish early, add automated tests for your API.
Create week04/test_main.py:
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_get_all_products():
    """Test that /products returns a list of products."""
    response = client.get("/products")
    assert response.status_code == 200
    products = response.json()
    assert len(products) > 0
    assert "name" in products[0]
    assert "category" in products[0]

def test_filter_by_category():
    """Test that category filtering works correctly."""
    response = client.get("/products?category=bakery")
    assert response.status_code == 200
    products = response.json()
    # All returned products should be from bakery category
    for p in products:
        assert p["category"] == "bakery"

def test_invalid_category_returns_empty():
    """Test that filtering by nonexistent category returns empty list."""
    response = client.get("/products?category=NonexistentCategory")
    assert response.status_code == 200
    products = response.json()
    assert len(products) == 0
Run your tests:
pytest test_main.py -v