💻 Week 04 Lab

Building APIs with FastAPI and Pydantic

nuvolos

fastapi

pydantic

api-development

Build a working API using the Waitrose data from lecture.

Author

Dr Jon Cardoso-Silva

Published

10 May 2026

Modified

10 May 2026

🥅 Learning Goals

By the end of this lab, you should be able to: i) Create Pydantic models with validation constraints, ii) Build FastAPI endpoints that serve structured data, iii) Filter API responses using query parameters, iv) Document your API effectively using FastAPI’s built-in tools.

This lab builds on the 🖥️ Week 04 Lecture where you explored how to validate products between Waitrose and OpenFoodFacts. Now you’ll build an API that serves the Waitrose data using FastAPI and Pydantic.

📋 Preparation

If you are on Nuvolos, you should have access to the following files in your week04/ folder:
- waitrose.jsonl (scraped Waitrose products)
- off_sample_responses.json (OpenFoodFacts responses)
  
If you are not coding on Nuvolos, you can download the files from the 🖥️ Week 04 Lecture page.

If you are on Nuvolos, open the regular VS Code application (not the Chromium + Selenium version). The Selenium app is slower and you don’t need a browser for today’s work.

Create a new folder called week04/ for today’s work (not in your Problem Set 1 repository).

Update Your Environment

You need to add FastAPI and its dependencies to your food conda environment. Open the environment.yml file in your week04/ folder (or your project root) and add the following under dependencies:

  # --- API development ---
  - fastapi
  - uvicorn
  - pydantic

Then update the environment:

conda env update -f environment.yml --prune
conda activate food

Full environment.yml file for reference:

The following environment.yml file has been tested on Nuvolos and on Windows.

name: food
channels:
  - conda-forge
  - defaults
dependencies:
  # Core
  - python>=3.12,<3.13
  - ipykernel>=7.0,<8.0
  - ipython>=8.20,<10.0
  - ipywidgets>=8.0,<9.0
  - tqdm>=4.66,<5.0
  - pip>=23.0
  
  # Data collection (Part A: Scraper)
  - requests>=2.32,<3.0
  - scrapy>=2.11,<2.15
  - selenium>=4.20,<5.0
  - pip:
    - webdriver-manager
  
  # Data manipulation (Both)
  - numpy>=2.0,<3.0
  - pandas>=2.2,<3.0
  
  # API development (Part B: API)
  - fastapi>=0.110,<1.0
  - pydantic>=2.8,<3.0
  - uvicorn>=0.30,<1.0
  - httpx>=0.27,<1.0
  
  # Testing (Both)
  - pytest>=8.0,<9.0

🛣️ Lab Roadmap

How the W04 lab will be structured
Part	Activity Type	Focus	Time	Outcome
Intro	👤 Teaching Moment	Schema concept recap	10 min	Understand how schemas connect lecture to lab
Section 1	👤 Teaching (10 min) ⏸️ Action Points (15 min)	From Data to Schema	25 min	Working Pydantic model with validation
Section 2	👤 Teaching Moment	Your First Endpoint	10 min	Running FastAPI app with barcode lookup
Section 3	⏸️ Action Points	Expanding the API	30 min	Multiple endpoints with filtering and documentation
Section 4	👤 Teaching Moment	Connecting to Problem Set 1	10 min	Understand how to apply these patterns to Part B
Bonus	⏸️ Self-paced	Testing with pytest (optional)	Remaining time	Automated tests for your API

👉 NOTE: Whenever you see a 👤 TEACHING MOMENT, this means your class teacher deserves your full attention!

Introduction: Schema Concept Recap (10 mins)

This section is a TEACHING MOMENT

Your class teacher will review the schema concept from Monday’s lecture and explain how it connects to today’s work.

In Monday’s lecture, you saw the schema concept appear in three contexts:

OpenFoodFacts API docs describe the structure of product data (50+ fields returned per product)
Scrapy’s items.py defines the structure of your scraped data as a set of key-value pairs, like a dictionary but one where the keys are pre-determined: using a key that was not declared raises an error. Read more about Scrapy’s items.py here.
Pydantic models define the structure of data your API serves (with type validation). Read more about Pydantic models here.

Today you’ll work with the Waitrose data from lecture and build an API that serves it. The waitrose.jsonl file contains products with fields like name, category, barcode, url, and food_type.

Your task: create a Pydantic model that matches this structure, then build FastAPI endpoints that serve the data.

⚠️ NOTE: Even though you already have a pairing with your partner for your ✍️ Problem Set 1, do not work on your partner’s repository during this lab. Otherwise, things will get confusing.

Throughout this lab, work inside your week04/ folder only.

Section 1: From Data to Schema (25 mins)

Part A: Inspect the Data (10 mins)

This section is a TEACHING MOMENT

Your class teacher will load the Waitrose data and demonstrate how to inspect its structure before designing the Pydantic model.

Your class teacher will show you how to:

Load waitrose.jsonl into a pandas DataFrame
Examine the data types and structure of each field
Identify which fields are required vs optional
Notice that barcode and food_type are lists (one product can have multiple values)
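
The inspection steps above might look something like the following sketch. It uses a two-line in-memory sample instead of the real waitrose.jsonl so that it runs anywhere. The sample products, barcodes and food_type contents are invented for illustration and are not taken from the dataset.

```python
from io import StringIO

import pandas as pd

# Two made-up records standing in for waitrose.jsonl (illustrative values only)
sample_jsonl = StringIO(
    '{"name": "Sample Loaf", "category": "bakery", "url": "https://example.com/1", '
    '"barcode": ["0000000000001"], "food_type": [{"label": "bread"}]}\n'
    '{"name": "Sample Peas", "category": "frozen", "url": "https://example.com/2", '
    '"barcode": ["0000000000002", "0000000000003"]}\n'
)

# lines=True tells pandas that each line is a separate JSON object
df = pd.read_json(sample_jsonl, lines=True)

# Column dtypes: list-valued columns show up as the generic "object" dtype
print(df.dtypes)

# Look at the Python type stored in each cell of the first row
first_row = df.iloc[0]
cell_types = {column: type(value).__name__ for column, value in first_row.items()}
print(cell_types)

# Count missing values per column to spot fields that are optional
missing_per_column = df.isna().sum()
print(missing_per_column)
```

With the real file, you would pass "waitrose.jsonl" to pd.read_json instead of the in-memory sample.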

Part B: Build the Pydantic Model (15 mins)

🎯 ACTION POINTS

Create a new file week04/models.py and implement the following:

Import the necessary Pydantic components:
```
from pydantic import BaseModel, Field
```

Create a WaitroseProduct model that matches the data structure:

class WaitroseProduct(BaseModel):
    name: str
    category: str
    url: str
    barcode: list[str] = Field(min_length=1)   # At least one barcode required. Pydantic will raise a `ValidationError` if you pass an empty list.
    food_type: list[dict] | None = None        # Complex nested structure, optional

What if you wanted to allow an empty barcode list? Removing the min_length constraint is enough for that. If you also want the field to default to an empty list when no barcode is provided, update the barcode line to:

barcode: list[str] = Field(default_factory=list)

What does list[dict] | None = None mean?

This syntax combines two Python features you may not have seen before:

The | (pipe) operator for types means “or”. Writing list[dict] | None tells Python (and Pydantic) that food_type can be either a list[dict] or None. This is called a union type. Before Python 3.10, you had to write Optional[list[dict]] instead, which means the same thing but is harder to read.

The = None at the end sets the default value. If you create a WaitroseProduct without specifying food_type, it defaults to None rather than raising an error.

Putting it together: food_type: list[dict] | None = None means “this field accepts a list of dictionaries or nothing, and defaults to nothing.”

You’ll see the same pattern used later for query parameters in FastAPI:

def get_products(category: str | None = None):

This tells FastAPI that category is an optional string. If the user doesn’t provide it in the URL, it defaults to None, and your code can check for that.

Test your model by creating valid and invalid instances.

For this, either create a Jupyter Notebook or open ipython in your terminal (make sure you are inside the week04/ folder).

Then, import your model class:
```
from models import WaitroseProduct
```
Then, the following code should work:
```
# This should work
product = WaitroseProduct(
    name="Waitrose Essential Bread",
    category="Bakery",
    url="https://www.waitrose.com/...",
    barcode=["5000128931830"]
)
print(product)
```
But this should fail:
```
# This should fail: empty barcode list violates min_length constraint
bad_product = WaitroseProduct(
    name="Test Product",
    category="Bakery",
    url="https://...",
    barcode=[]  # Error: must have at least 1 item
)
```
Do you see why?

🗒️ NOTE: Pydantic validation happens automatically when you try to create an instance. If the data doesn’t match the schema, Pydantic raises a clear error message. Visit Pydantic’s documentation for a comprehensive overview of Pydantic models and validation.

Experiment with the constraints:
- What happens if you omit the name field?
- What happens if you provide barcode as a single string instead of a list?
- What happens if you add an extra field that isn’t in the model?

Your API could return a plain dictionary or a list of dictionaries, but it is good practice to return Pydantic model instances. Pydantic validates the data automatically, FastAPI serialises it to JSON automatically, and users of your API can see the schema in the generated documentation.

Section 2: Your First Endpoint (10 mins)

This section is a TEACHING MOMENT

Your class teacher will walk you through creating a FastAPI application step by step.

Step 1: Create `main.py`

Create a new file called main.py inside your week04/ folder. Start by adding these imports and the app setup:

import os

import pandas as pd

from fastapi import FastAPI, HTTPException

from models import WaitroseProduct


def __is_running_on_nuvolos():
    """
    If we are running this script from Nuvolos Cloud,
    there will be an environment variable called HOSTNAME
    which starts with 'nv-'
    """
    hostname = os.getenv("HOSTNAME")
    return hostname is not None and hostname.startswith("nv-")

if __is_running_on_nuvolos():
    app = FastAPI(root_path="/proxy/8000/")
else:
    app = FastAPI()

What’s happening here? FastAPI is a Python framework for building web APIs. You write Python functions and FastAPI turns them into HTTP endpoints that any application can call. The line app = FastAPI() creates an application instance that will hold all your endpoints.

Why the Nuvolos check?

Nuvolos routes web traffic through a proxy, which changes the base URL of your API. Instead of https://some-ip:8000/, it becomes https://some-ip/proxy/8000/. Setting root_path="/proxy/8000/" tells FastAPI about this so the interactive documentation page works correctly.

If you’re running on your local machine, none of this matters and app = FastAPI() is all you need.

Step 2: Load the data

Add the following below the app setup. This loads the Waitrose data once when the application starts, not on every request:

df = pd.read_json("waitrose.jsonl", lines=True)
products_data = df.to_dict(orient="records")

print(f"✅ Loaded {len(products_data)} products")

At this point, your main.py has an app and data, but no endpoints. An endpoint is a URL path that your API responds to. Right now, if someone tried to visit your API, there would be nothing to see.
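
One thing to watch for, assuming some products in your file lack a field such as food_type: pandas fills the gap with NaN, which is a float and would not pass validation against list[dict] | None. Whether this applies depends on your data. The sketch below shows one possible clean-up step on a made-up two-record sample. The helper name nan_to_none is hypothetical.

```python
import math
from io import StringIO

import pandas as pd

# Made-up sample: the second record has no food_type key
sample_jsonl = StringIO(
    '{"name": "Sample Loaf", "barcode": ["0000000000001"], "food_type": [{"label": "bread"}]}\n'
    '{"name": "Sample Peas", "barcode": ["0000000000002"]}\n'
)
df = pd.read_json(sample_jsonl, lines=True)


def nan_to_none(value):
    """Replace a float NaN with None and leave every other value untouched."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# Convert to a list of dictionaries, cleaning each value on the way
products_data = [
    {key: nan_to_none(value) for key, value in record.items()}
    for record in df.to_dict(orient="records")
]

print(products_data[1])
```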

Step 3: Add your first endpoint

Let’s create an endpoint that returns a single product by its barcode. Add this to main.py:

@app.get("/product/{barcode}")
def get_product(barcode: str) -> WaitroseProduct:
    matched = df[df["barcode"].apply(lambda codes: barcode in codes)]

    if matched.empty:
        raise HTTPException(status_code=404, detail="Product not found")
    
    product = matched.iloc[0].to_dict()
    return WaitroseProduct(**product)

The @app.get("/product/{barcode}") line is a decorator. In Python, a decorator is applied to the function defined directly below it and can wrap that function, modify how it runs, or, as here, register it with a framework.

This particular decorator tells FastAPI: “When someone sends a GET request to /product/{barcode}, run this function.” The {barcode} part is a path parameter, meaning whatever value appears in that position in the URL gets passed to your function as the barcode argument.

So if someone visits /product/5010003065604, FastAPI calls get_product(barcode="5010003065604").
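
If decorators are new to you, the following toy example may help. It is a hypothetical mini-registry written in plain Python and is not how FastAPI is implemented internally. It only illustrates the idea of a decorator that remembers which function belongs to which path. The names routes, get and say_hello are made up for this sketch.

```python
# Hypothetical mini-registry, for illustration only (not FastAPI's real internals)
routes = {}


def get(path):
    """Return a decorator that stores the decorated function under `path`."""
    def register(func):
        routes[path] = func
        return func  # the function itself is returned unchanged
    return register


@get("/hello")
def say_hello():
    return {"message": "hello"}


# Look the function up by its path and call it, as a framework would on a request
handler = routes["/hello"]
print(handler())
```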

Step 4: Run it!

Your main.py now has an app, data, and an endpoint. But the code is just a Python file sitting on disk. To actually start serving requests, you need a server. That’s what uvicorn does: it listens for incoming HTTP requests and routes them to your FastAPI code.

Open a terminal, make sure you’re in the week04/ folder with your food environment active, and run:

uvicorn main:app --reload

What do the arguments to uvicorn mean?

uvicorn main:app --reload

main = the Python file (main.py)
app = the FastAPI instance inside that file (the variable you created with app = FastAPI())
--reload = restart automatically when you save changes (useful during development, optional but recommended)

You should see output in your terminal telling you the server is running. Now open the Swagger UI documentation page:

On Nuvolos: the URL shown in your terminal output, with /proxy/8000/docs at the end
On your local machine: http://localhost:8000/docs

This is FastAPI’s auto-generated interactive documentation. You can see your /product/{barcode} endpoint listed. Click “Try it out”, type a barcode (try 5010003065604), and click “Execute” to see the response. FastAPI serialised your Pydantic model to JSON automatically.

Path parameters vs query parameters

In the lecture, we looked at how parameters appear in URLs. FastAPI supports two kinds:

Path parameters are part of the URL path itself, like the barcode endpoint you just built:

GET /product/5010003065604  →  barcode = "5010003065604"
GET /product/5065009224333  →  barcode = "5065009224333"

In FastAPI, you define them with curly braces in the path:

@app.get("/product/{barcode}")
def get_product(barcode: str) -> WaitroseProduct:
    ...

Query parameters come after a ? in the URL:

GET /products?category=bakery                        →  category = "bakery"
GET /products?category=bakery&name_contains=sourdough  →  category = "bakery", name_contains = "sourdough"

In FastAPI, any function parameter that is not in the path is automatically treated as a query parameter:

@app.get("/products")
def get_products(category: str | None = None) -> list[WaitroseProduct]:
    ...

You’ll add query parameters in Section 3.
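
To make the split between path and query string concrete, here is a small sketch using only Python’s standard library. FastAPI does this parsing for you. The snippet just shows what information is present in the URL.

```python
from urllib.parse import parse_qs, urlsplit

url = "/products?category=bakery&name_contains=sourdough"

parts = urlsplit(url)
print(parts.path)   # the part before the ?
print(parts.query)  # the part after the ?

# parse_qs maps each parameter name to a list of values
query_params = parse_qs(parts.query)
print(query_params)
```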

📖 Read more: FastAPI Path Parameters and FastAPI Query Parameters

Section 3: Expanding the API (30 mins)

🎯 ACTION POINTS

Your class teacher will be walking around to help you troubleshoot. Work through the following at your own pace.

Part A: Add a `/products` Endpoint

Your /product/{barcode} endpoint is useful when you know the exact barcode, but what if you want to browse products? Let’s add a /products endpoint that returns multiple products.

Start with a simple version that returns the first 10 products. Add this to your main.py:

@app.get("/products")
def get_products() -> list[WaitroseProduct]:
    return [WaitroseProduct(**p) for p in products_data[:10]]

Save the file (uvicorn should restart automatically), then test it in /docs. The [:10] slice is intentional: returning all 2,500+ products at once makes the response slow and the Swagger UI unresponsive. Start small while developing.

Now make it more useful by adding a category query parameter. Query parameters let users filter results without needing a separate endpoint for every combination. When someone visits /products?category=bakery, FastAPI extracts category = "bakery" from the URL and passes it to your function. Replace your endpoint with:

@app.get("/products")
def get_products(category: str | None = None) -> list[WaitroseProduct]:
    results = products_data

    if category:
        results = [p for p in results if p.get("category") == category]
    
    return [WaitroseProduct(**p) for p in results[:10]]

Test it in /docs: try with no parameters (returns first 10 from any category), then with category=bakery, then category=frozen. Observe how the results change.

Next, add a name_contains parameter for case-insensitive substring search:

@app.get("/products")
def get_products(
    category: str | None = None,
    name_contains: str | None = None
) -> list[WaitroseProduct]:
    results = products_data
    
    if category:
        results = [p for p in results if p.get("category") == category]
    
    if name_contains:
        results = [p for p in results if name_contains.lower() in p["name"].lower()]
    
    return [WaitroseProduct(**p) for p in results[:10]]

Notice how each filter narrows down results further. You can combine them: /products?category=bakery&name_contains=sourdough filters by category first, then searches within those results. Test each filter in /docs and try combining them together.

Part B: Improve API Documentation

FastAPI auto-generates documentation from your code, but you can make it more informative by adding descriptions directly to your endpoints. Try updating your /products endpoint with a summary and description:

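@app.get(
    "/products",
    response_model=list[WaitroseProduct],
    summary="Get all Waitrose products",
    description="Returns a list of products, optionally filtered by category or name"
)
def get_products(
    category: str | None = None,
    name_contains: str | None = None
) -> list[WaitroseProduct]:
    # Same filtering logic as in Part A
    results = products_data

    if category:
        results = [p for p in results if p.get("category") == category]

    if name_contains:
        results = [p for p in results if name_contains.lower() in p["name"].lower()]

    return [WaitroseProduct(**p) for p in results[:10]]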

Section 4: Connecting to Problem Set 1 (10 mins)

This section is a TEACHING MOMENT

Your class teacher will explain how today’s practice transfers to ✍️ Problem Set 1 Part B.

🎯 ACTION POINTS

Think about how you’ll apply these patterns to ✍️ Problem Set 1 Part B:

You’ll receive your partner’s scraper and data (ideally, you shouldn’t need to re-run their scraper, but just grab their data directly)
You’ll inspect their data structure (it might differ slightly from today’s example or from the one you created yourself in your own repository)
You’ll create Pydantic models matching their schema
You’ll build FastAPI endpoints serving their data

The OpenFoodFacts enrichment (NOVA classification) can happen in several places. You could either:

Add a Scrapy pipeline: Enrich during scraping, before data is saved
Add a post-scraping script: Load all the scraped data, enrich it, save the enriched version
Inside FastAPI: Enrich on-the-fly when serving data (much slower per request, but the output of your API is always up to date)

Choose based on the architecture of the code you inherited (or what it would look like in your own repo). There’s no single correct answer. Document your choice in your README.

Bonus Section: Testing with pytest (Optional)


If you finish early, add automated tests for your API.

Create week04/test_main.py:

from fastapi.testclient import TestClient
from main import app

client = TestClient(app)


def test_get_all_products():
    """Test that /products returns a list of products."""
    response = client.get("/products")
    assert response.status_code == 200
    products = response.json()
    assert len(products) > 0
    assert "name" in products[0]
    assert "category" in products[0]


def test_filter_by_category():
    """Test that category filtering works correctly."""
    response = client.get("/products?category=bakery")
    assert response.status_code == 200
    products = response.json()
    
    # All returned products should be from bakery category
    for p in products:
        assert p["category"] == "bakery"


def test_invalid_category_returns_empty():
    """Test that filtering by nonexistent category returns empty list."""
    response = client.get("/products?category=NonexistentCategory")
    assert response.status_code == 200
    products = response.json()
    assert len(products) == 0

Run your tests:

pytest test_main.py -v

Write additional tests for:
- Name search functionality
- Combining multiple filters
- Edge cases (empty strings, missing fields)

Appendix | Resources

Documentation Links

Problem Set 1
