DS205 2025-2026 Winter Term

💻 Week 07 Lab

Building a Click Pipeline and Wiring It to GitHub Actions

click
github-actions
pipeline
ps2
Build a skeleton Click pipeline for Problem Set 2 and automate it with GitHub Actions.
Author

Dr Jon Cardoso-Silva

Published

06 March 2026

Modified

06 March 2026

🥅 Learning Goals

By the end of this lab, you should be able to:

  • Structure a multi-stage data pipeline as a Click CLI
  • Run pipeline stages individually and in sequence from the terminal
  • Define a GitHub Actions workflow that installs dependencies and runs the same pipeline on a remote machine
  • Read a GitHub Actions run log and match it to local terminal output

The 🖥️ Week 07 Lecture showed how to configure a GitHub Actions workflow to run your pipeline steps on a remote GitHub-hosted machine. This lab makes that correspondence concrete: you will build a Click CLI that defines your pipeline stages, run it locally to confirm it works, then create a GitHub Actions workflow that runs the same commands on a remote machine.

📝 Session Details

  • Date: Tuesday, 03 March 2026
  • Time: Check your timetable for the precise time of your class
  • Duration: 90 minutes

🛣️ Lab Roadmap

How the W07 lab will be structured
| Part | Activity Type | Focus | Time | Outcome |
|------|---------------|-------|------|---------|
| Part 0 | 👤 Teaching Moment | Pipeline principles recap + PS2 goals | 10 min | Shared vocabulary before coding |
| Part 1 | Setup | Accept ✏️ Problem Set 2 repository | 5 min | Cloned repo ready |
| Part 2 | ⏸️ Action Points | Create requirements.txt and pipeline.py with Click | 25-35 min | Working CLI with placeholder stages |
| Part 3 | ⏸️ Action Points | Wire the CLI to GitHub Actions | 30-40 min | Green tick on the Actions tab |
| Part 4 | 🗣️ Discussion (if time allows) | AI agents | 0-15 min | Vocabulary for agent concepts |

👉 NOTE: Whenever you see a 👤 TEACHING MOMENT, this means your class teacher deserves your full attention!

Part 0: Barry’s Opening (10 min)

This section is a TEACHING MOMENT

Barry will recap the three pipeline design principles introduced in the lecture and explain what ✏️ Problem Set 2 asks you to build:

  • Atomicity: each stage does one thing only
  • Idempotency: running the same step twice produces the same output
  • Modularity: each stage is independent of the internal logic of other stages

These principles will guide the TPI pipeline you design for ✏️ Problem Set 2 too. The goal of today’s lab is to practise the pattern of defining pipeline stages as CLI commands, not to write the actual pipeline logic yet.
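To see what idempotency means in code, here is a minimal sketch (the `extract_stage` function and its input are invented for illustration, not part of the Problem Set brief). Because the output depends only on the input, running the stage a second time leaves the file exactly as it was:

```python
import tempfile
from pathlib import Path

def extract_stage(raw_text: str, output_path: Path) -> None:
    """Hypothetical 'extract' stage whose output depends only on its input."""
    # No timestamps, no appending, no randomness: re-running the stage
    # overwrites the file with exactly the same bytes.
    lines = [line.strip().lower() for line in raw_text.splitlines() if line.strip()]
    output_path.write_text("\n".join(lines))

out = Path(tempfile.mkdtemp()) / "extracted.txt"
extract_stage("  Hello \n\n World ", out)
first_run = out.read_text()
extract_stage("  Hello \n\n World ", out)
assert out.read_text() == first_run  # idempotent: second run changes nothing
```

A stage that appended to the file, or stamped the current time into it, would fail this check.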

Part 1: Accept ✏️ Problem Set 2 Repository (5 min)

Accept the GitHub Classroom link below and clone the repository to your machine. The repo contains the ✏️ Problem Set 2 brief. You will create the pipeline files yourself in the next part.

COMING SOON: Link to GitHub Classroom assignment.

Part 2: Create requirements.txt and pipeline.py (25 min)

🎯 ACTION POINTS

Conda would be more robust in the long run but, for now, let’s keep things simple and use pip for this lab. Later on, you may need to switch to conda for more complex dependencies.

Step 1: Create requirements.txt

Create a file called requirements.txt in the root of your repository with these two packages:

click
tqdm

This is intentionally minimal. Your ✏️ Problem Set 2 will need more packages as you build the actual pipeline in W08+, but for today you only need Click to define the CLI structure.

Install the dependencies now:

pip install -r requirements.txt

Step 2: Create pipeline.py

Create a file called pipeline.py in the root of your repository. This file defines your pipeline as a Click CLI with one subcommand per stage. Here is a worked example to start from:

import logging

import click

logging.basicConfig(level=logging.INFO)

@click.group()
def cli():
    """TPI data pipeline."""
    pass

@cli.command()
def crawl():
    """Collect data from TPI website."""
    logging.info("crawl: stage not yet implemented")

@cli.command()
def extract():
    """Extract structured content from collected data."""
    logging.info("extract: stage not yet implemented")

@cli.command()
def embed():
    """Generate embeddings from extracted content."""
    logging.info("embed: stage not yet implemented")

@cli.command()
def serve():
    """Start the search API."""
    logging.info("serve: stage not yet implemented")

if __name__ == "__main__":
    cli()

The stage names (crawl, extract, embed, serve) are just examples. Replace them with whatever stages you imagine your ✏️ Problem Set 2 pipeline will need. It is completely fine to revise these names later as you learn more about TPI data and RAG pipelines in W08+.

Each function body should only log a message for now. The real logic comes later. The point of this lab is to practise the architectural pattern: name your stages, wire them into a CLI, and confirm they run. The docstring on each command becomes help text when you run python pipeline.py --help. Writing a clear one-line description is good practice.
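You do not even need a terminal to check the help text: Click ships a testing helper, click.testing.CliRunner, which invokes commands in-process. A minimal sketch, reusing an abbreviated version of the cli group above:

```python
import click
from click.testing import CliRunner

@click.group()
def cli():
    """TPI data pipeline."""

@cli.command()
def crawl():
    """Collect data from TPI website."""

# Invoke `--help` in-process, exactly as the terminal would.
result = CliRunner().invoke(cli, ["--help"])
print(result.output)  # the crawl docstring appears as its one-line help text
```

The same helper is handy later for writing automated tests of your pipeline's CLI surface.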

📖 Read more: documenting Click commands.

Step 3: Run individual stages

Test each stage on its own:

python pipeline.py crawl
python pipeline.py embed

You should see the logging messages appear. Try python pipeline.py --help to confirm all your stages are listed.
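Stages will eventually need parameters (for example, how many pages to crawl). Click attaches these as per-command options. Here is a hedged sketch; the --limit option is invented for illustration and is not part of the Problem Set brief:

```python
import click

@click.group()
def cli():
    """TPI data pipeline."""

@cli.command()
@click.option("--limit", default=10, show_default=True,
              help="Maximum number of pages to fetch.")
def crawl(limit):
    """Collect data from TPI website."""
    # The option arrives as a regular function argument.
    click.echo(f"crawl: would fetch up to {limit} pages")

if __name__ == "__main__":
    cli()
```

With this in place, `python pipeline.py crawl --limit 5` overrides the default, and the option is documented automatically in `python pipeline.py crawl --help`.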

Step 4: Add a run-all command

You can run all stages in sequence using Clickโ€™s Context.invoke. Add this command to your pipeline.py:

@cli.command("run-all")
@click.pass_context
def run_all(ctx):
    """Run all pipeline stages in sequence."""
    ctx.invoke(crawl)
    ctx.invoke(extract)
    ctx.invoke(embed)
    ctx.invoke(serve)

Then run:

python pipeline.py run-all

All stages should fire in sequence and you should see four logging messages.

🔔 IMPORTANT:

If you rename your stage functions, update the ctx.invoke calls in run-all to match.

📖 Read more: Click: Context.invoke API reference
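If you would rather not repeat ctx.invoke for every stage, one possible refactor is to loop over a list of the command objects, so a renamed stage only needs updating in one place. A self-contained sketch with two placeholder stages (not the official solution, just one design option):

```python
import logging

import click

logging.basicConfig(level=logging.INFO)

@click.group()
def cli():
    """TPI data pipeline."""

@cli.command()
def crawl():
    """Collect data from TPI website."""
    logging.info("crawl: stage not yet implemented")

@cli.command()
def extract():
    """Extract structured content from collected data."""
    logging.info("extract: stage not yet implemented")

# Single source of truth for stage order.
STAGES = [crawl, extract]

@cli.command("run-all")
@click.pass_context
def run_all(ctx):
    """Run all pipeline stages in sequence."""
    for stage in STAGES:
        logging.info("starting stage: %s", stage.name)
        ctx.invoke(stage)

if __name__ == "__main__":
    cli()
```

Context.invoke accepts the command objects directly, so the loop body stays a one-liner.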

Step 5: Push to GitHub

Commit both files and push:

git add requirements.txt pipeline.py
git commit -m "Add skeleton pipeline with Click"
git push

You will need this on GitHub for Part 3.

Part 3: Wire to GitHub Actions (30 min)

🎯 ACTION POINTS

Step 1: Create the workflow file

Create a file called pipeline.yml under the .github/workflows/ directory of your repository. You will need to create the .github/workflows/ directory first if it does not exist.

Type this YAML into the pipeline.yml file:

name: TPI pipeline

on:
  push:
    branches: [main]

jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run pipeline
        run: python pipeline.py run-all

Each block in this YAML file maps to something you already understand from the lecture. The run: lines are the same commands you just ran in your terminal. If you want a refresher on what each piece does, see the 🖥️ Week 07 Lecture page.
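One optional tweak, if you want to re-run the pipeline without pushing a new commit: GitHub Actions also supports a manual trigger. Adding workflow_dispatch to the on: block puts a "Run workflow" button on the Actions tab:

```yaml
on:
  push:
    branches: [main]
  workflow_dispatch:   # enables a manual "Run workflow" button on the Actions tab
```

The rest of the workflow file stays exactly as above.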

Step 2: Push and verify

Commit the workflow file and push:

git add .github/workflows/pipeline.yml
git commit -m "Add GitHub Actions workflow"
git push

Open your repository on GitHub and click the Actions tab. You should see a workflow run triggered by your push. Wait for it to finish and look for a green tick.

Step 3: Read the logs

Click into the workflow run, then open the Run pipeline step. You should see the same logging messages you saw locally. The output is identical because the command is identical: python pipeline.py run-all runs the same way on GitHub’s Ubuntu machine as it does on your laptop.

Step 4: Troubleshooting

🔧 Common issues

  • “My workflow isn’t triggering”: check that the file is at exactly .github/workflows/pipeline.yml (not .yaml, not in a different folder).
  • ModuleNotFoundError: No module named 'click': check that requirements.txt is committed and not listed in .gitignore.
  • “The workflow ran but I see no log output”: confirm that pipeline.py uses logging.info() and that logging.basicConfig(level=logging.INFO) is at the top of the file. Alternatively, swap logging.info() for print() if you prefer. Both work.

💡 If you are stuck on the GitHub Actions step, ask Barry for help. The most common mistake is a file path typo in the workflow YAML.

Part 4: AI Agents Discussion (bonus, only if time allows)

Open discussion: What makes something an AI agent?

This discussion only runs if Parts 2 and 3 are complete. It is optional.

Opening question: “You have just built a pipeline that runs automatically when you push code, installs its own dependencies, and executes every stage in sequence without you touching it. What is the difference between that and an AI agent?”

There is no single correct answer, which is precisely the point. Consider the MIT CSAIL four-part characterisation of agents (from their 2025 AI Agent Index):

  1. Autonomy: operating with minimal human oversight
  2. Goal complexity: pursuing high-level objectives through planning
  3. Environmental interaction: interacting with the world through tools and APIs
  4. Generality: handling under-specified instructions and adapting to new tasks

Which of these four will your ✏️ Problem Set 2 pipeline have once fully built? That question is more useful than any definitive answer about what counts as an agent.

📖 Further reading:

Appendix | Resources

Course links

โœ๏ธ Problem Set 2