💻 Week 07 Lab
Building a Click Pipeline and Wiring It to GitHub Actions
By the end of this lab, you should be able to:
- Structure a multi-stage data pipeline as a Click CLI
- Run pipeline stages individually and in sequence from the terminal
- Define a GitHub Actions workflow that installs dependencies and runs the same pipeline on a remote machine
- Read a GitHub Actions run log and match it to local terminal output
The 🖥️ Week 07 Lecture showed how to configure a GitHub Actions workflow to run your pipeline steps on a remote GitHub-hosted machine. This lab makes that correspondence concrete: you will build a Click CLI that defines your pipeline stages, run it locally to confirm it works, then create a GitHub Actions workflow that runs the same commands on a remote machine.
📅 Session Details
- Date: Tuesday, 03 March 2026
- Time: Check your timetable for the precise time of your class
- Duration: 90 minutes
🛣️ Lab Roadmap
| Part | Activity Type | Focus | Time | Outcome |
|---|---|---|---|---|
| Part 0 | 🎤 Teaching Moment | Pipeline principles recap + PS2 goals | 10 min | Shared vocabulary before coding |
| Part 1 | Setup | Accept ✏️ Problem Set 2 repository | 5 min | Cloned repo ready |
| Part 2 | ⏸️ Action Points | Create requirements.txt and pipeline.py with Click | 25-35 min | Working CLI with placeholder stages |
| Part 3 | ⏸️ Action Points | Wire the CLI to GitHub Actions | 30-40 min | Green tick on the Actions tab |
| Part 4 | 🗣️ Discussion (if time allows) | AI agents | 0-15 min | Vocabulary for agent concepts |
📝 NOTE: Whenever you see a 🎤 TEACHING MOMENT, this means your class teacher deserves your full attention!
Part 0: Barry's Opening (10 min)
This section is a TEACHING MOMENT
Barry will recap the three pipeline design principles introduced in the lecture and introduce what ✏️ Problem Set 2 asks you to build. The three principles are:
- Atomicity: each stage does one thing only
- Idempotency: running the same step twice produces the same output
- Modularity: each stage is independent of the internal logic of other stages
These principles will also guide the TPI pipeline you design for ✏️ Problem Set 2. The goal of today's lab is to practise the pattern of defining pipeline stages as CLI commands, not to write the actual pipeline logic yet.
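To make the idempotency principle concrete, here is a minimal sketch. The stage name and logic are illustrative only, not part of the lab: the stage derives its output purely from its input and overwrites (rather than appends to) its output file, so running it twice changes nothing.

```python
# Illustrative sketch of an idempotent stage; names and logic are examples only.
from pathlib import Path


def extract_stage(raw: str, out_path: Path) -> str:
    """Derive the output purely from the input, then overwrite the file.

    Re-running with the same input produces byte-identical output.
    """
    result = raw.strip().lower()
    out_path.write_text(result)  # overwrite, never append
    return result
```

A non-idempotent version would, for example, open the file in append mode: running it twice would double the output.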
Part 1: Accept ✏️ Problem Set 2 Repository (5 min)
Accept the GitHub Classroom link below and clone the repository to your machine. The repo contains the ✏️ Problem Set 2 brief. You will create the pipeline files yourself in the next part.
COMING SOON: Link to GitHub Classroom assignment.
Part 2: Create requirements.txt and pipeline.py (25-35 min)
🎯 ACTION POINTS
Conda would be more robust in the long run, but for now let's keep things simple and use pip. Later on, you may need to switch to conda for more complex dependencies.
Step 1: Create requirements.txt
Create a file called `requirements.txt` in the root of your repository with these two packages:

```
click
tqdm
```
This is intentionally minimal. Your ✏️ Problem Set 2 will need more packages as you build the actual pipeline in W08+, but for today you only need Click to define the CLI structure.
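If you later want more reproducible installs, you can constrain versions in `requirements.txt`. A sketch: the version numbers below are illustrative minimums, not a course requirement, so check `pip show click` for the versions you actually have.

```
click>=8.1
tqdm>=4.66
```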
Install the dependencies now:

```shell
pip install -r requirements.txt
```

Step 2: Create pipeline.py
Create a file called pipeline.py in the root of your repository. This file defines your pipeline as a Click CLI with one subcommand per stage. Here is a worked example to start from:
```python
import logging

import click

logging.basicConfig(level=logging.INFO)


@click.group()
def cli():
    """TPI data pipeline."""
    pass


@cli.command()
def crawl():
    """Collect data from TPI website."""
    logging.info("crawl: stage not yet implemented")


@cli.command()
def extract():
    """Extract structured content from collected data."""
    logging.info("extract: stage not yet implemented")


@cli.command()
def embed():
    """Generate embeddings from extracted content."""
    logging.info("embed: stage not yet implemented")


@cli.command()
def serve():
    """Start the search API."""
    logging.info("serve: stage not yet implemented")


if __name__ == "__main__":
    cli()
```

The stage names (crawl, extract, embed, serve) are just examples. Replace them with whatever stages you imagine your ✏️ Problem Set 2 pipeline will need. It is completely fine to revise these names later as you learn more about TPI data and RAG pipelines in W08+.
Each function body should only log a message for now; the real logic comes later. The point of this lab is to practise the architectural pattern: name your stages, wire them into a CLI, and confirm they run. The docstring on each command becomes help text when you run `python pipeline.py --help`, so writing a clear one-line description is good practice.
📖 Read more: documenting Click commands.
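Your stages will eventually need parameters (paths, limits, flags). As a sketch of where this is heading, not something today's lab requires, Click lets you attach options to a command. The `--limit` option below is a hypothetical example:

```python
# Sketch: a stage with a command-line option.
# The --limit option is a hypothetical example, not part of the lab brief.
import logging

import click

logging.basicConfig(level=logging.INFO)


@click.group()
def cli():
    """TPI data pipeline."""


@cli.command()
@click.option("--limit", default=10, show_default=True, help="Maximum pages to collect.")
def crawl(limit):
    """Collect data from TPI website."""
    logging.info("crawl: would collect up to %d pages", limit)


if __name__ == "__main__":
    cli()
```

You would then run `python pipeline.py crawl --limit 5`, and `--limit` would also be documented in `python pipeline.py crawl --help`.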
Step 3: Run individual stages
Test each stage on its own:

```shell
python pipeline.py crawl
python pipeline.py embed
```

You should see the logging messages appear. Try `python pipeline.py --help` to confirm all your stages are listed.
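For reference, the group help output typically looks something like this (the exact wording may vary slightly between Click versions, and the command list reflects whatever stage names you chose):

```text
Usage: pipeline.py [OPTIONS] COMMAND [ARGS]...

  TPI data pipeline.

Options:
  --help  Show this message and exit.

Commands:
  crawl    Collect data from TPI website.
  embed    Generate embeddings from extracted content.
  extract  Extract structured content from collected data.
  serve    Start the search API.
```

Note that Click uses the first line of each command's docstring as its summary here.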
Step 4: Add a run-all command
You can run all stages in sequence using Click's `Context.invoke`. Add this command to your `pipeline.py`:
```python
@cli.command("run-all")
@click.pass_context
def run_all(ctx):
    """Run all pipeline stages in sequence."""
    ctx.invoke(crawl)
    ctx.invoke(extract)
    ctx.invoke(embed)
    ctx.invoke(serve)
```

Then run:

```shell
python pipeline.py run-all
```

All stages should fire in sequence and you should see four logging messages.
📌 IMPORTANT:
If you rename your stage functions, update the `ctx.invoke` calls in `run-all` to match.
📖 Read more: Click: `Context.invoke` API reference
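One detail that will matter once the stages do real work: a stage that hits a problem should fail loudly with a non-zero exit code, so that run-all (and later the GitHub Actions step) goes red instead of silently continuing. A minimal sketch, with a hypothetical missing-input check:

```python
# Sketch: failing a stage loudly. The missing-input check is hypothetical;
# the point is the non-zero exit code, which CI treats as failure.
import logging
import sys

logging.basicConfig(level=logging.INFO)


def ensure_input_exists(found: bool) -> None:
    if not found:
        logging.error("extract: no crawled data found, run crawl first")
        sys.exit(1)  # non-zero exit marks the step as failed
```

Inside a Click command you can instead raise `click.ClickException("message")`, which prints the message and exits with code 1.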
Step 5: Push to GitHub
Commit both files and push:
```shell
git add requirements.txt pipeline.py
git commit -m "Add skeleton pipeline with Click"
git push
```

You will need this on GitHub for Part 3.
Part 3: Wire to GitHub Actions (30-40 min)
🎯 ACTION POINTS
Step 1: Create the workflow file
Create a file called `pipeline.yml` under the `.github/workflows/` directory in your repository. You will need to create the `.github/workflows/` directory first if it does not exist.
Type this YAML into the pipeline.yml file:
```yaml
name: TPI pipeline

on:
  push:
    branches: [main]

jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run pipeline
        run: python pipeline.py run-all
```

Each block in this YAML file maps to something you already understand from the lecture. The `run:` lines are the same commands you just ran in your terminal. If you want a refresher on what each piece does, see the 🖥️ Week 07 Lecture page.
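A small optional extension, not required for the lab: GitHub Actions also supports a `workflow_dispatch` trigger, which adds a "Run workflow" button on the Actions tab so you can re-run the pipeline without pushing a new commit. The `on:` block would become:

```yaml
on:
  push:
    branches: [main]
  workflow_dispatch:   # manual "Run workflow" button on the Actions tab
```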
Step 2: Push and verify
Commit the workflow file and push:
```shell
git add .github/workflows/pipeline.yml
git commit -m "Add GitHub Actions workflow"
git push
```

Open your repository on GitHub and click the Actions tab. You should see a workflow run triggered by your push. Wait for it to finish and look for a green tick.
Step 3: Read the logs
Click into the workflow run, then open the Run pipeline step. You should see the same logging messages you saw locally. The output is identical because the command is identical: `python pipeline.py run-all` runs the same way on GitHub's Ubuntu machine as it does on your laptop.
Step 4: Troubleshooting
🔧 Common issues
- "My workflow isn't triggering": check that the file is at exactly `.github/workflows/pipeline.yml` (not `.yaml`, not in a different folder).
- `ModuleNotFoundError: No module named 'click'`: check that `requirements.txt` is committed and not listed in `.gitignore`.
- "The workflow ran but I see no log output": confirm that `pipeline.py` uses `logging.info()` and that `logging.basicConfig(level=logging.INFO)` is at the top of the file. Alternatively, swap `logging.info()` for `print()` if you prefer. Both work.
💡 If you are stuck on the GitHub Actions step, ask Barry for help. The most common mistake is a file path typo in the workflow YAML.
Part 4: AI Agents Discussion (bonus, only if time allows)
Open discussion: What makes something an AI agent?
This discussion only runs if Parts 2 and 3 are complete. It is optional.
Opening question: "You have just built a pipeline that runs automatically when you push code, installs its own dependencies, and executes every stage in sequence without you touching it. What is the difference between that and an AI agent?"
There is no single correct answer, which is precisely the point. Consider the MIT CSAIL four-part characterisation of agents (from their 2025 AI Agent Index):
- Autonomy: operating with minimal human oversight
- Goal complexity: pursuing high-level objectives through planning
- Environmental interaction: interacting with the world through tools and APIs
- Generality: handling under-specified instructions and adapting to new tasks
Which of these four will your ✏️ Problem Set 2 pipeline have once fully built? That question is more useful than any definitive answer about what counts as an agent.
📖 Further reading: the MIT CSAIL AI Agent Index (2025), the source of the four-part characterisation above.
Appendix | Resources
Course links
- 🖥️ W07 Lecture
- 📋 Syllabus
- 🤖 DS205 AI Tutor
Slack
GitHub Actions
✏️ Problem Set 2
- TPI Centre corporates
- GitHub Classroom link: TBC