DS205 2025-2026 Winter Term

Week 08 Reference: Environment Setup and Troubleshooting

Nuvolos baseline, local caveats, GPU options, and CI separation

Author

Dr Jon Cardoso-Silva

Published

06 March 2026

Use this page as your setup reference for ✍️ Problem Set 2.

Earlier in the course, environment setup probably felt like a formality. That is no longer true. We are about to start using heavier packages that clash with many of the packages commonly found in a Python environment, so your setup choices can decide whether your workflow runs smoothly or painfully.

Nuvolos is still the best starting point for consistency across the class, but even there you need to test carefully if you add extra dependencies later.

Start here: choose the right target runtime

Do not treat all environments as interchangeable.

Runtime context matters before package choice

| Runtime target | Typical OS | What to optimise for | Main risk |
| --- | --- | --- | --- |
| Nuvolos | Linux | Class consistency and reproducibility | Assuming extra packages will behave exactly like baseline |
| GitHub Actions | Ubuntu runner | Minimal, repeatable CI checks | Accidentally carrying local fixes into CI |
| Your local machine | Windows/macOS/Linux | Personal iteration speed | Dependency/version variability across machines |
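If a script needs to behave differently per runtime, it can help to detect the target explicitly rather than guess. The sketch below is one possible approach, not a course requirement. It relies on the `GITHUB_ACTIONS` variable that GitHub sets on its runners; the `/space_mounts` marker for Nuvolos is an assumption based on the shared mount path used on this page, and the function name `detect_runtime` is hypothetical.

```python
import os
import platform
from pathlib import Path


def detect_runtime(env=None, nuvolos_marker="/space_mounts"):
    """Return a short label for the runtime this process is running in.

    `nuvolos_marker` is an assumed path: adjust it if your Nuvolos
    workspace exposes a different mount.
    """
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    if env.get("GITHUB_ACTIONS") == "true":
        return "github-actions"
    # Assumption: the shared mount only exists on Nuvolos.
    if Path(nuvolos_marker).exists():
        return "nuvolos"
    # Otherwise treat it as a local machine and report the OS.
    return f"local-{platform.system().lower()}"


if __name__ == "__main__":
    print(detect_runtime())
```

Printing this label at the top of a notebook or log makes it easier to tell which environment produced a given result.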

Course baseline for Nuvolos (Ubuntu runtime)

This is the suggested baseline for class work on Nuvolos. It is Linux-first, so it is not the same as the Windows local setup.

name: rag
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - python-dotenv
  - jupyter
  - ipykernel
  - ipywidgets
  - poppler
  - tesseract
  - pandoc
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cpu
      - torch==2.5.1+cpu
      - "unstructured[pdf]"
      - sentence-transformers
      - python-magic
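After creating the environment, a quick import check can confirm that the main Python packages resolved. This is a minimal sketch: the helper name `find_missing` and the choice of modules to check are illustrative. Note that some import names differ from the package names in the YAML (`python-dotenv` imports as `dotenv`, `sentence-transformers` as `sentence_transformers`, `python-magic` as `magic`).

```python
import importlib.util

# Import names for the main baseline packages (not the pip/conda names).
BASELINE_MODULES = [
    "numpy",
    "pandas",
    "dotenv",
    "torch",
    "unstructured",
    "sentence_transformers",
    "magic",
]


def find_missing(modules):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(BASELINE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All baseline modules were found.")
```

This only checks that each module can be located. It does not import them, so it will not catch runtime conflicts between packages.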

Nuvolos reminders:

  • Keep installs inside your conda environment.
  • If a package is not available in your active environment, check with staff before trying ad hoc workarounds.
  • Keep the Hugging Face cache on the shared mount (set this before importing any Hugging Face library):
import os
os.environ["HF_HOME"] = "/space_mounts/ds205/huggingface"
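If the same notebook also runs outside Nuvolos, hard-coding the shared path will point the cache at a directory that does not exist. The sketch below shows one way to fall back gracefully. The helper `configure_hf_cache` is hypothetical, and the fallback of `~/.cache/huggingface` mirrors the library's usual default location.

```python
import os
from pathlib import Path


def configure_hf_cache(shared="/space_mounts/ds205/huggingface", fallback=None):
    """Point HF_HOME at the shared mount if present, else at a fallback.

    Call this before importing sentence_transformers or transformers,
    because the cache location is read when those libraries load.
    """
    shared_path = Path(shared)
    if shared_path.is_dir():
        chosen = shared_path
    elif fallback is not None:
        chosen = Path(fallback)
    else:
        chosen = Path.home() / ".cache" / "huggingface"
    os.environ["HF_HOME"] = str(chosen)
    return chosen


if __name__ == "__main__":
    print("HF_HOME =", configure_hf_cache())
```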

CI is not your local environment (environment.ci.yml)

Use a dedicated CI spec for GitHub Actions, usually Ubuntu.

  • CI should be minimal and reproducible.
  • Local fixes should not leak into CI.
  • Runner machines are ephemeral, so local env naming conventions are not the priority.
Suggested environment.ci.yml (Ubuntu runner)
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - python-dotenv
  - poppler
  - tesseract
  - pandoc
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cpu
      - torch==2.5.1+cpu
      - "unstructured[pdf]"
      - sentence-transformers
      - python-magic
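One way to keep the CI spec honest is to compare its dependency names against the baseline. The sketch below is an illustration only: it uses a deliberately simple line-based parser (not a real YAML library), and the function names `parse_dependencies` and `only_in_first` are hypothetical. It assumes the files follow the same layout as the specs on this page.

```python
import re


def parse_dependencies(yaml_text):
    """Collect dependency names from a conda environment spec.

    Simplified parser: handles '- name', '- name=version', quoted extras
    such as "unstructured[pdf]", and skips pip options like --extra-index-url.
    """
    names = set()
    in_deps = False
    for raw in yaml_text.splitlines():
        line = raw.strip()
        # A top-level key starts or ends the dependencies section.
        if raw and not raw[0].isspace() and not raw.startswith("-"):
            in_deps = line.startswith("dependencies:")
            continue
        if not in_deps or not line.startswith("- "):
            continue
        item = line[2:].strip().strip('"')
        if item.startswith("--") or item.endswith(":"):
            continue  # pip option or the nested "pip:" key
        # Drop version pins, extras, and local version tags.
        names.add(re.split(r"[=<>!~\[+\s]", item, maxsplit=1)[0].lower())
    return names


def only_in_first(first_text, second_text):
    """Return dependency names present in the first spec but not the second."""
    return sorted(parse_dependencies(first_text) - parse_dependencies(second_text))
```

For example, `only_in_first(ci_text, baseline_text)` should normally be empty. Anything it lists is a package that exists only in CI and is worth a second look.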

Local machine variants (use with caution)

I have tested these setups locally, but every machine is different and local behaviour can vary wildly. Treat these files as starting points, not guarantees.

Linux local

Use the same YAML shown in the Nuvolos baseline first, then adjust only if needed.

macOS local

  • If needed, adapt package variants to match your machine.
  • If you use Apple Silicon GPU acceleration, this is the suggested YAML:
Suggested environment.gpu-macos.yml
name: rag-gpu-macos
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - python-dotenv
  - jupyter
  - ipykernel
  - ipywidgets
  - poppler
  - tesseract
  - pandoc
  - pip:
      - torch==2.5.1
      - torchvision==0.20.1
      - torchaudio==2.5.1
      - "unstructured[pdf]"
      - sentence-transformers
      - python-magic

Windows local

  • Use:
    • Nuvolos baseline YAML on CPU (rag)
    • the GPU YAML below for optional NVIDIA GPU (rag-gpu-win)
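  • For some Windows setups, this fallback can unblock the duplicate OpenMP runtime error that torch sometimes triggers. It is a workaround rather than a fix, and it must be set before importing torch: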
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
Suggested environment.gpu-win.yml
name: rag-gpu-win
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - python-dotenv
  - jupyter
  - ipykernel
  - ipywidgets
  - poppler
  - tesseract
  - pandoc
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cu124
      - torch==2.5.1+cu124
      - torchvision==0.20.1+cu124
      - torchaudio==2.5.1+cu124
      - "unstructured[pdf]"
      - sentence-transformers
      - python-magic-bin

Optional GPU setup notes

Windows (NVIDIA)

If you choose the GPU path (environment.gpu-win.yml):

  1. Open Settings.
  2. Go to System > Display > Graphics.
  3. Under Add an app, choose Desktop app, then Browse.
  4. Add the environment-specific python.exe (for example ...\envs\rag-gpu-win\python.exe) and Jupyter executable if relevant.
  5. Open Options, select High performance, then Save.

Optional multi-GPU selection:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
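`CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialised in the process, so the safest habit is to set it before importing torch. The sketch below wraps that check. The helper name `select_gpus` is hypothetical.

```python
import os
import sys


def select_gpus(indices):
    """Restrict CUDA to the given GPU indices.

    Returns True if torch had not been imported yet (the safe case),
    and False if torch was already loaded, in which case the setting
    may be ignored and a kernel restart is the simplest remedy.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return "torch" not in sys.modules


if __name__ == "__main__":
    if not select_gpus([0]):
        print("Warning: torch was imported first; restart and set this earlier.")
```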

macOS (Apple Silicon)

If you choose the GPU path (environment.gpu-macos.yml), no macOS graphics-panel change is required.

Runtime device selection:

import torch
device = "mps" if torch.backends.mps.is_available() else "cpu"
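If the same code needs to run on Nuvolos (CPU), Windows with NVIDIA, and Apple Silicon, a single selection helper avoids per-machine edits. This is a minimal sketch with a hypothetical function name, `pick_device`. It prefers CUDA, then MPS, then CPU, and falls back to CPU if torch is not installed at all.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what is available."""
    try:
        import torch
    except ImportError:
        # No torch in this environment: nothing to accelerate.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older torch builds may not expose the mps backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

The returned string can be passed wherever a device is expected, for example `SentenceTransformer(model_name, device=pick_device())`, where `model_name` is whichever model you are using.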

Why these dependencies are here

  • poppler: PDF utilities used by parsing workflows
  • tesseract: OCR support for scanned pages
  • pandoc: conversion support for rich text formats
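  • python-magic / python-magic-bin: file type detection (python-magic-bin is the Windows variant, which bundles the libmagic binaries)
  • libreoffice: conversion support for Office-like files (not listed in the YAML files on this page, so it needs a separate install if your workflow requires it)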
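Several of these dependencies are command-line tools rather than Python packages, so an import check will not notice if they are missing. The sketch below looks for one executable per tool on `PATH`. The mapping is an assumption about which executable to probe (`pdftoppm` is one of the utilities shipped with poppler), and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Package name -> one executable it is expected to provide on PATH.
REQUIRED_TOOLS = {
    "poppler": "pdftoppm",
    "tesseract": "tesseract",
    "pandoc": "pandoc",
}


def missing_tools(tools=None):
    """Return the package names whose executable cannot be found on PATH."""
    tools = REQUIRED_TOOLS if tools is None else tools
    return [pkg for pkg, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All command-line tools were found.")
```

Run this inside the activated conda environment, otherwise `PATH` may not include the environment's binaries.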

Final practical rule

Start on Nuvolos with the baseline rag environment, test early, and only then branch into local or GPU variants if you genuinely need them.

If you add packages later, rerun a small end-to-end pipeline test immediately instead of assuming the environment is still healthy.
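Alongside the pipeline test, it can help to record package versions before and after an install, so that any silent upgrade or downgrade is visible. The sketch below is a complement to the end-to-end test, not a replacement for it. The function names `snapshot_versions` and `changed_versions` are hypothetical, and the list of packages to watch is only an example.

```python
from importlib import metadata

# Example watch list: adjust to the packages your pipeline depends on.
WATCHED = ["torch", "numpy", "pandas", "unstructured", "sentence-transformers"]


def snapshot_versions(packages):
    """Map each distribution name to its installed version (None if absent)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed_versions(before, after):
    """Return {name: (old, new)} for every package whose version differs."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    print(snapshot_versions(WATCHED))
```

A typical use is to call `snapshot_versions(WATCHED)` before the install, call it again afterwards, and pass both results to `changed_versions`. An empty result means none of the watched packages moved.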