Week 08 Reference: Environment Setup and Troubleshooting
Nuvolos baseline, local caveats, GPU options, and CI separation
Use this page as your setup reference for ✍️ Problem Set 2.
Earlier in the course, environment setup probably felt like a formality. That is no longer true. We are about to start using heavier packages that clash with many of the packages typically found in an existing Python environment, so your setup choices can decide whether your workflow runs smoothly or painfully.
Nuvolos is still the best starting point for consistency across the class, but even there you need to test carefully if you add extra dependencies later.
Start here: choose the right target runtime
Do not treat all environments as interchangeable.
| Runtime target | Typical OS | What to optimise for | Main risk |
|---|---|---|---|
| Nuvolos | Linux | Class consistency and reproducibility | Assuming extra packages will behave exactly like baseline |
| GitHub Actions | Ubuntu runner | Minimal, repeatable CI checks | Accidentally carrying local fixes into CI |
| Your local machine | Windows/macOS/Linux | Personal iteration speed | Dependency/version variability across machines |
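To make the distinction concrete, here is a minimal sketch of how a script could guess which of these runtimes it is on. It relies on the GITHUB_ACTIONS variable, which GitHub Actions sets to "true" on its runners. Treating the presence of /space_mounts as a sign of Nuvolos is an assumption based on the cache path used on this page, and the function name detect_runtime is hypothetical.

```python
import os
import platform
from pathlib import Path


def detect_runtime(env=None, nuvolos_marker="/space_mounts"):
    """Return a rough label for the current runtime target."""
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    if env.get("GITHUB_ACTIONS") == "true":
        return "github-actions"
    # Assumption: the shared mount only exists on Nuvolos.
    if Path(nuvolos_marker).exists():
        return "nuvolos"
    # Anything else is treated as a local machine, labelled by OS.
    return f"local-{platform.system().lower()}"


print(detect_runtime())
```

A label like this is useful for logging which runtime produced a result; it is not a substitute for keeping separate environment files per target.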
Course baseline for Nuvolos (Ubuntu runtime)
This is the suggested baseline for class work on Nuvolos. It is Linux-first, so it is not the same as the Windows local setup.

    name: rag
    channels:
      - conda-forge
    dependencies:
      - python=3.11
      - pip
      - numpy
      - pandas
      - python-dotenv
      - jupyter
      - ipykernel
      - ipywidgets
      - poppler
      - tesseract
      - pandoc
      - pip:
          - --extra-index-url https://download.pytorch.org/whl/cpu
          - torch==2.5.1+cpu
          - "unstructured[pdf]"
          - sentence-transformers
          - python-magic

Nuvolos reminders:
- Keep installs inside your conda environment.
- If a package is not available in your active environment, check with staff before trying ad hoc workarounds.
- Keep HuggingFace cache on shared mount:

      import os
      # Set this before importing sentence_transformers or other Hugging Face libraries.
      os.environ["HF_HOME"] = "/space_mounts/ds205/huggingface"

CI is not your local environment (environment.ci.yml)
Use a dedicated CI spec for GitHub Actions, usually Ubuntu.
- CI should be minimal and reproducible.
- Local fixes should not leak into CI.
- Runner machines are ephemeral, so local env naming conventions are not the priority.
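One way to keep a CI check minimal is to confirm that the key packages can be located before running anything heavier. The sketch below is an illustration, not the course's CI script: the names REQUIRED_MODULES, missing_modules, and main are hypothetical, and the list uses import names rather than pip names (for example dotenv for python-dotenv and magic for python-magic).

```python
import importlib.util

# Import names (not pip names) for packages in the CI spec on this page.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "dotenv",
    "torch",
    "unstructured",
    "sentence_transformers",
    "magic",
]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def main(names=REQUIRED_MODULES):
    """Print a short report and return 0 on success, 1 if anything is missing."""
    missing = missing_modules(names)
    if missing:
        print("Missing modules:", ", ".join(missing))
        return 1
    print("All required modules found.")
    return 0


# A CI step could pass this value to sys.exit() to fail the job.
status = main()
```

Because find_spec only locates modules and does not import them, this check stays fast, but it will not catch a package that is installed and broken.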
Suggested environment.ci.yml (Ubuntu runner)

    channels:
      - conda-forge
    dependencies:
      - python=3.11
      - pip
      - numpy
      - pandas
      - python-dotenv
      - poppler
      - tesseract
      - pandoc
      - pip:
          - --extra-index-url https://download.pytorch.org/whl/cpu
          - torch==2.5.1+cpu
          - "unstructured[pdf]"
          - sentence-transformers
          - python-magic

Local machine variants (use with caution)
I have tested these setups locally, but every machine is different and local behaviour can vary wildly. Treat these files as starting points, not guarantees.
Linux local
Use the same YAML shown in the Nuvolos baseline first, then adjust only if needed.
macOS local
- If needed, adapt package variants to match your machine.
- If you use Apple Silicon GPU acceleration, this is the suggested YAML:
Suggested environment.gpu-macos.yml

    name: rag-gpu-macos
    channels:
      - conda-forge
    dependencies:
      - python=3.11
      - pip
      - numpy
      - pandas
      - python-dotenv
      - jupyter
      - ipykernel
      - ipywidgets
      - poppler
      - tesseract
      - pandoc
      - pip:
          - torch==2.5.1
          - torchvision==0.20.1
          - torchaudio==2.5.1
          - "unstructured[pdf]"
          - sentence-transformers
          - python-magic

Windows local
- Use:
  - the Nuvolos baseline YAML on CPU (rag)
  - the GPU YAML below for an optional NVIDIA GPU (rag-gpu-win)
- For some Windows setups, this fallback can unblock torch runtime conflicts:

      import os
      # Workaround for the duplicate OpenMP runtime error; set it before importing torch.
      os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

Suggested environment.gpu-win.yml

    name: rag-gpu-win
    channels:
      - conda-forge
    dependencies:
      - python=3.11
      - pip
      - numpy
      - pandas
      - python-dotenv
      - jupyter
      - ipykernel
      - ipywidgets
      - poppler
      - tesseract
      - pandoc
      - pip:
          - --extra-index-url https://download.pytorch.org/whl/cu124
          - torch==2.5.1+cu124
          - torchvision==0.20.1+cu124
          - torchaudio==2.5.1+cu124
          - "unstructured[pdf]"
          - sentence-transformers
          - python-magic-bin

Optional GPU setup notes
Windows (NVIDIA)
If you choose the GPU path (environment.gpu-win.yml):
- Open Settings.
- Go to System > Display > Graphics.
- Under Add an app, choose Desktop app, then Browse.
- Add the environment-specific python.exe (for example ...\\envs\\rag-gpu-win\\python.exe) and the Jupyter executable if relevant.
- Open Options, select High performance, then Save.
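After applying these steps, a quick check can confirm whether PyTorch actually sees the NVIDIA GPU. This is a minimal sketch that assumes the GPU environment is active. The helper name describe_torch_devices is hypothetical, and it reports torch_installed as False instead of failing when PyTorch is absent.

```python
def describe_torch_devices():
    """Summarise which CUDA devices PyTorch can see, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_device_count": torch.cuda.device_count(),
    }
    # Only query a device name when at least one CUDA device is usable.
    if info["cuda_available"]:
        info["cuda_device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_devices())
```

If cuda_available is False in the GPU environment, check the installed torch build first: a version string ending in +cpu means the CPU wheel was installed instead of the cu124 one.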
Optional multi-GPU selection:

    import os
    # Set before importing torch; "0" exposes only the first GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

macOS (Apple Silicon)
If you choose the GPU path (environment.gpu-macos.yml), no macOS graphics-panel change is required.
Runtime device selection:

    import torch
    device = "mps" if torch.backends.mps.is_available() else "cpu"

Why these dependencies are here
- poppler: PDF utilities used by parsing workflows
- tesseract: OCR support for scanned pages
- pandoc: conversion support for rich text formats
- python-magic / python-magic-bin: file type detection (the Windows GPU YAML uses python-magic-bin)
- libreoffice: conversion support for Office-like files (not part of the YAML files above)
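Several of these dependencies are command-line tools rather than Python packages, so a quick way to see whether they are reachable is to look for their executables on PATH. The sketch below is illustrative: the executable names are assumptions about how each tool is usually installed (pdftoppm for poppler, soffice for LibreOffice), and TOOLS and locate_tools are hypothetical names.

```python
import shutil

# Assumed executable names: poppler ships pdftoppm, LibreOffice ships soffice.
TOOLS = {
    "poppler": "pdftoppm",
    "tesseract": "tesseract",
    "pandoc": "pandoc",
    "libreoffice": "soffice",
}


def locate_tools(tools):
    """Map each dependency name to the executable path found on PATH, or None."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


for name, path in locate_tools(TOOLS).items():
    print(f"{name:12} {path or 'NOT FOUND'}")
```

A NOT FOUND entry usually means the wrong environment is active or the tool was never installed; it does not by itself tell you which.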
Final practical rule
Start on Nuvolos with the baseline rag environment, test early, and only then branch into local or GPU variants if you genuinely need them.
If you add packages later, rerun a small end-to-end pipeline test immediately instead of assuming the environment is still healthy.
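Alongside the end-to-end test, it can help to see whether installing something new silently changed a package you depend on, such as torch. The following is a minimal sketch using the standard library's importlib.metadata; the function names and the TRACKED list are hypothetical, and it complements the pipeline test rather than replacing it.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Record the installed version of each distribution, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed_versions(before, after):
    """Return {name: (old, new)} for every package whose version differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


# Distribution names taken from the baseline YAML on this page.
TRACKED = ["torch", "unstructured", "sentence-transformers", "numpy", "pandas"]

before = snapshot_versions(TRACKED)
# ... install the extra package here, then take a second snapshot ...
after = snapshot_versions(TRACKED)
print(changed_versions(before, after))
```

In practice the two snapshots would be taken in separate sessions, for example by saving the first one to a file before running the install.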