💻 Week 08 Lab
Exploring Transformer Models with HuggingFace

Last Updated: 10 March 2025, 18:00
📍 Time and Location: Tuesday, 11 March 2025. Check your timetable for the precise time and location of your class.
🧪 Lab Overview
Today's lab focuses on hands-on exploration of transformer models, building on the concepts introduced in the 🗣️ Week 08 Lecture. You'll work with the same climate document corpus as last week, this time using more advanced NLP techniques.
Also download the utils.py file.
Prerequisites
We assume you:
- ✅ Attended or reviewed the 🗣️ Week 08 Lecture
- ✅ Have your Python environment set up with the required packages
- ✅ Have access to the NDC corpus from last week's lab
If you have not installed the packages, we recommend you run the lab's notebook on Nuvolos. The packages and environment are already set up for you there.
🗣️ Lab Structure
Part 1: Setting Up (10 min)
This is an 🎯 ACTION POINT for you to work on individually.
- Environment Setup
  - Continue using your embedding-env environment from last week
  - Update it with the new requirements (see below)
  - Load the ClimateBERT model
- Loading the Data
  - Work with the NDC corpus from last week
  - Compare "lazy" vs "robust" preprocessing approaches
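The lab does not spell out what "lazy" and "robust" preprocessing mean, so the sketch below assumes one plausible reading for PDF-extracted NDC text: a lazy pass that only lowercases and splits on whitespace, and a robust pass that normalises Unicode, rejoins words hyphenated across line breaks, collapses whitespace and splits into sentences. The names lazy_preprocess and robust_preprocess are hypothetical and are not part of utils.py.

```python
import re
import unicodedata


def lazy_preprocess(text: str) -> list[str]:
    """Hypothetical 'lazy' approach: lowercase and split on whitespace."""
    return text.lower().split()


def robust_preprocess(text: str) -> list[str]:
    """Hypothetical 'robust' approach: returns cleaned sentences, not tokens."""
    # NFKC turns compatibility characters such as the ligature 'ﬁ' into 'fi'
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across a line break: "emis-\nsions" -> "emissions"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining whitespace (newlines, tabs, repeated spaces)
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence split after '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if s]


raw = "Reduce emis-\nsions by 30%  by 2030.\nThe ﬁnancial plan follows."
print(lazy_preprocess(raw))    # keeps 'emis-', 'sions' and the 'ﬁ' ligature
print(robust_preprocess(raw))  # two clean sentences
```

Returning sentences from the robust pass is a deliberate choice: transformer models take raw text rather than bag-of-words tokens, and sentence boundaries make a natural unit for later chunking.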
Updated requirements.txt:
# Core data science packages
numpy==1.26.4
pandas==2.2.3
matplotlib==3.10.1
scikit-learn==1.6.1
# NLP and text processing
nltk==3.9.1
gensim==4.3.3
langdetect==1.0.9
# Transformers and deep learning
transformers==4.39.3
datasets==2.18.0
torch==2.2.1
# Visualization
lets-plot==4.6.0
# Utilities
tqdm==4.67.1
ipykernel==6.29.5
ipywidgets==8.1.5
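After running pip install -r requirements.txt inside embedding-env, a quick standard-library check can flag packages that are missing or installed at a different version from the pins above. This is a sketch, not a course-provided tool: parse_requirements and check_environment are hypothetical helper names, and the code assumes the file uses plain name==version lines like the ones listed.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_requirements(text: str) -> dict[str, str]:
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_environment(pins: dict[str, str]) -> dict[str, str]:
    """Report packages that are missing or installed at a different version."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems


# Assumes requirements.txt sits in the current working directory
req_file = Path("requirements.txt")
if req_file.exists():
    for name, problem in check_environment(parse_requirements(req_file.read_text())).items():
        print(f"{name}: {problem}")
```

The exact string comparison is deliberately strict. A newer patch release may well work, but matching the pins makes it easier to reproduce results shared by classmates.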
Part 2: Document Chunking and Embeddings (20 min)
🗣️ TEACHING MOMENT
Your class teacher will guide you through:
- Fine-grained document chunking strategies
- Computing embeddings with transformer models
- Comparing embedding similarities
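Your class teacher's approach may differ, but the three steps above can be sketched as follows, assuming sentence-window chunking with overlap, mean pooling over the model's last hidden state, and cosine similarity. The checkpoint id climatebert/distilroberta-base-climate-f is an assumption (it is a ClimateBERT language model published on the Hugging Face Hub, but the lab may use another). AutoTokenizer, AutoModel and last_hidden_state are standard Transformers features; torch and transformers are imported inside embed_chunks so the pure-Python helpers work without them.

```python
import math


def chunk_sentences(sentences: list[str], size: int = 3, overlap: int = 1) -> list[str]:
    """Group sentences into windows of `size` sentences sharing `overlap` sentences."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # this window already reaches the end of the document
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed_chunks(chunks: list[str],
                 model_name: str = "climatebert/distilroberta-base-climate-f"):
    """Return one mean-pooled embedding (a list of floats) per chunk."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(chunks, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (n_chunks, n_tokens, dim)
    # Average over real tokens only, ignoring padding positions
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return pooled.tolist()
```

A typical run chunks the sentences of two NDC documents, calls embed_chunks on each list and compares chunk pairs with cosine_similarity. The first call downloads the model weights, and for long documents it is safer to embed slices of chunks rather than everything in one batch. Mean pooling is one common choice; using only the first token's vector is an alternative worth comparing.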
Part 3: Open Exploration (50 min)
This is your chance to explore transformer models and their capabilities in depth. Choose one of the suggested areas or pursue your own interests.
🎯 ACTION POINTS:
- Choose an area to explore
- Document your findings
- Share screenshots of interesting discoveries in the #social Slack channel
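Part 3 leaves the choice of area open. One possible exploration of how a transformer handles climate-specific language is to probe a masked language model with fill-in-the-blank sentences. The sketch assumes the ClimateBERT checkpoint climatebert/distilroberta-base-climate-f is a masked language model, so the Transformers pipeline("fill-mask", ...) API with its top_k argument applies; the "___" placeholder, build_probes and run_probes are hypothetical conveniences.

```python
def build_probes(templates: list[str], mask_token: str, placeholder: str = "___") -> list[str]:
    """Swap a human-friendly placeholder for the model's own mask token."""
    probes = []
    for template in templates:
        if template.count(placeholder) != 1:
            raise ValueError(f"Expected exactly one {placeholder!r} in: {template}")
        probes.append(template.replace(placeholder, mask_token))
    return probes


def run_probes(templates: list[str],
               model_name: str = "climatebert/distilroberta-base-climate-f",
               top_k: int = 5) -> dict[str, list[tuple[str, float]]]:
    """Return the top_k predicted fillers and scores for each template."""
    from transformers import pipeline  # imported lazily; downloads weights on first use

    fill = pipeline("fill-mask", model=model_name)
    results = {}
    probes = build_probes(templates, fill.tokenizer.mask_token)
    for template, probe in zip(templates, probes):
        predictions = fill(probe, top_k=top_k)
        results[template] = [(p["token_str"].strip(), round(p["score"], 3)) for p in predictions]
    return results
```

For example, run_probes(["The country commits to reducing ___ emissions by 2030."]) could be run once with the default checkpoint and once with model_name="distilroberta-base", a general-purpose model, to see how climate-domain pretraining shifts the predictions.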
Looking Ahead
The techniques explored today will be directly relevant to ✍️ Problem Set 2 (to be released soon). Use this lab to:
- Experiment with different approaches to document analysis
- Understand how transformer models handle climate-specific language
- Practice explaining your methodology and findings
Your ability to link practical explorations to theoretical concepts will be key for the upcoming assignment.