💻 Week 08 Lab

Exploring Transformer Models with HuggingFace

Published

11 March 2025

🥅 Learning Goals

By the end of this lab, you will have explored transformer-based models for climate document analysis, experimented with contextual embeddings, and discussed potential insights that can be derived from these advanced NLP techniques.

Last Updated: 10 March 2025, 18:00

📍Time and Location: Tuesday, 11 March 2025. Check your timetable for the precise time and location of your class.

🧪 Lab Overview

Today’s lab is a hands-on exploration of transformer models, building on the concepts introduced in the 🗣️ Week 08 Lecture. You’ll work with the same climate document corpus as last week, this time using more advanced NLP techniques.

You will also need to download the utils.py file.

Prerequisites

We assume you:

✅ Attended or reviewed the 🗣️ Week 08 Lecture
✅ Have your Python environment set up with the required packages
✅ Have access to the NDC corpus from last week’s lab

If you have not installed the packages, we recommend you run the lab’s notebook on Nuvolos. The packages and environment are already set up for you there.

🛣️ Lab Structure

Part 1: Setting Up (10 min)

This is an 🎯 ACTION POINT for you to work on individually.

Environment Setup
- Continue using your embedding-env from last week
- Update it with the new requirements (see below)
- Load the ClimateBERT model

Loading the Data
- Work with the NDC corpus from last week
- Compare ‘lazy’ vs ‘robust’ preprocessing approaches
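
As a rough illustration of the "Load the ClimateBERT model" step, the sketch below loads an encoder with the HuggingFace `transformers` Auto classes and turns a list of texts into one vector each by mean pooling. The checkpoint name `climatebert/distilroberta-base-climate-f` is an assumption made for this example, as are the helper names `mean_pool` and `embed_texts`; the lab notebook may use a different checkpoint or a different pooling strategy.

```python
import torch

# Assumption: illustrative checkpoint name; use the one given in the lab notebook.
MODEL_NAME = "climatebert/distilroberta-base-climate-f"


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the token vectors of each text, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # avoid division by zero
    return summed / counts


def embed_texts(texts: list[str], model_name: str = MODEL_NAME) -> torch.Tensor:
    """Load the encoder (downloaded on first use) and return one vector per text."""
    # Imported here so mean_pool can be tried without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout

    # Pad to the longest text in the batch and cut anything beyond 512 tokens.
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return mean_pool(output.last_hidden_state, batch["attention_mask"])
```

Calling `embed_texts(["We commit to reducing emissions by 2030."])` should return a tensor with one row per input text. Mean pooling is only one common way to get a single vector per text; it is used here because it needs no extra model components.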

The updated requirements.txt:

# Core data science packages
numpy==1.26.4
pandas==2.2.3
matplotlib==3.10.1
scikit-learn==1.6.1

# NLP and text processing
nltk==3.9.1
gensim==4.3.3
langdetect==1.0.9

# Transformers and deep learning
transformers==4.39.3
datasets==2.18.0
torch==2.2.1

# Visualization
lets-plot==4.6.0

# Utilities
tqdm==4.67.1
ipykernel==6.29.5
ipywidgets==8.1.5
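
If you are unsure whether your embedding-env already matches these pins, a small check along the following lines may help. It is a sketch using only the standard library; the function names `parse_pins` and `report_mismatches` are made up for this example and are not part of the lab's utils.py.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: pinned_version} for lines of the form name==version."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def report_mismatches(pins: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin
    to the installed version (None if the package is not installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = installed
    return mismatches
```

For example, `report_mismatches(parse_pins(open("requirements.txt").read()))` returns an empty dictionary when every pinned package is installed at exactly the pinned version.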

Part 2: Document Chunking and Embeddings (20 min)

🗣️ TEACHING MOMENT

Your class teacher will guide you through:

- Fine-grained document chunking strategies
- Computing embeddings with transformer models
- Comparing embedding similarities
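
To make the chunking and similarity steps concrete before the walkthrough, here is a minimal sketch. It assumes a simple strategy of overlapping word windows, which only approximates the model's token limit, and it compares embeddings with cosine similarity. The function names and the default `chunk_size` and `overlap` values are illustrative choices, not the ones your class teacher will necessarily use.

```python
import numpy as np


def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n_chunks, dim) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T
```

Entry `[i, j]` of the returned matrix is the similarity between chunk `i` and chunk `j`, so the diagonal is 1 for any non-zero embedding. The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of some duplicated text.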

Part 3: Open Exploration (50 min)

This is your chance to deeply explore transformer models and their capabilities. Choose from these suggested areas or pursue your own interests.

🎯 ACTION POINTS:

- Choose an area to explore
- Document your findings
- Share screenshots of interesting discoveries in the #social Slack channel

🏠 Looking Ahead

The techniques explored today will be directly relevant to ✍️ Problem Set 2 (to be released soon). Use this lab to:

- Experiment with different approaches to document analysis
- Understand how transformer models handle climate-specific language
- Practice explaining your methodology and findings

Your ability to link practical explorations to theoretical concepts will be key for the upcoming assignment.