The DS105A Data Science Workflow

Author: Dr Jon Cardoso-Silva

Published: 13 December 2025

Understanding the Data Science Workflow

In DS105A, we follow a structured approach to working with data. This workflow represents the typical stages you will move through when conducting any data analysis project.

Think of this workflow as a map. Sometimes you move forward through the stages. Other times you loop back when you discover something new or need better data.

The Two Entry Points

Every data project starts in one of two ways:

Question-driven start: You have a specific question you want to answer. You then find data to help answer it.
Curiosity-driven start: You have interesting data and want to explore what questions it can answer.

Both approaches are valid. The workflow works the same way once you begin.

The Complete Workflow

Diagram: the DS105A workflow, shown in vertical and horizontal views. Dashed arrows mark the feedback loops between stages.

What Each Stage Produces

The table below shows concrete examples of what you create at each stage. These are not abstract concepts. They are actual code, files, and insights you will produce.

Stage	Jupyter Notebook	README/Website	Outputs
COLLECT	`data = requests.get(url).json()`	“Data collected from Open-Meteo API”	Raw Python objects in memory
STORE	`with open('data/raw/weather.json', 'w') as f: json.dump(data, f)`	“Raw data saved to data/raw/”	`.json`, `.csv` files
PREPARE	Clean DataFrames: `df.dropna()`, `df.astype()`	“Cleaned 1,200 records; dropped the 3% with missing values”	Analysis-ready tables
EXPLORE	`df.describe()`, histograms, `df.corr()`	“Temperature peaked in July (avg 24°C)”	Summary statistics, plots
INVESTIGATE	Statistical tests: `scipy.stats.ttest_ind()`	“London and Paris mean temperatures differ significantly since 2010 (p < 0.05)”	Hypothesis test results
MODEL	`model = RandomForestRegressor().fit(X, y)`, `model.feature_importances_`	“Model explains 85% of the variance in temperature (R² = 0.85)”	Trained models, predictions
COMMUNICATE	Markdown narratives, publication-quality plots	Executive summaries, key takeaways	Clean reports, presentations
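The STORE row compresses several steps into one line. The sketch below spells them out, assuming you already have the `data` dictionary produced by the COLLECT row. The `data/raw/` folder comes from the table; the helper names `store_raw` and `load_raw` are hypothetical, and the sample values are made up to mimic the shape of an Open-Meteo hourly response.

```python
import json
from pathlib import Path

# Folder layout taken from the "Raw data saved to data/raw/" note in the table
RAW_DIR = Path("data") / "raw"


def store_raw(data: dict, filename: str, raw_dir: Path = RAW_DIR) -> Path:
    """Save the untouched API response as JSON and return the file path."""
    raw_dir.mkdir(parents=True, exist_ok=True)  # create data/raw/ if it is missing
    path = raw_dir / filename
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return path


def load_raw(path: Path) -> dict:
    """Reload the raw JSON so later stages never need to call the API again."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Stand-in for the dictionary returned by requests.get(url).json();
# the keys imitate an Open-Meteo hourly response, the values are invented.
data = {
    "hourly": {
        "time": ["2024-07-01T00:00", "2024-07-01T01:00"],
        "temperature_2m": [17.2, 16.8],
    }
}

path = store_raw(data, "weather.json")
reloaded = load_raw(path)
```

Saving the untouched response before any cleaning means PREPARE can be rerun from disk without calling the API again, and your analysis stays reproducible even if the API later returns different values.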
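To see PREPARE, EXPLORE and INVESTIGATE chained together, here is a minimal sketch on a tiny made-up dataset (not real weather data). It uses the calls named in the table (`dropna`, `astype`, `describe`, `scipy.stats.ttest_ind`). Passing `equal_var=False` runs Welch's t-test, which is one reasonable choice when two groups may have different variances; it is an assumption here, not necessarily the test the course prescribes.

```python
import pandas as pd
from scipy import stats

# Made-up daily temperatures (°C) stored as text, as they often arrive from a file;
# None marks a missing reading
raw = pd.DataFrame({
    "city": ["London"] * 5 + ["Paris"] * 5,
    "temp": ["18.1", "19.4", None, "17.8", "20.2",
             "22.5", "23.1", "21.9", None, "24.0"],
})

# PREPARE: drop rows with missing values, then convert the text column to numbers
clean = raw.dropna().astype({"temp": float})
share_removed = 1 - len(clean) / len(raw)

# EXPLORE: summary statistics for each city
summary = clean.groupby("city")["temp"].describe()

# INVESTIGATE: Welch's t-test comparing the two cities' temperatures
london = clean.loc[clean["city"] == "London", "temp"]
paris = clean.loc[clean["city"] == "Paris", "temp"]
result = stats.ttest_ind(london, paris, equal_var=False)

print(f"Removed {share_removed:.0%} of rows")
print(summary[["mean", "min", "max"]])
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

With only four readings per city the p-value carries little weight; the point is the order of the stages, with each one consuming the output of the previous one.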

The Feedback Loops

Notice the dashed arrows in the diagram. These represent the reality of data work. You rarely move straight from start to finish.

When you investigate your data, you might discover:

You need more data (loop back to COLLECT)
Your data structure needs changing (loop back to PREPARE)
New questions emerge (loop back to EXPLORE)

This is normal. This is how data science actually works.
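In practice, a loop back to COLLECT often starts with a simple check. The sketch below is hypothetical (the helper `missing_days` and the dates are invented for illustration): it lists the days in the period your question covers that have no observations. A non-empty result signals that you should collect more data before moving on.

```python
from datetime import date, timedelta


def missing_days(observed: list[date], start: date, end: date) -> list[date]:
    """List every day from start to end (inclusive) that has no observation."""
    seen = set(observed)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in all_days if day not in seen]


# Invented example: the question needs 1-5 July, but two days were never collected
observed = [date(2024, 7, 1), date(2024, 7, 2), date(2024, 7, 5)]
gaps = missing_days(observed, date(2024, 7, 1), date(2024, 7, 5))

if gaps:
    print(f"{len(gaps)} day(s) missing: loop back to COLLECT")
else:
    print("Coverage complete: carry on to EXPLORE")
```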

How This Course Teaches the Workflow

Each week builds your competence in different parts of this workflow:

Weeks 1-4: Focus on COLLECT, STORE, and PREPARE. You learn to work with data tables.
Week 4: Your first complete pass through the entire workflow.
Week 5: Deeper work on EXPLORE and COMMUNICATE through visualisation.
Weeks 7-10: Complex data preparation, reshaping, joining multiple sources.
Winter Term: A group project in which your team executes the full workflow without step-by-step guidance.

Every practice exercise and assignment is designed around this workflow. When you feel lost, return to this page and ask yourself: “Which stage am I working on right now?”