

The DS105A Data Science Workflow
Understanding the Data Science Workflow
In DS105A, we follow a structured approach to working with data. This workflow represents the typical stages you will move through when conducting any data analysis project.
Think of this workflow as a map. Sometimes you move forward through the stages. Other times you loop back when you discover something new or need better data.
The Two Entry Points
Every data project starts in one of two ways:
- Question-driven start: You have a specific question you want to answer. You then find data to help answer it.
- Curiosity-driven start: You have interesting data and want to explore what questions it can answer.
Both approaches are valid. The workflow works the same way once you begin.
The Complete Workflow
Diagram: The complete workflow (vertical and horizontal views)
What Each Stage Produces
The table below shows concrete examples of what you create at each stage. These are not abstract concepts. They are actual code, files, and insights you will produce.
Stage | Jupyter Notebook | README/Website | Files |
---|---|---|---|
COLLECT | data = requests.get(url).json() | “Data collected from Open-Meteo API” | Raw variables in memory |
STORE | with open('data/raw/weather.json', 'w') as f: json.dump(data, f) | “Raw data saved to data/raw/” | .json, .csv files |
PREPARE | Clean DataFrames: df.dropna(), df.astype() | “Cleaned 1,200 records, dropped the 3% with missing values” | Analysis-ready tables |
EXPLORE | df.describe(), histograms, df.corr() | “Temperature peaked in July (avg 24°C)” | Summary statistics, plots |
INVESTIGATE | Statistical tests: scipy.stats.ttest_ind() | “London significantly warmer than Paris since 2010” | Hypothesis test results |
MODEL | RandomForestRegressor.fit(), feature importance | “Model explains 85% of the variance in temperature (R² = 0.85)” | Trained models, predictions |
COMMUNICATE | Markdown narratives, publication plots | Executive summaries, key takeaways | Clean reports, presentations |
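To see how the first four rows of the table connect in a notebook, here is a minimal sketch of one pass from COLLECT to EXPLORE. It assumes an Open-Meteo-style JSON payload with an hourly block holding time and temperature_2m lists. The function names collect, store, prepare and explore are hypothetical, chosen to match the stage names. The sketch uses the standard library's urllib.request instead of requests so it runs without extra installs, and it feeds a small hand-made payload through the stages instead of calling the API.

```python
import json
import tempfile
import urllib.request
from pathlib import Path

import pandas as pd


def collect(url: str) -> dict:
    """COLLECT: fetch a JSON payload from a web API (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def store(data: dict, path: Path) -> Path:
    """STORE: write the raw payload to disk unchanged, creating folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f)
    return path


def prepare(path: Path) -> pd.DataFrame:
    """PREPARE: load the raw file into a clean, typed DataFrame."""
    with open(path) as f:
        data = json.load(f)
    # Assumed payload shape: {"hourly": {"time": [...], "temperature_2m": [...]}}
    return (
        pd.DataFrame(data["hourly"])
        .assign(time=lambda d: pd.to_datetime(d["time"]))  # text -> datetimes
        .dropna(subset=["temperature_2m"])                  # drop missing readings
        .astype({"temperature_2m": float})                  # enforce numeric type
    )


def explore(df: pd.DataFrame) -> pd.Series:
    """EXPLORE: mean temperature for each calendar month."""
    return df.groupby(df["time"].dt.month)["temperature_2m"].mean()


# Hand-made stand-in for an API response (not real measurements).
sample = {
    "hourly": {
        "time": ["2024-07-01T00:00", "2024-07-01T01:00",
                 "2024-08-01T00:00", "2024-08-01T01:00"],
        "temperature_2m": [20.0, None, 18.0, 24.0],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    raw_path = store(sample, Path(tmp) / "data" / "raw" / "weather.json")
    df = prepare(raw_path)

monthly = explore(df)
print(len(df))  # 3: the row with a missing temperature was dropped
print(monthly)
```

With network access, the pass could start from a live request such as collect("https://api.open-meteo.com/v1/forecast?latitude=51.51&longitude=-0.13&hourly=temperature_2m"), assuming that endpoint and those parameter names; check the Open-Meteo documentation before relying on them. The INVESTIGATE and MODEL rows would then add scipy and scikit-learn calls on top of the prepared DataFrame.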
The Feedback Loops
Notice the dashed arrows in the diagram. These represent the reality of data work. You rarely move straight from start to finish.
When you investigate your data, you might discover:
- You need more data (loop back to COLLECT)
- Your data structure needs changing (loop back to PREPARE)
- New questions emerge (loop back to EXPLORE)
This is normal. This is how data science actually works.
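The first two loops can often be spotted with simple checks on the data itself. The sketch below is hypothetical: the next_stage function, the 100-row minimum, and the assumption that every required column should be numeric are illustrative choices, not course rules. The third loop, new questions, comes from your own thinking, so no check like this can detect it.

```python
import pandas as pd


def next_stage(df: pd.DataFrame, required: list[str], min_rows: int = 100) -> str:
    """Suggest which stage to loop back to after a check during INVESTIGATE.

    Hypothetical rules: the row threshold and the 'required columns must be
    numeric' assumption are illustrative, not fixed course rules.
    """
    absent = [col for col in required if col not in df.columns]
    if absent or len(df) < min_rows:
        return "COLLECT"  # too few rows, or a variable that was never fetched
    for col in required:
        if not pd.api.types.is_numeric_dtype(df[col]) or df[col].isna().any():
            return "PREPARE"  # wrong types or gaps: fix the structure first
    return "INVESTIGATE"  # no data problem found: carry on


readings = pd.DataFrame({"temperature_2m": [21.5, None, 19.0]})
print(next_stage(readings, ["temperature_2m"], min_rows=3))  # PREPARE
```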
How This Course Teaches the Workflow
Each week builds your competence in different parts of this workflow:
- Weeks 1-4: Focus on COLLECT, STORE, and PREPARE. You learn to work with data tables.
- Week 4: Your first complete pass through the entire workflow.
- Week 5: Deeper work on EXPLORE and COMMUNICATE through visualisation.
- Weeks 7-10: Complex data preparation, reshaping, joining multiple sources.
- Winter Term: Group project where you execute the full workflow independently.
Every practice exercise and assignment is designed around this workflow. When you feel lost, return to this page and ask yourself: “Which stage am I working on right now?”