💻 Week 02 - Class Roadmap (90 min)
2025/26 Autumn Term
Welcome to the second seminar/lab class of DS101A!
In the week 02 lecture, we introduced you to the most common data types and file formats.
In this lab, we get to practice the same concepts with a limited amount of code, while also exploring how generative AI tools can enhance (rather than replace) our data science learning.
You can keep referencing these readings on the common data types and file formats throughout:
Step 01 - Common discussion on data and datatypes (15 min)
Let's have a look at a few datasets with economic data:
- What sorts of access do these sites provide?
- What are all the ways you can think of to obtain data?
- What are some good sources for social, economic and political data?
- Do you recognize any data types we saw on Monday?
Have a look at the readings listed at the beginning of the roadmap (Sturz 2023; Shah 2020; "Getting Started with Python Lists and Dictionaries. Scaleway Documentation" 2024) and answer the following questions:
- What is a Boolean data type in Python?
- What is a list?
- What is a dictionary?
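As a starting point for the three questions above, here is a minimal sketch of each data type. The variable names and values are invented for illustration and do not come from the readings.

```python
# A Boolean holds one of two values: True or False.
is_open_data = True
has_missing = 3 > 5  # comparisons produce Booleans; this one is False

# A list is an ordered, changeable collection, indexed from 0.
years = [2021, 2022, 2023]
years.append(2024)      # lists can grow after creation
first_year = years[0]   # access by position

# A dictionary maps keys to values, accessed by key rather than position.
record = {"country": "Exampleland", "year": 2024, "is_estimate": False}
record["population_millions"] = 1.5  # add a new key-value pair

print(type(is_open_data), type(years), type(record))
```

Try changing the values and predicting what each line prints before you run it.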
As you work through these concepts, consider: How might AI tools like ChatGPT help you understand these data types better? Could you ask it to generate examples or explain concepts in different ways? The key is using AI to deepen your understanding, not to bypass the learning process.
Step 02 - Common formats (15 min)
- Download the following dataset:
- Your class teacher will share a Google Colab notebook on Slack with you. Work through the notebook and follow instructions there.
While AI can help generate code to load CSV files, ask yourself: What am I learning by understanding the pd.read_csv() function myself? The syntax is straightforward, but the parameters (encoding, sep, na_values) teach us about data quality issues. Use AI to explain why these parameters matter, not just to write the code.
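To see why these three parameters matter, here is a small sketch that reads an in-memory file rather than the class dataset. The file contents, column names, and values are made up; we assume a semicolon-separated, Latin-1 encoded file that uses "n/a" and "-" for missing values.

```python
import io

import pandas as pd

# Made-up file contents, for illustration only.
raw_bytes = (
    "country;year;growth_rate\n"
    "Côte d'Ivoire;2023;1.5\n"
    "São Tomé;2023;n/a\n"
    "Curaçao;2023;-\n"
).encode("latin-1")

df = pd.read_csv(
    io.BytesIO(raw_bytes),
    encoding="latin-1",      # how the bytes map to characters such as ô and ç
    sep=";",                 # the character that separates columns
    na_values=["n/a", "-"],  # extra strings to treat as missing values
)

print(df)
print(df.dtypes)
```

Try removing one parameter at a time and rerunning: each one fails differently, which tells you something about how the file was produced.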
Step 03 - Observations and storytelling (20 min)
What sort of observations can you make from the data from step 02?
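One possible way to start making observations is a quick first look at column types, missing values, and summary statistics. The sketch below uses a made-up stand-in table with hypothetical column names, not the dataset from step 02; the same three calls should work on any DataFrame you loaded.

```python
import pandas as pd

# Made-up stand-in for the class dataset; column names are hypothetical.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "year": [2020, 2020, 2021, 2021],
        "unemployment_rate": [5.0, 7.5, None, 6.0],
    }
)

print(df.dtypes)        # which data type does each column have?
print(df.isna().sum())  # how many missing values per column?
print(df.describe())    # count, mean, spread of the numeric columns
```

Numbers like these are only a starting point: deciding which of them matter still depends on what the data represents.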
AI-Enhanced Analysis Discussion:
- If you were to use AI to generate plots from this data, what would you gain? What might you lose?
- Consider: AI can quickly create visualizations, but does it understand the context of your data?
- How might delegating plotting to AI help you focus on higher-level analytical thinking?
- Conversely, what insights come from manually exploring different chart types and parameters?
Critical Thinking Exercise:
- Try describing your data to an AI tool and ask it to suggest appropriate visualizations
- Compare its suggestions with your own intuitions about the data
- What does this tell you about the importance of domain knowledge in data analysis?
Step 04 - AI and the Data Pipeline: Where to Draw the Line (15 min)
- Data Cleaning Considerations:
- Explore the JSON generated at JSON Crack
- Try changing the structure of the JSON you made
- Discuss: Should we let AI handle data cleaning decisions automatically? Why or why not?
- Group Discussion Points:
- Low-stakes automation: Where might AI safely handle routine tasks (e.g., converting file formats, generating boilerplate code)?
- High-stakes decisions: What parts of data analysis require human judgment (e.g., handling missing values, outlier treatment, feature selection)?
- The "black box" problem: How do we maintain transparency and reproducibility when using AI tools?
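To make the discussion points above more concrete, here is a minimal sketch using only the Python standard library. It restructures a small, made-up JSON document (a routine, low-stakes step) and then shows how two different choices for a missing value change the result (a judgment call). The records and the two handling strategies are illustrative assumptions, not a recommended procedure.

```python
import json
from statistics import mean

# Made-up JSON: a list of records, one with a missing value (null).
text = """
[
  {"country": "A", "rate": 2.0},
  {"country": "B", "rate": null},
  {"country": "C", "rate": 4.0}
]
"""
records = json.loads(text)  # JSON null becomes Python None

# Restructure: from a list of records to a dictionary keyed by country.
by_country = {r["country"]: r["rate"] for r in records}
print(json.dumps(by_country, indent=2))

# Choice 1: drop the missing value before averaging.
observed = [v for v in by_country.values() if v is not None]
mean_dropped = mean(observed)

# Choice 2: treat the missing value as zero.
filled = [0.0 if v is None else v for v in by_country.values()]
mean_filled = mean(filled)

print(mean_dropped, mean_filled)
```

The two averages differ, and neither is correct without knowing why the value is missing. Writing the choice down in code also keeps it transparent and reproducible, whoever (or whatever) wrote that code.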
Remember: The goal is to use AI as a thinking partner that enhances your capabilities, not as a replacement for developing your own analytical skills. Ask yourself:
- Am I using AI to help me understand concepts better?
- Or am I using it to avoid having to think through problems myself?
The former builds expertise; the latter creates dependency.
Step 05 - Reflection and Best Practices (10 min)
Individual Reflection: Write down your thoughts on:
- One way AI tools helped you understand the material better today
- One instance where doing something manually (rather than with AI) was more valuable for learning
- How you might establish personal guidelines for AI use in data science work
Class Discussion: Share insights about balancing AI assistance with skill development.