📦 Final Project (30%)

2024/25 Winter Term

Published: 31 March 2025

This is your final assessment for DS105W, and it is worth 30% of your final grade. For details on your individual contribution, see 👤 Individual Contribution to Group Project (10%). Building on skills developed throughout the Winter Term, you’ll work in teams to develop a complete data science project from start to finish.

Overview

Your team will:

  1. Choose and justify your own data sources
  2. Come up with curiosity-driven exploratory questions to pose to the data
  3. Update the public-facing GitHub Pages website to narrate your findings
  4. Document your development process using GitHub’s collaboration features
  5. Submit individual contribution reflections with evidence of your role as both PILOT and COPILOT

Key Requirements:

  • Team size: 3-4 members (as assigned in Week 10)
  • Deadline: 29 May 2025, 8pm UK time
  • Submission: Via GitHub repository
  • Individual reflections: Due at the same time as the group submission (see Individual Contribution Guidelines)

📝 Project Requirements

You are assessed as a group on the following components:

Data Sources & Collection

You have complete freedom in choosing your data sources, but they must meet specific requirements. For detailed guidance on selecting appropriate data sources, please refer to our comprehensive guide: 🔍 Choosing Your Data Source

In summary, your primary data source must be one of:

  • Data collected through an API using the requests library
  • Self-collected data with proper documentation and consent
  • Complex static datasets requiring significant reshaping

Remember that simple bulk downloads (e.g., basic CSVs or pre-made Kaggle datasets) are not acceptable as your main data source, though they can be used to supplement your analysis.
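
As an illustration of the first option, here is a minimal sketch of collecting paginated data from an API with the requests library. The endpoint URL, the `page` query parameter, and the `results` key in the JSON response are all hypothetical; check your chosen API's documentation for its actual pagination scheme and rate limits.

```python
import time


def fetch_all_pages(url, params=None, session=None, max_pages=5, pause=1.0):
    """Collect records from a paginated JSON API, one page at a time.

    Assumes (hypothetically) that the API accepts a `page` query parameter
    and returns its records under a `results` key.
    """
    if session is None:
        # Imported here so the function can also be exercised with a stub session.
        import requests

        session = requests.Session()

    records = []
    for page in range(1, max_pages + 1):
        query = dict(params or {}, page=page)
        response = session.get(url, params=query, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        batch = response.json().get("results", [])
        if not batch:
            break  # an empty page means there is nothing left to collect
        records.extend(batch)
        time.sleep(pause)  # pause between requests to be polite to the API
    return records
```

A call might look like `fetch_all_pages("https://api.example.com/items", params={"q": "london"})`, where the URL and the `q` parameter are placeholders. Capping the number of pages and pausing between requests keeps the collection step predictable and easy to document.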

Technical Implementation

The scale of this project is similar to ✍️ Mini-Project 2. Here are the key technical requirements:

Required Components

  1. Data Collection & Processing
  2. Database Implementation
  3. Analysis & Visualisation
  4. Documentation
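
To give a rough sense of what the database component could involve, the sketch below builds a small relational schema with Python's built-in sqlite3 module and answers a question with a JOIN. The table names, column names, and sample rows are invented for illustration; your own schema should follow from your data source and your questions.

```python
import sqlite3


def build_database(path=":memory:"):
    """Create a two-table schema linked by a foreign key (hypothetical design)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS subreddits (
            subreddit_id INTEGER PRIMARY KEY,
            name         TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS posts (
            post_id      INTEGER PRIMARY KEY,
            subreddit_id INTEGER NOT NULL REFERENCES subreddits(subreddit_id),
            title        TEXT NOT NULL,
            num_comments INTEGER NOT NULL
        );
        """
    )
    return conn


def comments_per_subreddit(conn):
    """Total comments per subreddit, combining both tables with a JOIN."""
    query = """
        SELECT s.name, SUM(p.num_comments) AS total_comments
        FROM posts AS p
        JOIN subreddits AS s ON s.subreddit_id = p.subreddit_id
        GROUP BY s.name
        ORDER BY s.name
    """
    return conn.execute(query).fetchall()


# Made-up sample rows, purely for illustration.
conn = build_database()
conn.executemany(
    "INSERT INTO subreddits (subreddit_id, name) VALUES (?, ?)",
    [(1, "datascience"), (2, "python")],
)
conn.executemany(
    "INSERT INTO posts (post_id, subreddit_id, title, num_comments) VALUES (?, ?, ?, ?)",
    [(1, 1, "Post A", 10), (2, 1, "Post B", 5), (3, 2, "Post C", 7)],
)
conn.commit()

print(comments_per_subreddit(conn))  # [('datascience', 15), ('python', 7)]
```

Storing each subreddit once and referencing it by key from the posts table avoids repeating the same name on every row, which is one simple way to show an understanding of relational principles.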

Optional Enhancements

  • Interactive dashboards (be careful not to overdo it)
  • Multi-page website with navigation
  • Advanced data analysis techniques (you must really know what you’re doing though)

Development Process

Your project must demonstrate collaborative development through GitHub’s features:

  • Task distribution via GitHub Project Boards with balanced workload
  • Pull Requests linked to Issues
  • Code review process with substantive feedback
  • Well-organised code that feels like a cohesive group effort

Important: Your development process is as important as the final product. We will be looking at how effectively you used GitHub’s collaboration features throughout the project lifecycle.

📑 Marking Criteria

Criterion | Weight | What We’re Looking For
Clear Intent | 20% | Evidence of reflection on coding choices, thoughtful documentation, and deliberate implementation following course principles. Code that shows understanding rather than just “making it work.”
Data Transformation Mastery | 40% | Effective application of data manipulation techniques, proper reshaping and cleaning, appropriate use of pandas and SQL. Database design that shows understanding of relational principles.
Effective Visualisations | 30% | Visualisations that tell a story rather than just describe data. Appropriate aesthetic choices, clear narrative flow, and insightful titles that convey meaning.
Collaborative Development | 10% | Evidence of team coordination through GitHub features, balanced contributions, and effective project management. Code that feels like a cohesive group effort rather than disconnected individual pieces.

If one team member is not pulling their weight, try to resolve it amicably with them first. It may be that they are overwhelmed by other commitments. If nothing works, contact Jon.

Note on Grading:

Following institutional guidance on grade normalisation, we will maintain high standards for this assessment. As usual, marks above 70 will be reserved for truly exceptional work that demonstrates mastery of course concepts and professional-quality implementation.

However, as noted in previous assessments, I find the artificial ‘cap’ at 70+ marks unnecessary for an undergraduate course focused on hands-on experience. If your work clearly demonstrates meaningful engagement beyond a shallow level, I’ll be happy to award distinctions.

👤 Individual Contributions

Each team member must also submit a personal reflection demonstrating their contribution to the project. For detailed requirements and marking criteria, please see the Individual Contribution Guidelines.

🤔 Need Help?

  • We will offer drop-in sessions in May 2025 to help you with any issues
  • Attend office hours during Spring Term 2025 (check StudentHub for availability)
  • Review Week 10-11 materials on project management and Git collaboration

Footnotes

  1. Don’t add too many visualisations to the website!

  2. e.g., “Rising Engagement in Tech Subreddits” rather than “Comments over Time”

  3. Try to make it evident to us that you truly understand the summary statistics you’re using.

  4. If you use a technique not covered in the course (statistical methods, machine learning, etc.), explain everything thoroughly so that we feel confident you know what you’re doing.

  5. In the analysis, don’t make any claims that are not supported by the data. If necessary, draw on academic or other reputable sources to support your understanding of the data and the analysis.