📦 Final Project (30%)
2024/25 Winter Term

This is your final assessment for DS105W and it is worth 30% of your final grade (go to 👤 Individual Contribution to Group Project (10%) for more details on your individual contribution). Building on skills developed throughout the Winter Term, you’ll work in teams to develop a complete data science project from start to finish.
Overview
Your team will:
- Choose and justify your own data sources
- Come up with curiosity-driven exploratory questions to pose to the data
- Update the public-facing GitHub Pages website to narrate your findings
- Document your development process using GitHub’s collaboration features
- Submit individual contribution reflections with evidence of your role as both PILOT and COPILOT
Key Requirements:
- Team size: 3-4 members (as assigned in Week 10)
- Deadline: 29 May 2025, 8pm UK time
- Submission: Via GitHub repository
- Individual reflections: Due at the same time as the group submission (see Individual Contribution Guidelines)
📝 Project Requirements
You are assessed as a group on the following components:
Data Sources & Collection
You have complete freedom in choosing your data sources, but they must meet specific requirements. For detailed guidance on selecting appropriate data sources, please refer to our comprehensive guide: 🔍 Choosing Your Data Source
In summary, your primary data source must be one of:
- Data collected through an API using the `requests` library
- Self-collected data with proper documentation and consent
- Complex static datasets requiring significant reshaping
Remember that simple bulk downloads (e.g., basic CSVs or pre-made Kaggle datasets) are not acceptable as your main data source, though they can be used to supplement your analysis.
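As a rough illustration of the first option, here is a minimal sketch of paginated API collection with `requests`. The endpoint URL, the `page` query parameter, and the assumption that each page returns a JSON list of records are all hypothetical; your chosen API will have its own conventions, so check its documentation. The names `collect_records` and `fetch_page_from_api` are made up for this example.

```python
from typing import Callable, Iterator


def collect_records(
    fetch_page: Callable[[int], list[dict]], max_pages: int = 10
) -> Iterator[dict]:
    """Yield records page by page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):
        records = fetch_page(page)
        if not records:
            break
        yield from records


def fetch_page_from_api(page: int) -> list[dict]:
    """Fetch one page from a HYPOTHETICAL endpoint (replace with your API)."""
    # Imported here so the pagination logic above can be tried out offline.
    import requests

    response = requests.get(
        "https://api.example.com/items",  # hypothetical URL
        params={"page": page},            # hypothetical parameter name
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()       # assumes the API returns a JSON list
```

Keeping the pagination loop separate from the HTTP call means you can try the loop with a fake `fetch_page` function before you spend any of your API quota, for example `list(collect_records(fetch_page_from_api, max_pages=3))` once the endpoint is real.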
Technical Implementation
The scale of this project is similar to ✍️ Mini-Project 2. Here are the key technical requirements:
Required Components
- Data Collection & Processing
- Database Implementation
- Analysis & Visualisation
- Documentation
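To give a sense of what the database component might involve, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and sample rows are invented for illustration and are not a required schema; the point is only to show two related tables linked by a foreign key and queried with a join.

```python
import sqlite3


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with two related (hypothetical) tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationship
    conn.executescript(
        """
        CREATE TABLE subreddits (
            subreddit_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE posts (
            post_id INTEGER PRIMARY KEY,
            subreddit_id INTEGER NOT NULL REFERENCES subreddits(subreddit_id),
            title TEXT NOT NULL,
            num_comments INTEGER NOT NULL
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO subreddits VALUES (?, ?)",
        [(1, "python"), (2, "datascience")],
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        [(1, 1, "First post", 10), (2, 1, "Second post", 5), (3, 2, "Third post", 7)],
    )
    conn.commit()
    return conn


def comments_per_subreddit(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Total comments per subreddit, combining both tables with a JOIN."""
    query = """
        SELECT s.name, SUM(p.num_comments) AS total_comments
        FROM posts AS p
        JOIN subreddits AS s ON s.subreddit_id = p.subreddit_id
        GROUP BY s.name
        ORDER BY s.name
    """
    return conn.execute(query).fetchall()
```

Storing the subreddit name once and referring to it by ID from `posts` avoids repeating the same text in every row, which is the kind of relational thinking the marking criteria refer to.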
Optional Enhancements
- Interactive dashboards (be careful not to overdo it)
- Multi-page website with navigation
- Advanced data analysis techniques (you must really know what you’re doing though)
Development Process
Your project must demonstrate collaborative development through GitHub’s features:
- Task distribution via GitHub Project Boards with balanced workload
- Pull Requests linked to Issues
- Code review process with substantive feedback
- Well-organised code that feels like a cohesive group effort
Important: Your development process is as important as the final product. We will be looking at how effectively you used GitHub’s collaboration features throughout the project lifecycle.
📑 Marking Criteria
Criterion | Weight | What We’re Looking For |
---|---|---|
Clear Intent | 20% | Evidence of reflection on coding choices, thoughtful documentation, and deliberate implementation following course principles. Code that shows understanding rather than just “making it work.” |
Data Transformation Mastery | 40% | Effective application of data manipulation techniques, proper reshaping and cleaning, appropriate use of pandas and SQL. Database design that shows understanding of relational principles. |
Effective Visualisations | 30% | Visualisations that tell a story rather than just describe data. Appropriate aesthetic choices, clear narrative flow, and insightful titles that convey meaning. |
Collaborative Development | 10% | Evidence of team coordination through GitHub features, balanced contributions, and effective project management. Code that feels like a cohesive group effort rather than disconnected individual pieces. |
If one team member is not pulling their weight, try to resolve it amicably with them first. It may be that they are overwhelmed by other commitments. If nothing works, contact Jon.
Note on Grading:
Following institutional guidance on grade normalisation, we will maintain high standards for this assessment. As usual, marks above 70 will be reserved for truly exceptional work that demonstrates mastery of course concepts and professional-quality implementation.
However, as noted in previous assessments, I find the artificial ‘cap’ at 70+ marks unnecessary for an undergraduate course focused on hands-on experience. If your work clearly demonstrates meaningful engagement beyond a shallow level, I’ll be happy to award distinctions.
👤 Individual Contributions
Each team member must also submit a personal reflection demonstrating their contribution to the project. For detailed requirements and marking criteria, please see the Individual Contribution Guidelines.
🤔 Need Help?
- We will offer drop-in sessions in May 2025 to help you with any issues
- Attend office hours during Spring Term 2025 (check StudentHub for availability)
- Review Week 10-11 materials on project management and Git collaboration
Footnotes
Don’t add too many visualisations to the website!
e.g., “Rising Engagement in Tech Subreddits” rather than “Comments over Time”
Try to make it evident to us that you truly understand the summary statistics you’re using.
If you use a technique not covered in the course (statistical methods, machine learning, etc.), explain it thoroughly enough that we feel confident you know what you’re doing.
In the analysis, don’t make any claims that are not supported by the data. If necessary, draw on academic or other reputable sources to support your understanding of the data and the analysis.