📦 Final Project (30%)

2025/26 Autumn Term

Author: Dr Jon Cardoso-Silva
Published: 17 December 2025
Modified: 17 December 2025

This is your final group assessment for DS105A, worth 30% of your final grade. Go to 👤 Individual Contribution (10%) for details on your individual reflection component. Building on skills developed throughout the Autumn Term, you’ll work in teams to develop a complete data science project from start to finish.

Deadline: Winter Term W03 (February 2026) at 8 pm UK time
💎 Weight: 30% of final grade (group component)
👥 Teams: 3-4 members (assigned in Week 11)
📤 Submission: Via GitHub repository

📝 Overview

Your team will:

  1. Choose and justify your own data sources
  2. Pose curiosity-driven exploratory questions to the data
  3. Build a public-facing narrative website showcasing your findings
  4. Work collaboratively using Git in a way that suits your team well
  5. Submit individual contribution reflections with evidence of your role

Key Requirements:

  • Scale similar to ✍️ Mini-Project 2
  • Primary data from API, self-collected, or complex static sources
  • SQLite database with properly designed schema
  • Narrative website (replacing traditional REPORT.md)
  • Individual reflections due same time as group submission (see Individual Contribution Guidelines)

📊 Data Sources & Collection

You have complete freedom in choosing your data sources, but they must meet specific requirements. For detailed guidance, refer to our guide: 📚 Choosing Your Data Source.

Your primary data source must be one of:

  • Data collected through an API using the requests library
  • Self-collected data with proper documentation and consent
  • Complex static datasets requiring significant reshaping

Simple bulk downloads (basic CSVs or pre-made Kaggle datasets) are not acceptable as your main data source unless expressly permitted by Jon, though they can supplement your analysis.
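
If you go down the API route, your collection code might look something like the sketch below. It is an illustration only, not a required template: the function name `fetch_json`, the retry count, and the waiting time are assumptions made for this example. It wraps the call in try... except... so that a failing API is handled gracefully rather than crashing the notebook.

```python
import time

import requests


def fetch_json(url, params=None, session=None, retries=3, wait_seconds=1.0):
    """Return the parsed JSON body of a GET request, or None if every attempt fails.

    `fetch_json` and its parameters are hypothetical names used for illustration.
    """
    session = session or requests.Session()
    for attempt in range(1, retries + 1):
        try:
            response = session.get(url, params=params, timeout=10)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            return response.json()
        except requests.RequestException as error:
            # Covers connection errors, timeouts and HTTP error statuses.
            print(f"Attempt {attempt}/{retries} failed: {error}")
            if attempt < retries:
                time.sleep(wait_seconds)  # short pause before trying again
    return None
```

You would call it with your own endpoint, for example `fetch_json("https://api.example.com/items", params={"page": 1})`, where the URL and parameters are placeholders. Returning `None` on failure is one design choice among several; raising a custom exception is equally reasonable, as long as the behaviour is documented.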

🛠️ Technical Requirements

The scale of this project is similar to ✍️ Mini-Project 2. Only this time, I don’t expect you to write ‘personal reflections’ in the notebooks. Instead, make sure that the code and sections you write are properly documented and explain why you wrote them the way you did, so that someone else in your team can read and understand them easily.

⚠️ IMPORTANT: This time, you choose how to organise your repository. Make sure there are no oddities (notebooks in the root folder, data mixed up with notebooks, etc.)!

Here are the key technical requirements:

Required Components

  1. Data Collection & Processing
    • Documented data collection methodology
    • Clear data cleaning and processing steps
    • Data processing happens as early as possible in the workflow, such that all the data used in the analysis notebook(s) comes directly from the database
    • Efficient and well-organised code
    • Use of vectorised operations whenever possible
    • Proper error handling if necessary (e.g., if it is likely that the API will fail, write try... except... blocks to handle the error gracefully)
  2. Database Implementation
    • SQLite database with properly designed schema (data types, primary keys, foreign keys)
    • The database must have at least two tables, and they must be meaningfully connected to each other by foreign keys
    • Appropriate table relationships
    • Database reads and writes at the right times
  3. Analysis & Visualisation
    • At least three distinct visualisations or table summaries
    • Clear narrative flow throughout the website
    • Thoughtful choice of visualisations
    • Good use of pandas (or SQL) to reshape data for analysis
    • Plot titles convey insights, not just describe what’s shown
    • Critical interpretation of quantitative metrics used
    • If advanced techniques are used, they are clearly explained and justified
    • No claims unsubstantiated by the data
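
To make the database and vectorisation requirements above more concrete, here is a minimal sketch. Everything specific in it is hypothetical: the `stations` and `readings` tables, their columns, the example rows, and the function names are invented for illustration, and your own schema will depend on your data. It shows two tables linked by a foreign key, an analysis step that reads directly from the database, and a vectorised column calculation in place of a row-by-row loop.

```python
import sqlite3

import pandas as pd

# Hypothetical schema: two tables connected by a foreign key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stations (
    station_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    city       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS readings (
    reading_id  INTEGER PRIMARY KEY,
    station_id  INTEGER NOT NULL REFERENCES stations (station_id),
    recorded_on TEXT NOT NULL,  -- ISO 8601 date stored as text
    temp_c      REAL NOT NULL
);
"""


def build_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    return conn


def load_example_rows(conn):
    """Insert a few made-up rows; in a real project these come from your collection step."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO stations (station_id, name, city) VALUES (?, ?, ?)",
            [(1, "Holborn", "London"), (2, "Leith", "Edinburgh")],
        )
        conn.executemany(
            "INSERT INTO readings (station_id, recorded_on, temp_c) VALUES (?, ?, ?)",
            [(1, "2025-12-01", 7.5), (1, "2025-12-02", 9.5),
             (2, "2025-12-01", 3.0), (2, "2025-12-02", 4.0)],
        )


def mean_temperature_by_city(conn):
    """Read joined data from the database and summarise it with pandas."""
    query = """
        SELECT s.city, r.temp_c
        FROM readings AS r
        JOIN stations AS s ON s.station_id = r.station_id
    """
    df = pd.read_sql_query(query, conn)
    df["temp_f"] = df["temp_c"] * 9 / 5 + 32  # vectorised: no Python loop over rows
    return df.groupby("city", as_index=False)[["temp_c", "temp_f"]].mean()
```

In an analysis notebook you would then call something like `mean_temperature_by_city(sqlite3.connect("data/project.db"))`, where the file path is a placeholder. The point of the design is that cleaning happens before the data is written, so the notebook only ever reads tidy tables.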

Optional Enhancements

  • Interactive dashboards (be careful not to overdo it and not to make it the main point of the website)
  • Multi-page website with navigation
  • Statistical techniques and machine learning models (you must really know what you’re doing as we will raise the level of scrutiny we apply to these)
    • Remember, when in doubt, avoid making strong bold claims and opt for “We believe this [real pattern shown in a plot] is due to […] because [… provide academic literature or other reputable sources to support the claim…]”

🌐 Website Requirement

Instead of a REPORT.md, this time your project must include a narrative website hosted via GitHub Pages. This website tells the story of your analysis and presents your findings to a general audience.

This is the same website you used for your Pitch Presentation. You will replace what you had there with the new analysis and findings.

💡 FOUR WAYS TO BUILD YOUR WEBSITE

You can choose any of these approaches, all deployed via GitHub Pages:

  1. Plain Markdown (docs/index.md)
    • Simplest option, minimal setup
    • Write markdown, GitHub renders it automatically
    • Good for straightforward narratives
  2. Jekyll Themes
    • More polished appearance with minimal effort
    • Choose from GitHub’s supported themes
    • Add a _config.yml file to customise
  3. Quarto
    • You should feel comfortable following online tutorials by now. You will need to spend some time with the Quarto documentation to get a feel for how it works for websites and how to publish them to GitHub Pages.
  4. AI-Generated HTML
    • Use Claude or similar to generate custom HTML/CSS
    • Full creative control over design
    • Requires more iteration but can produce nice-looking results

All options are equally valid. Choose based on your team’s strengths and time constraints.

👥 Development Process

Your team has flexibility in how you coordinate work. The pitch presentations revealed that most teams prefer sequential workflows, and that’s perfectly fine.

Choose Your Workflow:

  • Sequential work: One person works at a time, passing the baton. Simpler coordination, fewer merge conflicts.
  • Parallel Git workflows: Multiple people working simultaneously with branches. More complex but potentially faster.

Both approaches are valid. What matters is that your code feels like a cohesive group effort, not disconnected individual pieces.

Play to Your Strengths

Team members can specialise based on their interests and skills:

  • Some might focus on data collection and database design
  • Others might lead the website development and visualisation
  • Someone might coordinate documentation and ensure consistency

Flexible workload distribution is encouraged. Not everyone needs to contribute equally to every component, but everyone must contribute meaningfully overall.

💡 TIPS FOR TEAM COORDINATION

  • Communicate regularly: Brief weekly check-ins work better than long meetings
  • Document decisions: Keep notes on why you chose certain approaches
  • Commit frequently: Small, frequent commits are easier to track than large infrequent ones
  • Handle conflicts early: If someone’s struggling, address it as a team before escalating

If one team member is not pulling their weight, try to solve it amicably first. They may be overwhelmed by other commitments. If nothing works, contact Jon.

✔️ Marking Criteria

Your project will be assessed across three dimensions. As you now know, we must apply a high standard when grading, and we need to ensure that the overall grade you receive aligns with LSE expectations. You can only expect to reach beyond 70 marks (consistently ‘Very Good’) if you have covered all the basics taught in the course perfectly and have also surprised us positively with interesting and insightful analysis rather than simply doing more stuff.

📥 Technical Implementation (0-35 marks)

Data collection, database design, processing quality, and code organisation

| Marks | Level | Description |
|---|---|---|
| <14 | Poor | Code logic doesn’t work or has critical failures. Database schema broken, API authentication failed, security issues (hardcoded credentials), files missing, or code has so many errors it can’t run. |
| 14-16 | Weak | Code runs but has multiple serious problems. Poor database design, excessive use of for loops when vectorisation would work, API credentials not secured, disorganised files, or messy code that’s hard to follow. |
| 17-19 | Fair | Workflow works but has notable problems. Database functions but with concerning design issues, data quality problems not addressed, or disconnection between code sophistication and understanding. |
| 20-24 | Good | Competent technical work. Database properly designed with appropriate relationships. Reasonable use of vectorised operations, proper file organisation, credentials managed securely. Code shows understanding of course techniques. |
| 25-29 | Very Good! | Clean technical execution with professional practices. Well-designed database schema, efficient data processing, vectorised operations used effectively. Code sophistication matches written explanations. Team’s code quality and repository organisation are consistent throughout all notebooks and scripts. |
| 30+ | 🏆 WOW | Exceptional technical implementation. Professional-grade database design, creative and efficient pandas transformations, exemplary code organisation. Nothing is over-engineered; everything serves a clear purpose. Feels like work from an experienced team. |

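
The criteria above penalise hardcoded credentials. One common way to avoid them is to read the key from an environment variable, sketched below. This is an assumption about how you might do it, not a prescribed method: the variable name `MY_API_KEY` and the function `get_api_key` are hypothetical, and any approach that keeps secrets out of the repository is fine.

```python
import os


def get_api_key(variable_name="MY_API_KEY"):
    """Read an API key from an environment variable instead of hardcoding it.

    `MY_API_KEY` is a placeholder name chosen for this example.
    """
    key = os.environ.get(variable_name, "").strip()
    if not key:
        # Fail early with a clear message rather than sending unauthenticated requests.
        raise RuntimeError(
            f"Environment variable {variable_name} is not set. "
            "Set it before running the code, and never commit the key itself."
        )
    return key
```

Every team member sets the variable on their own machine, so the key never appears in a notebook, a script, or the Git history.
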
🌐 Website & Communication (0-40 marks)

Narrative quality, visualisation effectiveness, insight communication

| Marks | Level | Description |
|---|---|---|
| <16 | Poor | Website broken or insights missing. Site doesn’t render, visualisations fail to load, no coherent narrative, or fundamental misunderstanding of the data. |
| 16-19 | Weak | Basic website with weak insights. Site works but looks unprofessional, visualisations don’t support claims, titles just describe rather than state findings, or interpretation is very shallow. |
| 20-23 | Fair | Functional website with adequate visualisations. Site presents findings but narrative flow is weak, visualisations are acceptable but not compelling, or limited coherence between sections. |
| 24-27 | Good | Professional presentation with clear insights. Website looks polished, visualisations support the narrative, findings are clearly communicated, good use of seaborn styling. |
| 28-31 | Very Good! | Compelling narrative with publication-quality presentation. Strong storytelling throughout, visualisations are insightful and well-designed, appropriate statistical reasoning, conclusions well-supported by evidence. |
| 32+ | 🏆 WOW | Exceptional storytelling with innovative presentation. Website would impress a professional audience, creative visualisation approaches, sophisticated narrative that engages readers, outstanding attention to detail. |

👥 Team Coordination (0-25 marks)

Collaboration evidence, workload distribution, project management

| Marks | Level | Description |
|---|---|---|
| <10 | Poor | No coordination evidence. Work appears disconnected, no evidence of collaboration, or team members clearly didn’t communicate. |
| 10-12 | Weak | Minimal coordination. Some evidence of collaboration but workflow issues, severely unbalanced contributions, or code styles clash significantly. |
| 13-15 | Fair | Basic coordination. Team worked together but coordination was reactive rather than planned. Acceptable workload distribution. |
| 16-18 | Good | Clear approach to collaboration. Chosen workflow (sequential or parallel) was executed reasonably well. Workload distribution is sensible given team members’ strengths. |
| 19-21 | Very Good! | Sophisticated coordination. Strategic specialisation across team members, good documentation of team decisions, code feels unified rather than patched together. |
| 22+ | 🏆 WOW | Exceptional project management. Evidence of thoughtful planning, innovative collaboration approaches, seamless integration of contributions. Could serve as an example of good team practice. |

👤 Individual Contributions

Each team member must also submit a personal reflection demonstrating their contribution to the project. For detailed requirements and marking criteria, see the Individual Contribution Guidelines.

📮 Need Help?

  • Post questions in the #help Slack channel
  • Book office hours via StudentHub for deeper conversations
  • We will offer drop-in sessions in January/February 2026 to help with any issues

You’re building on everything you’ve learned throughout the Autumn Term. Start with a clear plan, communicate with your team, and give yourself enough time to polish both the technical implementation and the website presentation. Good luck!