🏗️ Final Group Project: Specialised Tracks & Client Interaction

DS205 2024/25 Winter Term

Author: Dr Jon Cardoso-Silva

Published: 08 April 2025

Welcome to the DS205 final group project! This capstone experience lets you apply and deepen the skills you acquired throughout the Winter Term, from data infrastructure engineering (✍️ Problem Set 1) to more advanced NLP (✍️ Problem Set 2), by tackling a specialised data science challenge as a team.

For this very first iteration of DS205, projects fall into two distinct tracks, each with its own base repository and focus:

  1. API Development: Expanding and refining the public lse-ds205/tpi-apis repository.
  2. RAG Development: Building Retrieval-Augmented Generation (or similar embedding search) systems for specific data sources, starting from the lse-ds205/rag-fact-sheets template.

Each group has been assigned a specific challenge within one of these tracks (e.g., maintaining legacy code, serving images via API, rethinking data architecture, applying RAG to specific climate data sources). While your specific technical goals will differ, all groups share the common learning goals listed below.

DEADLINE: 29 May 2025, 8pm UK time.

🥅 Common Learning Goals

  • Tackle Complexity: Address a challenging data science problem integrating multiple course techniques.
  • Develop Focused Solutions: Design, implement, and evaluate a system addressing your specific group brief.
  • Collaborate Effectively: Practise teamwork, project management, and code sharing using Git/GitHub workflows (Issues, PRs).
  • Apply Advanced Techniques: Demonstrate proficiency in relevant methods, potentially extending beyond the core curriculum where appropriate.
  • Evaluate Rigorously: Assess your solution’s performance and critically analyse results and limitations.
  • Communicate Professionally: Document your project thoroughly, including code, process, and findings.

🚀 Getting Started: Your Specific Track

  1. Group Allocation & Brief: You should now be in your allocated groups. Familiarise yourselves with your group's specific project brief, which we discussed in Week 11 and documented here.
  2. Repository Setup:
    • API Teams: Create a fork of the lse-ds205/tpi-apis repository within your group’s agreed GitHub space/organisation.
    • RAG Teams: Create a fork of the lse-ds205/rag-fact-sheets repository.
    • Wait for Jon to provide the initial list of milestones and any further setup instructions.
    🎥 Video Guide: Setting Up Your Project Fork

    Watch this short video for a walkthrough on how to create your group’s fork, add collaborators, understand the relationship with the main (“upstream”) repository, and prepare for the final Pull Request submission (for API teams).

  3. ‘Client’ Interaction (All Groups):
    • Sylvan and Val are acting as your ‘clients’ or primary stakeholders for these projects.
    • Proactively engage with them! Schedule brief meetings, send emails outlining your plans, show mock-ups or early results, and seek feedback on whether your approach meets their (simulated or very real) needs.
    • Document these interactions and how they shaped your project (e.g., in meeting notes, GitHub Issues, or your final report). This is especially crucial for Groups 1 & 2 but valuable for all.
  4. Initial Planning:
    • Meet as a group to review your specific brief and plan your approach. Note anything that is too open-ended or vague, and schedule a meeting with Sylvan or Jon (Jon is available from early May) to discuss more concrete ideas.
    • Break down the work, assign roles (flexibly), set up communication channels, and agree on Git workflows (e.g., branching, PR reviews within the team).
  5. Leverage Course Skills: Connect your project goals to relevant techniques from the course (APIs, data validation, scraping, embeddings, vector search, evaluation, documentation, CI/CD concepts).
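Several of the skills listed above, embeddings and vector search in particular, sit at the core of the RAG track. As a reminder of the basic idea, the sketch below ranks text chunks by cosine similarity to a query vector. It is a minimal illustration under stated assumptions: the three-dimensional vectors and chunk texts are made-up toy data (a real pipeline would obtain embeddings from a model), and the helper names `cosine_similarity` and `top_k` are hypothetical, not functions from the rag-fact-sheets template.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty/zero vectors
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], chunks: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Return the k chunk texts most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; a real pipeline would compute these with an embedding model.
chunks = {
    "Country X targets net zero by 2050.": [0.9, 0.1, 0.0],
    "Company Y reports Scope 1 emissions.": [0.1, 0.8, 0.2],
    "Unrelated text about football.": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "What is Country X's net zero target?"

for text, score in top_k(query, chunks):
    print(f"{score:.3f}  {text}")
```

With this toy data, the net-zero chunk ranks first. In practice a vector store or a library such as NumPy would replace the pure-Python loops, but the underlying idea of ranking by vector similarity is similar.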

💡 Tip: Your specific brief is your primary guide. Use the ‘client’ interactions to refine your understanding and direction. Documenting your process (including ideas explored and discarded) is vital, especially for the more exploratory RAG projects.

✔️ Assessment and Grading Principles

As before, assessment focuses on well-documented effort, collaborative workflow, and technical expertise. Robust data handling, validation, and clear documentation are paramount.

  • Core Requirements (Path to 70): Successfully addressing the main goals of your specific brief using techniques taught in the course, demonstrating good collaboration, and producing a well-documented, working (or demonstrably well-attempted, for RAG) solution.
  • Distinction (70+): Awarded for meaningfully and positively going beyond the core requirements or taught material. This could involve particularly insightful ‘client’ interaction leading to refined features, exceptionally robust implementation, creative problem-solving, highly polished documentation, or demonstrating a deeper technical understanding. Simply adding more code volume without clear purpose does not qualify.
  • Collaboration is Key: We expect to see evidence of active participation from all team members via GitHub Issues, Pull Requests (within the team fork, and the final one for API teams), and contributions to documentation.

(Precise marking criteria and weighting will follow at the start of Spring Term in May, but will be based on these principles and tailored to the track/project specifics.)

🚚 Deliverables

  • Group Fork Repository: Your group's working fork of the relevant base repository (tpi-apis or rag-fact-sheets), containing all code, configuration, necessary data (or download scripts), and a commit history showing collaboration.
  • Pull Request Submission: A well-formed Pull Request from your group's fork to the upstream repository (either lse-ds205/tpi-apis or lse-ds205/rag-fact-sheets), clearly describing the features you have introduced in your project.
  • Technical Report / REPORT.md: A document (e.g., REPORT.md in your repo) detailing project goals, design decisions (and their rationale, including rejected ideas), methodology, 'client' interactions and outcomes, system architecture, evaluation results, analysis, limitations, and conclusions. RAG teams should detail their fact-sheet generation process.
  • README.md: A comprehensive README in the repository root with a project overview, clear setup/installation instructions, a usage guide (how to run the API/RAG process), a description of the repository structure, and a link to the Technical Report.
  • Fact-Sheets (RAG Teams Only): The output of your RAG pipeline, i.e. the generated 'fact-sheets' for the target entities (countries, companies, etc.), likely stored in a designated output folder within your repository.

For both the report and the README, try to find the right balance: document as much as you can without making either one overly long.

⚠️ Reproducibility & Clarity: Ensure your README and code allow the teaching team to understand, set up, and run/evaluate your project. Document dependencies clearly in a requirements.txt file.
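Reproducibility problems often come from a requirements.txt that has drifted away from the environment the code was actually developed in. One minimal, hypothetical check (not part of either base repository) is to compare pinned versions with what is installed, using the standard library's `importlib.metadata`. This sketch assumes simple `name==version` pins: ranges, URLs, and comments are skipped, extras and environment markers are stripped, and versions are compared as exact strings.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Extract simple `name==version` pins from requirements.txt content."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        # Drop comments and environment markers (e.g. "; python_version >= '3.9'").
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, ranges (>=, ~=), URLs, etc.
        name, pinned = line.split("==", 1)
        # Strip extras such as "uvicorn[standard]" down to the distribution name.
        pins[name.split("[", 1)[0].strip()] = pinned.strip()
    return pins


def check_pins(pins: dict[str, str]) -> list[str]:
    """List missing packages and exact-string version mismatches."""
    problems = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (requirements.txt pins {pinned})")
            continue
        if installed != pinned:
            problems.append(f"{name}: installed {installed}, requirements.txt pins {pinned}")
    return problems


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        issues = check_pins(parse_pins(req_file.read_text(encoding="utf-8")))
        print("\n".join(issues) if issues else "All pinned packages match the installed versions.")
    else:
        print("No requirements.txt found in the current directory.")
```

Running it from the repository root lists any missing packages or version mismatches, which can serve as a quick sanity check before writing the setup instructions in your README.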

🤝 Tips for Group Success

  • Communicate Often: Set up regular check-ins (even brief ones). Use shared communication tools effectively.
  • Use Version Control Wisely: Commit frequently, use meaningful commit messages, and leverage branches for developing features or experimenting. Resolve merge conflicts promptly.
  • Document as You Go: Don’t leave documentation until the end. Write READMEs, comments, and report sections incrementally.
  • Play to Strengths, but Share Knowledge: Assign tasks based on interest and expertise, but ensure everyone understands the different components of the project. Pair programming can be helpful.
  • Be Proactive: If you encounter roadblocks, research solutions, discuss them with your team, and seek help from the teaching team sooner rather than later.
  • Embrace the ‘Client’: Treat interactions with Sylvan and Val as real requirements gathering and feedback sessions. Use them to refine your project scope and ensure relevance.
  • Leverage GitHub Issues: Use Issues not just for bugs, but for planning features, discussing design choices, and tracking ‘client’ requests or feedback.
  • Review Each Other’s Work: Use Pull Requests within your group’s fork for code review before merging major changes. This improves quality and knowledge sharing.

❓ Support and Next Steps

  • Look out for the specific milestones and further setup instructions from Jon (early May).
  • Make use of drop-in sessions, office hours, and Slack for questions related to your specific brief or technical challenges. Jon and Barry will only start monitoring Slack at the beginning of the Spring Term, but until then you can use it to share expertise and help each other out.

We’re excited to see how you tackle these specialised projects and engage with the ‘clients’! 🚀