Group Project (40%)
2025/26 Winter Term
This is your final group assessment for DS105W, worth 40% of your final grade. You will also submit an individual reflection worth 10% (see Individual Reflection below). Building on everything you have learned across both terms, your team will deliver a complete data project from collection to communication.
| Item | Detail |
|---|---|
| Deadline | Tuesday 26 May 2026 at 8 pm UK time |
| Weight | 40% of final grade (group) + 10% (individual reflection) |
| Teams | 3-4 members (formed in W09 Lab) |
| Submission | Via your existing GitHub Classroom repository |
Overview
You choose one of two tracks:
- WFP Track (prescribed): Build a food security data pipeline for the World Food Programme's East and Southern Africa regional office.
- Self-Chosen Track: Design your own data project using APIs you select, store data in SQLite, and present findings on a narrative website.
Both tracks are assessed to the same standard using the same three criteria. Pick the one that suits your team's interests and strengths.
Every group must:
- Collect data programmatically using the `requests` library
- Transform and clean data through documented notebook steps
- Present findings through a website or dashboard hosted on GitHub
- Use a project board with Issues to coordinate team work
- Submit individual reflections in `reflections/<github-username>.md`
⚠️ Team size note: Most groups have 3-4 members, but a few have been allowed 5. If your group has 5 people, we will expect a broader scope or deeper analysis to reflect the additional capacity.
WFP Track: Food Security Data Pipeline
The Client and the Problem
The World Food Programme (WFP) East and Southern Africa Regional Bureau needs an interactive dashboard to view, filter, and download food security and displacement data across the region. Your team will build the data pipeline that feeds this dashboard.
The core challenge: food security analysis (from the IPC, the Integrated Food Security Phase Classification) produces data with overlapping reference periods. Each country's analysis generates a "current" period and a "projection" period with different population figures. The dashboard needs to show whichever reference period applies to the current month.
Virginia Leape from WFP will visit the LSE around 20 April to meet with WFP track groups and answer questions about the domain context.
Data Sources
Your pipeline collects from three APIs and one static source:
- IPC API: Food security phase classification data (population analysed, food insecure populations by IPC Phase 1-5, reference periods, analysis dates)
- UNHCR API (Refugee Data Finder / Open Data Portal): Refugee population counts per country
- IOM API: Internally Displaced Person (IDP) population counts per country
- World Bank Open Data: Total country population (indicator: "Population, total")
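If you pull the World Bank figure through its v2 API instead of a manual download (indicator code `SP.POP.TOTL`, e.g. `https://api.worldbank.org/v2/country/ZMB/indicator/SP.POP.TOTL?format=json`), the JSON comes back as a two-element list: paging metadata, then one record per year, and some recent years can have `value` set to null. The sketch below is one possible way to keep the latest published figure per ISO3 code; the function name and output shape are hypothetical, not a required design.

```python
from typing import Any, Dict, List


def latest_population(payload: List[Any]) -> Dict[str, Dict[str, int]]:
    """Return the most recent non-null population figure for each ISO3 code.

    `payload` is assumed to be the parsed JSON from a World Bank v2 indicator
    request: [paging metadata, list of yearly observations].
    """
    _paging, observations = payload
    latest: Dict[str, Dict[str, int]] = {}
    for obs in observations or []:  # guard against an empty data element
        if obs["value"] is None:  # skip years with no published figure yet
            continue
        iso3 = obs["countryiso3code"]
        year = int(obs["date"])
        if iso3 not in latest or year > latest[iso3]["year"]:
            latest[iso3] = {"year": year, "population": int(obs["value"])}
    return latest
```

Keeping the year alongside the value lets the companion website state which year's population each row uses.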
API key note: You need to fill out a form to formally request access to some of these APIs. If you choose this track, do this as soon as possible. If there are delays in getting access, get in touch with Jon to discuss how to proceed.
DON'T LEAK YOUR API KEYS! Always keep keys in a `.env` file (listed in your `.gitignore`) and never write an API key anywhere in your repository or notebooks. Deleting a key and committing again does not remove it: it remains in the Git history.
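A minimal sketch of reading a key at runtime, assuming the `python-dotenv` package and a `.env` line such as `IPC_API_KEY=your-key-here` (the variable name is hypothetical; use whatever naming your team agrees on). `find_dotenv(usecwd=True)` searches from the current working directory upwards, so a notebook inside `notebooks/` can still find a `.env` at the repository root.

```python
import os

try:
    # python-dotenv (pip install python-dotenv) copies KEY=value lines from a
    # .env file into os.environ without overriding variables that already exist.
    from dotenv import find_dotenv, load_dotenv

    load_dotenv(find_dotenv(usecwd=True))
except ImportError:
    pass  # fall back to variables already exported in your shell


def get_api_key(name: str) -> str:
    """Read an API key from the environment and fail loudly if it is missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set. Add a line like {name}=... to your .env file.")
    return key
```

Notebooks then call `get_api_key("IPC_API_KEY")` and pass the result into request headers or parameters, so the key itself never appears in code or in a committed cell output.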
The Deliverable
Your team produces:
- A target CSV combining all four data sources into a single cleaned table (one row per country, columns defined in the data dictionary below)
- A Streamlit dashboard that reads from the CSV and provides an interactive table with filtering, sorting, and CSV export
  - This will be discussed in the W11 Lecture.
- Pipeline notebooks documenting every collection and transformation step
  - This time you choose how to name your notebooks, but use a consistent naming convention and keep them well organised.
- A companion website (`docs/index.md` or equivalent on GitHub Pages) explaining your pipeline, data sources, and any decisions you made
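One option is to keep the dashboard's filtering, sorting and export behaviour in plain pandas functions, which can be tested in a notebook before any Streamlit code exists. A sketch, assuming the CSV uses the column names `Country` and `IPC 3+` (your naming may differ): in a Streamlit app the selected countries could come from `st.multiselect`, the result could be shown with `st.dataframe`, and the bytes could be passed to `st.download_button` with `mime="text/csv"`.

```python
from typing import Iterable, Optional

import pandas as pd


def filter_and_sort(
    df: pd.DataFrame,
    countries: Optional[Iterable[str]] = None,
    sort_by: str = "IPC 3+",
    ascending: bool = False,
) -> pd.DataFrame:
    """Keep the selected countries (all if none selected) and sort by one column."""
    if countries:  # an empty selection means "show every country"
        df = df[df["Country"].isin(list(countries))]
    return df.sort_values(sort_by, ascending=ascending).reset_index(drop=True)


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialise the current view as UTF-8 CSV bytes for a download button."""
    return df.to_csv(index=False).encode("utf-8")
```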
Data Dictionary
The target CSV must contain these columns (column naming is up to your team, but the content must match):
| Field | Type | Detail |
|---|---|---|
| Country | String | Country name |
| ISO Code | String | ISO country code |
| Date of Analysis | YYYY-MM | When the IPC analysis was conducted |
| Analysis Title | String | e.g. "IPC Zambia Oct 2025" |
| Reference Period | String | e.g. "Nov 2025 - Mar 2026 (Projection)" |
| Population Analysed | Integer | IPC "pop analysed" figure |
| Country Population | Integer | World Bank "Population, total" |
| IPC Phase 1 | Integer | Minimal |
| IPC Phase 2 | Integer | Stressed |
| IPC Phase 3 | Integer | Crisis |
| IPC Phase 4 | Integer | Emergency |
| IPC Phase 5 | Integer | Famine |
| IPC 3+ | Integer | Sum of Phase 3 + 4 + 5 |
| Refugees Covered | Boolean | Whether refugee data is included in the IPC figures (manual field) |
| IDPs Covered | Boolean | Whether IDP data is included in the IPC figures (manual field) |
| IDP Population | Integer | From IOM API |
| Refugee Population | Integer | From UNHCR API |
⚠️ Reference period logic: When building your target CSV, select for each country the reference period whose date range contains the current month. For example, if Zambia has a "Current" period covering Apr-Sep and a "Projection" period covering Oct-Mar, and today is in March, your pipeline should select the Projection period. This is the trickiest part of the IPC data and a good place to invest debugging time early.
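One way to express this rule, sketched in plain Python under the assumption that you have already parsed each period's start and end into `datetime.date` objects. The dict keys `label`, `start` and `end` are hypothetical, not IPC API field names. Converting each date to a running month count means a period that crosses a year boundary (such as Oct 2025 - Mar 2026) compares correctly.

```python
from datetime import date
from typing import Dict, List, Optional


def _month_number(d: date) -> int:
    """Months since year 0, so Oct 2025 < Mar 2026 compares correctly."""
    return d.year * 12 + d.month


def select_reference_period(periods: List[Dict], today: Optional[date] = None) -> Optional[Dict]:
    """Return the period whose start..end months contain the current month, or None."""
    current = _month_number(today or date.today())
    matches = [
        p for p in periods
        if _month_number(p["start"]) <= current <= _month_number(p["end"])
    ]
    if not matches:
        return None  # no period covers this month: decide and document a fallback
    # Assumption: if periods overlap, prefer the one that started most recently.
    return max(matches, key=lambda p: _month_number(p["start"]))


# The Zambia example (years assumed for illustration):
zambia = [
    {"label": "Current", "start": date(2025, 4, 1), "end": date(2025, 9, 30)},
    {"label": "Projection", "start": date(2025, 10, 1), "end": date(2026, 3, 31)},
]
print(select_reference_period(zambia, today=date(2026, 3, 15))["label"])  # Projection
```

Passing `today` explicitly makes the rule easy to test on a few countries and months before running it across the region.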
⚠️ "Refugees Covered" and "IDPs Covered": These two boolean columns cannot be derived from the API. They reflect editorial decisions made by the WFP regional office about whether displacement figures are already included in the IPC analysis for a given country. Your team should include these columns in the CSV with placeholder values and document that they require manual input.
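A sketch of assembling the target table with pandas, assuming each source has already been reduced to one row per country with an `ISO Code` column, and that column names follow the data dictionary labels (yours may differ). `validate="one_to_one"` makes a merge fail loudly if a source still has duplicate countries, for example an IPC table where both the Current and Projection rows survived. The two manual columns use pandas' nullable `boolean` dtype so they stay visibly empty until someone fills them in.

```python
import pandas as pd

PHASES_3_PLUS = ["IPC Phase 3", "IPC Phase 4", "IPC Phase 5"]


def build_target_table(
    ipc: pd.DataFrame,
    population: pd.DataFrame,
    refugees: pd.DataFrame,
    idps: pd.DataFrame,
) -> pd.DataFrame:
    """Combine per-country tables into one row per country, keyed on ISO code."""
    target = (
        ipc.merge(population, on="ISO Code", how="left", validate="one_to_one")
        .merge(refugees, on="ISO Code", how="left", validate="one_to_one")
        .merge(idps, on="ISO Code", how="left", validate="one_to_one")
    )
    # Vectorised row sum; min_count=1 leaves the cell empty (not 0) if all phases are missing.
    target["IPC 3+"] = target[PHASES_3_PLUS].sum(axis=1, min_count=1)
    # Manual editorial fields: empty (<NA>) placeholders for the WFP office to fill in.
    target["Refugees Covered"] = pd.Series(pd.NA, index=target.index, dtype="boolean")
    target["IDPs Covered"] = pd.Series(pd.NA, index=target.index, dtype="boolean")
    return target
```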
Repository Structure
Organise your repository so someone unfamiliar with the project can follow the pipeline from raw data to final outputs. A suggested layout:
your-team-repo/
├── .env             # API keys (not tracked by git)
├── dashboard/       # Streamlit app
├── data/
│   ├── raw/         # JSON responses from APIs
│   └── processed/   # Target CSV and any intermediate files
├── docs/            # Companion website (GitHub Pages)
├── notebooks/       # Pipeline notebooks (your naming convention)
├── reflections/     # One .md file per team member
└── README.md        # Project overview & reproduction steps
Suggested Milestones
For this project, we have a suggested milestone plan. You do not need to submit anything at each milestone, but it's a good idea to set internal deadlines to keep the project on track. The final deadline is Tuesday 26 May 2026 at 8 pm UK time.
| Milestone | Target Date | What to aim for |
|---|---|---|
| M1 | 20 April | IPC and at least one other API collected, raw JSON saved, first notebook complete |
| M2 | 8 May | All four sources collected, transformation notebooks drafted, target CSV taking shape |
| M3 | 15 May | Target CSV finalised, Streamlit dashboard functional, companion website drafted |
| M4 | 26 May | Everything polished, reflections written, final push before 8 pm deadline |
The WFP track involves four separate data sources, bespoke reference-period logic, and a Streamlit dashboard. Start early, save raw data as JSON before you worry about transformations, and test your reference-period logic on a few countries before scaling to the full region.
If an API is down or an application is delayed, save what you have and move on. A partial pipeline with clear documentation of what worked and what did not is better than a stalled project.
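A minimal sketch of the "save raw JSON first" habit: `requests` makes the call and each response is written to `data/raw/` with a timestamped filename. The helper names and filename pattern are illustrative, not required.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional

RAW_DIR = Path("data/raw")


def save_raw_json(payload: Any, source: str, raw_dir: Path = RAW_DIR) -> Path:
    """Write an API response to <raw_dir>/<source>_<timestamp>.json and return the path."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = raw_dir / f"{source}_{stamp}.json"
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def fetch_json(url: str, params: Optional[Dict[str, Any]] = None, timeout: float = 30) -> Any:
    """GET a JSON endpoint and raise on HTTP errors instead of failing silently."""
    import requests  # only needed when actually calling an API

    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # e.g. 401 for a rejected key, 503 if the API is down
    return response.json()


# Example usage (commented out so the sketch runs offline):
# payload = fetch_json("https://api.worldbank.org/v2/country/ZMB/indicator/SP.POP.TOTL",
#                      params={"format": "json"})
# save_raw_json(payload, "worldbank_population")
```

Wrapping each call in `try`/`except requests.RequestException` lets a loop over countries or endpoints record the failure and carry on, in line with the advice to save what you have and move on.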
What Happens After Submission
After marking, the best dashboards from WFP track groups may be shared with Virginia and the WFP regional team as examples of what student teams can produce. We will credit your work publicly and share the GitHub repository with them. If you have any concerns about this, please get in touch with Jon.
Note from Jon: I might combine the best elements from multiple projects into a composite dashboard to share with WFP. If so, everyone involved will be credited and linked to their original repositories.
Self-Chosen Track: Your Own Data Project
What You Build
Your team defines its own research question, collects data from APIs, stores it in a SQLite database, and presents findings on a narrative website hosted via GitHub Pages.
The key constraint on your website is that you may produce a MAXIMUM of 3 distinct visualisations or table summaries to tell your story. Choose them carefully. Each visual must earn its place.
If you use Closeread or another scrollytelling format, the strict limit of 3 visuals does not apply, because the narrative structure allows for more flexible presentation. Do whatever serves the story, as long as it does not sprawl.
Data Source Requirements
The same constraints from the autumn term apply:
- Your primary data must be collected using the `requests` library. Pre-made API wrappers (e.g. `spotipy`, `praw`) are not allowed.
- Simple bulk CSV downloads are not acceptable as your primary source. They can supplement your analysis.
- Complex static datasets (e.g. OpenSanctions, World Values Survey) are allowed with Jon's permission.
You can reuse APIs from Mini Project 1 and Mini Project 2 (OpenWeather, OpenMeteo, TfL, ONS) as long as your question is different and you use additional endpoints beyond what we required in the past.
Technical Requirements
- SQLite database with at least 2 tables connected by foreign keys
- Analysis notebooks that read from the database (not from raw files)
- Narrative website via GitHub Pages with a maximum of 3 distinct visualisations or table summaries (unless using Closeread or equivalent)
- Proper credential handling via `.env` files (never hardcoded)
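A minimal `sqlite3` sketch of two tables linked by a foreign key, with a JOIN query that analysis code can use instead of reading raw files. The table and column names (`places`, `observations`) are placeholders for whatever your project collects. SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on each connection, which is easy to miss.

```python
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS places (
    place_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS observations (
    observation_id INTEGER PRIMARY KEY,
    place_id       INTEGER NOT NULL REFERENCES places(place_id),
    observed_at    TEXT NOT NULL,
    value          REAL
);
"""


def connect(path: str = "data/processed/project.db") -> sqlite3.Connection:
    """Open the project database, enforce foreign keys and create tables if needed."""
    if path != ":memory:":
        Path(path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def observations_with_place(conn: sqlite3.Connection) -> list:
    """Join the two tables so analysis reads from the database."""
    sql = """
        SELECT p.name, o.observed_at, o.value
        FROM observations AS o
        JOIN places AS p ON p.place_id = o.place_id
        ORDER BY o.observed_at
    """
    return conn.execute(sql).fetchall()
```

In an analysis notebook, `pandas.read_sql_query(sql, conn)` turns the same JOIN into a DataFrame.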
Four Ways to Build Your Website
You can choose any of these approaches, all deployed via GitHub Pages:
- Plain Markdown (`docs/index.md`) - Simplest option, minimal setup
- Jekyll Themes - More polished appearance with minimal effort
- Quarto - Follow the Quarto documentation for websites
- AI-Generated HTML - Use Claude or similar to generate custom HTML/CSS for full creative control
Jon will demonstrate AI-assisted website and dashboard creation in the W11 Lecture. All four options are equally valid.
Repository Structure
Choose your own notebook naming convention, but keep things tidy. A reasonable layout:
your-team-repo/
├── .env             # API keys (not tracked by git)
├── data/
│   ├── raw/         # JSON responses from APIs
│   └── processed/   # SQLite database and any intermediate files
├── docs/            # Website (GitHub Pages)
├── notebooks/       # Your naming convention
├── reflections/     # One .md file per team member
└── README.md        # Research question, reproduction steps, member roles
Formative Pitch (W11)
On Monday 30 March (and Tuesday 31 March for some groups), your team will present your project idea to Jon and the teaching team. This is formative only and does not count toward your grade.
What to show us (on GitHub, as a page, presentation, images, or any format you like):
- Your research question or the WFP problem as you understand it
- Your project board with initial Issues and task assignments
- Your planned approach (data sources, pipeline steps, who does what)
- One risk or open question you want feedback on
The W10 Lab is designed to help you prepare for this. Keep it short and focused; we want to give you useful feedback, not watch a polished production.
1:1 Meetings with Jon
From 5 May onward, each self-chosen track group can book a 1:1 meeting with Jon via Calendly (link will be shared on Slack). Use this to get feedback on your approach, troubleshoot data issues, or sense-check your analysis direction.
How We Grade It
Your project is assessed across three criteria. The marking bands below are indicative rather than comprehensive. Given that the two tracks produce different deliverables, we are not prescribing every mark range. Instead, we describe what constitutes a Pass, Good, Really Good, and WOW outcome for each criterion.
Data Pipeline (40 marks)
Data collection, storage, transformation, and code quality
| Level | Description |
|---|---|
| Pass (40%) | Data is collected and stored. The pipeline runs, but there are gaps: missing error handling for API calls, inconsistent file organisation, or transformations that lose information without acknowledgement. Code works but would be hard for someone else to follow. |
| Good (60%) | Data collection is methodical and documented. Files are well organised, transformations are justified in the notebooks, and the pipeline reads cleanly from start to finish. Credentials are handled securely (.env, not hardcoded). Code uses vectorised pandas operations where appropriate. |
| Really Good (70%) | Clean, efficient pipeline with professional habits throughout. Database schema or CSV structure reflects thoughtful design choices. Transformation steps are well sequenced so downstream notebooks read from processed outputs rather than repeating work. Code quality is consistent across all notebooks and team members. |
| WOW | Exceptional pipeline design. Creative and efficient data transformations, exemplary code organisation, nothing over-engineered. The repository feels like work from an experienced team. |
Communication (40 marks)
Narrative quality, visualisation design, and clarity of insight
Your website or dashboard tells the story of your analysis. We are looking for economy: a maximum of 3 distinct visualisations or table summaries that each earn their place. Finding a way to convey your insights without sprawling across dozens of plots is a genuine skill.
If you use Closeread or a similarly narrative-driven format, the "3 distinct visualisations" constraint relaxes. In that case, do whatever serves the story well, as long as it is not overly long.
| Level | Description |
|---|---|
| Pass (40%) | Website or dashboard exists and shows results, but the narrative is thin. If on the self-track, visualisations describe data without stating findings, and the reader has to work hard to understand what the project discovered. If on the WFP-track, the visuals do not match the brief. |
| Good (60%) | If on the self-track, clear narrative flow with visualisations that support the text. Plot titles convey insights rather than describing axes. The website reads as a coherent piece of communication, not disconnected notebook outputs. If on the WFP-track, the dashboard meets the brief and the data is presented clearly, but the companion website could explain the pipeline and its decisions more thoroughly. |
| Really Good (70%) | If on the self-track, compelling storytelling with well-designed visualisations. Each visual earns its place. Appropriate hedging of claims, clear acknowledgement of limitations, and a reader who knows nothing about the data could follow the argument. If on the WFP-track, the dashboard matches the brief closely with good design choices, and the companion website clearly documents the pipeline, data sources, and any decisions your team made along the way. |
| WOW | If on the self-track, exceptional presentation that would impress a professional audience. Creative visual approaches, sophisticated narrative that engages readers, outstanding attention to the balance between depth and economy. If on the WFP-track, the dashboard feels like a polished professional tool that goes beyond the brief in thoughtful ways, and the companion documentation is exemplary. |
Teamwork (20 marks)
Coordination evidence, role distribution, and project management
Your project board and Issues history are the primary evidence for this criterion. We will look at how your team planned work, divided responsibilities, and resolved problems.
| Level | Description |
|---|---|
| Pass (40%) | Some evidence of coordination, but the project board is sparse, abandoned early, or filled in at the last minute to look as though it had been used throughout. Contributions appear unbalanced without explanation. |
| Good (60%) | Project board shows a clear plan that the team followed. Workload distribution makes sense given membersโ strengths. The repository feels like a group effort rather than disconnected pieces. |
| Really Good (70%) | Sophisticated coordination with evidence of strategic specialisation. Good use of Issues and branches to track decisions. Code feels unified throughout. |
| WOW | Exceptional project management. Evidence of thoughtful planning, clear role specialisation, and seamless integration of contributions. Could serve as an example of good team practice. |
Individual Reflection (10%)
Each team member submits a personal reflection as a Markdown file in your project repository:
your-team-repo/
└── reflections/
    ├── <github-username-1>.md
    ├── <github-username-2>.md
    └── <github-username-3>.md
The reflection has two components:
Evidence of Contribution (70%)
Show what you did. The evidence should speak for itself. Link to specific commits, Issues, Pull Requests, or branches that demonstrate your contribution. You do not need to write at length (in fact, don't!). Short, clear pointers to your work are better than long paragraphs describing it.
Include:
- Links to your most significant commits or PRs
- Brief descriptions of what each contribution involved
- Any technical decisions you made and why
Learning Integration (30%)
Write up to two paragraphs about how feedback from earlier assessments (W04 Practice, Mini-Project 1, Mini-Project 2) shaped your approach in this project. Be specific: quote actual feedback, explain what you changed, and point to where the change shows up in your group project work.
To achieve 70/100 or above:
- Provide clear evidence of substantial contributions (commit links, PR references, Issue threads)
- Show genuine learning integration with specific feedback examples and behavioural changes
- Write in your own voice (AI-generated reflections lack the specificity we are looking for)
Need Help?
- Post questions in the `#help` Slack channel
- Check the Contact Hours page for support times
- Book office hours via StudentHub for deeper conversations
- Use the Claude Tutor for technical questions about your pipeline
You are building on everything from both terms. Start with a clear plan, communicate with your team regularly, and give yourself enough time to polish both the pipeline and the presentation. Good luck!
