πŸ“ Problem Set 1: Data Infrastructure Engineering (20%)

2024/25 Winter Term

Published: 27 February 2025

🥅 Learning Goals

By the end of this assignment, you will:

  • Design and implement a complete data collection solution
  • Apply API development or web scraping techniques at scale
  • Transform complex data into well-structured formats
  • Practice professional documentation standards

📤 Submission Details

📅 Due Date: 7 March 2025, 8pm UK time

📤 Submission Method: Push your work to your allocated GitHub repository.

This assignment offers two options, allowing you to focus on either API development or advanced web scraping. Choose the option that best aligns with your interests and career goals.

💡 Tip: Start early and make regular commits. This helps track your progress and ensures you have a working solution by the deadline.

⚠️ Note: Late submissions will receive penalties according to LSE’s policy.

Overview

You will choose between:

  1. API Development: Build a comprehensive API for the Transition Pathway Initiative’s Corporate Assessment data
  2. Advanced Web Scraping: Create a complete data collection solution for the Climate Action Tracker website

📚 Preparation

  1. Click on this GitHub Classroom link [1] to create your designated repository.

  2. Choose one of the two options below and follow its specific requirements.

Option A: Corporate Assessment API

Building upon your experience with the ASCOR API (📝 W02-W03 Practice Exercise), design an API for the Transition Pathway Initiative’s Corporate Assessment data.

What we’re looking for

Requirements and details:

  1. Data Structure Design
     - Design a clear and efficient data structure
     - Document your design choices
     - Data structure should be future-proof (if we decide to add different endpoints in the future)
  2. API Implementation
     - Create clear endpoint documentation
     - Implement robust data validation (🏆)
     - Include error handling
     - Add unit tests (🏆)
  3. Research and Documentation
     - Demonstrate understanding of the data
     - Document use cases and potential users of the data you’re serving (🏆)
     - Consider future extensions (🏆)

The items marked with a 🏆 are optional and will put you on the path to earn 70+ marks (distinction) if implemented well.
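As a rough illustration of what "robust data validation" with error handling could look like, here is a minimal sketch using only the standard library. The field names (`company_name`, `assessment_date`, `level`) and the allowed range for `level` are assumptions made for this example and are not taken from the actual Corporate Assessment data, so check them against the real columns before reusing anything.

```python
from dataclasses import dataclass
from datetime import date


class ValidationError(ValueError):
    """Raised when a raw record cannot be turned into a valid assessment."""


@dataclass(frozen=True)
class CompanyAssessment:
    # Hypothetical fields for illustration; use the columns in the real data.
    company_name: str
    assessment_date: date
    level: int


def parse_assessment(raw: dict) -> CompanyAssessment:
    """Validate a raw record (e.g. one CSV row) and return a typed object."""
    name = str(raw.get("company_name") or "").strip()
    if not name:
        raise ValidationError("company_name must not be empty")

    try:
        assessed_on = date.fromisoformat(str(raw.get("assessment_date", "")))
    except ValueError as exc:
        raise ValidationError("assessment_date must be YYYY-MM-DD") from exc

    try:
        level = int(raw.get("level", ""))
    except (TypeError, ValueError) as exc:
        raise ValidationError("level must be an integer") from exc
    if not 0 <= level <= 5:  # assumed range; verify against the dataset
        raise ValidationError("level is outside the assumed 0-5 range")

    return CompanyAssessment(name, assessed_on, level)
```

An endpoint could call `parse_assessment` on each record and translate a `ValidationError` into a client-facing error response. A validation library would be an equally reasonable choice; the point is only that invalid input is rejected with a clear message rather than passed through.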

Option B: Climate Action Tracker Scraper

Building upon your experience with web scraping (📝 W04-W05 Practice Exercise), create a comprehensive data collection solution for the Climate Action Tracker website.

What we’re looking for

Requirements and details:

  1. Data Collection
     - Scrape all available data from country pages (including text)
     - Use your discretion to determine what is relevant and what is not
     - Handle both structured content (data that can be put into a table) and unstructured content (raw text, images, etc.) appropriately
     - Download and organise related media (🏆)
  2. Data Organisation
     - Design a logical folder structure
     - Implement proper file naming
     - Create a data dictionary document (🏆)
  3. Research and Documentation
     - Demonstrate understanding of the data
     - Document your scraping strategy
     - Consider and document the ethical implications of your scraping (🏆)
     - Handle rate limiting professionally (🏆)

The items marked with a 🏆 are optional and will put you on the path to earn 70+ marks (distinction) if implemented well.
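One possible reading of "handle rate limiting professionally" is to enforce a minimum delay between consecutive requests. The sketch below is an assumption about how that might be structured, not a prescribed solution. The class name `RateLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable so the behaviour can be unit-tested without actually waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is released, after any sleeping.
        self._last_request = self._clock()
        return delay
```

In a scraper, `limiter.wait()` would be called immediately before each HTTP request. A suitable interval depends on the site, so it is worth documenting how you chose it and whether you checked the site's `robots.txt` (the standard library's `urllib.robotparser` can read one).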

📋 Common Requirements

Requirements and details:

  1. Repository Structure
     - Clear folder organisation of the repository
     - Well-organised README (informative yet concise, not overly verbose)
     - Complete documentation
     - Follow standard Python project layout for the type of project you are building
  2. Code Quality
     - Clean, readable code
     - Proper error handling
     - Efficient implementations
     - Follow PEP 8 style guide (🏆)
  3. Testing
     - Basic error case handling
     - Performance considerations
     - Comprehensive unit tests (🏆)
     - Integration tests (🏆)

The items marked with a 🏆 are optional and will put you on the path to earn 70+ marks (distinction) if implemented well.
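To make the testing requirement more concrete, here is a small sketch of a unit test that covers both normal input and an error case, using the standard library's `unittest`. The helper `country_slug` is a hypothetical function invented for this example (it could, for instance, support consistent file naming); it is not part of any required interface.

```python
import re
import unittest


def country_slug(name: str) -> str:
    """Turn a country name into a lowercase, filesystem-friendly slug.

    Note: this simple version also replaces non-ASCII letters with hyphens.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")
    if not slug:
        raise ValueError("name contains no usable characters")
    return slug


class CountrySlugTests(unittest.TestCase):
    def test_normal_name(self):
        self.assertEqual(country_slug("United Kingdom"), "united-kingdom")

    def test_punctuation_and_spacing(self):
        self.assertEqual(country_slug("  Korea, South "), "korea-south")

    def test_empty_name_is_rejected(self):
        # Error cases deserve tests too, not only the happy path.
        with self.assertRaises(ValueError):
            country_slug("   ")
```

Saved in a test module, these cases can be run with `python -m unittest`. The same pattern of one test per behaviour, including at least one failure mode, applies equally to API endpoints or scraper parsing functions.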

βœ”οΈ Marking Guide

In line with the unwritten but widely-used UK marking conventions, grades must be awarded as follows:

  • 40-49: Basic implementation with significant room for improvement (typically missing core requirements)
  • 50-59: Working implementation but one that meets only the very basic requirements (it looks incomplete)
  • 60-69: Good implementation demonstrating solid understanding with small caveats and minor improvements possible
  • 70+: Excellent implementation going beyond expectations, showing creativity and depth of understanding without being overly verbose or over-engineered

Note from Jon: I find this artificial ‘cap’ at 70+ marks silly and unnecessary, and it clashes with what I understand to be the pedagogical purpose of an undergraduate course that is all about demonstrating hands-on experience. If I can show that your work is of a high standard and clearly demonstrates that you have truly and meaningfully engaged with the material beyond a shallow level, I’ll be happy to award distinctions.

Core Requirements (0-70 marks)

Components, marks and details:

  • Technical Implementation (30 marks)
     - Working solution
     - Proper error handling
     - Clean, documented code
  • Data Management (25 marks)
     - Appropriate data structures
     - Efficient processing
     - Logical organisation
  • Documentation (15 marks)
     - Clear README
     - Usage examples
     - Implementation details

Path to Distinction (70+ marks)

To achieve a distinction, submissions must demonstrate excellence in the most challenging parts of the course materials, or go beyond the course materials by applying the techniques in new, creative and meaningful ways (just doing more stuff doesn’t qualify):

Enhancements, additional marks and examples:

  • Technical Excellence (+10 marks)
     - Robust data validation
     - Comprehensive test suite
     - Professional rate limiting
  • Architecture & Performance (+8 marks)
     - Future-proof design
     - Efficient processing
     - Optimised data structures
  • Research & Documentation (+7 marks)
     - Data dictionary
     - Use case analysis
     - Ethical considerations
  • Innovation (+5 marks)
     - Novel features
     - Creative solutions
     - Meaningful extensions

👂 Show you can act on feedback

When you get feedback on your work, I’ll give you a list of things that you can do to improve your work and get at most 15 extra marks. If you implement those within the new deadline, I’ll award the extra marks. The new deadline will be given in the feedback message.

For example, if you get 60 marks and I give you a list of 5 things that you can do to improve your work, your grade could go up to at most 75.

Feedback

You will receive:

  • Detailed feedback on your implementation
  • Suggestions for improvement
  • Justification for marks awarded and specific suggestions for improvements that could earn you up to 15 additional marks

Footnotes

  1. Visit the Moodle version of this page to get the link. The link is private and only available to formally enrolled students.