ℹ️ Course Information
All you need to know about DS205 (2024/25)
Welcome to LSE DS205 - Advanced Data Manipulation, an LSE Data Science Institute course. This is where you'll learn to master advanced data engineering techniques and tackle real-world challenges in collaboration with the Transition Pathway Initiative Centre.
What is DS205 about?
DS205 is designed to advance your data manipulation and engineering skills to a professional-grade level. The course emphasises automation, efficiency, and scalability in data workflows, while maintaining a focus on ethical and practical implications in real-world applications.
You will engage in activities like:
- Building APIs for effective data sharing.
- Developing web scrapers to gather data when APIs are unavailable.
- Cleaning and structuring complex datasets.
- Applying modern NLP tools and data processing pipelines.
🥅 Intended Learning Outcomes
By the end of this course, you should be able to:
- Use pandas to automate and optimise data cleaning and data processing workflows.
- Build APIs using FastAPI for structured data retrieval.
- Develop scalable web scraping workflows with the Scrapy and Selenium packages.
- Proficiently use GitHub workflows for professional collaboration and version control.
- Apply NLP techniques to retrieve information from text data.
- Apply appropriate pre-trained deep learning models from the HuggingFace library to unstructured data.
- Collaborate effectively on shared codebases and contribute to projects with a real-world impact.
👥 Our Team
COURSE LEADER | LECTURER
Name: | Dr Jon Cardoso-Silva |
Links: | LSE, GitHub, LinkedIn, 📧 |
Role: | Assistant Professor (Education), LSE Data Science Institute |
Office Hours: | 🗓️ Thursdays, 11am-1pm (StudentHub) |
Current Focus: | Leading DS105/DS205 development and researching GenAI in education |
CLASS TEACHER
Alexander Soldatkin
DPhil Candidate, Oxford School of Global and Area Studies
📧
TEACHING SUPPORT
Dr Barry Ledeatte
AI Learning Consultant (also teaches DS105W)
📧
TEACHING SUPPORT
Sara Luxmoore
Research Officer, LSE Data Science Institute and LSE Cities
📧
CODE MAINTAINER
Terry Zhou
3rd-Year BSc in Politics and Data Science; Undergraduate Research Assistant at DSI [1]
(tz1211?)
ADMIN
Kevin Kittoe
Teaching & Assessment Administrator
📧
Handles course access, submissions, extensions and admin queries.
Our Industry Partners
The Transition Pathway Initiative Centre evaluates companies’ readiness for transition to a low-carbon economy. Their work involves extensive analysis of messy, unstructured data.
👉🏻 Everything you produce in this course has the potential to help TPI automate their data processing workflows.
Key Collaborators:
- Valentin Jahn, Deputy Director Research & Operations
- Sylvan Lutz, Policy Officer (ASCOR Analyst)
📟 Communication Channels
Throughout the Winter Term, beyond regular class sessions, you can contact teaching staff via Slack, office hours, or dedicated support sessions:
Slack: Our primary hub for daily course discussions, resource sharing, and quick questions in #help. We prioritise questions posted to public channels over direct messages.
🆘 Weekly Support Sessions: Every Wednesday, 12:00 pm - 2:00 pm, in person at the DSI Visualisation Studio (COL.1.06). Led by Sara Luxmoore. No booking required: just drop in for help with exercises or technical issues.
🧑🏻💼 Office Hours: Book 15-minute slots via StudentHub:
- Jon: Thursdays, 11:00 am - 1:00 pm
- Alex: Wednesdays, 3:00 pm - 5:00 pm
- Barry: Fridays, 2:00 pm - 4:00 pm
📧 Email: For formal requests (extensions, class changes), contact the course admin email (managed by Kevin).
Contact Hours
Here’s our weekly schedule of support and teaching activities:
Day | Activity | Time | Staff | Type |
---|---|---|---|---|
Monday | Lecture | 10:00 - 12:00 | Dr Jon Cardoso-Silva | 🗣️ In-person |
Monday | Slack Support | 13:30 - 15:00 | Dr Jon Cardoso-Silva | 💬 Online |
Tuesday | Slack Support | 11:30 - 12:30 | Dr Jon Cardoso-Silva | 💬 Online |
Tuesday | Labs | 15:00 - 18:00 | Alex Soldatkin | 💻 In-person |
Wednesday | Drop-in Sessions | 12:00 - 14:00 | Sara Luxmoore | 🛟 In-person (COL.1.06) |
Wednesday | Office Hours | 15:00 - 17:00 | Alex Soldatkin | 👥 In-person |
Thursday | Office Hours | 11:00 - 13:00 | Dr Jon Cardoso-Silva | 👥 In-person |
Friday | Slack Support | 12:00 - 13:30 | Dr Barry Ledeatte | 💬 Online |
Friday | Office Hours | 14:00 - 16:00 | Dr Barry Ledeatte | 👥 In-person |
Key to Icons:
- 👥 In-person: Face-to-face interaction in designated office space
- 💬 Online: Support via Slack channels
- 🛟 Drop-in: Flexible support sessions; no booking required
- 🗣️ In-person Lecture / 💻 In-person Labs: Formal teaching sessions
⌚️ Class Details
📅 Lecture
- ⏰ Monday: 10:00 - 12:00
- 📍 KSW.1.01
- 👤 Dr Jon Cardoso-Silva
📅 Class Group 1
- ⏰ Tuesday: 15:00 - 16:30
- 📍 OLD.1.20
- 👤 Alex Soldatkin
📅 Class Group 2
- ⏰ Tuesday: 16:30 - 18:00
- 📍 OLD.1.20
- 👤 Alex Soldatkin
Teaching Format: We meet weekly for lectures on Monday mornings, followed by hands-on lab sessions on Tuesday afternoons. These sessions run throughout term, except for Reading Week (Week 6).
✍️ Assessment Structure
“How will I be assessed in this course?”
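Your grade in this course has two main components: COURSEWORK, worth 60%, and GROUP PROJECT, worth 40%. These two components are what appear on your student record, but in practice they are made up of smaller parts: COURSEWORK is split across the two individual problem sets (20% + 40%), while GROUP PROJECT corresponds to the final group project (40%).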
Weight | Type | Assessment | Timing |
---|---|---|---|
20% | Individual | ✍️ Problem Set 1: Web Scraping & API Development | Release: ~Week 04; Due: 5 March 2025, 8pm |
40% | Individual | ✍️ Problem Set 2: RAG System Implementation | Release: ~Week 06; Due: 26 March 2025, 8pm |
40% | Group Work | 👥 Final Project: TPI Data Pipeline Development | Details: Spring Term; Due: May/June 2025 (TBC) |
Weekly formative exercises in Weeks 01-04 will prepare you for the summative assessments. These include hands-on practice with GitHub workflows, API development, and web scraping techniques.
🤖 The use of Generative AI in this course
By students
In this course, we adopt Position 3: Full authorised use of generative AI, as per the LSE’s positions on Generative AI.
This means you can make unrestricted use of Generative AI (GenAI) tools like ChatGPT, GitHub Copilot, Grammarly AI, etc. for all aspects of the course, including assessments.
We view GenAI as a double-edged sword when it comes to learning. We find that students who know the course content well use GenAI resourcefully and learn a lot from it. However, when students fall behind or are not engaging with the material, it is easier to over-rely on GenAI (often without realising it) and miss out on the learning experience. This typically becomes evident when we mark the assessments, because the style of the submitted code deviates significantly from the coding style used in the course.
By teachers
We use GenAI extensively to clean up the formatting of lecture notes and the layout of the course materials. AI chatbots also help us brainstorm ideas for a lecture. For example, we might give Claude or ChatGPT the lecture notes from previous years, along with the course learning outcomes and notes on what did and didn't work last time, and ask it to draft an improved version of the lecture notes.
This helps us focus on the content and pedagogy, while the AI takes care of the formatting. GenAI serves as a tool to bring existing thoughts to life, not as a lazy way to generate new content.
When marking:
When marking assignments, we find GenAI useful for turning our free-text notes into more detailed and useful feedback for students.
Here’s how we typically structure feedback:
- Initial Feedback: We write a ‘brain dump’ of notes while reviewing your work (e.g., running your code, evaluating edge cases). This is in free text format.
- Structuring: We first give Claude or ChatGPT a list of the common mistakes we expect, a feedback template, and relevant solution snippets and links. We then provide our free-text notes, and the tool returns feedback in the Markdown format we expect.
- Final Review: We read and further review the formatted notes to ensure feedback aligns with the course standards and is actionable.
In other words, we use large language models for what they do best: language formatting and structuring.
These tools help save time on structuring feedback while maintaining the personal touch and pedagogical value.
Footnotes
[1] Terry has been working at the DSI for the past two years as an undergraduate research assistant and has proven experience in web scraping, API development and RAG systems. He will help with code review and manage integration with TPI.