๐Ÿ—“๏ธ Week 01
Welcome to the Course + the ASCOR dataset

DS205 – Advanced Data Manipulation

20 Jan 2025

Welcome to DS205!

Course Lead

Dr Jon Cardoso-Silva 📧
@jonjoncardoso
Assistant Professor
LSE Data Science Institute


Current Focus:

  • PhD in Computer Science
  • Experienced in software engineering, data science and data engineering
  • Leading DS105 and DS205 course development
  • Investigating the impact of GenAI on higher education (GENIAL project)

Office Hours:
Thursdays, 11:00-13:00
Book via StudentHub

Recent Recognition:
LSESU Teaching Award for Feedback & Communication (2023)

Teaching Support

Alexander Soldatkin 📧
DPhil Candidate
Oxford Global & Area Studies

CLASS TEACHER

Dr Barry Ledeatte 📧
AI Learning Consultant
Also teaches DS105W

TEACHING SUPPORT

Administrative Support

Kevin Kittoe
Teaching & Assessment Administrator (DSI)

ADMINISTRATIVE SUPPORT

Contact 📧 DSI.ug@lse.ac.uk for:

  • Course access issues
  • Assignment submissions
  • Extension requests
  • Administrative queries

Key Information:

  • All extension requests must follow LSE’s extension policy
  • Email response time: 24-48 hours
  • Include ‘[DS205]’ in email subject lines

Technical Support

Terry Zhou
@tz1211
Code Maintainer & Research Assistant
3rd-Year BSc in Politics and Data Science

CODE MAINTAINER

Terry has been working with me for the past ~2 years and has practical experience in:

  • Web scraping at scale
  • API development
  • RAG system implementation
  • GitHub workflow management

As code maintainer, he’ll:

  • Review code contributions
  • Guide technical implementation
  • Support integration with TPI systems

Our Industry Partners

Transition Pathway Initiative Centre

The TPI Centre evaluates companies’ readiness for transition to a low-carbon economy. Their work involves a lot of analysis of data that is often messy and unstructured.

๐Ÿ‘‰๐Ÿป Everything you produce in this course has the potential to help TPI automate their data processing workflows.

Key Collaborators:

Valentin Jahn 📧
Deputy Director Research & Operations

Sylvan Lutz 📧
Policy Officer – ASCOR Analyst

What can you expect to learn in DS205?

Why this course exists:

  • To simplify advanced concepts from DS105.
  • To work with real-world partners like the Transition Pathway Initiative.
  • To prepare you for professional-level software development for data manipulation.

🥅 Intended Learning Outcomes

By the end of this course, you should be able to:

  • Use pandas to automate and optimise data cleaning and data processing workflows.
  • Build APIs using FastAPI for structured data retrieval.
  • Develop scalable web scraping workflows with the Scrapy and Selenium packages.
  • Proficiently use GitHub workflows for professional collaboration and version control.
  • Apply NLP techniques for retrieval of information from text data.
  • Apply appropriate pre-trained deep learning models from Hugging Face to unstructured data.
  • Collaborate effectively on shared codebases and contribute to projects with a real-world impact.

โœ๏ธ Assessment Structure

20% Individual โœ๏ธ Problem Set 1:
Web Scraping & API Development
Release: ~Week 04
Due: 5 March 2025, 8pm
40% Individual โœ๏ธ Problem Set 2:
RAG System Implementation
Release: ~Week 06
Due: 26 March 2025, 8pm
40% Group Work ๐Ÿ‘ฅ Final Project:
TPI Data Pipeline Development
Details: Spring Term
Due: May/June 2025 (TBC)

Weekly formative exercises in Weeks 01-04 will prepare you for the summative assessments. These include hands-on practice with GitHub workflows, API development, and web scraping techniques.

📑 Key Information

📟 Communication

  • Slack is our main point of contact. The invitation link will be available on Moodle.
  • 📧 Email: Reserved for formal requests (extensions, appeals)
  • 👥 Office Hours: Book via StudentHub
  • 🆘 Drop-in Support: COL.1.06 (DSI Studio), see calendar

Don’t like your laptop for coding?

We have a dedicated cloud environment on Nuvolos
Visit the Nuvolos - First Time Access page to learn how to access the DS205 environment.



📓 Check the full syllabus

Read the syllabus for week-by-week information on how we will cover the course content and assessments.

Our partnership with the Transition Pathway Initiative (TPI)

Let’s hear from Valentin and Sylvan

You will find TPI’s slides on the course Moodle page later 📩

Coffee Break ☕

After the break:

  • Let’s browse the ASCOR dataset
  • Basic pandas operations (live coding demo)
  • What awaits you in the 💻 W01 lab and 📋 W01 Practice

๐Ÿ‘จ๐Ÿปโ€๐Ÿ’ป Live Coding

Watch me as I load the ASCOR dataset and perform some basic operations with pandas. Take notes and ask questions as we go along.

I won’t provide a step-by-step guide before the live coding session, as you will be replicating these tasks in class later. A model solution will be available after Tuesday’s 💻 W01 lab.
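If you would like a warm-up before the live demo, here is a minimal sketch of a few generic pandas operations (inspecting, filtering, counting and grouping). It is not the model solution for the lab. It uses a small made-up table: the column names `country`, `pillar` and `result` and all of the values are hypothetical placeholders, not the real ASCOR schema or data.

```python
import pandas as pd

# HYPOTHETICAL toy table standing in for the ASCOR dataset.
# Column names and values are illustrative only, NOT the real ASCOR schema.
# With the real dataset you would load the downloaded file instead,
# e.g. with pd.read_csv or pd.read_excel (path not shown here).
df = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C", "Country A"],
        "pillar": ["EP", "EP", "CP", "CP"],
        "result": ["Yes", "No", "Partial", "Yes"],
    }
)

# Inspect the size of the table and its first rows
print(df.shape)
print(df.head())

# Filter rows: keep only those belonging to one (hypothetical) pillar
ep_rows = df[df["pillar"] == "EP"]

# Count how often each assessment result appears
result_counts = df["result"].value_counts()

# Count the number of rows per country
per_country = df.groupby("country").size()

print(ep_rows)
print(result_counts)
print(per_country)
```

Boolean-mask filtering, `value_counts()` and `groupby(...).size()` are standard first steps when getting to know an unfamiliar dataset.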