LSE DS202W (2025) - Data Science for Social Scientists

2024/25 Winter Term

📣 NOTE (updated 17 September 2025):

The syllabus has been revised to take into account the experiences (needs and difficulties) of students in the 2023/2025 sessions of DS202W/DS202W as well as the 2025/2025 session of the DS202W course.

📋 NOTE:

Ideally, you should also take the pre-sessional courses offered by the LSE Digital Skills Lab in the first weeks of the Winter Term.

Check this page every week to see more info on how to study for the course.

Part 01

The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.


🗓️ Week 01
20 Jan 2025 -
24 Jan 2025

💻 Lab

Python 3/Pandas recap

To fully prepare for this lab, we highly recommend you go through the setup steps outlined in section 1 of the 📋 Getting Ready page.

👩🏻‍🏫 Lecture

Introduction, Course Logistics & Python 3 programming

📖 Revise

  • Ensure you have Python installed on your computer
  • Ensure you have an IDE (VSCode) installed on your computer.
  • Install the pandas library
  • Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.

🛟 Support


We love hearing from you! Truly! Don’t hesitate to contact us for help.

In this first week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 22 January 2025 from 12.30-2.30 pm. Also, check for availability of office hours of some of your class teachers. Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 22 January 2025 from 10.00-11.30am.

  • 📧 E-mail: Not sure if this course is for you? Or do you have a valid reason to request a change of class? For these and other administrative queries, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 02
27 Jan 2025 -
31 Jan 2025

💻 Lab

Practice data manipulation with pandas

👩🏻‍🏫 Lecture

Supervised Learning: Introduction to Regression Algorithms

  • What is Supervised Learning? What is Regression?
  • Algorithm: Linear Regression (simple and multiple)
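
As a rough illustration of the simple linear regression topic listed above, here is a minimal from-scratch sketch using the closed-form least-squares solution. It is not the course's own material: the function name and the toy data (`hours`, `score`) are hypothetical, and in practice you would more likely use a library such as scikit-learn.

```python
# Minimal sketch: simple linear regression fitted by ordinary least squares.
# The function name and the data are illustrative assumptions only.

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = (sum of cross-deviations) / (sum of squared deviations of x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x
hours = [1.0, 2.0, 3.0, 4.0]
score = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(hours, score)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Multiple linear regression follows the same least-squares idea with several predictors, which is usually solved with matrix routines rather than by hand.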

🛟 Support


We’re steadily adding to your data scientist knowledge toolbox. If things start to feel confusing in any way, don’t hesitate to contact us for help!

In this second week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 29 Jan 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 29 Jan 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as class change, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 03
03 Feb 2025 -
07 Feb 2025

💻 Lab

Linear regression (simple and multiple), a scikit-learn tutorial

👩🏻‍🏫 Lecture

Supervised Learning: Fundamentals of Classification

  • What is classification?
  • Classification vs regression
  • Algorithm: Logistic Regression
  • From binary to multi-class classification
  • K-nearest neighbours

📚 Homework (W04 lab prep)

Tutorial on scikit-learn pipelines

Spend some time working on this homework. Pipelines will be useful from now on!

📣 Assignment Reveal

To help you familiarise yourself with the style of the summative assignments, we will announce a formative (practice) assignment this week.

  • This assignment will be about basic Python and pandas data manipulations as well as regression. The precise requirements will be announced in the lecture.
  • You will submit your assignment via GitHub Classroom.

🛟 Support


We’re starting to pick up speed here. If things are confusing in any way, don’t hesitate to contact us for help!

In this third week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 05 Feb 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 05 Feb 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 04
10 Feb 2025 -
14 Feb 2025

💻 Lab

How to solve a classification problem: logistic regression and k-nearest neighbours

👩🏻‍🏫 Lecture

Supervised Learning: Resampling methods

  • How to evaluate a model?
  • What is overfitting?
  • What is resampling?
  • Method: The Bootstrap
  • Method: Train-Test Split
  • Method: Cross-Validation
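
To make the three resampling methods listed above a little more concrete, the sketch below generates row indices for a train-test split, k-fold cross-validation, and a bootstrap sample using only the Python standard library. The function names, the 25% test fraction, and the choice of 5 folds are illustrative assumptions, not course requirements; libraries such as scikit-learn offer ready-made equivalents.

```python
import random

# Minimal sketch of three resampling schemes, expressed as row indices.
# All names and default values here are hypothetical, for illustration only.

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction as the test set."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_indices(n, k=5, seed=0):
    """Split indices into k folds; each fold is the validation set once."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement (some rows repeat, some are left out)."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


train_idx, test_idx = train_test_split_indices(20)
splits = k_fold_indices(20, k=5)
boot = bootstrap_indices(20)
print(len(train_idx), len(test_idx), len(splits), len(boot))
```

In each case the model is fitted on one set of rows and evaluated on rows it has not seen, which is what guards against an over-optimistic view of an overfitted model.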

⌛ Deadline

Your first formative will be due a day before the lecture!

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming formative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this fourth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 12 Feb 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 12 Feb 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 05
17 Feb 2025 -
21 Feb 2025

💻 Lab

Resampling, model evaluation and an introduction to tree-based models

👩🏻‍🏫 Lecture

Supervised Learning: Non-linear algorithms and ensemble methods

  • What is non-linearity?
  • Why can’t linear models capture non-linearity?
  • Algorithm: Support Vector Machines
  • Algorithm: Decision Trees
  • Algorithm: Random Forests

📣 Assignment Reveal

To help you practice further for summative assignments, we will announce a second formative (practice) assignment this week.

  • This assignment will be a problem set focusing on hyperparameter tuning, resampling, model evaluation and comparison. The precise requirements will be announced in the lecture.
  • This assignment should help you practice (and revise!) the concepts related to supervised learning you’ve learnt so far. Consider it good preparation for your upcoming summative.
  • It should also help you practice writing justifications and explanations for your modelling choices and model results (something which is extremely important in this course!)
  • You will submit your assignment via GitHub Classroom.

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this fifth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 19 Feb 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 19 Feb 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 06
24 Feb 2025 -
28 Feb 2025

📚 Review and formative time

Reading Week

There are no classes or lectures this week.

Use this time to:

  • review what you have learnt so far
  • work on the formative you were given in Week 05

🛟 Support

Support arrangements for this week: TBC

Part 02


In the second half of the course, the focus shifts to unsupervised learning.



🗓️ Week 07
03 Mar 2025 -
07 Mar 2025

💻 Lab

Dimensionality reduction: a tutorial

👩🏻‍🏫 Lecture

Unsupervised Learning: Introduction and Dimensionality reduction

  • What is unsupervised learning? How does it differ from supervised learning?
  • What is dimensionality reduction, and why is it useful?
  • Algorithm: PCA
  • Algorithm: UMAP

📣 Assignment Reveal

Your Summative 01, worth 30% of your final grade, will be announced in this week’s lecture.

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this seventh week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 05 Mar 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 05 Mar 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 08
10 Mar 2025 -
14 Mar 2025

💻 Lab

Unsupervised Learning: Obtaining Insights via Clustering

👩🏻‍🏫 Lecture

Unsupervised Learning: Clustering

  • What is clustering?
  • Algorithm: k-means
  • Variants of k-means
  • Algorithm: DBSCAN
  • What is the distinction between clustering (e.g. k-means) and dimensionality reduction (e.g. PCA)?
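
As a hedged illustration of the k-means topic listed above, here is a minimal from-scratch sketch of the standard assign-then-update loop (often called Lloyd's algorithm). The function names, the fixed number of iterations, and the toy points are assumptions made for this example; it is not the course's implementation, and a library such as scikit-learn would normally be used instead.

```python
import random

# Minimal sketch of k-means: alternate between assigning each point to its
# nearest centroid and moving each centroid to the mean of its points.
# Names, defaults and data are hypothetical, for illustration only.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iterations=10, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Note that the output is a cluster label per row, whereas a dimensionality reduction method such as PCA would return new, fewer columns for each row.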

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this eighth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 12 Mar 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 12 Mar 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

  • 🆘 Drop-in sessions: We will host drop-in sessions in Week 08 to help support you with your Summative 01 (due in Week 09).

🗓️ Week 09
17 Mar 2025 -
21 Mar 2025

💻 Lab

Anomaly detection – A tutorial

👩🏻‍🏫 Lecture

Unsupervised Learning: Anomaly detection

  • Unsupervised learning goes beyond clustering: Outliers/anomalies can be important too! Examples of anomaly detection use cases
  • Algorithm: Anomaly detection through clustering (e.g., DBSCAN)
  • Algorithm: Tree-based anomaly detection with isolation forests
  • Algorithm: Anomaly detection through density estimation (Local Outlier Factor, LOF)

📣 Assignment Reveal

In this week’s lecture, we will announce your Summative 02, worth 30% of your final grade.

⌛ Deadline

Your Summative 01 will be due the day before the lecture. The topic of the summative will be announced in the lecture of Week 07.

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this ninth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 19 Mar 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 19 Mar 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

Part 03


Finally, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.



🗓️ Week 10
24 Mar 2025 -
28 Mar 2025

💻 Lab

A tutorial on text mining with Python

👩🏻‍🏫 Lecture

Applications: Text as Data & Topic Modelling

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this tenth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignments) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 26 Mar 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 26 Mar 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 11
31 Mar 2025 -
04 Apr 2025

💻 Lab

Decision-making time: a case study that ties everything you’ve learnt together!

👩🏻‍🏫 Lecture

Applications: Predictive Modelling on Tabular Data, a walkthrough

⌛ Deadline

Your Summative 02 will be due in Spring Term. The exact deadline will be announced in the lecture of Week 09.

🛟 Support


Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this last week of term (time flies!), the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 02 Apr 2025 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Wednesday, 02 Apr 2025 from 10.00-11.30am.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

  • 🆘 Drop-in sessions: We will host drop-in sessions in Week 11 to help support you with your Summative 02 (due in Spring Term).

Part 04


Finally, it’s time to put everything you’ve learnt into practice in the final group project that spans two weeks in Winter term.



🗓️ Group project
17 Apr 2025 -
01 May 2025

🚀 Launch (17 Apr 2025)

  • “Speed-dating” with datasets: you will be presented with datasets (with associated research questions) and asked to rank them in order of preference by 4pm
  • Groups (no more than 4 students per group) will be assigned by the end of the day. So stay tuned!

🔎 First check-in session (21 Apr 2025)

  • Quick drop-in session to check your research plans are realistic and you’re not going off-track!

🔎 Second check-in session (25 Apr 2025)

  • Drop-in session to discuss the analysis you’ve conducted so far and check you’re still on track

⌛ Deadline (01 May 2025)

Your final project report will be due on 01 May 2025 at 5pm.