LSE DS202W (2025/26) - Data Science for Social Scientists
2025/26 Winter Term
- The syllabus has been revised to take into account the needs and difficulties of students in previous sessions of the DS202A/DS202W course.
📋 NOTE:
Ideally, you should also be taking the pre-sessional courses offered by LSE Digital Skills Lab in the first weeks of the Winter Term:
- Look for the Training and Development System and search for Python Pre-Sessional Workshops
Check this page every week to see more info on how to study for the course.
Part 01
The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.
🗓️ Week 01
19 Jan 2026 -
23 Jan 2026
💻 Lab
Python 3/Pandas recap
To fully prepare for this lab, we highly recommend you go through the setup steps outlined in section 1 of the 📋 Getting Ready page.
👩🏻🏫 Lecture
Introduction, Course Logistics & Python 3 programming
📖 Revise
- Ensure you have Python installed on your computer
- Ensure you have an IDE (VSCode) installed on your computer.
- Install the pandas library
- Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.
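The first and third items of the checklist above can be verified with a short script. This is only a minimal sketch: `check_setup` is a hypothetical helper name, and the minimum Python version of 3.8 is an assumption, since this page does not state one.

```python
import importlib.util
import sys

# Assumption: this page does not state a minimum Python version;
# 3.8 is used here purely for illustration.
MIN_VERSION = (3, 8)


def check_setup(packages=("pandas",)):
    """Return a dict mapping each check to True (passed) or False (failed)."""
    results = {"python": sys.version_info >= MIN_VERSION}
    for name in packages:
        # find_spec returns None when the package cannot be found.
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for check, ok in check_setup().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```

If `pandas` is reported as missing, installing it (for example with `pip install pandas`) and re-running the script should turn the line to `OK`.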
🛟 Support
We love hearing from you! Truly! Don’t hesitate to contact us for help.
In this first week, the best ways to get help are:
Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 19 January 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours.
📧 E-mail: Not sure if this course is for you? Or do you have a valid reason to request a change of class? For these and other administrative queries, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 02
26 Jan 2026 -
30 Jan 2026
💻 Lab
Practice data manipulation with pandas
👩🏻🏫 Lecture
Supervised Learning: Introduction to Regression Algorithms
- What is Supervised Learning? What is Regression?
- Algorithm: Linear Regression (simple and multiple)
🛟 Support
We’re steadily adding to your data scientist knowledge toolbox. If things start to feel confusing in any way, don’t hesitate to contact us for help!
In this second week, the best ways to get help are:
Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 26 Jan 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as a class change, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 03
02 Feb 2026 -
06 Feb 2026
💻 Lab
Linear regression (simple and multiple), a scikit-learn tutorial
👩🏻🏫 Lecture
Supervised Learning: Fundamentals of Classification
- What is classification?
- Classification vs regression
- Algorithm: Logistic Regression
- From binary to multi-class classification
- K-nearest neighbours
📣 Assignment Reveal
To help you familiarise yourself with the style of the summative assignments, we will announce a formative (practice) assignment this week.
- This assignment will be about basic Python and pandas data manipulations as well as regression. The precise requirements will be announced in the lecture.
- You will submit your assignment via GitHub Classroom.
🆘 Drop-in session
A drop-in session will be organized this week to help you understand the nuts and bolts of GitHub and Quarto Markdown. Exact date and time to be announced soon (check Slack for information).
🛟 Support
We’re starting to pick up speed here. If things are confusing in any way, don’t hesitate to contact us for help!
In this third week, the best ways to get help are:
Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 02 Feb 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 04
09 Feb 2026 -
13 Feb 2026
💻 Lab
How to solve a classification problem: logistic regression and k-nearest neighbours
👩🏻🏫 Lecture
Supervised Learning: Resampling methods
- How to evaluate a model?
- What is overfitting?
- What is resampling?
- Method: The Bootstrap
- Method: Train-Test Split
- Method: Cross-Validation
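Two of the methods listed above, the train-test split and cross-validation, can be sketched with scikit-learn (the library used in the labs). This is an illustrative sketch only: the synthetic dataset, the 80/20 split, the choice of 5 folds and the linear regression model are all assumptions made for the example, not course requirements.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data, used only for illustration (not a course dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Train-test split: hold out 20% of the rows for a single final check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()

# 5-fold cross-validation on the training set only: each fold is used once
# for validation while the remaining four are used for fitting.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Refit on the full training set, then score once on the held-out test set.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print("CV R^2 per fold:", cv_scores.round(3))
print("Test R^2:", round(test_r2, 3))
```

Keeping the test set out of the cross-validation loop means the final score comes from rows the model has never seen during fitting or model selection.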
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming formative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this fourth week, the best ways to get help are:
Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 09 Feb 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 05
16 Feb 2026 -
20 Feb 2026
💻 Lab
Resampling, model evaluation and an introduction to tree-based models
👩🏻🏫 Lecture
Supervised Learning: Non-linear algorithms and ensemble methods
- What is non-linearity?
- Why can’t linear models capture non-linearity?
- Algorithm: Support Vector Machines
- Algorithm: Decision Trees
- Algorithm: Random Forests
- Algorithm: Gradient Boosted Trees
⌛ Deadline
Your first formative will be due at the beginning of the week!
📚 Homework
Problem set: hyperparameter tuning, resampling, model evaluation and comparison
- This should help you practice (and revise!) the concepts related to supervised learning you’ve learnt so far. Consider it good preparation for your upcoming summative.
- Solutions will be provided right after Reading Week.
📚 Tutorial
Tutorial on scikit-learn pipelines
Spend some time working on this tutorial. This feature will become useful from now onwards (you can use it in your homework and upcoming summatives)!
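As a taste of what the tutorial covers, here is a minimal sketch of a scikit-learn pipeline that chains a preprocessing step with a classifier and evaluates the whole chain by cross-validation. The synthetic dataset, the step names `"scale"` and `"clf"`, and the choice of scaler and model are assumptions made for illustration; the tutorial itself may use different ones.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration (not a course dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Each step is a (name, estimator) pair; the names are arbitrary labels.
pipe = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Within each fold, the scaler is fitted on the training portion only,
# so no information from the validation portion leaks into preprocessing.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print("Mean CV accuracy:", round(scores.mean(), 3))
```

Wrapping the preprocessing and the model in one object is what lets resampling and hyperparameter tuning treat them as a single estimator.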
📣 Assignment Reveal
Your Summative 01, worth 30% of your final grade, will be announced in this week’s lecture.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this fifth week, the best ways to get help are:
Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 16 Feb 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 06
23 Feb 2026 -
27 Feb 2026
📚 Review and assignment time
Reading Week
There are no classes or lectures this week.
Use this time to:
- review what you learnt so far
- work on the homework you were given in week 5
- start working on the first summative released in week 5 and due in week 8
🛟 Support
TBC
Part 02
In the second half of the course, the focus shifts to unsupervised learning.
🗓️ Week 07
02 Mar 2026 -
06 Mar 2026
💻 Lab
Dimensionality reduction: a tutorial
👩🏻🏫 Lecture
Unsupervised Learning: Introduction and Dimensionality reduction
- What is unsupervised learning? How does it differ from supervised learning?
- What is dimensionality reduction, and why is it useful?
- Algorithm: PCA
- Algorithm: MCA
- Algorithm: FAMD
- Algorithm: UMAP
- Algorithm: autoencoders
🆘 Drop-in session
Drop-in sessions will be organized this week to help you with the first summative. Exact dates and times to be announced soon (check Slack for information).
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this seventh week, the best ways to get help are:
Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 02 Mar 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 08
09 Mar 2026 -
13 Mar 2026
💻 Lab
Unsupervised Learning: Obtaining Insights via Clustering
👩🏻🏫 Lecture
Unsupervised Learning: Clustering
- What is clustering?
- Algorithm: k-means
- Variants of k-means
- Algorithm: DBSCAN
- What is the distinction between clustering (e.g., k-means) and dimensionality reduction (e.g., PCA)?
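One way to see the distinction raised in the last point is to compare what each method returns for the same data. The sketch below is illustrative only: the synthetic blobs, the choice of 3 clusters and the choice of 2 components are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic data: 150 rows, 4 numeric features (not a course dataset).
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)

# Clustering groups the ROWS: the output is one cluster label per observation.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction compresses the COLUMNS: the output keeps every
# observation but describes it with fewer (new) features.
X_2d = PCA(n_components=2).fit_transform(X)

print(labels.shape)  # (150,)
print(X_2d.shape)    # (150, 2)
```

The two are often combined in practice, for example by plotting the PCA coordinates and colouring each point by its cluster label.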
⌛ Deadline
Your Summative 01 will be due this week. The topic of the summative will be announced in the lecture of Week 05.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this eighth week, the best ways to get help are:
Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 09 Mar 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 09
16 Mar 2026 -
20 Mar 2026
💻 Lab
Anomaly detection – A tutorial
👩🏻🏫 Lecture
Unsupervised Learning: Anomaly detection
- Unsupervised learning goes beyond clustering: Outliers/anomalies can be important too! Examples of anomaly detection use cases
- Algorithm: Anomaly detection through clustering (e.g., DBSCAN)
- Algorithm: Tree-based anomaly detection with isolation forests
- Algorithm: Anomaly detection through density estimation (Local Outlier Factor, LOF)
- Algorithm: One-class SVM
- Algorithm: Autoencoders
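As an illustration of one item from the list above, tree-based anomaly detection with isolation forests, here is a minimal scikit-learn sketch. The synthetic data, the planted far-away points and the `contamination=0.05` setting are assumptions chosen for the example, not values taken from the course.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 "normal" points around the origin plus 5 points
# planted far away (for illustration only).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = rng.uniform(low=6.0, high=8.0, size=(5, 2))
X = np.vstack([inliers, planted])

# contamination is the share of rows we expect to be anomalous (an assumption).
detector = IsolationForest(contamination=0.05, random_state=0)

# fit_predict returns 1 for rows treated as normal and -1 for flagged rows.
flags = detector.fit_predict(X)

print("Rows flagged as anomalies:", int((flags == -1).sum()))
```

No labels are used at any point, which is what makes this an unsupervised approach.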
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this ninth week, the best ways to get help are:
Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 16 Mar 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
Part 03
Finally, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.
🗓️ Week 10
23 Mar 2026 -
27 Mar 2026
💻 Lab
A tutorial on text mining with Python
👩🏻🏫 Lecture
Applications: Text as Data & Topic Modelling
📣 Assignment Reveal
In this week’s lecture, we will announce your Summative 02, worth 30% of your final grade.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this tenth week, the best ways to get help are:
Slack: Post any question you might have about the course (lab, lecture or assignments) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 23 Mar 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 11
30 Mar 2026 -
03 Apr 2026
💻 Lab
Decision-making time: a case study that ties everything you’ve learnt together!
👩🏻🏫 Lecture
Applications: Predictive Modelling on Tabular Data, a walkthrough
⌛ Deadline
Your Summative 02 will be due sometime in Spring Term. The exact deadline will be announced in the lecture of Week 09.
🆘 Drop-in sessions
Drop-in sessions will be organized this week to help you with Summative 02 (due in Spring Term). Exact dates and times to be announced soon (check Slack for information)
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this last week of term (time flies!), the best ways to get help are:
Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
💬 Office Hours: If you want 1-to-1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Monday, 30 Mar 2026 from 16.30-18.30. Also, check the availability of your class teachers’ office hours (you’ll find the timings on the 📟 Communication page).
📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🆘 Drop-in sessions: We will host drop-in sessions in Week 11 to help support you with your Summative 02 (due in Spring Term).
Part 04
Finally, it’s time to put everything you’ve learnt into practice in the final group project, which spans two weeks in Spring Term.
🗓️ Group project
04 May 2026 -
18 May 2026
🚀 Launch (04 May 2026)
- “Speed-dating” with datasets: you’re presented with datasets (with associated research questions) and asked to rank them in order of preference by 4pm
- Groups (no more than 4/5 students per group) will be assigned by the end of the day. So stay tuned!
🔎 First check-in session (08 May 2026)
- Quick drop-in session to check your research plans are realistic and you’re not going off-track!
🔎 Second check-in session (13 May 2026)
- Drop-in session to discuss the analysis you’ve conducted so far and check you’re still on track
⌛ Deadline (18 May 2026)
Your final project report will be due on 18 May 2026 at 5pm.
