LSE DS202A - Data Science for Social Scientists
2025/26 Autumn Term
The syllabus has been revised to take into account the needs and difficulties of students in previous sessions of the DS202A/DS202W course.
📋 NOTE:
Ideally, you should also be taking the pre-sessional courses offered by LSE Digital Skills Lab in the first weeks of the Autumn Term.
Check this page every week to see more info on how to study for the course.
Part 01
The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.
🗓️ Week 01
29 Sep 2025 - 03 Oct 2025
💻 Lab
R/RStudio + tidyverse recap
To fully prepare for this lab, we highly recommend you go through the setup steps outlined in section 1 of the 📋 Getting Ready page.
👩🏻🏫 Lecture
Introduction, Course Logistics & R programming
📖 Revise
Check that you’re caught up:
- Ensure you have R installed on your computer
- Ensure you have an IDE (RStudio or VSCode) installed on your computer.
- Install tidyverse
- Revisit base R vs tidyverse syntax equivalence
- Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.
🛟 Support
We love hearing from you! Truly! Don’t hesitate to contact us for help.
In this first week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 02 October 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers.
- 📧 E-mail: Not sure if this course is for you? Or do you have a valid reason to request a change of class? For these and other administrative queries, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 02
06 Oct 2025 - 10 Oct 2025
💻 Lab
Practice data manipulation with dplyr and tidyr
👩🏻🏫 Lecture
Supervised Learning: Introduction to Regression Algorithms
- What is Supervised Learning? What is Regression?
- Algorithm: Linear Regression (simple and multiple)
📣 Assignment Reveal
To help you familiarise yourself with the style of the summative assignments, we will announce a formative (practice) assignment this week.
- This assignment will be about dplyr and tidyr as well as regression. The precise requirements will be announced in the lecture.
- You will submit your assignment via GitHub Classroom.
🛟 Support
We’re steadily adding to your data scientist knowledge toolbox. If things start to feel confusing in any way, don’t hesitate to contact us for help!
In this second week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 09 October 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as a class change, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 03
13 Oct 2025 - 17 Oct 2025
💻 Lab
Linear regression (simple and multiple), a tidymodels tutorial
👩🏻🏫 Lecture
Supervised Learning: Fundamentals of Classification
- What is classification?
- Classification vs regression
- Algorithm: Logistic Regression
- From binary to multi-class classification
- K-nearest neighbours
📚 Homework (W04 lab prep)
Tutorial on tidymodels recipes and workflows
Spend some time working on this homework before Week 04’s lab. It will help you prepare for it!
🆘 Drop-in session
A drop-in session will be organized this week to help you understand the nuts and bolts of GitHub and Quarto Markdown.
🛟 Support
We’re starting to pick up speed here. If things are confusing in any way, don’t hesitate to contact us for help!
In this third week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 16 October 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 04
20 Oct 2025 - 24 Oct 2025
💻 Lab
How to solve a classification problem: logistic regression and k-nearest neighbours
👩🏻🏫 Lecture
Supervised Learning: Resampling methods
- How to evaluate a model?
- What is overfitting?
- What is resampling?
- Method: The Bootstrap
- Method: Train-Test Split
- Method: Cross-Validation
⌛ Deadline
Your first formative will be due this week!
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming formative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this fourth week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 23 October 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 05
27 Oct 2025 - 31 Oct 2025
💻 Lab
Resampling, model evaluation and an introduction to tree-based models
👩🏻🏫 Lecture
Supervised Learning: Non-linear algorithms and ensemble methods
- What is non-linearity?
- Why can’t linear models capture non-linearity?
- Algorithm: Support Vector Machines
- Algorithm: Decision Trees
- Algorithm: Random Forests
📚 Homework
Problem set: hyperparameter tuning, resampling, model evaluation and comparison
- This should help you practice (and revise!) the supervised learning concepts you’ve learnt so far. Consider it good preparation for your upcoming summative.
- Solutions will be provided right after Reading Week.
📣 Assignment Reveal
Your Summative 01, worth 30% of your final grade, will be announced in this week’s lecture.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this fifth week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 30 October 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 06
03 Nov 2025 - 07 Nov 2025
📚 Practice and homework time
Reading Week
There are no classes or lectures this week.
Use this time to:
- review what you learnt so far
- do the homework you were given in week 5 (it will help you review and practice what you learnt so far)
- start on your upcoming W08 summative
🛟 Support
TBC
Part 02
In the second half of the course, the focus shifts to unsupervised learning.
🗓️ Week 07
10 Nov 2025 - 14 Nov 2025
💻 Lab
Dimensionality reduction: a tutorial
👩🏻🏫 Lecture
Unsupervised Learning: Introduction and Dimensionality reduction
- What is unsupervised learning? How does it differ from supervised learning?
- What is dimensionality reduction, and why is it useful?
- Algorithm: PCA
- Algorithm: MCA
- Algorithm: FAMD
- Algorithm: UMAP
- Algorithm: autoencoders
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this seventh week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 13 November 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
- 🆘 Drop-in sessions: We will host drop-in sessions in Week 07 to help support you with your Summative 01 (due in Week 08).
🗓️ Week 08
17 Nov 2025 - 21 Nov 2025
💻 Lab
Unsupervised Learning: Obtaining Insights via Clustering
👩🏻🏫 Lecture
Unsupervised Learning: Clustering
- What is clustering?
- Algorithm: k-means
- Variants of k-means
- Algorithm: DBSCAN
- What is the distinction between clustering (e.g. k-means) and dimensionality reduction (e.g. PCA)?
⌛ Deadline
Your Summative 01 will be due this week. The topic of the summative will be announced in the lecture of Week 05.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this eighth week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 20 November 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 09
24 Nov 2025 - 28 Nov 2025
💻 Lab
Anomaly detection – A tutorial
👩🏻🏫 Lecture
Unsupervised Learning: Anomaly detection
- Unsupervised learning goes beyond clustering: outliers/anomalies can be important too!
- Examples of anomaly detection use cases
- Algorithm: Anomaly detection through clustering (e.g., DBSCAN)
- Algorithm: Tree-based anomaly detection with isolation forests
- Algorithm: Anomaly detection through density estimation (e.g., Local Outlier Factor, LOF)
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this ninth week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 27 November 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
Part 03
Finally, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.
🗓️ Week 10
01 Dec 2025 - 05 Dec 2025
💻 Lab
A tutorial on quanteda
👩🏻🏫 Lecture
Applications: Text as Data & Topic Modelling
📣 Assignment Reveal
In this week’s lecture, we will announce your Summative 02, worth 30% of your final grade.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this tenth week, the best ways to get help are:
- Slack: Post any question you might have about the course (lab, lecture or assignments) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 04 December 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
🗓️ Week 11
08 Dec 2025 - 12 Dec 2025
💻 Lab
Decision-making time: a case study that ties everything you’ve learnt together!
👩🏻🏫 Lecture
Applications: Predictive Modelling on Tabular Data, a walkthrough
⌛ Deadline
Your Summative 02 will be due in the week following the end of term. The exact deadline will be announced in the lecture of Week 10.
🛟 Support
Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!
In this last week of term (time flies!), the best ways to get help are:
- Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.
- 💬 Office Hours: If you want one-to-one in-person support or want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Thursday, 11 December 2025 from 16.30-18.30. Also, check the availability of office hours with your class teachers (you’ll find their timings on the 📟 Communication page).
- 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.
- 🆘 Drop-in sessions: We will host drop-in sessions in Week 11 to help support you with your Summative 02 (due in the week after Week 11).
Part 04
Finally, it’s time to put everything you’ve learnt into practice in the final group project, which spans two weeks in Winter Term.
🗓️ Group project
26 Jan 2026 - 09 Feb 2026
🚀 Launch (29 Jan 2026)
- “Speed-dating” with datasets: you’re presented with datasets (with associated research questions) and asked to rank them in order of preference by 4pm
- Groups (no more than 4 students per group) will be assigned by the end of the day. So stay tuned!
🔎 First check-in session (28 Jan 2026)
- Quick drop-in session to check your research plans are realistic and you’re not going off-track!
🔎 Second check-in session (4 Feb 2026)
- Drop-in session to discuss the analysis you’ve conducted so far and check you’re still on track
⌛ Deadline (09 Feb 2026)
Your final project report will be due on 09 Feb 2026 at 5pm.