LSE DS202A (2023) - Data Science for Social Scientists
2023/24 Autumn Term
We’ve revised the syllabus to align with updates announced in lectures. Week 02 was originally planned to cover Linear Regression, but it became a refresher on R/RStudio and the tidyverse instead. Consequently, we’ve shifted the entire schedule (Week 02’s original content moved to Week 03, Week 03’s to Week 04, and so on). The last lecture, originally on ‘Social Media Data’, has been removed from the schedule, but we will release related material on this topic for inspiration and self-study in due course.
Check this page every week to see more info on how to study for the course.
Part 01
The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.
Lecturer:

🗓️ Week 01
25 Sep 2023 - 29 Sep 2023
💻 Lab
(There is no lab in the first week. Instead, we recommend you check out the 📋 Getting Ready page to make sure you’re ready for the course.)
🧑🏫 Lecture
Introduction, Course Logistics & R programming
📖 Revise
- Ensure you have R installed on your computer.
- Ensure you have an IDE (RStudio or VSCode) installed on your computer.
- Install the tidyverse.
- Revisit the equivalence between base R and tidyverse syntax.
- Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.
🗓️ Week 02
02 Oct 2023 - 06 Oct 2023
💻 Lab
Practice data manipulation with dplyr and tidyr
🧑🏫 Lecture
R/RStudio + tidyverse recap
- Files, paths and working directories
- Markdown basics
- A walkthrough of the tidyverse
- Tips on how to write code from scratch, step-by-step
📣 Assignment Reveal
To help you familiarise yourself with the style of the assignments, we will announce a formative (practice) assignment this week.
- This assignment will be about dplyr and tidyr. The precise requirements will be announced in the lecture.
- You will submit it via GitHub Classroom.
🗓️ Week 03
09 Oct 2023 - 13 Oct 2023
💻 Lab
Linear regression, a tidymodels tutorial
🧑🏫 Lecture
Supervised Learning: Introduction to Regression Algorithms
- What is Supervised Learning? What is Regression?
- Algorithm: Linear Regression (simple and multiple)
⌛ Deadline
Your first formative assignment is due the day before this week’s lecture!
📣 Assignment Reveal
Your Summative 01, worth 10% of your final grade, will be announced in the lecture of this week.
🗓️ Week 04
16 Oct 2023 - 20 Oct 2023
💻 Lab
A tutorial on tidymodels recipes and workflows
🧑🏫 Lecture
Supervised Learning: Fundamentals of Classification
- What is classification?
- Algorithm: Logistic Regression
- Algorithm: Naive Bayes
- From binary to multi-class classification
🆘 Drop-in sessions
We will host drop-in sessions in Week 04 to help support you with your Summative 01.
⌛ Deadline
Your Summative 01 will be due sometime this week. The exact deadline will be announced in the lecture of Week 04.
🗓️ Week 05
23 Oct 2023 - 27 Oct 2023
💻 Lab
Parameter tuning with tidymodels
🧑🏫 Lecture
Supervised Learning: Resampling methods
- How to evaluate a model?
- What is overfitting?
- What is resampling?
- Method: Train-Test Split
- Method: Cross-Validation
- Method: The Bootstrap
- Method: Hyperparameter Tuning
📣 Assignment Reveal
Your Summative 02, worth 20% of your final grade, will be announced in the lecture of this week.
🗓️ Week 06
30 Oct 2023 - 04 Nov 2023
🆘 Drop-in sessions
There is no lecture or lab this week. Instead, we will hold drop-in sessions to help you with your Summative 02. The exact times and dates will be announced in the lecture of Week 05.
Part 02
In the second half of the course, the focus shifts to unsupervised learning.
Lecturer:

🗓️ Week 07
06 Nov 2023 - 11 Nov 2023
💻 Lab
Decision trees and further parameter tuning
🧑🏫 Lecture
Supervised Learning: Non-linear algorithms and ensemble methods
- What is non-linearity?
- Why can’t linear models capture non-linearity?
- Algorithm: Decision Trees
- Algorithm: Random Forests
- Algorithm: Support Vector Machines
- Overview of other algorithms: k-Nearest Neighbours, Neural Networks, etc.
⌛ Deadline
Your Summative 02 will be due sometime this week. The exact deadline will be announced in the lecture of Week 05.
🗓️ Week 08
13 Nov 2023 - 17 Nov 2023
💻 Lab
Comparing models and model evaluation
🧑🏫 Lecture
Unsupervised Learning: Clustering and Anomaly Detection (Part 1)
- What is unsupervised learning? How does it differ from supervised learning?
- What is clustering?
- Algorithm: k-means
- Algorithm: DBSCAN
- Outliers/anomalies can be important too! Examples of anomaly detection use cases
- Algorithm: Anomaly detection through clustering (e.g., k-means)
🗓️ Week 09
20 Nov 2023 - 24 Nov 2023
💻 Lab
Unsupervised Learning: Obtaining Insights via Clustering
🧑🏫 Lecture
Unsupervised Learning: Anomaly Detection (Part 2) and Dimensionality Reduction
- Algorithm: Anomaly detection through density estimation, using the Local Outlier Factor (LOF)
- What is dimensionality reduction, and why is it useful?
- Algorithm: PCA
- Algorithm: t-SNE (dropped: this would have been too much content for one lecture)
- Algorithm: UMAP (dropped for the same reason)
📣 Assignment Reveal
This week’s lecture will announce your Summative 03, worth 30% of your final grade.
Part 03
Finally, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.
Lecturers:


🗓️ Week 10
27 Nov 2023 - 01 Dec 2023
💻 Lab
A tutorial on dimensionality reduction and anomaly detection
🧑🏫 Lecture
Applications: Text as Data & Topic Modelling
🗓️ Week 11
04 Dec 2023 - 08 Dec 2023
💻 Lab
A tutorial on quanteda
🧑🏫 Lecture
Applications: Predictive Modelling on Tabular Data, a walkthrough
⌛ Deadline
Your Summative 03 will be due either this week or the week after the end of term. The exact deadline will be announced in the lecture of Week 09.
🆘 Drop-in sessions
We will host drop-in sessions in Week 11 to help support you with your Summative 03.