LSE DS202A (2023) - Data Science for Social Scientists

2023/24 Autumn Term

📣 NOTE (updated 15 October 2023):

We’ve revised the syllabus to reflect changes announced in lectures. Week 02 was originally scheduled for Linear Regression, but it became a refresher on R/RStudio and the tidyverse instead. Consequently, we’ve shifted the rest of the schedule by one week (Week 02’s original content moved to Week 03, Week 03’s to Week 04, and so on). The last lecture, originally on ‘Social Media Data,’ has been removed from the schedule, but we will release relevant material on this topic for inspiration and self-study in due course.

Check this page every week to see more info on how to study for the course.

Part 01

The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.

Lecturer:

Dr. Jon Cardoso-Silva

🗓️ Week 01
25 Sep 2023 - 29 Sep 2023

💻 Lab

(There is no lab in the first week. Instead, we recommend you check out the 📋 Getting Ready page to make sure you’re ready for the course.)

🧑‍🏫 Lecture

Introduction, Course Logistics & R programming

📖 Revise

  • Ensure you have R installed on your computer.
  • Ensure you have an IDE (RStudio or VSCode) installed on your computer.
  • Install the tidyverse.
  • Revisit the equivalences between base R and tidyverse syntax.
  • Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.

🗓️ Week 02
02 Oct 2023 - 06 Oct 2023

💻 Lab

Practice data manipulation with dplyr and tidyr

🧑‍🏫 Lecture

R/RStudio + tidyverse recap

  • Files, paths and working directories
  • Markdown basics
  • A walkthrough of the tidyverse
  • Tips on how to write code from scratch, step-by-step

📣 Assignment Reveal

To help you familiarise yourself with the style of the assignments, we will announce a formative (practice) assignment this week.

  • This assignment will be about dplyr and tidyr. The precise requirements will be announced in the lecture.
  • You will submit via GitHub Classroom.

🗓️ Week 03
09 Oct 2023 - 13 Oct 2023

💻 Lab

Linear regression, a tidymodels tutorial

🧑‍🏫 Lecture

Supervised Learning: Introduction to Regression Algorithms

  • What is Supervised Learning? What is Regression?
  • Algorithm: Linear Regression (simple and multiple)

⌛ Deadline

Your first formative assignment is due the day before this week’s lecture!

📣 Assignment Reveal

Your Summative 01, worth 10% of your final grade, will be announced in this week’s lecture.

🗓️ Week 04
16 Oct 2023 - 20 Oct 2023

💻 Lab

tidymodels recipes and workflows: a tutorial

🧑‍🏫 Lecture

Supervised Learning: Fundamentals of Classification

  • What is classification?
  • Algorithm: Logistic Regression
  • Algorithm: Naive Bayes
  • From binary to multi-class classification

🆘 Drop-in sessions

We will host drop-in sessions in Week 04 to help support you with your Summative 01.

⌛ Deadline

Your Summative 01 will be due sometime this week. The exact deadline will be announced in the lecture of Week 04.

🗓️ Week 05
23 Oct 2023 - 27 Oct 2023

💻 Lab

Parameter tuning with tidymodels

🧑‍🏫 Lecture

Supervised Learning: Resampling methods

  • How to evaluate a model?
  • What is overfitting?
  • What is resampling?
  • Method: Train-Test Split
  • Method: Cross-Validation
  • Method: The Bootstrap
  • Method: Hyperparameter Tuning

📣 Assignment Reveal

Your Summative 02, worth 20% of your final grade, will be announced in this week’s lecture.

🗓️ Week 06
30 Oct 2023 - 04 Nov 2023

🆘 Drop-in sessions

There is no lecture or lab this week. Instead, we will hold drop-in sessions to help you with your Summative 02. The exact times and dates will be announced in the lecture of Week 05.

Part 02

In the second half of the course, the focus shifts to unsupervised learning.

Lecturer:

Dr. Ghita Berrada

🗓️ Week 07
06 Nov 2023 - 11 Nov 2023

💻 Lab

Decision trees and further parameter tuning

🧑‍🏫 Lecture

Supervised Learning: Non-linear algorithms and ensemble methods

  • What is non-linearity?
  • Why can’t linear models capture non-linearity?
  • Algorithm: Decision Trees
  • Algorithm: Random Forests
  • Algorithm: Support Vector Machines
  • Overview of other algorithms: k-Nearest Neighbours, Neural Networks, etc.

⌛ Deadline

Your Summative 02 will be due sometime this week. The exact deadline will be announced in the lecture of Week 05.

🗓️ Week 08
13 Nov 2023 - 17 Nov 2023

💻 Lab

Comparing models and model evaluation

🧑‍🏫 Lecture

Unsupervised Learning: Clustering and Anomaly Detection (Part 1)

  • What is unsupervised learning? How does it differ from supervised learning?
  • What is clustering?
  • Algorithm: k-means
  • Algorithm: DBSCAN
  • Outliers/anomalies can be important too! Examples of anomaly detection use cases
  • Algorithm: Anomaly detection through clustering (e.g., k-means)

🗓️ Week 09
20 Nov 2023 - 24 Nov 2023

💻 Lab

Unsupervised Learning: Obtaining Insights via Clustering

🧑‍🏫 Lecture

Unsupervised Learning: Anomaly Detection (Part 2) and Dimensionality Reduction

  • Algorithm: Anomaly detection through density estimation (Local Outlier Factor, LOF)
  • What is dimensionality reduction, and why is it useful?
  • Algorithm: PCA
  • Algorithm: t-SNE (dropped, as it would have been too much content for one lecture)
  • Algorithm: UMAP (dropped, as it would have been too much content for one lecture)

📣 Assignment Reveal

This week’s lecture will announce your Summative 03, worth 30% of your final grade.

Part 03

Finally, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.

Lecturers:

Dr. Jon Cardoso-Silva
Dr. Stuart Bramwell

🗓️ Week 10
27 Nov 2023 - 01 Dec 2023

💻 Lab

A tutorial on dimensionality reduction and anomaly detection

🧑‍🏫 Lecture

Applications: Text as Data & Topic Modelling

🗓️ Week 11
04 Dec 2023 - 08 Dec 2023

💻 Lab

A tutorial on quanteda

🧑‍🏫 Lecture

Applications: Predictive Modelling on Tabular Data, a walkthrough

⌛ Deadline

Your Summative 03 will be due sometime this week or in the week after the end of term. The exact deadline will be announced in the lecture of Week 09.

🆘 Drop-in sessions

We will host drop-in sessions in Week 11 to help support you with your Summative 03.