LSE DS202A (2023) - Data Science for Social Scientists

2023/24 Winter Term

📣 NOTE (updated 10 January 2024):

The syllabus has been revised to take into account the experiences of students on the DS202A course (their needs and difficulties) while ensuring that both iterations of the DS202 course (i.e., DS202A and DS202W) do not diverge too much and remain consistent with each other.

Check this page every week to see more info on how to study for the course.

Part 01

The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.

Lecturer:

Dr. Jon Cardoso-Silva

🗓️ Week 01
15 Jan 2024 -
19 Jan 2024

💻 Lab

R/RStudio + tidyverse recap

To fully prepare for this lab, we highly recommend you go through the setup steps outlined in section 1 of the 📋 Getting Ready page.

🧑‍🏫 Lecture

Introduction, Course Logistics & R programming

📖 Revise

Check that you’re caught up:
  • Ensure you have R installed on your computer.
  • Ensure you have an IDE (RStudio or VSCode) installed on your computer.
  • Install tidyverse.
  • Revisit base R vs tidyverse syntax equivalence.
  • Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.

🗓️ Week 02
22 Jan 2024 -
26 Jan 2024

💻 Lab

Practice data manipulation with dplyr and tidyr

🧑‍🏫 Lecture

Supervised Learning: Introduction to Regression Algorithms

  • What is Supervised Learning? What is Regression?
  • Algorithm: Linear Regression (simple and multiple)

📣 Assignment Reveal

To help you familiarise yourself with the style of the assignments, we will announce a formative (practice) assignment this week.

  • This assignment will be about dplyr and tidyr. The precise requirements will be announced in the lecture.
  • You will submit via GitHub Classroom.

🗓️ Week 03
29 Jan 2024 -
02 Feb 2024

💻 Lab

Linear regression, a tidymodels tutorial

🧑‍🏫 Lecture

Supervised Learning: Fundamentals of Classification

  • What is classification?
  • Algorithm: Logistic Regression
  • From binary to multi-class classification

⌛ Deadline

Your first formative will be due a day before the lecture!

📣 Assignment Reveal

Your Summative 01, worth 10% of your final grade, will be announced in this week’s lecture.

🗓️ Week 04
05 Feb 2024 -
09 Feb 2024

💻 Lab

tidymodels recipes and workflows - a tutorial

🧑‍🏫 Lecture

Supervised Learning: Resampling methods

  • How to evaluate a model?
  • What is overfitting?
  • What is resampling?
  • Method: Train-Test Split
  • Method: Cross-Validation
  • Method: The Bootstrap
  • Method: Hyperparameter Tuning

🆘 Drop-in sessions

We will host drop-in sessions in Week 04 to support you with your Summative 01.

⌛ Deadline

Your Summative 01 will be due sometime this week. The exact deadline will be announced in the lecture of Week 03.

🗓️ Week 05
12 Feb 2024 -
16 Feb 2024

💻 Lab

Parameter tuning with tidymodels

🧑‍🏫 Lecture

Supervised Learning: A further exploration of resampling and classification metrics

📣 Assignment Reveal

Your Summative 02, worth 20% of your final grade, will be announced in this week’s lecture.

🗓️ Week 06
19 Feb 2024 -
23 Feb 2024

🆘 Drop-in sessions

There is no lecture or lab this week. Instead, we will hold drop-in sessions to help you with your Summative 02. The exact times and dates will be announced in the lecture of Week 05.

Part 02

In the second half of the course, after a final week on non-linear supervised algorithms, the focus shifts to unsupervised learning.

Lecturer:

Dr. Ghita Berrada

🗓️ Week 07
26 Feb 2024 -
01 Mar 2024

💻 Lab

Decision trees and further parameter tuning

🧑‍🏫 Lecture

Supervised Learning: Non-linear algorithms and ensemble methods

  • What is non-linearity?
  • Why can’t linear models capture non-linearity?
  • Algorithm: Support Vector Machines
  • Algorithm: Decision Trees
  • Algorithm: Random Forests

⌛ Deadline

Your Summative 02 will be due sometime this week. The exact deadline will be announced in the lecture of Week 05.

🗓️ Week 08
04 Mar 2024 -
08 Mar 2024

💻 Lab

Comparing models and model evaluation

🧑‍🏫 Lecture

Unsupervised Learning: Introduction and Clustering

  • What is unsupervised learning? How does it differ from supervised learning?
  • What is clustering?
  • Algorithm: k-means
  • Variants of k-means
  • Algorithm: DBSCAN

🗓️ Week 09
11 Mar 2024 -
15 Mar 2024

💻 Lab

Unsupervised Learning: Obtaining Insights via Clustering

🧑‍🏫 Lecture

Unsupervised Learning: Anomaly detection and Dimensionality reduction

  • Unsupervised learning goes beyond clustering: outliers/anomalies can be important too! Examples of anomaly detection use cases.
  • Algorithm: Anomaly detection through clustering (e.g., DBSCAN)
  • Algorithm: Anomaly detection through density estimation (Local Outlier Factor, LOF)
  • What is dimensionality reduction, and why is it useful?
  • Algorithm: PCA
  • Algorithm: UMAP
  • What is the distinction between clustering (e.g., k-means) and dimensionality reduction (e.g., PCA)?

📣 Assignment Reveal

In this week’s lecture, we will announce your Summative 03, worth 30% of your final grade.

Part 03

Finally, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.

🗓️ Week 10
18 Mar 2024 -
22 Mar 2024

💻 Lab

A tutorial on dimensionality reduction and anomaly detection

🧑‍🏫 Lecture

Applications: Text as Data & Topic Modelling

🗓️ Week 11
25 Mar 2024 -
29 Mar 2024

💻 Lab

A quanteda tutorial

🧑‍🏫 Lecture

Applications: Predictive Modelling on Tabular Data, a walkthrough

⌛ Deadline

Your Summative 03 will be due sometime this week or the week following the end of the term. The exact deadline will be announced in the lecture of Week 09.

🆘 Drop-in sessions

We will host drop-in sessions in Week 11 to support you with your Summative 03.
