LSE DS202A (2024) - Data Science for Social Scientists

2024/25 Autumn Term

📣 NOTE (updated 20 September 2024):

The syllabus has been revised to take into account the needs and difficulties reported by students in the 2023/2024 sessions of the DS202A/DS202W course.

📋 NOTE:

Ideally, you should also be taking the pre-sessional courses offered by the LSE Digital Skills Lab in the first weeks of the Autumn Term.

Check this page every week to see more info on how to study for the course.

Part 01

The first half of the course focuses on the fundamentals of machine learning algorithms, with an emphasis on supervised learning.


🗓️ Week 01
30 Sep 2024 - 04 Oct 2024

💻 Lab

R/RStudio + tidyverse recap

To fully prepare for this lab, we highly recommend you go through the setup steps outlined in section 1 of the 📋 Getting Ready page.

👩🏻‍🏫 Lecture

Introduction, Course Logistics & R programming

📖 Revise

Check that you’re caught up:
  • Ensure you have R installed on your computer
  • Ensure you have an IDE (RStudio or VSCode) installed on your computer.
  • Install tidyverse
  • Revisit base R vs tidyverse syntax equivalence
  • Skim the textbook references mentioned in the slides to find out more about the topics covered in the lecture.

🛟 Support

We love hearing from you! Truly! Don’t hesitate to contact us for help.

In this first week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 02 October 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers.

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 01 October 2024 from 12.15-1.30pm.

  • 📧 E-mail: Not sure if this course is for you? Or do you have a valid reason to request a change of class? For these and other administrative queries, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 02
07 Oct 2024 - 11 Oct 2024

💻 Lab

Practice data manipulation with dplyr and tidyr

👩🏻‍🏫 Lecture

Supervised Learning: Introduction to Regression Algorithms

  • What is Supervised Learning? What is Regression?
  • Algorithm: Linear Regression (simple and multiple)

🛟 Support

We’re steadily adding to your data scientist knowledge toolbox. If things start to feel confusing in any way, don’t hesitate to contact us for help!

In this second week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 09 October 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 08 October 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as a class change, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 03
14 Oct 2024 - 18 Oct 2024

💻 Lab

Linear regression (simple and multiple), a tidymodels tutorial

👩🏻‍🏫 Lecture

Supervised Learning: Fundamentals of Classification

  • What is classification?
  • Classification vs regression
  • Algorithm: Logistic Regression
  • From binary to multi-class classification
  • K-nearest neighbours

📚 Homework (W04 lab prep)

Tutorial on tidymodels recipes and workflows

Spend some time working on this homework before the Week 04 lab. It will help you prepare!

📣 Assignment Reveal

To help you familiarise yourself with the style of the summative assignments, we will announce a formative (practice) assignment this week.

  • This assignment will be about dplyr and tidyr as well as regression. The precise requirements will be announced in the lecture.
  • You will submit your assignment via GitHub Classroom.

🛟 Support

We’re starting to pick up speed here. If things are confusing in any way, don’t hesitate to contact us for help!

In this third week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 16 October 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 15 October 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 04
21 Oct 2024 - 25 Oct 2024

💻 Lab

How to solve a classification problem: logistic regression and k-nearest neighbours

👩🏻‍🏫 Lecture

Supervised Learning: Resampling methods

  • How to evaluate a model?
  • What is overfitting?
  • What is resampling?
  • Method: The Bootstrap
  • Method: Train-Test Split
  • Method: Cross-Validation

⌛ Deadline

Your first formative will be due a day before the lecture!

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming formative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this fourth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 23 October 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Thursday, 24 October 2024 from 3.00-4.15pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 05
28 Oct 2024 - 01 Nov 2024

💻 Lab

Resampling, model evaluation and an introduction to tree-based models

👩🏻‍🏫 Lecture

Supervised Learning: Non-linear algorithms and ensemble methods

  • What is non-linearity?
  • Why can’t linear models capture non-linearity?
  • Algorithm: Support Vector Machines
  • Algorithm: Decision Trees
  • Algorithm: Random Forests

📚 Homework

Problem set: hyperparameter tuning, resampling, model evaluation and comparison

  • This should help you practise (and revise!) the supervised learning concepts you’ve learnt so far. Consider it good preparation for your upcoming summative.
  • Solutions will be provided right after Reading Week.

📣 Assignment Reveal

Your Summative 01, worth 30% of your final grade, will be announced in this week’s lecture.

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this fifth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 30 October 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 29 October 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 06
04 Nov 2024 - 08 Nov 2024

📚 Practice and homework time

Reading Week

There are no classes or lectures this week.

Use this time to:

  • review what you learnt so far
  • do the homework you were given in week 5 (it will help you review and practice what you learnt so far)
  • start on your upcoming W08 summative

🛟 Support

Support arrangements for this week: TBC

Part 02


In the second half of the course, the focus shifts to unsupervised learning.



🗓️ Week 07
11 Nov 2024 - 15 Nov 2024

💻 Lab

Unsupervised Learning: Obtaining Insights via Clustering

👩🏻‍🏫 Lecture

Unsupervised Learning: Introduction and Clustering

  • What is unsupervised learning? How does it differ from supervised learning?
  • What is clustering?
  • Algorithm: k-means
  • Variants of k-means
  • Algorithm: DBSCAN

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your homework or upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this seventh week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 13 November 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 12 November 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

  • 🆘 Drop-in sessions: We will host drop-in sessions in Week 07 to support you with your Summative 01 (due in Week 08).

🗓️ Week 08
18 Nov 2024 - 22 Nov 2024

💻 Lab

Anomaly detection – A tutorial

👩🏻‍🏫 Lecture

Unsupervised Learning: Anomaly detection

  • Unsupervised learning goes beyond clustering: Outliers/anomalies can be important too! Examples of anomaly detection use cases
  • Algorithm: Anomaly detection through clustering (e.g., DBSCAN)
  • Algorithm: Tree-based anomaly detection with isolation forests
  • Algorithm: Anomaly detection through density estimation with the Local Outlier Factor (LOF)

⌛ Deadline

Your Summative 01 will be due the day before the lecture. The topic of the summative will be announced in the lecture of Week 05.

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this eighth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 20 November 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 19 November 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 09
25 Nov 2024 - 29 Nov 2024

💻 Lab

Dimensionality reduction: a tutorial

👩🏻‍🏫 Lecture

Unsupervised Learning: Dimensionality reduction

  • What is dimensionality reduction, and why is it useful?
  • Algorithm: PCA
  • Algorithm: UMAP
  • What is the distinction between clustering (e.g., k-means) and dimensionality reduction (e.g., PCA)?

📣 Assignment Reveal

In this week’s lecture, we will announce your Summative 02, worth 30% of your final grade.

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this ninth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab or lecture) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 27 November 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 26 November 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

Part 03


In the third part of the course, you will be introduced to the basics of text mining, and then we will look at some applications of the algorithms we’ve learned so far.



🗓️ Week 10
02 Dec 2024 - 06 Dec 2024

💻 Lab

A tutorial of quanteda

👩🏻‍🏫 Lecture

Applications: Text as Data & Topic Modelling

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this tenth week, the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignments) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 04 December 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 03 December 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

🗓️ Week 11
09 Dec 2024 - 13 Dec 2024

💻 Lab

Decision-making time: a case study that ties everything you’ve learnt together!

👩🏻‍🏫 Lecture

Applications: Predictive Modelling on Tabular Data, a walkthrough

⌛ Deadline

Your Summative 02 will be due in the week following the end of term. The exact deadline will be announced in the Week 09 lecture.

🛟 Support

Another exciting week in terms of data science knowledge. If you find any new concept you learned unclear, need help with your upcoming summative or want to talk to us about some other pressing issue, don’t hesitate to contact us for help!

In this last week of term (time flies!), the best ways to get help are:

  • Slack: Post any question you might have about the course (lab, lecture or assignment) in the #help channel. Ghita (as well as your class teachers) will be checking for messages every now and then throughout the week.

  • 💬 Office Hours: If you want 1 to 1 in-person support or you want to discuss anything about the course, go to StudentHub and book a 15-minute slot with Ghita on Wednesday, 11 December 2024 from 12.30-2.30 pm.

    Also, check for availability of office hours of some of your class teachers (you’ll find the timings of your class teachers’ office hours on the 📟 Communication page).

    Sara will also be running a support session at the Visualisation studio (COL 1.06) on Tuesday, 10 December 2024 from 12.15-1.30pm.

  • 📧 E-mail: For any administrative queries, such as extension requests, write to our Teaching & Assessment Support Officer at the DSI.

  • 🆘 Drop-in sessions: We will host drop-in sessions in Week 11 to support you with your Summative 02 (due in the week after the end of term).

Part 04


Finally, it’s time to put everything you’ve learnt into practice in the final group project, which spans two weeks in the Winter Term.



🗓️ Group project
29 Jan 2025 - 12 Feb 2025

🚀 Launch (29 Jan 2025)

  • “Speed-dating” with datasets: you’re presented with datasets (with associated research questions) and asked to rank them in order of preference by 4 pm
  • Groups (no more than 4 students per group) will be assigned by the end of the day. So stay tuned!

🔎 First check-in session (31 Jan 2025)

  • Quick drop-in session to check your research plans are realistic and you’re not going off-track!

🔎 Second check-in session (5 Feb 2025)

  • Drop-in session to discuss the analysis you’ve conducted so far and check you’re still on track

⌛ Deadline (12 Feb 2025)

Your final project report will be due on 12 Feb 2025 at 5 pm.