LSE DS202 – Data Science for Social Scientists
19 Jan 2026
decision support systems
machine learning applications
databases
provenance
ethical AI/XAI
Write an e-mail to Kevin:

Sign up for DSI events at
lse.ac.uk/DSI/Events



Follow the seminar series: 🔗 https://www.lse.ac.uk/DSI/seminar-series
Hear from alumni or industry experts about their career paths and how they got to where they are today.
Past events:
🗓️ Navigating Data Science from Academia to Media, and Beyond (23 October 2024 - 4.30 to 6pm)
With the rise in adoption of AI/ML technologies and the increasing demand for data-driven decision-making, data science has become a vital component across many industries, including media. As data science transforms the media landscape (enhancing content personalisation, optimising conversion strategies, and improving audience engagement), it is also becoming an increasingly popular tool for addressing complex business challenges. Navigating a role in this field can be both exciting and challenging.
Tabtim Duenger, Senior Data Scientist at The Economist, and Riya Chhikara, Data Scientist at The Economist, both LSE graduates, will offer insights into their paths to entering the field of data science. They will discuss their experiences in landing their first roles, negotiating their functions and responsibilities within the media sector, and how they use these experiences and networks to continue guiding their careers.
Read more about this series of events: 🔗 Link
Upcoming events:
🗓️ Data science in transport: every career journey matters (11 February 2026 - 4.00 to 5.30pm)
The journey from academia to industry can be a daunting one. What does data science in industry look like day-to-day? How do you apply theory to business problems with tight deadlines and messy data?
This session brings together data science experts from Transport for London (TfL) to map out their journeys from academia to one of the world’s leading and data-rich transport authorities. They will explore what actually matters in industry data science, how to position your experience, and why working on transport problems offers a mix of technical challenges and social impact.



New field trips to be announced soon!
| Programme | Frequency |
|---|---|
| General Course | 14 |
| BSc in Economics | 10 |
| BSc in Psychological and Behavioural Science | 4 |
| BSc in Politics and Data Science | 3 |
| BSc in Social Anthropology | 2 |
| BSc in Philosophy and Economics | 1 |
| BSc in Philosophy, Politics and Economics | 1 |
| BSc in Sociology | 1 |

| Year | Count |
|---|---|
| 1 | 17 |
| 2 | 10 |
| 3 | 11 |
| 4 | 2 |

This cohort brings:
A first data-quality issue
Official statistics feel authoritative — but embed assumptions.
Question:
Is inflation directly comparable across countries? Across decades?
A GDP value in 1995 is not strictly the same object as GDP in 2025.
GDP masks distributional differences.
Note
Data rarely lies —
but it simplifies, omits, and encodes choices.
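To make the comparability point concrete, here is a minimal sketch of converting nominal values into constant prices with a price index. All figures are invented for illustration, and the function name `to_real` is hypothetical; real series would come from a statistical agency and carry their own base-year and methodology choices.

```python
def to_real(nominal: float, price_index: float, base_index: float = 100.0) -> float:
    """Express a nominal value in base-period prices."""
    return nominal * base_index / price_index


# Invented numbers, for illustration only (not real GDP data).
nominal_gdp = {1995: 800.0, 2025: 2400.0}
price_index = {1995: 50.0, 2025: 100.0}  # base period: 2025 = 100

real_gdp = {year: to_real(nominal_gdp[year], price_index[year]) for year in nominal_gdp}

nominal_growth = nominal_gdp[2025] / nominal_gdp[1995]
real_growth = real_gdp[2025] / real_gdp[1995]

print(f"Nominal growth factor: {nominal_growth:.2f}")
print(f"Real growth factor:    {real_growth:.2f}")
```

In this toy case the nominal series triples while the deflated series grows by half. Even after deflating, the choice of price index and base year is itself one of the encoded choices mentioned in the note.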
What is this course about?
Two critical principles
1. Learn to learn
2. No single “right answer”
👩🏻‍🏫 Lectures (first in the week)
🧑🏻‍💻 Labs (later in the week)
✍️ Assignments
Important
The goal is to eventually perform analysis
without step-by-step help.
Important
You must attend the lab you are enrolled in.
You cannot switch labs on the day.
Each week, you will receive a roadmap.
| Type | Description |
|---|---|
| 🧑🏻‍🏫 Teaching moment | Full attention required |
| 🎯 Action points | Follow steps, try first, ask if stuck |
| 👥 In pairs/groups | Learn from peers |
| 🗣️ Discussion | Interpret results |
| 📝 Submission | Submit work |
Primary sources of truth
Moodle:
Note
If something seems missing on Moodle:
→ check the website
→ check Slack
→ then ask
Datasets in this course are provided.
However:
Note
In practice, you rarely choose the data —
but you constantly ask what else it could be used for.
Programming language
IDE option
We assume basic knowledge of:
Why environment management matters
Our approach in this course
Weeks 1–4: Python 3.13
Week 5 onwards: Python 3.12
There are three official positions at LSE:
Position 1: No authorised use of generative AI in assessment (grammar and spell-checking may be exempt)
Position 2: Limited authorised use of generative AI
Position 3: Full authorised use of generative AI 👉 This is the position adopted in this course
Source: LSE School position on generative AI, September 2024
Our policy: Responsible use (not optional)
✅ You MAY use
⚠️ You MUST
❌ You MAY NOT
Example acknowledgement
“I used ChatGPT to debug a pandas merge. It suggested pd.merge() with on='date', which produced duplicates. After checking the documentation, I changed the join type to how='left'.”
Why this matters
👉 Full policy available on course website/Moodle — read it carefully
Empirical, experience-focused learning
This is not a “spoon-feeding” course

Image created with DALL·E via Bing Chat.
Prompt: “Person climbing a mountain of books, using a compass and magnifying glass.”
We’ll do a quick live poll using Mentimeter.
👉 Go to menti.com and enter the code shown on screen
(or scan the QR code).
How we’ll use the results
70% comfortable → short refresher
We will adapt the pace accordingly.
“…a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡 insights into a problem or a phenomenon.”
Such data may be generated by humans (surveys, logs, administrative data) or machines (sensors, transaction systems, digital traces),
and may exist in many formats (text, audio, images, video, etc.).
New data to answer old questions:
New questions enabled by new data/new technologies:
This course equips you to engage with such questions
using data, models, and justification.
👉 Traditional statistics (social sciences)
Focus: explanation
👉 Data science
Focus: exploration and prediction
It is often said that 80% of the time and effort spent on a data science project goes to the above-mentioned tasks.
This course is mostly about the ‘20%’ stage. Most of the data we will give you is already clean and ready to be modelled with machine learning.
Next week, we will discuss together what it means for a machine to learn something.
It is often said that 80% of a data science project involves:
In this course
We focus mainly on the “20%”:
But you will experience the “80%” at times as well
Most datasets you receive are pre-cleaned, but we will try to expose you to realistic messiness as much as possible (including in assignments!).
Note
Your first formative includes a gentle introduction to the 80%. No panic — support is built in.
Question
Will the Bank of England hold, increase, or decrease interest rates at the next rate-setting meeting?
Why this matters
This was a real assignment in a previous year (last year’s W08 summative).
By the end of this course, you will be able to tackle it yourself.
This stage requires domain knowledge.
Plausible indicators
At this stage:
Later, during modelling, we:
Different indicators come from different institutions:
Immediate challenges
Warning
Same concept ≠ same data structure
The tricky bit:
For each BoE decision date, calculate the 3-month average of each indicator
Example:
Why this matters:
Tip
This is the 80%: Getting data into the right shape for analysis
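The reshaping step described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the course's actual pipeline: the window is taken to be the three full calendar months before the month of the decision, the helper names `months_before` and `trailing_average` are hypothetical, and the indicator values and decision dates are invented.

```python
from datetime import date
from statistics import mean


def months_before(d: date, n: int) -> list[tuple[int, int]]:
    """Return the (year, month) pairs for the n full months preceding d's month."""
    pairs = []
    year, month = d.year, d.month
    for _ in range(n):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        pairs.append((year, month))
    return pairs


def trailing_average(series: dict[tuple[int, int], float], decision: date, n: int = 3) -> float:
    """Average the indicator over the n months before the decision month."""
    window = months_before(decision, n)
    return mean(series[ym] for ym in window)


# Invented monthly indicator values keyed by (year, month); not real data.
indicator = {
    (2023, 11): 4.0, (2023, 12): 4.3, (2024, 1): 4.0,
    (2024, 2): 3.4, (2024, 3): 3.2, (2024, 4): 2.3,
}

# Hypothetical decision dates, chosen only to exercise the function.
decisions = [date(2024, 2, 1), date(2024, 5, 9)]

features = {d.isoformat(): round(trailing_average(indicator, d), 2) for d in decisions}
print(features)
```

A missing month raises a `KeyError` here rather than being silently skipped, which is a deliberate choice: gaps in an indicator are a data-quality decision that should be made explicitly.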
Why aggregate to quarters?
Monetary policy decisions reflect recent economic performance
Quarterly aggregation:
Important
Tip
Data science involves making hypotheses — not just running algorithms.
A major challenge: distribution shift
Examples:
For this, let’s have a look at the course syllabus
Before: messy reality

Note
This is where most real work happens — and where most insight is created.

Image created with DALL·E.
Prompt: “Cat drinking tea in a classroom, Renoir style.”
Coming up next
Take ~10 minutes, then we continue.
A few indicators
General-purpose language
Used across:
Designed for statistics
Excellent for:
Less common in production ML
Data types
Python lists
Python lists (cont.)
Tuples
Tuples are immutable.
To “update” them, you must create a new one.
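A short sketch of what immutability means in practice; the variable names and values are illustrative only.

```python
point = (51.5, -0.1)

try:
    point[0] = 52.0  # tuples do not support item assignment
except TypeError as err:
    print(f"TypeError: {err}")

# "Updating" means building a new tuple from the old one.
updated = (52.0,) + point[1:]

print(point)    # the original tuple is unchanged
print(updated)  # the new tuple carries the changed first element
```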
Dictionaries
Repeating operations
Custom functions
Functions encapsulate decisions, not just code.
Custom functions definition
Let’s define functions based on the loops and list comprehensions. We’ll do some code profiling!
```python
import cProfile

def for_loop_example():
    result = []
    for i in range(100000):
        result.append(i * 2)

def while_loop_example():
    result = []
    i = 0
    while i < 100000:
        result.append(i * 2)
        i += 1

def list_comprehension_example():
    result = [i * 2 for i in range(100000)]

# Profile each function
print("Profiling for loop:")
cProfile.run("for_loop_example()")
print("\nProfiling while loop:")
cProfile.run("while_loop_example()")
print("\nProfiling list comprehension:")
cProfile.run("list_comprehension_example()")
```

Results from the loops and list comprehension profiling
```text
Profiling for loop:
         100004 function calls in 0.022 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013    0.021    0.021 <python-input-66>:3(for_loop_example)
        1    0.001    0.001    0.022    0.022 <string>:1(<module>)
        1    0.000    0.000    0.022    0.022 {built-in method builtins.exec}
   100000    0.008    0.000    0.008    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Profiling while loop:
         100004 function calls in 0.021 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.014    0.014    0.020    0.020 <python-input-66>:8(while_loop_example)
        1    0.001    0.001    0.021    0.021 <string>:1(<module>)
        1    0.000    0.000    0.021    0.021 {built-in method builtins.exec}
   100000    0.006    0.000    0.006    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Profiling list comprehension:
         4 function calls in 0.003 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    0.002    0.002 <python-input-66>:15(list_comprehension_example)
        1    0.001    0.001    0.003    0.003 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
Why?
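One reading of the profiles above: the two loops each make 100,000 separate `append` method calls, while the comprehension builds the list without any profiled calls at all. Because cProfile adds overhead to every call it records, it tends to exaggerate that gap. A minimal sketch with `timeit`, which measures elapsed time without per-call instrumentation, gives a fairer comparison. The function names below are illustrative, and absolute timings will vary by machine.

```python
import timeit

N = 100_000


def with_for_loop():
    result = []
    for i in range(N):
        result.append(i * 2)
    return result


def with_comprehension():
    return [i * 2 for i in range(N)]


# Both approaches must build the same list before timing is meaningful.
assert with_for_loop() == with_comprehension()

# Repeat each measurement and keep the best run to reduce noise.
loop_time = min(timeit.repeat(with_for_loop, number=10, repeat=3))
comp_time = min(timeit.repeat(with_comprehension, number=10, repeat=3))

print(f"for loop:           {loop_time:.4f} s for 10 runs")
print(f"list comprehension: {comp_time:.4f} s for 10 runs")
```

Taking the minimum over several repeats is a common convention, since slower runs usually reflect interference from other processes rather than the code itself.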
Your file:
python_basics.ipynb
From the terminal:
This creates:
python_basics.html
Note
The same Quarto engine renders:
One tool, many outputs.
Next labs
pandas basics for data manipulation (W02)
Before then
Office hours
LSE DS202W (2025/26) – Week 01