LSE DS202 – Data Science for Social Scientists
19 Jan 2024
networks
optimisation
software engineering
machine learning applications
the impact of generative AI for education
decision support systems
machine learning applications
databases
provenance
ethical AI/XAI
Write an e-mail to Kevin:
Sign up for DSI events at lse.ac.uk/DSI/Events
Follow the seminar series: 🔗 Link
Hear from alumni or industry experts about their career paths and how they got to where they are today.
Upcoming event:
🗓️ Keeping London Moving with Data (28 February 4 - 5.30pm)
A talk about life in the data world at TfL. Jemima, Graduate Data Scientist at Transport for London (TfL), will talk about her experience as a Data Science Graduate in our inaugural programme. Lauren Sager Weinstein, Chief Data Officer at Transport for London (TfL), will talk about how she’s leading TfL’s data strategy, and how all the components of data careers (data scientists, data developers, data product managers, and data users) can come together to deliver on our data vision: to empower our people to make better decisions with data.
Read more about this series of events: 🔗 Link
Programme | Count |
---|---|
BSc in Politics and Data Science | 19 |
BSc in Economics | 13 |
General Course | 5 |
BSc in Sociology | 3 |
BA in History | 1 |
BA in Social Anthropology | 1 |
BSc in Actuarial Science | 1 |
BSc in International Social and Public Policy with Politics | 1 |
BSc in Philosophy | 1 |
BSc in Philosophy and Economics | 1 |
Year | Count |
---|---|
1 | 16 |
2 | 23 |
3 | 7 |
What is this course about?
Focus: learn and understand the most fundamental machine learning algorithms
How: practical use of machine learning techniques and their metrics, applied to relevant data sets
What is this course about?
How will this course be taught?
How do I prepare for this course?
Important
There might be some preparatory work to do before each lab!
Always check Moodle/the webpage at least a day before coming to the lab.
Each week, you will have a roadmap of what to do.
The roadmap will typically contain the following elements:
Type of activity | Description |
---|---|
🧑🏻🏫 TEACHING MOMENT | Your class teacher deserves your full attention |
🎯 ACTION POINTS | Time to follow the steps in the roadmap. Try it for a bit, but if you get stuck, call your class teacher. |
👥 IN PAIRS/GROUPS | You will benefit from completing that task with your peers more than doing it alone |
🗣️ CLASSROOM DISCUSSION | Your class teacher will facilitate a discussion about the task |
📝 SUBMISSION | Submit your work |
👉 Now, let’s navigate our Moodle page to see the 📓 Syllabus and to talk about ✍️ Assessments & Feedback.
If you are reading this but you are not an LSE student, the same content is available on the course’s 🌐 public website
More on that later…
We assume that you have some basic knowledge of:
- If you took ST102, you should be fine.
- Nothing crazy, mostly matrix operations (simpler than MA107)
- It’s ok if you are new to R, but do reserve some extra hours in the first weeks to practice the basics.
Image created with DALL·E via Bing Chat AI bot. Prompt: “An illustration of a person trying to solve a puzzle with pieces that have different symbols and formulas on them. The person is looking at a screen that shows the 📋 Getting Ready guide and has a smile on their face.”
[…] `tidyverse` documentation instead of explaining it directly.
Do you use ChatGPT, GitHub Copilot, or other AI tools?
“LSE takes challenges to academic integrity and to the value of its degrees with the utmost seriousness. The School has detailed regulations and processes for ensuring academic integrity in summative work.
Unless Departments provide otherwise in guidance on the authorised use of generative AI, its use in summative and formative assessment is prohibited. Departmental Teaching Committees are strongly encouraged to define what constitutes authorised use of Generative AI tools (if any) for students taking courses in their Department. Where they do so, they must clearly communicate this to colleagues, and to students.”
Source: LSE (2023) (Emphasis added)
Examples:
“I used ChatGPT to provide an initial solution to Question X. The code ran and worked fine, but as it was not efficient to the standards of vectorisation taught in the course, I had to edit the code myself to fix the issue.”
“I had GitHub Copilot autocomplete on when writing the code for Question X. The code produced was unnecessarily long and didn’t use the `pd.merge` command I learned in Week 08, so I went back and edited it.”
What do you think of generative AI tools?
Participating Courses:
How will it work:
Create a ChatGPT 3.5 (OR a Google Bard) account if you don’t have one already.
Open a new ‘chat window’ inside your selected chatbot and tell the AI:
‘I will use this chat for all things related to DS202W - Data Science for Social Scientists’
Sign up to GENIAL and help us find out!
Image created with DALL·E via Bing Chat AI bot. Prompt: “robots enjoying a coffee break. Circular tables, white room, pops of color, modern, cosy, clean flat design.”
Our first proper lecture will start in a few minutes.
“What really is data science? + R tips”
Don’t forget to fill out the GENIAL form:
“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡 insights into a problem or a phenomenon.
Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),
and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”
New data to answer old questions:
How do rumours spread?
New questions enabled by new data:
Is social media a threat to democracy?
We hope that in this reformulated version of the DS202 course, you will learn how to tackle similar questions that are relevant to your field of study.
You might ask:
“How is data science any different from what I have learned in other stats courses?”
👉 Traditional Statistics in the social sciences: the goal is typically explanation
👉 Data science: the focus is frequently put more on data exploration and prediction
It is often said that 80% of the time and effort spent on a data science project goes to the abovementioned tasks.
This course is mostly about the ‘20%’ stage. Most of the data we will give you is already clean and ready to be modeled with machine learning.
Next week, we will discuss together what it means for a machine to learn something.
But first, a word about programming skills 👉
Data types
R Vectors
[…] `length()` on any variable:
R Vectors (cont.)
This is straightforward:
Lists (not the same as vectors)
[…] `[[ ]]` […] (`[[1]]`) is a vector of size 1 (`[1]`) that contains the number `2`, etc.
Lists are more flexible than vectors (they are also slower to process)
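The difference between vectors and lists can be sketched in a few lines of R. The variable names `v` and `l` are illustrative and not taken from the original slides.

```r
# A vector holds elements of a single type
v <- c(2, 4, 6)
length(v)   # number of elements in the vector

# A list can mix types, and each element can itself be a vector
l <- list(2, "a", c(TRUE, FALSE))
length(l)   # number of top-level elements in the list

l[[1]]      # double brackets extract the element itself (a vector of size 1)
l[1]        # single brackets return a list that contains that element

# Even a single number is a vector of size 1
length(2)
```

Printing `l` in the console shows each element under a `[[i]]` label, with the `[1]` of the vector stored inside it.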
Vectors are always flat:
yields a simple vector:
[1] 1 2 3 4 5
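The expression that produced the output above is not shown here. One expression that would give the same result, assuming nested calls to `c()`, is sketched below, next to the equivalent nesting with `list()` for contrast.

```r
# Nesting c() calls does not create nested vectors: the result is flat
nested_vector <- c(1, c(2, 3), c(4, 5))
print(nested_vector)

# The same nesting with list() is preserved
nested_list <- list(1, list(2, 3), list(4, 5))

length(nested_vector)  # counts every number
length(nested_list)    # counts only the top-level elements
```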
Obs: Python does not have vectors, only lists
Loops are not that different
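As a minimal sketch of what a loop looks like on the R side (the vector `numbers` is made up for illustration):

```r
# Loop over the positions of a vector and store each squared value
numbers <- c(1, 2, 3)
squares <- numeric(length(numbers))   # pre-allocate the result vector

for (i in seq_along(numbers)) {
  squares[i] <- numbers[i]^2
  print(squares[i])
}

# The vectorised equivalent, usually preferred in R
numbers^2
```

The main visible differences from Python are the parentheses around the loop condition and the curly braces around the body.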
Custom functions definition, compared
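A minimal sketch of a custom function in R follows. The function name `rescale` and its arguments are hypothetical, chosen only to illustrate the syntax.

```r
# Define a function with one required and one optional argument
rescale <- function(x, factor = 100) {
  # Divide by the maximum, then multiply by the scaling factor.
  # The value of the last expression is returned automatically.
  x / max(x) * factor
}

rescale(c(1, 2, 4))              # uses the default factor
rescale(c(1, 2, 4), factor = 1)  # overrides the default
```

In R a function is created with `function(...)` and assigned to a name with `<-`, where Python would use `def`.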
R has a base set of functions that come with the installation of the language
The base functions are OK - they are just not awesome.
The `tidyverse` is not part of the base R installation, but it is a very popular package
It is actually a collection of several packages that make it easier to manipulate data (+ databases + plotting + modelling + etc.)
This is what we will use in this course. (We suffered tremendously teaching base R last year)
Note to Python users
Think of the `tidyverse` as being to R what `pandas` is to Python
Example: reading a csv file
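A minimal sketch, assuming the `readr` package (part of the tidyverse) is installed. The file contents and the names `csv_path` and `people` are made up so that the example runs on its own.

```r
library(readr)  # library(tidyverse) would load it as well

# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ana,31", "Ben,27"), csv_path)

# read_csv() returns a tibble, the tidyverse flavour of a data frame
people <- read_csv(csv_path, show_col_types = FALSE)
print(people)
```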
Example: selecting columns
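A minimal sketch, assuming `dplyr` (part of the tidyverse) is installed. The small data frame is illustrative only.

```r
library(dplyr)

my_data <- data.frame(
  col1 = 1:3,
  col2 = c("a", "b", "c"),
  col3 = c(TRUE, FALSE, TRUE)
)

# Keep columns by name
selected <- select(my_data, col1, col3)

# Or drop a column with a minus sign
dropped <- select(my_data, -col2)
```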
The pipe operator
[…] `%>%`. But it is quite simple: […] `%>%`, think of it as the word “then”.
The pipe operator
[…] (`|>`) […] `%>%` and `|>`:
Without method chaining
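To make the contrast concrete, here is a minimal sketch of the same computation written without chaining, with `%>%`, and with the native `|>` pipe (available from R 4.1). The data frame is made up for illustration.

```r
library(dplyr)

my_data <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(10, 20, 30, 40))

# Without chaining: calls are nested and read from the inside out
nested <- summarise(filter(my_data, col1 > 2), total = sum(col2))

# With the magrittr pipe: "take my_data, then filter, then summarise"
piped <- my_data %>%
  filter(col1 > 2) %>%
  summarise(total = sum(col2))

# With the native pipe
native <- my_data |>
  filter(col1 > 2) |>
  summarise(total = sum(col2))
```

All three versions compute the same total; the piped ones list the steps in the order they happen.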
Example: filtering rows
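A minimal sketch with `dplyr::filter()`, using a small made-up data frame:

```r
library(dplyr)

my_data <- data.frame(
  col1 = c(1, 2, 3, 3),
  col2 = c(0.5, -1.2, 0.3, 2.1)
)

# Keep rows where col1 equals 3 AND col2 is positive
# (conditions separated by a comma are combined with AND)
kept <- my_data %>%
  filter(col1 == 3, col2 > 0)
```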
Example: combining columns together
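The original code for this example is not shown. One common reading, assumed here, is building a new column from two existing ones with `dplyr::mutate()`. The data frame and column names are illustrative.

```r
library(dplyr)

people <- data.frame(
  first = c("Ada", "Alan"),
  last  = c("Lovelace", "Turing")
)

# Create a new column by pasting two existing columns together
combined <- people %>%
  mutate(full_name = paste(first, last))
```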
Example: grouping and summarizing
Say we have a random dataset:
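```r
# Generate a random data frame: col1 takes integer values from 1 to 3,
# col2 is drawn from a standard normal distribution
my_data <- data.frame(col1 = sample(1:3, 100, replace = TRUE), col2 = rnorm(100))
```
If we want to calculate the mean of `col2` for each value of `col1`: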
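One way to do this, sketched with `dplyr`'s `group_by()` and `summarise()`. The call to `set.seed()` and the output column name `mean_col2` are additions made here so the example is reproducible and readable.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, not in the original
my_data <- data.frame(col1 = sample(1:3, 100, replace = TRUE), col2 = rnorm(100))

# Group the rows by col1, then compute one mean of col2 per group
means <- my_data %>%
  group_by(col1) %>%
  summarise(mean_col2 = mean(col2))

print(means)
```

The result has one row per distinct value of `col1`.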
[…] `tidyverse` instead of pandas.
LSE DS202W (2023/24) – Week 01 | Archive