LSE DS202 – Data Science for Social Scientists
29 Sep 2023
networks
optimisation
software engineering
data science workflow
machine learning applications
decision support systems
machine learning applications
databases
provenance
ethical AI/XAI
Write an e-mail to Kevin:
Sign up for DSI events at lse.ac.uk/DSI/Events
Follow the seminar series: 🔗 Link
Hear from alumni or industry experts about their career paths and how they got to where they are today.
Example of past event:
🗓️ Data Science Careers Panel and Networking (31 January)
A panel of alumni followed by Q&A and a networking session.
Panel:
Read more about this series of events: 🔗 Link
Programme | Count |
---|---|
BSc in Psychological and Behavioural Science | 39 |
BSc in Economics | 4 |
General Course | 4 |
BSc in Philosophy and Economics | 2 |
BSc in Politics and Data Science | 2 |
BSc in Politics and International Relations | 2 |
BA in Geography | 1 |
BA in Social Anthropology | 1 |
BSc in International Social and Public Policy | 1 |
BSc in International Social and Public Policy with Politics | 1 |
BSc in Sociology | 1 |
Erasmus Reciprocal Programme of Study | 1 |
MPhil/PhD in Psychological and Behavioural Science | 1 |
Year of study | Count |
---|---|
1 | 8 |
2 | 4 |
3 | 48 |
What is this course about?
Focus: learn and understand the most fundamental machine learning algorithms
How: practical use of machine learning techniques and their metrics, applied to relevant datasets
- No neural networks, no deep learning, no large-scale data
- Some but not a lot of theory, math proofs and derivations
- Lots of coding, examples and exercises
How will this course be taught?
How do I prepare for this course?
Important
There might be some preparatory work to do before each lab!
Always check Moodle/the webpage at least a day before coming to the lab.
Each week, you will have a roadmap of what to do.
The roadmap will typically contain the following elements:
Type of activity | Description |
---|---|
🧑🏻🏫 TEACHING MOMENT | Your class teacher deserves your full attention |
🎯 ACTION POINTS | Time to follow the steps in the roadmap. Try it for a bit, but if you get stuck, call your class teacher. |
👥 IN PAIRS/GROUPS | You will benefit from completing that task with your peers more than doing it alone |
🗣️ CLASSROOM DISCUSSION | Your class teacher will facilitate a discussion about the task |
📝 SUBMISSION | Submit your work |
👉 Now, let’s navigate our Moodle page to see the 📓 Syllabus and to talk about ✍️ Assessments & Feedback.
If you are reading this but you are not an LSE student, the same content is available on the course’s 🌐 public website
More on that later…
We assume that you have some basic knowledge of:
Image created with DALL·E via Bing Chat AI bot. Prompt: “An illustration of a person trying to solve a puzzle with pieces that have different symbols and formulas on them. The person is looking at a screen that shows the 📋 Getting Ready guide and has a smile on their face.”
`tidyverse` documentation instead of explaining it directly.

Do you use ChatGPT, GitHub Copilot, or other AI tools?
“LSE takes challenges to academic integrity and to the value of its degrees with the utmost seriousness. The School has detailed regulations and processes for ensuring academic integrity in summative work.
Unless Departments provide otherwise in guidance on the authorised use of generative AI, its use in summative and formative assessment is prohibited. Departmental Teaching Committees are strongly encouraged to define what constitutes authorised use of Generative AI tools (if any) for students taking courses in their Department. Where they do so, they must clearly communicate this to colleagues, and to students.”
Source: LSE (2023) (Emphasis added)
Examples:
“I used ChatGPT to provide an initial solution to Question X. The code ran and worked fine, but as it was not efficient to the standards of vectorisation taught in the course, I had to edit the code myself to fix the issue.”
“I had GitHub Copilot autocomplete on when writing the code for Question X. The code produced was unnecessarily long and didn’t use the `pd.merge` command I learned in Week 08, so I went back and edited it.”
What do you think of generative AI tools?
Participating Courses:
Data for Data Science
Data Science for Social Scientists
Databases
Sign up to GENIAL and help us find out!
How will it work:
W01 lab
Normal lab
W02 lab
Normal lab
W03 lab
W04 lab
Participants will be split into two groups:
W05 lab
Normal lab
W07 lab
Participants will be split into two groups at random:
W08 lab
Normal lab
W09 lab
Participants will be split into two groups at random:
W10 lab
Normal lab
W11 lab
Normal lab
Image created with DALL·E via Bing Chat AI bot. Prompt: “robots enjoying a coffee break. Circular tables, white room, pops of color, modern, cosy, clean flat design.”
Our first proper lecture will start in a few minutes.
“What really is data science? + R tips”
In the meantime, consider signing up for the GENIAL project:
“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡 insights into a problem or a phenomenon.
Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),
and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”
New data to answer old questions:
How do rumours spread?
New questions enabled by new data:
Is social media a threat to democracy?
We hope that in this reformulated version of the DS202 course, you will learn how to tackle similar questions that are relevant to your field of study.
You might ask:
“How is data science any different from what I have learned in other stats courses?”
👉 Traditional statistics in the social sciences: the goal is typically explanation
👉 Data science: the focus is frequently put more on data exploration and prediction
It is often said that 80% of the time and effort spent on a data science project goes to the abovementioned tasks.
This course is mostly about the ‘20%’ stage. Most of the data we will give you is already clean and ready to be modeled with machine learning.
Next week, we will discuss together what it means for a machine to learn something.
But first, a word about programming skills 👉
Data types
R Vectors

`length()` on any variable:

R Vectors (cont.)
This is straightforward:
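A minimal sketch, with made-up values, of building vectors with `c()` and calling `length()` on them. Note that a single number is already a vector of length 1, and that arithmetic applies element-wise.

```r
# A numeric vector of (made-up) ages, built with c()
ages <- c(21, 19, 23, 20)

# length() works on any variable and returns the number of elements
length(ages)   # 4

# Even a single number is a vector of length 1 in R
x <- 2
length(x)      # 1

# Arithmetic is vectorised: it is applied to every element, no loop needed
ages + 1       # 22 20 24 21
```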
Lists (not the same as vectors)

`[[ ]]`

The first element of the list (`[[1]]`) is a vector of size 1 (`[1]`) that contains the number 2, etc.

Lists are more flexible than vectors (they are also slower to process).
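As an illustrative sketch (the list contents are made up), here is a list whose first element is the number 2, showing the difference between `[[ ]]`, which extracts an element, and `[ ]`, which returns a smaller list:

```r
# A list can mix types and can hold whole vectors as elements
my_list <- list(2, "a", c(TRUE, FALSE))

# [[ ]] extracts a single element
my_list[[1]]          # the number 2, a numeric vector of length 1
my_list[[3]]          # the logical vector TRUE FALSE

# [ ] returns a sub-list instead (still a list)
class(my_list[1])     # "list"
class(my_list[[1]])   # "numeric"

length(my_list)       # 3
```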
Vectors are always flat:
yields a simple vector:
[1] 1 2 3 4 5
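The code that produced the output above is not shown here; one plausible sketch that gives the same result nests `c()` calls, and contrasts it with a list, which keeps its nesting:

```r
# Nesting c() calls does not create nested vectors: the result is flattened
flat <- c(1, c(2, 3), c(4, 5))
flat
#> [1] 1 2 3 4 5

# A list, by contrast, keeps its nested structure
nested <- list(1, list(2, 3))
length(flat)     # 5
length(nested)   # 2 (the second element is itself a list)
```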
Obs: base Python has no built-in vector type; its closest built-in equivalent is the list (vectorised arrays come from packages such as NumPy).
Loops are not that different
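A minimal sketch of a `for` loop in R, with the Python equivalent in comments for comparison; the values are made up:

```r
# A for loop in R: iterate over each element of a vector
total <- 0
for (value in c(10, 20, 30)) {
  total <- total + value
}
total   # 60

# Python equivalent, for comparison:
# total = 0
# for value in [10, 20, 30]:
#     total += value

# In R, the vectorised alternative is usually preferred
sum(c(10, 20, 30))   # 60
```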
Defining custom functions, compared
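A sketch of defining a function in R next to its Python counterpart; `add_one` is a hypothetical name used only for illustration:

```r
# In R, a function is created with function(...) { ... } and assigned to a name
add_one <- function(x) {
  x + 1   # the value of the last evaluated expression is returned
}

add_one(5)         # 6
add_one(c(1, 2))   # 2 3 (works element-wise on vectors)

# Python equivalent, for comparison:
# def add_one(x):
#     return x + 1
```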
R has a base set of functions that come with the installation of the language
The base functions are OK - they are just not awesome.
The `tidyverse` is not part of the base R installation, but it is a very popular package.
It is actually a collection of several packages that make it easier to manipulate data (+ databases + plotting + modelling + etc.).
This is what we will use in this course. (We suffered tremendously teaching base R last year)
Note to Python users

Think of the `tidyverse` as R's counterpart to what `pandas` is to Python.
Example: reading a csv file
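A self-contained sketch using `read_csv()` from the `readr` package (part of the tidyverse), roughly analogous to pandas' `read_csv`. To keep it runnable, it first writes a tiny made-up CSV to a temporary file; in practice you would pass the path to your own file.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ana,7", "Ben,9"), csv_path)

# read_csv() returns a tibble; show_col_types = FALSE hides the column-type message
scores <- read_csv(csv_path, show_col_types = FALSE)
scores

# Base R equivalent, for comparison (returns a plain data.frame)
scores_base <- read.csv(csv_path)
```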
Example: selecting columns
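A minimal sketch of `dplyr::select()` on a small made-up data frame, written as plain function calls (no pipe yet), with the base R equivalent for comparison:

```r
library(dplyr)

# A small illustrative data frame (made-up values)
students <- data.frame(name = c("Ana", "Ben"), age = c(21, 19), score = c(7, 9))

# Keep only the name and score columns
name_score <- select(students, name, score)

# Drop a column by prefixing it with a minus sign
no_age <- select(students, -age)

# Base R equivalent, for comparison
name_score_base <- students[, c("name", "score")]
```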
The pipe operator `%>%`

… But it is quite simple: when you read `%>%`, think of it as the word “then”.

The pipe operator `|>`

… `%>%` and `|>`:

Without method chaining
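A sketch comparing the same computation written three ways, on made-up data: with nested calls (no method chaining), with the `%>%` pipe, and with the native `|>` pipe (available in base R from version 4.1). All three give the same result.

```r
library(dplyr)

students <- data.frame(age = c(21, 19, 23, 20), score = c(7, 9, 6, 8))

# Without method chaining: nested calls must be read inside-out
res_nested <- summarise(filter(students, age > 19), mean_score = mean(score))

# With %>%: take students, then filter, then summarise
res_magrittr <- students %>%
  filter(age > 19) %>%
  summarise(mean_score = mean(score))

# With the native pipe |> (base R >= 4.1)
res_native <- students |>
  filter(age > 19) |>
  summarise(mean_score = mean(score))

res_native   # mean_score is 7 (scores 7, 6 and 8)
```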
Example: filtering rows
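A minimal sketch of `dplyr::filter()` on made-up data; conditions separated by commas must all hold (logical AND):

```r
library(dplyr)

students <- data.frame(name = c("Ana", "Ben", "Cai"),
                       year = c(3, 1, 3),
                       score = c(7, 9, 6))

# Keep only the rows that satisfy the condition
third_years <- students %>% filter(year == 3)

# Several conditions separated by commas are combined with AND
third_years_high <- students %>% filter(year == 3, score > 6)
third_years_high   # only Ana
```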
Example: combining columns together
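The exact operation shown on the original slide isn't recoverable, so this is one common way to sketch it: `mutate()` creating new columns from existing ones, with `paste()` for text and `+` for numbers (`tidyr::unite()` is another option for text columns). The data are made up.

```r
library(dplyr)

people <- data.frame(first = c("Ana", "Ben"), last = c("Silva", "Jones"),
                     math = c(7, 9), stats = c(8, 6))

combined <- people %>%
  mutate(
    full_name = paste(first, last),   # join two text columns with a space
    total     = math + stats          # add two numeric columns element-wise
  )
combined
```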
Example: grouping and summarizing
Say we have a random dataset:
```r
# Generate a random data frame, my_data
my_data <- data.frame(col1 = sample(1:3, 100, replace = TRUE), col2 = rnorm(100))
```
If we want to calculate the mean of `col2` for each value of `col1`:
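A sketch with `group_by()` and `summarise()`. It regenerates the random data from above, adding `set.seed()` (an assumption, not in the original) so the results are reproducible, and checks the answer against base R's `tapply()`:

```r
library(dplyr)

# Same random data as above; set.seed() makes it reproducible
set.seed(42)
my_data <- data.frame(col1 = sample(1:3, 100, replace = TRUE), col2 = rnorm(100))

# Split rows by col1, then compute the mean of col2 within each group
group_means <- my_data %>%
  group_by(col1) %>%
  summarise(mean_col2 = mean(col2))
group_means

# Base R equivalent, for comparison
base_means <- tapply(my_data$col2, my_data$col1, mean)

# pandas equivalent, for comparison:
# my_data.groupby("col1")["col2"].mean()
```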
`tidyverse` instead of `pandas`.

LSE DS202 2023/24 Autumn Term | archive