🗓️ Week 07
Machine Learning I

DS101 – Fundamentals of Data Science

Dr. Jon Cardoso-Silva

LSE Data Science Institute

27 Feb 2023

What we will cover today:

  • ⏪ Recap
  • What is Machine Learning?
  • Supervised Learning
  • Algorithms
  • What’s next?

⏪ Recap

  • A model is a mathematical representation of a real-world process, a simplified version of reality.
  • We saw examples of a linear regression model.
    • It contains several assumptions that need to be met for the model to be valid.
    • Social scientists have been using linear regression models for decades.
  • It is a widely used statistical technique for exploring the relationship between a (numeric) response variable and one or more explanatory variables.

⏪ Recap: Linear Regression

We also saw how linear regression models are normally represented mathematically:

The generic supervised model:

Y=f(X)+ϵ

is defined more explicitly as follows ➡️

Simple linear regression

Y=β0+β1X+ϵ,

when we use a single predictor, X.

Multiple linear regression

Y=β0+β1X1+β2X2+…+βpXp+ϵ

when there are multiple predictors, X1, X2, …, Xp.
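As an illustration of how such models are fitted in practice, here is a minimal sketch using statsmodels; the dataset and variable names are invented for the example:

```python
# Minimal sketch of fitting a simple and a multiple linear regression
# with statsmodels. The data and coefficient values are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(scale=0.5, size=100)

simple = smf.ols("y ~ x1", data=df).fit()         # Y = β0 + β1·X + ε
multiple = smf.ols("y ~ x1 + x2", data=df).fit()  # Y = β0 + β1·X1 + β2·X2 + ε
print(multiple.summary())                         # coefficients, CIs, p-values
```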

Note

  • As a well-studied statistical technique, we know a lot about the properties of the model.
  • Researchers can use this knowledge to assess the validity of the model, using things like confidence intervals, hypothesis testing, and many other model diagnostics.
  • But what about the limitations of linear regression models?

Limitations of Linear Regression Models

The typical linear model assumes that:

  • the relationship between the response and the predictors is linear.
  • the error terms are independent and identically distributed.
  • the error terms have a constant variance.
  • the error terms are normally distributed.

Important

Hardly any real-world process is truly linear.

Making Predictions

We often want to use a model to make predictions

  • either about the future
  • or about new observations.

⏪ The simple linear model from W05

Let’s revisit the simple linear model from W05:

What happens when we extrapolate?

Under what conditions would this model predict a decision power of 100%?
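To make the danger concrete, here is a toy sketch of extrapolating a fitted line far beyond the observed data. The numbers are made up and are not the W05 data:

```python
# Toy illustration of extrapolation: the observations and coefficients
# below are invented, not the W05 model.
import numpy as np

x = np.array([1, 2, 3, 4, 5])          # observed range of the predictor
y = np.array([22, 31, 38, 52, 61])     # observed response, in %
b1, b0 = np.polyfit(x, y, 1)           # least-squares slope and intercept

print(b0 + b1 * 10)                    # prediction well outside the observed
                                       # range: the line sails past 100%
```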

The thing about predictions

  • We can use a model to make predictions about the future.
  • We can also use a model to make predictions about new observations.
  • But we need to be careful about the assumptions that we make about the data.

Note

We will focus a lot more on predictions from now on.

This is because Machine Learning, in practice, is all about making predictions.

  • I highlight assumptions because they are often overlooked by ML practitioners.
  • Thinking about assumptions, or the lack of them, is a central theme of this course.

What is Machine Learning?

A definition

Machine Learning (ML) is a subfield of Computer Science and Artificial Intelligence (AI) that focuses on the design and development of algorithms that can learn from data.

How it works

https://xkcd.com/1838

How it works

  • The process is similar to the regular algorithms (think recipes) we saw in W03.
  • Only this time, the ingredients are data.

INPUT (data)

⬇️

ALGORITHM

⬇️

OUTPUT (prediction)

Types of Machine Learning

Supervised
Learning

  • Each data point is associated with a label.
  • The goal is to learn a function that maps the data to the labels.

(🗓️ Week 07)

Unsupervised
Learning

  • There is no such association between the data and the labels.
  • The focus is on similarities between data points.

(🗓️ Week 08)

Supervised Learning

Supervised Learning

X (Input) → Y (Output)    f: X → Y


If we assume there is a way to map between X and Y, we could use SUPERVISED LEARNING to learn this mapping.

Supervised Learning

X (Input) → Y (Output)    f: X → Y


  • The algorithm teaches itself to identify how Y changes based on the values of X.
  • A dataset of labelled data is required.
  • Prediction: presented with a new X, the algorithm can predict what Y is likely to be (see the sketch below).
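A minimal sketch of this learn-then-predict loop with scikit-learn; the labelled data here is randomly generated, purely for illustration:

```python
# Minimal sketch of supervised learning: learn f: X -> Y from labelled data,
# then predict Y for new, unseen values of X. Data is randomly generated.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # learn the mapping f
predictions = model.predict(X_new)                  # predict Y for new X
```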

Supervised Learning

X (Input) → Y (Output)    f: X → Y


  • If Y is a numerical value, we call it a regression problem.​
  • If Y is a category, we call it a classification problem.​

Supervised Learning

X (Input) → Y (Output)    f: X → Y


  • There are countless ways to learn this mapping; each algorithm does it in its own way.
  • Here are the names of some basic algorithms:
    • Linear regression
    • Logistic regression
    • Decision trees
    • Support vector machines
    • Neural networks

One Practical Example

Suppose you want to be alerted when politicians spend their additional allowances unlawfully: travel costs, catering functions, stationery and postage costs, etc…

How would you approach this problem?

Image sources: UK Parliament (Flickr) & receipt photo (Pixabay)

What is in the receipt?

  • Date of the transaction
  • Amount of money spent
  • Description of the transaction
  • Category of the transaction
  • Address of the merchant

All of this information constitutes our input.

  • Note that you might need to transform the data first.
    • For example, you might want to extract the transaction date from the description, as sketched below.
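A crude sketch of that kind of transformation; the description string and the dd/mm/yyyy format are assumptions made for the example:

```python
# Crude sketch of pulling a date out of a free-text receipt description.
# The example text and the dd/mm/yyyy format are assumptions.
import re
from datetime import datetime

description = "Rail ticket London to Manchester, purchased 14/02/2023"
match = re.search(r"\b(\d{2}/\d{2}/\d{4})\b", description)
if match:
    date = datetime.strptime(match.group(1), "%d/%m/%Y").date()
    print(date)  # 2023-02-14
```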

What about the output?

  • The output is something you would normally want to predict or attempt to explain.¹
  • Also referred to as the label, “ground truth”, target, dependent variable, or simply Y.
  • Most commonly, it is a numerical or categorical variable.

  ¹ See “To Explain or To Predict?” (Shmueli 2010).

Structure of dataset

Example of a structured dataset for this problem
| Receipt ID | Number of items | Average value per item | Total Value | Distance from constituency | … | SUSPICIOUS |
|------------|-----------------|------------------------|-------------|----------------------------|---|------------|
| #24321234  | 3               | £10.47                 | £78.00      | 200 miles                  | … | No         |
| #24321235  | 1               | £100.00                | £100.00     | 100 miles                  | … | Yes        |
| #24321236  | 2               | £50.00                 | £100.00     | 50 miles                   | … | No         |
| #98755645  | 5               | £20.00                 | £100.00     | 10 miles                   | … | No         |
| …          | …               | …                      | …           | …                          | … | …          |
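Connecting the table to the earlier diagram: the feature columns play the role of X, the SUSPICIOUS column is Y, and a supervised algorithm learns the mapping between them. A rough sketch using the made-up rows above (a real project would need far more data):

```python
# Rough sketch: the feature columns are X, SUSPICIOUS is the label Y.
# The values mirror the illustrative table above; this is not real data.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "n_items":        [3, 1, 2, 5],
    "avg_value":      [10.47, 100.00, 50.00, 20.00],
    "total_value":    [78.00, 100.00, 100.00, 100.00],
    "distance_miles": [200, 100, 50, 10],
    "suspicious":     ["No", "Yes", "No", "No"],
})

X = df.drop(columns="suspicious")   # the input features
y = df["suspicious"]                # the label we want to predict

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict(X.iloc[[1]]))     # predicted label for one receipt
```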

Side story

There was an open source crowdfunded project to perform exactly this, using data from the Brazilian congress.

https://serenata.ai/en/

Side story

Source: https://serenata.ai/en/

Side story

Source: https://serenata.ai/en/

Time for coffee ☕

After the break:

  • Example of supervised learning algorithms
  • Self-supervised learning

Algorithms

The Logistic Regression model

Consider a binary response:

Y = \begin{cases} 0 \\ 1 \end{cases}

We model the probability that Y = 1 using the logistic function (aka. sigmoid curve):

Pr(Y = 1|X) = p(X) = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1 X}}

Source of illustration: TIBCO

  • This is what the sigmoid function looks like: an S-shaped curve that maps any value of X to a probability between 0 and 1.
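A quick sketch of that curve; the coefficients are arbitrary, chosen only to show the S-shape:

```python
# Sketch of the logistic (sigmoid) curve p(X) = e^(b0 + b1 X) / (1 + e^(b0 + b1 X)).
# The coefficients are arbitrary, chosen only to show the S-shape.
import numpy as np
import matplotlib.pyplot as plt

b0, b1 = -1.0, 2.0
x = np.linspace(-5, 5, 200)
p = np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))  # Pr(Y = 1 | X)

plt.plot(x, p)
plt.xlabel("X")
plt.ylabel("Pr(Y = 1 | X)")
plt.show()
```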

The Decision Tree model

The Support Vector Machine model

The Support Vector Machine model

Say you have a dataset with two classes of points:

The Support Vector Machine model

The goal is to find a line that separates the two classes of points:

The Support Vector Machine model

It can get more complicated than just a line:

  • When the two classes cannot be separated by a straight line, the decision boundary becomes a curve or a more complex surface.
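A minimal scikit-learn sketch of both cases; the two-class data is randomly generated for illustration:

```python
# Minimal sketch of support vector machines: a straight-line boundary and
# a more flexible, curved one. The two-class data is randomly generated.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # separating straight line
curved_svm = SVC(kernel="rbf").fit(X, y)     # a boundary that can bend

print(linear_svm.predict([[0.0, 2.0]]))      # classify a new point
```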

Neural Networks

Image from Stack Exchange | TeX

Deep Learning

  • Stack a lot of layers on top of each other, and you get a deep neural network.
  • Forget about interpretability!
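As an illustration of what “stacking layers” means in code, here is a minimal sketch using scikit-learn’s multi-layer perceptron; the layer sizes and data are arbitrary:

```python
# Minimal sketch of a "deep" network: several hidden layers stacked on top
# of one another. Layer sizes and data are arbitrary, for illustration only.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

deep_net = MLPClassifier(hidden_layer_sizes=(64, 64, 32),  # three stacked layers
                         max_iter=1000, random_state=0).fit(X, y)
print(deep_net.predict(X[:5]))  # predictions are easy; explaining them is not
```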

Attention

  • The Transformer architecture is a deep learning model that uses attention to learn contextual relationships between words in a text.
  • It has revolutionized the field of Natural Language Processing (NLP).
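The core operation is usually summarised as scaled dot-product attention, softmax(QKᵀ/√d)·V. A toy numpy sketch, with random matrices standing in for the learned word representations:

```python
# Toy sketch of scaled dot-product attention, the Transformer's building block:
# softmax(Q K^T / sqrt(d)) V. Random matrices stand in for learned word vectors.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # dimension of each word representation
Q = rng.normal(size=(5, d))           # queries: one row per word
K = rng.normal(size=(5, d))           # keys
V = rng.normal(size=(5, d))           # values

scores = Q @ K.T / np.sqrt(d)         # how strongly each word attends to the others
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
context = weights @ V                 # context-aware representation of each word
```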

What’s next?

What’s next?

  • 🗓️ Week 08: Unsupervised learning
  • 🗓️ Week 09: More on unstructured data
    • You will also explore how to measure the performance of your models
  • 🗓️ Week 10: We’ll think deeper about what it means to predict something
    • We’ll also talk about fairness and bias in machine learning

References

Shmueli, Galit. 2010. “To Explain or to Predict?” Statistical Science 25 (3). https://doi.org/10.1214/10-STS330.
