DS101 – Fundamentals of Data Science
27 Feb 2023
We also saw how linear regression models are normally represented mathematically:
The generic supervised model:
\[ Y = \operatorname{f}(X) + \epsilon \]
is defined more explicitly as follows ➡️
\[ \begin{align} Y = \beta_0 +& \beta_1 X + \epsilon, \\ \\ \\ \end{align} \]
when we use a single predictor, \(X\).
\[ \begin{align} Y = \beta_0 &+ \beta_1 X_1 + \beta_2 X_2 \\ &+ \dots \\ &+ \beta_p X_p + \epsilon \end{align} \]
when there are multiple predictors, \(X_p\).
Note
The typical linear model assumes that:
Important
Barely any real-world process is linear.
We often want to use a model to make predictions
Let’s revisit the simple linear model from W05:
Under what conditions would this model predict a decision power of 100%?
Note
We will focus a lot more on predictions from now on.
This is because Machine Learning, in practice, is all about making predictions.
Machine Learning (ML) is a subfield of Computer Science and Artificial Intelligence (AI) that focuses on the design and development of algorithms that can learn from data.
INPUT (data)
⬇️
ALGORITHM
⬇️
OUTPUT (prediction)
(🗓️ Week 07)
(🗓️ Week 08)
If we assume there is a way to map between X and Y, we could use SUPERVISED LEARNING to learn this mapping.
Suppose you want to be alerted when politicians spend their additional allowances unlawfully: travel costs, catering functions, stationery and postage costs, etc…
How would you approach this problem?
Image sources: Flickr | UK Parliament & Receipts Receipt Pay | Pixabay
All of this information constitutes our input.
Receipt ID | Number of items | Average value per item | Total Value | Distance from constituency | … | SUSPICIOUS |
---|---|---|---|---|---|---|
#24321234 | 3 | £ 10.47 | £ 78.00 | 200 Miles | … | No |
#24321235 | 1 | £ 100.00 | £ 100.00 | 100 Miles | … | Yes |
#24321236 | 2 | £ 50.00 | £ 100.00 | 50 Miles | … | No |
#98755645 | 5 | £ 20.00 | £ 100.00 | 10 Miles | … | No |
#24321236 | 2 | £ 50.00 | £ 100.00 | 50 Miles | … | No |
… | … | … | … | … | … | … |
There was an open source crowdfunded project to perform exactly this, using data from the Brazilian congress.
Source: https://serenata.ai/en/
Source: https://serenata.ai/en/
After the break:
Consider a binary response:
\[ Y = \begin{cases} 0 \\ 1 \end{cases} \]
We model the probability that \(Y = 1\) using the logistic function (aka. sigmoid curve):
\[ Pr(Y = 1|X) = p(X) = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1 X}} \]
Source of illustration: TIBCO
Say you have a dataset with two classes of points:
The goal is to find a line that separates the two classes of points:
It can get more complicated than just a line:
Image from Stack Exchange | TeX
LSE DS101 2022/23 Lent Term (archive)