import pandas as pd
from plotnine import ggplot, aes, geom_bar, labs, scale_fill_discrete, element_text, xlab, ylab
from plotnine.coords import coord_flip
from plotnine.themes import theme, theme_bw
df = pd.read_csv("OECD_GDP_dataset_1972_2022.csv")
df
def avg_gdp_per_year(data, year):
    # Mean of the 'Value' column over all rows whose 'TIME' equals `year`.
    return data.query('TIME == @year')['Value'].mean()
Introduction to Quarto and Zotero
Understanding missing data mechanisms
There are various reasons for data to be missing:
- According to Donald Rubin (Rubin 1976), data values can be Missing Completely at Random (MCAR), i.e., the missingness follows a totally random pattern and nothing in the data drives it. This is the best-case scenario and can be handled with rather simple methods.
- Data values can also follow the Missing at Random (MAR) mechanism. In this case, the missingness depends on other variables observed in the data you collected, but not on unseen data and not on the variable that is missing data itself.
- Finally, if the missingness depends on unobserved data and/or on the variable with missing values itself, we have a case of the Missing Not at Random (MNAR) mechanism.
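The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the `age` and `income` variables, the missingness probabilities, and the sample size are all assumptions chosen for the example and do not come from the OECD data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 10_000

# Hypothetical toy data: income loosely depends on age; both start fully observed.
age = rng.integers(18, 80, size=n)
income = 20_000 + 500 * age + rng.normal(0, 5_000, size=n)
toy = pd.DataFrame({"age": age, "income": income})

# MCAR: every income value has the same 20% chance of being missing.
mcar_mask = rng.random(n) < 0.20

# MAR: the chance of a missing income depends only on the observed `age`.
mar_mask = rng.random(n) < np.where(toy["age"] > 50, 0.35, 0.05)

# MNAR: the chance of a missing income depends on the income value itself.
high_income = toy["income"] > toy["income"].median()
mnar_mask = rng.random(n) < np.where(high_income, 0.35, 0.05)

# Blank out the masked values in three separate copies of the income column.
toy["income_mcar"] = toy["income"].mask(mcar_mask)
toy["income_mar"] = toy["income"].mask(mar_mask)
toy["income_mnar"] = toy["income"].mask(mnar_mask)

# Compare the mean of the observed values under each mechanism.
print(toy[["income", "income_mcar", "income_mar", "income_mnar"]].mean())
```

Under MCAR the observed mean should stay close to the true mean. Under MNAR it drifts, and nothing in the observed data reveals why. Under MAR the drift can in principle be corrected by conditioning on `age`, because `age` is fully observed.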
Data Analysis Case Study: GDP (OECD data)
Trends in GDP per capita, 1972 to 2022
In 1972, GDP per capita averaged 136,659.17 dollars across the countries in the dataset. For 2007, the reported average is 136,659.17 dollars. In 2022 (the latest year available), it was 2,607,674.19 dollars.
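Yearly averages like these can be computed with the `avg_gdp_per_year` helper and rounded for display. The sketch below runs the helper on a tiny made-up frame; the `LOCATION` column and all values are hypothetical stand-ins, and only the `TIME` and `Value` column names are taken from the helper itself.

```python
import pandas as pd

def avg_gdp_per_year(data, year):
    """Mean of `Value` over all rows whose `TIME` equals `year`."""
    return data.query("TIME == @year")["Value"].mean()

# Hypothetical rows standing in for the OECD file (values are made up).
toy = pd.DataFrame({
    "LOCATION": ["AAA", "BBB", "AAA", "BBB"],
    "TIME": [1972, 1972, 2022, 2022],
    "Value": [1000.0, 3000.0, 40000.0, 60000.0],
})

avg_1972 = avg_gdp_per_year(toy, 1972)
avg_2022 = avg_gdp_per_year(toy, 2022)

# Format with thousands separators and two decimals for use in running text.
print(f"1972: {avg_1972:,.2f} dollars")
print(f"2022: {avg_2022:,.2f} dollars")
```

Note that the helper takes an unweighted mean over every row for the year. If the file mixes several measures or units in the `Value` column, it may be necessary to filter to a single measure first.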
GDP table for selected countries
Country | 2018 | 2019 | 2020 |
---|---|---|---|
United Kingdom | 47 108 | 49 220 | 45 757 |
India | 6 799 | 7 104 | 6 644 |
China | 15 466 | 16 625 | 17 177 |
Spain | 40 777 | 43 136 | 38 031 |
United States | 62 450 | 64 690 | 63 481 |
Italy | 43 428 | 45 800 | 43 150 |
France | 46 337 | 50 227 | 47 982 |
GDP barplot for selected countries
Criticism of the GDP measure
GDP is heavily criticized as a measure of economic welfare (Kapoor and Debroy 2019; Coyle 2017). It is a poor measure of human well-being and fails to capture the distribution of income across society. It is overly focused on measuring production outputs in economies where services play an increasingly large role. Unpaid work, though it clearly contributes to economies in myriad ways, is not measured by it. Nor does it account for the environmental costs or other externalities of the economic outputs it measures.