Data Dictionary: Tesco Grocery 1.0
A large-scale dataset of grocery purchases in London
The content of this page is copied from the original data dictionary in (Aiello et al. 2020).
For each geographic aggregation (LSOA, MSOA, Ward, Borough), the authors provide a file containing the aggregated information on food purchases, enriched with information coming from the census.
For more information on how they aggregated the data, please refer to the original paper:
Aiello, Luca Maria, Daniele Quercia, Rossano Schifanella, and Lucia Del Prete. "Tesco Grocery 1.0, a Large-Scale Dataset of Grocery Purchases in London". Scientific Data 7, no. 1 (18 February 2020): 57. https://doi.org/10.1038/s41597-020-0397-7. (Aiello et al. 2020)
Field | Description |
---|---|
area_id | Identifier of the area |
weight | Weight of the average food product, in grams |
volume | Volume of the average drink product, in liters |
energy | Nutritional energy of the average product, in kcal |
energy_density | Concentration of calories in the area's average product, in kcal/gram |
{nutrient} | Weight of {nutrient} in the average product, in grams. Possible nutrients are: carbs, sugar, fat, saturated fat, protein, fibre. The count of carbs includes sugars and the count of fats includes saturated fats. |
energy_{nutrient} | Amount of energy from {nutrient} in the average product, in kcal |
h_nutrients_weight | Diversity (entropy) of nutrients weight |
h_nutrients_weight_norm | Diversity (entropy) of nutrients weight, normalized in [0,1] |
h_nutrients_calories | Diversity (entropy) of energy from nutrients |
h_nutrients_calories_norm | Diversity (entropy) of energy from nutrients, normalized in [0,1] |
f_{category} | Fraction of products of type {category} purchased. Possible categories are: beer, dairy, eggs, fats & oils, fish, fruit & veg, grains, red meat, poultry, readymade, sauces, soft drinks, spirits, sweets, tea & coffee, water, and wine. |
f_{category}_weight | Fraction of total product weight given by products of type {category} |
h_category | Diversity (entropy) of food product categories |
h_category_norm | Diversity (entropy) of food product categories, normalized in [0,1] |
h_category_weight | Diversity (entropy) of weight of food product categories |
h_category_weight_norm | Diversity (entropy) of weight of food product categories, normalized in [0,1] |
representativeness_norm | Ratio between the number of unique customers in the area and the number of residents as measured by the census; values are min-max normalized in [0,1] across all areas |
transaction_days | Number of unique dates on which at least one purchase was made by a resident of the area |
num_transactions | Total number of products purchased by Clubcard owners who are resident in the area |
man_day | Cumulative number of man-days of purchase (number of distinct days on which a customer purchased something, summed over all individual customers) |
population | Total population of residents in the area according to the 2015 census |
male | Total male population in the area |
female | Total female population in the area |
age_0_17 | Total number of residents between 0 and 17 years old |
age_18_64 | Total number of residents between 18 and 64 years old |
age_65+ | Total number of residents aged 65 years or more |
avg_age | Average age of residents according to the 2015 census |
area_sq_km | Surface of the area, in km^2 |
people_per_sq_km | Population density, in residents per km^2 |
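
The table uses two kinds of normalization: entropy "normalized in [0,1]" (the h_*_norm fields) and min-max normalization (representativeness_norm). The sketch below illustrates both under stated assumptions and is not the authors' code. It assumes the entropy is Shannon entropy with the natural logarithm, that the normalized version divides by the maximum entropy log(n), and that the nutrient entropy is taken over four non-overlapping nutrients. The function names and sample numbers are hypothetical; refer to the original paper for the exact definitions.

```python
import math


def shannon_entropy(values):
    """Shannon entropy (natural log) of the shares obtained from `values`."""
    values = list(values)
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    # Zero shares contribute nothing to the entropy, so they are skipped.
    shares = [v / total for v in values if v > 0]
    return -sum(p * math.log(p) for p in shares)


def normalized_entropy(values):
    """Entropy divided by its maximum log(n); the result lies in [0, 1]."""
    values = list(values)
    if len(values) < 2:
        return 0.0
    return shannon_entropy(values) / math.log(len(values))


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up nutrient weights (grams) for one area, for illustration only.
nutrient_grams = {"carbs": 18.0, "fat": 9.0, "protein": 5.0, "fibre": 1.5}
h_norm = normalized_entropy(nutrient_grams.values())

# Made-up customers-to-residents ratios for three areas.
ratios_norm = min_max_normalize([0.02, 0.05, 0.11])
```

Because the normalized entropy is a ratio of two logarithms, its value does not depend on the logarithm base; only the unnormalized fields would change with a different base.
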
Where applicable, measures are accompanied by their standard deviation (fields with the suffix _std), the 95% confidence interval for the mean (suffix _ci95), and the values of the 2.5th, 25th, 50th, 75th, and 97.5th percentiles (suffix _perc{value}).
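
The suffix convention above can be turned into a small helper that lists the companion columns expected for a given field. This is a minimal sketch: the helper names are hypothetical, and it assumes that {value} is written exactly as the percentile number (for example weight_perc2.5), which should be checked against the header of the actual files.

```python
# Percentile labels as they are assumed to appear in column names.
PERCENTILES = ("2.5", "25", "50", "75", "97.5")


def companion_columns(field):
    """Return the companion column names expected for `field`."""
    names = [f"{field}_std", f"{field}_ci95"]
    names += [f"{field}_perc{p}" for p in PERCENTILES]
    return names


def available_companions(field, header):
    """Keep only the companion columns that actually exist in `header`."""
    present = set(header)
    return [name for name in companion_columns(field) if name in present]


# Made-up header, for illustration only.
header = ["area_id", "energy", "energy_std", "energy_perc50"]
found = available_companions("energy", header)
```

Filtering against the real header avoids assuming that every measure has every companion column, since the text only says they are provided "where applicable".
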