❌ Common W05 formative issues

Author

Dr Ghita Berrada

1 Issue 1: Using absolute paths to load files

🧩 The problem:

Many submissions loaded datasets using absolute paths, for example paths pointing directly to a personal Users/... folder.

This makes your work non-reproducible: someone else running your code on a different machine will not have the same folder structure, so the code will fail immediately.

This was one of the most common technical issues across submissions.

💡 The solution:

Always use relative paths that match the repository structure.

For example, instead of writing something like:

pd.read_csv("/Users/yourname/Documents/formative/data/Gravity.csv")

you should write something like:

pd.read_csv("data/Gravity.csv")

Relative paths make your work portable, reproducible, and much easier to mark.

2 Issue 2: Importing libraries in a scattered way

🧩 The problem:

In several submissions, libraries were imported in multiple places throughout the notebook rather than clearly at the top.

This makes the workflow harder to follow, and it can create confusion about which packages are actually required to run the analysis from start to finish.

💡 The solution:

Import all required libraries at the top of the document whenever possible.

That way:

  • the reader can immediately see what is needed
  • the workflow is easier to reproduce
  • the notebook reads more cleanly.

If you add a new package later while debugging, move that import back to the top before submitting.
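As a minimal sketch, a consolidated import cell might look like this (the exact packages depend on your analysis):

```python
# One import cell at the top of the notebook: the reader sees every
# dependency at a glance, and the notebook runs cleanly top to bottom.
import numpy as np
import pandas as pd
```
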

3 Issue 3: Ignoring warnings rather than addressing them

🧩 The problem:

Many submissions showed warnings such as:

  • DtypeWarning
  • SettingWithCopyWarning
  • deprecation warnings
  • future warnings.

These warnings were often left unexplained.

Warnings do not always mean your code is wrong, but they often indicate that something is fragile, ambiguous, or likely to break later.

💡 The solution:

Do not just scroll past warnings.

Instead, briefly indicate what they mean and whether they matter for your analysis.

For example:

  • a DtypeWarning may indicate mixed types in some columns and should prompt you to inspect whether those columns are relevant
  • a SettingWithCopyWarning may mean that your transformation is being applied to a view rather than a guaranteed copy of the data.

Even if the warning does not ultimately affect the analysis, showing that you noticed it and understood it is much better than ignoring it.
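As a hedged illustration, the SettingWithCopyWarning case can often be resolved by taking an explicit copy before assigning (column names here are made up for the sketch):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"flow": [1.0, 0.0, 5.0], "year": [2018, 2018, 2019]})

# Assigning into a chained slice like df[df["year"] == 2018]["log_flow"] = ...
# can raise SettingWithCopyWarning, because pandas cannot tell whether the
# slice is a view or a copy. An explicit .copy() removes the ambiguity.
subset = df[df["year"] == 2018].copy()
subset["log_flow"] = np.log1p(subset["flow"])
```
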

4 Issue 4: Installing packages inside the notebook

🧩 The problem:

Some notebooks contained commands such as:

!pip install openpyxl
!pip install seaborn

inside the analytical notebook itself.

While this may appear convenient during development, it is not good practice in analytical notebooks, especially when the notebook is intended to be rendered as a report.

These commands can:

  • break rendering workflows (for example when rendering Quarto documents)
  • create inconsistent environments
  • make the analysis harder to reproduce reliably.

💡 The solution:

Package installation should normally be handled outside the notebook, using a properly configured environment (for example a conda environment or requirements file).

Analytical notebooks should focus on analysis and interpretation, not environment setup.
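A minimal sketch of handling dependencies outside the notebook (the file name and package list are illustrative):

```shell
# Create the environment once, outside the notebook, from a
# requirements file, e.g.:
#
#   requirements.txt
#   ----------------
#   pandas
#   seaborn
#   openpyxl
#
pip install -r requirements.txt
```
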

5 Issue 5: Plot titles that only describe the chart rather than the message

🧩 The problem:

A very common issue was the use of plot titles that simply described the figure, for example:

  • Distribution of trade flows
  • Scatter plot of HHI and trade openness
  • Lorenz curve
  • Top exporters

These titles tell the reader what the plot is, but not what they should learn from it.

💡 The solution:

A good plot title should communicate the main takeaway.

For example, instead of:

Distribution of bilateral trade flows (log scale)

a more informative title would be:

Most bilateral trade flows are very small relative to a small number of dominant trade relationships

Instead of:

Relationship between export concentration and trade openness

a stronger title would be:

Countries with more concentrated exports do not necessarily have low trade openness, suggesting these indicators capture different dimensions of trade structure

Instead of:

Lorenz curve of bilateral trade flows

a better title would be:

A small share of bilateral trade relationships accounts for most total trade

The title should help the reader interpret the figure before even reading the surrounding text.

6 Issue 6: Producing plots without interpreting them

🧩 The problem:

Many plots were generated correctly, but then left largely uninterpreted.

This was especially common for:

  • histograms of bilateral trade flows
  • Lorenz-type or concentration curves
  • scatter plots between indicators
  • outcome variable distribution plots.

A plot on its own is not an argument.

💡 The solution:

Every plot should be followed by an interpretation that explains:

  • what pattern is visible
  • why that pattern matters
  • what the pattern implies for the next part of the analysis.

For example, if you show that bilateral trade flows are extremely right-skewed, do not stop at saying the distribution is skewed. Go further and explain that this means:

  • a small number of exporter–importer pairs dominate world trade
  • many country pairs trade very little or not at all
  • aggregation is useful because raw bilateral flows are highly uneven and difficult to compare directly across countries.

Interpretation is what turns code and figures into analysis.


7 Issue 7: Choosing example countries without properly motivating them

🧩 The problem:

Many submissions compared specific exporters to illustrate trade concentration, but the choice of countries was often not properly motivated.

In several notebooks, code was written to identify the largest and smallest exporters, but this information was not actually used when introducing the examples.

As a result, the example pair could look arbitrary.

Good comparisons are not random: they should be motivated by what you expect the data to reveal about the structure of trade.

💡 The solution:

When choosing example countries, explain both:

  1. how they were identified in the data, and
  2. why they are analytically interesting in the specific year studied.

A strong explanation should not stop at saying that one country is large and another is small. It should also invoke:

  • the economic structure of the countries
  • their regional trade relationships
  • and the economic context of the analysis year.

For example, suppose you compare Germany and Argentina.

A weak explanation would be:

Germany is a large exporter while Argentina is a smaller exporter.

A stronger explanation would look more like this:

Germany is one of the largest exporters in the dataset and trades with a very large number of partners across Europe and globally. Its exports are heavily concentrated in manufacturing sectors such as automobiles, machinery, and industrial equipment, and its trade structure reflects its deep integration into European and global value chains.

Argentina, by contrast, has an export structure that is more strongly tied to a narrower set of commodities such as soy products and agricultural goods. In the analysis year, a large share of its exports is directed toward a smaller number of trading partners that demand these commodities. In addition, Argentina’s trade patterns are shaped by recurring macroeconomic instability and episodes of sovereign debt crisis, which can influence both export composition and partner relationships.

This kind of explanation links the comparison to:

  • what the dataset shows
  • the economic structure of exports
  • and the broader economic context affecting trade relationships.

Another example of a useful, motivated comparison might be between different kinds of commodity-dependent economies.

For example:

  • a country whose exports are concentrated in strategic global commodities (such as oil or copper), which tend to be demanded by many countries

  • versus a country whose exports depend on more specialised commodities that are traded with a much smaller set of partners.

A concrete illustration of this contrast would be Chile and Niger.

  • Chile is a major exporter of copper, a commodity that is widely used in global manufacturing, construction, and energy infrastructure. As a result, Chile exports to a large number of countries, including major industrial economies such as China, the United States, Japan, and members of the European Union.

  • Niger, by contrast, exports uranium, which is used primarily in nuclear energy production. Because uranium is purchased by a much smaller set of specialised buyers, Niger’s exports tend to be directed toward a much more limited group of trading partners.

This comparison is motivated by the structure of the export data: both countries are commodity-dependent, but the nature of the commodity they export leads to very different trade networks.

This example shows that export concentration can arise for different economic reasons. Some countries export a narrow range of commodities that are nevertheless demanded globally, while others depend on commodities that are traded within a much smaller network of partners. Interpreting indicators such as HHI therefore requires understanding the underlying economic structure of exports, not just the numerical value of the index.

8 Issue 8: Using pie charts for export shares

🧩 The problem:

A few submissions used pie charts to display export shares across trading partners.

Pie charts are not a good choice here.

Why?

  • once you have more than 2 or 3 slices, angles become difficult to compare accurately
  • comparisons across two pie charts are even harder
  • small differences in shares become visually unclear.

💡 The solution:

Use bar charts instead.

For export shares across partners, clearer alternatives include:

  • horizontal bar charts
  • ranked bar charts
  • stacked bar charts in limited cases.

These allow the reader to compare partner shares much more accurately than pie charts.

9 Issue 9: Not explaining why a particular year was chosen

🧩 The problem:

Year choice was often weakly justified or not justified at all.

Some students simply picked a year and proceeded.

Others gave a generic explanation without linking it to the actual datasets or the broader context of that year.

💡 The solution:

A year choice should be justified using the actual analysis context.

Possible justifications might include:

  • better overlap between CEPII, KOF, and World Bank datasets
  • better country coverage
  • lower missingness
  • avoiding highly atypical years if that matters for the question
  • relevance of the economic context of that year.

For example, choosing 2020 requires much stronger justification than choosing a more stable pre-pandemic year, because 2020 was heavily affected by COVID disruptions to both trade and national income growth.

Similarly, if you choose 2019, 2018, or 2017, you should explain why that year is preferable given overlap and missingness across sources.

A good year choice is never just “I picked this year because it exists in all datasets.”

10 Issue 10: Mishandling missing values in Question 2

🧩 The problem:

When constructing HHI and related trade indicators, missing values were often filtered out without explanation.

This matters because the construction of export shares and concentration measures depends heavily on which flows are kept and which are removed.

In particular, different choices imply different assumptions:

  • dropping missing flows may implicitly treat them as unavailable observations
  • restricting to positive trade flows excludes zero trade relationships
  • filling missing values with zero makes a much stronger assumption than simply dropping missing values.

💡 The solution:

When computing trade indicators, explain:

  • how missing trade values are handled
  • whether zero flows are kept or removed
  • what assumption that implies.

For example:

I restrict the analysis to non-missing positive trade flows because HHI is intended to summarise the distribution of observed export shares across actual trade partners. This means the resulting measure describes concentration among observed positive export destinations rather than concentration across all possible country pairs.

That is much clearer than silently filtering rows.
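A minimal sketch of making that filtering choice explicit in code (column names like `tradeflow` are illustrative, not the exact dataset names):

```python
import pandas as pd

df = pd.DataFrame({
    "iso3_o": ["DEU", "DEU", "ARG", "ARG"],
    "tradeflow": [100.0, 0.0, None, 50.0],  # illustrative bilateral flows
})

# Explicit choice: keep only observed, strictly positive flows.
# This treats missing flows as unobserved (not as zeros) and excludes
# zero-trade relationships from the concentration measure.
positive = df[df["tradeflow"].notna() & (df["tradeflow"] > 0)]
```
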

11 Issue 11: Not explaining the mechanics of aggregation

🧩 The problem:

Many students correctly stated that aggregation was necessary, but did not clearly explain how the aggregation worked.

The reader therefore could not always tell:

  • how total exports were computed
  • how export shares were constructed
  • why HHI was based on squared shares
  • how the final country-level dataset was built.

💡 The solution:

Explain the aggregation pipeline step by step.

For example:

  1. sum bilateral trade flows by exporter to obtain total exports
  2. divide each bilateral flow by the exporter’s total exports to obtain export shares
  3. square and sum those shares to compute HHI
  4. collapse to one row per exporter containing the chosen country-level indicators.

This matters because Q2 is not just about getting a correct number — it is about showing that you understand how bilateral data is transformed into country-level features.
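The four steps above can be sketched in pandas as follows (a toy dataset with illustrative column names, not the exact CEPII ones):

```python
import pandas as pd

# Toy bilateral flows: exporter, importer, trade value.
flows = pd.DataFrame({
    "iso3_o": ["A", "A", "A", "B", "B"],
    "iso3_d": ["X", "Y", "Z", "X", "Y"],
    "tradeflow": [70.0, 20.0, 10.0, 50.0, 50.0],
})

# 1. Total exports per exporter.
flows["total_exports"] = flows.groupby("iso3_o")["tradeflow"].transform("sum")

# 2. Export share of each bilateral flow.
flows["export_share"] = flows["tradeflow"] / flows["total_exports"]

# 3. HHI: sum of squared shares per exporter.
hhi = (
    flows.assign(sq_share=flows["export_share"] ** 2)
         .groupby("iso3_o")["sq_share"].sum()
         .rename("hhi")
)

# 4. Collapse to one row per exporter.
country_level = hhi.reset_index()
```

Exporter A (shares 0.7, 0.2, 0.1) ends up more concentrated than exporter B (shares 0.5, 0.5), which is exactly the contrast HHI is designed to capture.
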

12 Issue 12: Giving generic explanations of HHI rather than contextualised ones

🧩 The problem:

Many submissions defined HHI correctly in a general sense, but the explanation stayed too abstract.

For example, students often wrote that:

  • high HHI means concentrated exports
  • low HHI means diversified exports.

This is true, but it is not enough.

💡 The solution:

Use concrete examples from your own analysis.

For instance, if one exporter has:

  • a top partner share near 70%
  • a relatively high HHI

while another exports to over 200 partners with a low HHI, then explicitly use those examples to show what concentration means in practice.

That makes the interpretation much stronger than a textbook definition alone.

13 Issue 13: Not validating the aggregation output

🧩 The problem:

A common omission was the lack of a basic validation check after computing export shares.

Without such a check, it is hard to know whether the aggregation code behaved as intended.

💡 The solution:

After computing export shares, verify that they sum to 1 for each exporter, allowing for small floating-point error.

For example:

share_check = df.groupby("iso3_o")["export_share"].sum()

If the shares do not sum to approximately 1, something has gone wrong earlier in the aggregation pipeline.

Simple validation checks like this are very good analytical practice.
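A slightly fuller version of that check, assuming an `export_share` column as above, might look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "iso3_o": ["A", "A", "B"],
    "export_share": [0.7, 0.3, 1.0],  # illustrative shares
})

# Shares should sum to 1 within each exporter, up to floating-point error.
share_check = df.groupby("iso3_o")["export_share"].sum()
assert np.allclose(share_check, 1.0), "export shares do not sum to 1"
```
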

14 Issue 14: Not explaining dataset merges

🧩 The problem:

In Question 3, many submissions merged CEPII-derived indicators with KOF and World Bank data without explaining:

  • why a particular merge type was used
  • what assumptions the merge makes
  • how country coverage is affected
  • how missing values interact with the merge.

This is a serious issue because the merge is one of the main reasons the final modelling dataset becomes much smaller.

💡 The solution:

When explaining a merge, cover three things:

14.1 1. The mechanics

State clearly:

  • which keys are used
  • whether you use inner, left, or another type of join
  • whether country codes had to be cleaned or filtered first.

14.2 2. The assumptions

Explain what the merge assumes.

For example, an inner join assumes that you only want countries present in all datasets. That may be reasonable, but it also means losing countries with incomplete coverage.

If some country codes are dropped beforehand, explain why. For example, aggregate regional codes such as WLD, EAS, or HIC should not be merged as if they were countries.

14.3 3. The implications

State how many countries are lost and what that means analytically.

For example, if the World Bank file has much narrower coverage than CEPII and KOF, then the final dataset will reflect the limitations of the World Bank file rather than the broader trade datasets.

That implication should be made explicit.
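A minimal sketch of a documented merge covering those three points (dataset and column names are illustrative):

```python
import pandas as pd

trade = pd.DataFrame({"iso3": ["DEU", "ARG", "NER"], "hhi": [0.1, 0.3, 0.8]})
wdi = pd.DataFrame({"iso3": ["DEU", "ARG", "WLD"], "growth": [1.2, -2.0, 3.0]})

# Drop aggregate codes (e.g. WLD) before merging so they are not
# treated as countries.
wdi = wdi[~wdi["iso3"].isin(["WLD", "EAS", "HIC"])]

# Inner join on ISO3 codes: keeps only countries present in both sources.
merged = trade.merge(wdi, on="iso3", how="inner", validate="one_to_one")

n_lost = len(trade) - len(merged)  # countries dropped by the merge
```

Reporting `n_lost` (and which countries it covers) makes the implications of the merge explicit rather than implicit.
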

15 Issue 15: Weak treatment of missingness in Question 3

🧩 The problem:

Missingness in the modelling dataset was often discussed too vaguely.

Common weak patterns included:

  • only reporting total missingness
  • not showing missingness by variable
  • claiming that missingness was not a problem simply because the final merged dataset had no missing values
  • not recognising that a clean final dataset may simply be the result of a very restrictive merge.

💡 The solution:

Before modelling, discuss missingness at the right level.

That means explaining:

  • which variables had missing values
  • how many observations would be lost if rows with missing values were dropped
  • whether the missingness appears concentrated among particular countries
  • whether the final sample may therefore be selective.

A final dataset with no missing values is not automatically a strength if it was obtained by excluding a large share of countries.
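A minimal sketch of reporting missingness per variable rather than only in total (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({
    "iso3": ["A", "B", "C", "D"],
    "hhi": [0.2, 0.5, None, 0.7],
    "growth": [1.0, None, None, 2.0],
})

# Missing values by variable, not just overall.
missing_by_var = df.isna().sum()

# How many rows would survive complete-case filtering.
n_complete = len(df.dropna())
```
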

16 Issue 16: Reporting country loss without analysing what it means

🧩 The problem:

Many students reported that countries were lost during the merge, but stopped there.

The more important question is: what kind of countries were lost, and what does that imply for the analysis?

💡 The solution:

Go beyond the count.

Explain whether the countries lost are disproportionately:

  • small states
  • low-income countries
  • fragile states
  • territories
  • countries with weaker statistical capacity.

If so, then the modelling sample may not be representative of the broader population of trading economies.

That matters because it changes how cautiously the model results should be interpreted.

17 Issue 17: Using scatter plots without selecting the most relevant relationships

🧩 The problem:

In Question 3, some submissions produced scatter plots for relationships that were not the most relevant for checking redundancy or overlap among modelling predictors.

This made the discussion less focused.

💡 The solution:

Prioritise scatter plots and/or a correlation matrix for the variables whose overlap is most important for modelling.

For example, the following are usually more relevant than arbitrary pairings:

  • HHI vs diversification
  • HHI vs top-partner share
  • trade openness vs KOF trade globalisation
  • predictors likely to compete for similar information.

The point is not to produce many plots, but to produce the ones that help justify modelling choices.
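As a hedged sketch, a correlation matrix restricted to the predictors of interest can surface near-redundant pairs directly (the variables and their relationships here are simulated for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hhi = rng.uniform(0, 1, 50)
df = pd.DataFrame({
    "hhi": hhi,
    # Constructed to be nearly redundant with HHI.
    "top_partner_share": hhi * 0.8 + rng.normal(0, 0.05, 50),
    # Constructed to be unrelated.
    "trade_openness": rng.uniform(0, 1, 50),
})

# Pairwise correlations among predictors likely to share information.
corr = df.corr()
```
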

18 Issue 18: Misinterpreting overlap, redundancy, and complementarity

🧩 The problem:

Some submissions used words like overlap, redundancy, and complementarity somewhat loosely.

For example, low pairwise correlation was sometimes treated as sufficient proof that variables were all safe to include.

That is too simplistic.

💡 The solution:

Be careful with these terms.

  • Redundant predictors carry almost the same information
  • Complementary predictors capture different but potentially related dimensions
  • Overlap means partial shared information, not necessarily a fatal problem.

Also remember that:

  • pairwise correlations are useful
  • but they are not the whole story about multicollinearity.

A sensible interpretation should connect the observed relationships to the substantive meaning of the variables.

19 Issue 19: Treating coefficient signs too literally

🧩 The problem:

Several submissions treated unexpected coefficient signs as if they directly overturned economic theory.

For example, if a trade-policy variable had a negative coefficient, some students concluded too quickly that more open economies therefore grow less.

That is too strong.

💡 The solution:

Regression coefficients must be interpreted cautiously, especially in small cross-sectional samples.

Unexpected signs can arise for many reasons, including:

  • sample composition
  • omitted variables
  • multicollinearity
  • measurement choices
  • the specific year studied
  • instability due to small sample size.

So instead of writing:

this means that more open countries grow less

a more appropriate interpretation would be:

in this sample and specification, the association is negative, but this should be interpreted cautiously given the limited sample and the possibility of omitted-variable bias and instability.

20 Issue 20: Weak justification of predictors

🧩 The problem:

A very common issue was weak predictor justification.

Students often listed the variables used in the model but did not clearly explain why those variables should help predict the outcome in this specific dataset and context.

Explanations were often generic (for example stating that a variable is “related to growth”) without linking that reasoning to the specific modelling task.

💡 The solution:

Predictor justification should connect:

  • the substantive meaning of the variable
  • the structure of the dataset
  • the specific modelling objective

For example:

Trade openness is included because countries that are more integrated into global markets may experience faster income growth through access to larger markets and technology diffusion.

Or:

Government effectiveness is included because stronger institutions may support more stable economic growth through better policy implementation and regulatory quality.

The key point is that predictor choices should be anchored in the dataset and the economic context, rather than appearing as an arbitrary list of variables.

21 Issue 21: Weak justification of metrics

🧩 The problem:

Many submissions reported R², RMSE, and MAE without properly justifying why these metrics were useful in this specific context.

💡 The solution:

Explain the metric in relation to the actual outcome variable.

For example, here the target is adjusted net national income growth measured in percentage points.

So a strong justification might say:

RMSE is useful here because it is expressed in the same units as the target variable, so it tells us the typical size of prediction errors in percentage-point growth terms. This matters because large growth prediction errors are substantively costly: they mean the model is missing economically important differences in countries’ income dynamics. RMSE is especially informative when large mistakes are more concerning than many small ones, because it penalises large errors more heavily than MAE.

That is much better than simply saying RMSE penalises large errors without explaining why that matters here.

22 Issue 22: Reporting only test metrics

🧩 The problem:

Several submissions reported only test-set metrics for the baseline model and sometimes also for regularised models.

This makes it impossible to assess whether the model is:

  • overfitting
  • underfitting
  • or simply weak on both train and test data.

💡 The solution:

Always report training and test metrics side by side, ideally using the same set of metrics.

For example:

  • train R², test R²
  • train RMSE, test RMSE
  • train MAE, test MAE.

Only then can you properly comment on the generalisation gap.
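A minimal sketch of reporting train and test metrics side by side with scikit-learn (the data here is simulated for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(0, 0.5, 80)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

# Same metrics on both splits, so the generalisation gap is visible.
metrics = {
    "train_r2": r2_score(y_tr, model.predict(X_tr)),
    "test_r2": r2_score(y_te, model.predict(X_te)),
    "train_rmse": mean_squared_error(y_tr, model.predict(X_tr)) ** 0.5,
    "test_rmse": mean_squared_error(y_te, model.predict(X_te)) ** 0.5,
}
```
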

23 Issue 23: Misunderstanding overfitting

🧩 The problem:

Some submissions identified overfitting simply because training performance was better than test performance.

But that is not enough.

If training performance is already low, then the main issue may be low predictive power, not overfitting.

💡 The solution:

To assess overfitting properly, look at:

  • how good training performance is in absolute terms
  • how much worse test performance is.

For example:

  • high train performance + much lower test performance suggests overfitting
  • low train performance + low test performance suggests weak signal, poor predictors, or model misspecification.

This distinction is important.

24 Issue 24: Claiming model assumptions are violated based on metrics alone

🧩 The problem:

Some submissions inferred that linear regression assumptions must be broken simply because:

  • (R^2) was low
  • RMSE was high
  • coefficients looked strange.

But metrics alone cannot diagnose linear regression assumptions.

💡 The solution:

Metrics are not enough.

To discuss assumptions such as:

  • non-linearity
  • heteroskedasticity
  • unusual residual structure

you need diagnostic plots, especially:

  • residuals vs fitted values
  • residual distributions.

A low R² does not by itself prove that linearity is violated.

Likewise, odd coefficients do not automatically prove heteroskedasticity or non-linearity.

25 Issue 25: Not analysing residuals

🧩 The problem:

Residual analysis was often missing altogether.

In other cases, a residual plot was shown but barely interpreted.

💡 The solution:

Residual analysis should be treated as a core part of model evaluation, not an optional extra.

A good discussion of a residual plot should ask:

  • are the residuals roughly centred around zero?
  • is there a visible pattern suggesting non-linearity?
  • does the spread of residuals increase or decrease with fitted values?
  • are there extreme residuals that may indicate influential observations?

Residual diagnostics complement metrics. They do not replace them, and metrics do not replace them either.
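The checks in that list can also be computed numerically alongside the plot; a minimal sketch on simulated data (in a notebook you would plot residuals against fitted values as well):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(0, 0.3, 100)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Roughly centred around zero? (exact for OLS with an intercept)
mean_resid = residuals.mean()

# Does the spread change with fitted values? Compare the low and high
# halves of the fitted-value range.
order = np.argsort(fitted)
spread_low = residuals[order[:50]].std()
spread_high = residuals[order[50:]].std()

# Any extreme residuals (more than 3 standard deviations out)?
n_extreme = int((np.abs(residuals) > 3 * residuals.std()).sum())
```
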

26 Issue 26: Arbitrary regularisation parameters

🧩 The problem:

For Ridge and LASSO, many submissions set alpha to a value such as 0.1, 0.6, or 1.0 without justification.

That makes the modelling choice look arbitrary.

💡 The solution:

Even if cross-validation has not yet been taught formally, you could still perform a parameter sweep by testing a range of values and checking how:

  • coefficients change
  • train/test performance changes.

That would already be better than choosing one value without explanation.
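Such a sweep might look like this (illustrative simulated data; the alpha grid is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -1.0, 0.0, 0.5]) + rng.normal(0, 0.5, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Try a range of alphas and record test performance and coefficient size,
# instead of committing to a single unexplained value.
results = {}
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    results[alpha] = {
        "test_score": model.score(X_te, y_te),
        "coef_norm": float(np.linalg.norm(model.coef_)),
    }
```

Inspecting how `coef_norm` shrinks and how `test_score` moves across the grid already gives a defensible basis for the final choice.
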

Best practice, once known, is to use cross-validation to tune the parameter.

If you do use cross-validation later, you should also explain:

  • what the folds mean
  • how the number of folds affects the stability of the estimate.

27 Issue 27: Not explaining why Ridge or LASSO was chosen

🧩 The problem:

Some submissions ran Ridge and/or LASSO because those methods appeared in class, but without explaining why they made sense for this dataset.

Simply applying a method because it exists is not sufficient justification.

Model choices should be connected to the characteristics of the dataset and the modelling problem.

💡 The solution:

The choice between OLS, Ridge, and LASSO should be explicitly motivated.

For example:

I used Ridge regression because the dataset contains a relatively small number of observations and several predictors that may share information. In this context, shrinking coefficients is preferable to dropping variables too aggressively as LASSO might.

This explanation works because it connects the modelling choice to:

  • the limited sample size
  • the potential overlap between predictors
  • the desire to retain information rather than eliminate predictors entirely.

By contrast, LASSO might be more appropriate when there is reason to believe that some predictors are not contributing meaningful information and variable selection would therefore be helpful.

The key point is that the model choice should be anchored in the properties of the dataset, not simply applied mechanically.

28 Issue 28: Comparing coefficients across models that were estimated on different scales

🧩 The problem:

Some submissions compared coefficients across models even though some models used scaled predictors and others did not.

This creates a problem because coefficients from scaled and unscaled models are not expressed in the same units.

When predictors are standardised:

  • coefficients measure the change in the outcome associated with a one standard deviation change in the predictor.

When predictors are not scaled:

  • coefficients measure the change in the outcome associated with a one unit change in the predictor.

These two quantities are not directly comparable.

As a result, differences in coefficient magnitude may simply reflect differences in scaling, not meaningful differences in model behaviour.

💡 The solution:

If you want to compare coefficients across models, you must ensure that they are expressed on comparable scales.

There are two main options:

  1. Scale predictors consistently across all models before fitting them.

  2. If models are estimated on different scales, explicitly acknowledge this and avoid treating the coefficients as directly comparable.

Without addressing the scaling issue, coefficient comparisons can easily lead to incorrect conclusions about which predictors are more important.
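A minimal sketch of option 1, scaling consistently across models with a pipeline (illustrative simulated data with deliberately mismatched predictor scales):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([1.0, 100.0, 0.01])  # very different scales
y = X @ np.array([1.0, 0.01, 50.0]) + rng.normal(0, 0.5, 50)

# Both models see predictors standardised the same way, so their
# coefficients are on comparable (per standard deviation) scales.
ols = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)

ols_coef = ols.named_steps["linearregression"].coef_
ridge_coef = ridge.named_steps["ridge"].coef_
```
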

29 Issue 29: Reporting regularised model results without interpreting them

🧩 The problem:

Many notebooks ran Ridge and/or LASSO and then reported the outputs without properly discussing:

  • whether predictive performance improved
  • whether coefficients became more stable
  • whether the regularisation actually helped.

💡 The solution:

When presenting regularised models, do not stop at the output.

Explain:

  • whether performance improved on the test set
  • whether the shrinkage changed interpretation materially
  • whether the regularised model seems more appropriate than baseline OLS.

A regularised model is only useful if you explain what problem it was meant to solve and whether it did so.

30 Issue 30: Missing or underdeveloped model comparison

🧩 The problem:

A common final issue was weak comparison between baseline and regularised models.

Some students reported the numbers but did not synthesise the overall lesson.

💡 The solution:

At the comparison stage, step back and answer the big question:

  • which model predicts best?
  • which model generalises best?
  • which model is easier to interpret?
  • what do the results suggest about the underlying data?

For example, if OLS, Ridge, and LASSO all perform similarly poorly, then the main issue is probably not the estimator but the limited predictive signal and the difficulty of the problem.

That is the kind of conclusion the comparison section should deliver.

31 Final note

Across all questions, the same general principle applies:

code is not enough. You need to explain:

  • what you did
  • why you did it
  • what assumptions it relies on
  • and what the results mean in context.

The strongest submissions were not necessarily the ones with the most code, but the ones where the code, reasoning, and interpretation worked together clearly.
