W08 Summative
2025/26 Winter Term
Due Date:
- 11 March 2026 at 5pm (London time)
If you update your files on GitHub after this date without an authorised extension, you will receive a late submission penalty.
Did you have an extenuating circumstance and need an extension? Send an e-mail to
Main Objectives:
- Demonstrate your ability to write a report in Quarto Markdown
- Demonstrate your ability to fit a linear/logistic regression model
- Demonstrate your ability to interpret and evaluate the performance of a linear/logistic regression model
- Demonstrate your understanding of supervised learning techniques
- Demonstrate your ability to defend your model choices
Assignment Weight:
This assignment is worth 30% of your final grade in this course.
“Your candidate number is a unique five digit number that ensures that your work is marked anonymously. It is different to your student number and will change every year. Candidate numbers can be accessed using LSE for You.”
Source: LSE
Instructions
- Go to our Slack workspace’s `#ds202w-central` channel to find a GitHub Classroom link. Do not share this link with anyone outside this course!
- Click on the link, sign in to GitHub and then click on the green Accept this assignment button.
- You will be redirected to a new private repository created just for you. The repository will be named `ds202w-2025-2026-w08-summative--yourusername`, where `yourusername` is your GitHub username. It will contain a `README.md` file with a copy of these instructions.
- Make sure you know your LSE CANDIDATE NUMBER. You will need it in the next step.
- Create your own `<CANDIDATE_NUMBER>.qmd` file with your answers, replacing the text `<CANDIDATE_NUMBER>` with your actual LSE candidate number.
You can create a .qmd file from a Jupyter notebook (i.e. an .ipynb file) as follows. Open the VSCode Terminal and make sure you are in the same directory as your Jupyter notebook (use `pwd` to check which directory you’re in and `cd` to change directory if needed). Then type the following command:
`quarto convert <CANDIDATE_NUMBER>.ipynb`
where `<CANDIDATE_NUMBER>.ipynb` is the name of the Jupyter notebook you want to convert into .qmd.
Also check out the Quarto documentation to better understand the conversion from .ipynb to .qmd.
And check out this tutorial if you want to better understand the commands you can run in your VSCode terminal (e.g. to change the current directory).
You can also use the .qmd file you used in the W01 lab as a template. Just remove anything that is not relevant to this assignment.
- Then, replace whatever is between the `---` lines at the top of your newly created .qmd file with the following:
---
title: "DS202W - W08 Summative"
author: <CANDIDATE_NUMBER>
format: html
self-contained: true
jupyter: python3
engine: jupyter
editor:
  render-on-save: true
  preview: true
---
Once again, replace the text `<CANDIDATE_NUMBER>` with your actual LSE CANDIDATE NUMBER. For example, if your candidate number is 12345, the author line should read `author: 12345`.
- Fill out the .qmd file with your answers. Use headers and code chunks to keep your work organised; this will make it easier for us to grade your work. Learn more about the basics of markdown formatting here.
- Use the `#help` channel on Slack liberally if you get stuck.
- Once you are done, click on the Render button at the top of the .qmd file. This will create an .html file with the same name as your .qmd file. For example, if your .qmd file is named `12345.qmd`, then the .html file will be named `12345.html`.
- Ensure that your .qmd code is reproducible. In other words, if we restarted VSCode and ran your notebook from scratch, from top to bottom, we would get the same results as you did.
- Push both files to your GitHub repository. You can push your changes as many times as you want before the deadline. We will only grade the last version of your assignment. Not sure how to use Git on your computer? You can always add the files via the GitHub web interface.
Read the sections Getting help and Collaborating with others at the end of this document.
“What do I submit?”
You will submit two files:
- A Quarto markdown file with the following naming convention: `<CANDIDATE_NUMBER>.qmd`, where `<CANDIDATE_NUMBER>` is your candidate number. For example, if your candidate number is 12345, then your file should be named `12345.qmd`.
- An HTML file render of the Quarto markdown file. Make sure your file is self-contained (i.e. figures are embedded within it).

To generate a render, the easiest way is to include these lines in your .qmd header:
editor:
  render-on-save: true
  preview: true
With these lines, an HTML file is generated each time you preview your document. Make sure you also have the Quarto extension installed in VSCode, so that you can preview by clicking a button at the top right corner of the VSCode window without having to use the Terminal. Also, don’t forget to add the line `self-contained: true` to your .qmd header. Otherwise your plots will not be embedded in the HTML file and will not show!

Your .qmd header should look something like this:
---
title: "W08 Summative"
author: <CANDIDATE_NUMBER>
format: html
self-contained: true
jupyter: python3
engine: jupyter
editor:
  render-on-save: true
  preview: true
---
You don’t need to click to submit anything. Your assignment will be automatically submitted when you commit AND push your changes to GitHub.
Your document should:
- Be clearly structured using section headings.
- Combine code, output, and written interpretation in a coherent way.
- Contain clearly labelled figures and tables where appropriate.
- Include sufficient explanation for a reader to understand:
  - your data construction decisions,
  - your modelling choices,
  - your evaluation strategy,
  - and your conclusions.

Code should support your analysis; it should not replace explanation.
Your tasks
What do we actually want from you?
- Model selection matters: You should make clear, justified choices about which models to use. Resist the temptation to try every model you know! State your modelling hypotheses clearly, justify your choices, and select only a few models to explore (more models does not mean a better grade!).
- Dimensionality reduction: You may use PCA, UMAP, or similar techniques if you think it helps your modelling. You must justify their use. These are optional.
- Justification is crucial: Simply presenting code without explanation will not get you high marks. In fact, simply lining up code with little or no explanation will receive a failing mark. You must justify your modelling choices (e.g., why a model is suitable for the dataset and problem context, how parameters were set), and interpret results and metrics in context.
- Evaluation: For all models, evaluate performance and justify your choice of metrics or evaluation strategy. Do not assume a single metric suffices for all conclusions.
Part 1: Predicting Next-Month Gold Returns (50 marks)
Gold is often described as a hedge against inflation and a “safe haven” during periods of financial stress. Because gold does not generate cash flows (unlike bonds or equities), its valuation is closely linked to real interest rates, inflation expectations, global uncertainty, currency movements, and financial conditions. That makes gold an interesting forecasting target: its dynamics may reflect macroeconomic conditions, risk sentiment, currency movements, and global real activity.
In this part, you will build models to predict next-month gold returns using macro-financial indicators.
1.1 Data
You are provided with two datasets:
Main monthly dataset
Contains monthly gold prices and a set of macro-financial predictors.
Weekly financial conditions dataset
Contains the Chicago Fed National Financial Conditions Index (weekly frequency).
You must integrate both datasets into a single modelling dataset.
1.2 Indicators
The economic and financial indicators available are as follows:
| Indicator (dataset variable) | Meaning | Source |
|---|---|---|
| Gold price (gold_price) | Monthly gold futures price (USD). This is used to construct the return series that you will predict. | Investing.com – Gold Futures Historical Data |
| US 10-Year Treasury Yield (us_10y_yield) | The yield investors receive on 10-year US government bonds. Higher yields increase the opportunity cost of holding non-yielding assets like gold. | FRED: GS10 |
| 10-Year Breakeven Inflation (us_10y_breakeven_inflation) | Market-implied inflation expectations derived from Treasury Inflation-Protected Securities (TIPS). | FRED: T10YIEM |
| Trade-Weighted US Dollar Index (usd_broad_index) | Broad index of US dollar strength against major trading partners. Since gold is priced in USD, exchange rate movements can affect demand and pricing. | FRED: TWEXBGSMTH |
| Brent Crude Oil Price (brent_price) | Global oil benchmark price (USD). Often used as a proxy for inflationary pressures and global demand conditions. | FRED: POILBREUSDM |
| MOVE Index (MOVE_index) | Implied volatility in US Treasury markets. A proxy for bond market uncertainty and risk. | Investing.com – ICE BofA MOVE |
| MSCI World (msci_world) | Equity performance across developed markets. Often interpreted as a proxy for global risk appetite. | Investing.com – MSCI World |
| MSCI Emerging Markets (msci_emerging_markets) | Equity performance across emerging markets. | Investing.com – MSCI EM |
| S&P GSCI (sp_gsci) | Broad commodity index capturing global commodity price movements. | Investing.com – S&P GSCI |
| Commodity Market Volatility (commodity_equity_volatility) | News-based measure of equity market volatility specific to commodity markets. | FRED: EMVCOMMMKT |
| US Inflation Sentiment (us_inflation_expectations) | Consumer survey-based measure of expected price developments in the United States. | FRED: CSINFT02USM460S |
| Euro Area Inflation Sentiment (euro_area_inflation_expectations) | Consumer survey-based measure of expected price developments in the Euro Area. | FRED: CSINFT02EZM460S |
| World Trade Volume (world_trade_volume) | Global merchandise trade index (fixed base 2021=100). Proxy for global demand conditions. | CPB World Trade Monitor |
| World Industrial Production (world_industrial_production) | Global industrial production index. Proxy for real economic activity. | CPB World Trade Monitor |
| Global Supply Chain Pressure Index (gscpi) | Composite index measuring global supply chain disruptions. | Federal Reserve Bank of New York – GSCPI |
| National Financial Conditions Index (to construct) | Weekly index summarising credit conditions, leverage, and risk in US financial markets. Must be aggregated to monthly frequency. | Federal Reserve Bank of Chicago – NFCI |
1.3 Construction and Finalisation of the Dataset (5 marks)
1.3.1 Construct the Target Variable
Let \(P_t\) denote the gold price in month \(t\).
Define the monthly return:
\[ r_t = 100 \times \frac{P_t - P_{t-1}}{P_{t-1}} \]
Your prediction target is:
\[ r_{t+1} \]
That is, the return realised in the month following the predictors.
Alignment in practice
Each row of your modelling dataset corresponds to month \(t\). The predictors observed in month \(t\) are paired with the return realised in month \(t+1\).
Example:
- January 2020 price = 1550
- February 2020 price = 1600
\[ r_{\text{Feb 2020}} = 100 \times \frac{1600 - 1550}{1550} \approx 3.23 \]
The row containing January 2020 predictors is associated with 3.23 (i.e. with \(r_{\text{Feb 2020}}\)) as the outcome.
Construct this target variable and clearly verify that the alignment is correct.
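Below is a minimal pandas sketch of the construction and the alignment check, using toy data. It assumes the monthly dataset has a `date` column; this is an assumption, so check the actual file. It reuses the January and February 2020 prices from the example above and adds one made-up March price.

```python
import pandas as pd

# Toy monthly prices: Jan and Feb match the worked example; Mar (1580) is made up
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "gold_price": [1550.0, 1600.0, 1580.0],
}).sort_values("date").reset_index(drop=True)

# r_t = 100 * (P_t - P_{t-1}) / P_{t-1}: the return realised in month t
prev_price = df["gold_price"].shift(1)
df["gold_return"] = 100 * (df["gold_price"] - prev_price) / prev_price

# Target r_{t+1}: move each return up one row so that month t carries next month's return
df["target_next_return"] = df["gold_return"].shift(-1)

# Alignment check: the January 2020 row must hold the February 2020 return (about 3.23)
jan = df.loc[df["date"] == pd.Timestamp("2020-01-01"), "target_next_return"].iloc[0]
assert abs(jan - 100 * (1600 - 1550) / 1550) < 1e-9
print(df.round(2))
```

The first row has no \(r_t\) and the last row has no \(r_{t+1}\), so the last row cannot be used for modelling. `shift(-1)` only pairs month \(t\) with month \(t+1\) if rows are sorted and no month is missing, which is worth checking as well.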
1.3.2 Construct Real Yield
Gold is frequently analysed relative to real interest rates, not nominal ones. The real yield reflects the inflation-adjusted return on government bonds and captures the opportunity cost of holding a non-yielding asset like gold.
Construct the real yield as:
\[ \text{Real Yield}_t = \text{GS10}_t - \text{T10YIEM}_t \]
Briefly explain why this variable may be economically relevant for gold prices.
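In code, this is a single column subtraction. The sketch below uses made-up values and assumes both series are in percent, as published by FRED, so the difference is in percentage points.

```python
import pandas as pd

# Made-up values (percent) for illustration; column names follow the indicator table
df = pd.DataFrame({
    "us_10y_yield": [1.76, 1.50],
    "us_10y_breakeven_inflation": [1.74, 1.65],
})

# Real yield_t = GS10_t - T10YIEM_t (percentage points)
df["real_yield"] = df["us_10y_yield"] - df["us_10y_breakeven_inflation"]
print(df)
```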
1.3.3 Integrate Financial Conditions
The NFCI dataset is observed at weekly frequency.
Aggregate the NFCI series to monthly frequency, merge it with the main dataset, and justify your aggregation decision.
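One common choice is the within-month average of the weekly readings; another is the last reading of the month. Arguing which is more defensible is part of the task. Below is a minimal pandas sketch of the averaging option on simulated data. It assumes (hypothetically) a `date` column in both files and weekly NFCI readings dated on Fridays.

```python
import numpy as np
import pandas as pd

# Simulated weekly NFCI readings dated on Fridays (values are made up)
weekly_dates = pd.date_range("2020-01-03", "2020-03-27", freq="W-FRI")
nfci_weekly = pd.DataFrame({
    "date": weekly_dates,
    "nfci": np.linspace(-0.5, 0.2, len(weekly_dates)),
})

# Aggregate to calendar months by averaging every weekly reading dated in that month
nfci_weekly["month"] = nfci_weekly["date"].dt.to_period("M")
nfci_monthly = (
    nfci_weekly.groupby("month", as_index=False)["nfci"].mean()
    .rename(columns={"nfci": "nfci_mean"})
)

# Toy main dataset; matching on a monthly period avoids start- vs end-of-month date mismatches
main = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=3, freq="MS"),
    "gold_price": [1550.0, 1600.0, 1580.0],
})
main["month"] = main["date"].dt.to_period("M")

# validate= raises an error if either side has duplicate months
merged = main.merge(nfci_monthly, on="month", how="left", validate="one_to_one")
print(merged[["month", "gold_price", "nfci_mean"]])
```

Under this convention, a week that straddles two months counts towards the month in which its date falls. The monthly average for month \(t\) only uses readings available by the end of month \(t\), so it does not bring in information from \(t+1\).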
1.3.4 Dataset Assessment
Provide a concise assessment of your final modelling dataset:
- Time coverage
- Missing values
- Potential modelling implications
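Below is a short sketch of the checks behind such an assessment, run on a small made-up frame. The column names are borrowed from the indicator table.

```python
import numpy as np
import pandas as pd

# Made-up modelling frame with gaps (column names borrowed from the indicator table)
df = pd.DataFrame({
    "date": pd.date_range("2019-01-01", periods=6, freq="MS"),
    "gscpi": [0.1, np.nan, 0.3, 0.2, 0.4, 0.5],
    "MOVE_index": [np.nan, np.nan, 70.0, 72.0, 68.0, 75.0],
})

# Time coverage
print("Coverage:", df["date"].min().date(), "to", df["date"].max().date(), f"({len(df)} months)")

# Missing values per column, as counts and shares
missing = df.isna().sum().to_frame("n_missing")
missing["share"] = missing["n_missing"] / len(df)
print(missing)

# First month in which each series is observed (flags series that start late)
first_obs = df.set_index("date").apply(lambda s: s.first_valid_index())
print(first_obs)
```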
1.4 Data Exploration (10 marks)
Organise your code and markdown clearly.
Your exploration must include:
- Plot the evolution of gold prices over time.
- Construct and plot monthly gold returns.
- Identify the five months with the largest absolute returns and comment briefly on possible economic context.
- Compute correlations between gold returns and at least five predictors of your choice. Present them clearly (table or heatmap).
- Produce at least one multi-variable visualisation involving gold returns and two predictors (e.g. colour, size, or faceting). Interpret what it suggests.
Your interpretation should focus on economic meaning and modelling implications.
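The top-five and correlation steps can be sketched as follows. The data are random numbers, so the output carries no economic meaning, and the column names are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Random placeholder data: 60 months of returns and two predictors
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=60, freq="MS"),
    "gold_return": rng.normal(0, 4, 60),
    "real_yield": rng.normal(0.5, 0.8, 60),
    "usd_broad_index": rng.normal(110, 3, 60),
})

# Five months with the largest absolute returns (the signed return is kept for interpretation)
top5 = df.loc[df["gold_return"].abs().nlargest(5).index, ["date", "gold_return"]]
print(top5)

# Pearson correlation of gold returns with the selected predictors
corr = df[["gold_return", "real_yield", "usd_broad_index"]].corr()["gold_return"].drop("gold_return")
print(corr.round(2))
```

Whether you correlate predictors with the same-month return \(r_t\) or the next-month target \(r_{t+1}\) changes what the numbers mean. Only the latter speaks directly to forecasting, so it is worth stating which one you report.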
1.5 Modelling (23 marks)
1.5.1 Train/Test Split
Choose and justify a time-based train/test split appropriate for forecasting.
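A time-based split means that every test month comes after every training month, with no shuffling. Below is a minimal sketch on a placeholder date range; the cutoff date is purely illustrative and is exactly what you need to choose and justify.

```python
import pandas as pd

# Placeholder monthly modelling dataset (one row per month t)
df = pd.DataFrame({"date": pd.date_range("2005-01-01", "2025-12-01", freq="MS")})

# Chronological split at an illustrative cutoff: earlier months train, later months test
cutoff = pd.Timestamp("2019-01-01")
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]

# The whole test period lies strictly after the training period
assert train["date"].max() < test["date"].min()
print(len(train), "training months;", len(test), "test months")
```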
1.5.2 Baseline Model
Build a baseline linear regression model to predict next-month gold returns.
Interpret its coefficients and evaluate its performance. Justify your modelling and evaluation choices.
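Below is a hedged sketch of one way to set this up with a scikit-learn pipeline. It uses synthetic data whose rows are assumed to be in time order. RMSE and MAE, compared against a naive benchmark that always predicts the training-period mean return, are one reasonable evaluation choice, not the required one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200
# Synthetic stand-in data (rows assumed to be in time order); not the real dataset
X = pd.DataFrame({
    "real_yield": rng.normal(0.5, 1.0, n),
    "nfci_mean": rng.normal(-0.4, 0.3, n),
})
y = -0.8 * X["real_yield"] + rng.normal(0, 4, n)

# Chronological split: first 150 rows train, last 50 test, no shuffling
X_train, X_test = X.iloc[:150], X.iloc[150:]
y_train, y_test = y.iloc[:150], y.iloc[150:]

# Standardising makes each coefficient the change in predicted return per 1 SD of the predictor
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
coefs = pd.Series(model[-1].coef_, index=X.columns)
print(coefs.round(3))

# Compare with a naive benchmark that always predicts the training-period mean return
y_pred = model.predict(X_test)
y_naive = np.full(len(y_test), y_train.mean())
for name, pred in [("linear", y_pred), ("mean benchmark", y_naive)]:
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    mae = mean_absolute_error(y_test, pred)
    print(f"{name:>15}: RMSE={rmse:.3f}  MAE={mae:.3f}")

# Out-of-sample R^2 relative to the benchmark: below 0 means the model loses to the mean
r2_oos = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_naive) ** 2)
print(f"out-of-sample R^2 vs benchmark: {r2_oos:.3f}")
```

Fitting the scaler inside the pipeline means its means and standard deviations come from the training rows only.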
1.5.3 Alternative Approach
Develop and evaluate an alternative modelling approach.
An alternative may involve:
- different feature engineering decisions (e.g., lags, transformations, scaling, interaction terms),
- a different regression model suitable for continuous outcomes,
- a different validation strategy within the training period.
The objective is not to maximise performance through extensive experimentation. A small number of well-justified modelling decisions is preferable to many loosely motivated models.
Explain clearly:
- why the alternative approach was chosen,
- how it differs from the baseline,
- whether and why it improves (or does not improve) performance.
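As one illustration, the sketch below combines the third option (a validation strategy within the training period) with a different estimator. It fits a ridge regression whose penalty is chosen by expanding-window cross-validation using scikit-learn's `TimeSeriesSplit`. This is one possible alternative, not a recommended answer, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200
# Synthetic stand-in data with five correlated predictors
base = rng.normal(size=(n, 1))
X = pd.DataFrame(base + 0.3 * rng.normal(size=(n, 5)), columns=[f"x{i}" for i in range(5)])
y = pd.Series(0.5 * base[:, 0] + rng.normal(0, 2, n))

# Only the training period is used for tuning; rows 150 onwards stay untouched as the test set
X_train, y_train = X.iloc[:150], y.iloc[:150]

# Expanding-window CV: each validation block comes after the block the model was fitted on
tscv = TimeSeriesSplit(n_splits=5)
pipe = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipe,
    param_grid={"ridge__alpha": np.logspace(-2, 3, 12)},
    cv=tscv,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
print("chosen alpha:", search.best_params_["ridge__alpha"])
print("mean CV RMSE:", -search.best_score_)
```

Because the scaler sits inside the pipeline, it is re-fitted on each training fold, so validation folds never influence the scaling. The held-out test period would then be used once, to compare this model with the baseline.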
1.6 Discussion (12 marks)
Discuss:
- How predictable gold returns appear to be in your sample.
- Which predictors seem most economically meaningful.
- The limitations of your modelling strategy.
- What you would explore next if given additional time or data.
Your discussion should reflect on modelling decisions, uncertainty, and economic interpretation.
Part 2: Predicting Sovereign Credit Ratings (50 marks)
Sovereign credit ratings reflect assessments of a country’s ability and willingness to meet its debt obligations. These ratings are issued by credit rating agencies such as Standard & Poor’s, Moody’s, and Fitch. While agencies may differ slightly in methodology, ratings broadly reflect macroeconomic performance, institutional quality, fiscal sustainability, and external vulnerability.
In this part, you will work with Fitch sovereign ratings and build models to predict whether a country will be classified as Investment Grade (IG) in the following year.
The dataset is structured as a country-year panel.
2.1 Data and Indicators
The dataset merges:
- World Bank governance indicators
- World Bank macroeconomic indicators
- IMF World Economic Outlook data
- IMF Real Effective Exchange Rate data
- Fitch sovereign ratings
Each row corresponds to a country-year observation.
The indicators available are summarised below.
| Indicator (dataset variable) | Meaning | Source |
|---|---|---|
| Sovereign Rating (rating) | Letter rating assigned by Fitch (e.g. AAA, BBB, BB+, etc.). Ratings reflect the agency’s assessment of default risk. | Fitch Ratings |
| Control of Corruption (CC.EST) | Perception of the extent to which public power is exercised for private gain. Higher values indicate better governance quality. | World Bank – Worldwide Governance Indicators |
| Government Effectiveness (GE.EST) | Quality of public services and policy implementation. | World Bank – Worldwide Governance Indicators |
| Regulatory Quality (RQ.EST) | Ability of the government to formulate and implement sound policies. | World Bank – Worldwide Governance Indicators |
| Rule of Law (RL.EST) | Confidence in contract enforcement and property rights. | World Bank – Worldwide Governance Indicators |
| GDP Growth (NY.GDP.MKTP.KD.ZG) | Annual percentage growth rate of GDP at constant prices. | World Bank |
| Current Account Balance (% GDP) (BN.CAB.XOKA.GD.ZS) | Net exports plus net income and transfers, expressed as % of GDP. | World Bank |
| Trade Openness (% GDP) (NE.TRD.GNFS.ZS) | Sum of exports and imports as percentage of GDP. | World Bank |
| Gross Government Debt (% GDP) (GGXWDG_NGDP) | General government gross debt as percentage of GDP. | IMF World Economic Outlook |
| Structural Balance (% GDP) (GGSB_NPGDP) | Cyclically adjusted fiscal balance as % of GDP. | IMF World Economic Outlook |
| Inflation (PCPI) | Annual consumer price inflation (period average). | IMF World Economic Outlook |
| Real Effective Exchange Rate (REER_IX_RY2010_ACW_RCPI) | Index measuring international competitiveness relative to trading partners. | IMF REER Database |
2.2 Outcome Construction and Panel Alignment (5 marks)
Fitch ratings are expressed as ordered letter grades. A sovereign is considered Investment Grade (IG) if its rating is BBB- or above.
2.2.1 Construct a binary variable \(is_{{ig}_t}\) indicating whether a country is investment grade in year t.
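Below is a minimal pandas sketch of this step. The list of investment-grade notches follows the BBB- threshold stated above. The dash clean-up is an assumption about how the raw strings might be formatted, so inspect the actual `rating` values first. Countries and ratings are made up.

```python
import numpy as np
import pandas as pd

# Fitch long-term notches from AAA down to BBB- count as investment grade
IG_RATINGS = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-"]

# Made-up panel rows (hypothetical countries and ratings)
df = pd.DataFrame({
    "country": ["C1", "C2", "C3", "C4"],
    "year": [2015, 2015, 2015, 2015],
    "rating": ["BBB-", "BB+", " A\u2013", np.nan],
})

# Assumed clean-up: strip spaces and turn en dashes / minus signs into plain hyphens
clean = (
    df["rating"].str.strip()
    .str.replace("\u2013", "-", regex=False)
    .str.replace("\u2212", "-", regex=False)
)

# 1 = investment grade, 0 = not; a missing rating stays missing rather than becoming 0
df["is_ig"] = clean.isin(IG_RATINGS).astype("Int64").where(clean.notna())
print(df)
```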
2.2.2 Construct the forecasting target:
\[ is_{{ig}_{t+1}} \]
Each country-year observation at year \(t\) must be paired with the IG status of the same country in year \(t+1\).
2.2.3 Clearly explain how you handled:
- countries without consecutive observations,
- countries entering or exiting the sample.
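One way to enforce the "same country, year \(t+1\)" rule is to shift within each country. The label is then kept only when the next observed year is exactly \(t+1\). The sketch below uses a made-up panel. Dropping unmatched rows is one choice among several, and it is the kind of decision you are asked to explain.

```python
import pandas as pd

# Made-up panel: C1 is missing 2017; C2 leaves the sample after 2016
df = pd.DataFrame({
    "country": ["C1", "C1", "C1", "C2", "C2"],
    "year": [2015, 2016, 2018, 2015, 2016],
    "is_ig": [1, 1, 0, 0, 1],
}).sort_values(["country", "year"]).reset_index(drop=True)

g = df.groupby("country")
next_year = g["year"].shift(-1)
next_ig = g["is_ig"].shift(-1)

# Keep the label only if the next observation of the same country is exactly year t+1
df["is_ig_next"] = next_ig.where(next_year == df["year"] + 1)
print(df)

# Rows with no valid t+1 label (a gap follows, or it is the last year observed) are dropped here
model_df = df.dropna(subset=["is_ig_next"])
print(model_df)
```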
2.3 Data Exploration (10 marks)
Q2.3.1 Examine the distribution of Fitch ratings across the sample. How concentrated are ratings at the top or bottom of the scale?
Q2.3.2 Identify rating switches over time:
- Which countries experienced rating upgrades or downgrades?
- How frequent are rating changes overall?
Q2.3.3 Do all rating changes result in a change in Investment Grade status? Provide examples and discuss.
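Below is a sketch for flagging year-on-year rating changes and checking whether they cross the IG threshold, using made-up histories. Telling upgrades from downgrades needs an ordering of the full Fitch scale, which is not coded here. The sketch also compares each row with the previous observed row, so gaps in a country's years would need handling first.

```python
import pandas as pd

IG_RATINGS = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-"]

# Made-up rating histories for two hypothetical countries
df = pd.DataFrame({
    "country": ["C1"] * 4 + ["C2"] * 3,
    "year": [2015, 2016, 2017, 2018, 2015, 2016, 2017],
    "rating": ["A", "A-", "BBB", "BBB", "BBB-", "BB+", "BB+"],
}).sort_values(["country", "year"])
df["is_ig"] = df["rating"].isin(IG_RATINGS).astype(int)

# Previous observed rating and IG status within each country
g = df.groupby("country")
df["prev_rating"] = g["rating"].shift(1)
df["prev_is_ig"] = g["is_ig"].shift(1)

# Flag changes; a country's first observed year has nothing to compare with
has_prev = df["prev_rating"].notna()
df["rating_changed"] = has_prev & (df["rating"] != df["prev_rating"])
df["ig_changed"] = has_prev & (df["is_ig"] != df["prev_is_ig"])

print(df.loc[df["rating_changed"], ["country", "year", "prev_rating", "rating", "ig_changed"]])
print("Share of rating changes that cross the IG line:",
      df.loc[df["rating_changed"], "ig_changed"].mean())
```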
Q2.3.4 Compare average values of at least five macroeconomic indicators between IG and non-IG countries. Interpret the differences.
Q2.3.5 Produce at least one multi-variable visualisation involving IG status and two predictors. Interpret it.
2.4 Modelling (23 marks)
Q2.4.1 Train/test split
Choose and justify a time-based train/test split.
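In a panel, a time-based split is a split on the year column, so that every country's early years train the model and its later years test it. Below is a sketch with a made-up panel and an illustrative cutoff year.

```python
import pandas as pd

# Made-up panel: two countries observed every year from 2012 to 2017
years = list(range(2012, 2018))
df = pd.DataFrame({
    "country": ["C1"] * len(years) + ["C2"] * len(years),
    "year": years * 2,
    "is_ig_next": [1] * len(years) + [0] * len(years),
})

# Split on the calendar year, never on shuffled rows (the cutoff is illustrative)
last_train_year = 2015
train = df[df["year"] <= last_train_year]
test = df[df["year"] > last_train_year]

print(train.groupby("country")["year"].agg(["min", "max"]))
print(test.groupby("country")["year"].agg(["min", "max"]))
```

The same countries appear on both sides of such a split, so test rows are not independent of training rows. This is relevant to the discussion in 2.5.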
Q2.4.2 Baseline model
Build a baseline logistic regression model to predict next-year Investment Grade status.
Interpret its coefficients and evaluate its performance. Justify your modelling and evaluation choices.
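Below is a hedged sketch of a scaled logistic regression pipeline on synthetic data. It reports coefficients as log-odds and odds ratios, along with a few metrics that remain informative when classes are imbalanced. Predictor names are borrowed from the indicator table, but the values are simulated, and the 0.5 threshold is only a default.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, classification_report, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 400
# Simulated predictors; rows are assumed to be ordered by year
X = pd.DataFrame({
    "GE.EST": rng.normal(0, 1, n),
    "GGXWDG_NGDP": rng.normal(60, 25, n),
})
logit = 1.5 * X["GE.EST"].to_numpy() - 0.02 * (X["GGXWDG_NGDP"].to_numpy() - 60)
y = pd.Series((rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int))

# Earlier rows train, later rows test
X_train, X_test = X.iloc[:300], X.iloc[300:]
y_train, y_test = y.iloc[:300], y.iloc[300:]

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Change in log-odds of IG next year per 1 SD increase; exp() turns them into odds ratios
coefs = pd.Series(clf[-1].coef_[0], index=X.columns)
print(pd.DataFrame({"log_odds": coefs, "odds_ratio": np.exp(coefs)}).round(3))

proba = clf.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)
print("ROC AUC:", round(roc_auc_score(y_test, proba), 3))
print("Balanced accuracy:", round(balanced_accuracy_score(y_test, pred), 3))
print(classification_report(y_test, pred, digits=3))
```

scikit-learn's `LogisticRegression` applies L2 regularisation by default (`C=1.0`). When you present a model as a "baseline", this is worth stating explicitly, or changing.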
Q2.4.3 Alternative approach
Develop and evaluate an alternative modelling approach.
Justify your modelling and evaluation choices.
The goal is not to maximise raw performance at all costs, but to demonstrate coherent reasoning in model construction, evaluation, and interpretation.
2.5 Discussion (12 marks)
Discuss:
- How predictable sovereign IG status appears to be.
- Whether macroeconomic variables alone seem sufficient to explain rating dynamics.
- The implications of treating panel observations as independent.
- Limitations of your modelling approach.
- What extensions (methodological or data-related) you would consider next.
We’ve seen many models so far, and you might be tempted to show us every single model you know, in particular in the questions asking you to improve model performance. Don’t!
Resist the siren calls and make resolute model choices. Model selection is a skill! So, DO NOT TRY EVERY SINGLE MODEL UNDER THE SUN to tackle the questions. State your modelling hypotheses clearly, justify your choices and only choose a couple of models to try and solve the questions. You are obviously allowed to use dimensionality reduction techniques, e.g. PCA, MCA, FAMD or UMAP (W07), if you think they might help with your modelling (again, justify their use if you do use them!). But you don’t have to use them. This summative is mainly about supervised learning techniques.
Simply lining up code without explanation will not get you high grades. We expect you to justify your modelling choices and to explain the model results and metrics in the context of the problem you’re dealing with. For example: why did you choose a particular model in the particular context of the problem you’re solving? Why is it well suited to the dataset and problem context? How did you set its parameters?
How we will grade your work
Following the instructions carefully and competently across both parts should result in an overall mark in the 60–69 range. To obtain 70% or above, you must go beyond what is asked of you in a meaningful way. Your work must demonstrate strong modelling judgement, correct handling of temporal structure, careful preprocessing, and clear interpretation. Simply adding more code¹ or text will not get you a higher score; you need to add interesting insights or analyses to get a distinction.
You will incur a penalty if you only submit a .qmd file and not also a properly rendered .html file alongside it!
In terms of the per-part rubrics below, following the instructions carefully and competently corresponds to roughly 30–34 marks out of 50 for each part, i.e. a low-to-mid 60s overall mark.
Part 1: Gold Return Prediction (50 marks)
0–19 marks
Serious fail.
This band includes work with one or more of the following:
- Incorrect construction of next-month returns.
- Incorrect construction of real yield.
- NFCI not aggregated properly or merged incorrectly.
- Clear data leakage (e.g., using future information in predictors).
- Incorrect or inappropriate train/test split (e.g., inappropriate split in time-series context).
- Baseline linear regression missing or fundamentally mis-specified.
- Inappropriate evaluation metrics for forecasting.
- No interpretation of coefficients.
- Submission mostly code with minimal explanation even if technically correct.
- Plots missing, unlabeled, or uninterpreted.
- Major formatting issues that hinder comprehension of the content.
If the modelling design invalidates the exercise (e.g. leakage or an incorrect target), or if we can’t understand what you’re trying to do, the mark will fall in this range.
20–24 marks
Weak pass.
- Target and real yield constructed but with weaknesses or unclear alignment.
- Minor leakage risks or temporal inconsistencies.
- Train/test split present but poorly justified.
- Evaluation superficial or weakly justified.
- Coefficient interpretation incomplete or partially incorrect.
- Alternative approach minimal or not meaningfully distinct.
- Organisation or clarity issues.
- Noticeable formatting or organisation issues.
Demonstrates partial understanding but significant modelling weaknesses.
25–34 marks
Solid pass to merit.
- Target, real yield, and NFCI correctly constructed.
- No serious leakage.
- Appropriate time-based train/test split.
- Baseline linear regression correctly specified.
- Evaluation metrics appropriate and justified.
- Coefficients interpreted correctly in economic terms.
- Alternative approach implemented and evaluated coherently.
- Minor preprocessing imperfections or limited depth of justification.
- Discussion addresses predictability and limitations, but not deeply.
- Plots labelled and interpreted.
- Interpretations lack depth (e.g. contextual information) or over-claim relative to the evidence.
This band reflects technically correct, coherent modelling, but limited analytical depth or modelling sophistication.
35–39 marks
Strong distinction-level work.
- Careful handling of temporal alignment and forecasting logic.
- Explicit awareness of leakage risks and modelling assumptions.
- Clean preprocessing and feature construction.
- Evaluation thoughtfully chosen and clearly justified.
- Coefficient interpretation precise and economically meaningful.
- Alternative modelling approach meaningfully different and well motivated.
- Discussion critically examines predictability and model limitations.
- Clear, well-structured presentation.
This band reflects strong modelling judgement and analytical maturity, not just correctness.
40–50 marks
Exceptional work.
- No leakage, mis-specification, or alignment issues.
- Modelling design reflects deep understanding of forecasting constraints.
- Alternative approach adds genuine analytical value (not just a different algorithm).
- Evaluation demonstrates nuanced understanding of forecast difficulty.
- Interpretation shows insight into economic mechanisms and limitations.
- Discussion critically engages with model uncertainty and structural issues.
- Writing is precise, concise, and professional throughout.
- Evidence of model comparison reasoning grounded in forecasting theory rather than trial-and-error experimentation.
This band reflects intellectual control of the modelling problem, not merely strong technical execution.
Marks above 45 will be rare and reserved for work demonstrating exceptional analytical clarity and modelling discipline throughout.
Part 2: Sovereign Ratings Classification (50 marks)
0–19 marks
Serious fail.
Includes one or more of:
- Incorrect construction of investment-grade outcome.
- Misinterpretation of ratings transitions.
- Data leakage (e.g., using contemporaneous or future ratings in predictors).
- Incorrect handling of temporal structure.
- Logistic regression missing or fundamentally mis-specified.
- Inappropriate classification metrics.
- No interpretation of coefficients.
- Submission largely code without explanation.
- Plots missing, unlabeled, or uninterpreted.
- Major formatting issues that affect the comprehension of the content.
20–24 marks
Weak pass.
- Investment-grade variable constructed but poorly explained.
- Some preprocessing or alignment weaknesses.
- Logistic regression implemented but weakly justified.
- Evaluation superficial or partially inappropriate.
- Coefficient interpretation limited or partially incorrect.
- Alternative approach weak or incoherent.
- Noticeable formatting or organisation issues.
Shows partial understanding but significant modelling weaknesses.
25–34 marks
Solid pass to merit.
- Investment-grade target correctly constructed.
- No serious leakage.
- Logistic regression correctly specified.
- Appropriate classification metrics selected and justified.
- Coefficients interpreted correctly (log-odds and/or probability terms).
- Alternative modelling approach implemented and evaluated coherently.
- Minor modelling or justification weaknesses.
- Discussion addresses panel structure and limitations at a basic level.
- Plots labelled and interpreted.
- Interpretations lack depth (e.g. contextual information) or over-claim relative to the evidence.
Technically correct and coherent, but without strong depth of modelling insight.
35–39 marks
Strong distinction-level work.
- Careful treatment of temporal structure in panel setting.
- Clear awareness of independence assumptions and limitations.
- Thoughtful modelling design.
- Precise interpretation of logistic coefficients.
- Evaluation reflects understanding of imbalance and trade-offs.
- Alternative approach meaningfully distinct and justified.
- Discussion critically examines modelling assumptions and structural issues.
- Clear and professional presentation.
Reflects strong analytical judgement and modelling maturity.
40–50 marks
Exceptional work.
- No leakage, mis-specification, or alignment errors.
- Clear understanding of prediction vs structural interpretation.
- Insightful handling of panel structure and modelling assumptions.
- Evaluation demonstrates nuanced understanding of classification trade-offs.
- Interpretation connects results meaningfully to economic reasoning.
- Discussion critically engages with rating dynamics and modelling limits.
- Writing precise, rigorous, and disciplined throughout.
- Clear distinction between predictive modelling and structural inference, with disciplined interpretation of results.
Marks above 45 will be rare and reserved for work demonstrating exceptional analytical clarity and modelling discipline throughout.
Getting help
You can post general coding questions on Slack but should not reveal code that is part of your solution.
For example, you can ask:
- “Does anyone know how I can create a logistic regression in `scikit-learn` with a `Pipeline`?”
- “Has anyone figured out how to do time-aware cross-validation?”
- “I tried using something like `df.query("Date > '1997-05-06'")` but then I got an error” (reproducible example)
- “Does anyone know how I can create a new variable that is the sum of two other variables?”
You are allowed to share “aesthetic” elements of your code if they are not part of the core of the solution. For example, suppose you find a really cool new way to generate a plot. You can share the code for the plot, using a generic df as the data frame, but you should not share the code for the data wrangling that led to the creation of df.
If we find that you posted something on Slack that violates this principle without realising it, you won’t be penalised for it (don’t worry), but we will delete your message and let you know.
Collaborating with others
You are allowed to discuss the assignment with others, work alongside each other, and help each other. However, you cannot share or copy code from others; pretty much the same rules as above apply. You should also make sure that your analytical pipeline remains distinct from those of other students.
Using AI help?
You can use generative AI tools such as ChatGPT for this assignment, and you can search online for help. If you use such a tool, however minimally, you must report which tool you used and add an extra section to your notebook explaining how much you used it.
Note that while these tools can be helpful, they tend to generate responses that sound convincing but are not necessarily correct. They also tend to produce formulaic and repetitive responses, which limits your chances of getting a high mark. When it comes to coding, they tend to generate code that is inefficient or outdated and does not follow the principles we teach in this course.
To see examples of how to report the use of AI tools, see Our Generative AI policy.
Footnotes
¹ Hint: don’t just write code, especially uncommented chunks of code. It won’t get you very far. Submissions consisting largely of code with little interpretation will receive a low mark, even if the code runs. You need to explain the code results, interpret them and put them in context.
