Mastering recessions part I: Unveiling the importance of recession risk

June 6, 2023

Dessert First

In the ever-changing landscape of the economy, recession risk looms large, capturing the attention of investors and economists alike. As macro observers closely monitor various indicators, there is a growing consensus that recession risk has become increasingly prominent in recent times.

To shed light on this pressing issue, we have devised a machine-learning approach to estimate recession risk using logistic modeling. By leveraging historical data and a multitude of predictors, our model enables us to gauge the likelihood of an impending economic downturn. Excitingly, our initial foray into recession risk modeling has yielded promising results as can be observed in the figure below. The model currently points to heightened recession risk for the next 6 months.

Building on this, we are also thrilled to announce that our next blog post will delve into the creation of a systematic allocation model based on our recession risk estimation enhanced with other factors. However, our journey will not end here. In future explorations, we plan to venture into testing other models, including powerful tree ensembles, to further enhance our predictions.

Preliminary recession analysis

  • In economics, a recession is generally defined as a significant decline in economic activity that lasts for a sustained period. While there is no universally agreed-upon definition, recessions are typically characterized by a contraction in major economic indicators such as gross domestic product, employment, industrial production, and consumer spending. The National Bureau of Economic Research (NBER) is a widely recognized authority in the United States that determines the official start and end dates of recessions.

  • The chart below shows the evolution of the S&P500 price index on a log scale together with an overlay of official recession periods according to the NBER definition. Since 1927, we have observed a recession around 16% of the time, or every 6.25 years. However, since 1979 - which will be the start of our recession model - recessions have become scarcer and have only occurred around 11% of the time, or every 9 years. Recessions last on average 12 months.

  • Equity returns are substantially lower during recession months. Since 1927, the average monthly log return has been -0.66% during recessions compared to 1.2% outside of recessions. This difference is statistically significant at the 5% level (t-stat of -4.3). However, the average is skewed by a few big bear markets, and some recessions have even witnessed positive equity returns. It’s one thing to call a recession; it’s a whole other endeavor to estimate the market's reaction to it! So you had better think twice before letting fear of recessions dominate your portfolio decisions.

  • When we dissect recessions in more detail (table below), we notice that the S&P500 index often starts declining as early as 6 months before the official start of a recession. The biggest decline occurs during the first half of the recession. In the second half, the market usually already starts a remarkable recovery. Hence the saying: markets are forward-looking by nature.

  • So what if we rely on the NBER to get us out of the market? The problem is that the NBER takes its time to date a recession. During the past 6 recessions, it took the NBER committee an average of 8 months after the first negative GDP reading to announce the start of a recession. In 2008 it took them a year. The NBER also determines the end of recessions, and this often takes even longer (a year on average). This leads to the hilarious situation where the start date of a recession is often announced when the recession is in reality already over and the markets are in recovery mode.
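
The return comparison in the third bullet can be illustrated with a two-sample t-statistic. The snippet below is only a minimal sketch in plain Python: the return values are made-up placeholders (not S&P 500 data), the function name `welch_t` is hypothetical, and the choice of Welch's unequal-variance form is an assumption, since the exact test behind the reported t-stat is not specified here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference in means of two samples."""
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Placeholder monthly log returns, for illustration only.
recession_returns = [-0.05, 0.02, -0.08, -0.01, 0.03, -0.06]
expansion_returns = [0.02, 0.01, -0.01, 0.03, 0.02, 0.00, 0.04, 0.01]

t_stat = welch_t(recession_returns, expansion_returns)
print(f"t-stat: {t_stat:.2f}")
```

A negative t-statistic means the first sample (recession months) has the lower average return; how negative it must be to count as significant depends on the sample sizes.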

Do we have to act on recessions?

For investors without a data-driven approach, it’s probably safe and wise to do absolutely nothing and just ride it out. First of all, calling recessions is quite difficult. Secondly, we know from the past that even if a recession occurs, the damage might not be that large at all. Finally, when are you ever going to get back in? Relying on the official NBER publication dates to make allocation decisions might be severely damaging to your investment portfolio. 

We devise a purely data-driven methodology that beats the NBER approach. We implement a rule-based framework that will guide us through recessionary and potentially dangerous periods. We need to construct two models:

  1. Firstly, we require a model that computes recession probabilities for the future. Preferably, the model starts warning us at least 6 months before the start of an official recession (which the NBER will only call much later). We would like to feed our model numerous candidate recession predictors from which it can choose the best ones to make actual predictions.

  2. We need a rule-based trading model that takes us out of the market if things really go haywire. We have a couple of options:
  • We switch to a safe asset as soon as the probability of recession crosses a certain threshold. But what if the market hardly declines or even rises as we have seen in the past?
  • We switch to trend mode as soon as the probability of recession crosses a certain threshold. In that case, the recession probability just acts as a warning signal to stay on high alert, and it takes an actual negative price trend to switch to a safer asset. This method might make more sense as estimating the market damage from recessions is next to impossible. The downside of trend models is of course potentially choppy performance in a zig-zagging bear market and a lagged reaction to market declines. This could be partially solved by using Hidden Markov Regimes (we wrote about that here) but for the objective of this article, we will stick with a shorter-term trend signal.
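
The two options above can be sketched as simple decision rules. This is a hedged illustration, not the allocation model itself (that is left for the next post): the function names, the 0.5 threshold, and the use of "price below its moving average" as the trend signal are all assumptions made for the example.

```python
def probability_only_rule(recession_prob, threshold=0.5):
    """Option 1: leave equities as soon as the recession probability crosses the threshold."""
    return "safe" if recession_prob >= threshold else "equity"


def probability_plus_trend_rule(recession_prob, price, trend_level, threshold=0.5):
    """Option 2: the probability only raises an alert; a negative trend triggers the switch."""
    on_alert = recession_prob >= threshold
    downtrend = price < trend_level  # assumed trend signal: price below its moving average
    return "safe" if on_alert and downtrend else "equity"


# High recession probability, but the market is still trending up.
print(probability_only_rule(0.7))                     # option 1 exits the market
print(probability_plus_trend_rule(0.7, 105.0, 100.0))  # option 2 stays invested
```

The example shows where the two rules disagree: with a high recession probability and a rising market, the first rule moves to the safe asset while the second waits for the trend to turn.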

In this blog post, we will only focus on building our recession prediction model. In a future post, we will be presenting a systematic allocation model using our estimated recession probabilities.

Logistic regression model to the rescue (ELI5)

If you are familiar with machine learning, you can safely skip this section.

To predict the probability of future recessions, we will use a logistic regression model. In future blog posts, we’ll explore if we can improve this model by using other machine-learning techniques (e.g. tree ensembles).

Imagine you are on an adventure, and you want to know if you'll encounter a rainy day six months from now. To help you make a guess, we can use some special tools called "logistic regression models" that use information about the weather patterns in the past to predict the chances of rain in the future. In our case, instead of rain, we want to predict something called a "recession." A recession is like a time when the economy slows down, and people might have a harder time finding jobs or buying things they want. So we want to find out if a recession will happen six months from now.

To make this prediction, we use monthly predictors called "macroeconomic and market-based time series." It's like looking at a big puzzle made up of different pieces that represent the economy. The financial media and some investors tend to focus on just a few popular predictors because the brain has a hard time processing lots of potential predictors. But there are so many potentially interesting predictors that might contain useful information. Luckily, a computer doesn’t suffer from information overload. It can process lots of predictors quite easily. That’s why we will feed it dozens of candidate predictors. These pieces can include things like the overall growth of the country, how much people are earning, how many jobs are available, and other important factors that show how the economy and the financial markets are doing.

Now imagine you have a magic box that takes all this information about the economy and predicts whether or not a recession will happen. Inside the magic box, there are special mathematical equations that analyze past data and learn patterns. It tries to understand how different puzzle pieces fit together to give us a clue about the future. Maybe it finds a couple of interesting patterns such as a higher future recession occurrence when cyclical manufacturing jobs start to decline or when consumer spending starts to slow down. It will take these patterns to try to compute better recession probabilities than random ones.

The magic box uses these patterns to calculate the chances of a recession happening in six months. It doesn't give a simple "yes" or "no" answer like a regular calculator. Instead, it gives us a number between 0 and 1, which represents the probability or likelihood of a recession happening. If the number is close to 0, it means a recession is unlikely, but if it's closer to 1, it means there's a higher chance of a recession.
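
For readers who want to peek inside the magic box: a logistic regression computes a weighted sum of the predictors and squashes it through the logistic (sigmoid) function so the result always lands between 0 and 1. The sketch below uses made-up coefficients and predictor values purely for illustration; the function name `recession_probability` is hypothetical and none of the numbers come from the fitted model.

```python
from math import exp


def recession_probability(intercept, coefficients, predictors):
    """Map a weighted sum of predictor values to a probability between 0 and 1."""
    score = intercept + sum(c * x for c, x in zip(coefficients, predictors))
    # Numerically stable form of the logistic function 1 / (1 + exp(-score)).
    if score >= 0:
        return 1.0 / (1.0 + exp(-score))
    e = exp(score)
    return e / (1.0 + e)


# Made-up coefficients and predictor values, for illustration only.
p = recession_probability(-2.0, [1.5, -0.8], [1.0, 0.5])
print(f"probability of recession within 6 months: {p:.2f}")
```

A score of zero maps to exactly 0.5; large positive scores approach 1 and large negative scores approach 0.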

Using a logistic regression model with macroeconomic time series is a sound option because it allows us to use historical information about the economy to make predictions about the future. By looking at how the puzzle pieces of the economy fit together in the past, we can try to understand what might happen next. Of course, it's important to remember that predictions are not always perfect, just like our weather forecast isn't always right. But by using this special tool, we can have a better idea of what might happen and make more informed decisions.

So, just like a weather forecaster who uses past weather patterns to predict future rain, we can use a logistic regression model with macroeconomic time series to help us guess if a recession will happen six months from now. It's like having a little helper who uses math and history to give us a clue about what the future might hold.

Time to get to work: data & preprocessing

Before running the logistic model we create a base table containing all the predictors and the target variable.

  • We use Python for modeling. 
  • We select monthly data (533 observations) from 1970-01 till 2022-11. 
  • We select a total of 53 candidate predictors. This sampling is based on mentions in several academic papers on recession forecasting and empirical use by global macro strategists. We employ predictors that cover all aspects of economic activity and use market-based indicators as well. If you would like to consult the full detail of our candidate predictors, feel free to reach out to us. To give the reader a first idea we list only one predictor per domain below:
  1. Markets - Oil: The year-on-year percent change in the real oil price.
  2. Markets - Rates: The 12-month difference in the Baa spread.
  3. Markets - Financial conditions: The Chicago Fed National Financial Conditions Index.
  4. Macroeconomic - Composite: The Dieli Enhanced Aggregate Spread.
  5. Macroeconomic - Prices: The PPI Finished Goods Index.
  6. Macroeconomic - Production: ISM Manufacturing New Orders Index.
  7. Macroeconomic - Labour: The Conference Board Employment Trends Index.
  8. Macroeconomic - Consumer: The year-on-year percent change in Heavy Weight Truck Sales.
  9. Macroeconomic - Housing: The year-on-year change in Housing Permits.
  10. Macroeconomic - Monetary: The 12-month difference in the Real Effective Funds Rate.
  • We import most time series from FRED - the economic database of the Federal Reserve Bank of St. Louis - and the Bloomberg terminal. All series are shifted for correct point-in-time usage: we avoid data leakage introduced by macro publication dates by appropriately lagging our predictors. We also test each predictor for stationarity and, where needed, transform series into percent changes or absolute changes to make them stationary.
  • Our target feature at every month t is 1 if any month during the next 6 months was an official NBER recession month. Else, the target is 0.
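
The target definition and the publication-lag shift described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: "the next 6 months" is taken to mean months t+1 through t+6 (excluding month t itself), months whose 6-month future is not fully observed get no label, and the helper names `forward_recession_target` and `lag_series` are hypothetical.

```python
def forward_recession_target(recession_flags, horizon=6):
    """Label month t with 1 if any of months t+1..t+horizon is a recession month."""
    target = []
    for t in range(len(recession_flags)):
        window = recession_flags[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            target.append(None)  # the future window is not fully observed yet
        else:
            target.append(1 if any(window) else 0)
    return target


def lag_series(values, lag):
    """Shift a series forward so month t only sees the value from `lag` months earlier."""
    return [None] * lag + values[: len(values) - lag]


# Toy monthly recession flags (1 = NBER recession month), for illustration only.
flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(forward_recession_target(flags))
print(lag_series([1.0, 2.0, 3.0, 4.0], lag=1))
```

Note that the labels for the most recent months stay empty, which is why the latest predictions can only be evaluated once enough time has passed.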

Training & evaluation

Training data

We use walk-forward training with an expanding monthly training data window and monthly evaluation. We train the logit model for the first time in 1999-12 using the training data available at that time between 1970-01 and 1999-06 (19.5 years). We then expand our training set by one month at a time until we reach our end date of 2022-11. An expanding training window lets the model learn from each new month of information before it makes its next prediction.
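
To make the walk-forward schedule concrete, the sketch below enumerates the (training end, prediction month) pairs implied by the dates above: the first window ends in 1999-06 and predicts in 1999-12, and both move forward one month at a time until the prediction month reaches 2022-11. The helper names and the representation of months as (year, month) tuples are assumptions made for the example.

```python
def add_months(year, month, n):
    """Return the (year, month) that lies n months after the given month."""
    idx = year * 12 + (month - 1) + n
    return idx // 12, idx % 12 + 1


def walk_forward_schedule(first_train_end, last_predict, gap=6):
    """List (train_end, predict_month) pairs for an expanding training window."""
    schedule = []
    train_end = first_train_end
    while True:
        predict = add_months(train_end[0], train_end[1], gap)
        if predict > last_predict:
            break
        schedule.append((train_end, predict))
        train_end = add_months(train_end[0], train_end[1], 1)  # expand by one month
    return schedule


schedule = walk_forward_schedule((1999, 6), (2022, 11))
print(schedule[0], schedule[-1], len(schedule))
```

The 6-month gap between the end of the training window and the prediction month reflects that the target for a given month is only known 6 months later.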

Using all 53 predictors to predict would surely lead to overfitting. We, therefore, implement forward stepwise selection during each training window: 

  • After a training period, we want to know what the best combination of predictors would have been. So each training period, we look for the first best predictor out of the 53 candidate predictors. We then search for the next best predictor in combination with the first one. We subsequently search for the third-best predictor in combination with the first two. We iterate through all candidate predictors to end up with a sorted list of predictors. (This is also known as forward stepwise selection.)
  • To determine the quality (“best”) of predictor combinations we use the Area Under the Precision-Recall Curve (AUPRC). Typically, the AUC score (area under the ROC curve) is used for evaluating logit models. However, when it’s more important to prioritize positive findings (finding as many recession periods as possible), it’s advisable to use the AUPRC.
  • We set a threshold of 10 which means we will only select the 10 best predictors from the list.
  • Using the 10 best predictors only - resulting from the forward stepwise selection during the training period - we then retrain the model on the same training window. We use this logistic regression equation output to predict in month t (using the predictor values available in month t) the probability of recession within 6 months' time.
  • In summary: Our initial training set starts from 1979-01 (start date) to 1999-06 (expanding end date). The training set expands monthly thereafter. During each training window we select the combination of the 10 best predictors, retrain the model on those 10 predictors only and predict in the 6th month following the expanding end date, using the available features in that month.
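
The selection loop described above can be sketched as a greedy search. In this hedged example, `score_fn` is a hypothetical stand-in for "fit the logit model on these predictors over the training window and return its AUPRC"; here it is replaced by a toy additive score with made-up values so the snippet runs on its own. The sketch stops once the cut-off is reached instead of ranking all candidates first, which yields the same top predictors because the search is greedy.

```python
def forward_stepwise(candidates, score_fn, max_features=10):
    """Greedily add the predictor that most improves the score of the current set."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Score every remaining candidate in combination with those already chosen.
        best = max(remaining, key=lambda name: score_fn(selected + [name]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy scores per predictor (made-up values, not model output).
toy_value = {"oil": 0.05, "baa_spread": 0.25, "permits": 0.30, "truck_sales": 0.10}


def toy_score(features):
    """Placeholder for fitting a model and measuring its AUPRC."""
    return sum(toy_value[f] for f in features)


ranked = forward_stepwise(toy_value, toy_score, max_features=3)
print(ranked)
```

With a real model behind `score_fn`, the contribution of a predictor depends on which predictors are already selected, which is exactly why the search is done step by step rather than by ranking predictors individually.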

Out-of-sample test data

  • We predict the probability of a recession happening in 6 months' time for the first time in 1999-12, using the output of our initial training period (1970-01 till 1999-06 ex). We then roll a month to predict again in 2000-01, using the output of our expanding training window (1970-01 - 1999-07). We iterate through all months until 2022-11. 

Latest out of sample data

  • For the last 6 months, 2022-12 till 2023-05, we predict the probability of a recession happening within 6 months' time using the last logistic regression output from 2021-11. These are real-time forecasts as the occurrence of a recession is not known yet.


After running the complete model we check popular evaluation methods and conclude the model is of decent quality.

  • The AUPRC of our model is 0.49. This score indicates that the logistic model performs reasonably well in predicting recessions out-of-sample. The baseline of our AUPRC is equal to the target incidence of 0.16, so an AUPRC of 0.49 means the model performed about 3x better than the baseline prediction.
  • If we set the threshold to 50% (meaning that we equate the recession probability to 100% as soon as 50% is reached and 0% otherwise):
  • The Precision score is 51%, meaning the model correctly predicted (out-of-sample) 51% of the positive recession months among all predictions of positive recession months.
  • The Recall score (sensitivity or true positive rate) is 67%, meaning the model correctly predicted 67% of all actual recession months.
  • Remember that the target incidence is only 16% in our sample. Recessions are quite rare, so these scores are quite satisfying to us for a starting model which we haven’t tried to improve yet (that’s for another blog post).
  • Finally, we present the cumulative gains curve. We are only interested in the predictive quality of predicting actual recessions (orange line). The orange line shows that for example, the 20% of the out-of-sample observations (months) with the highest predicted probability of recession contain 80% of the actual recession months. In a baseline model, this would only be 20%.
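
The evaluation measures in this list can be computed with a few small functions. The sketch below is plain Python on toy labels and probabilities (made-up values, not our model output); the function names are hypothetical, and the AUPRC is approximated by average precision, a common step-wise estimate of the area under the precision-recall curve that assumes no tied probabilities.

```python
def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall after turning probabilities into 0/1 calls."""
    pred = [p >= threshold for p in y_prob]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def ranked_indices(y_prob):
    """Observation indices sorted from highest to lowest predicted probability."""
    return sorted(range(len(y_prob)), key=lambda i: y_prob[i], reverse=True)


def average_precision(y_true, y_prob):
    """Mean of the precision values at the rank of each actual positive."""
    hits, total = 0, 0.0
    for rank, i in enumerate(ranked_indices(y_prob), start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def cumulative_gain(y_true, y_prob, top_fraction):
    """Share of all positives found in the top fraction of ranked observations."""
    k = int(round(top_fraction * len(y_true)))
    captured = sum(y_true[i] for i in ranked_indices(y_prob)[:k])
    return captured / sum(y_true)


# Toy data: 3 positives out of 10 months, so the baseline AUPRC is 0.3.
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.10, 0.40, 0.80, 0.60, 0.70, 0.30, 0.20, 0.05, 0.15, 0.12]

precision, recall = precision_recall(y_true, y_prob)
ap = average_precision(y_true, y_prob)
gain_top_half = cumulative_gain(y_true, y_prob, 0.5)
print(precision, recall, ap, gain_top_half)
```

Comparing the average precision with the share of positives in the sample gives the same kind of "times better than baseline" reading as in the first bullet.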

Visualisation of recession predictions

The chart below presents the output of the model and plots the predicted probabilities of recession within 6 months versus the actual NBER recession months. The chart shows the fitted probabilities for the initial in-sample training period in red and then plots the probabilities for our actual out-of-sample testing period in black. We observe that the probability of recession started rising noticeably before the onset of the last 3 recessions in our out-of-sample data. Finally, in green, we plot the real-time forecasts for the last 6 months. The model has warned of a higher than 50% probability of a US recession within 6 months since April 2023.


To shed light on the current recession fear, we have devised a machine-learning approach to estimate recession risk using logistic modeling. By leveraging historical data and a multitude of predictors, our model enables us to gauge the likelihood of an impending economic downturn. Excitingly, our initial foray into recession risk modeling has yielded promising results so far. The model currently points to heightened recession risk for the next 6 months.

Building on this, in our next blog post we will delve into the creation of a systematic allocation model based on our recession risk estimation enhanced with other factors. However, our journey will not end here. In future explorations, we plan to venture into testing other models, including powerful tree ensembles, to further enhance our predictions.