# Multivariate GARCH with Python and Tensorflow

## Introduction

In an earlier article, we discussed how to replace the conditional Gaussian assumption in a traditional GARCH model. While such tweaks are a good start, they are far from sufficient for actual applications.

One primary limitation is the obvious restriction to one-dimensional time-series. In reality, however, we are typically dealing with multiple time-series. Thus, a multivariate GARCH model would be much more appropriate.

Technically, we could fit a separate GARCH model for each series and handle interdependencies afterwards. As long as correlations between the time-series can be presumed constant, this can be a valid and straightforward solution. Once correlation becomes dynamic, however, we could lose important information that way.

As a motivating example, consider stock market returns of correlated assets. It is a commonly observed phenomenon that the correlation between asset returns tends to increase sharply during times of crisis. Given such convincing evidence, ignoring these dynamics would be rather unreasonable.

Multivariate GARCH models, in particular models for dynamic conditional correlation (DCC), are what we need in this case. The DCC model dates back to the early 2000s, starting with a seminal paper by Robert Engle. For this article, we will closely follow his notation.

## From GARCH to multivariate GARCH and DCC

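Remember that, for a univariate Normal GARCH(1,1) model, we have the following formulas:

$$u_t \mid \mathcal{F}_{t-1} \sim \mathcal{N}\left(0, \sigma_t^2\right)$$

$$\sigma_t^2 = \omega + \alpha\, u_{t-1}^2 + \beta\, \sigma_{t-1}^2, \qquad \omega > 0,\ \alpha, \beta \ge 0,$$

where $\mathcal{F}_{t-1}$ denotes the information available up to time $t-1$. The additional condition $\alpha + \beta < 1$ ensures covariance stationarity.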

For a deeper look at GARCH and its predecessor ARCH, I recommend reading the original papers (ARCH, GARCH).
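The recursion is easier to follow in code. Below is a minimal NumPy sketch of the GARCH(1,1) variance equation and its Gaussian log-likelihood. The function names and the choice to start the recursion at the unconditional variance $\omega/(1-\alpha-\beta)$ are illustrative assumptions. They are not part of the original implementation.

```python
import numpy as np


def garch11_variance(u, omega, alpha, beta):
    """Conditional variances sigma_t^2 = omega + alpha * u_{t-1}^2 + beta * sigma_{t-1}^2."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    u = np.asarray(u, dtype=float)
    sigma2 = np.empty(len(u))
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(1, len(u)):
        sigma2[t] = omega + alpha * u[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def garch11_loglik(u, omega, alpha, beta):
    """Gaussian log-likelihood: sum over t of log N(u_t | 0, sigma_t^2)."""
    u = np.asarray(u, dtype=float)
    sigma2 = garch11_variance(u, omega, alpha, beta)
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + u**2 / sigma2)


u = np.array([0.5, -1.0, 0.2])
print(garch11_variance(u, omega=0.1, alpha=0.1, beta=0.8))  # approx. [1.0, 0.925, 0.94]
print(garch11_loglik(u, omega=0.1, alpha=0.1, beta=0.8))
```

Each variance depends on the previous one, so the series has to be processed in order.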

Over the years, numerous extensions have been proposed to address the shortcomings of this base model, for example:

• FIGARCH to model long memory of shocks in the conditional variance equation
• EGARCH for asymmetric effects of positive and negative shocks in the conditional variance
• ...and various approaches to make the conditional variance term non-linear

As we will see, all these variations of univariate GARCH can serve as building blocks of a multivariate GARCH model of the DCC type.

### General introduction to multivariate GARCH

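First, let us introduce a bivariate, mean-zero random variable

$$u_t = \begin{pmatrix} u_{1,t} \\ u_{2,t} \end{pmatrix}$$

with (conditional) covariance matrix

$$\Sigma_t = \begin{pmatrix} \sigma_{1,t}^2 & \sigma_{12,t} \\ \sigma_{12,t} & \sigma_{2,t}^2 \end{pmatrix}.$$

In addition, we define

$$\eta_t = u_t u_t^\top = \begin{pmatrix} u_{1,t}^2 & u_{1,t}u_{2,t} \\ u_{1,t}u_{2,t} & u_{2,t}^2 \end{pmatrix}.$$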

It can easily be seen that this matrix generalizes the squared observation term from the univariate GARCH model.
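A quick Monte Carlo check shows the analogy. For mean-zero $u_t$, $\eta_t$ is a noisy, rank-one proxy whose average recovers the covariance matrix. In the univariate case, $u_t^2$ plays the same role and averages to $\sigma_t^2$. The covariance matrix and sample size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])  # arbitrary example covariance matrix

# Draw many mean-zero bivariate observations u_t ~ N(0, Sigma)
u = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=200_000)

# eta_t = u_t u_t^T for every draw, shape (n, 2, 2)
eta = np.einsum("ti,tj->tij", u, u)

# Each eta_t is a noisy rank-one matrix; on average it recovers Sigma,
# just as E[u_t^2] = sigma_t^2 in the univariate case.
print(eta.mean(axis=0))
```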

We could now generalize this to higher-dimensional random variables and higher lag orders. For convenience, however, let us stick with the bivariate case and a single lag.

Our goal then is to find an explicit formula that models how the covariance matrix depends on the past. For this, we follow the tradition of GARCH models. That is, we let the conditional covariance depend linearly on past covariances and on past realizations of the actual random variables.

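Notice that the obvious linear transformation

$$\operatorname{vec}(\Sigma_t) = c + A\,\operatorname{vec}(\eta_{t-1}) + B\,\operatorname{vec}(\Sigma_{t-1}),$$

where $\operatorname{vec}$ stacks the columns of a matrix into a vector, $c \in \mathbb{R}^4$ and $A, B \in \mathbb{R}^{4 \times 4}$, would be reasonable but highly inefficient for higher dimensions. Every additional lag of $\eta_t$ or $\Sigma_t$ adds another $4 \times 4$ coefficient matrix, i.e. 16 free parameters. A lag-5 model (five lags of both $\eta_t$ and $\Sigma_t$) would therefore already have $4 + 10 \cdot 16 = 164$ free parameters, even in the bivariate case. For daily time-series, this is more than half a year's worth of data.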

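As a first restriction, it makes sense to avoid redundancies due to the symmetry of the covariance matrix. We introduce the following operation:

$$\operatorname{vech}\begin{pmatrix} m_{11} & m_{12} \\ m_{12} & m_{22} \end{pmatrix} = \begin{pmatrix} m_{11} \\ m_{12} \\ m_{22} \end{pmatrix}$$

Put simply, we stack all elements of the matrix into a vector while removing duplicates. This allows the following simplification of our initial multivariate GARCH model:

$$\operatorname{vech}(\Sigma_t) = c + A\,\operatorname{vech}(\eta_{t-1}) + B\,\operatorname{vech}(\Sigma_{t-1}), \qquad c \in \mathbb{R}^3,\ A, B \in \mathbb{R}^{3 \times 3}.$$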
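The following NumPy sketch implements three pieces:

- `vech`, in the usual column-wise order,
- its inverse for symmetric matrices,
- one step of the lag-1 vech model.

The helper names `vech`, `unvech` and `vech_garch_step` are hypothetical, and the example coefficients are arbitrary.

```python
import numpy as np


def vech(m):
    """Stack the lower triangle (incl. diagonal) of a square matrix column by column."""
    m = np.asarray(m)
    r, c = np.triu_indices(m.shape[0])
    # (c, r) runs over the lower triangle in column-major order
    return m[c, r]


def unvech(v, n):
    """Inverse of vech for a symmetric n x n matrix."""
    m = np.zeros((n, n))
    r, c = np.triu_indices(n)
    m[c, r] = v  # lower triangle
    m[r, c] = v  # mirror into the upper triangle
    return m


def vech_garch_step(c, A, B, u_prev, Sigma_prev):
    """One step of vech(Sigma_t) = c + A vech(eta_{t-1}) + B vech(Sigma_{t-1}).

    Without further constraints on c, A and B, the result is not
    guaranteed to be positive definite.
    """
    eta_prev = np.outer(u_prev, u_prev)  # eta_{t-1} = u_{t-1} u_{t-1}^T
    s = c + A @ vech(eta_prev) + B @ vech(Sigma_prev)
    return unvech(s, len(u_prev))


# Diagonal A and B: every element of Sigma_t follows its own GARCH-type recursion
c = np.array([0.05, 0.01, 0.05])
A = np.diag([0.05, 0.05, 0.05])
B = np.diag([0.9, 0.9, 0.9])
Sigma_t = vech_garch_step(c, A, B, u_prev=np.array([1.0, -1.0]), Sigma_prev=np.eye(2))
print(Sigma_t)
```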

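For the lag-5 model, this specification reduces the number of free parameters to $3 + 10 \cdot 9 = 93$. As this is still quite high, we could impose some restrictions on our model matrices, for example

$$A = \operatorname{diag}(a_1, a_2, a_3), \qquad B = \operatorname{diag}(b_1, b_2, b_3),$$

i.e. the matrices are diagonal, so that each element of $\Sigma_t$ follows its own univariate GARCH-type recursion. Going back, again, to the lag-5 model, we would now be down to $3 + 10 \cdot 3 = 33$ free parameters.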
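Parameter counts like these are easy to get wrong, so here is a small helper that computes them. It assumes $p$ lags of $\eta_t$ and $q$ lags of $\Sigma_t$, and it includes the intercept in the count. The function name is hypothetical.

```python
def mgarch_param_count(k, p, q, spec):
    """Free parameters (intercept included) of a linear k-variate MGARCH
    with p lags of eta_t and q lags of Sigma_t.

    spec = "vec":  full coefficient matrices acting on vec(.), each k^2 x k^2
    spec = "vech": full coefficient matrices acting on vech(.), each k(k+1)/2 squared
    spec = "diag": diagonal coefficient matrices acting on vech(.)
    """
    n_vec = k * k
    n_vech = k * (k + 1) // 2
    if spec == "vec":
        return n_vec + (p + q) * n_vec**2
    if spec == "vech":
        return n_vech + (p + q) * n_vech**2
    if spec == "diag":
        return n_vech + (p + q) * n_vech
    raise ValueError(f"unknown spec: {spec!r}")


for spec in ("vec", "vech", "diag"):
    print(spec, mgarch_param_count(k=2, p=5, q=5, spec=spec))  # prints: vec 164 / vech 93 / diag 33
```

The growth with dimension is the real problem. With five series and a single lag of each term, the full vec model already needs `mgarch_param_count(5, 1, 1, "vec") == 1275` parameters.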

### Multivariate GARCH with constant and dynamic correlation

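Another class of multivariate GARCH specifications has been proposed by Bollerslev (1990) and Engle (2002). The core idea is to split the conditional covariance matrix $\Sigma_t$ into conditional standard deviations and conditional correlations:

$$\Sigma_t = D_t R_t D_t, \qquad D_t = \operatorname{diag}\left(\sigma_{1,t}, \sigma_{2,t}\right),$$

where $R_t$ is the conditional correlation matrix.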
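As a small sanity check of the decomposition, the sketch below splits an arbitrary covariance matrix into two parts: the diagonal matrix of standard deviations and the correlation matrix. It then confirms that their product recovers the original. `split_covariance` is a hypothetical helper name.

```python
import numpy as np


def split_covariance(Sigma):
    """Return D (diagonal matrix of standard deviations) and R (correlations) with Sigma = D R D."""
    std = np.sqrt(np.diag(Sigma))
    D = np.diag(std)
    D_inv = np.diag(1.0 / std)
    R = D_inv @ Sigma @ D_inv
    return D, R


Sigma_t = np.array([[4.0, 1.2],
                    [1.2, 1.0]])  # arbitrary example covariance matrix
D_t, R_t = split_covariance(Sigma_t)
print(np.diag(D_t))  # conditional standard deviations
print(R_t)           # conditional correlation matrix
```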

Now, the conditional standard deviations can be modelled as the square roots of the conditional variances of separate univariate GARCH models, one per series. This leaves room for choosing any GARCH model that is deemed appropriate.

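The correlation component can be presumed constant (constant conditional correlation, CCC) or autoregressive (dynamic conditional correlation, DCC). For the latter, we standardize the observations and let an auxiliary matrix $Q_t$ follow a GARCH-type recursion:

$$\varepsilon_t = D_t^{-1} u_t,$$

$$Q_t = \bar{Q} + A \odot \left(\varepsilon_{t-1}\varepsilon_{t-1}^\top - \bar{Q}\right) + B \odot \left(Q_{t-1} - \bar{Q}\right),$$

$$R_t = \operatorname{diag}(Q_t)^{-1/2}\, Q_t\, \operatorname{diag}(Q_t)^{-1/2}.$$

Here, $\odot$ denotes the element-wise product and $\bar{Q}$ the unconditional covariance matrix of the standardized residuals $\varepsilon_t$. $\operatorname{diag}(Q_t)$ is the diagonal matrix built from the diagonal of $Q_t$.

As you can see, the un-normalized conditional correlation $Q_t$ now follows an error-correction-like term. It moves away from its long-run level $\bar{Q}$ only through the deviations of $\varepsilon_{t-1}\varepsilon_{t-1}^\top$ and $Q_{t-1}$ from $\bar{Q}$. Finally, to reduce the number of free parameters, we can replace the matrices by scalars to get

$$Q_t = (1 - a - b)\,\bar{Q} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^\top + b\,Q_{t-1}.$$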

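On the one hand, this formulation is less expressive than before. On the other hand, ensuring stationarity is much easier from a programmatic point of view. It suffices to keep the two scalar parameters non-negative with a sum strictly below one. With a positive definite long-run matrix, this also keeps every $Q_t$ positive definite.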
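The scalar DCC recursion is short enough to write out directly. This NumPy sketch assumes that the standardized residuals are already available from the univariate GARCH fits. As a further assumption, it estimates the long-run matrix $\bar{Q}$ by the sample second-moment matrix of the residuals (often called correlation targeting). Setting both parameters to zero reproduces the constant-correlation (CCC) case.

```python
import numpy as np


def dcc_correlations(eps, a, b, Q_bar=None):
    """Conditional correlations R_t of a scalar DCC(1,1) model.

    Q_t = (1 - a - b) * Q_bar + a * eps_{t-1} eps_{t-1}^T + b * Q_{t-1}
    R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}

    eps: (T, k) standardized residuals; R[t] uses information up to t-1.
    """
    if not (a >= 0 and b >= 0 and a + b < 1):
        raise ValueError("need a, b >= 0 and a + b < 1")
    eps = np.asarray(eps, dtype=float)
    T, k = eps.shape
    if Q_bar is None:
        Q_bar = eps.T @ eps / T   # sample second-moment matrix (correlation targeting)
    Q = Q_bar.copy()              # start the recursion at its long-run level
    R = np.empty((T, k, k))
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * Q_bar + a * np.outer(eps[t - 1], eps[t - 1]) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)  # normalize Q_t to a correlation matrix
    return R


rng = np.random.default_rng(1)
eps = rng.standard_normal((500, 2))
R_dcc = dcc_correlations(eps, a=0.05, b=0.9)  # time-varying correlations
R_ccc = dcc_correlations(eps, a=0.0, b=0.0)   # a = b = 0: constant correlation (CCC)
```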

## Using Python and Tensorflow to implement DCC

Let us start with the full implementation and then look at the details:

While the code is quite lengthy, its purpose is essentially twofold:

1. Calculate the in-sample distribution (`get_conditional_dists(...)`), which is needed for optimization via maximum likelihood. This function calculates the likelihood value of each observation given the MGARCH model.
2. Forecast the out-of-sample distribution (`sample_forecast(...)`). As the formulas for the model as a whole are quite complex, it is difficult to calculate the forecast distributions in closed form. However, we can sample from the target distribution fairly easily. With a sufficiently large sample, we can estimate all relevant quantities of interest (e.g. forecast mean and quantiles).
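Here is a compact NumPy sketch of what the first function has to compute. It is not the author's TensorFlow code. It assumes GARCH(1,1) margins, a scalar DCC(1,1) correlation and a multivariate Gaussian, and it returns one log-density per observation. All names are hypothetical. The maximum likelihood fit maximizes the sum of these values over the full, time-ordered sample.

```python
import numpy as np


def garch11_sigma(u, omega, alpha, beta):
    """Conditional standard deviations of a univariate GARCH(1,1), started at the unconditional variance."""
    s2 = np.empty(len(u))
    s2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(u)):
        s2[t] = omega + alpha * u[t - 1] ** 2 + beta * s2[t - 1]
    return np.sqrt(s2)


def dcc_gaussian_loglik(u, garch_params, a, b):
    """Per-observation log N(u_t | 0, Sigma_t) for a GARCH(1,1)-DCC(1,1) model.

    u: (T, k) array of mean-zero observations, in time order.
    garch_params: k tuples (omega, alpha, beta), one univariate GARCH(1,1) per series.
    a, b: scalar DCC parameters with a, b >= 0 and a + b < 1.
    """
    u = np.asarray(u, dtype=float)
    T, k = u.shape
    sig = np.column_stack([garch11_sigma(u[:, i], *garch_params[i]) for i in range(k)])
    eps = u / sig               # standardized residuals eps_t = D_t^{-1} u_t
    Q_bar = eps.T @ eps / T     # sample estimate of the long-run matrix Q_bar
    Q = Q_bar.copy()
    ll = np.empty(T)
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * Q_bar + a * np.outer(eps[t - 1], eps[t - 1]) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R = Q * np.outer(d, d)                # R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}
        Sigma = R * np.outer(sig[t], sig[t])  # Sigma_t = D_t R_t D_t
        _, logdet = np.linalg.slogdet(Sigma)
        ll[t] = -0.5 * (k * np.log(2 * np.pi) + logdet + u[t] @ np.linalg.solve(Sigma, u[t]))
    return ll


rng = np.random.default_rng(2)
u = rng.standard_normal((300, 2))
ll = dcc_gaussian_loglik(u, [(0.1, 0.05, 0.85), (0.2, 0.1, 0.8)], a=0.03, b=0.95)
print(ll.sum())  # total log-likelihood, the quantity to maximize
```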

Notice that here, we have specified the conditional distribution as a multivariate Gaussian. Given the theory from above, however, this is not a necessity. A multivariate Student-t distribution, for example, could work equally well or even better, since it accommodates the heavy tails typical of financial returns. Still, a Gaussian is always convenient to work with.

Now, the remaining functions are basically just helpers to maintain some structure. I decided not to break the key functions down further in order to keep the calculations in one place. If we were unit testing our model, it would actually be sensible to split things up into smaller, more easily testable units.

As we want to use the Keras API for training, we need to customize the training procedure (`train_step(...)`). Contrary to typical Keras use cases, our training data is not split into input and output data. Rather, we only have one set of data, namely the time-series observations.

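Finally, each training step needs to process all training observations at once (no mini-batching), and the observations must always remain in order (no shuffling). Both requirements follow from the recursive structure of the model. Each conditional covariance depends on its predecessors, so the likelihood can only be evaluated over the full, ordered series.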

This yields the following generic training loop:

`model.fit(ts_data, ts_data, batch_size=len(ts_data), shuffle=False, epochs=300, verbose=False)`

## Multivariate GARCH in Python - an example

We can now test our model on a simple example and see what happens. Since Yahoo Finance data is easy to access from Python, we can pull some data for the DAX and the S&P 500:

[Figure: Log-returns of DAX and S&P 500]

The typical volatility clusters are visible for both time-series. To see how the correlation between both indices evolves over time, we can plot the 60-day rolling correlation:

[Figure: 60-day rolling correlation, DAX vs. S&P 500]
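The rolling correlation is simple to compute. The sketch below uses NumPy's `sliding_window_view`. It works on synthetic prices as a stand-in for the downloaded index data, since the original data-loading code is not shown. With pandas, the same quantity would be `returns["dax"].rolling(60).corr(returns["sp500"])`, where the column names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_corr(x, y, window=60):
    """Pearson correlation of x and y over a trailing window; the first window-1 entries are NaN."""
    xw = sliding_window_view(np.asarray(x, dtype=float), window)  # (n - window + 1, window)
    yw = sliding_window_view(np.asarray(y, dtype=float), window)
    xc = xw - xw.mean(axis=1, keepdims=True)
    yc = yw - yw.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    return np.concatenate([np.full(window - 1, np.nan), r])


# Synthetic stand-in for two daily price series (the article uses DAX and S&P 500)
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((500, 2)), axis=0))
log_ret = np.diff(np.log(prices), axis=0)  # r_t = log P_t - log P_{t-1}
corr60 = rolling_corr(log_ret[:, 0], log_ret[:, 1], window=60)
```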

It appears as if correlation between both indices has dropped since the beginning of the pandemic. Afterwards, correlation seems to fluctuate in cycles.

All in all, the pattern looks like a discretized version of an Ornstein-Uhlenbeck process. The error correction formulation in our model should be able to capture this behaviour accordingly.
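A short calculation makes the link explicit. It is offered as a sketch, not as a result from the original analysis. Start from the scalar recursion $Q_t = (1-a-b)\bar{Q} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^\top + b\,Q_{t-1}$ and take expectations conditional on time $t$. Then apply the common approximation $\mathbb{E}_t[\varepsilon_{t+h-1}\varepsilon_{t+h-1}^\top] = \mathbb{E}_t[R_{t+h-1}] \approx \mathbb{E}_t[Q_{t+h-1}]$ for $h \ge 2$. This gives

$$\mathbb{E}_t[Q_{t+h}] - \bar{Q} \approx (a+b)\left(\mathbb{E}_t[Q_{t+h-1}] - \bar{Q}\right).$$

Now consider an Ornstein-Uhlenbeck process $dX_s = \theta(\mu - X_s)\,ds + \sigma\,dW_s$. Its Euler discretization with step $\Delta$ has the same structure:

$$\mathbb{E}[X_{s+\Delta} \mid X_s] - \mu = (1 - \theta\Delta)(X_s - \mu).$$

So $a + b$ plays the role of $1 - \theta\Delta$, and deviations from the long-run level $\bar{Q}$ decay geometrically. The half-life of a shock is roughly $\ln(1/2)/\ln(a+b)$ time steps.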

After splitting the data into a training set and a test set (the last 90 observations), we can fit the model. Then we take samples from the 90-day-ahead forecast distribution, which takes some time.
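A forecast sampler in the spirit of `sample_forecast(...)` can be sketched in NumPy. It simulates the GARCH(1,1) and DCC recursions forward along many paths at once. This is not the author's implementation: the function name, the argument layout and the parameter values in the example are hypothetical. In practice, the end-of-sample state would come from the model fitted on the training set, i.e. everything except the last 90 observations.

```python
import numpy as np


def sample_dcc_forecast(u_last, sig2_last, Q_last, Q_bar, omega, alpha, beta, a, b,
                        horizon=90, n_paths=1000, seed=0):
    """Simulate forecast paths of a Gaussian GARCH(1,1)-DCC(1,1) model.

    u_last, sig2_last: (k,) last in-sample observation u_T and its conditional variances.
    Q_last, Q_bar: (k, k) last in-sample Q_T and the long-run target matrix.
    omega, alpha, beta: (k,) per-series GARCH(1,1) parameters; a, b: scalar DCC parameters.
    Returns observations (n_paths, horizon, k) and correlations (n_paths, horizon, k, k).
    """
    rng = np.random.default_rng(seed)
    k = len(u_last)
    u = np.tile(u_last, (n_paths, 1))        # (n, k) current observation per path
    sig2 = np.tile(sig2_last, (n_paths, 1))  # (n, k) current conditional variances
    Q = np.tile(Q_last, (n_paths, 1, 1))     # (n, k, k) current Q per path
    eps = u / np.sqrt(sig2)                  # standardized residuals
    U = np.empty((n_paths, horizon, k))
    Rs = np.empty((n_paths, horizon, k, k))
    for h in range(horizon):
        # one-step-ahead GARCH(1,1) variances and DCC recursion, for all paths at once
        sig2 = omega + alpha * u**2 + beta * sig2
        Q = (1 - a - b) * Q_bar + a * eps[:, :, None] * eps[:, None, :] + b * Q
        d = 1.0 / np.sqrt(np.diagonal(Q, axis1=1, axis2=2))  # (n, k)
        R = Q * d[:, :, None] * d[:, None, :]                 # normalize Q to correlations
        sig = np.sqrt(sig2)
        Sigma = R * sig[:, :, None] * sig[:, None, :]         # Sigma = D R D
        L = np.linalg.cholesky(Sigma)                         # batched Cholesky factors
        u = np.einsum("nij,nj->ni", L, rng.standard_normal((n_paths, k)))  # u ~ N(0, Sigma)
        eps = u / sig
        U[:, h] = u
        Rs[:, h] = R
    return U, Rs


# Hypothetical end-of-sample state and parameters (not fitted values)
U, R = sample_dcc_forecast(
    u_last=np.array([0.01, -0.005]), sig2_last=np.array([1e-4, 8e-5]),
    Q_last=np.array([[1.0, 0.4], [0.4, 1.0]]), Q_bar=np.array([[1.0, 0.5], [0.5, 1.0]]),
    omega=np.array([2e-6, 2e-6]), alpha=np.array([0.08, 0.1]), beta=np.array([0.9, 0.88]),
    a=0.03, b=0.95, horizon=90, n_paths=2000)
corr_mean = R[:, :, 0, 1].mean(axis=0)                        # forecast mean per horizon
corr_band = np.quantile(R[:, :, 0, 1], [0.05, 0.95], axis=0)  # 90% forecast band
```

Forecast means and quantiles are just sample statistics across paths, so their precision depends on the number of paths.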

Now, we are particularly interested in the conditional correlation fit and forecasts:

[Figure: In-sample forecast correlations as inferred by the multivariate GARCH model (top); in-sample correlation from MGARCH and rolling correlation (bottom)]

The forecasted correlation (blue) captures the actual correlation (red) under our model quite well. Obviously, though, the true correlation is unknown. Nevertheless, our model matches the rolling correlation quite well, even out-of-sample. This suggests that our approach is, at least, not completely off.

Being able to reliably forecast correlations might be interesting for statistical arbitrage strategies. While those strategies typically use price movements, correlations could be an interesting alternative.

From here, we could also look at price and volatility forecasts. To keep this article from becoming bloated, I'll leave this to the interested reader. You can find the relevant notebook here; feel free to extend it with your own experiments.

## Conclusion

Today, we took a look at multivariate extensions to GARCH-type models. While a 'naive' extension is quite straightforward, we need to be careful not to overparameterize our model. Luckily, there already exists research on useful specifications that mostly avoid this issue.

For deeper insights, it could be interesting to consider non-linear extensions to this approach. The trade-off between overfitting and flexibility will likely be even more relevant here. If you want to head in that direction, you might want to have a look at some results from Google Scholar.

## References

Bollerslev, Tim. Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. The Review of Economics and Statistics, 1990, pp. 498-505.

Engle, Robert. Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20(3), 2002, pp. 339-350.

Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. Springer Science & Business Media, 2005.