As is well known, LSTMs (long short-term memory networks) are well suited to sequential data. One example is multivariate time-series data, where LSTMs can model the conditional distributions needed for complex forecasting problems.
For example, consider the following conditional forecasting distribution:
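One plausible form of such a distribution is shown below. It assumes an LSTM with hidden state $h_t$ whose output layer $g$ returns a mean vector $\mu_t$ and a lower-triangular Cholesky factor $L_t$ with positive diagonal; these symbols are chosen here for illustration.

```latex
\begin{aligned}
h_t &= \mathrm{LSTM}(h_{t-1}, y_{t-1}), \\
(\mu_t, L_t) &= g(h_t), \\
y_t \mid y_{1:t-1} &\sim \mathcal{N}\!\left(\mu_t,\; L_t L_t^\top\right).
\end{aligned}
```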
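Notice that we predict the Cholesky factor of the conditional covariance matrix rather than the matrix itself. Any matrix of the form $LL^\top$, with $L$ the predicted lower-triangular factor, is positive semi-definite, and it is even positive definite if the diagonal of $L$ is kept strictly positive (e.g., via a softplus or exponential). In principle, this approach allows us to model quite complex dynamics.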
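To make the Cholesky parameterization concrete, here is a minimal NumPy sketch. It assumes the LSTM emits one flat vector of $(D^2+D)/2$ unconstrained values per time step; the function name `cholesky_from_outputs` and the softplus on the diagonal are illustrative choices, not taken from the original notebook.

```python
import numpy as np

def cholesky_from_outputs(raw: np.ndarray, dim: int) -> np.ndarray:
    """Fill a lower-triangular matrix from a flat vector of (dim**2 + dim) // 2 outputs.

    The diagonal is passed through softplus so it is strictly positive,
    which makes L @ L.T positive definite (not just semi-definite).
    """
    expected = dim * (dim + 1) // 2
    if raw.shape != (expected,):
        raise ValueError(f"expected {expected} outputs, got shape {raw.shape}")
    L = np.zeros((dim, dim))
    rows, cols = np.tril_indices(dim)
    L[rows, cols] = raw                     # fill the lower triangle row by row
    diag = np.diag_indices(dim)
    L[diag] = np.logaddexp(0.0, L[diag])    # softplus(x) = log(1 + exp(x)), computed stably
    return L

rng = np.random.default_rng(0)
D = 4
raw = rng.normal(size=D * (D + 1) // 2)     # stand-in for one LSTM output vector (hypothetical)
L = cholesky_from_outputs(raw, D)
cov = L @ L.T
print(np.all(np.linalg.eigvalsh(cov) > 0))  # True
```

Since a Cholesky factor with positive diagonal is unique, `np.linalg.cholesky(cov)` recovers exactly this `L`.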
On the other hand, the degrees of freedom of this model grow rapidly with the dimensionality $D$ of the multivariate time-series: we need $(D^2+D)/2$ LSTM outputs for the covariance structure alone. Because this number grows quadratically in $D$, the model can easily overfit when training data are limited.
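A quick count makes this growth tangible. The sketch below assumes, purely for illustration, a single dense output layer on top of a 64-unit LSTM, and counts only the covariance outputs and the weights of that layer.

```python
def covariance_outputs(dim: int) -> int:
    """Number of LSTM outputs needed for a full Cholesky factor: (D^2 + D) / 2."""
    return (dim * dim + dim) // 2

def output_layer_weights(hidden: int, dim: int) -> int:
    """Weights plus biases of a dense layer mapping `hidden` units to those outputs."""
    n_out = covariance_outputs(dim)
    return hidden * n_out + n_out

for D in (10, 100, 1000):
    print(D, covariance_outputs(D), output_layer_weights(hidden=64, dim=D))
# 10 55 3575
# 100 5050 328250
# 1000 500500 32532500
```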
Another disadvantage is the assumption of a conditionally Gaussian time-series. As soon as the observations are not unbounded real-valued vectors, for example counts or strictly positive quantities, this model no longer fits the data.
Thus, a potential solution should satisfy two properties:
- Handle high-dimensional time-series parsimoniously
- Work with conditionally non-Gaussian time-series
LSTMs with Gaussian Copula
As a potential solution, we can separate the dependence among the component series from their marginal distributions. Let us therefore assume a constant conditional dependence structure between the series but time-varying conditional marginals. This suggests that a copula model might be a good approach; for simplicity, we use a Gaussian Copula.
Since the basics of the Gaussian Copula have been discussed in a previous article, we won't repeat them here.
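The core idea, separating the dependence structure from the marginals, can still be illustrated by sampling. This is a minimal NumPy sketch, not the TensorFlow Probability implementation referenced later in this post; the function names and the example marginals (an exponential and a logistic distribution) are hypothetical choices.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Elementwise standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def sample_gaussian_copula(corr, inv_cdfs, n, rng):
    """Draw n samples whose dependence is a Gaussian copula with correlation
    matrix `corr` and whose d-th marginal has quantile function inv_cdfs[d]."""
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n, corr.shape[0])) @ chol.T  # rows z ~ N(0, corr)
    u = std_normal_cdf(z)                                 # uniform marginals, Gaussian dependence
    return np.column_stack([q(u[:, d]) for d, q in enumerate(inv_cdfs)])

def spearman(a, b):
    """Spearman rank correlation (no ties expected for continuous data)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Hypothetical marginals: Exponential(rate=2) and Logistic(loc=3, scale=0.5)
inv_cdfs = [
    lambda u: -np.log1p(-u) / 2.0,
    lambda u: 3.0 + 0.5 * np.log(u / (1.0 - u)),
]
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
rng = np.random.default_rng(42)
samples = sample_gaussian_copula(corr, inv_cdfs, n=20_000, rng=rng)
print(samples.mean(axis=0))                    # close to [0.5, 3.0]
print(spearman(samples[:, 0], samples[:, 1]))  # close to (6/pi) * arcsin(0.4), about 0.79
```

Because rank correlation is unchanged by monotone transforms, the dependence between the two series stays the same whichever marginals are plugged in.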
In summary, our model looks as follows:
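One way to write such a model is shown below. It assumes the LSTM outputs the parameters $\theta_{t,d}$ of each marginal CDF $F_{t,d}$ (with density $f_{t,d}$), while a single correlation matrix $R$ is shared across time. $\Phi$ denotes the standard normal CDF and $g_d$ is a hypothetical output head.

```latex
\begin{aligned}
h_t &= \mathrm{LSTM}(h_{t-1}, y_{t-1}), \qquad \theta_{t,d} = g_d(h_t), \\
u_{t,d} &= F_{t,d}(y_{t,d};\, \theta_{t,d}), \qquad z_{t,d} = \Phi^{-1}(u_{t,d}), \\
p(y_t \mid y_{1:t-1}) &= c_R(u_{t,1}, \dots, u_{t,D}) \prod_{d=1}^{D} f_{t,d}(y_{t,d};\, \theta_{t,d}), \\
\log c_R(u_t) &= -\tfrac{1}{2}\log\lvert R\rvert - \tfrac{1}{2}\, z_t^\top \left(R^{-1} - I\right) z_t .
\end{aligned}
```

With Gaussian marginals, $z_{t,d} = (y_{t,d} - \mu_{t,d})/\sigma_{t,d}$ and the model reduces to $y_t \mid y_{1:t-1} \sim \mathcal{N}(\mu_t, S_t R S_t)$ with $S_t = \operatorname{diag}(\sigma_t)$, so only $D$ scale parameters vary over time instead of $(D^2+D)/2$ covariance outputs.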
This allows us to deal with arbitrary continuous marginal distributions. In fact, we could even work with a mix of continuous and discrete marginals, although discrete marginals make the copula likelihood harder to evaluate. To obtain a sparse copula parameter matrix, we could, for example, add a regularization term such as an ℓ1 penalty, as is common when estimating high-dimensional covariance matrices.
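For the Gaussian-marginal case, the training objective could look like the following NumPy sketch. The function name, the per-observation interface and the plain ℓ1 penalty on the off-diagonal entries of the correlation matrix $R$ are assumptions for illustration. In practice $R$ would also have to stay a valid correlation matrix during optimization, and penalizing the precision matrix, as in the graphical lasso, is a common alternative.

```python
import numpy as np

def gaussian_copula_nll(y, mu, sigma, corr, l1=0.0):
    """Negative log-likelihood of one observation y (shape (D,)) under a Gaussian
    copula with correlation `corr` and Gaussian marginals N(mu_d, sigma_d^2),
    plus an optional L1 penalty on the off-diagonal correlations (hypothetical)."""
    z = (y - mu) / sigma  # equals Phi^{-1}(F_d(y_d)) for Gaussian marginals
    _, logdet = np.linalg.slogdet(corr)
    # log copula density: -0.5 log|R| - 0.5 z^T (R^{-1} - I) z
    quad = z @ np.linalg.solve(corr, z) - z @ z
    log_copula = -0.5 * logdet - 0.5 * quad
    # sum of the univariate Gaussian log densities
    log_marginals = np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * z**2)
    off_diag = corr - np.diag(np.diag(corr))
    penalty = l1 * np.abs(off_diag).sum()
    return -(log_copula + log_marginals) + penalty

corr = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
mu = np.array([0.0, 1.0, -1.0])    # stand-ins for LSTM mean outputs at time t
sigma = np.array([1.0, 2.0, 0.5])  # stand-ins for LSTM scale outputs at time t
y = np.array([0.3, 2.0, -0.8])

# Sanity check: a multivariate normal with covariance diag(sigma) R diag(sigma)
cov = np.diag(sigma) @ corr @ np.diag(sigma)
diff = y - mu
_, logdet_cov = np.linalg.slogdet(cov)
mvn_nll = 0.5 * (len(y) * np.log(2 * np.pi) + logdet_cov + diff @ np.linalg.solve(cov, diff))
print(np.isclose(gaussian_copula_nll(y, mu, sigma, corr), mvn_nll))  # True
```

The check illustrates that, with Gaussian marginals, the copula likelihood coincides with a multivariate normal likelihood whose covariance is $\operatorname{diag}(\sigma)\,R\,\operatorname{diag}(\sigma)$.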
The main remaining drawback is the assumption that the dependence structure is constant over time. If this contradicts the data at hand, we might need to model the copula parameter autoregressively as well. A low-rank parameterization of the copula matrix could then preserve some parsimony.
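A minimal sketch of the low-rank idea follows. It assumes a hypothetical LSTM head that outputs a $D \times k$ factor $W_t$ and a log-diagonal term at each step; rescaling the resulting covariance to unit diagonal yields a valid, time-varying copula correlation matrix. This low-rank-plus-diagonal form is one common choice, not necessarily the one intended here.

```python
import numpy as np

def low_rank_correlation(W: np.ndarray, log_d: np.ndarray) -> np.ndarray:
    """Correlation matrix from a low-rank-plus-diagonal covariance W W^T + diag(exp(log_d)).

    W has shape (D, k) with k << D, so only D * k + D numbers are needed
    instead of D * (D - 1) / 2 free correlations.
    """
    cov = W @ W.T + np.diag(np.exp(log_d))  # positive definite by construction
    scale = 1.0 / np.sqrt(np.diag(cov))
    return cov * np.outer(scale, scale)     # rescale to unit diagonal

rng = np.random.default_rng(1)
D, k = 50, 3
W = rng.normal(size=(D, k))   # hypothetical LSTM head output at time t
log_d = rng.normal(size=D)
R = low_rank_correlation(W, log_d)
print(D * k + D, D * (D - 1) // 2)  # 200 1225
```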
To show how this could be implemented in the case of Gaussian marginals, I have created a quick Jupyter notebook with TensorFlow. Regarding the copula part, the TensorFlow example on Gaussian Copulas has a ready-made implementation using TensorFlow Probability bijectors.
 Hochreiter, Sepp; Schmidhuber, Jürgen. Long short-term memory. Neural computation, 1997, 9.8, p. 1735-1780.
 Nelsen, Roger B. An introduction to copulas. Springer Science & Business Media, 2007.