Simple Time-Series Models

(This article is still W.I.P. - expect updates in the future)

As the name implies, this section deals with very simplistic models. In fact, it might be tempting to dismiss the upcoming ideas as too simplistic for real-world time-series problems. We will see, however, why this might not necessarily be the case.

First, we start with uncorrelated noise, which on its own is still too simple for most applications. The concept of integrating stochastic noise into trending and seasonal patterns, however, can be quite powerful.

Models with pure i.i.d. noise

The simplest (probabilistic) way to model a time-series is via plain independently and identically distributed (i.i.d.) randomness:

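$$y_t \overset{\text{i.i.d.}}{\sim} \mathcal{D}, \quad t = 1, \dots, T$$
Formula for i.i.d. noise, where $\mathcal{D}$ stands for some fixed distribution.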

This implies that all our observations follow the same distribution at any point in time (identically distributed). Even more importantly, we presume no interrelation between observations at all (independently distributed).

Your first question is probably whether this model is too simplistic to be useful for real-world problems. Certainly, an actual time-series is unlikely to have no statistical relationship with its own past.

While those concerns are entirely valid, we can nevertheless deduce the following logical consequence:

Any time-series model that is more complex than a pure-noise model should also produce better forecasts than a pure-noise model.

In short, we can use fitted random noise as a benchmark model. There is arguably no simpler approach to create baseline benchmarks than this one. Even smoothing techniques will likely require more parameters to create a fully probabilistic model.
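As a minimal sketch of such a benchmark, the snippet below fits i.i.d. noise to a series and produces a forecast from it. The Gaussian distribution, the synthetic training data, and the helper names `fit_gaussian_noise` and `forecast_gaussian_noise` are assumptions made for illustration only; any other distribution could take the Gaussian's place. The interval ignores parameter uncertainty, so it is only approximate.

```python
import numpy as np


def fit_gaussian_noise(y):
    """Estimate mean and standard deviation of an assumed i.i.d. Gaussian model."""
    y = np.asarray(y, dtype=float)
    return y.mean(), y.std(ddof=1)


def forecast_gaussian_noise(mean, std, horizon, z=1.96):
    """Point forecast and approximate 95% interval, identical for every step."""
    point = np.full(horizon, mean)
    return point, point - z * std, point + z * std


# Synthetic training data, used purely for illustration.
rng = np.random.default_rng(0)
y_train = rng.normal(loc=5.0, scale=2.0, size=200)

mean, std = fit_gaussian_noise(y_train)
point, lower, upper = forecast_gaussian_noise(mean, std, horizon=3)
```

Because the model assumes no dependence on the past, the forecast is the same at every horizon. A more complex model would have to beat this flat forecast to justify its additional parameters.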

Besides this rather obvious use-case, there is another potential application for i.i.d. noise. Due to their simplicity, noise models can potentially be used for very small datasets. Consider this: If big, complex models require large datasets to prevent overfitting, then simple models require only a handful of observations.

Of course, it is debatable what dataset size can be seen as 'small'.

Integrated i.i.d. noise

Now, things are becoming more interesting. While raw i.i.d. noise cannot account for auto-correlation between observations, integrated noise can. Before we do a demonstration, let us introduce the difference operator:

$$\Delta_s y_t = y_t - y_{t-s}$$
Difference operator for linear time-series, with lag (seasonality) $s$.

At first, the difference operator might seem superfluous. You will see soon, however, that it allows us to write fairly complex models in remarkably compact notation.
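To make the operator concrete, here is a small sketch of lagged differencing with NumPy. The helper name `difference` and the example values are hypothetical. For a lag of one, the result coincides with NumPy's built-in `np.diff`.

```python
import numpy as np


def difference(y, lag=1):
    """Return y_t - y_{t-lag} for every t where both terms exist."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]


y = np.array([1.0, 3.0, 6.0, 10.0, 15.0, 21.0])

first = difference(y)            # values: 2, 3, 4, 5, 6
lagged = difference(y, lag=2)    # values: 5, 7, 9, 11
```

Note that each differencing step shortens the series by `lag` observations, since the earliest values have no predecessor to subtract.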

Definition of an integrated time-series

With the difference operator in our toolbox, we can now define an integrated time-series:

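A time-series $y_t$ is integrated of order $p$ with seasonality $s$ if differencing it $p$ times at lag $s$, written $\Delta_s^p y_t$, yields a stationary time-series.
Definition of an integrated time-series.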

There are several ideas in this definition that we should clarify further:

First, you probably noticed the concept of exponentiating the difference operator. You can simply think of this as performing the differencing multiple times. For the squared difference operator, this would look as follows:

$$\Delta_s^2 y_t = \Delta_s(\Delta_s y_t) = \Delta_s(y_t - y_{t-s}) = y_t - 2 y_{t-s} + y_{t-2s}$$
Squared difference operator for linear time-series.

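Second, difference operators with different lags can be chained, for example $\Delta_1 \Delta_{12} y_t = \epsilon_t$ with i.i.d. noise $\epsilon_t$ (for monthly observations). Such an equation could produce a time-series with yearly seasonality and some linear trend.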

Third, it is common convention to simply write $\Delta$ instead of $\Delta_1^1$ when both the order and the lag equal one.

We will happily adopt this convention here. Also, we call such time-series simply integrated without referencing its order or seasonality.
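The points above can be checked numerically. The sketch below applies a difference twice and chains two differences with different lags. The helper `difference`, the quadratic toy series, and the choice of lag 12 as a yearly cycle (which presumes monthly data) are assumptions for illustration.

```python
import numpy as np


def difference(y, lag=1, order=1):
    """Apply the lagged difference y_t - y_{t-lag} `order` times in a row."""
    y = np.asarray(y, dtype=float)
    for _ in range(order):
        y = y[lag:] - y[:-lag]
    return y


# Toy series y_t = t^2, chosen so that the results are easy to verify by hand.
y = np.arange(30, dtype=float) ** 2

# Differencing twice at lag 1 ...
twice = difference(y, lag=1, order=2)
# ... matches the expanded form y_t - 2 * y_{t-1} + y_{t-2}.
explicit = y[2:] - 2 * y[1:-1] + y[:-2]

# Chaining a lag-12 difference with a lag-1 difference.
chained = difference(difference(y, lag=12), lag=1)
```

For this quadratic series, `twice` is constant at 2 and `chained` is constant at 24, which shows how repeated or chained differencing strips polynomial structure from a series.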

Obviously, we also need to re-transform a difference representation back to its original, integrated representation. In our notation, this means we invert the difference transformation, i.e.

$$\Delta^{-1}(\Delta y_t) = y_t$$
Inverse difference operator for linear time-series.

must hold for arbitrary difference transformations. If we expand this formula, we get

$$y_{t+1} = \Delta^{-1}(\Delta y_{t+1}) = y_t + (y_{t+1} - y_t) = y_t + \Delta y_{t+1}$$
Retransforming a differenced time-series for forecasting.

These simplifications follow from the fact that the difference operator is a linear operator (we won't cover the details here). Technically, the last equation is just a fancy way to say that the next observation equals the last observation plus the change between both time steps.
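For the simple case of order one and lag one, inverting the difference amounts to a cumulative sum that starts from a known initial value. The following sketch illustrates this; the helper name `undifference` and the example values are hypothetical.

```python
import numpy as np


def undifference(dy, y0):
    """Invert a first-order, lag-1 difference given the initial value y0."""
    dy = np.asarray(dy, dtype=float)
    # Each observation is the initial value plus all changes up to that point.
    return np.concatenate(([y0], y0 + np.cumsum(dy)))


y = np.array([4.0, 6.0, 5.0, 9.0, 12.0])
dy = np.diff(y)                        # changes: 2, -1, 4, 3
restored = undifference(dy, y0=y[0])   # recovers the original series
```

The initial value is needed because differencing discards the level of the series; without it, the original can only be recovered up to a constant.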

In a forecasting problem, we will typically have a prediction for this change, $\Delta y_{t+1}$. Let's denote this prediction as $\widehat{\Delta y}_{t+1}$ to stress that it is not the actual change but a predicted one. Thus, the forecast for the integrated time-series is

$$\hat{y}_{t+1} = y_t + \widehat{\Delta y}_{t+1}$$
Forecasting with differenced time-series.

Afterwards, we apply this logic recursively as far into the future as our forecast should go:

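$$\hat{y}_{t+h} = \hat{y}_{t+h-1} + \widehat{\Delta y}_{t+h} = y_t + \sum_{i=1}^{h} \widehat{\Delta y}_{t+i}$$
Forecasting h steps with differenced time-series.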
White noise time-series (left) and corresponding integrated time-series (right). Both time-series are related via the simple difference operator and its inverse.
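The recursive forecasting logic can be sketched in a few lines. The predicted changes below are made-up numbers standing in for the output of some model of the differenced series, and `forecast_integrated` is a hypothetical helper name.

```python
import numpy as np


def forecast_integrated(last_value, predicted_changes):
    """Accumulate predicted changes on top of the last observed value."""
    predicted_changes = np.asarray(predicted_changes, dtype=float)
    return last_value + np.cumsum(predicted_changes)


y = np.array([10.0, 12.0, 11.0, 15.0])
predicted_changes = [1.0, 0.5, -2.0]   # hypothetical model output for 3 steps

forecast = forecast_integrated(y[-1], predicted_changes)   # 16.0, 16.5, 14.5

# The same result, written as an explicit step-by-step recursion.
recursive = []
previous = y[-1]
for change in predicted_changes:
    previous = previous + change
    recursive.append(previous)
```

Both versions agree: each forecast is the previous forecast plus the next predicted change, which is the same as the last observation plus the sum of all predicted changes so far.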

Integrated noise for seemingly complex patterns

By now, you can probably imagine what is meant by an integrated noise model. In fact, we can come up with countless variants of an integrated noise model by just chaining some difference operators with random noise. One possibility would be a simply integrated time-series, i.e. $\Delta y_t = \epsilon_t$ with i.i.d. noise $\epsilon_t$.

It is an interesting exercise to simulate data from such a model using a plain standard normal distribution.

As it turns out, samples from this time-series appear to exhibit linear trends with potential changepoints. However, it is clear that these trends and changepoints occur completely at random. Without further information from external regressors that could explain such trend shifts, it is impossible to predict them deterministically.

This implies that using local trend models to forecast such time-series is very dangerous. Typically, local trend models will simply extrapolate the latest trend into the future. If trend changes can happen randomly at any point in time, such extrapolation does not make much sense though.

You can see an example of this phenomenon below. While there appears to be a trend change at around t=50, this change is purely random. The upward trend after t=50 also stalls at around t=60. Imagine how your model would have performed if you had extrapolated the upward trend beyond t=60.

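import numpy as np
import matplotlib.pyplot as plt

# Integrate standard normal noise by taking its cumulative sum.
# No seed is set, so every run produces a different path.
y = np.cumsum(np.random.normal(size=100))

plt.figure(figsize=(16, 5))
plt.plot(y, color="blue")
plt.show()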
Generating a time-series with shifting linear trends by integrating standard normal noise.

Of course, the saying goes 'never say never', even in such settings. However, you should really know what you are doing if you use such models anyway.
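To get a rough feeling for the risk, the simulation below compares two forecasts on many integrated noise series: carrying the last observation forward, and extrapolating a straight line fitted to the most recent points. This is only a sketch under the assumption that the data truly is integrated standard normal noise; the window length, horizon, number of series, and the line fit as a stand-in for a local trend model are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_series, n_obs, window, horizon = 2000, 100, 10, 10

# Each row is one integrated noise series (cumulative sum of normal noise).
walks = np.cumsum(rng.normal(size=(n_series, n_obs + horizon)), axis=1)
history, target = walks[:, :n_obs], walks[:, -1]

# Naive forecast: the last observed value, carried forward unchanged.
naive = history[:, -1]

# Local trend forecast: fit a straight line to the last `window` points
# of every series and extrapolate it `horizon` steps ahead.
t = np.arange(window)
slope, intercept = np.polyfit(t, history[:, -window:].T, deg=1)
trend = intercept + slope * (window - 1 + horizon)

mse_naive = np.mean((target - naive) ** 2)
mse_trend = np.mean((target - trend) ** 2)
print(f"naive MSE: {mse_naive:.2f}, local trend MSE: {mse_trend:.2f}")
```

In this setting the extrapolated trend adds its own estimation error on top of the unavoidable noise in future changes, so its mean squared error comes out higher than that of the naive forecast.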