The Algo Engineer

Online Linear Regression using a Kalman Filter

Linear regression is useful for many financial applications, such as finding the hedge ratio between two assets in a pair trade. In a perfect world, the relationship between assets would remain constant, along with the slope and intercept of a linear regression. Unfortunately, this is usually the exception rather than the rule. In this post, I’m going to show you how to use a Kalman filter for online linear regression that calculates the time-varying slope and intercept. The Python module pykalman is used to easily construct a Kalman filter. The complete IPython notebook used to do the analysis below is available here.

For this example, I’m going to use two related ETFs, the iShares MSCI Australia (EWA) and iShares MSCI Canada (EWC). We can use the DataReader function from pandas to download the daily adjusted closing prices for the EWA and EWC ETFs from Yahoo.

# pandas.io.data was moved out of pandas into the separate pandas-datareader package
from pandas_datareader.data import DataReader
secs = ['EWA', 'EWC']
data = DataReader(secs, 'yahoo', '2010-1-1', '2014-8-1')['Adj Close']

The correlation between the two assets' adjusted closing prices can be visualized using a scatter plot with each point colored by date. Clearly, the relationship between the ETFs changes between 2010 and 2014 and can’t be described accurately by a simple linear regression with constant slope and intercept.

[Figure: price_corr. Scatter plot of EWA vs. EWC adjusted closing prices, colored by date]

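Before we get started on the Kalman filter, recall the equation for a linear regression

$$y = \beta x + \alpha$$

where $y$ and $x$ are the adjusted closing prices of EWC and EWA respectively, and $\beta$ and $\alpha$ are the slope and intercept. Rewriting this in vector form gives

$$y = \begin{bmatrix} x & 1 \end{bmatrix} \begin{bmatrix} \beta \\ \alpha \end{bmatrix}$$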
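As a point of reference, the constant-coefficient regression in vector form can be fit with ordinary least squares. The following is a minimal sketch using NumPy's `lstsq` on synthetic prices; the data, the seed, and the "true" slope and intercept are made up for illustration and are not the EWA/EWC series.

```python
import numpy as np

# Synthetic stand-in for the two price series (not the real EWA/EWC data)
rng = np.random.default_rng(0)
x = np.linspace(19.0, 26.0, 200)                 # plays the role of EWA
y = 1.2 * x + 3.0 + rng.normal(0, 0.1, 200)      # plays the role of EWC

# Design matrix with rows [x, 1], matching the vector form of the regression
A = np.column_stack([x, np.ones_like(x)])
(beta, alpha), *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta, alpha)
```

A fit like this returns a single slope and intercept for the whole sample, which is exactly the limitation the Kalman filter approach is meant to remove.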

The Kalman filter is a linear state-space model that operates recursively on streams of noisy input data to produce a statistically optimal estimate of the underlying system state. The general form of the Kalman filter state-space model consists of a transition and an observation equation

$$\theta_{t+1} = F \theta_t + w_t$$
$$y_t = H_t \theta_t + v_t$$

where $\theta_t$ and $y_t$ are the hidden state and observation vectors at time $t$, $F$ and $H_t$ are the transition and observation matrices, and $w_t$ and $v_t$ are Gaussian noise with zero mean.
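To make the recursion concrete, here is a minimal sketch of one textbook predict/update cycle in NumPy. The function name `kalman_step` and the argument names (`F`, `H`, `Q`, `R` for the transition matrix, observation matrix, transition covariance, and observation covariance) are illustrative choices, and pykalman's internals may differ in details such as step ordering and numerical safeguards.

```python
import numpy as np

def kalman_step(mean, cov, y, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter (textbook form)."""
    # Predict: propagate the state estimate through the transition equation
    mean_pred = F @ mean
    cov_pred = F @ cov @ F.T + Q
    # Update: correct the prediction using the new observation y
    innovation = y - H @ mean_pred
    S = H @ cov_pred @ H.T + R               # innovation covariance
    K = cov_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    mean_new = mean_pred + K @ innovation
    cov_new = cov_pred - K @ H @ cov_pred
    return mean_new, cov_new

# Example: 2-dimensional state, scalar observation (values are arbitrary)
H = np.array([[2.0, 1.0]])
mean, cov = kalman_step(np.zeros(2), np.eye(2), np.array([1.0]),
                        np.eye(2), H, 1e-5 * np.eye(2), np.eye(1))
print(mean)
```

After a single observation, the estimate moves only part of the way toward explaining the data, because the gain weighs the prior uncertainty against the observation noise.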

For our application, we assume that the hidden state variable, $\theta_t$, is the slope and intercept of the linear regression, denoted by the vector $\begin{bmatrix} \beta_t & \alpha_t \end{bmatrix}^T$ as in the vector form above. We also assume the slope and intercept follow a random walk by setting $F$ equal to the identity matrix. Our transition equation now looks like

$$\theta_{t+1} = \theta_t + w_t$$

This simply says that $\theta$ for the next time step is the current $\theta$ plus some noise.
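The random-walk assumption can be illustrated by simulating it directly. In this sketch the starting slope and intercept, the per-step noise variance `q`, and the number of steps are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 1e-5          # illustrative per-step noise variance
n_steps = 1000

# Each step adds independent zero-mean Gaussian noise to [slope, intercept]
steps = rng.normal(0.0, np.sqrt(q), size=(n_steps, 2))
start = np.array([1.2, 3.0])
theta = start + np.cumsum(steps, axis=0)

print(theta[0], theta[-1])
```

Each row of `theta` differs from the previous one only by that step's noise, so the coefficients wander slowly rather than jumping.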

The next step is to fit our model to the observation equation of the Kalman filter. To do this, we make the EWC adjusted closing prices the observations, $y_t$, and the observation matrix, $H_t = \begin{bmatrix} x_t & 1 \end{bmatrix}$, is a 1x2 vector consisting of the EWA adjusted closing price $x_t$ in the first column and a one in the second column, as in the vector form above. This is simply a linear regression between the two assets. For pykalman, the observation matrices for all time steps are stacked into a single array, obs_mat, which is constructed using

import numpy as np
obs_mat = np.vstack([data.EWA, np.ones(data.EWA.shape)]).T[:, np.newaxis]

and looks like

array([[[ 19.36,   1.  ]],
       [[ 19.42,   1.  ]],
       [[ 19.49,   1.  ]],
       ..., 
       [[ 26.02,   1.  ]],
       [[ 26.24,   1.  ]],
       [[ 26.42,   1.  ]]])
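The extra axis matters because a time-varying observation matrix is passed as one matrix per time step. A quick sketch with the first three EWA values from the output above (used here as a plain NumPy array rather than a pandas column) shows the resulting shape.

```python
import numpy as np

ewa = np.array([19.36, 19.42, 19.49])   # first three EWA values shown above
obs_mat = np.vstack([ewa, np.ones(ewa.shape)]).T[:, np.newaxis]

# One 1x2 observation matrix per time step:
# (n_timesteps, n_dim_obs, n_dim_state)
print(obs_mat.shape)
```

Here the shape is `(3, 1, 2)`: three time steps, one observed value per step, and two state variables.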

The last thing we need to specify is the noise terms $w_t$ and $v_t$. We set the observation covariance, $R$, to unity. We treat the transition covariance, $Q$, as a parameter that can be adjusted to control how quickly the slope and intercept change: a larger value lets the coefficients adapt faster, at the cost of noisier estimates.

delta = 1e-5
trans_cov = delta / (1 - delta) * np.eye(2)
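One rough way to build intuition for delta is to ask how far a random walk with this transition covariance would be expected to drift. The sketch below compares a few values; the horizon `n_days` and the alternative deltas are illustrative assumptions, not values from the analysis.

```python
import numpy as np

n_days = 1000  # illustrative horizon, not the exact length of the data set
for delta in (1e-6, 1e-5, 1e-4):
    trans_cov = delta / (1 - delta) * np.eye(2)
    # For a random walk, variance after n steps is n times the per-step variance
    drift_std = np.sqrt(n_days * trans_cov[0, 0])
    print(f"delta={delta:g}  per-step var={trans_cov[0, 0]:.3g}  drift std={drift_std:.3f}")
```

Because the drift scales with the square root of delta, changing delta by a factor of 100 changes the expected drift by a factor of 10.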

Now, we can instantiate the KalmanFilter class from the pykalman module

from pykalman import KalmanFilter
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                  initial_state_mean=np.zeros(2),
                  initial_state_covariance=np.ones((2, 2)),
                  transition_matrices=np.eye(2),
                  observation_matrices=obs_mat,
                  observation_covariance=1.0,
                  transition_covariance=trans_cov)

and calculate the filtered state means and covariances

state_means, state_covs = kf.filter(data.EWC.values)
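For readers who want to see what the filter computes in this special case, here is a NumPy-only sketch of the same recursion for a scalar observation. The function `online_regression` is a hypothetical helper written for illustration. It is meant to mirror the settings above (zero initial mean, all-ones initial covariance, identity transition), but it has not been checked against pykalman's output.

```python
import numpy as np

def online_regression(x, y, delta=1e-5, obs_var=1.0):
    """Kalman-filtered [slope, intercept] for y_t ~ slope_t * x_t + intercept_t."""
    n = len(x)
    Q = delta / (1 - delta) * np.eye(2)   # transition covariance
    mean = np.zeros(2)                    # initial [slope, intercept]
    cov = np.ones((2, 2))                 # initial state covariance
    state_means = np.empty((n, 2))
    for t in range(n):
        if t > 0:
            cov = cov + Q                 # predict (random walk: mean unchanged)
        h = np.array([x[t], 1.0])         # observation row [x_t, 1]
        s = h @ cov @ h + obs_var         # innovation variance (scalar)
        k = cov @ h / s                   # Kalman gain
        mean = mean + k * (y[t] - h @ mean)
        cov = cov - np.outer(k, h @ cov)
        state_means[t] = mean
    return state_means

# Synthetic, noise-free example (not the EWA/EWC data)
x = np.linspace(19.0, 26.0, 500)
y = 1.2 * x + 3.0
means = online_regression(x, y)
slope, intercept = means[:, 0], means[:, 1]
print(slope[-1] * x[-1] + intercept[-1], y[-1])
```

As in the pykalman result, column 0 of the returned array holds the slope and column 1 the intercept, one row per time step.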

Finally, we can plot the slope and intercept to see how they change over time.

[Figure: slope_intercept. Filtered slope and intercept over time]

A more interesting way to visualize this is to overlay every fifth regression line on the EWA vs. EWC scatter plot, so we can clearly see how the regression line adjusts over time.

[Figure: price_corr_regress. EWA vs. EWC scatter plot with every fifth regression line overlaid]
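The overlay can be prepared by turning every fifth filtered state into the two endpoints of a line. The helper below, `regression_lines`, is a hypothetical name. It assumes an array shaped like state_means, with the slope in column 0 and the intercept in column 1, and is demonstrated on made-up values.

```python
import numpy as np

def regression_lines(state_means, x_min, x_max, step=5):
    """Endpoints of y = slope * x + intercept for every `step`-th state."""
    xs = np.array([x_min, x_max])
    sub = state_means[::step]                 # every fifth filtered state
    ys = sub[:, [0]] * xs + sub[:, [1]]       # shape (n_lines, 2)
    return xs, ys

# Made-up states: slopes 0..9, intercept 1
demo_states = np.column_stack([np.arange(10.0), np.ones(10)])
xs, ys = regression_lines(demo_states, 0.0, 2.0)
print(ys)
```

With matplotlib, something like `plt.plot(xs, ys.T)` should then draw one line per selected state on top of the scatter plot.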