Hierarchical Time Series Using PyMC

3 min read · Jun 16, 2023

In the world of statistical modeling, one powerful approach to account for group differences while understanding overall trends is hierarchical (or multilevel) modeling. This approach allows parameters to vary by groups and captures both within-group and between-group variations. In the context of time series data, these group-specific parameters can represent different patterns over time for different groups.

Today, we will take a deep dive into building a hierarchical time series model using PyMC, a Python library for probabilistic programming.

Let’s start with generating some artificial time-series data for multiple groups, each with its own intercept and slope.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm

# Simulating some data
np.random.seed(0)
n_groups = 3  # number of groups
n_data_points = 100  # number of data points per group
x = np.tile(np.linspace(0, 10, n_data_points), n_groups)
group_indicator = np.repeat(np.arange(n_groups), n_data_points)
slope_true = np.random.normal(0, 1, size=n_groups)
intercept_true = np.random.normal(2, 1, size=n_groups)
y = slope_true[group_indicator]*x + intercept_true[group_indicator] + np.random.normal(0, 1, size=n_groups*n_data_points)

We’ve now got time-series data for three different groups. Each group has its own time trend defined by a unique intercept and slope.

colors = ['b', 'g', 'r']  # Define different colors for each group

plt.figure(figsize=(10, 5))

# Plot raw data for each group
for i in range(n_groups):
    plt.plot(x[group_indicator == i], y[group_indicator == i], 'o', color=colors[i], label=f'Group {i+1}')

plt.title('Raw Data with Groups')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()
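Before fitting the hierarchical model, it can help to check the simulation against a simple baseline. The sketch below regenerates the same data and fits a separate least-squares line to each group with np.polyfit. This no-pooling fit is not part of the walkthrough itself, and the names slope_ols and intercept_ols are illustrative.

```python
import numpy as np

# Regenerate the simulated data with the same seed as above
np.random.seed(0)
n_groups = 3
n_data_points = 100
x = np.tile(np.linspace(0, 10, n_data_points), n_groups)
group_indicator = np.repeat(np.arange(n_groups), n_data_points)
slope_true = np.random.normal(0, 1, size=n_groups)
intercept_true = np.random.normal(2, 1, size=n_groups)
y = (slope_true[group_indicator] * x + intercept_true[group_indicator]
     + np.random.normal(0, 1, size=n_groups * n_data_points))

# Fit an independent straight line to each group (no pooling)
slope_ols = np.empty(n_groups)
intercept_ols = np.empty(n_groups)
for i in range(n_groups):
    mask = group_indicator == i
    # polyfit returns coefficients from highest degree down: slope, then intercept
    slope_ols[i], intercept_ols[i] = np.polyfit(x[mask], y[mask], deg=1)

for i in range(n_groups):
    print(f"Group {i+1}: slope {slope_ols[i]:.2f} (true {slope_true[i]:.2f}), "
          f"intercept {intercept_ols[i]:.2f} (true {intercept_true[i]:.2f})")
```

With 100 points per group the per-group fits should land close to the true values, which gives a reference point for judging the hierarchical estimates later.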

The next step is to build our hierarchical model. Our model will have group-specific intercepts (alpha) and slopes (beta). The intercepts and slopes are drawn from normal distributions with hyperparameters mu_alpha, sigma_alpha, mu_beta, and sigma_beta. These hyperparameters describe the mean and standard deviation of the intercepts and of the slopes across groups, and they are estimated from the data along with the group-specific parameters.
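Written out, the model described above looks like this, where j indexes groups, i indexes observations, and g[i] is the group of observation i (this notation is introduced here for illustration; the prior scales are the ones used in the model code):

```latex
\begin{aligned}
\mu_\alpha,\ \mu_\beta &\sim \mathcal{N}(0,\ 10^2) \\
\sigma_\alpha,\ \sigma_\beta &\sim \text{HalfNormal}(10) \\
\alpha_j &\sim \mathcal{N}(\mu_\alpha,\ \sigma_\alpha^2), \quad j = 1, \dots, J \\
\beta_j &\sim \mathcal{N}(\mu_\beta,\ \sigma_\beta^2), \quad j = 1, \dots, J \\
\sigma &\sim \text{HalfNormal}(1) \\
y_i &\sim \mathcal{N}\!\left(\alpha_{g[i]} + \beta_{g[i]}\, x_i,\ \sigma^2\right)
\end{aligned}
```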

with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_alpha = pm.Normal('mu_alpha', mu=0, sigma=10)
    sigma_alpha = pm.HalfNormal('sigma_alpha', sigma=10)
    mu_beta = pm.Normal('mu_beta', mu=0, sigma=10)
    sigma_beta = pm.HalfNormal('sigma_beta', sigma=10)
  
    # Priors
    alpha = pm.Normal('alpha', mu=mu_alpha, sigma=sigma_alpha, shape=n_groups)  # group-specific intercepts
    beta = pm.Normal('beta', mu=mu_beta, sigma=sigma_beta, shape=n_groups)  # group-specific slopes
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Expected value
    mu = alpha[group_indicator] + beta[group_indicator] * x

    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

    # Sampling
    trace = pm.sample(2000, tune=1000)
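The expected-value line relies on integer-array indexing: alpha[group_indicator] expands one parameter per group into one value per observation. Here is a small NumPy sketch of the same idea with made-up numbers (the *_demo names and values are illustrative, not posterior output):

```python
import numpy as np

# Made-up per-group parameters, for illustration only
alpha_demo = np.array([1.0, 2.0, 3.0])   # one intercept per group
beta_demo = np.array([0.5, -1.0, 0.0])   # one slope per group

# Two observations per group: the indicator maps each observation to its group
group_indicator_demo = np.repeat(np.arange(3), 2)   # groups 0, 0, 1, 1, 2, 2
x_demo = np.tile(np.array([0.0, 10.0]), 3)          # x = 0, 10 within each group

# Indexing with the indicator yields one parameter value per observation
mu_demo = alpha_demo[group_indicator_demo] + beta_demo[group_indicator_demo] * x_demo
print(mu_demo)   # values: 1, 6, 2, -8, 3, 3
```

PyMC variables accept the same kind of indexing, which is why the model can be written without an explicit loop over groups.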

We’ve now defined and sampled from our model. Let’s check the model estimates for the different parameters:

# Checking the trace
pm.plot_trace(trace,var_names=['alpha','beta'])
plt.show()
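When reading the trace plot, the main thing to look for is that the chains overlap and mix well. A common numerical summary of this is the Gelman-Rubin R-hat statistic. The sketch below computes the basic version (no chain splitting, no rank normalization) in NumPy on synthetic draws. The function name basic_rhat and the fake chains are illustrative, and diagnostic libraries such as ArviZ report a more refined variant, so treat this only as an illustration of the idea.

```python
import numpy as np

def basic_rhat(draws):
    """Basic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    n_chains, n_draws = draws.shape
    chain_means = draws.mean(axis=1)
    # Between-chain variance (scaled by the number of draws)
    between = n_draws * chain_means.var(ddof=1)
    # Average within-chain variance
    within = draws.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)

# Synthetic stand-ins for posterior draws: 4 chains of 1000 draws each
well_mixed = rng.normal(0.0, 1.0, size=(4, 1000))
# Same draws, but each chain is shifted to a different location
stuck = well_mixed + np.array([0.0, 1.0, 2.0, 3.0])[:, None]

rhat_good = basic_rhat(well_mixed)
rhat_bad = basic_rhat(stuck)
print(f"well mixed: {rhat_good:.3f}, stuck: {rhat_bad:.3f}")
```

Values close to 1 suggest the chains agree; values well above 1 suggest they are exploring different regions. With the real trace, something like basic_rhat(trace.posterior['alpha'].values[..., 0]) would apply it to the first group's intercept.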

The final step is to visualize the raw data and the model’s predictions:

# Posterior samples
alpha_samples = trace.posterior['alpha'].values
beta_samples = trace.posterior['beta'].values

# New x values for predictions
x_new = np.linspace(0, 10, 200)

plt.figure(figsize=(10, 5))

# Plot raw data and predictions for each group
for i in range(n_groups):
    # Plot raw data
    plt.plot(x[group_indicator == i], y[group_indicator == i], 'o', color=colors[i], label=f'Group {i+1} observed')

    # Posterior draws of the regression line for group i, shape (chain, draw, len(x_new))
    y_hat = alpha_samples[..., i, None] + beta_samples[..., i, None] * x_new
    # Posterior mean and standard deviation of the line at each x value
    y_hat_mean = y_hat.mean(axis=(0, 1))
    y_hat_std = y_hat.std(axis=(0, 1))
    plt.plot(x_new, y_hat_mean, color=colors[i], label=f'Group {i+1} predicted')
    # Shade two posterior standard deviations either side of the mean line
    plt.fill_between(x_new, y_hat_mean - 2*y_hat_std, y_hat_mean + 2*y_hat_std, color=colors[i], alpha=0.3)

plt.title('Raw Data with Posterior Predictions by Group')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

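As you can see from the plot, the hierarchical time series model has done a good job capturing the individual trends in each group. Moreover, the shaded region, which spans two posterior standard deviations either side of the mean, shows the uncertainty in each group’s fitted line. It does not include the observation noise sigma, so individual data points can and do fall outside it.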
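The band in the plot reflects uncertainty in the fitted line only. As a sketch of how a band for new observations could be built, the function below (hypothetical name predictive_band) adds simulated observation noise to the line draws and takes percentiles. It is demonstrated on synthetic draws standing in for the posterior, not on output from the model above.

```python
import numpy as np

def predictive_band(alpha_draws, beta_draws, sigma_draws, x_new, rng, q=(2.5, 97.5)):
    """Percentile band for new observations at x_new.

    All *_draws arguments are 1-D arrays of posterior draws for one group
    (chains already flattened together).
    """
    # One regression line per draw, shape (n_draws, len(x_new))
    line = alpha_draws[:, None] + beta_draws[:, None] * x_new[None, :]
    # Add observation noise using the matching sigma draw
    y_sim = line + rng.normal(0.0, 1.0, size=line.shape) * sigma_draws[:, None]
    lower, upper = np.percentile(y_sim, q, axis=0)
    return line.mean(axis=0), lower, upper

rng = np.random.default_rng(1)
n_draws = 4000
# Synthetic stand-ins for posterior draws (assumed values, for illustration)
alpha_draws = rng.normal(2.0, 0.2, size=n_draws)
beta_draws = rng.normal(1.0, 0.03, size=n_draws)
sigma_draws = np.abs(rng.normal(1.0, 0.05, size=n_draws))

x_new = np.linspace(0, 10, 50)
mean_line, lower, upper = predictive_band(alpha_draws, beta_draws, sigma_draws, x_new, rng)
print(mean_line[:3], lower[:3], upper[:3])
```

With a real trace, the inputs would be the flattened draws for one group, for example trace.posterior['alpha'].values[..., i].reshape(-1), and the result can be passed to plt.fill_between in the same way as the band in the plot. PyMC's pm.sample_posterior_predictive is the more usual route to the same kind of interval.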

In conclusion, hierarchical models provide a powerful framework for capturing group-level variations in time-series data. They allow us to share statistical strength among groups, provide partial pooling of information, and offer a nuanced understanding of the data structure. Using libraries like PyMC, implementing these models becomes fairly straightforward, paving the way for robust and interpretable time series analyses.
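To make "partial pooling" concrete, consider a simpler intercept-only version of the model, with y_ij ~ N(alpha_j, sigma^2) and alpha_j ~ N(mu_alpha, sigma_alpha^2), and treat mu_alpha, sigma_alpha and sigma as known. This is a simplifying assumption for illustration, not the model fitted above. In that case the posterior mean of each group intercept is a precision-weighted average of the group's own sample mean and the overall mean:

```latex
\mathbb{E}[\alpha_j \mid y]
  = \frac{\dfrac{n_j}{\sigma^2}\,\bar{y}_j + \dfrac{1}{\sigma_\alpha^2}\,\mu_\alpha}
         {\dfrac{n_j}{\sigma^2} + \dfrac{1}{\sigma_\alpha^2}}
```

Groups with few observations (small n_j) are pulled toward mu_alpha, while groups with many observations stay close to their own mean.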


Written by Charles Copley