Heirarchical Time Series Using PyMC

Charles Copley
3 min readJun 16, 2023

--

In the world of statistical modeling, one powerful approach to account for group differences while understanding overall trends is hierarchical (or multilevel) modeling. This approach allows parameters to vary by groups and captures both within-group and between-group variations. In the context of time series data, these group-specific parameters can represent different patterns over time for different groups.

Today, we will take a deep dive into building a hierarchical time series model using PyMC, a Python library for probabilistic programming.

Let’s start with generating some artificial time-series data for multiple groups, each with its own intercept and slope.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm

# Simulating some data
np.random.seed(0)
n_groups = 3 # number of groups
n_data_points = 100 # number of data points per group
x = np.tile(np.linspace(0, 10, n_data_points), n_groups)
group_indicator = np.repeat(np.arange(n_groups), n_data_points)
slope_true = np.random.normal(0, 1, size=n_groups)
intercept_true = np.random.normal(2, 1, size=n_groups)
y = slope_true[group_indicator]*x + intercept_true[group_indicator] + np.random.normal(0, 1, size=n_groups*n_data_points)

We’ve now got time-series data for three different groups. Each group has its own time trend defined by a unique intercept and slope.

colors = ['b', 'g', 'r']  # Define different colors for each group

plt.figure(figsize=(10, 5))

# Plot raw data for each group
for i in range(n_groups):
plt.plot(x[group_indicator == i], y[group_indicator == i], 'o', color=colors[i], label=f'Group {i+1}')

plt.title('Raw Data with Groups')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

The next step is to build our hierarchical model. Our model will have group-specific intercepts (alpha) and slopes (beta). The intercepts and slopes are drawn from normal distributions with hyperparameters mu_alpha, sigma_alpha, mu_beta, and sigma_beta. These hyperparameters represent the group-level means and standard deviations of the intercepts and slopes, respectively.

with pm.Model() as hierarchical_model:
# Hyperpriors
mu_alpha = pm.Normal('mu_alpha', mu=0, sigma=10)
sigma_alpha = pm.HalfNormal('sigma_alpha', sigma=10)
mu_beta = pm.Normal('mu_beta', mu=0, sigma=10)
sigma_beta = pm.HalfNormal('sigma_beta', sigma=10)

# Priors
alpha = pm.Normal('alpha', mu=mu_alpha, sigma=sigma_alpha, shape=n_groups) # group-specific intercepts
beta = pm.Normal('beta', mu=mu_beta, sigma=sigma_beta, shape=n_groups) # group-specific slopes
sigma = pm.HalfNormal('sigma', sigma=1)

# Expected value
mu = alpha[group_indicator] + beta[group_indicator] * x

# Likelihood
y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

# Sampling
trace = pm.sample(2000, tune=1000)

We’ve now defined and sampled from our model. Lets check the model estimates for the different parameters:

# Checking the trace
pm.plot_trace(trace,var_names=['alpha','beta'])
plt.show()

The final step is to visualize the raw data and the model’s predictions:

# Posterior samples
alpha_samples = trace.posterior['alpha'].values
beta_samples = trace.posterior['beta'].values

# New x values for predictions
x_new = np.linspace(0, 10, 200)

plt.figure(figsize=(10, 5))

# Plot raw data and predictions for each group
for i in range(n_groups):
# Plot raw data

plt.plot(x[group_indicator == i], y[group_indicator == i], 'o', color=colors[i], label=f'Group {i+1} observed')
x_new = x[group_indicator == i]
# Generate and plot predictions
alpha = trace.posterior.sel(alpha_dim_0=i,beta_dim_0=i)['alpha'].values
beta = trace.posterior.sel(alpha_dim_0=i,beta_dim_0=i)['beta'].values
y_hat = alpha[..., None] + beta[..., None] * x_new[None,:]
y_hat_mean = y_hat.mean(axis=(0, 1))
y_hat_std = y_hat.std(axis=(0, 1))
plt.plot(x_new, y_hat_mean, color=colors[i], label=f'Group {i+1} predicted')
plt.fill_between(x_new, y_hat_mean - 2*y_hat_std, y_hat_mean + 2*y_hat_std, color=colors[i], alpha=0.3)

plt.title('Raw Data with Posterior Predictions by Group')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

As you can see from the plot, the hierarchical time series model has done a good job capturing the individual trends in each group. Moreover, the shaded region gives a measure of uncertainty around the predictions.

In conclusion, hierarchical models provide a powerful framework for capturing group-level variations in time-series data. They allow us to share statistical strength among groups, provide partial pooling of information, and offer a nuanced understanding of the data structure. Using libraries like PyMC, implementing these models becomes fairly straightforward, paving the way for robust and interpretable time series analyses.

--

--