# Using PyMC for A/B testing experiments

Let’s consider a hypothetical case: a company is testing two website designs, Design A and Design B, and wants to determine which one leads to more user conversions.

Firstly, we collect our data. For simplicity’s sake, let’s say 1,000 users are shown Design A, with 150 users converting (taking the desired action on the website). Meanwhile, 1,300 users are shown Design B, with 210 users converting.

We want to use Bayesian analysis to compare the conversion rates of the two designs.

```python
import pymc as pm

# Observed data
n_A = 1000     # users shown Design A
n_B = 1300     # users shown Design B
conv_A = 150   # conversions for Design A
conv_B = 210   # conversions for Design B

with pm.Model() as model:
    # Prior distributions for the conversion probabilities p_A and p_B
    p_A = pm.Beta('p_A', alpha=2, beta=2)
    p_B = pm.Beta('p_B', alpha=2, beta=2)

    # Deterministic quantity tracking the difference between p_A and p_B
    delta = pm.Deterministic('delta', p_A - p_B)

    # Observed conversions modeled as Binomial draws
    obs_A = pm.Binomial('obs_A', n=n_A, p=p_A, observed=conv_A)
    obs_B = pm.Binomial('obs_B', n=n_B, p=p_B, observed=conv_B)

    # Perform Markov chain Monte Carlo sampling
    trace = pm.sample(draws=2000, tune=1000, cores=2)
```

The prior distributions for `p_A` and `p_B` are modeled as Beta distributions. We've chosen `alpha=2` and `beta=2`, which yields a prior that is close to uniform but gently downweights, without ruling out, extreme values of `p_A` and `p_B`.
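To see what this prior implies, we can evaluate it directly. This quick check uses SciPy rather than PyMC and is not part of the model itself:

```python
from scipy.stats import beta

# Beta(2, 2): symmetric around 0.5, with density 6 * x * (1 - x)
prior = beta(2, 2)

print(prior.mean())     # 0.5
print(prior.pdf(0.5))   # 1.5 -> modest peak at the centre
print(prior.pdf(0.05))  # 0.285 -> extreme rates are downweighted, not excluded
```

The density is positive everywhere on (0, 1), so the data can still pull the posterior toward an extreme conversion rate if the evidence warrants it.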

The `delta` quantity is deterministic: it represents the difference between `p_A` and `p_B`.

The observed data are modeled with Binomial distributions, where `n_A` and `n_B` are the numbers of trials (the number of users shown each design) and `p_A` and `p_B` are the unknown true conversion probabilities for each design.

Finally, we perform MCMC sampling over this model to infer the posterior distributions of `p_A`, `p_B`, and `delta`.
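Because the Beta prior is conjugate to the Binomial likelihood, this particular model also has closed-form posteriors, Beta(alpha + conversions, beta + non-conversions), which makes a handy sanity check on the MCMC output. A SciPy sketch, independent of PyMC:

```python
from scipy.stats import beta

# Conjugate posteriors: Beta(alpha + successes, beta + failures)
post_A = beta(2 + 150, 2 + 1000 - 150)   # Beta(152, 852)
post_B = beta(2 + 210, 2 + 1300 - 210)   # Beta(212, 1092)

print(post_A.mean())  # ~0.151 -> should match the MCMC posterior mean of p_A
print(post_B.mean())  # ~0.163 -> should match the MCMC posterior mean of p_B
```

If the sampled posterior means drift far from these analytic values, something is wrong with the sampling setup rather than the data.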

We can visualize and examine the results using PyMC’s built-in functions:

```python
pm.plot_forest(trace, kind='ridgeplot', var_names=['p_A', 'p_B', 'delta'], combined=True)
```

The `plot_forest` function displays the posterior distributions for `p_A`, `p_B`, and `delta` (with `kind='ridgeplot'`, as overlaid density ridges). This gives us a probabilistic understanding of the conversion rates of Design A and Design B, and of their difference.

For instance, if most of the `delta` distribution lies below zero (as happens here, since Design B's observed conversion rate, 210/1300 ≈ 16.2%, exceeds Design A's 150/1000 = 15%), we can be reasonably confident that Design B has the higher conversion rate; recall that `delta` is `p_A - p_B`, so negative values favour Design B. If `delta` spans both positive and negative values with substantial mass on each side, the data give no clear evidence of a difference between the two designs.
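"Most of the `delta` distribution lies below zero" can be made precise by computing P(delta < 0) from posterior samples; with the trace above, that is roughly `(trace.posterior['delta'] < 0).mean()` (assuming `pm.sample` returned an InferenceData object, the default in recent PyMC versions). The same quantity can be sketched without PyMC by drawing from the conjugate Beta posteriors:

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 200_000

# Draws from the closed-form posteriors: prior Beta(2, 2) plus Binomial data
p_A = rng.beta(2 + 150, 2 + 850, size=n_draws)
p_B = rng.beta(2 + 210, 2 + 1090, size=n_draws)

# Posterior probability that Design B has the higher conversion rate
prob_B_better = np.mean(p_A - p_B < 0)
print(prob_B_better)  # roughly 0.77 for this data
```

A value around 0.77 says Design B is probably better, but with meaningful residual uncertainty; whether that is enough to ship Design B is a business decision, not a statistical one.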

In conclusion, Bayesian A/B testing powered by PyMC offers a robust and flexible framework for analyzing experiments and making data-driven decisions. Instead of merely providing a point estimate or a binary outcome, it provides a full probability distribution over the parameters of interest. This allows us to quantify the uncertainty associated with our estimates, thus giving a richer understanding of the results.

In our example, the posterior distributions of `p_A`, `p_B`, and `delta` not only give insights into the conversion rates of Design A and Design B, but also into their comparative efficacy. By visualizing these distributions, we gain a nuanced view of the conversion dynamics, a step up from traditional A/B testing methods.

As companies and data scientists strive to become increasingly data-driven, Bayesian methods facilitated by tools like PyMC are set to become even more critical. Whether it’s conversion rate optimization, customer behavior analysis, or any other realm where uncertainty reigns, Bayesian analysis enables us to tackle these challenges head-on.

This flexibility, combined with Python’s simplicity, makes PyMC an invaluable tool for anyone looking to harness the power of Bayesian analysis. So why wait? Start your Bayesian journey today, and unlock a new level of understanding from your data.