Pyro is a deep probabilistic programming framework based on PyTorch. TensorFlow has its own PPL branch with an Edward taste and there is the inevitable PyMC3 as well but Pyro feels very natural and the API more direct than the aforementioned.

The example below is simplistic but shows what I mean by ‘natural’.

import torch
import pyro
import pyro.distributions as dist

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sea
%matplotlib inline

You get immediate results with Pyro or should one say ‘eager execution’:

normal = dist.Normal(0, 1) 
x = normal.sample()  
print(f"sample: {x}")
print(f"log probability: {normal.log_prob(x)}")  

Defining a variable is different than PyMC3. A normally distribute variable ‘size’ would be:

x = pyro.sample("size", normal)

By combining probabilistic variables you can create a simplistic weather model liks so:

def weather():
    cloudy = pyro.sample('cloudy', dist.Bernoulli(0.3))
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = pyro.sample('temp', dist.Normal(mean_temp, scale_temp))
    return cloudy, temp.item()

The logic is straightforward and you can get a few states:

for _ in range(3):

Let’s model the sales of ice-cream in function of the weather:

def ice_cream_sales():
    cloudy, temp = weather()
    expected_sales = 200. if cloudy == 'sunny' and temp > 80.0 else 50.
    ice_cream = pyro.sample('ice_cream', dist.Normal(expected_sales, 10.0))
    return ice_cream

Unlike other PPL frameworks you can get an idea directly from this definition

print(f"Possible sales of ice-cream: {ice_cream_sales():.2f}")

The key-idea behind PPL is to have a marginal distribution of the relevant variable. Marginal means, in simple terms, getting a probability distribution with all de inner-dependencies integrated out.

To do this in Pyro you first define the posterior:

posterior = pyro.infer.Importance(ice_cream_sales, num_samples=100)

and feed it to the EmpiricalMarginal:

marginal = pyro.infer.EmpiricalMarginal(

To see it in a histogran use something like this:

plt.hist([marginal().item() for _ in range(1000)], range=(5.0, 100.0))

Considering the weather model and how the sales depends on it, the most likely value is around 50.

Both the way the story is assembled and how concrete the result is strikes me as fantastically clear. Bayesian reasoning and PPL can be abstract at first but this example brings it all together for me.

If we look at a linear regression in PyMC3 versus Pyro you get something like the following.

%matplotlib inline
import pymc3 as pm
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
palette = 'muted'
pd.set_option('display.precision', 2)

N = 100
alfa_real = 2.5
beta_real = 0.9
eps_real = np.random.normal(0, 0.5, size=N)

x = np.random.normal(10, 1, N)
y_real = alfa_real + beta_real * x
y = y_real + eps_real

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(x, y, 'b.')
plt.xlabel('$x$', fontsize=16)
plt.ylabel('$y$', fontsize=16, rotation=0)
plt.plot(x, y_real, 'k')
plt.subplot(1, 2, 2)
plt.xlabel('$y$', fontsize=16)
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=1)
    epsilon = pm.HalfCauchy('epsilon', 5)

    mu = pm.Deterministic('mu', alpha + beta * x)
    y_pred = pm.Normal('y_pred', mu=mu, sd=epsilon, observed=y)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(10000, step, start)

The same in Pyro look like this:

import torch
import pyro
from pyro.distributions import Normal, HalfCauchy
from pyro.infer.mcmc import NUTS, MCMC
from pyro.infer import EmpiricalMarginal
import seaborn as sns
import matplotlib.pyplot as plt

def model(X, y):
    b_est = pyro.sample("beta", Normal(0,1))
    a_est = pyro.sample("alpha", Normal(0,1))
    s2_est = pyro.sample("variance", HalfCauchy(1))
    y_hat = pyro.sample("yhat", Normal(loc=b_est * X + a_est , scale=s2_est), obs=y)
    return y_hat

X = torch.tensor(range(100)).float()
b =torch.tensor(3.0).float()
a =torch.tensor(2.0).float()
noise = torch.randn((100,)).float()
y = X * b + a + noise
nuts_kernel = NUTS(model, adapt_step_size=True)
mcmc_run = MCMC(nuts_kernel, num_samples=1000, warmup_steps=400).run(X, y)

posterior_beta = EmpiricalMarginal(mcmc_run, "beta")
posterior_alpha = EmpiricalMarginal(mcmc_run, "alpha")
fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.distplot(posterior_beta.get_samples_and_weights()[0], ax=ax[0])
sns.distplot(posterior_alpha.get_samples_and_weights()[0], ax=ax[1])