This vignette demonstrates fitting a Poisson regression model via Hamiltonian Monte Carlo (HMC) using the hmclearn package.
```r
library(hmclearn)
```
For a count response, we let
\[ p(y | \mu) = \frac{e^{-\mu}\mu^y}{y!}, \]
with a log-link function
\[ \begin{aligned} \boldsymbol\mu &:= E(\mathbf{y} | \mathbf{X}) = e^{\mathbf{X}\boldsymbol\beta}, \\ \log \boldsymbol\mu &= \mathbf{X}\boldsymbol\beta. \end{aligned} \] The vector of responses is \(\mathbf{y} = (y_1, ..., y_n)^T\). The covariate values for subject \(i\) are \(\mathbf{x}_i^T = (x_{i0}, ..., x_{iq})\) for \(q\) covariates plus an intercept. We write the full design matrix as \(\mathbf{X} = (\mathbf{x}_1^T, ..., \mathbf{x}_n^T)^T \in \mathbb{R}^{n\times(q+1)}\) for \(n\) observations. The regression coefficients are a vector of length \(q + 1\), \(\boldsymbol\beta = (\beta_0, ..., \beta_q)^T\).
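As a quick numeric illustration of the log link (a toy sketch with made-up coefficients, separate from the drug use data below): a one-unit increase in a covariate multiplies the expected count by \(e^{\beta_j}\).

```r
# Toy illustration of the log link (hypothetical coefficients)
beta <- c(0.5, 1.2)        # intercept and one slope
exp(sum(c(1, 1) * beta))   # E(y | x = 1) = e^{0.5 + 1.2}
exp(sum(c(1, 2) * beta))   # E(y | x = 2), larger by a factor of e^{1.2}
```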
We develop the likelihood
\[ \begin{aligned} f(\mathbf{y} | \boldsymbol\mu) &= \prod_{i=1}^n \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \\ f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) &= \prod_{i=1}^n \frac{e^{-e^{\mathbf{x}_i^T\boldsymbol\beta}}e^{y_i\mathbf{x}_i^T\boldsymbol\beta}}{y_i!}, \\ \end{aligned} \]
and log-likelihood
\[ \begin{aligned} \log f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) &= \sum_{i=1}^n \left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta - \log y_i! \right), \\ &\propto \sum_{i=1}^n \left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta \right). \end{aligned} \]
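This log-likelihood translates directly into R. The following helper is our own sketch (not a package function), evaluating the expression up to the constant \(-\sum_i \log y_i!\):

```r
# Sketch: Poisson regression log-likelihood, dropping the -sum(log(y_i!)) term
log_lik_pois <- function(beta, y, X) {
  eta <- X %*% beta          # linear predictor x_i' beta
  sum(-exp(eta) + y * eta)   # sum_i ( -e^{x_i' beta} + y_i x_i' beta )
}
```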
We set a multivariate normal prior for \(\boldsymbol\beta\)
\[ \begin{aligned} \boldsymbol\beta &\sim N(0, \sigma_\beta^2 \mathbf{I}), \end{aligned} \]
with density, omitting constants in the final proportionality,
\[ \begin{aligned} \pi(\boldsymbol\beta | \sigma_\beta^2) &= \frac{1}{\sqrt{\lvert 2\pi \sigma_\beta^2 \mathbf{I} \rvert }}e^{-\frac{1}{2}\boldsymbol\beta^T \boldsymbol\beta / \sigma_\beta^2}, \\ \log \pi(\boldsymbol\beta | \sigma_\beta^2) &= -\frac{q+1}{2}\log(2\pi \sigma_\beta^2) - \frac{1}{2}\boldsymbol\beta^T \boldsymbol\beta / \sigma_\beta^2, \\ &\propto -\frac{q+1}{2}\log \sigma_\beta^2 - \frac{\boldsymbol\beta^T\boldsymbol\beta}{2\sigma_\beta^2}. \end{aligned} \]
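In code, the log prior kernel is a one-liner. Because \(\sigma_\beta^2\) is held fixed in this example, the \(\log \sigma_\beta^2\) term is constant and can be dropped as well; the default value of sigma2beta below is a placeholder assumption:

```r
# Sketch: log kernel of the N(0, sigma2beta * I) prior, constants dropped
log_prior_beta <- function(beta, sigma2beta = 1e3) {
  -sum(beta^2) / (2 * sigma2beta)
}
```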
Next, we derive the log posterior, omitting constants,
\[ \begin{aligned} f(\boldsymbol\beta | \mathbf{X}, \mathbf{y}, \sigma_\beta^2) &\propto f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) \pi(\boldsymbol\beta | \sigma_\beta^2), \\ \log f(\boldsymbol\beta | \mathbf{X}, \mathbf{y}, \sigma_\beta^2) & \propto \log f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) + \log \pi(\boldsymbol\beta | \sigma_\beta^2), \\ &\propto \sum_{i=1}^n \left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta\right) - \frac{\boldsymbol\beta^T\boldsymbol\beta}{2\sigma_\beta^2}, \\ &\propto \mathbf{y}^T\mathbf{X}\boldsymbol\beta - \mathbf{1}_n^T e^{\mathbf{X}\boldsymbol\beta} - \frac{\boldsymbol\beta^T\boldsymbol\beta}{2\sigma_\beta^2}. \end{aligned} \]
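Combining the two terms gives a log-posterior function in the form that hmc expects: the parameter vector as the first argument, with data supplied via param. This is a sketch of what the package's poisson_posterior computes; the actual function's argument names and prior defaults may differ.

```r
# Sketch of the log posterior (cf. the package's poisson_posterior)
log_post_pois <- function(theta, y, X, sigma2beta = 1e3) {
  eta <- X %*% theta
  sum(-exp(eta) + y * eta) - sum(theta^2) / (2 * sigma2beta)
}
```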
We need to derive the gradient of the log posterior for the leapfrog function in HMC.
\[ \begin{aligned} \nabla_{\boldsymbol\beta}\log f(\boldsymbol\beta|\mathbf{X}, \mathbf{y}, \sigma_\beta^2) &\propto \sum_{i=1}^n\left( -e^{\mathbf{x}_i^T\boldsymbol\beta}\mathbf{x}_i + y_i\mathbf{x}_i\right) - \boldsymbol\beta / \sigma_\beta^2, \\ &\propto \sum_{i=1}^n \mathbf{x}_i\left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i\right) - \boldsymbol\beta / \sigma_\beta^2, \\ &\propto \mathbf{X}^T (\mathbf{y} - e^{\mathbf{X}\boldsymbol\beta}) - \boldsymbol\beta/ \sigma_\beta^2. \end{aligned} \]
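The matrix form of the gradient is equally compact. Again a sketch, mirroring the package's g_poisson_posterior with assumed argument names:

```r
# Sketch of the gradient of the log posterior for the leapfrog updates
g_log_post_pois <- function(theta, y, X, sigma2beta = 1e3) {
  as.vector(t(X) %*% (y - exp(X %*% theta))) - theta / sigma2beta
}
```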
The user must provide the design matrix directly for use in hmclearn. Our first step is to load the drug use data and store the design matrix \(\mathbf{X}\) and the vector of counts \(\mathbf{y}\). This example also appears in Agresti (2015), and we compare our results to his.
```r
data(Drugs)

# design matrix
X <- model.matrix(count ~ A + C + M + A:C + A:M + C:M, data=Drugs)
X <- X[, 1:ncol(X)]

# dependent variable is the vector of counts
y <- Drugs$count
```
To compare results, we first fit a standard generalized linear model using the frequentist function glm.
```r
# matrix representation
f <- glm(y ~ X-1, family=poisson(link=log))
summary(f)
#> 
#> Call:
#> glm(formula = y ~ X - 1, family = poisson(link = log))
#> 
#> Deviance Residuals: 
#>        1         2         3         4         5         6  
#>  0.02044  -0.02658  -0.09256   0.02890  -0.33428   0.09452  
#>        7         8  
#>  0.49134  -0.03690  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> X(Intercept)  5.63342    0.05970  94.361  < 2e-16 ***
#> XAyes         0.48772    0.07577   6.437 1.22e-10 ***
#> XCyes        -1.88667    0.16270 -11.596  < 2e-16 ***
#> XMyes        -5.30904    0.47520 -11.172  < 2e-16 ***
#> XAyes:Cyes    2.05453    0.17406  11.803  < 2e-16 ***
#> XAyes:Myes    2.98601    0.46468   6.426 1.31e-10 ***
#> XCyes:Myes    2.84789    0.16384  17.382  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 2.4038e+04  on 8  degrees of freedom
#> Residual deviance: 3.7399e-01  on 1  degrees of freedom
#> AIC: 63.417
#> 
#> Number of Fisher Scoring iterations: 4
```
Next, we fit the Poisson regression model using HMC. A vector of step sizes \(\epsilon\), one per parameter, is specified to account for the different scales of the regression coefficients.
```r
N <- 1e4
eps_vals <- c(rep(5e-4, 2), 1e-3, 2e-3, 1e-3, 2e-3, 5e-4)

set.seed(412)
t1.hmc <- Sys.time()
f_hmc <- hmc(N = N,
             theta.init = rep(0, 7),
             epsilon = eps_vals,
             L = 50,
             logPOSTERIOR = poisson_posterior,
             glogPOSTERIOR = g_poisson_posterior,
             varnames = colnames(X),
             parallel = FALSE,
             chains = 2,
             param = list(y = y, X = X))
t2.hmc <- Sys.time()
t2.hmc - t1.hmc
#> Time difference of 51.99019 secs
```
The acceptance ratio for each of the HMC chains is sufficiently high for an efficient simulation.
```r
f_hmc$accept/N
#> [1] 0.9804 0.9829
```
The posterior quantiles are summarized after removing an initial burn-in period. The \(\hat{R}\) statistics provide an indication of convergence. Values close to one indicate that the multiple MCMC chains converged to the same distribution, while values above 1.1 indicate possible convergence problems. All \(\hat{R}\) values in our example are close to one.
```r
summary(f_hmc, burnin=3000, probs=c(0.025, 0.5, 0.975))
#> Summary of MCMC simulation
#>                   2.5%        50%      97.5%     rhat
#> (Intercept)  5.5238199  5.6327435  5.7521131 1.001527
#> Ayes         0.3379506  0.4875867  0.6251244 1.006951
#> Cyes        -2.2176007 -1.8992300 -1.6078705 1.001525
#> Myes        -6.3840770 -5.3643129 -4.4945189 1.007716
#> Ayes:Cyes    1.7495447  2.0670083  2.4061751 1.005151
#> Ayes:Myes    2.1572253  3.0200944  4.1225859 1.018774
#> Cyes:Myes    2.5174705  2.8574315  3.1983475 1.026709
```
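As a quick numeric check, the posterior medians can be placed alongside the glm point estimates. This sketch assumes that summary returns the quantile matrix printed above:

```r
# Hypothetical side-by-side comparison of glm estimates and posterior medians
post_summ <- summary(f_hmc, burnin=3000, probs=c(0.025, 0.5, 0.975))
cbind(glm = coef(f), hmc_median = post_summ[, "50%"])
```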
Trace plots provide a visual indication of stationarity. These plots indicate that the MCMC chains are reasonably stationary.
```r
mcmc_trace(f_hmc, burnin=3000)
```
Histograms of the posterior distribution show that Bayesian parameter estimates align with Frequentist estimates.
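One way to produce these histograms is the package's diagplots function, overlaying the frequentist estimates; the comparison.theta usage here reflects our reading of its interface:

```r
# Posterior histograms with frequentist estimates overlaid (assumed interface)
diagplots(f_hmc, burnin=3000, comparison.theta=coef(f))
```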