`diagis`

is an R package containing functions relating
weighted samples obtained for example from importance sampling. The main
motivation for developing `diagis`

was to enable easy
computation of summary statistics and diagnostics of the weighted
IS-MCMC runs provided by `bssm`

package (Helske and Vihola 2021; Vihola, Helske,
and Franks 2020) for Bayesian state space modelling. For more
broader use, the `diagis`

package provides functions for
computing (probability) weighted means, covariances, and quantiles of
possibly multivariate samples, the running versions of these, as well as
diagnostic plot function `weight_plot`

for graphical
diagnostic of weights.

All the mean and covariance functions are written in C++ using
`Rcpp`

(Eddelbuettel and
François 2011; Eddelbuettel 2013) and
`RcppArmadillo`

(Eddelbuettel and
Sanderson 2014) packages, making these function
computationally very efficient even for large samples. The weight
diagnostic plot uses `ggplot`

(Wickham 2009) for
visually appealing graphics, while `gridExtra`

(Auguie
2016) combines the plots together.

As an illustration, consider estimating the expected value of Gamma(\(\alpha, \beta\)) distribution using importance sampling. The density of Gamma(\(\alpha, \beta\)) is \[ p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x). \] We will use another Gamma distribution with parameters \(a\) and \(b\) as our proposal distribution \(q(x)\), so the weights are of form \[ w(x) = \frac{p(x)}{q(x)} = \frac{\frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x)}{\frac{b^{a}}{\Gamma(a)} x^{a - 1} \exp(-b x)} = \frac{\Gamma(a)\beta^{\alpha}}{\Gamma(\alpha)b^a}x^{\alpha - a}\exp(-(\beta -b)x), \] and our importance sampling estimator for \(\theta = \textrm{E}_p(X)\) is \[ \hat \theta_n = \frac{1}{n}\sum_{i=1}^n Y_i w(Y_i), \] where \(Y_i, i = 1,\ldots, n\) are drawn from \(\textrm{Gamma}(a, b)\). However, in practice we often have access only to unnormalized weights \(w_u(x) = c w(x)\), which leads to self-normalized importance sampling estimate \[ \tilde \theta_n = \frac{1}{\sum_{i=1}^n w_u(Y_i)}\sum_{i=1}^n Y_i w_u(Y_i). \]

Note that the self-normalized estimate \(\tilde \theta_n\) is not unbiased, but
still consistent. Also, we might want to use self-normalized version
even when the computation of weights \(w\) is possible (Owen 2013). Thus
the functions in `diagis`

are focused on self-normalized
importance sampling as it is more generally applicable approach.

We can in principle choose \(a\) and \(b\) arbitrarily, but weights \(w(x)\) have finite variance only if \(a < \alpha\) and \(b < \beta\). As we will soon see, infinite variance of the weights can make the importance sampling unreliable.

Let us first take \(\alpha = 2\), and \(\beta = 1\). The expected value is then \(\alpha/\beta = 2\). For proposal distribution, let us first use \(a = 1\) and \(b = 0.5\):

```
library("diagis")
set.seed(1)
x <- rgamma(10000, 1, 0.75)
w <- dgamma(x, 2, 1) / dgamma(x, 1, 0.75)
plot(w)
```

`weighted_mean(x, w)`

`## [1] 2.012761`

Our estimate is fairly close to the theoretical value, and the plot of weights \(w\) suggest that the distribution of the weights is nearly uniform with upper bound slightly below 2 (actual bound can be computed theoretically based on \(w(x)\)).

Let us now change \(a\) to 2:

```
set.seed(1)
x_bad <- rgamma(10000, 1, 2)
w_bad <- dgamma(x_bad, 2, 1) / dgamma(x_bad, 1, 2)
plot(w_bad)
```

`weighted_mean(x_bad, w_bad)`

`## [1] 2.313655`

Now our importance sampling estimate is cleary off, and the plot of
the importance weights show what is going on: There is one huge weight
and couple others which will dominate our mean estimator. We can use
`weight_plot`

function to further illustrate the differences
between our two approaches. First let’s check the results from our good
IS case:

`weight_plot(w)`

The function `weight_plot`

draws four figures. The first one
shows 100 largest weights, again illustrating that there are no single
draws dominating the sample. On the second figure, the weights are
sorted and drawn in increasing order. This figure has mixed uses,
sometimes it shows interesting phenomena better than for example a
histogram (which often needs some tweaking), whereas in other cases its
information value is quite small. Here we see that the weights are
distributed quite uniformly. The third figure is often the most
important; it shows the variance of the weights computed from successive
samples $w_1, , w_t, \(t = 1, \ldots,
n\). If the importance weights have finite variance, this figure
should show clear converge towards finite values as is the case here.
The final figure shows the effective sample sizes again in a form of
running line. The effective sample size (ESS) is defined as \[
ESS_n = \frac{(\sum_{i=1}^n w_i)^2}{\sum_{i=1}^n w^2_i} =
\frac{1}{\sum_{i=1}^n \bar w^2_i},
\] where \(\bar w_i = w_i /
\sum_{i=1}^n w_i\). The interpretation of ESS is that our
importance sampling corresponds to the case with direct simulation from
the target distribution \(p(x)\) using
ESS samples (i.e. larger the ESS is better). Other measures of
efficiency are also available in literature, and they might be added to
`diagis`

in future.

Now let’s use `weight_plot`

function for our bad IS
run:

`weight_plot(w_bad)`

Oops! The figures are pretty self-explanatory, the running statistics look fine at first, but when the we take that one huge weight into account, the variance explodes and at the same time ESS drops significantly as that one \((x_i, w_i)\) pair dominates the sums in ESS.

The `diagis`

package also contains function
`weighted_quantile`

for computing weighted quantiles (or
perhaps more precisely percentiles, but we call them quantiles as ìn
`quantile`

method in base R). As for non-weighted quantiles,
there are several ways to compute weighted quantiles, and the one used
in `diagis`

is based on the following linear interpolation of
the weighted empirical CDF (in nonweighted case, this corresponds to
type 4 quantile in `quantile`

). Let \(x_1,\ldots,x_n\) be an ordered (increasing)
vector, \(w_1,\ldots,w_n\) the
corresponding weights with \(\sum_{i=1}^n
w_i=1\), and \(p\) the target
probability for which we want to find the corresponding quantile \(q\). Now let \(w'_j=\sum_{i=1}^jw_i\) be the
cumulative sum of weights from \(1\) to
\(j\), and find index \(k\) so that \(w_{k-1} < p\) and \(w_k \geq p\). Then

\[
q = x_{k-1} + (x_k - x_{k-1})\frac{p - w_{k-1}}{w_{k} - w_{k-1}}.
\] In case where \(x\) can
contain several identical values, these elements should be combined (and
corresponding weights added together), which is done automatically in
`weighted_quantile`

.

Auguie, Baptiste. 2016. *gridExtra: Miscellaneous Functions for
"Grid" Graphics*. https://CRAN.R-project.org/package=gridExtra.

Eddelbuettel, Dirk. 2013. *Seamless R and
C++ Integration with Rcpp*. New York:
Springer.

Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp:
Seamless R and C++ Integration.”
*Journal of Statistical Software* 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo:
Accelerating r with High-Performance c++ Linear Algebra.”
*Computational Statistics and Data Analysis* 71: 1054–63. https://dx.doi.org/10.1016/j.csda.2013.02.005.

Helske, Jouni, and Matti Vihola. 2021. “Bssm: Bayesian Inference
of Non-Linear and Non-Gaussian State Space Models in
R.” *R Journal (to Appear)*. https://arxiv.org/abs/2101.08492.

Owen, Art B. 2013. *Monte Carlo Theory, Methods and Examples*. http://statweb.stanford.edu/~owen/mc/.

Vihola, Matti, Jouni Helske, and Jordan Franks. 2020. “Importance
Sampling Type Estimators Based on Approximate Marginal
MCMC.” *Scandinavian Journal of Statistics*.
https://doi.org/10.1111/sjos.12492.

Wickham, Hadley. 2009. *Ggplot2: Elegant Graphics for Data
Analysis*. Springer-Verlag New York. https://ggplot2.tidyverse.org.