lecture 4: introduction to bayesian analysis using nimble

class: center, middle, inverse, title-slide

.title[
# lecture 4: introduction to bayesian analysis using nimble
]
.subtitle[
## FANR 8370
]
.author[
### Spring 2025
]

---

# readings

> Kéry & Schaub 38-40; 58-64

---
class: inverse, middle, center

# from custom mcmc to bugs

---
# the bugs language

#### **B**ayesian **A**nalysis **U**sing **G**ibbs **S**ampling

Language/program invented in the 1990's by epidemiologists in Cambridge

Later became WinBUGS

- First customizable MCMC software

- Revolutionized the use of Bayesian methods in applied statistics

---
# the bugs language

Since the development of WinBUGS, other Bayesian software programs have been developed:

- OpenBugs

- [JAGS](https://sourceforge.net/projects/mcmc-jags/files/)

- [Nimble](https://r-nimble.org/)

- [Stan](https://mc-stan.org/)

For the remainder of the course, we will fit models using Nimble

- Uses BUGS language and R (easy to learn, lots of online documentation)

- Out-performs WinBUGS

- Available for all operating systems

- Highly customizable

---
# the bugs language

#### Last week, we learned how to:

- Define a likelihood and priors

- Write a custom MCMC sampler to produce samples from the joint posterior distribution

Given a statistical model and user-specified prior distributions, Nimble does these steps for you!

- Possible to fit arbitrarily complex models

- "Frees the modeler in you"

???

remember, just because you *can* fit it doesn't mean it will return robust inferences. As we have already discussed, complex models often suffer from lack of identifiability and poor sampling of the joint posterior distribution. But, if you can write out a valid statistical model, Nimble can create a sampler and draw samples from the joint posterior distribution

---
# running nimble in r

### Uses the `nimble` package

- Write model in `R` script using the `nimbleCode()` function

- Send the model to nimble along with initial values and data using `nimbleModel()`

- Build the MCMC automatically, compile the model and run it

- Diagnostics/analysis/visualization conducted in `R`

---
# the bugs language

Very similar to `R` (but slight differences)

- Limited ability to vectorize operations

- Order of code less important (doesn't read top to bottom like `R`)

If you can write your model on the whiteboard, you can write it in Nimble

- Stochastic relationships represented by `~`

- Deterministic relationships represented by `<-`

- Since this is likelihood, uses the `d` part of distributions (e.g. dnorm, dpois, dgamma, etc.)

---
# the bugs language

#### Linear regression model

`$$\LARGE y_i = \alpha + \beta \times x_i + \epsilon_i$$`

`$$\LARGE \epsilon_i \sim normal(0, \sigma)$$`

---
# the bugs language

#### Linear regression model

`$$\LARGE y_i = \underbrace{\alpha + \beta \times x_i}_{Deterministic} + \underbrace{\epsilon_i}_{Stochastic}$$`

???

---
# the bugs language

#### Linear regression model

`$$\underbrace{\LARGE \mu_i = \alpha + \beta \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(\mu_i, \sigma)}_{Stochastic}$$`

Remember that these equations define the *likelihood* of our data given values of `$\alpha$`, `$\beta$`, and `$\sigma$`

---
# the bugs language

#### Linear regression model

To specify a fully Bayesian model, we also need to define the priors:

`$$\LARGE [\alpha] \sim normal(\alpha|0, 10)$$`

`$$\LARGE [\beta] \sim normal(\alpha|0, 10)$$`

`$$\LARGE [\sigma] \sim Gamma(\sigma|0.01, 0.01)$$`

---
# the bugs language

Linear regression model

```r
library(nimble)

seedmod <- nimbleCode({
 ## Priors
 alpha ~ dnorm(0, sd = 10) #specify sd argument to use stand dev rather than precision
 beta ~ dnorm(0, sd = 10)
 sigma ~ dgamma(.001,.001) # residual sd
 
 ## Likelihood
 for(i in 1:N){
 mu[i] <- alpha + beta * x[i]
 y[i] ~ dnorm(mu[i], sd = sigma)
 }
}) 
```

---
# preparing the data

```r
## Read simulated data frame
dat <- readRDS("data/sim_seed_counts.rds")

## Store data for as a named list. 
#Only 'data' if stochastic
nimdat <- list(y = dat$y)

#constants are for KNOWN values not from distributions
nimconsts <- list(x = dat$visits.c, N = nrow(dat))

## Create function that returns random initial values
niminits <- function(){list(alpha = runif(1, 200, 300), 
 beta = runif(1, 25, 75), 
 sigma = runif(1))}

## Store which parameters we want to monitor
params <- c("alpha", "beta", "sigma")
```

---
# run the model (one step method)

```r
 ## Run the model
nimfit <- nimbleMCMC(code = seedmod,
 data = nimdat,
 constants = nimconsts,
 inits = niminits(),
 monitors = params,
 thin = 1,
 niter = 10000,
 nburnin = 2500,
 nchains = 3,
 samplesAsCodaMCMC = TRUE
 )
```

---
# diagnostics
 
.small-code[

```r
str(nimfit)
```

```
## List of 3
##  $ chain1: 'mcmc' num [1:7500, 1:3] 252 250 251 250 251 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:3] "alpha" "beta" "sigma"
##   ..- attr(*, "mcpar")= num [1:3] 1 7500 1
##  $ chain2: 'mcmc' num [1:7500, 1:3] 252 251 251 250 250 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:3] "alpha" "beta" "sigma"
##   ..- attr(*, "mcpar")= num [1:3] 1 7500 1
##  $ chain3: 'mcmc' num [1:7500, 1:3] 250 250 250 251 251 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:3] "alpha" "beta" "sigma"
##   ..- attr(*, "mcpar")= num [1:3] 1 7500 1
##  - attr(*, "class")= chr "mcmc.list"
```
]

---
# diagnostics

```r
library(coda)
plot(nimfit)
```

---
# structure of the output

```r
class(nimfit)
```

```
## [1] "mcmc.list"
```

```r
names(nimfit)
```

```
## [1] "chain1" "chain2" "chain3"
```

---
# structure of the output

.small-code[

```r
summary(nimfit)
```

```
## 
## Iterations = 1:7500
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 7500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean    SD Naive SE Time-series SE
## alpha 250.55 0.578  0.00385        0.00425
## beta   48.94 0.568  0.00379        0.00373
## sigma   7.49 0.412  0.00274        0.00599
## 
## 2. Quantiles for each variable:
## 
##         2.5%    25%    50%    75%  97.5%
## alpha 249.41 250.16 250.55 250.94 251.66
## beta   47.83  48.56  48.94  49.32  50.05
## sigma   6.75   7.19   7.47   7.77   8.34
```
]

---
# saving model output

```r
saveRDS(nimfit, "output/regression_out.rds")
```

Alternatively:

```r
MCMCvis::MCMCdiag(nimfit, 
                  mkdir = "regression",
                  file_name = 'regression_out',
                  dir = 'output',
                  save_obj = TRUE)
```