Lecture 21

class: center, middle, inverse, title-slide

.title[
# Lecture 21
]
.subtitle[
## Bonus Lecture - Multispecies Occupancy
]
.author[
### <br/><br/><br/>Spring 2025
]

---

## Readings

> Guillera-Arroita G, Kéry M, Lahoz-Monfort JJ. Inferring species richness using multispecies occupancy modeling: Estimation performance and interpretation. Ecol Evol. 2019 Feb 5;9(2):780-792. doi: 10.1002/ece3.4821. PMID: 30766668; PMCID: PMC6362448.

> Devarajan, K., Morelli, T.L. and Tenan, S. (2020), Multi-species occupancy models: review, roadmap, and recommendations. Ecography, 43: 1612-1624. https://doi.org/10.1111/ecog.04957

> Tingley MW, Nadeau CP, Sandor ME. Multi-species occupancy models as robust estimators of community richness. Methods Ecol Evol. 2020; 11: 633–642. https://doi.org/10.1111/2041-210X.13378

---
## Multi-Species Occupancy Models

Like single species occupancy models, MSOMs are built on the encounter histories of species across sites in a region during repeated visits

Based on the concept that a community is an ensemble of species occurring at a site and a metacommunity is the collection of such communities

Typically used to estimate species richness at both the community and metacommunity level

---
## Conceptual Framework from Guillera-Arroita et al 2019

---
## Conceptual Framework from Guillera-Arroita et al 2019

---
## Conceptual Framework from Guillera-Arroita et al 2019

---
## Assumptions and their reprocussions

#### Geographic and demographic closure

#### Independence

#### Accurate identification

#### Ecological similarity in species

---
## Assumptions and their reprocussions

#### Geographic and demographic closure

+  Sites represent closed populations; No births, deaths, colonization or extinction during the surveys
  
  
        +   Violations usually inflate species richness 
  
---
## Assumptions and their reprocussions

#### Independence

+ Detection and occupancy probability at site A is independent of detection and occupancy at site B 
  
  + Frequently violated when there is oversampling or clustered sampling 
  
  + Spatial correlations usually not a problem
  
  + Luckily, violations can be tested for statistically! 
  
    
      + Violations lead to overestimate of precision of estimates 
      
  
---
## Assumptions and their reprocussions

#### Accurate identification

+ We assume observers can correctly separate species from one another 
  
  + Easily violated during any auditory surveys (frogs, insects, birds, etc.)

+ Fluctuations in detectability or observer error can significantly impact and bias occupancy estimates
    
    + False positives result in an overestimation of species richness
    
    + Can partially address by modeling false positives if misidentification and 
      detection probabilities are caused by different processes
    
    
---
## Assumptions and their reprocussions

#### Ecological similarity in species

+ Assume that species in the community are similar, resulting in species-specific random effects that are drawn from the same distribution

+ Violating this assumption leads to prediction errors (just straight up wrong answers)
    
    
---
## Basic Mathematical Structure

`$x_i$` that is 1 for real specie and 0 for fake species with probability `$\omega$`:

`$$x_i \sim Bernoulli(\omega)$$`

Occupancy status of each species `$i$` at each site `$j$`:

`$$z_{i,j} \sim Bernoulli(x_i \psi_i)$$`

`$$logit(\psi_{i,j}) \sim Normal(\mu_{\psi}, \sigma^2_{\psi})$$`

`$$y_{i,j} \sim Binomial(z_{i,j}p_i, K)$$`

`$p_i$` is a species-specific random effect:

`$$logit(p_i) \sim Normal(\mu_{p}, \sigma_{p}^2)$$`

---
## Basic Mathematical Structure

Richness at each site:

`$$\Large R_j = \sum_{i=1}^M z_{i,j}$$`

Richness overall:

`$$\Large R = \sum_{i=1}^M x_{i}$$`

**Note: Even with model assumptions perfectly met, estimation of the total number of species can be poor if many species are missed (>15%)**

---
## False Positives

False positives are easier to deal with when we only have one species because we don't have to identify who the false detection 'belongs to'.

Without extra data, you cannot reliably model false positives in a multi-species framework.

Table stolen from:

Miller, D.A., Nichols, J.D., McClintock, B.T., Grant, E.H.C., Bailey, L.L. and Weir, L.A. (2011), Improving occupancy estimation when two types of observational error occur: non-detection and species misidentification. Ecology, 92: 1422-1428. https://doi.org/10.1890/10-1396.1

---
## Multispecies Occupancy in Nimble

``` r
nimbleCode({
   # Superpopulation process
   for (k in 1:M){
     w[k] ~ dbern(omega) # Community membership indicator
   }
   
   # Ecological model for latent occurrence z (process model)
   for (k in 1:M){
    # Priors to describe heterogeneity among species in community
     logit(psi[k]) <- lpsi[k] #present/absent prob
     lpsi[k] ~ dnorm(mu.lpsi, sd = sd.lpsi) #present/absent on normal scale
     lp[k] ~ dnorm(mu.lp, sd = sd.lp) #detection on normal scale
     for (i in 1:nsite) {
       z[i,k] ~ dbern(w[k] * psi[k]) #gotta be real to be present
     }
   }
```

---
## Multispecies Occupancy in Nimble

``` r
   # Observation model for observed detection frequencies
   for (k in 1:M){
     logit(p[k]) <- lp[k]
     for (i in 1:nsite) {
       y[i,k] ~ dbin(z[i,k] * p[k], ntrials[i])
     }
   }
   
   # Hyperpriors to describe full community
   omega ~ dbeta(1, 1) # For data augmentation (probability of community membership)
   mu.lpsi ~ dnorm(0,sd = 10) # Community mean of occupancy (logit)
   mu.lp ~ dnorm(0,sd = 10) # Community mean of detection (logit)
   sd.lpsi ~ dexp(1) # Species heterogeneity in logit(psi)
   sd.lp ~ dexp(1) # Species heterogeneity in logit(p)
```

---
## Multispecies Occupancy in Nimble

``` r
   # Derived quantities
   for (k in 1:M){
     Socc.fs[k] <- sum(z[1:nsite,k]) # Number of occupied sites among the sampled ones
     speciesP[k]<-(Socc.fs[k]>0)*1
   }
   
   Nsmall<-sum(speciesP[1:M]) #total species present
   for (i in 1:nsite) {
     Nsite[i] <- sum(z[i,1:M]) # Number of occurring species at each site
   }
   Ntotal <- sum(w[1:M]) # Total metacommunity size
})
```

---
## Example - Blantant Self Promotion

---
## Example - Southern Appalachian Birds

This study was conducted across an elevation gradient and 10 years. The original study also used abundance. We'll just analyze occupancy in 2018.

---
## Example - Southern Appalachian Birds

The data for this example are located in the WILD8370 package, or you can find them online where the paper is published.

Each column is a species, and each row is a point count location. There are 110 locations, with 45 total species detected. Each site was surveyed 4 times.

``` r
library(WILD8370)
data("Coweeta_birds")
str(Coweeta_birds)
```

```
## List of 2
##  $ obs : num [1:110, 1:4, 1:45] 0 0 0 0 0 0 0 0 0 0 ...
##  $ temp: num [1:110] 1.06 0.933 0.735 0.828 0.922 ...
```

---
## Example - Southern Appalachian Birds

First we need to convert our data into occupancy from abundance.

``` r
Coweeta_birds$obs[Coweeta_birds$obs >0] <- 1
```

Let's look at raw occupancy to get an idea of our data set. First, how many species did we see per site?

``` r
rowSums(apply(Coweeta_birds$obs, c(1, 3), function(x) any(x > 0)))
```

```
##   [1] 12  8 10  4  4 11  5  1  9  6  2  2  2  7 10 11  5  4  6  2  8  5 10  7 10
##  [26] 11  4  1  7 12  9 14 11  2  5 11  9  1  7 11  4  8  7  4  4  7  6  6  5  9
##  [51]  9  9  8  3  5 10  6 10  7  7  6  9  4  9  6  8 11  7  3  6  5  6  4  5 10
##  [76]  9  8  9  7  4  6  6  2  8  7 14 10 11  8  7  5  9  8 10 10  6  4  5  9  4
## [101]  7  7  7  5  5  3  9  5  6  7
```

---
## Example - Southern Appalachian Birds

How many sites did we see each species at? (The 0s are because some of these species were seen in other years, just not 2018).

``` r
colSums(apply(Coweeta_birds$obs, c(1, 3), function(x) any(x > 0)))
```

```
##  [1]  2 57 55 20  5 52 23 15  3 29  3  8  2  3  5  0 25 15  0  0  4 36  2  4  3
## [26] 26 79  1 14 16  1 30  1  0 45 51  0 37 33  4 11  3 32  1  0
```

---
## Example - Southern Appalachian Birds

Now we need to setup our data for the model. Let's add an additional 30 species that we could have missed.

``` r
M <- dim(Coweeta_birds$obs)[3] + 30
nsites <- nrow(Coweeta_birds$obs)
y <- apply(Coweeta_birds$obs, c(1,3), sum)
ynew <- cbind(y, array(0, c(nsites, 30)))
nd <- list(y = ynew)
str(ynew)
```

```
##  num [1:110, 1:75] 0 0 0 0 0 0 0 0 0 0 ...
```

``` r
ynew[1:2, 1:10]
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    0    3    3    0    3    1    1    0    0     0
## [2,]    0    4    0    0    0    0    2    0    0     0
```

---
## Example - Southern Appalachian Birds

We'll use the model we saw before, but let occupancy change with temperature, since temperature and elevation are strongly correlated at this location (top of the mountain is cold, bottom of the mountain is warm).

Note: This is *NOT* a good model.

``` r
nimbleCode({
   # Superpopulation process
   for (k in 1:M){
     w[k] ~ dbern(omega) # Community membership indicator
   }
   
   # Ecological model for latent occurrence z (process model)
   for (k in 1:M){
    # Priors to describe heterogeneity among species in community
     lp[k] ~ dnorm(mu.lp, sd = sd.lp) #detection on normal scale
     for (i in 1:nsite) {
       logit(psi[i,k]) <- mu.lpsi + psi.t[k]*temp[i] #present/absent prob
       z[i,k] ~ dbern(w[k] * psi[i, k]) #gotta be real to be present
     }
   }
```

---
## Example - Southern Appalachian Birds
<style type="text/css">
.smallsmall .remark-code { 
font-size: 45% !important;
}
</style>

.smallsmall[

``` r
birdmod <- nimbleCode({
   # Superpopulation process
   for (k in 1:M){
     w[k] ~ dbern(omega) # Community membership indicator
   }
   
   # Ecological model for latent occurrence z (process model)
   for (k in 1:M){
    # Priors to describe heterogeneity among species in community
     lp[k] ~ dnorm(mu.lp, sd = sd.lp) #detection on normal scale
     psi.t[k] ~ dnorm(0, sd = sd.psi.t)
     for (i in 1:nsite) {
       logit(psi[i,k]) <- mu.lpsi + psi.t[k]*temp[i] #present/absent prob
       z[i,k] ~ dbern(w[k] * psi[i, k]) #gotta be real to be present
     }
  
     logit(p[k]) <- lp[k]
     for (i in 1:nsite) {
       y[i,k] ~ dbin(z[i,k] * p[k], ntrials[i])
     }
   }
   
   # Hyperpriors to describe full community
   omega ~ dbeta(1,1) # For data augmentation (probability of community membership)
   mu.lpsi ~ dnorm(0,sd = 10) # Community mean of occupancy (logit)
   mu.lp ~ dnorm(0,sd = 10) # Community mean of detection (logit)
   sd.lp ~ dexp(1) # Species heterogeneity in logit(p)
   sd.psi.t ~ dexp(1)

for (k in 1:M){
     Socc.fs[k] <- sum(z[1:nsite,k]) # Number of occupied sites 
     speciesP[k] <- (Socc.fs[k]>0)*1
   }
   
   Nsmall<-sum(speciesP[1:M]) #total species present
   for (i in 1:nsite) {
     Nsite[i] <- sum(z[i,1:M]) # Number of occurring species at each site
   }
   Ntotal <- sum(w[1:M]) # Total metacommunity size
})
```
]

---
## Example - Southern Appalachian Birds

Most of the annoyance of this model will be initial values. Temperature is already scaled, woot woot!

``` r
nseen <- 45
nc <- list(nsite = nsites, M = M, temp = Coweeta_birds$temp, ntrials = rep(4, nsites))
ni <- list(w = c(rep(1, nseen), rep(0, 30)), #start only ones we saw as real
           lp = rep(.25, M), #plogis(.25) = 56% detection
           mu.lp = 0.25, 
           sd.lp = 1,
           sd.psi.t = 1,
           mu.lpsi = 0.25, #50% average chance of occupancy
           psi.t = rep(0, M), #start with no effect of temperature
           omega = nseen/M,
           z = cbind(array(1, c(nsites, nseen)), 
                     array(0, c(nsites, M-nseen)))
)
params <- c('Ntotal', 'Nsite', 'Nsmall', 'omega', 'w', 'lp', 
            'mu.lp', 'sd.lp', 'mu.lpsi', 'psi.t', 'sd.psi.t', 'Socc.fs')
```

---
## Example - Southern Appalachian Birds

Check model

``` r
prepnim <- nimbleModel(code = birdmod, constants = nc,
                           data = nd, inits = ni, calculate = T)
prepnim$initializeInfo()
prepnim$calculate()
```

```
## [1] -18131
```

---
## Example - Southern Appalachian Birds

Run the model

``` r
birds.out <- nimbleMCMC(code = birdmod,
                     data = nd,
                     constants = nc,
                     inits = ni,
                     monitors = params,
                     thin = 1,
                     niter = 10000,
                     nburnin = 4000,
                     nchains = 3,
                     samplesAsCodaMCMC = TRUE
                      )
```

---
## Example - Southern Appalachian Birds

Let's check convergence

``` r
MCMCvis::MCMCtrace(birds.out, params = c('Nsmall', 'omega', 'Ntotal'), pdf = F, Rhat = T, n.eff = T)
```

---
## Example - Southern Appalachian Birds

``` r
ps <- MCMCvis::MCMCsummary(birds.out, params = 'lp')
gg_p <- data.frame(median = plogis(ps$`50%`),
                   LCI = plogis(ps$`2.5%`),
                   UCI = plogis(ps$`97.5%`),
                   species = 1:M)
```

---
## Example - Southern Appalachian Birds

``` r
psis <- MCMCvis::MCMCsummary(birds.out, params = c('mu.lpsi', 'psi.t'))
tseq <- seq(-2, 2, by = .1)
mean_psi <- array(NA, c(M, length(tseq)))
for(k in 1:M){
  mean_psi[k,] <- plogis(psis$`50%`[1] + tseq*psis$`50%`[k+1])
}
```
<img src="21_MultiOcc_files/figure-html/unnamed-chunk-26-1.png" width="576" style="display: block; margin: auto;" />

---
## Conclusions

+ Multispecies occupancy has the potential to be very powerful but can also easily go wrong

+ Not very useful if you have poor detection or rare species 
  
  
  + Need to consider how large `$M$` can realistically be 
  
  
  + Dr. Gaya doesn't like these models, so these slides are biased