LECTURE 13: random effects

class: center, middle, inverse, title-slide

.title[
# LECTURE 13: random effects
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2023
]

---

class: inverse

# outline

1) Independence assumption

2) Random effects vs. fixed effects

3) Random effects in linear models

---
# independence assumption

#### As we have seen several times, linear models assume that residuals are independent

#### Non-independence often stems from unmodeled "group" structure in the data

---
# example

We are interested in how blood mercury levels influence the weight of Wood Stork chicks. Wood Storks nest in large colonies, so we randomly select 5 colonies for sampling. Within each colony, we randomly select 20 chicks from different nests and then measure their mass at hatching.

.pull-left[
<img src="https://upload.wikimedia.org/wikipedia/commons/6/6e/Mycteria_americana_73472346.jpg" style="display: block; margin: auto;" />
]

.pull-right[
.vsmall-code[

```r
data("storkdata")
head(storkdata)
```

```
##   Colony Mercury Weight
## 1     C1  7.7476  48.60
## 2     C1 23.7235  46.75
## 3     C1 16.4074  51.31
## 4     C2  2.6362  68.25
## 5     C2  0.4626  73.36
## 6     C2 16.0777  65.37
```
]
]

---
# example

.pull-left[
<img src="https://upload.wikimedia.org/wikipedia/commons/6/6e/Mycteria_americana_73472346.jpg" style="display: block; margin: auto;" />
]

.pull-right[

What is the experimental unit of this study?

What is the observational unit?

Where do we expect non-independence to arise?
]

---
# example

```r
fm1 <- lm(Weight ~ Mercury, data = storkdata)
```

---
# example

```r
fm2 <- lm(Weight ~ Mercury + Colony, 
 data = storkdata)
```

---
# example

**Question**: Where does sampling error come from in this model?

**Answer**: At least two places:

- random selection of chicks within colonies

- random selection of colonies

Because we are sampling at multiple scales, there is uncertainty associated with each level

- If we re-ran the study, we would get slightly different estimates of each parameter because the exact colonies/chicks would be different

For both statistical and inferential reasons, our model should ideally account for sampling processes (and the associated non-independence) at each level

---
class:inverse, middle, center

# random effects

---
# what are random effects

--
- Fixed effects are constant across observational units, random effects vary across units

--
- Fixed effects are used when you are interested in the specific factor levels, random effects are used when you are interested in the underlying population

--
- When factors levels are exhaustive, they are fixed. When they are a sample of the possible levels, they are random

--
- Random effects are the realized values of a random variable

--
- Fixed effects are estimated by maximum likelihood, random effects are estimated with shrinkage

???

Definitions from Gelman & Hill (2007) pg. 245

---
# what are random effects

Previously, when we included factors in the model, we treated them as *fixed effects*. We chose each level for a reason and if we repeated the experiment, we would use the same levels, either because those are the specific levels we are interested in or because those are all of the possible levels

- Low, medium, high

- Brand A, Brand B, Brand C

- Species 1, species 2, species 3

A random effects model is appropriate when the treatment levels in the experiment can be considered a sample from a larger population of interest

- In the previous example, we are interested in all colonies, not just the ones included in the study

---
# fixed vs. random effects

Fixed-effect formulation:

`$$\large y_{ij} = \mu + \alpha_j + \beta x_i + \epsilon_{ij}$$`
`$$\large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

- `$y_{ij}$` = weight of chick `$i$` in colony `$j$`; `$x_i$` = mercury level of chick `$i$`

---
# fixed vs. random effects

Random-effect formulation:

`$$\large y_{ij} = \mu + \alpha_j + \beta x_i + \epsilon_{ij}$$`
`$$\large \alpha_j \sim normal(0, \sigma^2_C)$$`

`$$\large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

- `$y_{ij}$` = weight of chick `$i$` in colony `$j$`; `$x_i$` = mercury level of chick `$i$`

---
# fixed vs. random effects

Random-effect formulation:

`$$\large y_{ij} = \mu + \alpha_j + \beta x_i + \epsilon_{ij}$$`
`$$\large \alpha_j \sim normal(0, \sigma^2_C)$$`

`$$\large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

- `$y_{ij}$` = weight of chick `$i$` in colony `$j$`; `$x_i$` = mercury level of chick `$i$`

---
# fixed vs. random effects

Random-effect formulation:

`$$\large y_{ij} = \mu + \alpha_{[j]} + \beta x_i + \epsilon_{ij}$$`
`$$\large \alpha_{[j]} \sim normal(0, \sigma^2_C)$$`

`$$\large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

- `$y_{ij}$` = weight of chick `$i$` in colony `$j$`; `$x_i$` = mercury level of chick `$i$`

---
# hypotheses

#### In a random-effects model, our interest is in assessing how much variation there is among all the effects in the population, not just the ones in our sample

#### As a result, our hypotheses must be written differently:

#### Fixed-effects model

`$$\large H_0 : \alpha_1 = \alpha_2 = ... = \alpha_a = 0$$`

`$$\large H_a : At\;least\;one\;inequality$$`

#### Random-effects model

`$$\large H_0 : \sigma^2_C = 0$$`

`$$\large H_a : \sigma^2_C > 0$$`

---
# random-intercept models in `r`

.pull-left[
.vsmall-code[

```r
library(lme4)
data("storkdata")
m1 <- lmer(Weight ~ Mercury + (1|Colony), data = storkdata)
summary(m1)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Weight ~ Mercury + (1 | Colony)
##    Data: storkdata
## 
## REML criterion at convergence: 316.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1711 -0.4972 -0.0025  0.6121  2.4314 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Colony   (Intercept) 51.58    7.18    
##  Residual              4.71    2.17    
## Number of obs: 67, groups:  Colony, 5
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  61.9644     3.2669   18.97
## Mercury      -0.3261     0.0355   -9.19
## 
## Correlation of Fixed Effects:
##         (Intr)
## Mercury -0.149
```
]
]

.pull-right[
.vsmall-code[

```r
ranef(m1)
```

```
## $Colony
##    (Intercept)
## C1     -7.6440
## C2      9.1814
## C3     -6.0028
## C4      4.9890
## C5     -0.5236
## 
## with conditional variances for "Colony"
```
]

]

---
# random-slope models in `r`

It's also possible to treat the slopes as random effects

.pull-left[
.vsmall-code[

```r
m2 <- lmer(Weight ~ (Mercury|Colony), data = storkdata)
summary(m2)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Weight ~ (Mercury | Colony)
##    Data: storkdata
## 
## REML criterion at convergence: 324.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3062 -0.5052 -0.0609  0.6958  2.4559 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  Colony   (Intercept) 172.82   13.146        
##           Mercury       0.10    0.316   -0.88
##  Residual               4.66    2.159        
## Number of obs: 67, groups:  Colony, 5
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    51.00       2.95    17.3
```
]
]

.pull-right[
.vsmall-code[

```r
ranef(m2)
```

```
## $Colony
##    (Intercept)  Mercury
## C1      -1.440 -0.03554
## C2      20.816 -0.39147
## C3       5.684 -0.37504
## C4      15.615 -0.30312
## C5       9.598 -0.26529
## 
## with conditional variances for "Colony"
```
]

]

---
# mixed-effect models in `r`

It's also possible to combine fixed and random effects in the same model (usually referred to as "mixed effect" models or "mixed" models)

.vsmall-code[

```r
data("mothdata")
m3 <- lmer(Larvae ~ Treatment + (1|Region), data = mothdata)
summary(m3)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Larvae ~ Treatment + (1 | Region)
##    Data: mothdata
## 
## REML criterion at convergence: 62.3
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.109 -0.322 -0.112  0.210  1.390 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Region   (Intercept) 41.5     6.44    
##  Residual             19.1     4.37    
## Number of obs: 12, groups:  Region, 4
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)         20.50       3.89    5.27
## TreatmentBt         -8.75       3.09   -2.83
## TreatmentDimilin    -9.50       3.09   -3.07
## 
## Correlation of Fixed Effects:
##             (Intr) TrtmnB
## TreatmentBt -0.397       
## TretmntDmln -0.397  0.500
```
]

---
# why use random effects?

- Random effects only apply to categorical predictors, not continuous predictors

+ Generally need many treatment levels (5-10) to estimate variance parameter

- The choice of random vs fixed effects usually depends on the scope of inference

+ Are the factor levels exhaustive or are they samples of a larger population? If you repeated the experiment, could you end up with different groups?
  
--

- Random effects account for non-independence caused by group structure *and* provide insights into the larger population

- When sample sizes are sufficiently large and balanced, the regression parameters will be essentially the same in fixed vs random-effect models

+ When sample sizes are small/unbalanced, estimates for groups with less data will be "pulled" towards the mean (this is a good thing!)
    
    + The degree of change is a function of how far the parameters are from the mean and how much information is in the data

---
# looking ahead

### **Next time**: Nested designs

### **Reading**: [Fieberg chp. 18.4-18.6](https://statistics4ecologists-v1.netlify.app/mixed#what-are-random-effects-mixed-effect-models-and-when-should-they-be-used)