class: center, middle, inverse, title-slide

.title[
# LECTURE 8: multiple comparisons
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
###
Fall 2023
]

---

# motivation

<br/>

> Following a significant *F*-test, the next step is to determine which means differ

<br/>

--

> If all group means are to be compared, then we should correct for multiple testing

<br/>

--

> That is, conducting many tests increases the probability of finding one that is significant, even if it is not

---

<img src="https://imgs.xkcd.com/comics/significant.png" height="600px" style="display: block; margin: auto;" />

---
class: inverse

# outline

#### 1) Pairwise *t*-tests

<br/>

--

#### 2) Tukey's HSD test

<br/>

--

#### 3) *a priori* contrasts

<br/>

---

# motivating example

#### Aphids are a major crop pest and farmers are interested in which brands of pesticide have the biggest influence on crop yield

#### The data:

```r
data("pesticidedata")
ggplot(pesticidedata, aes(x = Brand, y = Yield)) +
  geom_boxplot()
```

<img src="08_mult_comp_files/figure-html/unnamed-chunk-2-1.png" width="252" style="display: block; margin: auto;" />

---

# motivating example

.small-code[

```r
fit.lm <- lm(Yield ~ Brand, data = pesticidedata)
summary(fit.lm)
```

```
## 
## Call:
## lm(formula = Yield ~ Brand, data = pesticidedata)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.932 -1.499  0.335  1.406  5.459 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   36.357      0.494   73.66  < 2e-16 ***
## BrandA         3.142      0.698    4.50  1.9e-05 ***
## BrandB         3.940      0.698    5.65  1.7e-07 ***
## BrandC         5.616      0.698    8.05  2.3e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.47 on 96 degrees of freedom
## Multiple R-squared:  0.416,	Adjusted R-squared:  0.397 
## F-statistic: 22.7 on 3 and 96 DF,  p-value: 3.29e-11
```
]

---
class: center, inverse, middle

## PAIRWISE *t*-TEST

---

## PAIRWISE *t*-TEST

<br/>

#### One can always fall back on pairwise, two-sample *t*-tests (or linear models)

```r
pairwise.t.test(pesticidedata$Yield, pesticidedata$Brand, p.adjust.method = "none")
```

```
## 
## 	Pairwise comparisons using t tests with pooled SD 
## 
## data:  pesticidedata$Yield and pesticidedata$Brand 
## 
##   Control A     B   
## A 2e-05   -     -   
## B 2e-07   0.26  -   
## C 2e-12   6e-04 0.02
## 
## P value adjustment method: none
```

---

## PAIRWISE *t*-TEST

One can always fall back on pairwise, two-sample *t*-tests (or linear models)... BUT

--

Running multiple tests increases the probability of finding one that is significant, even if it is not (remember the jellybean!)
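To see how quickly this problem grows, here is a minimal sketch of the family-wise error rate for `\(m\)` independent tests, each conducted at `\(\alpha = 0.05\)`:

```r
## Probability of at least one false positive among m independent tests,
## when every null hypothesis is actually true
alpha <- 0.05
m <- c(1, 5, 10, 20)
1 - (1 - alpha)^m   # ~0.05, 0.23, 0.40, 0.64
```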
<img src="08_mult_comp_files/figure-html/unnamed-chunk-5-1.png" width="360" style="display: block; margin: auto;" />

--

One common way to deal with multiple comparisons is to do pairwise *t*-tests but *adjust* the *p*-values of each test

- Essentially, the adjustments make it harder to reject the null hypothesis, thereby making the tests more conservative

---

# bonferroni adjustment

#### One of the most common adjustments is the *Bonferroni* correction

- Multiply each *p*-value by the number of tests

- If this results in *p* > 1, set *p* to 1

--

.small-code[

```r
pairwise.t.test(pesticidedata$Yield, pesticidedata$Brand, p.adjust.method = "bonferroni")
```

```
## 
## 	Pairwise comparisons using t tests with pooled SD 
## 
## data:  pesticidedata$Yield and pesticidedata$Brand 
## 
##   Control A     B    
## A 1e-04   -     -    
## B 1e-06   1.000 -    
## C 1e-11   0.004 0.110
## 
## P value adjustment method: bonferroni
```
]

---
class: inverse

# reflection

#### At first, it can be confusing when a *p*-value is significant in one case (e.g., unadjusted) but not significant in another (e.g., Bonferroni)

- (especially if you're a grad student working on your thesis and you really *want* significant results)

--

#### If you find yourself in this situation, take a step back and think about a few things:

--

- What does the *p*-value measure? What is the risk when rejecting the null hypothesis?

--

- When is falsely rejecting the null hypothesis (Type I error) most likely?

--

- What is so special about `\(\alpha = 0.05\)`?

---
class: center, middle, inverse

# tukey's hsd test

---

# john tukey

<br/>

<img src="fig/John_Tukey.jpeg" width="246px" height="300px" style="display: block; margin: auto;" />

<br/>

> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data

---

# tukey's hsd test

<br/>

#### According to Tukey's Honestly Significant Difference test, two means (`\(\bar{y}_i\)` and `\(\bar{y}_j\)`) are different if:

`$$\large |\bar{y}_i - \bar{y}_j | \geq q_{1- \alpha,a,a(n-1)}\sqrt{\frac{MSW}{n}}$$`

<br/>

where `\(q\)` comes from the "Studentized Range Distribution" (see `qtukey` in `R`) and MSW is the mean square within (error) from the ANOVA table

---

# example

```r
fit.aov <- aov(Yield ~ Brand, data = pesticidedata)
TukeyHSD(fit.aov)
```

```
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Yield ~ Brand, data = pesticidedata)
## 
## $Brand
##             diff     lwr   upr  p adj
## A-Control 3.1424  1.3174 4.967 0.0001
## B-Control 3.9403  2.1153 5.765 0.0000
## C-Control 5.6159  3.7909 7.441 0.0000
## B-A       0.7979 -1.0271 2.623 0.6639
## C-A       2.4735  0.6485 4.299 0.0034
## C-B       1.6756 -0.1495 3.501 0.0838
```
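As a check on the formula above, the critical difference can also be computed by hand with `qtukey`. A minimal sketch, assuming the balanced `pesticidedata` design above with `\(a = 4\)` groups and `\(n = 25\)` replicates per group (96 error df):

```r
## Hand computation of Tukey's critical difference (a sketch)
a   <- 4                                 # number of groups
n   <- 25                                # replicates per group; a(n-1) = 96
MSW <- summary(fit.lm)$sigma^2           # mean square within (~2.47^2)
q   <- qtukey(0.95, nmeans = a, df = a * (n - 1))
q * sqrt(MSW / n)                        # ~1.83; matches the half-width of
                                         # the TukeyHSD intervals above
```

Any pair of means that differs by more than this amount is declared significantly different at the family-wise `\(\alpha = 0.05\)` level.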
---

# *a posteriori* comparisons

--

- Bonferroni and Tukey's HSD are the most commonly used types of *a posteriori* comparisons

  + comparing all possible pairwise combinations only *after* a significant *F*-test

--

- There are many other types of *a posteriori* comparison, e.g. Fisher's LSD (see `?p.adjust` for even more)

--

- *A posteriori* comparisons decrease the risk of Type I error (but increase the risk of Type II error)

--

- By forcing the researcher to correct for all possible combinations, *a posteriori* comparisons are often overly conservative

--

- A better approach in many situations is to use *a priori* **contrasts**

---
class: center, inverse, middle

# *a priori* contrasts

---

# mussel size

<table class="table table-striped table-hover table-condensed" style="margin-left: auto; margin-right: auto;">
 <thead>
<tr><th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Watershed</div></th></tr>
  <tr>
   <th style="text-align:center;"> Twelvemile </th>
   <th style="text-align:center;"> Chattooga </th>
   <th style="text-align:center;"> Keowee </th>
   <th style="text-align:center;"> Coneross </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 16 </td>
   <td style="text-align:center;"> 18 </td>
   <td style="text-align:center;"> 28 </td>
   <td style="text-align:center;"> 14 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> 25 </td>
   <td style="text-align:center;"> 22 </td>
   <td style="text-align:center;"> 20 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 12 </td>
   <td style="text-align:center;"> 22 </td>
   <td style="text-align:center;"> 24 </td>
   <td style="text-align:center;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 17 </td>
   <td style="text-align:center;"> 16 </td>
   <td style="text-align:center;"> 20 </td>
   <td style="text-align:center;"> 17 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 13 </td>
   <td style="text-align:center;"> 26 </td>
   <td style="text-align:center;"> 27 </td>
   <td style="text-align:center;"> 15 </td>
  </tr>
</tbody>
</table>

--

- Chattooga and Keowee are more forested

- Coneross and Twelvemile are more agricultural

---

# anova results

.small-code[

```r
fit.mussel <- lm(Length ~ Watershed, data = musseldata)
summary(fit.mussel)
```

```
## 
## Call:
## lm(formula = Length ~ Watershed, data = musseldata)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -5.4   -2.5   -0.2    3.0    4.6 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            21.40       1.64   13.02  6.2e-10 ***
## WatershedConeross      -6.00       2.32   -2.58   0.0201 *  
## WatershedKeowee         2.80       2.32    1.20   0.2458    
## WatershedTwelvemile    -8.20       2.32   -3.53   0.0028 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.67 on 16 degrees of freedom
## Multiple R-squared:  0.645,	Adjusted R-squared:  0.579 
## F-statistic: 9.7 on 3 and 16 DF,  p-value: 0.000692
```
]

---

# pairwise comparisons

.small-code[

```r
pairwise.t.test(musseldata$Length, musseldata$Watershed, p.adjust.method = "bonferroni")
```

```
## 
## 	Pairwise comparisons using t tests with pooled SD 
## 
## data:  musseldata$Length and musseldata$Watershed 
## 
##            Chattooga Coneross Keowee
## Coneross   0.120     -        -     
## Keowee     1.000     0.010    -     
## Twelvemile 0.017     1.000    0.001 
## 
## P value adjustment method: bonferroni
```
]

---

# hypotheses

#### Instead of comparing all possible combinations, what if we are only interested in some specific comparisons?

1) Do the agricultural watersheds differ from one another?

2) Do the forested watersheds differ from one another?

--

#### Also, what if we are interested in *combinations* of treatments?

3) Do mussels from forested watersheds differ from agricultural watersheds?
--

#### What are the advantages of focusing only on these specific questions?

---

# hypotheses as contrasts

#### Hypotheses

<table class="table table-condensed" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">  </th>
   <th style="text-align:left;"> Comparisons </th>
   <th style="text-align:center;"> H0 to test </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> Forested vs agricultural </td>
   <td style="text-align:center;"> \(\frac{\mu_T + \mu_{Co}}{2} - \frac{\mu_{Ch} + \mu_K}{2}=0\) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> Twelvemile vs Coneross </td>
   <td style="text-align:center;"> \(\mu_{T} - \mu_{Co} = 0\) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> Chattooga vs Keowee </td>
   <td style="text-align:center;"> \(\mu_{Ch} - \mu_{K} = 0\) </td>
  </tr>
</tbody>
</table>

--

#### Linear combinations

<table class="table table-condensed" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:left;"> \((1/2)\mu_T - (1/2)\mu_{Ch} - (1/2)\mu_{K} + (1/2)\mu_{Co}\) </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:left;"> \((1)\mu_T + (0)\mu_{Ch} + (0)\mu_{K} - (1)\mu_{Co}\) </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:left;"> \((0)\mu_T + (1)\mu_{Ch} - (1)\mu_K + (0)\mu_{Co}\) </td>
  </tr>
</tbody>
</table>

---

# are these contrasts orthogonal?

A set of linear combinations is called a set of orthogonal contrasts if the following conditions hold for all pairs of linear combinations: <br/>

--

Given

`$$\Large L_1 = a_1\mu_1 + a_2\mu_2 + ... + a_a\mu_a$$`

--

and

`$$\Large L_2 = b_1\mu_1 + b_2\mu_2 +...+ b_a\mu_a$$`

--

then `\(L_1\)` and `\(L_2\)` are orthogonal if:

`$$\large \sum_i a_i=0;\;\; \sum_i b_i=0; \;\;and \; \sum_i a_ib_i=0$$`

---

# back to the mussel data

Returning to the question: "Does mussel size differ among forested and agricultural watersheds?":

`$$\large H_{0_1} = \frac{\mu_T + \mu_{Co}}{2} - \frac{\mu_{Ch} + \mu_{K}}{2} = 0$$`

--

Multiplying through by 2 gives:

`$$\large H_{0_1} = (\mu_{T} + \mu_{Co}) - (\mu_{Ch} + \mu_K) = 0$$`

--

Which can be written as:

`$$\large L_1 = (1)\mu_T + (-1)\mu_{Ch} + (-1)\mu_K + (1)\mu_{Co}$$`

where the coefficients are `\(a_1 = 1\)`, `\(a_2 = -1\)`, `\(a_3 = -1\)`, and `\(a_4 = 1\)`

--

> Note that it's easier to work with coefficients that are integers rather than fractions
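The orthogonality checks worked through on the next slide can also be done numerically. A minimal sketch, where the matrix `L` is a hypothetical helper with rows ordered Twelvemile, Chattooga, Keowee, Coneross (matching the coefficients above):

```r
## Contrast coefficients, one column per contrast; rows ordered
## Twelvemile, Chattooga, Keowee, Coneross
L <- cbind(L1 = c(1, -1, -1,  1),   # agricultural vs forested
           L2 = c(1,  0,  0, -1),   # Twelvemile vs Coneross
           L3 = c(0,  1, -1,  0))   # Chattooga vs Keowee

colSums(L)     # each contrast sums to 0
crossprod(L)   # off-diagonal zeros => every pair is orthogonal
```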
---

# are the contrasts orthogonal?

--

#### Does each set of coefficients sum to 0?

- `\(\large L_1\)`: `\(\large \sum_i a = 1 - 1 - 1 + 1 = 0\)`

- `\(\large L_2\)`: `\(\large \sum_i a = 1 + 0 + 0 - 1 = 0\)`

- `\(\large L_3\)`: `\(\large \sum_i a = 0 + 1 - 1 + 0 = 0\)`

--

#### Do the products of pairs of coefficients sum to 0?

- For `\(\large L_1\)` and `\(\large L_2\)`: `\(\large (1)(1) + (-1)(0) + (-1)(0) + (1)(-1) = 0\)`

- For `\(\large L_1\)` and `\(\large L_3\)`: `\(\large (1)(0) + (-1)(1) + (-1)(-1) + (1)(0) = 0\)`

- For `\(\large L_2\)` and `\(\large L_3\)`: `\(\large (1)(0) + (0)(1) + (0)(-1) + (-1)(0) = 0\)`

---

# testing the null hypotheses

To obtain the sums of squares for each contrast, we use the general formula:

`$$\large SS_L = \frac{(\sum_i a_i T_i)^2}{n \sum_i a_i^2}$$`

where `\(\large T_i\)` is the sum of observations in group `\(\large i\)`, and `\(\large a_i\)` is the corresponding coefficient for group `\(\large i\)`

--

For the first hypothesis we have:

`$$\large SS_{L_1} = \frac{(66 - 107 - 121 + 77)^2}{5(1 + 1 + 1 + 1)} = 361.2$$`

---

# sums of squares for each contrast

For the second hypothesis we have: `\(\large L_2 = \mu_T + 0 + 0 - \mu_{Co}\)` with coefficients `\(\large a_i = (1, 0, 0, -1)\)`

`$$\large SS_{L_2} = \frac{(66 + 0 + 0 - 77)^2}{5(1 + 0 + 0 + 1)} = 12.1$$`

--

For the third hypothesis we have: `\(\large L_3 = 0 + \mu_{Ch} - \mu_K + 0\)` with coefficients `\(\large a_i = (0, 1, -1, 0)\)`

`$$\large SS_{L_3} = \frac{(0 + 107 - 121 + 0)^2}{5(0 + 1 + 1 + 0)} = 19.6$$`

---

# expanded anova table

For each contrast, each SS is divided by 1 d.f., and then divided by MSW

<table class="table table-condensed" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> Source </th>
   <th style="text-align:center;"> df </th>
   <th style="text-align:center;"> SS </th>
   <th style="text-align:center;"> MS </th>
   <th style="text-align:center;"> F </th>
   <th style="text-align:center;"> p </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> Among watersheds </td>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 393 </td>
   <td style="text-align:center;"> 131.0 </td>
   <td style="text-align:center;"> 9.70 </td>
   <td style="text-align:center;"> 7e-4 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> F vs Ag </td>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 361 </td>
   <td style="text-align:center;"> 361.0 </td>
   <td style="text-align:center;"> 26.76 </td>
   <td style="text-align:center;"> 9e-5 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> T vs Co </td>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 12 </td>
   <td style="text-align:center;"> 12.0 </td>
   <td style="text-align:center;"> 0.90 </td>
   <td style="text-align:center;"> 0.36 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> Ch vs K </td>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 20 </td>
   <td style="text-align:center;"> 20.0 </td>
   <td style="text-align:center;"> 1.45 </td>
   <td style="text-align:center;"> 0.25 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> Within </td>
   <td style="text-align:center;"> 16 </td>
   <td style="text-align:center;"> 216 </td>
   <td style="text-align:center;"> 13.5 </td>
   <td style="text-align:center;">  </td>
   <td style="text-align:center;">  </td>
  </tr>
</tbody>
</table>

--

Notice that the Sums of Squares are partitioned according to the hypotheses we are interested in, unlike in multiple comparison procedures.
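The same partition can be produced directly in `R`. A minimal sketch, where the contrast matrix and the labels `AgVsFor`, `TvsCo`, and `ChVsK` are illustrative assumptions; rows follow the default alphabetical factor levels (Chattooga, Coneross, Keowee, Twelvemile):

```r
## A priori contrasts in R (a sketch); rows ordered
## Chattooga, Coneross, Keowee, Twelvemile
contrasts(musseldata$Watershed) <- cbind(AgVsFor = c(-1,  1, -1,  1),
                                         TvsCo   = c( 0, -1,  0,  1),
                                         ChVsK   = c( 1,  0, -1,  0))
fit.contr <- aov(Length ~ Watershed, data = musseldata)

## Split the among-watershed SS into the three 1-df contrasts
summary(fit.contr,
        split = list(Watershed = list(AgVsFor = 1, TvsCo = 2, ChVsK = 3)))
```

The contrast rows of the resulting ANOVA table should reproduce the sums of squares computed by hand above.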
---

# *a priori* contrasts

- Orthogonal contrasts are more powerful than multiple comparison procedures

--

- They also require more thought and preparation (good things!)

--

- There can be at most `\(a - 1\)` orthogonal contrasts

--

- If the contrasts are not orthogonal, they can't be used to fully partition the Sums of Squares among groups

--

- If the comparisons really represent more than one treatment variable, it will be better to use a factorial design. More on that later

---

# looking ahead

<br/>

### **Next time**: Multiple regression

<br/>

### **Reading**: [Fieberg chp. 3.2-3.5](https://statistics4ecologists-v1.netlify.app/matrixreg#introduction-to-multiple-regression)