LECTURE 6: multiple comparison procedures

class: center, middle, inverse, title-slide

.title[
# LECTURE 6: multiple comparison procedures
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### <br/><br/><br/>Fall 2022
]

---

# motivation

<br/>

> Following a significant *F*-test, the next step is to determine which means differ

<br/>

--
> If all group means are to be compared, then we should correct for multiple testing

<br/>

> That is, conducting many tests increases the probability that at least one is significant by chance alone, even when no true difference exists
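
The inflation can be made concrete with a small sketch. It assumes the tests are independent and all use the same significance level `alpha`; both are assumptions of this illustration, and the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Three groups give 3 pairwise comparisons; six groups give 15.
for n_tests in (1, 3, 15):
    print(n_tests, round(familywise_error_rate(0.05, n_tests), 3))
```

Under these assumptions, three comparisons at `alpha = 0.05` already push the chance of at least one false positive to about 0.14.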

---
class: inverse

# outline

<br/>
#### 1) Fisher's LSD

<br/>  
--

#### 2) Pairwise *t*-tests

<br/> 
--

#### 3) Tukey's HSD test

<br/> 
--

#### 4) Example

---
# ronald fisher

<br/>

> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of

---
# fisher's lsd test

> Warning: This test makes no correction for multiple testing

#### The least significant difference test is based on the two-sample *t*-statistic, where SEDM is the standard error of the difference between means

`$$\large t = \frac{\bar{y}_1 - \bar{y}_2}{SEDM}$$`
--

#### According to Fisher's LSD test, two means (`$\bar{y}_i$` and `$\bar{y}_j$`) are different if:

`$$\large |\bar{y}_i - \bar{y}_j| \geq t_{1-\alpha/2,a(n-1)}\sqrt{\frac{2MSW}{n}}$$`

<br/>

where MSW (aka MSE) comes from the ANOVA table, *a* is the number of groups, and *n* is the number of replicates per group
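
The threshold follows from rearranging the *t*-statistic. As a sketch, assuming equal sample sizes in every group and the pooled variance estimate MSW (so that SEDM equals `$\sqrt{2MSW/n}$`), declaring a difference when `$|t|$` reaches the critical value gives:

`$$\large |t| = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{2MSW/n}} \geq t_{1-\alpha/2,a(n-1)} \iff |\bar{y}_i - \bar{y}_j| \geq t_{1-\alpha/2,a(n-1)}\sqrt{\frac{2MSW}{n}}$$`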

---
class: center, inverse, middle

## PAIRWISE *t*-TEST

---
## PAIRWISE *t*-TEST

<br/>

### You can always fall back on pairwise, two-sample *t*-tests, but you should adjust the *p*-values

<br/>

--
### Bonferroni adjustment

- Multiply each *p*-value by the number of tests

- If this results in *p* > 1, set *p* to 1
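
The two steps above can be sketched as a short function. The name `bonferroni_adjust` and the raw *p*-values are hypothetical and are not taken from the lecture data.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from three pairwise t-tests
raw_p = [0.004, 0.03, 0.5]
adjusted_p = bonferroni_adjust(raw_p)
print(adjusted_p)
```

Each adjusted value is then compared with the usual significance level, so the third test here (0.5 × 3 > 1) is simply reported as *p* = 1.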

---
class: center, middle, inverse

# tukey's hsd test

---
# john tukey

<br/>

<br/>

> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data

---
# tukey's hsd test

<br/>

#### According to Tukey's Honestly Significant Difference test, two means (`$\bar{y}_i$` and `$\bar{y}_j$`) are different if:

`$$\large |\bar{y}_i - \bar{y}_j | \geq  q_{1- \alpha,a,a(n-1)}\sqrt{\frac{MSW}{n}}$$`

<br/>

where `$q$` comes from the studentized range distribution (see `qtukey` in `R`) and MSW comes from the ANOVA table

---
class: center, middle, inverse

# example

---
# example

**Question:** Is there a difference in Canada Warbler abundance across elevations?

<br/>

.pull-left[

]

.pull-right[
<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th>
</tr>
  <tr>
   <th style="text-align:center;"> Replicate </th>
   <th style="text-align:center;"> Low </th>
   <th style="text-align:center;"> Medium </th>
   <th style="text-align:center;"> High </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 0 </td>
   <td style="text-align:center;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 0 </td>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 5 </td>
  </tr>
</tbody>
</table>
]

---
# example

**Question:** Is there a difference in Canada Warbler abundance across elevations?

<br/>

.pull-left[

<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th>
</tr>
  <tr>
   <th style="text-align:center;"> Replicate </th>
   <th style="text-align:center;"> Low </th>
   <th style="text-align:center;"> Medium </th>
   <th style="text-align:center;"> High </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 0 </td>
   <td style="text-align:center;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 0 </td>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 5 </td>
  </tr>
</tbody>
</table>

]

.pull-right[

<table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;"> Source </th>
   <th style="text-align:center;"> df </th>
   <th style="text-align:center;"> SS </th>
   <th style="text-align:center;"> MS </th>
   <th style="text-align:center;"> F </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Among groups </td>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 31.5 </td>
   <td style="text-align:center;"> 15.8 </td>
   <td style="text-align:center;"> 7.7 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Within groups </td>
   <td style="text-align:center;"> 9 </td>
   <td style="text-align:center;"> 18.5 </td>
   <td style="text-align:center;"> 2.1 </td>
   <td style="text-align:center;">  </td>
  </tr>
</tbody>
</table>

]
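
The ANOVA table can be reproduced from the raw counts. This is a minimal sketch in plain Python rather than the course's own workflow; the variable names are illustrative, and the data are the warbler counts from the table on this slide.

```python
# Warbler counts, one list per elevation (replicates 1 to 4)
counts = {
    "Low": [1, 3, 0, 2],
    "Medium": [2, 0, 4, 3],
    "High": [4, 7, 5, 5],
}


def mean(values):
    return sum(values) / len(values)


a = len(counts)                               # number of groups
N = sum(len(y) for y in counts.values())      # total number of observations
grand_mean = mean([obs for y in counts.values() for obs in y])

# Among-group and within-group sums of squares
ss_among = sum(len(y) * (mean(y) - grand_mean) ** 2 for y in counts.values())
ss_within = sum((obs - mean(y)) ** 2 for y in counts.values() for obs in y)

df_among, df_within = a - 1, N - a
ms_among = ss_among / df_among
ms_within = ss_within / df_within             # MSW, used by the LSD and HSD tests
f_stat = ms_among / ms_within

print("Among groups: ", df_among, ss_among, round(ms_among, 2), round(f_stat, 2))
print("Within groups:", df_within, ss_within, round(ms_within, 2))
```

The unrounded MSW is 18.5/9, which the table displays as 2.1.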

---
# example

**Question:** Is there a difference in Canada Warbler abundance across elevations?

#### Fisher's LSD test

- `$\large t_{1-\alpha/2,a(n-1)} = t_{0.975,9} = 2.26$`

- LSD = `$\large t_{1-\alpha/2,a(n-1)}\sqrt{2MSW/n} = 2.26\sqrt{2(2.06)/4} = 2.29$`

#### Tukey's HSD test

- `$\large q_{1-\alpha,a,a(n-1)} = q_{0.95,3,9} = 3.95$`

- HSD = `$\large q_{1-\alpha,a,a(n-1)}\sqrt{MSW/n} = 3.95\sqrt{2.06/4} = 2.83$`

<br/>
--
Note that the HSD exceeds the LSD, so the more conservative Tukey test requires a larger difference between group means before declaring them different
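
A short sketch applies both thresholds to every pair of elevations. The critical values 2.26 and 3.95 are the ones given on this slide, and the group means are computed from the count table. The sketch uses the unrounded MSW = 18.5/9, so the thresholds can differ in the second decimal place from values computed with the rounded MSW of 2.1.

```python
from itertools import combinations
from math import sqrt

# Group means of the warbler counts (computed from the data table)
means = {"Low": 1.5, "Medium": 2.25, "High": 5.25}
n = 4                # replicates per group
msw = 18.5 / 9       # unrounded within-group mean square
t_crit = 2.26        # t_{0.975, 9}, as given on the slide
q_crit = 3.95        # q_{0.95, 3, 9}, as given on the slide

lsd = t_crit * sqrt(2 * msw / n)
hsd = q_crit * sqrt(msw / n)
print(f"LSD = {lsd:.2f}, HSD = {hsd:.2f}")

# Compare each absolute difference in means with both thresholds
results = {}
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    results[(g1, g2)] = (diff >= lsd, diff >= hsd)
    print(f"{g1} vs {g2}: |diff| = {diff:.2f}, LSD: {diff >= lsd}, HSD: {diff >= hsd}")
```

With these inputs, both procedures flag High vs. Low and High vs. Medium, and neither flags Medium vs. Low.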

---
# summary

--
- Only do multiple comparison tests after a significant *F*-test

--
- The motivation is that if you do 100 tests at `$\alpha = 0.05$`, about 5 (on average) will be significant even if there are no true differences

--
- From least conservative to most conservative, the order is: Fisher's LSD, Tukey's HSD, and then pairwise *t*-tests with Bonferroni adjustments

--
- There are many multiple comparison tests besides the three we covered

--
- Tukey's HSD test is probably the method of choice these days

  + However, it is conservative enough that you will sometimes see no pairwise differences even after a significant *F*-test