class: center, middle, inverse, title-slide .title[ # LECTURE 6: multiple comparison procedures ] .subtitle[ ## FANR 6750 (Experimental design) ] .author[ ###
Fall 2022 ] --- # motivation <br/> > Following a significant *F*-test, the next step is to determine which means differ <br/> -- > If all group means are to be compared, then we should correct for multiple testing <br/> -- > That is, conducting many tests increases the probability that at least one is significant by chance, even if no true differences exist --- <img src="https://imgs.xkcd.com/comics/significant.png" height="600px" style="display: block; margin: auto;" /> --- class: inverse # outline <br/> #### 1) Fisher's LSD <br/> -- #### 2) Pairwise *t*-tests <br/> -- #### 3) Tukey's HSD test <br/> -- #### 4) Example --- # ronald fisher <img src="figs/Fisher1946.jpeg" width="400px" height="304px" style="display: block; margin: auto;" /> <br/> > To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of --- # fisher's lsd test > Warning: This test makes no correction for multiple testing -- #### The least significant difference test is based on the two-sample *t*-statistic `$$\large t = \frac{\bar{y}_1 - \bar{y}_2}{SEDM}$$` -- #### Fisher said that 2 means ( `\(\bar{y}_i\)` and `\(\bar{y}_j\)` ) are different if: `$$\large |\bar{y}_i - \bar{y}_j| \geq t_{1-\alpha/2,a(n-1)}\sqrt{\frac{2MSW}{n}}$$` <br/> where MSW (aka MSE) comes from the ANOVA table, `\(a\)` is the number of groups, and `\(n\)` is the number of replicates per group --- class: center, inverse, middle ## PAIRWISE *t*-TEST --- ## PAIRWISE *t*-TEST <br/> ### One can always fall back on pairwise, two-sample *t*-tests, but you should adjust the *p*-values <br/> -- ### Bonferroni adjustment - Multiply each *p*-value by the number of tests - If this results in *p* > 1, set *p* to 1 --- class: center, middle, inverse # tukey's hsd test --- # john tukey <br/> <img src="figs/John_Tukey.jpeg" width="246px" height="300px" style="display: block; margin: auto;" /> <br/> > The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given 
body of data --- # tukey's hsd test <br/> #### According to Tukey's Honestly Significant Difference test, two means ( `\(\bar{y}_i\)` and `\(\bar{y}_j\)` ) are different if: `$$\large |\bar{y}_i - \bar{y}_j| \geq q_{1-\alpha,a,a(n-1)}\sqrt{\frac{MSW}{n}}$$` <br/> where `\(q\)` comes from the studentized range distribution (see `qtukey` in `R`) and MSW comes from the ANOVA table --- class: center, middle, inverse # example --- # example **Question:** Is there a difference in Canada Warbler abundance across elevations? <br/> .pull-left[ <img src="https://upload.wikimedia.org/wikipedia/commons/b/b1/8G7D5475-Canada.jpg" width="80%" /> ] .pull-right[ <table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr> <tr> <th style="text-align:center;"> Replicate </th> <th style="text-align:center;"> Low </th> <th style="text-align:center;"> Medium </th> <th style="text-align:center;"> High </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr> </tbody> </table> ] --- # example 
**Question:** Is there a difference in Canada Warbler abundance across elevations? <br/> .pull-left[ <table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr> <tr> <th style="text-align:center;"> Replicate </th> <th style="text-align:center;"> Low </th> <th style="text-align:center;"> Medium </th> <th style="text-align:center;"> High </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr> </tbody> </table> ] .pull-right[ <table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Source </th> <th style="text-align:center;"> df </th> <th style="text-align:center;"> SS </th> <th style="text-align:center;"> MS </th> <th style="text-align:center;"> F </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Among groups </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 31.5 </td> <td style="text-align:center;"> 15.7 </td> 
<td style="text-align:center;"> 7.7 </td> </tr> <tr> <td style="text-align:center;"> Within groups </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 18.5 </td> <td style="text-align:center;"> 2.1 </td> <td style="text-align:center;"> </td> </tr> </tbody> </table> ] --- # example **Question:** Is there a difference in Canada Warbler abundance across elevations? -- #### Fisher's LSD test - `\(\large t_{1-\alpha/2,a(n-1)} = t_{0.975,9} = 2.26\)` -- - LSD = `\(\large t_{1-\alpha/2,a(n-1)}\sqrt{2MSW/n} = 2.29\)` -- #### Tukey's HSD test - `\(\large q_{1-\alpha,a,a(n-1)} = q_{0.95,3,9} = 3.95\)` -- - HSD = `\(\large q_{1-\alpha,a,a(n-1)}\sqrt{MSW/n} = 2.83\)` <br/> -- Note that the HSD threshold is larger than the LSD threshold, so it is harder to declare a difference between group means with the more conservative Tukey test --- # summary -- - Only do multiple comparison tests after a significant *F*-test -- - The motivation is that if you do 100 tests at `\(\alpha = 0.05\)`, 5 (on average) will be significant even if there are no true differences -- - From least conservative to most conservative, the order is: Fisher's LSD, Tukey's HSD, and then pairwise *t*-tests with Bonferroni adjustments -- - There are many multiple comparison procedures besides the 3 we covered -- - Tukey's HSD test is probably the method of choice these days. However, + It is so conservative that sometimes you won't see any pairwise differences even after a significant *F*-test
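---

# example in `R`

The LSD and HSD thresholds from the Canada Warbler example can be reproduced with a few lines of base `R`. This is a minimal sketch, not the course's official script: the object names (`warbler`, `fit`, `msw`, `lsd`, `hsd`, `abs_diff`) are made up for illustration, and `\(\alpha = 0.05\)` is assumed. Because the code uses the unrounded MSW (18.5/9), results can differ in the second decimal place from hand calculations that use MSW rounded to 2.1.

```r
# Canada Warbler counts from the example table (4 replicates per elevation)
warbler <- data.frame(
  elevation = factor(rep(c("Low", "Medium", "High"), each = 4),
                     levels = c("Low", "Medium", "High")),
  count = c(1, 3, 0, 2,   # Low
            2, 0, 4, 3,   # Medium
            4, 7, 5, 5)   # High
)

a <- nlevels(warbler$elevation)  # number of groups
n <- 4                           # replicates per group (balanced design)
alpha <- 0.05                    # assumed significance level

# One-way ANOVA; MSW is the residual sum of squares over its df
fit <- aov(count ~ elevation, data = warbler)
msw <- deviance(fit) / df.residual(fit)

# Thresholds from the two formulas in the slides
lsd <- qt(1 - alpha / 2, df = a * (n - 1)) * sqrt(2 * msw / n)
hsd <- qtukey(1 - alpha, nmeans = a, df = a * (n - 1)) * sqrt(msw / n)

# Absolute pairwise differences between group means
means <- tapply(warbler$count, warbler$elevation, mean)
abs_diff <- abs(outer(means, means, "-"))

abs_diff >= lsd  # pairs declared different by Fisher's LSD
abs_diff >= hsd  # pairs declared different by Tukey's HSD

# Built-in alternatives: Bonferroni-adjusted pairwise t-tests and Tukey's HSD
pairwise.t.test(warbler$count, warbler$elevation,
                p.adjust.method = "bonferroni")
TukeyHSD(fit)
```

Comparing `abs_diff` against each threshold shows which pairs of elevations each test would flag. `TukeyHSD()` reports the same comparisons as adjusted *p*-values and confidence intervals rather than as a single cutoff.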