class: center, middle, inverse, title-slide

.title[
# LECTURE 5: One-way ANOVA
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2022
]

---
class: inverse

# outline

<br/>

#### 1) Overview

<br/>

--

#### 2) ANOVA as a linear model

<br/>

--

#### 3) ANOVA table

<br/>

--

#### 4) Example

---
# general idea

<br/>
<br/>
<br/>

### Extension of the *t*-test for comparing more than 2 populations

---
# motivating example

Foresters are studying the effect of 4 different fertilizers (treatments) on the growth of loblolly pine. Each treatment is applied to 3 plots (replicates). Data are average height per plot after 5 years:

.pull-left[
<br/>
<table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Treatment</div></th> </tr>
<tr> <th style="text-align:center;"> Replicate </th> <th style="text-align:center;"> A </th> <th style="text-align:center;"> B </th> <th style="text-align:center;"> C </th> <th style="text-align:center;"> D </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 11 </td> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 3 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 4 </td> </tr>
</tbody>
</table>
]

--

.pull-right[
#### Notation

- The number of groups (treatments) is `\(\large a=4\)`

- The number of observations within each group (replicates) is `\(\large n=3\)`

- `\(\large y_{ij}\)` denotes the `\(\large j\)`th observation from the `\(\large i\)`th group
]

---
# a brief tangent

#### What counts as an observation?

--

#### Experimental unit

> the physical unit that receives a particular treatment

--

#### Observational unit

> the physical unit on which measurements are taken

--

These are not always the same!

--

Examples

- Agricultural fields given different fertilizer, crop yield measured

- Rats given different diets, disease state measured

- Microcosm given different predator abundance, tadpole growth measured

---
# motivating example

**Question:** Is there a difference in growth among the four treatment groups?

--

<img src="05_anova_files/figure-html/pine1-1.png" width="576" style="display: block; margin: auto;" />

---
# motivating example

#### Hypotheses

- `\(\large H_0 : \mu_A = \mu_B = \mu_C = \mu_D\)`

- `\(\large H_a :\)` At least one inequality

--

#### How should we test the null?

--

We could do this using 6 pairwise *t*-tests (one for each pair of treatments)

<br/>

--

But this would inflate the overall (experiment-wise) `\(\large \alpha\)` level: each individual test has a chance (usually `\(\large \alpha = 0.05\)`) of incorrectly rejecting a true null hypothesis, and these chances accumulate across tests. If the 6 tests were independent, the probability of at least one false rejection would be `\(1 - 0.95^6 \approx 0.26\)`

<br/>

--

An alternative procedure involves comparing the variation among the groups with the variation within the groups. If `\(H_0\)` is false, we expect the variation among groups to be larger than the variation within groups.

---
# toward the additive model

#### To understand why the test is based on variance, it is helpful to consider several types of means:

--

- Grand mean

`$$\large \bar{y}. = \frac{\sum_i\sum_j y_{ij}}{a \times n}$$`

---
# motivating example

**Question:** Is there a difference in growth among the four treatment groups?
<img src="05_anova_files/figure-html/pine_grm-1.png" width="576" style="display: block; margin: auto;" />

---
# toward the additive model

#### To understand why the test is based on variance, it is helpful to consider several types of means:

- Grand mean

`$$\large \bar{y}. = \frac{\sum_i\sum_j y_{ij}}{a \times n}$$`

- Group means

`$$\large \bar{y}_i = \frac{\sum_j y_{ij}}{n}$$`

---
# motivating example

**Question:** Is there a difference in growth among the four treatment groups?

<img src="05_anova_files/figure-html/pine_gm-1.png" width="576" style="display: block; margin: auto;" />

---
# toward the additive model

#### To understand why the test is based on variance, it is helpful to consider several types of means:

- Grand mean

`$$\large \bar{y}. = \frac{\sum_i\sum_j y_{ij}}{a \times n}$$`

- Group means

`$$\large \bar{y}_i = \frac{\sum_j y_{ij}}{n}$$`

We can now decompose the observations as

`$$\large y_{ij} = \color{#446E9B}{\bar{y}.} + \color{#D47500}{(\bar{y}_i - \bar{y}.)} + \color{#3CB521}{(y_{ij} - \bar{y}_i)}$$`

---
# the additive model

#### The decomposition

`$$\Large y_{ij} = \color{#446E9B}{\bar{y}.} + \color{#D47500}{(\bar{y}_i - \bar{y}.)} + \color{#3CB521}{(y_{ij} - \bar{y}_i)}$$`

--

#### The additive model

`$$\Large y_{ij} = \color{#446E9B}{\mu} + \color{#D47500}{\alpha_i} + \color{#3CB521}{\epsilon_{ij}}$$`

--

#### where

`$$\Large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

---
# the additive model

`$$\large y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$`

`$$\large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

#### Notes

- `\(\large \mu\)` is the grand mean of the population, estimated by `\(\large \bar{y}.\)`

--

- `\(\large \alpha_i\)` is the effect of treatment *i*, estimated by `\(\large\bar{y}_i - \bar{y}.\)`

--

    + It is the deviation of the group mean from the grand mean

    + If all `\(\large\alpha_i = 0\)`, there is no treatment effect

    + Thus, we can write either

        - `\(H_0 : \mu_1 = \mu_2 = \dots = \mu_a\)`, or

        - `\(H_0 : \alpha_1 = \alpha_2 = \dots = \alpha_a = 0\)`

--

- `\(\large \epsilon_{ij}\)` is the residual error, estimated by `\(\large y_{ij} - \bar{y}_i\)`

    + It is the unexplained (random) deviation of the observation from the group mean

---
# sums of squares

#### Variation among groups

`$$\Large SS_A = n \sum_i (\bar{y}_i - \bar{y}.)^2$$`

---
# motivating example

**Question:** Is there a difference in growth among the four treatment groups?

<img src="05_anova_files/figure-html/pine_ssa-1.png" width="576" style="display: block; margin: auto;" />

---
# sums of squares

#### Variation among groups

`$$\Large SS_A = n \sum_i (\bar{y}_i - \bar{y}.)^2$$`

#### Variation within groups

`$$\Large SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$$`

---
# motivating example

**Question:** Is there a difference in growth among the four treatment groups?

<img src="05_anova_files/figure-html/pine_ssw-1.png" width="576" style="display: block; margin: auto;" />

---
# sums of squares

#### Variation among groups

`$$\Large SS_A = n \sum_i (\bar{y}_i - \bar{y}.)^2$$`

#### Variation within groups

`$$\Large SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$$`

#### Total variation

`$$\Large SS_T = SS_A + SS_W = \sum_i \sum_j (y_{ij} - \bar{y}.)^2$$`

---
# motivating example

**Question:** Is there a difference in growth among the four treatment groups?

<img src="05_anova_files/figure-html/pine_sst-1.png" width="576" style="display: block; margin: auto;" />

---
# mean squares

### To convert the sums of squares to variances, divide by the degrees of freedom

--

#### Mean squares among

`$$\Large MS_A = \frac{SS_A}{a-1}$$`

--

#### Mean squares within

`$$\Large MS_W = \frac{SS_W}{a(n-1)}$$`

---
# F-statistic

<br/>
<br/>

`$$\LARGE F = \frac{MS_A}{MS_W}$$`

--

### To test the null hypothesis

- Compare the F statistic to the critical value: `\(\large F_{\alpha,a-1,a(n-1)}\)`

- This is always a one-tailed test. Why?
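
---
# motivating example

#### Computing the F statistic by hand in R

The sums of squares, mean squares and F statistic defined on the previous slides can be checked for the loblolly pine data. This is a minimal base-R sketch rather than part of the original analysis; the object names (`pine`, `SS_A`, `F_stat`, and so on) are chosen here for illustration, and `\(\alpha = 0.05\)` is assumed for the critical value.

```r
# Loblolly pine heights from the motivating example
# (rows = replicate plots, columns = fertilizer treatments)
pine <- data.frame(A = c(11, 9, 10),
                   B = c(7, 9, 8),
                   C = c(6, 5, 7),
                   D = c(5, 3, 4))

a <- ncol(pine)  # number of groups
n <- nrow(pine)  # replicates per group

grand_mean  <- mean(as.matrix(pine))
group_means <- colMeans(pine)

# Sums of squares among and within groups
SS_A <- n * sum((group_means - grand_mean)^2)
SS_W <- sum(sweep(as.matrix(pine), 2, group_means)^2)

# Mean squares and F statistic
MS_A   <- SS_A / (a - 1)
MS_W   <- SS_W / (a * (n - 1))
F_stat <- MS_A / MS_W

# Upper-tail critical value and p-value (alpha = 0.05 assumed)
F_crit  <- qf(0.95, df1 = a - 1, df2 = a * (n - 1))
p_value <- pf(F_stat, df1 = a - 1, df2 = a * (n - 1), lower.tail = FALSE)

c(SS_A = SS_A, SS_W = SS_W, F = F_stat, F_crit = F_crit, p = p_value)
```

For these data the sketch gives `SS_A = 60`, `SS_W = 8` and `F = 20` on 3 and 8 degrees of freedom. Only the upper tail of the F distribution is used, both for the critical value and for the p-value.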
---
class: inverse, center, middle

# anova table

---
# anova table

<br/>
<table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:center;"> Source </th> <th style="text-align:center;"> df </th> <th style="text-align:center;"> SS </th> <th style="text-align:center;"> MS </th> <th style="text-align:center;"> F </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> Among groups </td> <td style="text-align:center;"> \(a-1\) </td> <td style="text-align:center;"> \(n \sum_i (\bar{y}_i - \bar{y}.)^2\) </td> <td style="text-align:center;"> \(\frac{SS_A}{a-1}\) </td> <td style="text-align:center;"> \(\frac{MS_A}{MS_W}\) </td> </tr>
<tr> <td style="text-align:center;"> Within groups </td> <td style="text-align:center;"> \(a(n-1)\) </td> <td style="text-align:center;"> \(\sum_i \sum_j (y_{ij} - \bar{y}_i)^2\) </td> <td style="text-align:center;"> \(\frac{SS_W}{a(n-1)}\) </td> <td style="text-align:center;"> </td> </tr>
<tr> <td style="text-align:center;"> Total </td> <td style="text-align:center;"> \(an-1\) </td> <td style="text-align:center;"> \(\sum_i \sum_j (y_{ij} - \bar{y}.)^2\) </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> </tr>
</tbody>
</table>

---
# worked example

#### Suppose we are interested in the effect of elevation on the abundance of Canada Warblers

.pull-left[
<img src="https://upload.wikimedia.org/wikipedia/commons/b/b1/8G7D5475-Canada.jpg" width="80%" />
]

--

.pull-right[
<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr>
<tr> <th
style="text-align:center;"> Replicate </th> <th style="text-align:center;"> Low </th> <th style="text-align:center;"> Medium </th> <th style="text-align:center;"> High </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr>
</tbody>
</table>
]

???

Image courtesy of William H. Majoros via Wikimedia Commons

--

#### Hypotheses

- `\(H_0 : \mu_L = \mu_M = \mu_H\)` or `\(H_0 : \alpha_L = \alpha_M = \alpha_H = 0\)`

--

- `\(H_a\)` : At least one inequality

---
# worked example

<img src="05_anova_files/figure-html/cawa2-1.png" width="576" style="display: block; margin: auto;" />

---
# worked example

<img src="05_anova_files/figure-html/cawa_grm-1.png" width="576" style="display: block; margin: auto;" />

---
# worked example

<img src="05_anova_files/figure-html/cawa_gm-1.png" width="576" style="display: block; margin: auto;" />

---
# procedure

.pull-left[
<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr>
<tr> <th style="text-align:center;"> Replicate </th> <th
style="text-align:center;"> Low </th> <th style="text-align:center;"> Medium </th> <th style="text-align:center;"> High </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> Group means </td> <td style="text-align:center;"> 1.50 </td> <td style="text-align:center;"> 2.25 </td> <td style="text-align:center;"> 5.25 </td> </tr>
<tr> <td style="text-align:center;"> Grand mean </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 3.00 </td> <td style="text-align:center;"> </td> </tr>
</tbody>
</table>
]

--

.pull-right[
#### Calculate sums of squares **among** groups ( `\(SS_A\)` )

`\(= n \sum_i (\bar{y}_i - \bar{y}.)^2\)`

`\(\small = 4 \times ((1.5 - 3)^2 + (2.25 - 3)^2 + (5.25-3)^2)\)`

`\(= 31.5\)`
]

---
# procedure

.pull-left[
<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr>
<tr> <th style="text-align:center;"> Replicate </th> <th style="text-align:center;"> Low </th> <th style="text-align:center;">
Medium </th> <th style="text-align:center;"> High </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> Group means </td> <td style="text-align:center;"> 1.50 </td> <td style="text-align:center;"> 2.25 </td> <td style="text-align:center;"> 5.25 </td> </tr>
<tr> <td style="text-align:center;"> Grand mean </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 3.00 </td> <td style="text-align:center;"> </td> </tr>
</tbody>
</table>
]

.pull-right[
#### Calculate sums of squares **within** groups ( `\(SS_W\)` )

`\(= \sum_i \sum_j (y_{ij} - \bar{y}_i)^2\)`
]

---
# procedure

.pull-left[
<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr>
<tr> <th style="text-align:center;"> Replicate </th> <th style="text-align:center;"> Low </th> <th style="text-align:center;"> Medium </th> <th style="text-align:center;"> High </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td
style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> Group means </td> <td style="text-align:center;"> 1.50 </td> <td style="text-align:center;"> 2.25 </td> <td style="text-align:center;"> 5.25 </td> </tr>
<tr> <td style="text-align:center;"> Grand mean </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 3.00 </td> <td style="text-align:center;"> </td> </tr>
</tbody>
</table>
]

.pull-right[
#### Calculate sums of squares **within** groups ( `\(SS_W\)` )

`\(= \sum_i \sum_j (y_{ij} - \bar{y}_i)^2\)`

`\(\scriptsize = (1 - 1.50)^2 + (3 - 1.50)^2 + (0 - 1.50)^2 + (2 - 1.50)^2 +\)`

`\(\scriptsize \;\;\; (2 - 2.25)^2 + (0 - 2.25)^2 + (4 - 2.25)^2 + (3 - 2.25)^2 +\)`

`\(\scriptsize \;\;\; (4 - 5.25)^2 + (7 - 5.25)^2 + (5 - 5.25)^2 + (5 - 5.25)^2\)`
]

---
# procedure

.pull-left[
<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Elevation</div></th> </tr>
<tr> <th style="text-align:center;"> Replicate </th> <th style="text-align:center;"> Low </th> <th style="text-align:center;"> Medium </th> <th
style="text-align:center;"> High </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 7 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> Group means </td> <td style="text-align:center;"> 1.50 </td> <td style="text-align:center;"> 2.25 </td> <td style="text-align:center;"> 5.25 </td> </tr>
<tr> <td style="text-align:center;"> Grand mean </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 3.00 </td> <td style="text-align:center;"> </td> </tr>
</tbody>
</table>
]

.pull-right[
#### Calculate sums of squares **within** groups ( `\(SS_W\)` )

`\(= \sum_i \sum_j (y_{ij} - \bar{y}_i)^2\)`

`\(\scriptsize = (1 - 1.50)^2 + (3 - 1.50)^2 + (0 - 1.50)^2 + (2 - 1.50)^2 +\)`

`\(\scriptsize \;\;\; (2 - 2.25)^2 + (0 - 2.25)^2 + (4 - 2.25)^2 + (3 - 2.25)^2 +\)`

`\(\scriptsize \;\;\; (4 - 5.25)^2 + (7 - 5.25)^2 + (5 - 5.25)^2 + (5 - 5.25)^2\)`

`\(= 18.5\)`
]

---
# procedure

<br/>
<table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:center;"> Source </th> <th style="text-align:center;"> df </th> <th style="text-align:center;"> SS </th> <th style="text-align:center;"> MS </th> <th style="text-align:center;"> F </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> Among groups </td> <td
style="text-align:center;"> 2 </td> <td style="text-align:center;"> 31.5 </td> <td style="text-align:center;"> 15.8 </td> <td style="text-align:center;"> 7.7 </td> </tr>
<tr> <td style="text-align:center;"> Within groups </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 18.5 </td> <td style="text-align:center;"> 2.1 </td> <td style="text-align:center;"> </td> </tr>
</tbody>
</table>

--

#### Critical value: `\(\large F_{\alpha=0.05,2,9} = 4.26\)`

--

### Decision?

---
# anova as a linear model

As discussed previously, ANOVA is a linear model

`$$\large y_{j} = \beta_0 + \beta_1 x^1_j + \beta_2 x^2_j + \epsilon_{j}$$`

--

So we could also analyze these data using the `lm()` function:

```r
# cawa holds the counts in wide format (one column per elevation);
# reshape to long format with one row per observation
cawa_long <- tidyr::pivot_longer(cawa, cols = c("Low", "Medium", "High"),
                                 names_to = "Elevation", values_to = "Count")

# Fit the one-way ANOVA as a linear model
fit.lm <- lm(Count ~ Elevation, data = cawa_long)
summary(fit.lm)
```

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

---
# anova as a linear model

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

Before we can interpret this output (and how it relates to the ANOVA table), we need to understand how `R` fits this model

---
# anova as a linear model

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

#### The model matrix

```r
head(model.matrix(fit.lm), 2)
```

```
##   (Intercept) ElevationLow ElevationMedium
## 1           1            1               0
## 2           1            0               1
```

- One row for each observation (only the first 2 are shown)

- Intercept = mean of the reference level (`High`, the first level in alphabetical order, which is the default)

- Low and Medium treated as *dummy variables* (0/1)

---
# anova as a linear model

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

#### The model matrix

```r
head(model.matrix(fit.lm), 2)
```

```
##   (Intercept) ElevationLow ElevationMedium
## 1           1            1               0
## 2           1            0               1
```

- Each row is multiplied by the vector of model coefficients `\(\beta_0\)`, `\(\beta_1\)`, `\(\beta_2\)` to get `\(E[y_j]\)`

- `R` names the coefficients `Intercept`, `ElevationLow`, `ElevationMedium`

- e.g., row 1 = `\(E[y_1] = Intercept \times 1 + ElevationLow \times 1 + ElevationMedium \times 0 = 5.25 - 3.75 = 1.5\)`

---
# anova as a linear model

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

#### How do we interpret the coefficients?

--

- `Intercept` is the expected count at a high elevation site

--

- `ElevationLow` is the *difference* in expected count between low and high elevation (low minus high)

--

- `ElevationMedium` is the *difference* in expected count between medium and high elevation (medium minus high)

---
# anova as a linear model

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

#### Residuals

- `lm()` also returns residuals (i.e., `\(y_j - E[y_j]\)`)

```r
fit.lm$residuals
```

```
##     1     2     3     4     5     6     7     8     9    10    11    12 
## -0.50 -0.25 -1.25  1.50 -2.25  1.75 -1.50  1.75 -0.25  0.50  0.75 -0.25
```

--

```r
sum(fit.lm$residuals^2)
```

```
## [1] 18.5
```

--

- Does this look familiar?

---
# anova as a linear model

```
##              term estimate std.error statistic   p.value
## 1     (Intercept)     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow    -3.75    1.0138    -3.699 4.928e-03
## 3 ElevationMedium    -3.00    1.0138    -2.959 1.598e-02
```

#### Residuals

What about among-group variation?
```r
fit.lm$fitted.values
```

```
##    1    2    3    4    5    6    7    8    9   10   11   12 
## 1.50 2.25 5.25 1.50 2.25 5.25 1.50 2.25 5.25 1.50 2.25 5.25
```

```r
sum((fit.lm$fitted.values - mean(fit.lm$fitted.values))^2)
```

```
## [1] 31.5
```

--

- So the model is the same; the only difference is *how* we present the results

---
# anova as a linear model

One more way to fit the model (dropping the intercept with `- 1`, so that each coefficient is a group mean):

```r
fit.lm2 <- lm(Count ~ Elevation - 1, data = cawa_long)
summary(fit.lm2)
```

```
##              term estimate std.error statistic   p.value
## 1   ElevationHigh     5.25    0.7169     7.324 4.451e-05
## 2    ElevationLow     1.50    0.7169     2.092 6.592e-02
## 3 ElevationMedium     2.25    0.7169     3.139 1.195e-02
```

--

```r
head(model.matrix(fit.lm2), 5)
```

```
##   ElevationHigh ElevationLow ElevationMedium
## 1             0            1               0
## 2             0            0               1
## 3             1            0               0
## 4             0            1               0
## 5             0            0               1
```

---
# causal inference

#### Can we make causal inferences about the effect of elevation on Canada Warbler abundance?

<br/>

--

### **Answer**: Definitely not!

<br/>

--

### **What was missing?**

---
# looking ahead

<br/>

#### **Next time:** Multiple comparisons

<br/>

#### **Reading:** Quinn chp. 3.4
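
---
# appendix: checking the worked example

The Canada Warbler ANOVA table can be reproduced directly from the counts. This is a minimal base-R sketch under a few assumptions: the data frame is rebuilt from the table on the slides (the original `cawa` object is not shown), the names `cawa_wide`, `cawa_check`, `fit_check` and `tab` are hypothetical, and `\(\alpha = 0.05\)` is used for the critical value. It relies on `anova()`, which returns the ANOVA table for an `lm` fit.

```r
# Counts from the worked example (rows = replicates, columns = elevations)
cawa_wide <- data.frame(Low    = c(1, 3, 0, 2),
                        Medium = c(2, 0, 4, 3),
                        High   = c(4, 7, 5, 5))

# Long format built with base R: one row per observation
cawa_check <- data.frame(
  Elevation = rep(names(cawa_wide), each = nrow(cawa_wide)),
  Count     = unlist(cawa_wide, use.names = FALSE)
)

# Fit the linear model and extract the ANOVA table
fit_check <- lm(Count ~ Elevation, data = cawa_check)
tab <- anova(fit_check)
tab

# Compare the F statistic to the upper-tail critical value (alpha = 0.05 assumed)
F_crit <- qf(0.95, df1 = 2, df2 = 9)
tab["Elevation", "F value"] > F_crit
```

The sums of squares in `tab` should match the hand calculations (31.5 among groups and 18.5 within groups). The F statistic is about 7.66, which exceeds the critical value of 4.26.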