class: center, middle, inverse, title-slide

.title[
# LECTURE 9: assumptions and transformations
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2022
]

---
# outline

<br/>

1) Motivation

<br/>

--

2) Assumptions

<br/>

--

3) Transformations

---
# assumptions

The key assumptions of ANOVA are that the residuals are **independent** and come from a **normal distribution** with mean 0 and **variance** `\(\sigma^2\)`

--

#### This implies the following:

1) Within each treatment group, the residuals are normally distributed

--

2) Treatment group variances are equal

--

3) There is no spatial, temporal, or other form of correlation among the residuals; they are independent

---
# normality assumption

<br/>

### Assessment

--

- Inspect histograms and boxplots of the data for each treatment group

--

- Inspect histograms and boxplots of the residuals

--

- Conduct a Shapiro-Wilk test of normality on the residuals
    + Goodness-of-fit test of normality
    + Assumption of normality is rejected if p < 0.05

---
# normality assumption

<img src="09_assumptions_files/figure-html/unnamed-chunk-1-1.png" width="648" style="display: block; margin: auto;" />

---
# normality assumption

#### ANOVA is usually considered robust to violations of this assumption as long as they aren't severe and the sample size isn't too small.
#### However, some problems can arise

<br/>

--

**Consequences of non-normality**

- Implausible confidence intervals

- Altered power if the data are heavily skewed or have unusually heavy or light tails (leptokurtic or platykurtic)

---
# normality assumption

<br/>

#### Cases where normality assumption might not hold

- Count data, especially when there are lots of zeros

- Proportion or percentage data

- Arbitrary scales, such as a 10-point taste test

- Weights of very small things

---
# equal variance assumption

<br/>

### Assessment

--

- Inspect histograms and boxplots of the data from each treatment group

--

- Plot group variances

--

- Conduct a Bartlett test of equality of variances
    + Assumption of homogeneity is rejected if p < 0.05

---
# equal variance assumption

<img src="09_assumptions_files/figure-html/unnamed-chunk-2-1.png" width="648" style="display: block; margin: auto;" />

---
# equal variance assumption

<img src="09_assumptions_files/figure-html/unnamed-chunk-3-1.png" width="648" style="display: block; margin: auto;" />

---
# equal variance assumption

#### ANOVA is again considered robust to violations of this assumption, especially if sample sizes are similar among groups.
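
The Bartlett test listed under Assessment can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code used to build these slides. The function names (`chi2_sf`, `bartlett`) and both data sets are hypothetical, and the sketch assumes every group has at least two observations and a nonzero variance.

```python
import math
from statistics import variance


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def bartlett(groups):
    """Return (statistic, p-value) for Bartlett's test of equal group variances."""
    k = len(groups)
    n = [len(g) for g in groups]
    s2 = [variance(g) for g in groups]  # sample variance of each group
    total_n = sum(n)
    # Pooled variance: group variances weighted by their degrees of freedom
    pooled = sum((ni - 1) * vi for ni, vi in zip(n, s2)) / (total_n - k)
    numerator = (total_n - k) * math.log(pooled) - sum(
        (ni - 1) * math.log(vi) for ni, vi in zip(n, s2)
    )
    correction = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (total_n - k)) / (3 * (k - 1))
    stat = numerator / correction
    # Under equal variances the statistic is approximately chi-square with k - 1 df
    return stat, chi2_sf(stat, k - 1)


# Hypothetical data: three groups with identical spread, two with very different spread
equal_groups = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [5.0, 6.0, 7.0]]
unequal_groups = [[9.8, 10.1, 10.0, 9.9, 10.2], [2.0, 18.0, 7.0, 15.0, 10.0]]

stat_equal, p_equal = bartlett(equal_groups)
stat_unequal, p_unequal = bartlett(unequal_groups)
print("equal spread:   statistic =", round(stat_equal, 3), "p =", round(p_equal, 4))
print("unequal spread: statistic =", round(stat_unequal, 3), "p =", p_unequal)
```

Following the rule above, homogeneity would be rejected for a data set whose p-value falls below 0.05. Bartlett's test is itself sensitive to non-normality, so it is worth reading alongside the plots.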
--

#### Consequences of heteroscedasticity

--

- Estimates of variance ( `\(\sigma^2\)` ) will be wrong

- Confidence intervals and F-tests will be affected

--

#### Cases where heteroscedasticity might be expected

- Count data, especially when there are lots of zeros

- Proportion or percentage data

- Arbitrary scales, such as a 10-point taste test

---
# independence assumption

<br/>

--

#### Residuals are independent if the value of one residual tells us nothing about the value of another residual

<br/>

--

#### This assumption can be met by randomly selecting experimental units and randomly assigning them to treatment groups

<br/>

--

#### It is unlikely to hold if multiple observations are made on the same experimental unit, or if the experimental units are highly clustered

---
# independence assumption

### Assessment

--

- In general, this problem is hard to diagnose unless you know the details of the design

--

- Plot group variances against group means. There shouldn't be any pattern

--

- When temporal or spatial autocorrelation is a concern, auto-covariance plots can be used for diagnosis

--

### Consequences of non-independence

- Estimates of variance will generally be too small

- Power will be inflated

---
# what if assumptions don't hold?

### At least three options:

--

1) Use a more complicated model that relaxes some assumptions, such as:
    + Generalized linear models that don't assume normality
    + Linear models with multiple variance parameters
    + Time-series models or spatial models allowing for correlated residuals

--

2) Nonparametrics

<br/>

--

3) Transform the data

---
class: inverse, center, middle

# transformations

---
# transformations

### Key idea

> Transform the raw data ( `\(u_{ij}\)` ) so that the transformed data ( `\(y_{ij}\)` ) meet the assumptions of ANOVA

--

#### What kinds of transformations are valid?
- The transformation must be monotonic, so that the rank order of the original data is maintained (or, for the reciprocal, exactly reversed)

- Common examples include:
    + Logarithmic transformations
    + Square-root transformations
    + Arcsine transformations
    + Reciprocal transformations

---
# logarithmic transformation

<br/>

`$$\huge y = \log(u + C)$$`

<br/>

--

- The constant C is often 1, or 0 if there are no zeros in the data ( `\(u\)` )

--

- Useful when group standard deviations (rather than variances) are proportional to the means, as with strongly right-skewed count data

--

- Could use `\(\log_{10}\)` or any base, but the natural logarithm is preferred

---
# logarithmic transformation

<img src="09_assumptions_files/figure-html/unnamed-chunk-4-1.png" width="648" style="display: block; margin: auto;" />

---
# logarithmic transformation

<img src="09_assumptions_files/figure-html/unnamed-chunk-5-1.png" width="648" style="display: block; margin: auto;" />

---
# square root transformation

<br/>

`$$\huge y = \sqrt{u + C}$$`

<br/>

--

- C is often 0.5 or some other small number

--

- Useful when group variances are proportional to the means (count data)

---
# square root transformation

<img src="09_assumptions_files/figure-html/unnamed-chunk-6-1.png" width="648" style="display: block; margin: auto;" />

---
# square root transformation

<img src="09_assumptions_files/figure-html/unnamed-chunk-7-1.png" width="648" style="display: block; margin: auto;" />

---
# summary

--

- The assumptions of ANOVA are made explicit by the linear model

--

- Colloquially, these assumptions are:
    + Within each treatment group, the data are normally distributed
    + Treatment group variances are equal
    + There is no spatial, temporal, or other form of correlation among the residuals; they are independent

--

- ANOVA is considered robust to violations of the first two assumptions

--

- Transformations, in some cases, can be used to meet all three assumptions

--

- If transformations don't work, it is always possible to relax assumptions by adopting a more complicated model
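
---
# transformations: a sketch

To make the logarithmic and square-root transformations concrete, the sketch below applies `\(y = \log(u + 1)\)` and `\(y = \sqrt{u + 0.5}\)` to a small set of counts and compares the ratio of the largest to the smallest group variance before and after. It is written in Python with the standard library only. The counts, the group names, and the helper names (`log_transform`, `sqrt_transform`, `variance_ratio`, `transform_groups`) are all hypothetical and were chosen so that the spread grows with the mean.

```python
import math
from statistics import variance

# Hypothetical count data for three treatment groups (spread grows with the mean)
counts = {
    "low": [0, 1, 2, 1, 3, 2],
    "mid": [6, 9, 12, 8, 14, 11],
    "high": [35, 52, 41, 60, 47, 55],
}


def log_transform(u, c=1.0):
    """y = log(u + C), natural logarithm; C = 1 because the data contain zeros."""
    return math.log(u + c)


def sqrt_transform(u, c=0.5):
    """y = sqrt(u + C) with a small constant C."""
    return math.sqrt(u + c)


def transform_groups(transform, groups):
    """Apply a transformation to every observation in every group."""
    return {name: [transform(u) for u in values] for name, values in groups.items()}


def variance_ratio(groups):
    """Largest group variance divided by the smallest (1 means equal variances)."""
    variances = [variance(values) for values in groups.values()]
    return max(variances) / min(variances)


ratio_raw = variance_ratio(counts)
ratio_log = variance_ratio(transform_groups(log_transform, counts))
ratio_sqrt = variance_ratio(transform_groups(sqrt_transform, counts))

print("variance ratio, raw counts: ", round(ratio_raw, 2))
print("variance ratio, log(u + 1): ", round(ratio_log, 2))
print("variance ratio, sqrt(u+0.5):", round(ratio_sqrt, 2))
```

A ratio closer to 1 means the group variances are more nearly equal. Which transformation works best depends on how the variance changes with the mean, so the diagnostic plots should be re-inspected on the transformed scale.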