lab09_nested.Rmd
Remember from lecture that nested designs are useful in situations where we subsample within each experimental unit and we’re interested in treatment effects at the experimental (whole) unit level, not the subunit level. For example:
- We count larvae at multiple subplots within a plot
- We weigh multiple chicks in a brood
- We measure shrub growth at multiple subplots within a plot
\[\Large y_{ijk} = \mu + \alpha_i + \beta_{ij} + \epsilon_{ijk}\]
Can you define all of the terms in the linear model? Why do we need so many subscripts?
\[\Large \beta_{ij} \sim normal(0, \sigma^2_B)\]
And as always,
\[\Large \epsilon_{ijk} \sim normal(0, \sigma^2)\]
Treatment effects:
\[\Large H_0 : \alpha_1 = \cdots = \alpha_a = 0\]
\[\Large H_a : \text{at least one inequality}\]
Random variation among experimental units:
\[\Large H_0 : \sigma^2_B = 0\]
\[\Large H_a : \sigma^2_B > 0\]
Note: Here we have referred to the variance term as \(\sigma^2_B\) but we could use any reasonable subscript to refer to the variance term for the random effect.
Note: Recall that a nested ANOVA is primarily used when the variation among experimental units is of interest. If this variation were not of interest, or if we had reason to believe that all the experimental units did not differ from each other, we could simplify to a one-way ANOVA.
For this lab, we’ll use another data set on the response of gypsy moth larvae to pesticide treatments:
library(FANR6750)
data("mothdata2")
str(mothdata2)
#> 'data.frame': 36 obs. of 3 variables:
#> $ larvae : num 16 16 15.8 14.2 13.9 14.2 13.5 13.4 14 13.1 ...
#> $ Treatment: chr "Bt" "Bt" "Bt" "Bt" ...
#> $ Plot : int 1 1 1 1 2 2 2 2 3 3 ...
As before, we’ll need to convert Plot and Treatment to factors:
mothdata2$Treatment <- factor(mothdata2$Treatment,
levels = c("Control", "Bt", "Dimilin"))
mothdata2$Plot <- factor(mothdata2$Plot)
Note that we explicitly define the levels for Treatment so that Control will be treated as the baseline level and plotted first. We don’t need to do that with Plot since the default numeric order is fine.
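If you want to confirm the ordering, a quick check (not required) is:
levels(mothdata2$Treatment)   # Control should be listed first (the baseline level)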
To see the level of replication within each treatment and plot, let’s cross-tabulate the data. It is a good idea to check this since unbalanced designs generally require some additional thought.
table(mothdata2$Treatment, mothdata2$Plot)
#>
#> 1 2 3 4 5 6 7 8 9
#> Control 0 0 0 4 4 4 0 0 0
#> Bt 4 4 4 0 0 0 0 0 0
#> Dimilin 0 0 0 0 0 0 4 4 4
Notice how the plots are labeled! They are not labeled according to \(j \in \{1, 2, 3\}\), even though there are three plots within each treatment (and four replicates within each plot). Rather, each plot is labeled uniquely, 1–9.
Why is this important?
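One reason it matters is that R identifies experimental units by the levels of the plot factor. If plots had instead been numbered 1–3 within each treatment, R would treat, say, plot 1 in the Bt treatment and plot 1 in the Control treatment as the same unit. A minimal sketch of the usual fix in that situation (building a unique plot identifier with interaction(); the name plotID is ours, and this step is not needed for mothdata2):
# Only needed if plots were re-numbered within each treatment;
# interaction() creates one unique level per Treatment-Plot combination
plotID <- interaction(mothdata2$Treatment, mothdata2$Plot, drop = TRUE)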
aov()
Based on the models we’ve fit previously this semester, we might instinctively analyze these data using the following:
aov.wrong <- aov(larvae ~ Treatment + Plot, data = mothdata2)
summary(aov.wrong)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Treatment 2 215.4 107.7 208.89 <2e-16 ***
#> Plot 6 11.2 1.9 3.61 0.0093 **
#> Residuals 27 13.9 0.5
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
But if we do so, the denominator degrees of freedom are incorrect. Instead, we should use:
aov.correct <- aov(larvae ~ Treatment + Error(Plot), data = mothdata2)
summary(aov.correct)
#>
#> Error: Plot
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Treatment 2 215.4 107.7 57.9 0.00012 ***
#> Residuals 6 11.2 1.9
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Error: Within
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Residuals 27 13.9 0.516
Can you spot the difference? What calculations changed between these two analyses?
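One way to see what changed is to recompute the Treatment F-statistic by hand from the mean squares printed above (the printed values are rounded, so these results differ slightly from the tables):
# Correct test: Treatment MS divided by the among-plot (Error: Plot) MS
107.7 / 1.9    # roughly 58, as in aov.correct
# Incorrect test: Treatment MS divided by the within-plot residual MS
107.7 / 0.516  # roughly 209, as in aov.wrong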
Note: The next few code chunks are not necessary. They simply demonstrate that what we are doing here is exactly the same as performing a one-way ANOVA after averaging across the subunits.
What happens if we analyze plot-level means? The aggregate() function is similar to tapply(), but it works on entire data frames. Here we get averages for each whole plot:
plotData <- aggregate(larvae ~ Treatment + Plot,
data = mothdata2, FUN = mean)
plotData
#> Treatment Plot larvae
#> 1 Bt 1 15.50
#> 2 Bt 2 13.75
#> 3 Bt 3 14.00
#> 4 Control 4 18.25
#> 5 Control 5 18.75
#> 6 Control 6 19.25
#> 7 Dimilin 7 12.50
#> 8 Dimilin 8 13.50
#> 9 Dimilin 9 13.00
When we use aov() to analyze the aggregated data, we get the same F-statistic and p-value as before.
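A minimal reconstruction of that chunk (the object name aov.means is ours):
# One-way ANOVA on the plot-level means; compare the Treatment line
# to the 'Error: Plot' stratum of aov.correct above
aov.means <- aov(larvae ~ Treatment, data = plotData)
summary(aov.means)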
When using aov() with an Error term:
- You can’t use TukeyHSD()
- You don’t get a direct estimate of \(\sigma^2_B\)
- It doesn’t handle unbalanced designs well
- But, you can use model.tables() (see the sketch below)
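For example, a quick sketch of model.tables() applied to our multistratum fit (see ?model.tables for other options):
# Grand, treatment, and plot means from the multistratum aov fit
model.tables(aov.correct, type = "means")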
An alternative is to use the lme() function in the nlme package (which is designed for mixed effects models):
- Possible to get direct estimates of \(\sigma^2_B\) and other variance parameters
- Handles very complex models and unbalanced designs
- Possible to do multiple comparisons and contrasts using the glht() function in the multcomp package

But…
- Only works if there are random effects
- ANOVA tables aren’t as complete as those from aov()
lme() function
To see how the lme() function works, let’s fit the moth data model using this approach.
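The call below is reconstructed from the fitted-model line shown later in the glht() output; lme1 is the object name used in the rest of the lab:
library(nlme)

# Fixed effect of Treatment; random intercept for each plot
lme1 <- lme(larvae ~ Treatment, random = ~1 | Plot, data = mothdata2)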
The syntax for random effects can be a little confusing at first, but see ?lme for additional information.
Now we can view the ANOVA table from the model using:
anova(lme1, Terms = "Treatment")
#> F-test for: Treatment
#> numDF denDF F-value p-value
#> 1 2 6 57.87 1e-04
You’ll notice that it is a very simplified version of the ANOVA table: it provides only a single line, corresponding to the treatment effect.
And we can view the variance parameter estimates:
VarCorr(lme1)
#> Plot = pdLogChol(1)
#> Variance StdDev
#> (Intercept) 0.3364 0.580
#> Residual 0.5156 0.718
The first row shows the estimates of \(\sigma^2_B\) and \(\sigma_B\). These are redundant values since one can be calculated from the other. The second row shows the estimates of \(\sigma^2\) and \(\sigma\).
What can we say about the amount of random variation within whole units relative to among whole units (after accounting for treatment effects)? Does it appear that in this scenario the random effect of plot was necessary?
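One way to quantify this, using the (rounded) variance estimates from VarCorr() above:
# Proportion of the random variation attributable to among-plot differences
0.3364 / (0.3364 + 0.5156)  # roughly 0.39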
Note that we will reject the null hypothesis about the variance (\(\sigma^2_B = 0\)) if the F-statistic, \(\frac{MS_B}{MS_E}\), exceeds \(F_{a(b-1),\, ab(n-1),\, \alpha}\). Keep in mind, though, that to get this F-statistic we will need to use the aov() approach.
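A sketch of that calculation, using the sums of squares and degrees of freedom from the aov.correct output above (a = 3 treatments, b = 3 plots per treatment, n = 4 subsamples per plot):
# F-statistic for H0: sigma^2_B = 0 (among-plot MS over within-plot MS)
ms_plot   <- 11.2 / 6     # 'Error: Plot' stratum: SS / df
ms_within <- 13.9 / 27    # 'Error: Within' stratum: SS / df
F_B <- ms_plot / ms_within
F_B
pf(F_B, df1 = 6, df2 = 27, lower.tail = FALSE)  # p-value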
Next, we can extract the plot-level random effects. These are the \(\beta_{ij}\)’s:
round(ranef(lme1), 2)
#> (Intercept)
#> 1 0.78
#> 2 -0.48
#> 3 -0.30
#> 4 -0.36
#> 5 0.00
#> 6 0.36
#> 7 -0.36
#> 8 0.36
#> 9 0.00
And finally, multiple comparisons:
library(multcomp)
tuk <- glht(lme1, linfct = mcp(Treatment = "Tukey"))
summary(tuk)
#>
#> Simultaneous Tests for General Linear Hypotheses
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Fit: lme.formula(fixed = larvae ~ Treatment, data = mothdata2, random = ~1 |
#> Plot)
#>
#> Linear Hypotheses:
#> Estimate Std. Error z value Pr(>|z|)
#> Bt - Control == 0 -4.333 0.557 -7.78 <0.001 ***
#> Dimilin - Control == 0 -5.750 0.557 -10.32 <0.001 ***
#> Dimilin - Bt == 0 -1.417 0.557 -2.54 0.03 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
confint(tuk)
#>
#> Simultaneous Confidence Intervals
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Fit: lme.formula(fixed = larvae ~ Treatment, data = mothdata2, random = ~1 |
#> Plot)
#>
#> Quantile = 2.343
#> 95% family-wise confidence level
#>
#>
#> Linear Hypotheses:
#> Estimate lwr upr
#> Bt - Control == 0 -4.333 -5.638 -3.028
#> Dimilin - Control == 0 -5.750 -7.055 -4.445
#> Dimilin - Bt == 0 -1.417 -2.722 -0.112
To determine whether salinity affects fish reproductive performance, a researcher places one pregnant female in a tank at one of three salinity levels (low, medium, or high) or in a control tank. A week after birth, two offspring (fry) are measured.
The data can be accessed using:
Perform the appropriate analysis using aov() and lme() on the dataset. Answer the following questions:
- What are the null and alternative hypotheses?
- Does salinity affect fry growth?
- If so, which salinity levels differ?
- Is there more random variation among or within experimental units?
You may format the report however you like but it should be well-organized, with relevant headers, plain text, and the elements described above.
As always:
- Be sure the output type is set to: output: html_document
- Be sure to include your first and last name in the author section
- Be sure to set echo = TRUE in all R chunks so we can see both your code and the output
- Please upload both the html and .Rmd files when you submit your assignment
- See the R Markdown reference sheet for help with creating R chunks, equations, tables, etc.
- See the “Creating publication-quality graphics” reference sheet for tips on formatting figures