Scenario

Remember from lecture that nested designs are useful in situations where we subsample within each experimental unit and we’re interested in treatment effects at the experimental (whole) unit level, not the subunit level. For example:

  • We count larvae at multiple subplots within a plot

  • We weigh multiple chicks in a brood

The additive model

\[\Large y_{ijk} = \mu + \alpha_i + \beta_{ij} + \epsilon_{ijk}\]

Because we want our inferences to apply to all experimental units, not just the ones in our sample, \(\beta_{ij}\) is random:

\[\Large \beta_{ij} \sim normal(0, \sigma^2_B)\]

And as always,

\[\Large \epsilon_{ijk} \sim normal(0, \sigma^2)\]

Hypotheses

Treatment effects:

\[\Large H_0 : \alpha_1 = · · · = \alpha_a = 0\] \[\Large H_a : at\; least\; one\; inequality\] Random variation among experimental units:

\[\Large H_0 : \sigma^2_B = 0\]

\[\Large H_a : \sigma^2_B > 0\]

Recall that a nested ANOVA is primarily used when the variation among experimental units is of interest. If it is not, we can simplify to a one-way ANOVA.

Example data

For this lab, we’ll use another data set on the response of gypsy moth larvae to pesticide treatments:

library(FANR6750)
data("mothData2")
str(mothData2)
#> 'data.frame':    36 obs. of  3 variables:
#>  $ larvae   : num  16 16 15.8 14.2 13.9 14.2 13.5 13.4 14 13.1 ...
#>  $ Treatment: chr  "Bt" "Bt" "Bt" "Bt" ...
#>  $ Plot     : int  1 1 1 1 2 2 2 2 3 3 ...

As before, we’ll need to convert Plot and Treatment to be factors:

mothData2$Treatment <- factor(mothData2$Treatment, 
                              levels = c("Control", "Bt", "Dimilin"))
mothData2$Plot <- factor(mothData2$Plot)

Note that we explicitly define the levels for Treatment so that control will be treated as the baseline level and plotted first. We don’t need to do that with Plot since the default numeric order is fine.

To see the level of replication within each treatment and plot, let’s cross-tabulate the data:

table(mothData2$Treatment, mothData2$Plot)
#>          
#>           1 2 3 4 5 6 7 8 9
#>   Control 0 0 0 4 4 4 0 0 0
#>   Bt      4 4 4 0 0 0 0 0 0
#>   Dimilin 0 0 0 0 0 0 4 4 4

Notice how plots are labeled! Not according to \(j \in {1, 2, 3}\), even though there are three plots within each treatment (and four replicates within each plot). Rather, each plot is labeled independently, 1–9.

Why is this important?


Analysis using aov()

Based on the models we’ve fit previously this semester, we might instinctively analyze these data using the following:

aov.wrong <- aov(larvae ~ Treatment + Plot, data = mothData2)
summary(aov.wrong)
#>             Df Sum Sq Mean Sq F value Pr(>F)    
#> Treatment    2  215.4   107.7  208.89 <2e-16 ***
#> Plot         6   11.2     1.9    3.61 0.0093 ** 
#> Residuals   27   13.9     0.5                   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

But if we do so, the denominator degrees of freedom are incorrect. Instead, we should use:

aov.correct <- aov(larvae ~ Treatment + Error(Plot), data = mothData2)
summary(aov.correct)
#> 
#> Error: Plot
#>           Df Sum Sq Mean Sq F value  Pr(>F)    
#> Treatment  2  215.4   107.7    57.9 0.00012 ***
#> Residuals  6   11.2     1.9                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Error: Within
#>           Df Sum Sq Mean Sq F value Pr(>F)
#> Residuals 27   13.9   0.516

Can you spot the difference? What calculations changed between these two analyses?

What happens if we analyze plot-level means? The aggregate() function is similar to tapply() but it works on entire data frames. Here we get averages for each whole plot:

plotData <- aggregate(larvae ~ Treatment + Plot, 
                      data = mothData2, FUN = mean)
plotData
#>   Treatment Plot larvae
#> 1        Bt    1  15.50
#> 2        Bt    2  13.75
#> 3        Bt    3  14.00
#> 4   Control    4  18.25
#> 5   Control    5  18.75
#> 6   Control    6  19.25
#> 7   Dimilin    7  12.50
#> 8   Dimilin    8  13.50
#> 9   Dimilin    9  13.00

When we use aov() to analyze the aggregated data, we get the same F and p as before:

aov.plot <- aov(larvae ~ Treatment, data = plotData)
summary(aov.plot)
#>             Df Sum Sq Mean Sq F value  Pr(>F)    
#> Treatment    2   53.8   26.92    57.9 0.00012 ***
#> Residuals    6    2.8    0.47                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Issues with analyzing nested data

When using using aov() with Error term:

An alternative is to use lme() function in nlme package

  • Possible to get direct estimates of \(\sigma^2_B\) and other variance parameters

  • Handles very complex models and unbalanced designs

  • Possible to do multiple comparisons and contrasts using the glht() function in the multcomp package

But…

  • Only works if there random effects

  • ANOVA tables aren’t as complete as aov()

Using the lme() function

To see how the lme() function works let’s fit the moth data model using this approach:

library(nlme)
library(multcomp)

lme1 <- lme(larvae ~ Treatment, random = ~1|Plot, data = mothData2)

The syntax for random effects can be a little confusing at first but see ?lme for additional information.

Now we can view the ANOVA table from the model using:

anova(lme1, Terms = "Treatment")
#> F-test for: Treatment 
#>   numDF denDF F-value p-value
#> 1     2     6   57.87   1e-04

And we can view the variance parameter estimates:

VarCorr(lme1)
#> Plot = pdLogChol(1) 
#>             Variance StdDev
#> (Intercept) 0.3364   0.580 
#> Residual    0.5156   0.718

The first row shows the estimates of \(\sigma^2_B\) and \(\sigma_B\). The second row shows the estimates of \(\sigma^2\) and \(\sigma\).


What can we say about the amount of random variation within whole units relative to among whole units (after accounting for treatment effects)?


Next, we can extract the plot-level random effects. These are the \(\beta_{ij}\)’s:

round(ranef(lme1), 2)
#>   (Intercept)
#> 1        0.78
#> 2       -0.48
#> 3       -0.30
#> 4       -0.36
#> 5        0.00
#> 6        0.36
#> 7       -0.36
#> 8        0.36
#> 9        0.00

And finally, multiple comparisons:

tuk <- glht(lme1, linfct = mcp(Treatment="Tukey"))
summary(tuk)
#> 
#>   Simultaneous Tests for General Linear Hypotheses
#> 
#> Multiple Comparisons of Means: Tukey Contrasts
#> 
#> 
#> Fit: lme.formula(fixed = larvae ~ Treatment, data = mothData2, random = ~1 | 
#>     Plot)
#> 
#> Linear Hypotheses:
#>                        Estimate Std. Error z value Pr(>|z|)    
#> Bt - Control == 0        -4.333      0.557   -7.78   <0.001 ***
#> Dimilin - Control == 0   -5.750      0.557  -10.32   <0.001 ***
#> Dimilin - Bt == 0        -1.417      0.557   -2.54     0.03 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
confint(tuk)
#> 
#>   Simultaneous Confidence Intervals
#> 
#> Multiple Comparisons of Means: Tukey Contrasts
#> 
#> 
#> Fit: lme.formula(fixed = larvae ~ Treatment, data = mothData2, random = ~1 | 
#>     Plot)
#> 
#> Quantile = 2.345
#> 95% family-wise confidence level
#>  
#> 
#> Linear Hypotheses:
#>                        Estimate lwr    upr   
#> Bt - Control == 0      -4.333   -5.639 -3.027
#> Dimilin - Control == 0 -5.750   -7.056 -4.444
#> Dimilin - Bt == 0      -1.417   -2.723 -0.111

Assignment

To determine if salinity affects adult fish reproductive performance, a researcher places one pregnant female in a tank with one of three salinity levels: low, medium, and high, or a control tank. A week after birth, two offspring (fry) are measured.

The data can be accessed using:

library(FANR6750)
data("salinityData")

Run a nested ANOVA using aov() and lme() on the fish dataset. Answer the following questions:

  1. What are the null and alternative hypotheses?

  2. Does salinity affect fry growth?

  3. If so, which salinity levels differ?

  4. Is there more random variation among or within experimental units?

Put your answers in an R Markdown report. Your report should include:

  1. A well-formatted ANOVA table (with caption); and

  2. A publication-quality plot of the estimated effect of salinity on fry growth, including 95% confidence intervals. The figure should also have a descriptive caption and any aesthetics (color, line type, etc.) should be clearly defined.

You may format the report however you like but it should be well-organized, with relevant headers, plain text, and the elements described above.

As always:

  • Be sure the output type is set to: output: html_document

  • Title the document: title: "Homework 3: Nested design"

  • Be sure to include your first and last name in the author section

  • Be sure to set echo = TRUE in all R chunks so we can see both your code and the output

  • Please upload both the html and .Rmd files when you submit your assignment

  • See the R Markdown reference sheet for help with creating R chunks, equations, tables, etc.

  • See the “Creating publication-quality graphics” reference sheet for tips on formatting figures