Lab 8

  • Model selection

    • Exploration

    • Prediction

Today’s topics

  • Nested ANOVA

    • Specifying the model

    • Hypotheses

    • Random effects

    • lme() function


Scenario

Remember from lecture that nested designs are useful in situations where we subsample within each experimental unit and we’re interested in treatment effects at the experimental (whole) unit level, not the subunit level. For example:

  • We count larvae at multiple subplots within a plot

  • We weigh multiple chicks in a brood

  • We measure shrub growth at multiple subplots within a plot


The additive model

\[\Large y_{ijk} = \mu + \alpha_i + \beta_{ij} + \epsilon_{ijk}\]

Can you define all of the terms in the linear model? Why do we need so many subscripts?


\[\Large \beta_{ij} \sim normal(0, \sigma^2_B)\]

And as always,

\[\Large \epsilon_{ijk} \sim normal(0, \sigma^2)\]

Hypotheses

Treatment effects:

\[\Large H_0 : \alpha_1 = · · · = \alpha_a = 0\] \[\Large H_a : at\; least\; one\; inequality\] Random variation among experimental units:

\[\Large H_0 : \sigma^2_B = 0\]

\[\Large H_a : \sigma^2_B > 0\]

Note: Here we have referred to the variance term as \(\sigma^2_B\) but we could use any reasonable subscript to refer to the variance term for the random effect.

Note: Recall that a nested ANOVA is primarily used when the variation among experimental units is of interest. If this variation was not of interest or if we had reason to believe that all the experimental units did not differ from eachother, we can simplify to a one-way ANOVA.

Example data

For this lab, we’ll use another data set on the response of gypsy moth larvae to pesticide treatments:

library(FANR6750)
data("mothdata2")
str(mothdata2)
#> 'data.frame':    36 obs. of  3 variables:
#>  $ larvae   : num  16 16 15.8 14.2 13.9 14.2 13.5 13.4 14 13.1 ...
#>  $ Treatment: chr  "Bt" "Bt" "Bt" "Bt" ...
#>  $ Plot     : int  1 1 1 1 2 2 2 2 3 3 ...

As before, we’ll need to convert Plot and Treatment to be factors:

mothdata2$Treatment <- factor(mothdata2$Treatment, 
                              levels = c("Control", "Bt", "Dimilin"))

mothdata2$Plot <- factor(mothdata2$Plot)

Note that we explicitly define the levels for Treatment so that control will be treated as the baseline level and plotted first. We don’t need to do that with Plot since the default numeric order is fine.

To see the level of replication within each treatment and plot, let’s cross-tabulate the data. It is a good idea to check this since unbalanced designs generally require some additionl thought.

table(mothdata2$Treatment, mothdata2$Plot)
#>          
#>           1 2 3 4 5 6 7 8 9
#>   Control 0 0 0 4 4 4 0 0 0
#>   Bt      4 4 4 0 0 0 0 0 0
#>   Dimilin 0 0 0 0 0 0 4 4 4

Notice how plots are labeled! Not according to \(j \in {1, 2, 3}\), even though there are three plots within each treatment (and four replicates within each plot). Rather, each plot is labeled independently, 1–9.

Why is this important?


Analysis using aov()

Based on the models we’ve fit previously this semester, we might instinctively analyze these data using the following:

aov.wrong <- aov(larvae ~ Treatment + Plot, data = mothdata2)
summary(aov.wrong)
#>             Df Sum Sq Mean Sq F value Pr(>F)    
#> Treatment    2  215.4   107.7  208.89 <2e-16 ***
#> Plot         6   11.2     1.9    3.61 0.0093 ** 
#> Residuals   27   13.9     0.5                   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

But if we do so, the denominator degrees of freedom are incorrect. Instead, we should use:

aov.correct <- aov(larvae ~ Treatment + Error(Plot), data = mothdata2)
summary(aov.correct)
#> 
#> Error: Plot
#>           Df Sum Sq Mean Sq F value  Pr(>F)    
#> Treatment  2  215.4   107.7    57.9 0.00012 ***
#> Residuals  6   11.2     1.9                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Error: Within
#>           Df Sum Sq Mean Sq F value Pr(>F)
#> Residuals 27   13.9   0.516

Can you spot the difference? What calculations changed between these two analyses?



Note: The next few code chunks are not necessary. They are simply demonstrating that what we are doing here is exactly the same as performing a one way ANOVA after averaging across the subunits.

What happens if we analyze plot-level means? The aggregate() function is similar to tapply() but it works on entire data frames. Here we get averages for each whole plot:

plotData <- aggregate(larvae ~ Treatment + Plot, 
                      data = mothdata2, FUN = mean)
plotData
#>   Treatment Plot larvae
#> 1        Bt    1  15.50
#> 2        Bt    2  13.75
#> 3        Bt    3  14.00
#> 4   Control    4  18.25
#> 5   Control    5  18.75
#> 6   Control    6  19.25
#> 7   Dimilin    7  12.50
#> 8   Dimilin    8  13.50
#> 9   Dimilin    9  13.00

When we use aov() to analyze the aggregated data, we get the same F and p as before:

aov.plot <- aov(larvae ~ Treatment, data = plotData)
summary(aov.plot)
#>             Df Sum Sq Mean Sq F value  Pr(>F)    
#> Treatment    2   53.8   26.92    57.9 0.00012 ***
#> Residuals    6    2.8    0.47                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Issues with analyzing nested data

When using using aov() with Error term:

  • You can’t use TukeyHSD()

  • You don’t get a direct estimate of \(\sigma^2_B\)

  • Doesn’t handle unbalanced designs well

  • But, you can use model.tables()


An alternative is to use the lme() function in the nlme package (which is designed for mixed effects models)

  • Possible to get direct estimates of \(\sigma^2_B\) and other variance parameters

  • Handles very complex models and unbalanced designs

  • Possible to do multiple comparisons and contrasts using the glht() function in the multcomp package

But…

  • Only works if there are random effects

  • ANOVA tables aren’t as complete as aov()

Using the lme() function

To see how the lme() function works let’s fit the moth data model using this approach:

library(nlme)
library(multcomp)

lme1 <- lme(larvae ~ Treatment, random = ~1|Plot, data = mothdata2)

The syntax for random effects can be a little confusing at first but see ?lme for additional information.

Now we can view the ANOVA table from the model using:

anova(lme1, Terms = "Treatment")
#> F-test for: Treatment 
#>   numDF denDF F-value p-value
#> 1     2     6   57.87   1e-04

You’ll notice that it is a very simplified version of the ANOVA table. It only provides us with one line which relates to the treatment.

And we can view the variance parameter estimates:

VarCorr(lme1)
#> Plot = pdLogChol(1) 
#>             Variance StdDev
#> (Intercept) 0.3364   0.580 
#> Residual    0.5156   0.718

The first row shows the estimates of \(\sigma^2_B\) and \(\sigma_B\). These are redundant values since one can be calculated from the other. The second row shows the estimates of \(\sigma^2\) and \(\sigma\).



What can we say about the amount of random variation within whole units relative to among whole units (after accounting for treatment effects)? Does it appear that in this scenario the random effect of plot was necessary?

Note that we will reject the null hypothesis related to the variance (\(\sigma^2_B = 0\)) if the F-statistic (\(\frac{MS_B}{MS_E}\)) > \(F_{a(b-1), ab(n-1),\alpha}\). Keep in mind though that to get this F-statistic, we will need to use the aov() function approach.


Next, we can extract the plot-level random effects. These are the \(\beta_{ij}\)’s:

round(ranef(lme1), 2)
#>   (Intercept)
#> 1        0.78
#> 2       -0.48
#> 3       -0.30
#> 4       -0.36
#> 5        0.00
#> 6        0.36
#> 7       -0.36
#> 8        0.36
#> 9        0.00

And finally, multiple comparisons:

tuk <- glht(lme1, linfct = mcp(Treatment="Tukey"))
summary(tuk)
#> 
#>   Simultaneous Tests for General Linear Hypotheses
#> 
#> Multiple Comparisons of Means: Tukey Contrasts
#> 
#> 
#> Fit: lme.formula(fixed = larvae ~ Treatment, data = mothdata2, random = ~1 | 
#>     Plot)
#> 
#> Linear Hypotheses:
#>                        Estimate Std. Error z value Pr(>|z|)    
#> Bt - Control == 0        -4.333      0.557   -7.78   <0.001 ***
#> Dimilin - Control == 0   -5.750      0.557  -10.32   <0.001 ***
#> Dimilin - Bt == 0        -1.417      0.557   -2.54     0.03 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
confint(tuk)
#> 
#>   Simultaneous Confidence Intervals
#> 
#> Multiple Comparisons of Means: Tukey Contrasts
#> 
#> 
#> Fit: lme.formula(fixed = larvae ~ Treatment, data = mothdata2, random = ~1 | 
#>     Plot)
#> 
#> Quantile = 2.343
#> 95% family-wise confidence level
#>  
#> 
#> Linear Hypotheses:
#>                        Estimate lwr    upr   
#> Bt - Control == 0      -4.333   -5.638 -3.028
#> Dimilin - Control == 0 -5.750   -7.055 -4.445
#> Dimilin - Bt == 0      -1.417   -2.722 -0.112

Assignment

To determine if salinity affects fish reproductive performance, a researcher places one pregnant female in a tank with one of three salinity levels: low, medium, and high, or a control tank. A week after birth, two offspring (fry) are measured.

The data can be accessed using:

library(FANR6750)
data("salinitydata")

Perform the appropriate analysis using aov() and lme() on the dataset. Answer the following questions:

  1. What are the null and alternative hypotheses?

  2. Does salinity affect fry growth?

  3. If so, which salinity levels differ?

  4. Is there more random variation among or within experimental units?

  1. Do you think that it was important to include ‘adult’ as a random effect or could this analysis have been simplified to a one-way ANOVA?
  2. Calculate a F-statistic and critical value to determine whether or not to reject the null hypothesis related to the variance term.
  1. Include a well formatted ANOVA table (with header) from the aov() results.

You may format the report however you like but it should be well-organized, with relevant headers, plain text, and the elements described above.

As always:

  • Be sure the output type is set to: output: html_document

  • Be sure to include your first and last name in the author section

  • Be sure to set echo = TRUE in all R chunks so we can see both your code and the output

  • Please upload both the html and .Rmd files when you submit your assignment

  • See the R Markdown reference sheet for help with creating R chunks, equations, tables, etc.

  • See the “Creating publication-quality graphics” reference sheet for tips on formatting figures