
LECTURE 7: linear models part 2: categorical predictors with > 2 levels

FANR 6750 (Experimental design)




Fall 2024

1 / 37

outline

1) Motivating example


2) Extending the linear model


3) Dummy variables with > 2 levels


4) F-tests


5) Comparing linear model and ANOVA approaches

2 / 37

categorical predictors

In lecture 4, we learned about fitting models with categorical predictors having 1 level (the measurement error example) or 2 levels (the frog example)

  • Linear models with these categorical predictors correspond to one-sample or two-sample t-tests

  • Binary categorical predictors are quite common in ecological studies (e.g., male vs female, juvenile vs adult, disturbed vs undisturbed)

However, you will often encounter categorical predictors with > 2 levels

  • What are some examples?
3 / 37

motivating example

Foresters are studying the effect of 4 different fertilizers (treatments) on the growth of loblolly pine. Trees are grown on 3 plots (replicates) per treatment, and the data are average height per plot after 5 years:


            Treatment
Replicate    A    B    C    D
        1   11    9    7    5
        2   10    8    6    4
        3    9    7    5    3

Notation

  • The number of groups (treatments) is $a = 4$

  • The number of observations within each group (replicates) is $n = 3$

  • $y_{ij}$ denotes the $j$th observation from the $i$th group

4 / 37
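The loblollydata object comes from the course's R package; for readers following along without it, here is a minimal sketch of a data frame with the same structure (the object name and column layout are taken from the code shown on the later slides):

# Hand-built stand-in for the course's loblollydata object (assumed structure)
loblollydata <- data.frame(
  Treatment = rep(c("A", "B", "C", "D"), each = 3),
  Replicate = rep(1:3, times = 4),
  Height    = c(11, 10, 9,  9, 8, 7,  7, 6, 5,  5, 4, 3)
)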

a brief tangent

What counts as an observation?

Experimental unit

the physical unit that receives a particular treatment

Observational unit

the physical unit on which measurements are taken

These are not always the same!

5 / 37

a brief tangent

What counts as an observation?

Experimental unit

the physical unit that receives a particular treatment

Observational unit

the physical unit on which measurements are taken

These are not always the same!

Examples

  • Agricultural fields given different fertilizers, crop yield measured

  • Rats given different diets, disease state measured

  • Microcosms given different predator abundances, tadpole growth measured

5 / 37

motivating example

Question: Is there a difference in growth among the four treatment groups?

6 / 37

motivating example

Hypotheses

  • $H_0: \mu_A = \mu_B = \mu_C = \mu_D$

  • $H_a$: At least one inequality

How should we test the null?

We could do this using 6 pairwise t-tests


But this would inflate the overall (experiment-wise) α level: each individual test has a chance (usually α = 0.05) of incorrectly rejecting a true null hypothesis, and these chances compound across multiple tests (see the sketch below)

7 / 37
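To see the inflation numerically: if the 6 tests were independent and each run at α = 0.05, the chance of at least one false rejection would be about 0.26 (a back-of-the-envelope sketch; the pairwise tests share data and are not truly independent, so this is only approximate):

# Approximate experiment-wise error rate for 6 independent tests at alpha = 0.05
1 - (1 - 0.05)^6
## [1] 0.2649081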

extending the linear model

Fortunately, extending the previous linear model to include additional predictor levels is "straightforward"

  • The model looks familiar

$y_{ij} = \beta_0 + \beta_i x_{ij} + \epsilon_{ij}$

$\epsilon_{ij} \sim \text{Normal}(0, \sigma)$

  • What's tricky is how we code $x_{ij}$ when it is more than just 0 or 1

  • Fortunately, what we learned about how R codes dummy variables will help

8 / 37

dummy variable coding

data(loblollydata)
loblollydata[c(1,4,7,10),]
##    Treatment Replicate Height
## 1          A         1     11
## 4          B         1      9
## 7          C         1      7
## 10         D         1      5
fit1 <- lm(Height ~ Treatment, data = loblollydata)
model.matrix(fit1)[c(1,4,7,10),]
##    (Intercept) TreatmentB TreatmentC TreatmentD
## 1            1          0          0          0
## 4            1          1          0          0
## 7            1          0          1          0
## 10           1          0          0          1
9 / 37

dummy variable coding

model.matrix(fit1)[c(1,4,7,10),]
## (Intercept) TreatmentB TreatmentC TreatmentD
## 1 1 0 0 0
## 4 1 1 0 0
## 7 1 0 1 0
## 10 1 0 0 1

Remember that the model matrix has one column for every parameter in the model

  • How many parameters is this model estimating?

$E[y_{ij}] = \beta_0 + \beta_1 I(B)_{ij} + \beta_2 I(C)_{ij} + \beta_3 I(D)_{ij}$, where $I(B)_{ij}$, $I(C)_{ij}$, and $I(D)_{ij}$ are dummy variables (0/1 depending on treatment)

  • What is the interpretation of each parameter?
10 / 37
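One way to check the interpretation: the intercept is the mean of the reference level (A), and each remaining coefficient is an offset from it. A short sketch recovering the four group means from the fitted model:

# Group means implied by the dummy-variable coding:
# reference level A = intercept; B, C, D = intercept + offset
b <- coef(fit1)
c(A = b[[1]], B = b[[1]] + b[[2]], C = b[[1]] + b[[3]], D = b[[1]] + b[[4]])
##  A  B  C  D
## 10  8  6  4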

motivating example

broom::tidy(fit1)
## # A tibble: 4 × 5
##   term        estimate std.error statistic     p.value
##   <chr>          <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)    10.0      0.577     17.3  0.000000126
## 2 TreatmentB     -2.00     0.816     -2.45 0.0400
## 3 TreatmentC     -4.00     0.816     -4.90 0.00120
## 4 TreatmentD     -6        0.816     -7.35 0.0000801
  • Be sure you understand what each parameter means!

  • t-statistics and p-values are calculated and interpreted in the same way as the frog example

  • But wait, the null hypothesis was $\mu_A = \mu_B = \mu_C = \mu_D$

11 / 37
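As a check on the "interpreted in the same way" claim, the TreatmentB row can be reproduced by hand: t is the estimate divided by its standard error, with a two-sided p-value on the residual degrees of freedom (8 here):

# t statistic and two-sided p-value for TreatmentB, computed by hand
t_B <- -2.00 / 0.816
2 * pt(t_B, df = 8)   # ~0.040, matching Pr(>|t|) in the table above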

regression F-statistic

summary(fit1)
##
## Call:
## lm(formula = Height ~ Treatment, data = loblollydata)
##
## Residuals:
##    Min     1Q Median     3Q    Max 
##     -1     -1      0      1      1 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   10.000      0.577   17.32  1.3e-07 ***
## TreatmentB    -2.000      0.816   -2.45   0.0400 *  
## TreatmentC    -4.000      0.816   -4.90   0.0012 ** 
## TreatmentD    -6.000      0.816   -7.35  8.0e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1 on 8 degrees of freedom
## Multiple R-squared: 0.882, Adjusted R-squared: 0.838
## F-statistic: 20 on 3 and 8 DF, p-value: 0.000449
12 / 37

regression F-statistic

The F-statistic tests whether the regression coefficients (other than the intercept) are all 0

  • H0: All regression coefficients are 0

  • HA: At least one regression coefficient is not 0

F-statistics are ratios of sample variances

$F = \frac{s_A^2}{s_B^2}$

  • The null hypothesis is that the population variances are equal ($\sigma_A^2 = \sigma_B^2$)

  • Always positive (variances cannot be negative)

  • Usually > 1 (by convention, the larger variance goes in the numerator)

Where do the variances come from in the linear model and what do they tell us about differences among groups?

13 / 37
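As a generic illustration (simulated data, not from the slides; the linear-model version of the numerator and denominator comes next), an F-test on two sample variances can be done by hand:

# Variance-ratio F-test on two simulated samples
set.seed(42)
x <- rnorm(10); y <- rnorm(10)
F <- max(var(x), var(y)) / min(var(x), var(y))  # larger variance in the numerator
pf(F, df1 = 9, df2 = 9, lower.tail = FALSE)     # one-tailed p-value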

regression F-statistic

To understand why the test is based on variance, it is helpful to consider several types of means:

  • Grand mean

$\bar{y}_{\cdot} = \frac{\sum_i \sum_j y_{ij}}{a \times n}$

  • Group means

$\bar{y}_i = \frac{\sum_j y_{ij}}{n}$

We can now decompose the observations as

$y_{ij} = \bar{y}_{\cdot} + (\bar{y}_i - \bar{y}_{\cdot}) + (y_{ij} - \bar{y}_i)$

16 / 37
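For the loblolly example ($a = 4$, $n = 3$) these means are easy to compute directly; a sketch assuming the loblollydata object from earlier:

# Grand mean and group means for the loblolly example
ybar_grand <- mean(loblollydata$Height)
ybar_group <- tapply(loblollydata$Height, loblollydata$Treatment, mean)
ybar_grand
## [1] 7
ybar_group
##  A  B  C  D
## 10  8  6  4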

the additive model

The decomposition

$y_{ij} = \bar{y}_{\cdot} + (\bar{y}_i - \bar{y}_{\cdot}) + (y_{ij} - \bar{y}_i)$

The additive model

$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

where

$\epsilon_{ij} \sim \text{Normal}(0, \sigma^2)$

17 / 37

the additive model

$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

$\epsilon_{ij} \sim \text{Normal}(0, \sigma^2)$

Notes

  • $\mu$ is the grand mean of the population, estimated by $\bar{y}_{\cdot}$

  • $\alpha_i$ is the effect of treatment $i$, estimated by $\bar{y}_i - \bar{y}_{\cdot}$

    • It is the deviation of the group mean from the grand mean

    • If all $\alpha_i = 0$, there is no treatment effect

    • Thus, we can re-write the null hypothesis as $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0$

  • $\epsilon_{ij}$ is the residual error, estimated by $y_{ij} - \bar{y}_i$

    • It is the unexplained (random) deviation of the observation from the group mean
18 / 37
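Continuing the sketch above, the estimated treatment effects are just the group means minus the grand mean, and they sum to zero:

# Estimated treatment effects: alpha_i = group mean - grand mean
alpha_hat <- ybar_group - ybar_grand
alpha_hat
##  A  B  C  D
##  3  1 -1 -3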

sums of squares

Variation among groups

$SS_A = n \sum_i (\bar{y}_i - \bar{y}_{\cdot})^2$

Variation within groups

$SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$

Total variation

$SS_T = SS_A + SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_{\cdot})^2$

23 / 37
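Using the means computed earlier, the three sums of squares for the loblolly example work out as follows (note that SSA + SSW = SST, as the identity requires):

# Sums of squares for the loblolly example
n   <- 3
SSA <- n * sum((ybar_group - ybar_grand)^2)                               # among groups
SSW <- sum((loblollydata$Height - ybar_group[loblollydata$Treatment])^2)  # within groups
SST <- sum((loblollydata$Height - ybar_grand)^2)                          # total
c(SSA = SSA, SSW = SSW, SST = SST)
## SSA SSW SST
##  60   8  68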

mean squares

To convert the sums of squares to variances, divide by the degrees of freedom

Mean squares among

$MS_A = \frac{SS_A}{a - 1}$

Mean squares within

$MS_W = \frac{SS_W}{a(n - 1)}$

25 / 37
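Continuing the loblolly sketch ($a = 4$, $n = 3$):

# Mean squares: sums of squares divided by their degrees of freedom
a <- 4; n <- 3
MSA <- SSA / (a - 1)        # 60 / 3 = 20
MSW <- SSW / (a * (n - 1))  # 8 / 8  = 1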

F-statistic



$F = \frac{MS_A}{MS_W}$

Note:

  • F is the ratio of the variance among groups to the variance within groups

  • If there is a large treatment effect, what happens to MSA?

  • If there is little residual variation, what happens to MSW?

  • Large values of F indicate treatment effects are large relative to residual variation, but can we conclude there is a treatment effect?

26 / 37
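Putting the pieces together reproduces the F-statistic and p-value reported by summary(fit1) earlier:

# F-statistic and p-value for the loblolly example
(F <- MSA / MSW)
## [1] 20
pf(F, df1 = a - 1, df2 = a * (n - 1), lower.tail = FALSE)
## [1] 0.000449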

the F-distribution

If the null hypothesis is true ($\sigma_A^2 = \sigma_B^2$), the ratio of sample variances follows an F-distribution

Properties:

  • $F > 0$

  • The F-distribution is not symmetrical

  • The shape of the distribution depends on an ordered pair of degrees of freedom, $df_A$ and $df_B$

27 / 37

the F-distribution


Just like the t-distribution, we can use the F-distribution to calculate p-values and test the null hypothesis that the variances are equal

28 / 37

analysis of variance

Using the F-statistic to test whether there is a treatment effect is commonly referred to as Analysis of Variance (ANOVA)

  • Especially in experimental settings, ANOVA is very commonly used

  • The linear model approach is increasingly common, especially in observational settings

  • From a practical standpoint, the linear model and ANOVA approaches provide exactly the same information, just presented in different ways (just like the linear model vs t-test we saw previously)

29 / 37
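In R, the equivalence is easy to see by passing the fitted lm object to anova(); for the loblolly model, the table matches the quantities derived above (output sketched from those values):

# ANOVA table for the linear model fit earlier
anova(fit1)
## Analysis of Variance Table
##
## Response: Height
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## Treatment  3     60      20      20 0.000449 ***
## Residuals  8      8       1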

worked example

Suppose we are interested in the effect of elevation on the abundance of Canada Warblers

            Elevation
Replicate  Low  Medium  High
        1    1       2     4
        2    3       0     7
        3    0       4     5
        4    2       3     5
30 / 37

Image courtesy of William H. Majoros via Wikicommons

worked example

data(warblerdata)
fit.lm <- lm(Count ~ Elevation, data = warblerdata)
summary(fit.lm)
##
## Call:
## lm(formula = Count ~ Elevation, data = warblerdata)
##
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.250 -0.688 -0.250  0.938  1.750 
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.250      0.717    7.32  4.5e-05 ***
## ElevationLow      -3.750      1.014   -3.70   0.0049 ** 
## ElevationMedium   -3.000      1.014   -2.96   0.0160 *  
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.43 on 9 degrees of freedom
## Multiple R-squared: 0.63, Adjusted R-squared: 0.548
## F-statistic: 7.66 on 2 and 9 DF, p-value: 0.0114
31 / 37

worked example

data(warblerdata)
fit.aov <- aov(Count ~ Elevation, data = warblerdata)
summary(fit.aov)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Elevation    2   31.5   15.75    7.66  0.011 *
## Residuals    9   18.5    2.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
32 / 37

anova table from lm

In case you want more detail about the relationship between lm and aov output:

  • The lm() function also returns the residuals (i.e., $y_i - E[y_i]$)
fit.lm$residuals
##     1     2     3     4     5     6     7     8     9    10    11    12 
## -0.50  1.50 -1.50  0.50 -0.25 -2.25  1.75  0.75 -1.25  1.75 -0.25 -0.25
  • Residual sum of squares
sum(fit.lm$residuals^2)
## [1] 18.5
  • Residual mean square
sum(fit.lm$residuals^2)/9
## [1] 2.056
33 / 37

anova table from lm

In case you want more detail about the relationship between lm and aov output:

What about among group variation?

fit.lm$fitted.values
##    1    2    3    4    5    6    7    8    9   10   11   12 
## 1.50 1.50 1.50 1.50 2.25 2.25 2.25 2.25 5.25 5.25 5.25 5.25
  • Treatment sum of squares
sum((fit.lm$fitted.values - mean(fit.lm$fitted.values))^2)
## [1] 31.5
  • Treatment mean square
sum((fit.lm$fitted.values - mean(fit.lm$fitted.values))^2)/2
## [1] 15.75
34 / 37

anova table from lm

In case you want more detail about the relationship between lm and aov output:

F-statistic and p-value

MSa <- sum((fit.lm$fitted.values - mean(fit.lm$fitted.values))^2)/2
MSe <- sum(fit.lm$residuals^2)/9
(F <- MSa/MSe)
## [1] 7.662
(p <- pf(F, 2, 9, lower.tail = FALSE))
## [1] 0.0114
35 / 37

recap

1) Models with categorical variables can be fit using either lm() or aov()

2) The test traditionally known as ANOVA is just an extension of the linear model we used to analyze a binary predictor variable

3) R codes categorical predictors using dummy variables, and the lm() coefficients correspond to the differences between the reference level and the other treatment levels

4) lm() and aov() provide exactly the same information, just presented in different ways

5) ANOVA output is commonly used for manipulative experiments, linear model output for observational studies

6) But how do we tell which groups differ from each other (aside from just the reference level)...

36 / 37

looking ahead


Next time: Multiple comparisons


Reading: Fieberg chp. 3.9 and 3.12

37 / 37
