LECTURE 11: blocking and blocked designs

class: center, middle, inverse, title-slide

.title[
# LECTURE 11: blocking and blocked designs
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### <br/><br/><br/>Fall 2022
]

---

# outline

1) Motivation

<br/>

2) Design

<br/>

3) Model

<br/>

4) Procedure

<br/>

5) Example

---
# motivation

#### Recall that the completely randomized design is used when no sources of variation other than the treatment effects are known or anticipated

<br/>

#### For example, in the loblolly pine example, we assumed that the plots were essentially homogeneous with respect to soil conditions and other spatial factors

<br/>

### However, one should always look for other sources of variation!

---
# motivation

#### What other sources of variation should we look for?

- Factors that are likely to influence the response variable that are not related to the treatments of interest

- We might *not* be interested in these other factors per se

+ In field studies, two adjacent plots that get different treatments can be more similar than two separated plots that receive the same treatment  
    
--

+ Heavier animals in a group may respond to a diet differently than lighter animals, as might animals from different litters  
    
--

+ Observations taken on the same day are likely to be more similar to one another than observations taken on different days

#### In each of the above examples, we recognize this heterogeneity **PRIOR** to the experimental activity. That is, we recognize it in the design stage.

---
class: inverse, center, middle

# block designs

---
# block designs

We partition our experimental units into groups or **BLOCKS** such that:

- Within the block, experimental conditions are as homogeneous as we can make them

- Between the blocks, variation may be considerable but differences between blocks are not of immediate interest

In the loblolly example, the plots might be located along a slope that would have an associated moisture gradient

Our blocks would be arranged across the moisture gradient; but each block would contain an area that is homogeneous with regard to moisture

Then, we would assign the various treatment levels at random to the experimental units contained within each block

---
# randomized complete block design

**RCBD**: If each treatment level is present in each block, the blocks are called complete blocks, and the overall design is called a randomized complete block design

**Example**: The effect of four different drugs (A, B, C, D) on mice could be confounded by the litter the mouse came from. To remove nuisance variability we use litters as blocks.

<table class="table table-condensed" style="font-size: 12px; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Litter (block)</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Mouse</div></th>
</tr>
  <tr>
   <th style="text-align:center;">   </th>
   <th style="text-align:center;"> 1 </th>
   <th style="text-align:center;"> 2 </th>
   <th style="text-align:center;"> 3 </th>
   <th style="text-align:center;"> 4 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> B </td>
   <td style="text-align:center;"> A </td>
   <td style="text-align:center;"> D </td>
   <td style="text-align:center;"> C </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> A </td>
   <td style="text-align:center;"> C </td>
   <td style="text-align:center;"> B </td>
   <td style="text-align:center;"> D </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> C </td>
   <td style="text-align:center;"> D </td>
   <td style="text-align:center;"> A </td>
   <td style="text-align:center;"> B </td>
  </tr>
</tbody>
</table>
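
A layout like the one above can be produced by randomizing the treatment order independently within each litter. The following is a minimal sketch; the object names (`drugs`, `n_litters`, `layout`) and the seed are illustrative assumptions, not part of the original example, so the resulting arrangement will not match the table exactly.

```r
# Randomize four drugs independently within each of three litters (blocks)
set.seed(6750)                       # arbitrary seed, only for reproducibility
drugs     <- c("A", "B", "C", "D")   # treatment levels
n_litters <- 3                       # number of blocks

# replicate() returns one random permutation per column; transpose so that
# each row is a litter and each column is a mouse within that litter
layout <- t(replicate(n_litters, sample(drugs)))
dimnames(layout) <- list(Litter = as.character(seq_len(n_litters)),
                         Mouse  = as.character(seq_along(drugs)))
layout
```

Because `sample(drugs)` is called separately for each litter, every drug appears exactly once per block, which is what makes the blocks complete.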

---
# randomized complete block design

<br/>

- Blocking comes from agricultural field experiments (R. A. Fisher)

- Blocking extends the paired t-test to more than two groups

- It's easy to think of blocks as plots of land, but blocks need not be spatial

#### Blocking is used when an experimenter is interested in controlling an extraneous source of variability that is not of direct interest but is likely to affect the response variable
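
The link to the paired t-test can be checked numerically: with only two treatments, the treatment F-statistic from a blocked ANOVA should equal the square of the paired t-statistic. The sketch below assumes a small data set taken from the Control and Bt columns of the gypsy moth table used later in this lecture; the object names (`long`, `Block`, `f_trt`) are illustrative.

```r
# Two treatments measured once in each of four blocks
control <- c(25, 10, 15, 32)
bt      <- c(16,  3, 10, 18)

# Paired t-test: the pairs are the blocks
tt <- t.test(control, bt, paired = TRUE)

# Same data in long format, analyzed as an RCBD with two treatments
long <- data.frame(
  y         = c(control, bt),
  Treatment = factor(rep(c("Control", "Bt"), each = 4)),
  Block     = factor(rep(1:4, times = 2))
)
fit   <- aov(y ~ Treatment + Block, data = long)
f_trt <- anova(fit)["Treatment", "F value"]

# The squared t-statistic and the treatment F-statistic should agree
c(t_squared = unname(tt$statistic)^2, F_treatment = f_trt)
```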

---
# data structure for rcbd

<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Block ($j$)</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Treatment ($i$)</div></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;"> 1 </th>
   <th style="text-align:left;"> 2 </th>
   <th style="text-align:left;"> 3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> $y_{11}$ </td>
   <td style="text-align:left;"> $y_{21}$ </td>
   <td style="text-align:left;"> $y_{31}$ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> $y_{12}$ </td>
   <td style="text-align:left;"> $y_{22}$ </td>
   <td style="text-align:left;"> $y_{32}$ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> $y_{13}$ </td>
   <td style="text-align:left;"> $y_{23}$ </td>
   <td style="text-align:left;"> $y_{33}$ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:left;"> $y_{14}$ </td>
   <td style="text-align:left;"> $y_{24}$ </td>
   <td style="text-align:left;"> $y_{34}$ </td>
  </tr>
</tbody>
</table>

Notice that the subscript `$j$` now represents more than just a replicate; it also represents a block

Also notice that no additional replicates are needed in order to develop a blocked design, just some planning such that every treatment is represented in every block

---
class: inverse, center, middle

# model

---
# model

`$$\huge y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$$`
`$$\Large \epsilon_{ij} \sim normal(0, \sigma^2)$$`

`$$\Large i = 1,..., a$$`
`$$\Large j = 1,..., b$$`

- `$\mu =$` grand mean

- `$\alpha_i =$` deviation from the mean for the `$i$`th treatment

- `$\beta_j =$` deviation from the mean for the `$j$`th block

- `$\epsilon_{ij} =$` unexplained variation
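
To make the model concrete, one data set can be simulated from it. This is only a sketch: the parameter values are hypothetical, and the sum-to-zero constraint on the treatment and block effects is a common convention assumed here rather than something stated in the lecture.

```r
# Simulate one data set from y_ij = mu + alpha_i + beta_j + eps_ij
set.seed(11)                        # arbitrary seed
a <- 3                              # number of treatments
b <- 4                              # number of blocks
mu    <- 15                         # grand mean (hypothetical)
alpha <- c(5, -2, -3)               # treatment effects, sum to zero
beta  <- c(4, -9, -1, 6)            # block effects, sum to zero
sigma <- 4                          # residual standard deviation

# One row per treatment-by-block combination (one observation per cell)
sim <- expand.grid(Treatment = factor(1:a), Block = factor(1:b))
sim$y <- mu +
  alpha[as.integer(sim$Treatment)] +
  beta[as.integer(sim$Block)] +
  rnorm(a * b, mean = 0, sd = sigma)
head(sim)
```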

---
# hypotheses

### Main hypothesis

`$$\Large H_0 : \alpha_1 = \alpha_2 = ... = \alpha_a = 0$$`

`$$\Large H_a : \alpha_i \neq 0 \;\text{for at least one}\; i$$`

### Secondary hypothesis

`$$\Large H_0 : \beta_1 = \beta_2 = ... = \beta_b = 0$$`

`$$\Large H_a : \beta_j \neq 0 \;\text{for at least one}\; j$$`

---
# anova table

<br/>

<table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;"> Source </th>
   <th style="text-align:center;"> df </th>
   <th style="text-align:center;"> SS </th>
   <th style="text-align:center;"> MS </th>
   <th style="text-align:center;"> F </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Treatments </td>
   <td style="text-align:center;"> $a-1$ </td>
   <td style="text-align:center;"> $b\sum_i (\bar{y}_{i.} - \bar{y}_{..})^2$ </td>
   <td style="text-align:center;"> $\frac{SS_a}{a-1}$ </td>
   <td style="text-align:center;"> $\frac{MS_a}{MS_e}$ </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Blocks </td>
   <td style="text-align:center;"> $b-1$ </td>
   <td style="text-align:center;"> $a\sum_j  (\bar{y}_{.j} - \bar{y}_{..})^2$ </td>
   <td style="text-align:center;"> $\frac{SS_b}{b-1}$ </td>
   <td style="text-align:center;"> $\frac{MS_b}{MS_e}$ </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Error </td>
   <td style="text-align:center;"> $(a-1)(b-1)$ </td>
   <td style="text-align:center;"> $\sum_i \sum_j (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$ </td>
   <td style="text-align:center;"> $\frac{SS_e}{df_e}$ </td>
   <td style="text-align:center;">  </td>
  </tr>
</tbody>
</table>

---
# what happens if block effect is ignored?

<br/>

- What happens mathematically if we have a blocked design but choose to not use the block factor in the analysis?

- The treatment sum of squares and degrees of freedom are unaffected, but the error sum of squares and df are not

- The block sum of squares and df will be added to the error (or residual) SS and df

- If the block effect is large, the pooled error mean square (the denominator of the F-test for treatments) is inflated, **decreasing power**
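
The pooling described above can be written out in the notation of the ANOVA table. The primes, which mark quantities from the analysis that ignores blocks, are notation introduced here for illustration:

`$$SS_e' = SS_e + SS_b, \qquad df_e' = (a-1)(b-1) + (b-1) = a(b-1)$$`

`$$MS_e' = \frac{SS_e + SS_b}{a(b-1)} = \frac{(a-1)(b-1)\,MS_e + (b-1)\,MS_b}{a(b-1)}$$`

Since `$MS_e'$` is a weighted average of `$MS_e$` and `$MS_b$`, it exceeds `$MS_e$` whenever `$MS_b > MS_e$`, and the treatment statistic `$F' = MS_a / MS_e'$` shrinks accordingly.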

---
# what if the block effect isn't significant?

<br/>

#### There are differences of opinion regarding follow-up procedure if the block is not significant

<br/>

#### Some say you are stuck with that design regardless. Others are more liberal, suggesting that you redo the analysis without the blocking factor

---
class: middle, center, inverse

# example

---
# gypsy moth data

We want to compare the effectiveness of three gypsy moth control strategies (Bt, Dimilin, no spray). Because sprayed areas are large and
different treatments are applied on different ridges, extraneous variability due to location is expected and should be controlled. The data are the average number of moths captured in pheromone
traps placed in the plots.

<table class="table table-condensed" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Region</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Treatment</div></th>
</tr>
  <tr>
   <th style="text-align:right;">   </th>
   <th style="text-align:right;"> Control </th>
   <th style="text-align:right;"> Bt </th>
   <th style="text-align:right;"> Dimilin </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 25 </td>
   <td style="text-align:right;"> 16 </td>
   <td style="text-align:right;"> 14 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:right;"> 16 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 32 </td>
   <td style="text-align:right;"> 18 </td>
   <td style="text-align:right;"> 12 </td>
  </tr>
</tbody>
</table>
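
For analysis, the table above needs to be in long format, with one row per plot. The sketch below is one way to enter it; the object and column names (`mothData`, `caterpillar`, `Treatment`, `Region`) are assumed so that they match the `aov()` calls on the following slides, and how the original data object was built is not shown in the lecture.

```r
# Long-format version of the table: one row per plot,
# entered region by region in the order Control, Bt, Dimilin
mothData <- data.frame(
  caterpillar = c(25, 16, 14,
                  10,  3,  2,
                  15, 10, 16,
                  32, 18, 12),
  Treatment   = factor(rep(c("Control", "Bt", "Dimilin"), times = 4),
                       levels = c("Control", "Bt", "Dimilin")),
  Region      = factor(rep(1:4, each = 3))  # block must be a factor, not numeric
)
str(mothData)

# Each treatment should appear exactly once in each region (complete blocks)
table(mothData$Region, mothData$Treatment)
```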

---
# analyze using aov

```r
# Treatment and Region (the blocking variable) must both be factors
aov1 <- aov(caterpillar ~ Treatment + Region, data = mothData)
summary(aov1)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> df </th>
   <th style="text-align:right;"> sumsq </th>
   <th style="text-align:right;"> meansq </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 223.17 </td>
   <td style="text-align:right;"> 111.58 </td>
   <td style="text-align:right;"> 5.83 </td>
   <td style="text-align:right;"> 0.04 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Region </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 430.92 </td>
   <td style="text-align:right;"> 143.64 </td>
   <td style="text-align:right;"> 7.51 </td>
   <td style="text-align:right;"> 0.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Residuals </td>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 114.83 </td>
   <td style="text-align:right;"> 19.14 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
</tbody>
</table>
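
The sums of squares in this output can be checked by hand with the formulas from the ANOVA table slide. The following is a sketch using the values from the gypsy moth data table; the object names are illustrative.

```r
# Response as a matrix: rows = regions (blocks), columns = treatments
y <- matrix(c(25, 16, 14,
              10,  3,  2,
              15, 10, 16,
              32, 18, 12),
            nrow = 4, byrow = TRUE,
            dimnames = list(Region    = as.character(1:4),
                            Treatment = c("Control", "Bt", "Dimilin")))
a <- ncol(y)                  # number of treatments
b <- nrow(y)                  # number of blocks

grand     <- mean(y)          # grand mean
trt_means <- colMeans(y)      # treatment means
blk_means <- rowMeans(y)      # block means

ss_trt <- b * sum((trt_means - grand)^2)
ss_blk <- a * sum((blk_means - grand)^2)

# Residual for each cell: y_ij - treatment mean - block mean + grand mean
res    <- y - outer(blk_means, trt_means, "+") + grand
ss_err <- sum(res^2)

ms_err <- ss_err / ((a - 1) * (b - 1))
f_trt  <- (ss_trt / (a - 1)) / ms_err
f_blk  <- (ss_blk / (b - 1)) / ms_err

round(c(SS_trt = ss_trt, SS_blk = ss_blk, SS_err = ss_err,
        F_trt = f_trt, F_blk = f_blk), 2)
```

The rounded values should agree with the `sumsq` and `statistic` columns reported above.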

---
# analyze using aov

#### Look what happens if we ignore the blocking variable

```r
aov2 <- aov(caterpillar ~ Treatment, data = mothData)
summary(aov2)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> df </th>
   <th style="text-align:right;"> sumsq </th>
   <th style="text-align:right;"> meansq </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 223.17 </td>
   <td style="text-align:right;"> 111.58 </td>
   <td style="text-align:right;"> 1.84 </td>
   <td style="text-align:right;"> 0.21 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Residuals </td>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:right;"> 545.75 </td>
   <td style="text-align:right;"> 60.64 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
</tbody>
</table>

#### Why is the effect of pesticide no longer significant?
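
One way to see the answer is to redo the arithmetic with the rounded values reported in the two ANOVA tables. This is a sketch, and the variable names are illustrative.

```r
# Values read from the two ANOVA tables (rounded as reported)
ss_err_blocked <- 114.83   # residual SS with Region in the model
df_err_blocked <- 6
ss_block       <- 430.92   # Region (block) SS
df_block       <- 3
ms_trt         <- 111.58   # treatment MS, the same in both analyses

# Ignoring blocks pools the block SS and df into the residual
ss_err_pooled <- ss_err_blocked + ss_block
df_err_pooled <- df_err_blocked + df_block
ms_err_pooled <- ss_err_pooled / df_err_pooled

f_blocked <- ms_trt / (ss_err_blocked / df_err_blocked)
f_pooled  <- ms_trt / ms_err_pooled
p_pooled  <- pf(f_pooled, df1 = 2, df2 = df_err_pooled, lower.tail = FALSE)

round(c(MS_err_pooled = ms_err_pooled, F_blocked = f_blocked,
        F_pooled = f_pooled, p_pooled = p_pooled), 2)
```

The treatment mean square is unchanged; only the denominator grows, because the variation among regions is now counted as unexplained error.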

---
# conclusions

<br/>

- There is a treatment effect. The effect of location also was significant, so blocking is an effective design feature in this experiment

- As with the completely randomized design, we would want to follow up a significant F-statistic for the treatment variable with a multiple comparison test, or else use contrasts

- For the blocking variable, we are usually content to just know that the block effect was significant, and leave it at that

- We might, however, want to see which regions had more gypsy moths, so there is nothing wrong with a follow-up procedure for the blocks as well
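
As one possible follow-up, Tukey's HSD can be applied to the treatment factor of the blocked model. The lecture does not say which multiple comparison procedure is intended, so treat this as a sketch of one option; the data frame is re-entered here, with assumed column names, only to keep the example self-contained.

```r
# Gypsy moth data in long format (column names assumed to match the aov calls)
mothData <- data.frame(
  caterpillar = c(25, 16, 14, 10, 3, 2, 15, 10, 16, 32, 18, 12),
  Treatment   = factor(rep(c("Control", "Bt", "Dimilin"), times = 4),
                       levels = c("Control", "Bt", "Dimilin")),
  Region      = factor(rep(1:4, each = 3))
)

# Blocked model, then pairwise treatment comparisons only
aov1  <- aov(caterpillar ~ Treatment + Region, data = mothData)
tukey <- TukeyHSD(aov1, which = "Treatment")
tukey
```

Passing `which = "Region"` instead would give the analogous comparisons among blocks.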

---
# summary

<br/>

- Blocking provides additional power to test treatment effects by controlling extraneous sources of variation

- There is little lost by using a blocked design, and there is much that can be gained in terms of power

- The key issue is that blocking must be done as part of the design. You can't "*search for blocks*" in the data