class: center, middle, inverse, title-slide

.title[
# LECTURE 4: t-tests and null hypothesis testing
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2022
]

---
class: inverse

# outline

<br/>

#### 1) One-sample t-test

<br/>

--

#### 2) Null hypothesis testing

<br/>

--

#### 3) Two-sample t-test

<br/>

--

#### 4) Paired t-test

---
# one-sample t-test

#### Context

- We want to know whether a population mean ( `\(\mu\)` ) differs from some value `\(\mu_0\)`

--

- Examples

  + Is the average measurement error equal to zero?

  + Is the average student taller than 5' (152.4 cm)?

--

- Expressed as the *simplest* linear model:

`$$\Large y_i = \beta_0 + \epsilon_i$$`

`$$\Large \epsilon_i \sim normal(0, \sigma)$$`

---
# one-sample t-test

<br/>

`$$\Large y_i = \mu + \epsilon_i$$`

`$$\Large \epsilon_i \sim normal(0, \sigma)$$`

--

#### How can we express our question as a hypothesis?

--

- Is the average measurement error equal to zero?

--

  + Is `\(\large \mu = 0\)`?

--

- Is the average student taller than 5' (152.4 cm)?

--

  + Is `\(\large \mu = 152.4\)`?

---
# one-sample t-test

#### How can we answer this question?

--

#### Sample!

<img src="04_NHST_files/figure-html/unnamed-chunk-1-1.png" width="432" style="display: block; margin: auto;" />

--

1) What is our best estimate of `\(\mu\)`?

---
# one-sample t-test

#### How can we answer this question?

#### Sample!

<img src="04_NHST_files/figure-html/unnamed-chunk-2-1.png" width="432" style="display: block; margin: auto;" />

1) What is our best estimate of `\(\mu\)`?

--

2) How do we decide whether `\(\mu = \mu_0\)`?

---
class: inverse, center, middle

# null hypothesis testing

---
# null hypothesis testing

> a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance

--

#### Necessary because of **sampling error**

--

#### Long (and controversial)<sup>1</sup> history in statistics

.footnote[[1] We will discuss this more later in the semester]

--

#### Requires two hypotheses

- Null hypothesis `\(H_0\)` (no difference/relationship/effect)

- Alternative hypothesis `\(H_a\)`

--

- Note that these hypotheses refer to the **population(s)**!

---
# null hypothesis testing

#### Example: Is measurement error 0?

--

- `\(\Large H_0\)`: `\(\Large \mu = 0\)`

--

- `\(\Large H_a\)`: `\(\Large \mu \neq 0\)`

--

#### Collect a sample of `\(\Large n = 25\)` measurements

--

- `\(\Large \bar{y} = 0.15\)`, `\(\Large s = 0.7\)`

--

### What is our decision (accept or reject `\(\large H_0\)`)?

---
# null hypothesis testing

#### NHT is based on the *expectation* of what our data should look like **if the null hypothesis is true**

--

- If our data look very different from what we expect **if the null hypothesis is true**, then it is unlikely that the null hypothesis is true and we reject `\(H_0\)`

--

- It is important to note that there is *always* some chance that our results are due to sampling error alone

--

#### Type I error (i.e., false positive rate)

- The probability that we will reject `\(H_0\)` when it is actually true

- Usually denoted `\(\alpha\)`

- Generally, `\(\alpha = 0.05\)` or `\(\alpha = 0.01\)` are accepted Type I error rates

---
# null hypothesis testing

#### One-sample t-test example

- `\(\Large \bar{y} = 0.15\)`, `\(\Large s = 0.7\)`

--

- Define `\(\large t = \frac{\bar{y} - \mu_0}{SE_y} = \frac{0.15 - 0}{0.7/\sqrt{25}} = 1.07\)`

--

- If `\(\large H_0\)` is true, what do we *expect* the value of `\(\large t\)` to be?

--

  + 0

--

- If we accept that `\(\large t\)` will never be exactly `\(\large 0\)`, how far from `\(\large 0\)` does `\(\large t\)` need to be for us to reject `\(\large H_0\)`?
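---
# null hypothesis testing

#### Aside: computing `\(t\)` in `R`

A minimal sketch reproducing the test statistic above (the object names here are just illustrative choices):

```r
mu0 <- 0       # hypothesized mean under H0
y.bar <- 0.15  # sample mean
s <- 0.7       # sample standard deviation
n <- 25        # sample size

SE <- s/sqrt(n)              # standard error of the mean
(t.stat <- (y.bar - mu0)/SE) # test statistic
```

```
## [1] 1.071429
```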
---
# null hypothesis testing

If we accept that `\(t\)` will never be exactly `\(0\)`, how far from `\(0\)` does `\(t\)` need to be for us to reject `\(H_0\)`?

--

Reframe the question: **If the null hypothesis is true**, how much do we expect `\(t\)` to vary?

--

- We can answer that question using `R`!

.pull-left[

```r
mu0 <- 0
y.bar <- numeric(length = 2500)
t <- numeric(length = 2500)

for(i in 1:2500){
  # sample of n = 25 from the null population
  y <- rnorm(25, mu0, 1)
  y.bar[i] <- mean(y)
  SE <- sd(y)/sqrt(25)
  # test statistic for this sample
  t[i] <- (y.bar[i] - mu0)/SE
}
```
]

.pull-right[
<img src="04_NHST_files/figure-html/unnamed-chunk-4-1.png" width="324" style="display: block; margin: auto;" />

All of this variation is due to sampling error!
]

---
# null hypothesis testing

If we accept that `\(t\)` will never be exactly `\(0\)`, how far from `\(0\)` does `\(t\)` need to be for us to reject `\(H_0\)`?

Reframe the question: **If the null hypothesis is true**, how much do we expect `\(t\)` to vary?

- We can answer that question using `R`!

.pull-left[

```r
mu0 <- 0
y.bar <- numeric(length = 2500)
t <- numeric(length = 2500)

for(i in 1:2500){
  y <- rnorm(25, mu0, 1)
  y.bar[i] <- mean(y)
  SE <- sd(y)/sqrt(25)
  t[i] <- (y.bar[i] - mu0)/SE
}
```
]

.pull-right[
<img src="04_NHST_files/figure-html/unnamed-chunk-6-1.png" width="324" style="display: block; margin: auto;" />
]

---
# null hypothesis testing

#### Is there evidence to reject `\(\large H_0\)`?

<img src="04_NHST_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" />

---
# null hypothesis testing

#### Is there evidence to reject `\(\large H_0\)`?

<img src="04_NHST_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" />

Approximately 30% of the simulated values of `\(t\)` are larger than 1.07 (or smaller than -1.07)

- Put another way, **if the null hypothesis is true**, there is about a 1 in 3 chance of observing `\(|t| \geq 1.07\)`

---
# null hypothesis testing

In reality, we don't need to simulate the distribution of `\(t\)` every time we do an experiment

--

Theory says that the test statistic will follow a `\(t\)`-distribution with `\(n - 1\)` degrees of freedom, **if the null hypothesis is true**

.pull-left[
<img src="04_NHST_files/figure-html/unnamed-chunk-9-1.png" width="432" style="display: block; margin: auto;" />
]

.pull-right[
- The expected distribution of `\(t\)` **if the null hypothesis is true** and we repeated our experiment an infinite number of times

- Symmetrical around 0

- Smaller sample sizes = wider `\(t\)`-distribution

- Approximately normal for `\(n \geq 30\)`
]

---
## `\(\Large p\)`-values

<br/>

<img src="04_NHST_files/figure-html/p-1.png" width="576" style="display: block; margin: auto;" />

---
# critical values

<br/>

<img src="04_NHST_files/figure-html/cv-1.png" width="576" style="display: block; margin: auto;" />

---
# null hypothesis testing

#### Recap (one-sample t-test)

1) Draw a random sample from a population (assumed to be normally distributed)

--

2) Compute the standard error of the mean:

`$$\large SEM = \frac{s}{\sqrt{n}} = \frac{\sqrt{\frac{1}{n-1}\sum^n_{i=1}(y_i-\bar{y})^2}}{\sqrt{n}}$$`

--

3) Compute the t statistic:

`$$\large t = \frac{\bar{y} - \mu_0}{SEM}$$`

--

4) If the `\(p\)`-value is `\(< \alpha\)` (or if `\(t\)` is more extreme than the critical value), reject the null
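---
# null hypothesis testing

#### Getting the `\(p\)`-value and critical value in `R`

For the measurement error example ( `\(t = 1.07\)`, `\(df = 24\)` ), a minimal sketch using the built-in `\(t\)`-distribution functions (values taken from the example above):

```r
t.stat <- 1.07 # observed test statistic
df <- 24       # n - 1 degrees of freedom

# two-tailed p-value: P(t <= -1.07) + P(t >= 1.07)
2 * pt(-abs(t.stat), df = df)

# critical value for a two-tailed test with alpha = 0.05
qt(0.975, df = df)
```

The `\(p\)`-value is about 0.3 (consistent with the simulation) and the critical value is about 2.06, so we fail to reject `\(H_0\)`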
---
## MORE ON `\(\Large p\)`-VALUES

<br/>

#### A `\(\large p\)`-value tells you how likely your observed data (or data more extreme) would be **if the null hypothesis is true**

--

#### Our conclusion must be to either reject or "fail to reject" the null hypothesis

--

#### A `\(\large p\)`-value does not tell us how much evidence there is in favor of a particular difference in means

--

#### What factors result in a small `\(\large p\)`-value?

--

- The sample mean is far from `\(\mu_0\)`

- And/or the SE is small

---
# one-tailed vs. two-tailed tests

<br/>

<img src="04_NHST_files/figure-html/one-tail-1.png" width="576" style="display: block; margin: auto;" />

---
# more on degrees of freedom

<br/>

> The degrees of freedom for a calculation on a set of numbers is the number of elements in the set (i.e., how many numbers there are) minus the number of different things you must know about the set in order to complete the calculation

<br/>

--

#### Example:

> Consider a set of n = 5 numbers. In the absence of any information about them, all are free to be any value. However, if you are also told that the sum of the set is 20, then only 4 of the numbers are free to be anything, but the fifth is constrained by your knowledge that the sum must be 20. Hence, `\(df = n - 1 = 4\)`

---
class: inverse, center, middle

## TWO-SAMPLE t-TEST

---
### TWO-SAMPLE t-TEST

#### Concept

- We want to determine whether two population means differ

--

- The null hypothesis is: `\(\large H_0 : \mu_1 = \mu_2\)`

--

- The alternative hypothesis is either:

  + `\(\large H_a : \mu_1 \neq \mu_2\)` for a two-tailed test, or

  + `\(\large H_a : \mu_1 > \mu_2\)` for a one-tailed test

--

- Appropriate when:

  + The two samples, one from each population, are independent

  + Both populations are (approximately) normally distributed

  + The population variances are unknown but are the same for both populations

---
# procedure

1) Draw two random samples from two populations

--

<br/>

2) Compute the standard error of the difference in means:

`$$\large SEDM = \sqrt{SEM_1^2 + SEM_2^2}$$`

--

<br/>

3) Compute the t statistic:

`$$\large t = \frac{\bar{y}_1 -\bar{y}_2}{SEDM}$$`

--

<br/>

4) Calculate the `\(p\)`-value

--

<br/>

5) If `\(p < \alpha\)`, reject the null hypothesis

---
# worked example

#### Question:

- Is there a difference in the density of trees at low and high elevations?

--

#### Hypothesis:

- Trees are more numerous at low elevations

--

#### Field procedure:

- `\(\large n=10\)` plots are sampled using randomly located belt transects 100 m long `\(\times\)` 10 m wide at both high and low elevations

--

#### Data:

.pull-left[
`Low elevation: 16, 14, 18, 17, 29, 31, 14, 16, 22, 15`
]

.pull-right[
`High elevation: 2, 11, 6, 8, 0, 3, 19, 1, 6, 5`
]

---
# worked example

<img src="04_NHST_files/figure-html/box-1.png" width="504" style="display: block; margin: auto;" />

---
# worked example

.pull-left[
`Low elevation`

`16, 14, 18, 17, 29,`

`31, 14, 16, 22, 15`
]

.pull-right[
`High elevation`

`2, 11, 6, 8, 0,`

`3, 19, 1, 6, 5`
]

--

- Mean of low group: `\(\large \bar{y}_L = 19.2\)`

- Mean of high group: `\(\large \bar{y}_H = 6.1\)`

--

- Standard deviation of low group: `\(\large s_L = 6.16\)`

- Standard deviation of high group: `\(\large s_H = 5.63\)`

--

- Standard error of the difference in means: `\(\large SEDM = 2.64\)`

--

- Test statistic: `\(\large t = (19.2 - 6.1)/2.64 = 4.97\)`

--

- `\(\large p\)`-value: `\(< 0.001\)` (critical value: `\(\large t_{0.95,df=10+10-2} = 1.73\)`)

--

#### Is this a one- or two-tailed test? Why?
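---
# worked example

#### The same test with `t.test()`

A sketch of this analysis using `R`'s built-in `t.test()` function (the vector names `low` and `high` are just choices made here):

```r
low <- c(16, 14, 18, 17, 29, 31, 14, 16, 22, 15)
high <- c(2, 11, 6, 8, 0, 3, 19, 1, 6, 5)

# one-tailed two-sample t-test assuming equal variances
t.test(low, high, var.equal = TRUE, alternative = "greater")
```

This reproduces the hand calculations above: `\(t = 4.97\)`, `\(df = 18\)`, `\(p < 0.001\)`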
---
# equal variance assumption

#### Are the variances of the two populations equal?

--

- This is an assumption of the two-sample t-test

--

- Again, we use samples to make inferences about populations

--

- Hypotheses:

  + `\(\large H_0 : \sigma^2_1 = \sigma^2_2\)`

  + `\(\large H_a : \sigma^2_1 \neq \sigma^2_2\)`

--

- Tested using a ratio of sample variances: `\(\large F = s^2_1/s^2_2\)`

--

- This is always a two-tailed test

#### Note: It makes life easier to place the larger variance in the numerator of this ratio

---
# f-distribution

#### A ratio of variances follows an `\(\large F\)`-distribution

.pull-left[
#### Properties:

- `\(F > 0\)`

- The `\(F\)`-distribution is not symmetrical

- The shape of the distribution depends on an ordered pair of degrees of freedom, `\(df_1\)` and `\(df_2\)`
]

.pull-right[
<img src="04_NHST_files/figure-html/f-1.png" width="288" style="display: block; margin: auto;" />
]

---
# f-distribution

<br/>

<img src="04_NHST_files/figure-html/f2-1.png" width="576" style="display: block; margin: auto;" />

---
# continuing with tree example

#### Test statistic: `\(\large F = 6.16^2/5.63^2 = 1.20\)`

--

#### Degrees of freedom: `\(\large df = 9, 9\)`

--

#### `\(\large p\)`-value: `\(\large 0.79\)`

--

#### Critical value: `\(\large F_{0.975,df=9,9} = 4.03\)`

--

#### Decision: `\(\large p > \alpha\)` (and the observed `\(F\)` is smaller than the critical value), so we fail to reject the null. There is no strong evidence that the variances are different.
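---
# continuing with tree example

#### The variance test in `R`

A sketch using the built-in `var.test()` function, reusing the `low` and `high` vectors defined earlier:

```r
# two-tailed F test of H0: equal population variances
var.test(low, high)
```

`var.test()` places the variance of its first argument in the numerator, so here `\(F = s^2_L/s^2_H \approx 1.20\)` with `\(df = 9, 9\)`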
---
## The t-test as a linear model

As discussed previously, the t-test is a linear model ( `\(y = a + bx\)` )

--

So we could also analyze these data using the `lm()` function:

```r
trees <- data.frame(Trees = c(16, 14, 18, 17, 29, 31, 14, 16, 22, 15,
                              2, 11, 6, 8, 0, 3, 19, 1, 6, 5),
                    Elevation = factor(rep(c("Low", "High"), each = 10),
                                       levels = c("Low", "High")))

fit.lm <- lm(Trees ~ Elevation, data = trees)
summary(fit.lm)
```

```
##            term estimate std.error statistic   p.value
## 1   (Intercept)     19.2     1.866    10.291 5.729e-09
## 2 ElevationHigh    -13.1     2.638    -4.965 1.001e-04
```

---
class: inverse, middle, center

## PAIRED *t*-TEST

---
### PAIRED *t*-TEST

#### Context

- Used when two measurements are taken on each experimental unit

--

- The problem can be analyzed by taking the difference within each pair and then conducting a one-sample *t*-test

--

- Examples:

  + Are right feet usually longer than left feet?

  + Is small mammal density higher before or after the use of prescribed fire?

  + Do two methods of measuring tree height yield similar results?

---
# motivation

<br/>

<br/>

> Matching is done in a variety of ways, but the object is always to remove extraneous variability from the experiment

---
# worked example

> Plots were arranged in pairs at 12 different locations. One plot in each pair was randomly selected for treatment with the microbial pesticide *Bacillus thuringiensis* (Bt). The other plot was untreated. Surveys of nontarget caterpillars were performed by counting caterpillars on samples of 10,000 leaves on each plot. Data below are caterpillar counts on each plot, paired by location.

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; "><table class="table table-striped table-hover table-condensed table-responsive" style="font-size: 10px; width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> Location </th>
<th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> Untreated </th>
<th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> Treated </th>
<th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> Difference </th>
</tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 18 </td> <td style="text-align:center;"> 18 </td> <td style="text-align:center;"> 0 </td> </tr>
<tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 29 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 5 </td> </tr>
<tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> -1 </td> </tr>
<tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 33 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 2 </td> </tr>
<tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> -2 </td> </tr>
<tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 16 </td> <td style="text-align:center;"> 1 </td> </tr>
<tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> 2 </td> </tr>
<tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 3 </td> </tr>
<tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 30 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 4 </td> </tr>
<tr> <td style="text-align:center;"> 11 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 1 </td> </tr>
<tr> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 28 </td> <td style="text-align:center;"> -1 </td> </tr>
</tbody>
</table></div>

---
# worked example

<img src="04_NHST_files/figure-html/box2-1.png" width="504" style="display: block; margin: auto;" />

---
# worked example

#### Hypotheses ( `\(\large \mu_d\)` is the mean difference )

- `\(\large H_0 :\mu_d = 0\)`

- `\(\large H_a :\mu_d > 0\)`

--

#### Calculations

- Mean of the differences: `\(\large\bar{y}_d = 1.5\)`

- Standard deviation of the differences: `\(\large s_d = 2.24\)`

- Standard error of the mean difference: `\(\large SEM_d = 0.65\)`

- Test statistic: `\(\large t = 1.5/0.65 = 2.32\)`, critical value: `\(\large t_{0.95,11} = 1.80\)`

--

### Decision?
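---
# worked example

#### Paired t-test in `R`

A sketch with `t.test()` (the vector names are just choices made here; the values are the counts from the table):

```r
untreated <- c(23, 18, 29, 22, 33, 20, 17, 25, 27, 30, 25, 27)
treated <- c(19, 18, 24, 23, 31, 22, 16, 23, 24, 26, 24, 28)

# paired, one-tailed test of H0: mean difference = 0
t.test(untreated, treated, paired = TRUE, alternative = "greater")
```

This is equivalent to a one-sample t-test on the differences, `t.test(untreated - treated, alternative = "greater")`, and matches the hand calculations: `\(t = 2.32\)`, `\(df = 11\)`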
---
# looking ahead

<br/>

#### **Next time:** Completely randomized ANOVA

<br/>

#### **Reading:** Quinn chp. 8