class: center, middle, inverse, title-slide

.title[
# LECTURE 5: null hypothesis testing
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2024
]

---
class: inverse

# outline

#### 1) Sampling error and regression coefficients <br/>

--

#### 2) Logic of null hypothesis testing <br/>

--

#### 3) *t*-statistics and *t*-distribution <br/>

--

#### 4) *p*-values and decisions <br/>

--

#### 5) NHT for regression models

---
# fitting the model in `R`

```r
data(frogdata)
fit2 <- lm(Frogs ~ Development, data = frogdata)
broom::tidy(fit2)
```

```
## # A tibble: 2 × 5
##   term            estimate std.error statistic       p.value
##   <chr>              <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)         19.2      1.87     10.3  0.00000000573
## 2 DevelopmentHigh    -13.1      2.64     -4.97 0.000100
```

- The `(Intercept)` estimate ( `\(\beta_0\)` ) is the expected number of frogs in a low-development plot

- The `DevelopmentHigh` estimate ( `\(\beta_1\)` ) is the *difference* between high-development and low-development plots

- **Question**: Are there fewer frogs in high-development plots?

---
#### **Question**: Are there fewer frogs in high-development plots?

- This question is harder to answer than it might seem. Why?

--

#### Sampling error!

--

- If we did the experiment again, we'd get a different difference (maybe even a change in direction)

--

- The sample means will *always* be different (which we will demonstrate shortly)

--

If the sample means will never be the same, we need to re-formulate our question:

- How large a difference between our samples provides evidence that the population means are different?

---
class: inverse, middle

# null hypothesis testing

---
# null hypothesis testing

#### How can we express our question about development and its impacts on frogs as a hypothesis?

--

If development does not influence frog abundance, `\(\beta_1 = 0\)`

- This is called the *null hypothesis* (i.e., no effect)

- Denoted `\(H_0\)`

--

If development does influence frog abundance, `\(\beta_1 \neq 0\)`

- This is the *alternative hypothesis*

- Denoted `\(H_a\)`

--

- Note that we could make the alternative more specific and say that development negatively affects abundance ( `\(\beta_1 < 0\)` )

---
# null hypothesis testing

#### Remember that `\(\large \beta_1\)` is a **population** parameter

- We estimate population parameters from samples

--

#### What is our best estimate of `\(\large \beta_1\)`?

--

- The difference in sample means: `\(\hat{\beta}_1 = \hat{\mu}_{High} - \hat{\mu}_{Low}\)`

--

- For the frog data:

```r
mu_high <- mean(frogdata$Frogs[frogdata$Development == "High"])
mu_low <- mean(frogdata$Frogs[frogdata$Development == "Low"])
(beta_1.hat <- mu_high - mu_low)
```

```
## [1] -13.1
```

Note that this is the same as the estimate of `\(\beta_1\)` provided by the `lm` and `t.test` functions

---
# null hypothesis testing

#### Our estimate of `\(\large \beta_1\)` is clearly not `\(\large 0\)`, but does that imply that the null hypothesis is wrong?

--

- Not necessarily! Why?

--

- Sampling error!

--

Even if `\(\beta_1 = 0\)`, `\(\hat{\beta}_1\)` will *never* equal 0
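A quick simulation makes the point (a hypothetical sketch: both groups are drawn from the *same* population, with a mean and SD loosely based on the frog data, so the true `\(\beta_1\)` is 0):

```r
set.seed(42)                              # hypothetical seed, for reproducibility
low <- rnorm(10, mean = 19.2, sd = 5.9)   # 10 simulated low-development plots
high <- rnorm(10, mean = 19.2, sd = 5.9)  # same population mean, so beta_1 = 0
mean(high) - mean(low)                    # the estimated difference is not 0
```

Re-run this with any seed you like; the estimated difference will never be exactly 0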
--

Enter *null hypothesis testing*:

> formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance

---
# null hypothesis testing

#### NHT is based on the *expectation* of what our data should look like **if the null hypothesis is true**

--

- If our data look really different than what we expect **if the null hypothesis is true**, then it is unlikely that the null hypothesis is true and we reject `\(H_0\)`

--

- It's important to note that there is *always* a chance that our results are due to sampling error alone

--

#### Type I error (i.e., false positive rate)

- The probability that we will reject `\(H_0\)` when it is actually true

- Usually denoted `\(\alpha\)`

- Generally, `\(\alpha = 0.05\)` or `\(\alpha = 0.01\)` are accepted Type I error rates

---
# null hypothesis testing

#### How do we know what the data should look like **if the null hypothesis is true**?

--

- Again, we can use theory to tell us about long-term expectations (similar to the sampling distribution)

--

- But maybe easier to understand using simulation

---
# simulating data under the null hypothesis

#### Imagine the null hypothesis is true in the measurement error example (i.e., measurement error = 0)

- Any difference between the sample mean and 0 is just due to sampling error

--

- We can simulate a "sample" in `R` using the `rnorm()` function

```r
y <- rnorm(n = 10, mean = 0, sd = 1)
mean(y) - 0
```

```
## [1] -0.003423
```

--

Note that we fixed the mean of the normal distribution to be 0, so we know the null hypothesis is true

---
# simulating data under the null hypothesis

Instead of just taking the difference `\(\bar{y} - 0\)`, we can standardize the difference by dividing by the standard error:

```r
se.y <- sd(y)/sqrt(10)
(mean(y) - 0)/se.y
```

```
## [1] -0.008712
```

--

- This tells us how many standard errors away from 0 the sample mean is

- We'll call this value `\(t\)`

`$$\large t = \frac{\bar{y} - 0}{SE_y}$$`

---
# simulating data under the null hypothesis

#### Generate and plot 1000 *t*-statistics:

```r
t <- numeric(length = 1000)
for(i in 1:1000){
  y <- rnorm(10, mean = 0, sd = 1)
  t[i] <- mean(y)/(sd(y)/sqrt(10))
}
```

<img src="05_nht_files/figure-html/unnamed-chunk-6-1.png" width="432" style="display: block; margin: auto;" />

---
# simulating data under the null hypothesis

<img src="05_nht_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" />

#### Note that:

--

1. The null hypothesis was true for every sample

--

2. Some sample means were bigger than expected, some were smaller (all due to sampling error!)

--

3. The resulting distribution looks *kind of* normal

---
# the *t*-distribution

The distribution of the *t*-statistics is not quite normal

--

Instead, theory says that the *t*-statistics will follow a `\(t\)`-distribution with `\(n - 1\)` degrees of freedom (**if the null hypothesis is true**!)
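We can compare our simulated *t*-statistics to this theoretical distribution (a sketch reusing the `t` vector from the simulation above; the plotting details are illustrative):

```r
# histogram of simulated t-statistics on the density scale,
# with the theoretical t density (n - 1 = 9 df) overlaid
hist(t, freq = FALSE, breaks = 30, main = "", xlab = "t statistic")
curve(dt(x, df = 9), from = -4, to = 4, add = TRUE, lwd = 2)
```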
<img src="05_nht_files/figure-html/t-1.png" width="576" style="display: block; margin: auto;" />

---
# the *t*-distribution

#### More about the *t*-distribution

--

- Continuous probability distribution

--

- Symmetrical with mean = 0

--

- More mass in the tails as the degrees of freedom get smaller (i.e., more extreme values become more likely)

--

- For sample sizes `\(n \gt 30\)`, the `\(t\)`-distribution is essentially a standard normal distribution with mean = 0 and SD = 1

--

- Published by William Sealy Gosset in 1908 under the pseudonym "Student". Gosset worked for Guinness and was interested in quality control of beer ingredients

---
# null hypothesis testing

#### Quick review:

--

1. The null hypothesis `\(\large H_0\)` is that there is no effect or no difference

--

2. The *t*-statistic measures the difference between the sample mean and its hypothesized value (under the null) relative to its standard error

--

3. **If the null hypothesis is true**, the *t*-statistics from repeated samples follow a *t*-distribution with `\(\large n-1\)` degrees of freedom

--

Importantly, because we can quantify properties of the *t*-distribution, we can compare the *t*-statistic calculated from our observed sample to the expected values under the null hypothesis

--

- If our observed *t*-statistic would be unlikely under the null hypothesis, we can conclude that the null hypothesis is false

--

- This is the logic behind null hypothesis testing

---
# null hypothesis testing

#### Is there evidence to reject `\(\large H_0\)`?

<img src="05_nht_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" />

---
# null hypothesis testing

#### Measurement error example

```r
y <- c(-0.062, -0.38, 0.85, -0.58, 0.53, 0.09, 0.31, 0.77, 0.59, -0.17)
(mean.y <- mean(y))
```

```
## [1] 0.1948
```

```r
(se.y <- sd(y)/sqrt(10))
```

```
## [1] 0.1557
```

```r
(t.stat <- mean.y / se.y)
```

```
## [1] 1.251
```

--

#### Is there evidence to reject `\(\large H_0\)`?

---
# null hypothesis testing

#### Measurement error example

<img src="05_nht_files/figure-html/unnamed-chunk-10-1.png" width="432" style="display: block; margin: auto;" />

Approximately 18% of the simulated values of `\(t\)` are larger than 1.25 (or smaller than -1.25)

- Put another way, **if the null hypothesis is true**, there is about a 1 in 5 chance of observing `\(|t| \geq 1.25\)`

---
# null hypothesis testing

In reality, we don't need to simulate the distribution of `\(t\)` every time we do an experiment

<br/>

<img src="05_nht_files/figure-html/p-1.png" width="360" style="display: block; margin: auto;" />

--

- It is easy (in `R`) to calculate the area of the grey shaded regions, which is the probability of getting a value of `\(t\)` larger in magnitude than 1.251 **if the null hypothesis is true**

--

- This is called a *p*-value

--

- In our example, `\(p = 0.174\)`. Would you reject the null?

---
# more on *p*-values

#### The *p*-value tells you how likely your observed data (or more extreme data) would be **if the null hypothesis is true**

--

#### A *p*-value does not tell us how much evidence there is in favor of a particular difference in means

--

#### What factors result in a small *p*-value?

--

- The sample mean is far from 0

- And/or the SE is small
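--

A quick sketch of both factors (hypothetical means and SEs; `pt()` gives the *t* CDF, so twice the lower tail beyond `\(-|t|\)` is a two-sided *p*-value with 9 df):

```r
# the p-value shrinks as the mean moves away from 0 or the SE shrinks
2 * pt(-abs(0.19/0.16), df = 9)  # mean near 0, moderate SE: large p
2 * pt(-abs(0.60/0.16), df = 9)  # mean farther from 0: smaller p
2 * pt(-abs(0.19/0.05), df = 9)  # smaller SE: smaller p
```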
---
# *p*-values and type i error

#### In NHST, our conclusion must be to either reject or "fail to reject" the null hypothesis

--

#### When we reject the null hypothesis, there is always a chance that we do so mistakenly

- Due to sampling error, there is always a chance we get a large value of *t* even if the null hypothesis is true

- These "false positive" mistakes are referred to as *Type I error* (denoted `\(\alpha\)`)

--

#### We control the probability of Type I error by rejecting `\(H_0\)` only when the *p*-value is below `\(\alpha\)`

--

#### Generally, we want to avoid false positive conclusions. Why?

--

- Type I error rates of `\(\lt 5\)`% or `\(\lt 1\)`% are generally considered reasonable

---
# critical values

#### Before statistical software made it easy to calculate *p*-values, researchers would look up *critical values*

- For a given sample size (degrees of freedom) and `\(\alpha\)`, what is the associated value of *t*?

- If the magnitude of your calculated *t* is `\(\geq\)` the critical value, reject the null hypothesis

<img src="05_nht_files/figure-html/cv-1.png" width="432" style="display: block; margin: auto;" />

---
# frog example

#### Back to the frog example

```r
data(frogdata)
fit2 <- lm(Frogs ~ Development, data = frogdata)
broom::tidy(fit2)
```

```
## # A tibble: 2 × 5
##   term            estimate std.error statistic       p.value
##   <chr>              <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)         19.2      1.87     10.3  0.00000000573
## 2 DevelopmentHigh    -13.1      2.64     -4.97 0.000100
```

--

Now we're ready to interpret the rest of this output

---
# nhst for regression parameters

#### It turns out, regression parameters also follow a *t*-distribution **if the null hypothesis is true**

--

#### In a regression context:

`$$\large t = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)}$$`

--

#### For the frog example:

```r
beta1.hat <- mu_high - mu_low
(t <- (beta1.hat - 0)/2.64)  # 2.64 is the SE from the model output
```

```
## [1] -4.962
```

---
# nhst for regression parameters

```r
broom::tidy(fit2)
```

```
## # A tibble: 2 × 5
##   term            estimate std.error statistic       p.value
##   <chr>              <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)         19.2      1.87     10.3  0.00000000573
## 2 DevelopmentHigh    -13.1      2.64     -4.97 0.000100
```

Notice that the `lm` output provides a *t*-statistic and *p*-value for both the slope coefficient and the intercept

- How do we interpret the intercept *p*-value?

- Is the *p*-value for the intercept biologically meaningful?

---
# nhst for t-tests

As we saw before, the frog model can be fit as a two-sample *t*-test

```r
t.test(Frogs ~ Development, data = frogdata)
```

```
## 
##  Welch Two Sample t-test
## 
## data:  Frogs by Development
## t = 5, df = 18, p-value = 1e-04
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   7.554 18.646
## sample estimates:
##  mean in group Low mean in group High 
##               19.2                6.1
```

--

Notice that the *t*-statistic and *p*-value are the same as in the `lm` model (up to sign and rounding)

---
# looking ahead

<br/>

### **Next time**: Statistical power