class: center, middle, inverse, title-slide

.title[
# LECTURE 5: null hypothesis testing
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2024
]

---
class: inverse

# outline

#### 1) Sampling error and regression coefficients <br/>

--

#### 2) Logic of null hypothesis testing <br/>

--

#### 3) *t*-statistics and *t*-distribution <br/>

--

#### 4) *p*-values and decisions <br/>

--

#### 5) NHT for regression models

---
# fitting the model in `R`

```r
data(frogdata)
fit2 <- lm(Frogs ~ Development, data = frogdata)
broom::tidy(fit2)
```

```
## # A tibble: 2 × 5
##   term            estimate std.error statistic       p.value
##   <chr>              <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)         19.2      1.87     10.3  0.00000000573
## 2 DevelopmentHigh    -13.1      2.64     -4.97 0.000100
```

- The `(Intercept)` estimate ( `\(\beta_0\)` ) is the expected number of frogs in a low-development plot

- The `DevelopmentHigh` estimate ( `\(\beta_1\)` ) is the *difference* between high-development and low-development plots

- **Question**: Are there fewer frogs in high-development plots?

---
#### **Question**: Are there fewer frogs in high-development plots?

- This question is harder to answer than it might seem. Why?

--

#### Sampling error!

--

- If we did the experiment again, we'd get a different difference (maybe even a change in direction)

--

- The sample means will *always* be different (which we will demonstrate shortly)

--

If the sample means will never be the same, we need to re-formulate our question:

- How large a difference between our samples provides evidence that the population means are different?

---
class: inverse, middle

# null hypothesis testing

---
# null hypothesis testing

#### How can we express our question about development and its impacts on frogs as a hypothesis?

--

If development does not influence frog abundance, `\(\beta_1 = 0\)`

- This is called the *null hypothesis* (i.e., no effect)

- Denoted `\(H_0\)`

--

If development does influence frog abundance, `\(\beta_1 \neq 0\)`

- This is the *alternative hypothesis*

- Denoted `\(H_a\)`

--

- Note that we could make the alternative more specific and say that development negatively affects abundance ( `\(\beta_1 < 0\)` )

---
# null hypothesis testing

#### Remember that `\(\large \beta_1\)` is a **population** parameter

- We estimate population parameters from samples

--

#### What is our best estimate of `\(\large \beta_1\)`?

--

- The difference in sample means: `\(\hat{\beta}_1 = \hat{\mu}_{High} - \hat{\mu}_{Low}\)`

--

- For the frog data:

```r
mu_high <- mean(frogdata$Frogs[frogdata$Development == "High"])
mu_low <- mean(frogdata$Frogs[frogdata$Development == "Low"])
(beta_1.hat <- mu_high - mu_low)
```

```
## [1] -13.1
```

Note that this is the same as the estimate of `\(\beta_1\)` provided by the `lm` and `t.test` functions

---
# null hypothesis testing

#### Our estimate of `\(\large \beta_1\)` is clearly not `\(\large 0\)`, but does that imply that the null hypothesis is wrong?

--

- Not necessarily! Why?

--

- Sampling error!

--

Even if `\(\beta_1 = 0\)`, `\(\hat{\beta}_1\)` will *never* equal 0
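A quick simulation makes the point (a hypothetical sketch: both groups are drawn from the *same* population, with a mean and SD loosely based on the frog data, so the true `\(\beta_1\)` is 0):

```r
set.seed(42)                              # hypothetical seed, for reproducibility
low <- rnorm(10, mean = 19.2, sd = 5.9)   # 10 simulated low-development plots
high <- rnorm(10, mean = 19.2, sd = 5.9)  # same population mean, so beta_1 = 0
mean(high) - mean(low)                    # the estimated difference is not 0
```

Re-run this with any seed you like; the estimated difference will never be exactly 0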
--

Enter *null hypothesis testing*:

> formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance

---
# null hypothesis testing

#### NHT is based on the *expectation* of what our data should look like **if the null hypothesis is true**

--

- If our data look really different than what we expect **if the null hypothesis is true**, then it is unlikely that the null hypothesis is true and we reject `\(H_0\)`

--

- It's important to note that there is *always* a chance that our results are due to sampling error alone

--

#### Type I error (i.e., false positive rate)

- The probability that we will reject `\(H_0\)` when it is actually true

- Usually denoted `\(\alpha\)`

- Generally, `\(\alpha = 0.05\)` or `\(\alpha = 0.01\)` are accepted Type I error rates

---
# null hypothesis testing

#### How do we know what the data should look like **if the null hypothesis is true**?

--

- Again, we can use theory to tell us about long-term expectations (similar to the sampling distribution)

--

- But maybe easier to understand using simulation

---
# simulating data under the null hypothesis

#### Imagine the null hypothesis is true in the measurement error example (i.e., measurement error = 0)

- Any difference between the sample mean and 0 is just due to sampling error

--

- We can simulate a "sample" in `R` using the `rnorm()` function

```r
y <- rnorm(n = 10, mean = 0, sd = 1)
mean(y) - 0
```

```
## [1] -0.003423
```

--

Note that we fixed the mean of the normal distribution to be 0, so we know the null hypothesis is true

---
# simulating data under the null hypothesis

Instead of just taking the difference `\(\bar{y} - 0\)`, we can standardize the difference by dividing by the standard error:

```r
se.y <- sd(y)/sqrt(10)
(mean(y) - 0)/se.y
```

```
## [1] -0.008712
```

--

- This tells us how many standard errors away from 0 the sample mean is

- We'll call this value `\(t\)`

`$$\large t = \frac{\bar{y} - 0}{SE_y}$$`

---
# simulating data under the null hypothesis

#### Generate and plot 1000 *t*-statistics:

```r
t <- numeric(length = 1000)
for(i in 1:1000){
  y <- rnorm(10, mean = 0, sd = 1)
  t[i] <- mean(y)/(sd(y)/sqrt(10))
}
```

<img src="05_nht_files/figure-html/unnamed-chunk-6-1.png" width="432" style="display: block; margin: auto;" />

---
# simulating data under the null hypothesis

<img src="05_nht_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" />

#### Note that:

--

1. The null hypothesis was true for every sample

--

2. Some sample means were bigger than expected, some were smaller (all due to sampling error!)

--

3. The resulting distribution looks *kind of* normal

---
# the *t*-distribution

The distribution of the *t*-statistics is not quite normal

--

Instead, theory says that the *t*-statistics will follow a `\(t\)`-distribution with `\(n - 1\)` degrees of freedom (**if the null hypothesis is true**!)
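We can compare our simulated *t*-statistics to this theoretical distribution (a sketch reusing the `t` vector from the simulation above; the plotting details are illustrative):

```r
# histogram of simulated t-statistics on the density scale,
# with the theoretical t density (n - 1 = 9 df) overlaid
hist(t, freq = FALSE, breaks = 30, main = "", xlab = "t statistic")
curve(dt(x, df = 9), from = -4, to = 4, add = TRUE, lwd = 2)
```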
<img src="05_nht_files/figure-html/t-1.png" width="576" style="display: block; margin: auto;" />

---
# the *t*-distribution

#### More about the *t*-distribution

--

- Continuous probability distribution

--

- Symmetrical with mean = 0

--

- More mass in the tails as the degrees of freedom get smaller (i.e., more extreme values become more likely)

--

- For sample sizes `\(n \gt 30\)`, the `\(t\)`-distribution is essentially a standard normal distribution with mean = 0 and SD = 1

--

- Published by William Sealy Gosset in 1908 under the pseudonym "Student". Gosset worked for Guinness and was interested in quality control of beer ingredients

---
# null hypothesis testing

#### Quick review:

--

1. The null hypothesis `\(\large H_0\)` is that there is no effect or no difference

--

2. The *t*-statistic measures the difference between the sample mean and its hypothesized value (under the null) relative to its standard error

--

3. **If the null hypothesis is true**, the *t*-statistics from repeated samples follow a *t*-distribution with `\(\large n-1\)` degrees of freedom

--

Importantly, because we can quantify properties of the *t*-distribution, we can compare the *t*-statistic calculated from our observed sample to the expected values under the null hypothesis

--

- If our observed *t*-statistic would be unlikely under the null hypothesis, we can conclude that the null hypothesis is false

--

- This is the logic behind null hypothesis testing

---
# null hypothesis testing

#### Is there evidence to reject `\(\large H_0\)`?

<img src="05_nht_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" />

---
# null hypothesis testing

#### Measurement error example

```r
y <- c(-0.062, -0.38, 0.85, -0.58, 0.53, 0.09, 0.31, 0.77, 0.59, -0.17)
(mean.y <- mean(y))
```

```
## [1] 0.1948
```

```r
(se.y <- sd(y)/sqrt(10))
```

```
## [1] 0.1557
```

```r
(t.stat <- mean.y / se.y)
```

```
## [1] 1.251
```

--

#### Is there evidence to reject `\(\large H_0\)`?

---
# null hypothesis testing

#### Measurement error example

<img src="05_nht_files/figure-html/unnamed-chunk-10-1.png" width="432" style="display: block; margin: auto;" />

Approximately 18% of the simulated values of `\(t\)` are larger than 1.25 (or smaller than -1.25)

- Put another way, **if the null hypothesis is true**, there is about a 1 in 5 chance of observing `\(|t| \geq 1.25\)`

---
# null hypothesis testing

In reality, we don't need to simulate the distribution of `\(t\)` every time we do an experiment

<br/>

<img src="05_nht_files/figure-html/p-1.png" width="360" style="display: block; margin: auto;" />

--

- It is easy (in `R`) to calculate the area of the grey shaded regions, which is the probability of getting a value of `\(t\)` larger in magnitude than 1.251 **if the null hypothesis is true**

--

- This is called a *p*-value

--

- In our example, `\(p = 0.174\)`. Would you reject the null?

---
# more on *p*-values

#### The *p*-value tells you how likely your observed data (or more extreme data) would be **if the null hypothesis is true**

--

#### A *p*-value does not tell us how much evidence there is in favor of a particular difference in means

--

#### What factors result in a small *p*-value?

--

- The sample mean is far from 0

- And/or the SE is small
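--

A quick sketch of both factors (hypothetical means and SEs; `pt()` gives the *t* CDF, so twice the lower tail beyond `\(-|t|\)` is a two-sided *p*-value with 9 df):

```r
# the p-value shrinks as the mean moves away from 0 or the SE shrinks
2 * pt(-abs(0.19/0.16), df = 9)  # mean near 0, moderate SE: large p
2 * pt(-abs(0.60/0.16), df = 9)  # mean farther from 0: smaller p
2 * pt(-abs(0.19/0.05), df = 9)  # smaller SE: smaller p
```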
---
# *p*-values and type i error

#### In NHST, our conclusion must be to either reject or "fail to reject" the null hypothesis

--

#### When we reject the null hypothesis, there is always a chance that we do so mistakenly

- Due to sampling error, there is always a chance we get a large value of *t* even if the null hypothesis is true

- These "false positive" mistakes are referred to as *Type I error* (denoted `\(\alpha\)`)

--

#### We control the probability of Type I error by rejecting `\(H_0\)` only when the *p*-value is below `\(\alpha\)`

--

#### Generally, we want to avoid false positive conclusions. Why?

--

- Type I error rates of `\(\lt 5\)`% or `\(\lt 1\)`% are generally considered reasonable

---
# critical values

#### Before statistical software made it easy to calculate *p*-values, researchers would look up *critical values*

- For a given sample size (degrees of freedom) and `\(\alpha\)`, what is the associated value of *t*?

- If the magnitude of your calculated *t* is `\(\geq\)` the critical value, reject the null hypothesis

<img src="05_nht_files/figure-html/cv-1.png" width="432" style="display: block; margin: auto;" />

---
# frog example

#### Back to the frog example

```r
data(frogdata)
fit2 <- lm(Frogs ~ Development, data = frogdata)
broom::tidy(fit2)
```

```
## # A tibble: 2 × 5
##   term            estimate std.error statistic       p.value
##   <chr>              <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)         19.2      1.87     10.3  0.00000000573
## 2 DevelopmentHigh    -13.1      2.64     -4.97 0.000100
```

--

Now we're ready to interpret the rest of this output

---
# nhst for regression parameters

#### It turns out, regression parameters also follow a *t*-distribution **if the null hypothesis is true**

--

#### In a regression context:

`$$\large t = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)}$$`

--

#### For the frog example:

```r
beta1.hat <- mu_high - mu_low
(t <- (beta1.hat - 0)/2.64)  # 2.64 is the SE from the model output
```

```
## [1] -4.962
```

---
# nhst for regression parameters

```r
broom::tidy(fit2)
```

```
## # A tibble: 2 × 5
##   term            estimate std.error statistic       p.value
##   <chr>              <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)         19.2      1.87     10.3  0.00000000573
## 2 DevelopmentHigh    -13.1      2.64     -4.97 0.000100
```

Notice that the `lm` output provides a *t*-statistic and *p*-value for both the slope coefficient and the intercept

- How do we interpret the intercept *p*-value?

- Is the *p*-value for the intercept biologically meaningful?

---
# nhst for t-tests

As we saw before, the frog model can be fit as a two-sample *t*-test

```r
t.test(Frogs ~ Development, data = frogdata)
```

```
## 
##  Welch Two Sample t-test
## 
## data:  Frogs by Development
## t = 5, df = 18, p-value = 1e-04
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   7.554 18.646
## sample estimates:
##  mean in group Low mean in group High 
##               19.2                6.1
```

--

Notice that the *t*-statistic and *p*-value are the same as in the `lm` model (up to sign and rounding)

---
# looking ahead

<br/>

### **Next time**: Statistical power