class: center, middle, inverse, title-slide

.title[
# LECTURE 3: introduction to statistical modeling
]
.subtitle[
## (or, everything is a linear model)
]
.author[
### FANR 6750 (Experimental design)
]
.date[
### Fall 2022
]

---
class: inverse

# outline

<br/>

#### 1) What is a model?

<br/>

--

#### 2) What is a linear model?

<br/>

--

#### 3) Linear model assumptions

---
# what is a model?

![](https://media.giphy.com/media/12npFVlmZoXN4Y/giphy.gif)

---
# what is a model?

> "an informative representation of an object, person or system"

--

#### Many types (conceptual, graphical, mathematical)

--

#### In this class, we will deal with *statistical* models

--

- Mathematical representation of our hypothesis

--

- By necessity, models will be simplifications of reality ("all models are wrong...")

--

- Do not have to be complex

---
# but i don't want to be a modeler!

<img src="puffy_shirt.png" width="46%" height="65%" style="display: block; margin: auto;" />

--

- Inference **requires** models

--

- Models link **observations** to **processes**

--

- Models are tools that allow us to understand processes that we **cannot directly observe** based on quantities that we **can** observe

---
# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

--

<img src="03_models_files/figure-html/unnamed-chunk-2-1.png" width="360" style="display: block; margin: auto;" />

--

It may not be obvious, but this is essentially the only model we will use this semester<sup>1</sup>

.footnote[[1] With some minor variations, mainly in `\(x\)`]

---
# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

<img src="03_models_files/figure-html/unnamed-chunk-3-1.png" width="360" style="display: block; margin: auto;" />

If we want to use this as a statistical model, what's missing?

---
# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

<img src="03_models_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />

If we want to use this as a statistical model, what's missing?
#### **Stochasticity!**

---
# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

<img src="03_models_files/figure-html/unnamed-chunk-5-1.png" width="360" style="display: block; margin: auto;" />

If we want to use this as a statistical model, what's missing?

#### **Stochasticity!**

---
class:inverse, middle, center

# the linear model

---
# Statistics cookbook

<img src="stats_flow_chart.png" width="50%" style="display: block; margin: auto;" />

---
# the linear model

<br/>
<br/>

`$$\Large response = deterministic\; part+stochastic\; part$$`

<br/>
<br/>

--

`$$\underbrace{\LARGE E[y_i] = \beta_0 + \beta_1 \times x_i}_{Deterministic}$$`

<br/>
<br/>

--

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma)}_{Stochastic}$$`

???
Note that the deterministic portion of the model has the same form as the equation for a line: `\(y = a + b \times x\)`, which is why we call these linear models

---
# the linear model

#### A "simple" example

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

--

<img src="03_models_files/figure-html/unnamed-chunk-7-1.png" width="288" style="display: block; margin: auto;" />

---
# the linear model

#### A "simple" example

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma=0.25)}_{Stochastic}$$`

--

<img src="03_models_files/figure-html/unnamed-chunk-8-1.png" width="288" style="display: block; margin: auto;" />

---
# the linear model

#### Same model, different `\(\Large x\)`

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma=0.25)}_{Stochastic}$$`

--

<img src="03_models_files/figure-html/unnamed-chunk-9-1.png" width="432" style="display: block; margin: auto;" />

---
# the linear model

#### A more complex model

`$$\large y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + ...
+ \beta_px_{ip} + \epsilon_i$$`

--

- Each `\(\beta\)` coefficient is the effect of a specific predictor variable `\(x\)`

- Predictor variables may be continuous, binary, factors, or a combination

- We will cover more complex models (and interpretation) later

---
# is this a linear model?

`$$\Large y = 20 + 0.5x - 0.3x^2$$`

<img src="03_models_files/figure-html/unnamed-chunk-10-1.png" width="396" style="display: block; margin: auto;" />

---
# residuals

#### One concept we will talk about a lot is *residuals*

--

- Residuals are the difference between the observed values `\(y_i\)` and the predicted values `\(E[y_i]\)`

<img src="03_models_files/figure-html/unnamed-chunk-11-1.png" width="396" style="display: block; margin: auto;" />

---
# residuals

#### One concept we will talk about a lot is *residuals*

- Residuals are the difference between the observed values `\(y_i\)` and the predicted values `\(E[y_i]\)`

<img src="03_models_files/figure-html/unnamed-chunk-12-1.png" width="396" style="display: block; margin: auto;" />

--

- How much variation in `\(y\)` is explained by `\(x\)`?

--

- Useful for assessing whether data violate model assumptions

---
class:inverse, center, middle

# assumptions

---
# assumptions

#### **EVERY** model has assumptions

--

- Assumptions are necessary to simplify the real world into a workable model

--

- If your data violate the assumptions of your model, inferences *may* be invalid

--

- **Always** know (and test) the assumptions of your model<sup>1</sup>

.footnote[[1] You know what happens when you assume...]

---
# linear model assumptions

<br/>

`$$\Large y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$`

`$$\Large \epsilon_i \sim normal(0, \sigma)$$`

<br/>

--

1) **Linearity**: The relationship between `\(x\)` and `\(y\)` is linear

--

2) **Normality**: The residuals are normally distributed<sup>2</sup>

.footnote[[2] Note that these assumptions apply to the residuals, not the data!]
--

3) **Homoscedasticity**: The residuals have a constant variance at every level of `\(x\)`

--

4) **Independence**: The residuals are independent (i.e., uncorrelated with each other)

???
Because virtually every model we will use this semester is a linear model, these assumptions apply to everything we will discuss from here on out

---
# linear models

#### Very flexible

--

- Predictor(s) can take different forms (binary, continuous, factor)

--

- Can contain many predictors

--

- Can model non-linear relationships

--

#### Link different "tests" (e.g., t-tests, ANOVA, ANCOVA, linear regression)

--

#### Can be used for different statistical goals

- Estimating unknown parameters

- Testing hypotheses

- Describing stochastic systems

- Making predictions that account for uncertainty

---
# looking ahead

<br/>

#### **Next time:** t-tests and Null Hypothesis Testing

<br/>

#### **Reading:** Quinn chp. 3
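---
# bonus: simulating the linear model

The deterministic + stochastic recipe is easy to check by simulation: generate data from the "simple" example (`\(E[y_i] = -2 + 0.5 x_i\)`, `\(\sigma = 0.25\)`), refit a line by least squares, and confirm the estimates land near the true coefficients. A minimal sketch (in Python rather than the R used for the lecture figures; the sample size and `\(x\)` range are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(42)

# Simulate from the model: deterministic part E[y_i] = -2 + 0.5 * x_i,
# stochastic part y_i ~ normal(E[y_i], sigma = 0.25)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [random.gauss(-2 + 0.5 * xi, 0.25) for xi in x]

# Refit by ordinary least squares (closed-form simple linear regression)
xbar = statistics.fmean(x)
ybar = statistics.fmean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx           # slope estimate, should be near 0.5
b0 = ybar - b1 * xbar    # intercept estimate, should be near -2

# Residuals: observed minus predicted values; they sum to ~0 by construction
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

Rerunning with a larger `\(\sigma\)` spreads the points further from the line and widens the spread of the estimates, which is a useful way to build intuition for the stochastic part.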