class: center, middle, inverse, title-slide

.title[
# LECTURE 1: introduction to statistics
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
###
Fall 2023
]

---
class: inverse

# outline

<br/>

#### 1) What is statistics?

<br/>
--

#### 2) Statistics and the scientific method

<br/>
--

#### 3) Populations vs. samples

<br/>

---
# what is statistics?

<br/>
<br/>

> The study of the collection, analysis, interpretation, presentation, and organization of data (Dodge 2006)

<br/>
--

<br/>

> The science of learning from data in the face of **uncertainty** (various)

--

<br/>

`$$Statistics = Information + Uncertainty$$`

---
# why do we need statistics?

### Common tasks

--

- Identify relationships between variables

<br/>
--

- Estimate unknown parameters

<br/>
--

- Test hypotheses

<br/>
--

- Describe stochastic systems

<br/>
--

- Make predictions that account for uncertainty

---
# stats and the scientific method

#### Ways of learning

--

.pull-left[
**Inductive reasoning**

- Often attributed to Francis Bacon (and others)

- Consistent observations -> general principle

- Problem: "confirmatory" observations can't disprove a theory

- Example: I've only seen birds that fly :: all birds can fly
]

--

.pull-right[
**Deductive reasoning**

- Formalized by Karl Popper

- Theory -> predictions -> observations

- Based on *falsification*

- Example: All birds can fly :: penguins are birds :: penguins can fly
]

---
# stats and the scientific method

#### Ways of learning (real world)

--

1) Pattern identification (i.e., exploratory studies)

  - Anecdotes

  - Correlations/visual analysis

  - Exploratory modeling (i.e., fishing)

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

  - Formed from patterns

  - Should focus on mechanisms ("because", "controls", "adapted to")

  - Should be falsifiable

  - Ideally more than one alternative

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

3) Predictions

  - If the hypothesis is true, what do you expect to see?
  - Focus on things **we can measure**

  - More = better

  - "associated", "correlated", "greater/less than"

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

3) Predictions

4) Data collection

  - Can be observational, but ideally a manipulative experiment

  - Sampling must be *designed* to answer the question

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

3) Predictions

4) Data collection

5) Models and testing

  - A model is a mathematical abstraction of a hypothesis

  - The model is used to "confront" the hypothesis with data (via predictions)

  - Draw conclusions: Do the data support the hypothesis?

---
# stats and the scientific method

#### Example

1) **Pattern**: Trees at higher elevations are shorter than trees at lower elevations

--

2) **Hypotheses**

--

3) **Predictions**

--

4) **Data collection**<sup>1</sup>

5) **Models**<sup>1</sup>

.footnote[[1] We'll get to these!]

---
class: inverse, middle, center

# causal inference

---
# causal inference

#### Often, we want to know whether `\(x\)` influences `\(y\)`

--

- In other words, if we change `\(x\)`, will `\(y\)` change also (and by how much)?

--

- Harder than it seems! Why?

--

- Generally restricted to *manipulative* experiments

--

  + Well-designed experiments ensure that "treatment assignment is independent of the potential outcomes" (Gelman et al. 2021)

--

- Increasing interest in causal inference from observational studies

  + This is something we will discuss later in the semester

---
class: inverse, middle, center

# uncertainty

---
# populations vs samples

#### **Hypothesis**: New plant variety is more disease resistant than current variety

<img src="https://upload.wikimedia.org/wikipedia/commons/3/3e/Hemileia_vastatrix_-_coffee_leaf_rust.jpg" width="35%" style="display: block; margin: auto;" />

--

#### **Prediction**: Disease prevalence is lower in new variety than in current variety

--

### How can we determine whether the hypothesis is true?

---
class:inverse

# populations vs samples

#### Population

- A collection of subjects of interest

- Often, a biologically meaningful unit

- Sometimes a process of interest

--

### Question: What is the population in our example?

---
# population

<img src="01_intro_to_stats_files/figure-html/pop-1.png" width="648" style="display: block; margin: auto;" />

#### Note:

1) This is the **population**

2) There is **variation** within and among groups, but:

3) The hypothesis is correct (mean prevalence is lower in new vs. current variety)

---
class:inverse

# populations vs samples

#### Population

- A collection of subjects of interest

- Often, a biologically meaningful unit

- Sometimes a process of interest

--

#### Sample

- A finite subset of the population of interest, i.e. the data we collect

- Samples allow us to draw inferences about the population

- Good samples are:

  + Random

  + Representative

  + Sufficiently large

---
# sample

<img src="01_intro_to_stats_files/figure-html/samp1-1.png" width="648" style="display: block; margin: auto;" />

---
# sample

<img src="01_intro_to_stats_files/figure-html/samp_lab-1.png" width="648" style="display: block; margin: auto;" />

--

#### Note:

1) This is *a* **sample**

2) The sample means are our best estimates of the population means

3) But the sample means will almost never exactly equal the population means (Uncertainty!)
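---
# sample

#### Sketch: one sample from a simulated population

The notes on the previous slide can be tried out in a few lines of Python. This is only a rough sketch under made-up assumptions: the population size (10,000 plants per variety), the means and spread of the per-plant disease measurements, the sample size (20), and the helper name `simulate_population` are all hypothetical and are not the values behind the figures.

```python
import random
import statistics


def simulate_population(mean, sd, size, rng):
    """Hypothetical population: one disease measurement per plant, clipped to [0, 1]."""
    return [min(1.0, max(0.0, rng.gauss(mean, sd))) for _ in range(size)]


rng = random.Random(6750)  # arbitrary seed so the sketch is repeatable

# Assumed population values: the new variety has the lower mean,
# i.e. the hypothesis is true in this simulated population.
population = {
    "current": simulate_population(0.50, 0.15, 10_000, rng),
    "new": simulate_population(0.40, 0.15, 10_000, rng),
}

# One simple random sample (without replacement) of 20 plants per variety
sample = {variety: rng.sample(values, 20) for variety, values in population.items()}

for variety in population:
    pop_mean = statistics.mean(population[variety])
    samp_mean = statistics.mean(sample[variety])
    print(f"{variety}: population mean = {pop_mean:.3f}, sample mean = {samp_mean:.3f}")
```

In practice only the sample means would be available. The population means are printed here only because the population was simulated.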
---
# summary statistics

### Measures of central tendency

- Sample mean

`$$\large \bar{y} = \frac{\sum_{i=1}^n y_i}{n}$$`

<br/>
--

- Median

<br/>
--

- Mode

---
# summary statistics

### Measures of dispersion

- Sample variance

`$$\large s^2 = \frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1}$$`

<br/>
--

- Sample standard deviation

`$$\large s = \sqrt{s^2}$$`

<br/>
--

- Range

---
# sampling error

Every sample has a different mean (and standard deviation) - more uncertainty!

<img src="01_intro_to_stats_files/figure-html/samp1_again-1.png" width="648" style="display: block; margin: auto;" />

<img src="01_intro_to_stats_files/figure-html/samp2-1.png" width="648" style="display: block; margin: auto;" />

---
# sampling = uncertainty

#### Because populations (usually) cannot be measured, sampling is essential

--

#### But sampling is inherently *stochastic*

- sampling produces uncertainty

- unavoidable (but that's ok!)

> Doubt is not a pleasant condition, but certainty is absurd -- Voltaire

--

#### Statistics is what allows us to learn about the **population** using **samples** in the face of **uncertainty**

- the primary goal of this class is for you to understand how to make robust inferences that account for uncertainty (and the limitations of those inferences)

- we will return to this basic concept (sampling error) many times this semester

---
# looking ahead

<br/>

### **Next time**: Introduction to linear models

<br/>

### **Reading**: [Fieberg chp. 1.2-1.4](https://statistics4ecologists-v1.netlify.app/linreg.html#data-example-sustainable-trophy-hunting-of-male-african-lions)
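
---
# sampling error

#### Sketch: summary statistics across repeated samples

The sample mean, sample variance, and sample standard deviation formulas from the summary statistics slides can be written out directly, then applied to several samples to see sampling error. This is a minimal sketch, not the code behind the figures: the simulated population (mean 0.45, standard deviation 0.15), the sample size of 20, and the function names are assumptions chosen for illustration.

```python
import math
import random


def sample_mean(y):
    """y-bar: sum of the observations divided by n."""
    return sum(y) / len(y)


def sample_variance(y):
    """s^2: sum of squared deviations from y-bar, divided by n - 1."""
    y_bar = sample_mean(y)
    return sum((y_i - y_bar) ** 2 for y_i in y) / (len(y) - 1)


def sample_sd(y):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(y))


rng = random.Random(1)  # arbitrary seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements (assumed values)
population = [rng.gauss(0.45, 0.15) for _ in range(10_000)]

# Three independent samples of 20: each gives a different mean and sd
for draw in range(1, 4):
    y = rng.sample(population, 20)
    print(f"sample {draw}: mean = {sample_mean(y):.3f}, sd = {sample_sd(y):.3f}")
```

Changing the sample size in `rng.sample(population, 20)` is a quick way to see how much the sample means move around from draw to draw.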