LECTURE 1: introduction to statistics

class: center, middle, inverse, title-slide

.title[
# LECTURE 1: introduction to statistics
]
.subtitle[
## FANR 6750 (Experimental design)
]
.author[
### Fall 2023
]

---

class: inverse

# outline

#### 1) What is statistics?

--

#### 2) Statistics and the scientific method

--

#### 3) Populations vs. samples

---

# what is statistics?

> The study of the collection, analysis, interpretation, presentation, and organization of data (Dodge 2006)

--
 
> The science of learning from data in the face of **uncertainty** (various)

--

`$$\text{Statistics} = \text{Information} + \text{Uncertainty}$$`

---
# why do we need statistics?

### Common tasks

- Identify relationships between variables

--
- Estimate unknown parameters

--
- Test hypotheses

--
- Describe stochastic systems

--
- Make predictions that account for uncertainty

---
# stats and the scientific method

#### Ways of learning

.pull-left[
**Inductive reasoning**

- Often attributed to Francis Bacon (and others)

- Consistent observations -> general principle

- Problem: "confirmatory" observations can't disprove theory

- Example: I've only seen birds that fly :: all birds can fly
]

.pull-right[
**Deductive reasoning**

- Hypothetico-deductive approach, championed by Karl Popper

- Theory -> predictions -> observations

- Based on *falsification*

- Example: All birds can fly :: penguins are birds :: penguins can fly
]

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)
- Anecdotes  
- Correlations/visual analysis  
- Exploratory modeling (i.e., fishing)
    
---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation
- Formed from patterns  
- Should focus on mechanisms ("because", "controls", "adapted to") 
- Should be falsifiable  
- Ideally, more than one alternative hypothesis

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

3) Predictions
- If the hypothesis is true, what do you expect to see?  
- Focus on things **we can measure**
- More predictions = better
- "associated", "correlated", "greater/less than"

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

3) Predictions

4) Data collection
- Can be observational, but ideally a manipulative experiment  
- Sampling must be *designed* to answer question

---
# stats and the scientific method

#### Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

2) Hypothesis formation

3) Predictions

4) Data collection

5) Models and testing  
- A model is a mathematical abstraction of the hypothesis
- The model is used to "confront" the hypothesis with data (via predictions)
- Draw conclusions: Do the data support the hypothesis?

---
# stats and the scientific method

#### Example
1) **Pattern**: Trees at higher elevations are shorter than at low elevations

2) **Hypotheses**

3) **Predictions**

4) **Data collection**<sup>1</sup>

5) **Models**<sup>1</sup>

.footnote[[1] We'll get to these!]

---
class: inverse, middle, center

# causal inference

---
# causal inference

#### Often, we want to know whether `$x$` influences `$y$`

- In other words, if we change `$x$`, will `$y$` change also (and by how much)?

- Harder than it seems! Why?

- Generally restricted to *manipulative* experiments

    + Well-designed experiments ensure that "treatment assignment is independent of the potential outcomes" (Gelman et al. 2021)

- Increasing interest in causal inference from observational studies

    + This is something we will discuss later in the semester
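
---
# causal inference

A minimal sketch of why random assignment matters. Everything here is hypothetical: the units, the hidden trait called `vigor`, and the helper `vigor_gap` are invented for illustration. The idea is that randomization tends to balance hidden traits between treated and control groups, whereas non-random assignment can build a difference in before any treatment is applied.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical experimental units, each with a hidden trait ("vigor")
# that could influence the outcome independently of any treatment
n_units = 200
vigor = [random.gauss(0, 1) for _ in range(n_units)]


def vigor_gap(treated_ids):
    """Mean vigor of treated units minus mean vigor of control units."""
    treated_ids = set(treated_ids)
    treated = [vigor[i] for i in range(n_units) if i in treated_ids]
    control = [vigor[i] for i in range(n_units) if i not in treated_ids]
    return statistics.mean(treated) - statistics.mean(control)


# Random assignment: half of the units chosen at random receive the treatment
random_ids = random.sample(range(n_units), n_units // 2)

# Non-random assignment: the most vigorous half receives the treatment
biased_ids = sorted(range(n_units), key=lambda i: vigor[i])[n_units // 2:]

random_gap = vigor_gap(random_ids)
biased_gap = vigor_gap(biased_ids)

print(f"vigor gap, random assignment: {random_gap:.2f}")
print(f"vigor gap, biased assignment: {biased_gap:.2f}")
```

Under random assignment the gap should be close to zero; under the biased scheme the groups differ before the experiment even starts, so any later difference in outcomes could not be attributed to the treatment alone.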
    
---
class: inverse, middle, center

# uncertainty

---
# populations vs samples

#### **Hypothesis**: New plant variety is more disease resistant than current variety

#### **Prediction**: Disease prevalence is lower in new variety than in current variety

### How can we determine whether the hypothesis is true?

---
class:inverse

# populations vs samples

#### Population  
- A collection of subjects of interest

- Often, a biologically meaningful unit

- Sometimes a process of interest

### Question: What is the population in our example?

---
# population

#### Note:
1) This is the **population**

2) There is **variation** within and among groups, but:

3) The hypothesis is correct (mean prevalence is lower in new vs. current variety)

---
class:inverse

# populations vs samples

#### Population  
- A collection of subjects of interest

- Often, a biologically meaningful unit

- Sometimes a process of interest

#### Sample

- A finite subset of the population of interest, i.e., the data we collect

- Samples allow us to draw inferences about the population

- Good samples are:
    + Random  
    + Representative  
    + Sufficiently large
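
---
# populations vs samples

A rough sketch of why "random" and "representative" matter, under invented assumptions: a hypothetical field of 100 plots in which disease prevalence increases from the edge to the interior. The numbers and variable names are made up for illustration only.

```python
import random
import statistics

random.seed(2)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: prevalence in 100 plots, rising from edge (plot 0)
# to interior (plot 99), plus a little plot-to-plot noise
population = [0.10 + 0.004 * i + random.gauss(0, 0.02) for i in range(100)]
population_mean = statistics.mean(population)

n = 10

# Convenience sample: the 10 plots closest to the field edge
convenience_sample = population[:n]

# Simple random sample: every plot has the same chance of being chosen
random_sample = random.sample(population, n)

convenience_mean = statistics.mean(convenience_sample)
random_mean = statistics.mean(random_sample)

print(f"population mean:         {population_mean:.3f}")
print(f"convenience sample mean: {convenience_mean:.3f}")
print(f"random sample mean:      {random_mean:.3f}")
```

In this made-up field, the convenience sample only sees low-prevalence edge plots, so its mean is biased no matter how carefully each plot is measured. The random sample is still uncertain, but it is not systematically off.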

---
# sample

---
# sample

#### Note:
1) This is *a* **sample**

2) The sample means are our best estimates of the population means

3) But the sample means will almost never exactly equal the population means (Uncertainty!)
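
---
# sample

A small simulation of the plant variety example, as a sketch only. The population values are invented (drawn from beta distributions with means of about 0.30 and 0.20), and the sample size of 15 fields per variety is an arbitrary assumption.

```python
import random
import statistics

random.seed(3)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: disease prevalence on every field of each variety
population = {
    "current": [random.betavariate(6, 14) for _ in range(5000)],  # mean near 0.30
    "new": [random.betavariate(4, 16) for _ in range(5000)],      # mean near 0.20
}

n = 15  # assumed number of fields sampled per variety

pop_means = {}
sample_means = {}
for variety, values in population.items():
    sample = random.sample(values, n)  # simple random sample, no replacement
    pop_means[variety] = statistics.mean(values)
    sample_means[variety] = statistics.mean(sample)
    print(
        f"{variety:>8}: population mean = {pop_means[variety]:.3f}, "
        f"sample mean = {sample_means[variety]:.3f}"
    )
```

In practice we only ever see the sample means. The population means are printed here only because the population was simulated.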

---
# summary statistics

### Measures of central tendency

- Sample mean

`$$\large \bar{y} = \frac{\sum_{i=1}^n y_i}{n}$$`

--
- Median

--

- Mode
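
---
# summary statistics

The three measures above can be computed in a few lines. This is a sketch using Python's standard `statistics` module; the data vector `y` is made up for illustration.

```python
import statistics

y = [2, 3, 3, 5, 7]  # made-up observations

# Sample mean: sum of the observations divided by n
y_bar = sum(y) / len(y)

# Median: middle value of the sorted observations
y_median = statistics.median(y)

# Mode: most frequent value
y_mode = statistics.mode(y)

print(y_bar, y_median, y_mode)  # 4.0 3 3
```

Here the mean (4.0) is pulled above the median (3) by the single large value of 7.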

---
# summary statistics

### Measures of dispersion

- Sample variance

`$$\large s^2 = \frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1}$$`

--
- Sample standard deviation

`$$\large s = \sqrt{s^2}$$`

--
- Range
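
---
# summary statistics

A short sketch of the dispersion measures above, written to follow the formulas term by term and then checked against Python's standard `statistics` module. The data vector `y` is made up for illustration.

```python
import math
import statistics

y = [2, 3, 3, 5, 7]  # made-up observations
n = len(y)
y_bar = sum(y) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
s2 = sum((y_i - y_bar) ** 2 for y_i in y) / (n - 1)

# Sample standard deviation: square root of the sample variance
s = math.sqrt(s2)

# Range: largest observation minus smallest observation
y_range = max(y) - min(y)

print(s2, s, y_range)  # 4.0 2.0 5

# The hand-written versions agree with the library functions
assert math.isclose(s2, statistics.variance(y))
assert math.isclose(s, statistics.stdev(y))
```

Note that `statistics.variance` and `statistics.stdev` use the `n - 1` denominator, matching the sample formulas on the previous slide.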

---
# sampling error

Every sample has a different mean (and standard deviation) - more uncertainty!
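
---
# sampling error

A simulation sketch of sampling error. The population (10,000 values with a mean near 50 and a standard deviation near 10), the sample sizes, and the helper `draw_sample_means` are all assumptions made for illustration.

```python
import random
import statistics

random.seed(4)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 10,000 measurements
population = [random.gauss(50, 10) for _ in range(10_000)]


def draw_sample_means(n, n_samples=1000):
    """Draw repeated random samples of size n and return each sample's mean."""
    return [
        statistics.mean(random.sample(population, n))
        for _ in range(n_samples)
    ]


spread = {}
for n in (5, 50):
    means = draw_sample_means(n)
    spread[n] = statistics.stdev(means)  # variability among the sample means
    print(
        f"n = {n:>2}: sample means range from {min(means):.1f} "
        f"to {max(means):.1f} (SD = {spread[n]:.2f})"
    )
```

Each sample gives a different mean, but the means vary less when samples are larger.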

---
# sampling = uncertainty

#### Because populations (usually) cannot be measured in their entirety, sampling is essential

#### But sampling is inherently *stochastic*

- Sampling produces uncertainty

- Unavoidable (but that's OK!)

> Doubt is not a pleasant condition, but certainty is absurd -- Voltaire

#### Statistics is what allows us to learn about the **population** using **samples** in the face of **uncertainty**

- The primary goal of this class is for you to understand how to make robust inferences that account for uncertainty (and the limitations of those inferences)

- We will return to this basic concept (sampling error) many times this semester

---
# looking ahead

### **Next time**: Introduction to linear models

### **Reading**: [Fieberg chp. 1.2-1.4](https://statistics4ecologists-v1.netlify.app/linreg.html#data-example-sustainable-trophy-hunting-of-male-african-lions)