
LECTURE 1: introduction to statistics

FANR 6750 (Experimental design)




Fall 2024

outline


1) What is statistics?


2) Statistics and the scientific method


3) Populations vs. samples


what is statistics?



The study of the collection, analysis, interpretation, presentation, and organization of data (Dodge 2006)



The science of learning from data in the face of uncertainty (various)


Statistics = Information + Uncertainty

why do we need statistics?

Common tasks

  • Identify relationships between variables

  • Estimate unknown parameters

  • Test hypotheses

  • Describe stochastic systems

  • Make predictions that account for uncertainty

stats and the scientific method

Ways of learning

Inductive reasoning

  • Often attributed to Francis Bacon (and others)

  • Consistent observations -> general principle

  • Problem: "confirmatory" observations can't disprove theory

  • Example: I've only seen birds that fly ∴ all birds can fly

Deductive reasoning

  • Formalized by Karl Popper

  • Theory -> predictions -> observations

  • Based on falsification

  • Example: All birds can fly; penguins are birds ∴ penguins can fly (a prediction that observation can falsify)

stats and the scientific method

Ways of learning (real world)

1) Pattern identification (i.e., exploratory studies)

  • Anecdotes
  • Correlations/visual analysis
  • Exploratory modeling (i.e., fishing)

2) Hypothesis formation

  • Formed from patterns
  • Should focus on mechanisms ("because", "controls", "adapted to")
  • Should be falsifiable
  • Ideally more than one alternative hypothesis

3) Predictions

  • If the hypothesis is true, what do you expect to see?
  • Focus on things we can measure
  • The more predictions, the better
  • "associated", "correlated", "greater/less than"

4) Data collection

  • Can be observational, but ideally a manipulative experiment
  • Sampling must be designed to answer the question

5) Models and testing

  • A model is a mathematical abstraction of the hypothesis
  • The model is used to "confront" the hypothesis with data (via predictions)
  • Draw conclusions: Do the data support the hypothesis?

stats and the scientific method

Example

1) Pattern: Trees at higher elevations are shorter than at low elevations

2) Hypotheses

3) Predictions

4) Data collection [1]

5) Models [1]

[1] We'll get to these!

causal inference

Often, we want to know whether x influences y

  • In other words, if we change x, will y change also (and by how much)?

  • Harder than it seems! Why?

  • Generally restricted to manipulative experiments

    • Well-designed experiments ensure that "treatment assignment is independent of the potential outcomes" (Gelman et al. 2021)
  • Increasing interest in causal inference from observational studies

    • This is something we may discuss later in the semester
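
Why independence of treatment assignment matters can be sketched with a small simulation. Everything here is an assumption made for illustration: the true effect of 2.0, the normal distribution of baseline outcomes, and the "observational" mechanism in which units with a high baseline are more likely to be treated. It is not a method from the course, only a toy comparison of the two designs.

```python
import random
import statistics

random.seed(7)  # arbitrary seed for reproducibility

TRUE_EFFECT = 2.0  # hypothetical effect of treatment x on response y
n = 20_000

# Potential outcomes for each unit: y0 if untreated, y1 if treated
y0 = [random.gauss(10, 3) for _ in range(n)]
y1 = [y + TRUE_EFFECT for y in y0]


def diff_in_means(treated):
    """Mean observed y among treated units minus mean among untreated units."""
    obs_t = [y1[i] for i in range(n) if treated[i]]
    obs_c = [y0[i] for i in range(n) if not treated[i]]
    return statistics.mean(obs_t) - statistics.mean(obs_c)


# Randomized experiment: a coin flip decides treatment, ignoring y0 and y1
randomized = [random.random() < 0.5 for _ in range(n)]

# Observational study (assumed mechanism): units with a high baseline y0
# are more likely to be treated, so assignment depends on the outcomes
observational = [random.random() < (0.8 if y0[i] > 10 else 0.2) for i in range(n)]

est_randomized = diff_in_means(randomized)
est_observational = diff_in_means(observational)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"randomized estimate:    {est_randomized:.2f}")
print(f"observational estimate: {est_observational:.2f}")
```

Under these assumptions the randomized comparison should land close to the true effect, while the observational comparison is inflated because treated units started with higher y.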

uncertainty

populations vs samples

Hypothesis: New plant variety is more disease resistant than current variety

Prediction: Disease prevalence is lower in new variety than in current variety

How can we determine whether the hypothesis is true?

populations vs samples

Population

  • A collection of subjects of interest

  • Often, a biologically meaningful unit

  • Sometimes a process of interest

Question: What is the population in our example?


population

Note:

1) This is the population

2) There is variation within and among groups, but:

3) The hypothesis is correct (mean prevalence is lower in new vs. current variety)

populations vs samples

Population

  • A collection of subjects of interest

  • Often, a biologically meaningful unit

  • Sometimes a process of interest

Sample

  • A finite subset of the population of interest, i.e., the data we collect

  • Samples allow us to draw inferences about the population

  • Good samples are:

    • Random
    • Representative
    • Sufficiently large

sample

Note:

1) This is a sample

2) The sample means are our best estimates of the population means

3) But the sample means will almost never exactly equal the population means (uncertainty!)
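
A minimal sketch of the population/sample distinction for the plant variety example. The population sizes and the prevalences (0.40 for the current variety, 0.25 for the new one) are invented for illustration, and `prevalence` is a hypothetical helper, not something from the course materials.

```python
import random

random.seed(42)  # arbitrary seed for reproducibility

# Hypothetical population: 1 = diseased plant, 0 = healthy plant.
# The prevalences (0.40 and 0.25) are invented for illustration.
current = [1] * 4000 + [0] * 6000   # current variety, prevalence 0.40
new = [1] * 2500 + [0] * 7500       # new variety, prevalence 0.25


def prevalence(plants):
    """Proportion of diseased plants."""
    return sum(plants) / len(plants)


# A simple random sample of 100 plants from each variety
sample_current = random.sample(current, 100)
sample_new = random.sample(new, 100)

print(f"population: current = {prevalence(current):.2f}, new = {prevalence(new):.2f}")
print(f"sample:     current = {prevalence(sample_current):.2f}, new = {prevalence(sample_new):.2f}")
```

In practice only the second printed line is available to us; the first line is what we are trying to learn about.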

summary statistics

Measures of central tendency

  • Sample mean

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$


  • Median


  • Mode
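
The three measures of central tendency can be computed in a few lines. This is a sketch using Python's standard library; the tree heights are made-up values used only for illustration.

```python
import statistics

# Hypothetical tree heights in meters (made-up values for illustration only)
heights = [12.1, 9.8, 15.3, 9.8, 11.4, 13.0, 9.8]

n = len(heights)
y_bar = sum(heights) / n             # sample mean: sum of y_i divided by n
median = statistics.median(heights)  # middle value of the sorted data
mode = statistics.mode(heights)      # most frequent value

print(f"mean = {y_bar:.2f}, median = {median}, mode = {mode}")
```

Because a few tall trees pull the mean upward, the mean, median, and mode need not agree.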

summary statistics

Measures of dispersion

  • Sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$


  • Sample standard deviation

$$s = \sqrt{s^2}$$


  • Range
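
The measures of dispersion can be sketched the same way, writing the variance out by hand so the n - 1 denominator is visible. The heights are again made-up values; the comparison against `statistics.variance` and `statistics.stdev` is only a sanity check.

```python
import math
import statistics

# Hypothetical tree heights in meters (made-up values for illustration only)
heights = [12.1, 9.8, 15.3, 9.8, 11.4, 13.0, 9.8]

n = len(heights)
y_bar = sum(heights) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
s2 = sum((y - y_bar) ** 2 for y in heights) / (n - 1)
s = math.sqrt(s2)                          # sample standard deviation
data_range = max(heights) - min(heights)   # range: largest minus smallest value

print(f"variance = {s2:.3f}, sd = {s:.3f}, range = {data_range:.1f}")

# The standard library functions also divide by n - 1
print(math.isclose(s2, statistics.variance(heights)))
print(math.isclose(s, statistics.stdev(heights)))
```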

sampling error

Every sample has a different mean (and standard deviation) - more uncertainty!
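
Sampling error is easy to see by simulation. The sketch below assumes a hypothetical population of 10,000 normally distributed values (mean about 50, standard deviation about 10) and draws five random samples of 30; all of these numbers are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 values with mean ~50 and SD ~10
population = [random.gauss(50, 10) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Draw several independent random samples of the same size
sample_means = []
for _ in range(5):
    sample = random.sample(population, 30)
    sample_means.append(statistics.mean(sample))

print(f"population mean: {pop_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f} (error = {m - pop_mean:+.2f})")
```

Each sample gives a slightly different estimate even though the population never changes.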

sampling = uncertainty

Because populations (usually) cannot be measured, sampling is essential

But sampling is inherently stochastic

  • sampling produces uncertainty

  • unavoidable (but that's ok!)

Doubt is not a pleasant condition, but certainty is absurd -- Voltaire

Statistics is what allows us to learn about the population using samples in the face of uncertainty

  • the primary goal of this class is for you to understand how to make robust inferences that account for uncertainty (and the limitations of those inferences)

  • we will return to this basic concept (sampling error) many times this semester


looking ahead


Next time: Introduction to linear models


Reading: Fieberg chp. 1.2-1.4
