LECTURE 18: logistic regression

FANR 6750 (Experimental design)




Fall 2023

outline

1) Motivation


2) Model


3) Model fitting


4) Model interpretation

logistic regression

Logistic regression is a specific type of GLM in which the response variable follows a binomial distribution and the link function is (usually) the logit

Logistic regression is used to model a binary response variable as a function of explanatory variables

Examples:

  • Is presence of a focal species related to habitat type?

  • Is survival a function of body condition?

  • Is disease prevalence related to population density?

logistic regression


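$$y_i \sim \mathrm{Bernoulli}(p_i)$$

$$\log\left(\frac{p_i}{1-p_i}\right) = \mathrm{logit}(p_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots$$

where:

  • $y_i$ is the response (coded as 0/1)

  • $p_i$ is the probability of success for sample unit $i$

  • $\frac{p_i}{1-p_i}$ is the odds of success

  • $\log\left(\frac{p_i}{1-p_i}\right)$ is the log odds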
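As a rough illustration of the model above, the following R sketch simulates data from it. The coefficient values, sample size, and variable names (`beta0`, `beta1`, `x`) are made up for illustration and do not come from the lecture.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n     <- 200                     # hypothetical number of sample units
beta0 <- -1                      # hypothetical intercept (log-odds scale)
beta1 <- 0.5                     # hypothetical slope (log-odds scale)
x     <- rnorm(n)                # hypothetical explanatory variable

log_odds <- beta0 + beta1 * x    # linear predictor = logit(p_i)
p        <- plogis(log_odds)     # inverse logit: probability of success
y        <- rbinom(n, size = 1, prob = p)   # y_i ~ Bernoulli(p_i)

# The logit of p recovers the linear predictor
all.equal(qlogis(p), log_odds)
```

In base R, `qlogis()` is the logit and `plogis()` is its inverse, so no extra packages are needed.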

logit link

The logit link allows us to transform the probability $p_i$ (constrained to 0 to 1) to the real scale ($-\infty$ to $\infty$)

$$\log\left(\frac{0.5}{1-0.5}\right) = \log(1) = 0$$

The inverse logit $\frac{e^{\mu}}{1+e^{\mu}}$ allows us to transform the linear predictor to the probability scale

$$\frac{e^{0}}{1+e^{0}} = \frac{1}{2}$$

The odds $\frac{p}{1-p}$ range from 0 to $\infty$ and express the probability of success relative to the probability of failure

  • 1:2 odds = 0.33/(1 − 0.33) ≈ 0.5: failure is twice as likely as success

  • 4:1 odds = 0.8/(1 − 0.8) = 4: success is four times as likely as failure
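The inverse logit follows from solving the logit for $p$. A short derivation, writing $\mu$ for the linear predictor:

```latex
\begin{aligned}
\log\left(\frac{p}{1-p}\right) &= \mu \\
\frac{p}{1-p} &= e^{\mu} \\
p &= e^{\mu} - p\,e^{\mu} \\
p &= \frac{e^{\mu}}{1+e^{\mu}}
\end{aligned}
```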

logit link


log odds = log(p/(1-p))   -2.20   -1.10   0.0   1.39    4.60
odds = p/(1-p)             0.11    0.33   1.0   4.00   99.00
p                          0.10    0.25   0.5   0.80    0.99
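The rows of this table can be reproduced with a few lines of base R. This is only a sketch; the object names (`p`, `odds`, `log_odds`) are chosen here for illustration.

```r
p        <- c(0.10, 0.25, 0.50, 0.80, 0.99)   # probabilities from the table
odds     <- p / (1 - p)                       # odds of success
log_odds <- log(odds)                         # same values as qlogis(p)

round(rbind(log_odds, odds, p), 2)

# plogis() is the inverse logit, so it maps the log odds back to p
all.equal(plogis(log_odds), p)
```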



example

Imagine we are interested in the effects of elevation and habitat on the probability of occurrence for a rare orchid

data("orchiddata")
head(orchiddata, n = 12)
##    presence abundance elevation habitat
## 1         0         0        58     Oak
## 2         1         7       191     Oak
## 3         0         0        43     Oak
## 4         1        11       374     Oak
## 5         1        11       337     Oak
## 6         1         1        64     Oak
## 7         1         4       195     Oak
## 8         1         6       263     Oak
## 9         0         0       181     Oak
## 10        1         1        59     Oak
## 11        1        50       489   Maple
## 12        1         5       317   Maple

raw data

library(ggplot2)
ggplot(orchiddata, aes(x = elevation, y = presence)) +
  geom_point() +
  scale_y_continuous("Orchid Occurrence") +
  scale_x_continuous("Elevation")


raw data

library(dplyr)
orchiddata %>%
  group_by(habitat) %>%
  summarise(group.prob = mean(presence)) %>%
  ggplot(aes(x = habitat, y = group.prob)) +
  geom_col(fill = "grey70", color = "black") +
  scale_y_continuous("Proportion of sites with orchids") +
  scale_x_discrete("Habitat")


the glm function


fm1 <- glm(presence ~ habitat + elevation,
           family = binomial(link = "logit"),
           data = orchiddata)
broom::tidy(fm1)


term          estimate  std.error  statistic  p.value
(Intercept)    -0.9960      1.217    -0.8184   0.4131
habitatOak     -0.0968      1.367    -0.0708   0.9436
habitatPine    -0.3372      1.382    -0.2441   0.8072
elevation       0.0137      0.006     2.2723   0.0231
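Because the orchid data are not included here, the sketch below fits the same kind of model to a simulated stand-in data set and then asks for a prediction on the probability scale. The data frame `sim`, the generating coefficients, and the new site are all hypothetical, so the estimates will not match the table above.

```r
set.seed(42)                                   # arbitrary seed
# Simulated stand-in data; NOT the orchid data from the lecture
sim <- data.frame(
  habitat   = factor(rep(c("Maple", "Oak", "Pine"), each = 50)),
  elevation = runif(150, min = 0, max = 500)
)
# Hypothetical "true" coefficients used only to generate fake responses
eta <- -1 + 0.01 * sim$elevation
sim$presence <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

fit <- glm(presence ~ habitat + elevation,
           family = binomial(link = "logit"),
           data = sim)

coef(fit)                                      # estimates on the log-odds scale

# Predicted probability for a hypothetical new site
new_site <- data.frame(habitat   = factor("Oak", levels = levels(sim$habitat)),
                       elevation = 200)
p_hat <- predict(fit, newdata = new_site, type = "response")
p_hat
```

`predict()` returns values on the link (log-odds) scale by default; `type = "response"` applies the inverse logit.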
interpreting parameters

term          estimate  std.error  statistic  p.value
(Intercept)    -0.9960      1.217    -0.8184   0.4131
habitatOak     -0.0968      1.367    -0.0708   0.9436
habitatPine    -0.3372      1.382    -0.2441   0.8072
elevation       0.0137      0.006     2.2723   0.0231

  • The probability of occurrence in maple-dominated forest at sea level = 0.27 (plogis(-0.996))

  • The probability of occurrence in oak-dominated forest at sea level = 0.25 (plogis(-0.996 - 0.097))

    • The odds of occurrence in oak are exp(-0.097) = 0.91 times the odds in maple (an odds ratio)

  • The probability of occurrence in pine-dominated forest at sea level = 0.21 (plogis(-0.996 - 0.34))

    • The odds of occurrence in pine are exp(-0.34) = 0.71 times the odds in maple (an odds ratio)
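The habitat calculations above can be collected into one short R sketch that starts from the estimates in the coefficient table. The object names (`b`, `eta`) are chosen here for illustration.

```r
# Estimates copied from the coefficient table (log-odds scale)
b <- c(intercept = -0.9960, oak = -0.0968, pine = -0.3372, elevation = 0.0137)

# Linear predictor at elevation = 0 for each habitat (maple is the reference level)
eta <- c(maple = b[["intercept"]],
         oak   = b[["intercept"]] + b[["oak"]],
         pine  = b[["intercept"]] + b[["pine"]])

round(plogis(eta), 2)                    # probabilities of occurrence
round(exp(b[c("oak", "pine")]), 2)       # odds ratios relative to maple
```

Maple does not appear in the coefficient table because it is the reference level, so its log odds at elevation 0 are the intercept alone.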
interpreting parameters

term          estimate  std.error  statistic  p.value
(Intercept)    -0.9960      1.217    -0.8184   0.4131
habitatOak     -0.0968      1.367    -0.0708   0.9436
habitatPine    -0.3372      1.382    -0.2441   0.8072
elevation       0.0137      0.006     2.2723   0.0231

What about elevation?

First, let's visualize the fitted relationship
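One way to draw such a plot is to compute the fitted curves directly from the estimates in the table. The sketch below uses base R graphics; the elevation grid (0 to 500) is an assumption, and no uncertainty bands are drawn.

```r
# Estimates copied from the coefficient table (log-odds scale)
b0     <- -0.9960                      # intercept (maple, elevation 0)
b_elev <-  0.0137                      # elevation slope
b_hab  <- c(Maple = 0, Oak = -0.0968, Pine = -0.3372)   # habitat offsets

elev <- seq(0, 500, by = 10)           # hypothetical elevation grid

# One column of fitted probabilities per habitat
p_hat <- sapply(b_hab, function(h) plogis(b0 + h + b_elev * elev))

matplot(elev, p_hat, type = "l", lty = 1, col = 1:3, ylim = c(0, 1),
        xlab = "Elevation", ylab = "Probability of occurrence")
legend("bottomright", legend = colnames(p_hat), lty = 1, col = 1:3)
```

The curves are straight lines on the log-odds scale but S-shaped on the probability scale.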

example

What's the change in probability of occurrence at 1 m vs. 0 m elevation (in maple habitat)?

  • plogis(-0.996 + 0.014 * 1) - plogis(-0.996 + 0.014 * 0) ≈ 0.0028

What's the change in probability of occurrence at 101 m vs. 100 m elevation (in maple habitat)?

  • plogis(-0.996 + 0.014 * 101) - plogis(-0.996 + 0.014 * 100) ≈ 0.0034

Change in probability is not linear: the same 1 m increase in elevation shifts the probability by a different amount depending on the starting elevation
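One way to see why is to differentiate the inverse logit. The following is a sketch for a single explanatory variable $x$ with slope $\beta_1$:

```latex
p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
\quad\Longrightarrow\quad
\frac{dp}{dx} = \beta_1 \, p(x)\,\bigl(1 - p(x)\bigr)
```

The term $p(x)(1 - p(x))$ is largest at $p = 0.5$ and shrinks toward 0 as $p$ approaches 0 or 1. The per-meter change is therefore larger at 100 m, where the fitted probability is closer to 0.5, than at 0 m.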

example

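Change in probability is not linear.

But the multiplicative change in odds is constant

  • Odds ratio: $\frac{p_2/(1-p_2)}{p_1/(1-p_1)}$
p1 <- plogis(-0.996 + 0.014 * 0)     # probability at 0 m
p2 <- plogis(-0.996 + 0.014 * 1)     # probability at 1 m
(OR1 <- (p2/(1-p2))/(p1/(1-p1)))     # odds ratio, 1 m vs. 0 m
## [1] 1.014
p3 <- plogis(-0.996 + 0.014 * 400)   # probability at 400 m
p4 <- plogis(-0.996 + 0.014 * 401)   # probability at 401 m
(OR2 <- (p4/(1-p4))/(p3/(1-p3)))     # odds ratio, 401 m vs. 400 m
## [1] 1.014

What is the change in odds for a one-unit change in elevation?

  • The odds are multiplied by $e^{\beta_{Elev}}$ = exp(0.014) = 1.0141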
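The reason the odds ratio does not depend on the starting elevation can be shown in two lines. This sketch writes $\beta_1$ for the elevation slope and holds the other explanatory variables fixed:

```latex
\begin{aligned}
\text{odds}(x) &= \frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x} \\
\frac{\text{odds}(x + 1)}{\text{odds}(x)}
  &= \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}
\end{aligned}
```

The $x$ terms cancel, so every one-unit increase multiplies the odds by the same factor $e^{\beta_1}$.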
summary

Logistic regression is used when we want to model a binary response as a function of explanatory variables

The response $y_i$ is modeled as arising from a Bernoulli distribution with probability $p_i$, and the logit link relates $p_i$ to the linear predictor (the inverse logit maps the linear predictor back to the probability scale)

All of the linear modeling concepts we have learned this semester (continuous/categorical explanatory variables, multiple regression, interactions, random effects) can be used within the logistic regression framework

Parameter estimates from the logistic regression measure the change in log odds for a one-unit change in the explanatory variables

  • The change in log odds per unit of $x$ is constant across all values of $x$, but the change in probability is not

looking ahead


Next time: Poisson regression


Reading: Fieberg chp. 15
