1) Motivation
2) Model
3) Model fitting
4) Model interpretation
Logistic regression is a specific type of GLM in which the response variable follows a binomial distribution and the link function is (usually) the logit
Logistic regression is used to model a binary response variable as a function of explanatory variables
Is presence of a focal species related to habitat type?
Is survival a function of body condition?
Is disease prevalence related to population density?
$$y_i \sim \text{Bernoulli}(p_i)$$
$$\log\left(\frac{p_i}{1-p_i}\right) = \text{logit}(p_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots$$
$y_i$ is the response (coded as 0/1)
$p_i$ is the probability of success for sample unit $i$
$\frac{p_i}{1-p_i}$ is the odds of success
$\log\left(\frac{p_i}{1-p_i}\right)$ is the log odds
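To make the model definition concrete, here is a minimal simulation sketch. The covariate `x` and the coefficients `beta0` and `beta1` are hypothetical values chosen only for illustration; the point is to show how the linear predictor, the probability, the odds, and the 0/1 response relate to one another.

```r
# Simulate binary responses from a logistic regression model.
# x, beta0 and beta1 are hypothetical values used only for illustration.
set.seed(1)
n <- 1000
x <- runif(n, min = 0, max = 500)   # hypothetical covariate (e.g., elevation)
beta0 <- -1                         # hypothetical intercept: log odds when x = 0
beta1 <- 0.01                       # hypothetical slope: change in log odds per unit x

eta  <- beta0 + beta1 * x           # linear predictor, logit(p_i)
p    <- exp(eta) / (1 + exp(eta))   # inverse logit maps eta to p_i in (0, 1)
odds <- p / (1 - p)                 # odds of success
y    <- rbinom(n, size = 1, prob = p)  # y_i ~ Bernoulli(p_i), coded 0/1

# Taking the log of the odds recovers the linear predictor
all.equal(log(odds), eta)
table(y)
```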
The logit link allows us to transform the probability $p_i$ (constrained to lie between 0 and 1) to the real scale ($-\infty$ to $\infty$)
$\log\left(\frac{0.5}{1-0.5}\right) = \log(1) = 0$
The inverse logit $\frac{e^{\mu}}{1+e^{\mu}}$ allows us to transform the linear predictor back to the probability scale
$\frac{e^0}{1+e^0} = \frac{1}{2}$
The odds $\frac{p}{1-p}$ range from 0 to $\infty$ and indicate the probability of success relative to the probability of failure
1:2 odds: $p = 1/3$, so $\frac{1/3}{1-1/3} = 0.5$; failure is twice as likely as success
4:1 odds: $p = 0.8$, so $\frac{0.8}{1-0.8} = 4$; success is four times as likely as failure
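In base R the logit and inverse logit are already available as `qlogis()` and `plogis()` (the quantile and distribution functions of the standard logistic distribution). The sketch below writes both transformations by hand (the names `logit` and `inv_logit` are our own, not from any package) and checks them against the built-ins and the worked values above.

```r
# Hand-written logit and inverse logit (illustrative helper names)
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(mu) exp(mu) / (1 + exp(mu))

logit(0.5)       # 0
inv_logit(0)     # 0.5

# Base R equivalents: qlogis() is the logit, plogis() the inverse logit
all.equal(qlogis(c(0.1, 0.8)), logit(c(0.1, 0.8)))
all.equal(plogis(c(-2, 1.5)), inv_logit(c(-2, 1.5)))

# Odds for the 1:2 and 4:1 examples
(1/3) / (1 - 1/3)
0.8 / (1 - 0.8)
```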
p | 0.10 | 0.25 | 0.5 | 0.80 | 0.99 |
---|---|---|---|---|---|
odds = p/(1-p) | 0.11 | 0.33 | 1.0 | 4.00 | 99.00 |
log odds = log(p/(1-p)) | -2.20 | -1.10 | 0.0 | 1.39 | 4.60 |
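The odds and log-odds rows of the table can be recomputed directly from the probabilities, which is a quick way to check the values or extend the table to other values of p:

```r
# Recompute the odds and log odds rows of the table from p
p <- c(0.10, 0.25, 0.5, 0.80, 0.99)
odds <- p / (1 - p)
log_odds <- log(odds)   # equivalent to qlogis(p)
round(rbind(p, odds, log_odds), 2)
```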
Imagine we are interested in the effects of elevation and habitat on the probability of occurrence for a rare orchid
data("orchiddata")
head(orchiddata, n = 12)
##    presence abundance elevation habitat
##  1        0         0        58     Oak
##  2        1         7       191     Oak
##  3        0         0        43     Oak
##  4        1        11       374     Oak
##  5        1        11       337     Oak
##  6        1         1        64     Oak
##  7        1         4       195     Oak
##  8        1         6       263     Oak
##  9        0         0       181     Oak
## 10        1         1        59     Oak
## 11        1        50       489   Maple
## 12        1         5       317   Maple
library(ggplot2)
ggplot(orchiddata, aes(x = elevation, y = presence)) +
  geom_point() +
  scale_y_continuous("Orchid Occurrence") +
  scale_x_continuous("Elevation")
library(dplyr)
orchiddata %>%
  group_by(habitat) %>%
  summarise(group.prob = mean(presence)) %>%
  ggplot(., aes(x = habitat, y = group.prob)) +
  geom_col(fill = "grey70", color = "black") +
  scale_y_continuous("Proportion of sites with orchids") +
  scale_x_discrete("Habitat")
We fit the model with the glm function:
fm1 <- glm(presence ~ habitat + elevation,
           family = binomial(link = "logit"),
           data = orchiddata)
broom::tidy(fm1)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.9960 | 1.217 | -0.8184 | 0.4131 |
habitatOak | -0.0968 | 1.367 | -0.0708 | 0.9436 |
habitatPine | -0.3372 | 1.382 | -0.2441 | 0.8072 |
elevation | 0.0137 | 0.006 | 2.2723 | 0.0231 |
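Because `orchiddata` comes from a course data package that may not be installed, here is a self-contained sketch on simulated data with the same structure. The sample size and the "true" coefficients in `b` are assumptions chosen to resemble the table above; the point is only to show that `glm()` with `family = binomial(link = "logit")` recovers the parameters of the model that generated the data.

```r
# Hypothetical stand-in for orchiddata: simulate presence/absence from
# assumed coefficients, then check that glm() recovers them.
set.seed(42)
n <- 2000
sim <- data.frame(
  habitat   = factor(sample(c("Maple", "Oak", "Pine"), n, replace = TRUE)),
  elevation = runif(n, min = 0, max = 500)
)

# Assumed "true" coefficients; Maple is the reference level (first alphabetically)
b <- c(intercept = -1, oak = -0.1, pine = -0.3, elev = 0.014)
eta <- b[["intercept"]] +
  b[["oak"]]  * (sim$habitat == "Oak") +
  b[["pine"]] * (sim$habitat == "Pine") +
  b[["elev"]] * sim$elevation
sim$presence <- rbinom(n, size = 1, prob = plogis(eta))

fit <- glm(presence ~ habitat + elevation,
           family = binomial(link = "logit"), data = sim)
round(coef(fit), 3)
```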
Maple is the reference level, so the intercept is the log odds of occurrence in maple-dominated forest at sea level (elevation = 0)
The probability of occurrence in maple-dominated forest at sea level = 0.27 (plogis(-0.996))
The probability of occurrence in oak-dominated forest at sea level = 0.25 (plogis(-0.996 - 0.097))
exp(-0.097) = 0.91: the odds of occurrence in oak-dominated forest are 0.91 times the odds in maple-dominated forest at the same elevation
The probability of occurrence in pine-dominated forest at sea level = 0.21 (plogis(-0.996 - 0.337))
exp(-0.337) = 0.71: the odds of occurrence in pine-dominated forest are 0.71 times the odds in maple-dominated forest at the same elevation
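The habitat calculations above can be reproduced from the estimates in the `tidy()` table (copied in by hand here, so no fitted model is needed). The sketch also checks that the ratio of odds computed from the probabilities matches the exponentiated coefficient:

```r
# Estimates copied from the tidy() table
b0     <- -0.9960   # (Intercept): maple, elevation 0
b_oak  <- -0.0968   # habitatOak
b_pine <- -0.3372   # habitatPine

# Probability of occurrence at elevation 0 in each habitat
p_sea_level <- setNames(plogis(c(b0, b0 + b_oak, b0 + b_pine)),
                        c("Maple", "Oak", "Pine"))
round(p_sea_level, 2)

# Odds ratios relative to maple (the reference level)
odds <- p_sea_level / (1 - p_sea_level)
round(odds[c("Oak", "Pine")] / odds[["Maple"]], 2)
round(exp(c(Oak = b_oak, Pine = b_pine)), 2)   # same values
```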
What about elevation?
First, let's visualize the fitted relationship
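The original figure did not survive extraction. As a sketch, the fitted curves can be rebuilt from the `tidy()` estimates; the 0 to 500 m elevation range is an assumption based on the rows printed by `head()`. With the fitted model object itself, `predict(fm1, newdata = grid, type = "response")` would give the same probabilities without typing in coefficients.

```r
library(ggplot2)

# Estimates copied from the tidy() table; elevation range is an assumption
est <- c(b0 = -0.9960, oak = -0.0968, pine = -0.3372, elev = 0.0137)
grid <- expand.grid(elevation = seq(0, 500, by = 5),
                    habitat   = c("Maple", "Oak", "Pine"))

# Linear predictor on the logit scale, then inverse logit to probability
grid$eta <- est[["b0"]] +
  est[["oak"]]  * (grid$habitat == "Oak") +
  est[["pine"]] * (grid$habitat == "Pine") +
  est[["elev"]] * grid$elevation
grid$p <- plogis(grid$eta)

ggplot(grid, aes(x = elevation, y = p, color = habitat)) +
  geom_line() +
  scale_y_continuous("Probability of occurrence", limits = c(0, 1)) +
  scale_x_continuous("Elevation")
```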
What's the change in probability of occurrence at 1 m vs 0 m elevation (in maple habitat)?
plogis(-0.996 + 0.014 * 1) - plogis(-0.996 + 0.014 * 0)
= 0.0028
What's the change in probability of occurrence at 101 m vs 100 m elevation (in maple habitat)?
plogis(-0.996 + 0.014 * 101) - plogis(-0.996 + 0.014 * 100)
= 0.0034
Change in probability is not linear: the same 1 m increase in elevation changes the probability by a different amount depending on the starting elevation.
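Extending the two comparisons above to a few more starting elevations shows the pattern more clearly. The sketch uses the rounded maple-habitat coefficients from the text; for a logistic curve, the per-unit change in probability is largest where p = 0.5 and shrinks toward 0 as p approaches 0 or 1.

```r
# Change in probability for a 1 m increase at several starting elevations
# (maple habitat, coefficients rounded as in the text)
elev <- c(0, 100, 200, 300, 400)
dp <- plogis(-0.996 + 0.014 * (elev + 1)) - plogis(-0.996 + 0.014 * elev)
round(data.frame(elevation = elev, change_in_p = dp), 4)
```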
But the change in odds is constant:
p1 <- plogis(-0.996 + 0.014 * 0)
p2 <- plogis(-0.996 + 0.014 * 1)
(OR1 <- (p2/(1-p2))/(p1/(1-p1)))
## [1] 1.014
p3 <- plogis(-0.996 + 0.014 * 400)
p4 <- plogis(-0.996 + 0.014 * 401)
(OR2 <- (p4/(1-p4))/(p3/(1-p3)))
## [1] 1.014
What is the change in odds for one unit change in elevation?
exp(0.014) = 1.0141: each additional meter of elevation multiplies the odds of occurrence by about 1.014
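The same check can be run over several starting elevations at once. The sketch below (helper names `odds` and `odds_ratio` are our own) confirms that the odds ratio for a 1 m increase does not depend on the starting elevation and equals exp(slope); for a k-unit increase the multiplier is exp(k × slope).

```r
# The odds ratio for a 1 m increase is the same at every starting elevation
b0 <- -0.996
b1 <- 0.014
odds <- function(p) p / (1 - p)
odds_ratio <- function(x) {
  odds(plogis(b0 + b1 * (x + 1))) / odds(plogis(b0 + b1 * x))
}

sapply(c(0, 100, 400), odds_ratio)
exp(b1)

# For a 100 m increase the odds are multiplied by exp(100 * b1)
exp(100 * b1)
```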
Logistic regression is used when we want to model a binary response as a function of explanatory variables
The response $y_i$ is modeled as arising from a Bernoulli distribution with probability $p_i$, and the logit link is used to map the linear predictor to the probability scale
All of the linear modeling concepts we have learned this semester (continuous/categorical explanatory variables, multiple regression, interactions, random effects) can be used within the logistic regression framework
Parameter estimates from logistic regression measure the change in log odds for a one-unit change in an explanatory variable; exponentiating an estimate gives the corresponding odds ratio