
Lecture 2

Probability refresher (or introduction)




WILD6900 (Spring 2021)

1 / 59

Readings:

Hobbs & Hooten 29-70

2 / 59

Warning: The material presented in this lecture is tedious. But the concepts in this lecture are critical to everything that will follow in this course. So push through and try your best to understand these topics. You do not need to be an expert in probability at the end of this lecture - we will reinforce these concepts over and over again throughout the semester - but getting the gist now will help you grasp other topics as we move forward

3 / 59

Stochasticity and uncertainty in ecological models

In each level of our models, we differentiate between:

  • a deterministic model g(), and

  • a stochastic model [a|b,c]

4 / 59

Stochasticity and uncertainty in ecological models

In each level of our models, we differentiate between:

  • a deterministic model g(), and

  • a stochastic model [a|b,c]

The deterministic portion of the model contains no uncertainty 1

4 / 59

Stochasticity and uncertainty in ecological models

In each level of our models, we differentiate between:

  • a deterministic model g(), and

  • a stochastic model [a|b,c]

The deterministic portion of the model contains no uncertainty 1

Stochastic processes are different:

  • given an input, the model will not always return the same answer

  • the outputs of stochastic processes are uncertain

  • Even though stochastic processes are inherently uncertain, they are not unpredictable.

4 / 59

1 Given a certain input, the deterministic model will always return the same answer.

Stochasticity and uncertainty in ecological models

In Bayesian models, all unobserved quantities are treated as random variables, that is, they can take on different values due to chance (i.e., they are stochastic)
Each random variable in our model is governed by a probability distribution 2
5 / 59

Stochasticity and uncertainty in ecological models

In Bayesian models, all unobserved quantities are treated as random variables, that is, they can take on different values due to chance (i.e., they are stochastic)
Each random variable in our model is governed by a probability distribution 2

Our goal is to use our data to learn about those distributions

5 / 59

2 Because probability distributions form the basis of Bayesian methods, a good understanding of probability is critical to everything that will follow.

Probability

Uncertain events are not necessarily unpredictable

6 / 59

Probability

Uncertain events are not necessarily unpredictable

Probability allows us to summarize how likely each possible value of a random variable is to occur

6 / 59

Sample space

For any given random variable, the sample space S includes all of the possible values the variable can take

For example, for a single-species occupancy model, S would be {present, absent}. For a model of species abundance, S would be 0, 1, 2, 3, ...

7 / 59

For a random process to truly be a probability, the probabilities of all possible outcomes must sum to 1: \sum_{i=1}^{n} Pr(A_i) = 1 (if the outcomes are continuous, we take an integral instead of a sum).

Sample space

Example

Imagine an occupancy model in which we want to know if species x is present at a given location
We will denote the occupancy status z_x, and the sample space includes just two possible values:

S_{z_x} = {0, 1}

8 / 59

Probability of single events

9 / 59

Probability of single events

The probability of A is the area of A divided by the area of S 3

Pr(A) = (area of A) / (area of S)

10 / 59

3 Within the sample space, we can define a smaller polygon A which represents one possible outcome

A is smaller than S because it does not contain all possible outcomes, just a subset.

What is the probability that A does not occur? It's the area outside of A:

Pr(not A) = (area of S - area of A) / (area of S) = 1 - (area of A) / (area of S) = 1 - Pr(A)

Probability of single events

Example

In our example, let's say that the probability of occupancy for species x is:

Pr(z_x = 1) = 0.4

11 / 59

Probability of single events

Example

In our example, let's say that the probability of occupancy for species x is:

Pr(z_x = 1) = 0.4

This means that the probability that the site is not occupied is:

11 / 59

Probability of single events

Example

In our example, let's say that the probability of occupancy for species x is:

Pr(z_x = 1) = 0.4

This means that the probability that the site is not occupied is:

Pr(z_x = 0) = 1 - 0.4 = 0.6

11 / 59

Probability of multiple events

12 / 59

Probability of multiple events

Often, we are not interested in the probability of a single event happening but rather in the probability that more than one event occurs

13 / 59

Probability of multiple events

Often, we are not interested in the probability of a single event happening but rather in the probability that more than one event occurs

The joint probability refers to the probability that two or more events occur and is usually denoted Pr(A,B) 4

13 / 59

4 Estimating joint probabilities is more challenging than estimating the probability of single events but is critical to understanding the logic behind Bayesian methods

Probability of multiple events

Example

To extend our simple example, let's imagine we are interested in the occupancy status of two species - x and y. Our sample space is now:

S_{z_x, z_y} = {(0,0), (0,1), (1,0), (1,1)}

14 / 59

Probability of multiple events

Example

To extend our simple example, let's imagine we are interested in the occupancy status of two species - x and y. Our sample space is now:

S_{z_x, z_y} = {(0,0), (0,1), (1,0), (1,1)}

The question we now want to answer is: what is the probability that a site is occupied by both species?

Pr(z_x = 1, z_y = 1) = Pr(z_x, z_y)

14 / 59

Probability of multiple events

Example

To extend our simple example, let's imagine we are interested in the occupancy status of two species - x and y. Our sample space is now:

S_{z_x, z_y} = {(0,0), (0,1), (1,0), (1,1)}

The question we now want to answer is: what is the probability that a site is occupied by both species?

Pr(z_x = 1, z_y = 1) = Pr(z_x, z_y)
The answer to that question depends on the relationship between Pr(z_x) and Pr(z_y)

14 / 59

Conditional probability

15 / 59

Conditional probability

In some cases, knowing the status of one random variable tells us something about the status of another random variable

Let's say we know that species x is present, that is zx=1

Knowing that zx=1 does two things:

16 / 59

Conditional probability

In some cases, knowing the status of one random variable tells us something about the status of another random variable

Let's say we know that species x is present, that is zx=1

Knowing that zx=1 does two things:

1) It shrinks the possible range of sample space (if zx=1 occurred, the remainder of our sample space (in this case zx=0) did not occur)

16 / 59

Conditional probability

In some cases, knowing the status of one random variable tells us something about the status of another random variable

Let's say we know that species x is present, that is zx=1

Knowing that zx=1 does two things:

1) It shrinks the possible range of sample space (if zx=1 occurred, the remainder of our sample space (in this case zx=0) did not occur)

2) It effectively shrinks the area zy - we know that the area of zy outside of zx didn't occur

16 / 59

Conditional probability

In some cases, knowing the status of one random variable tells us something about the status of another random variable

Let's say we know that species x is present, that is zx=1

Knowing that zx=1 does two things:

1) It shrinks the possible range of sample space (if zx=1 occurred, the remainder of our sample space (in this case zx=0) did not occur)

2) It effectively shrinks the area zy - we know that the area of zy outside of zx didn't occur

You can see this very clearly in this awesome visualization

16 / 59

Conditional probability

Pr(zy|zx) is the area shared by the two events divided by the area of zx (not S!) 5

Pr(z_y | z_x) = (area shared by z_x and z_y) / (area of z_x) = Pr(z_x ∩ z_y) / Pr(z_x)

likewise,

Pr(z_x | z_y) = Pr(z_x ∩ z_y) / Pr(z_y)

17 / 59

5 Read Pr(zy|zx) as "the probability of zy conditional on zx"

∩ means "intersection" and it is the area shared by both A and B

Conditional probability

For conditional events, the joint probability is:

Pr(z_y, z_x) = Pr(z_y | z_x) Pr(z_x) = Pr(z_x | z_y) Pr(z_y)

18 / 59
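As a quick numeric sketch in R (the probability values here are hypothetical, chosen only for illustration), the joint probability follows directly from a conditional probability and a marginal probability:

pr_zx <- 0.4          # hypothetical Pr(z_x = 1)
pr_zy_given_zx <- 0.7 # hypothetical Pr(z_y = 1 | z_x = 1)

# Joint probability that both species are present
(pr_joint <- pr_zy_given_zx * pr_zx)
## [1] 0.28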

Probability of independent events

In some cases, the probability of one event occurring is independent of whether or not the other event occurs 6

19 / 59

Probability of independent events

In some cases, the probability of one event occurring is independent of whether or not the other event occurs 6
In our example, the occupancy of the two species may be totally unrelated

  • if they occur together, it happens by complete chance 7
19 / 59

Probability of independent events

In some cases, the probability of one event occurring is independent of whether or not the other event occurs 6
In our example, the occupancy of the two species may be totally unrelated

  • if they occur together, it happens by complete chance 7

In this case, knowing that zx=1 gives us no new information about the probability of zy=1

Mathematically, this means that:

Pr(z_y | z_x) = Pr(z_y)

and

Pr(z_x | z_y) = Pr(z_x)

19 / 59

Probability of independent events

In some cases, the probability of one event occurring is independent of whether or not the other event occurs 6
In our example, the occupancy of the two species may be totally unrelated

  • if they occur together, it happens by complete chance 7

In this case, knowing that zx=1 gives us no new information about the probability of zy=1

Mathematically, this means that:

Pr(z_y | z_x) = Pr(z_y)

and

Pr(z_x | z_y) = Pr(z_x)

Thus,

Pr(z_x, z_y) = Pr(z_x) Pr(z_y)

19 / 59

6For example, the probability of a coin flip being heads is not dependent on whether or not the previous flip was heads.

7This may be unlikely since, even if they don't interact with each other, habitat preferences alone might lead to non-independence, but we'll discuss that in more detail shortly

Disjoint events

A special case of conditional probability occurs when events are disjoint

20 / 59

Disjoint events

A special case of conditional probability occurs when events are disjoint

In our example, maybe species x and species y never occur together 8

20 / 59

Disjoint events

A special case of conditional probability occurs when events are disjoint

In our example, maybe species x and species y never occur together 8

In this case, knowing that zx=1 means that zy=0. In other words,

Pr(z_y | z_x) = Pr(z_x | z_y) = 0

20 / 59

8Perhaps they are such fierce competitors that they will exclude each other from their territories

Probability of one event or the other

In some cases, we might want to know the probability that one event or the other occurs

21 / 59

Probability of one event or the other

In some cases, we might want to know the probability that one event or the other occurs

For example, what is the probability that species x or species y (or both) is present?

21 / 59

Probability of one event or the other

In some cases, we might want to know the probability that one event or the other occurs

For example, what is the probability that species x or species y (or both) is present?

This is the combined area of z_x and z_y, with the area of overlap counted only once:

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x, z_y)

21 / 59

Probability of one event or the other

When zx and zy are independent,

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x) Pr(z_y)

22 / 59

Probability of one event or the other

When zx and zy are independent,

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x) Pr(z_y)

If they are conditional,

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x | z_y) Pr(z_y) = Pr(z_x) + Pr(z_y) - Pr(z_y | z_x) Pr(z_x)

22 / 59

Probability of one event or the other

When zx and zy are independent,

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x) Pr(z_y)

If they are conditional,

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x | z_y) Pr(z_y) = Pr(z_x) + Pr(z_y) - Pr(z_y | z_x) Pr(z_x)

If they are disjoint,

Pr(z_x ∪ z_y) = Pr(z_x) + Pr(z_y)

22 / 59
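A small R sketch of these three cases, using hypothetical occupancy probabilities (the conditional probability in the second case is also made up for illustration):

pr_zx <- 0.4
pr_zy <- 0.25

# Independent: subtract the product of the two probabilities
pr_zx + pr_zy - pr_zx * pr_zy
## [1] 0.55

# Conditional: subtract the joint, here using a hypothetical Pr(z_x = 1 | z_y = 1) = 0.6
pr_zx + pr_zy - 0.6 * pr_zy
## [1] 0.5

# Disjoint: the joint probability is 0, so just add
pr_zx + pr_zy
## [1] 0.65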

Marginal probability

A critical concept in Bayesian models is marginal probability, that is the probability of one event happening regardless of the state of other events

23 / 59

Marginal probability

A critical concept in Bayesian models is marginal probability, that is the probability of one event happening regardless of the state of other events
Imagine that our occupancy model includes the effect of 3 different habitats on the occupancy probability of species x, so:

Pr(z_x | H_i) = Pr(z_x ∩ H_i) / Pr(H_i)

23 / 59

Marginal probability

A critical concept in Bayesian models is marginal probability, that is the probability of one event happening regardless of the state of other events
Imagine that our occupancy model includes the effect of 3 different habitats on the occupancy probability of species x, so:

Pr(z_x | H_i) = Pr(z_x ∩ H_i) / Pr(H_i)

What is the overall probability that species x occurs regardless of habitat type? That is, Pr(z_x)?

23 / 59

Marginal probability

A critical concept in Bayesian models is marginal probability, that is the probability of one event happening regardless of the state of other events
Imagine that our occupancy model includes the effect of 3 different habitats on the occupancy probability of species x, so:

Pr(z_x | H_i) = Pr(z_x ∩ H_i) / Pr(H_i)

What is the overall probability that species x occurs regardless of habitat type? That is, Pr(z_x)?
In this case, we marginalize over the different habitat types by summing the conditional probabilities weighted by the probability of each H_i:

Pr(z_x) = \sum_{i=1}^{3} Pr(z_x | H_i) Pr(H_i)

23 / 59

Marginal probability

A critical concept in Bayesian models is marginal probability, that is the probability of one event happening regardless of the state of other events
Imagine that our occupancy model includes the effect of 3 different habitats on the occupancy probability of species x, so:

Pr(z_x | H_i) = Pr(z_x ∩ H_i) / Pr(H_i)

What is the overall probability that species x occurs regardless of habitat type? That is, Pr(z_x)?
In this case, we marginalize over the different habitat types by summing the conditional probabilities weighted by the probability of each H_i:

Pr(z_x) = \sum_{i=1}^{3} Pr(z_x | H_i) Pr(H_i)

Think of this as a weighted average - the probability that z_x = 1 in each habitat type, weighted by the probability that each habitat type occurs.

23 / 59
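A minimal R sketch of this weighted average, using hypothetical conditional occupancy probabilities and habitat proportions:

pr_z_given_H <- c(H1 = 0.75, H2 = 0.125, H3 = 0.04) # hypothetical Pr(z_x = 1 | H_i)
pr_H <- c(H1 = 0.2, H2 = 0.2, H3 = 0.6)             # hypothetical Pr(H_i)

# Marginal probability of occupancy: conditional probabilities weighted by Pr(H_i)
(pr_z <- sum(pr_z_given_H * pr_H))
## [1] 0.199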

Marginal probability

24 / 59

Marginal probability

25 / 59

Marginal probability

             H1    H2    H3    Total
Occupied     60    10    10     80
Unoccupied   20    70    250    340
Total        80    80    260    420
26 / 59

This is the reason random or stratified sampling is so important if you want to know Pr(z) - if you do not sample habitats in proportion to Pr(Hi), you will get biased estimates of Pr(z)!
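As a sketch, the conditional and marginal probabilities implied by the table above can be computed in R directly from the counts:

# Counts from the table (rows: occupancy status, columns: habitat type)
counts <- matrix(c(60, 10, 10,
                   20, 70, 250),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Occupied", "Unoccupied"), c("H1", "H2", "H3")))

pr_H <- colSums(counts) / sum(counts)                   # Pr(H_i)
pr_z_given_H <- counts["Occupied", ] / colSums(counts)  # Pr(z = 1 | H_i)

# Marginalize over habitat type
(pr_z <- sum(pr_z_given_H * pr_H))
## [1] 0.1904762

This equals 80/420, the overall proportion of occupied sites, because the counts here reflect sampling habitats in proportion to Pr(H_i).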

Factoring joint probabilities

27 / 59

Factoring joint probabilities

Many of the models you will work with as an ecologist will contain multiple random variables

[z, θ_p, θ_o, σ²_p, σ²_s, σ²_o, u_i | y_i] ∝ [y_i | d(Θ_o, u_i), σ²_o] [u_i | z, σ²_s] [z | g(θ_p, x), σ²_p] [θ_p] [θ_o] [σ²_p] [σ²_s] [σ²_o]

28 / 59

Factoring joint probabilities

Many of the models you will work with as an ecologist will contain multiple random variables

[z, θ_p, θ_o, σ²_p, σ²_s, σ²_o, u_i | y_i] ∝ [y_i | d(Θ_o, u_i), σ²_o] [u_i | z, σ²_s] [z | g(θ_p, x), σ²_p] [θ_p] [θ_o] [σ²_p] [σ²_s] [σ²_o]

The rules of probability allow us to express these complex joint probabilities as a series of simpler conditional probabilities

  • These concepts may feel a little abstract now but they will be very important when we learn about implementing Bayesian models
28 / 59

Factoring joint probabilities

Many of the models you will work with as an ecologist will contain multiple random variables

[z, θ_p, θ_o, σ²_p, σ²_s, σ²_o, u_i | y_i] ∝ [y_i | d(Θ_o, u_i), σ²_o] [u_i | z, σ²_s] [z | g(θ_p, x), σ²_p] [θ_p] [θ_o] [σ²_p] [σ²_s] [σ²_o]

The rules of probability allow us to express these complex joint probabilities as a series of simpler conditional probabilities

  • These concepts may feel a little abstract now but they will be very important when we learn about implementing Bayesian models

Determining the dependencies between parameters in the models is aided by Bayesian network models

28 / 59

Bayesian networks

Bayesian networks graphically display the dependence among random variables

  • Random variables are nodes

  • Arrows point from parents to children

29 / 59

Bayesian networks

Bayesian networks graphically display the dependence among random variables 9

  • Children nodes are on the LHS of conditioning symbols

  • Parent nodes are on the RHS of conditioning symbols

  • Nodes without a parent are expressed unconditionally

30 / 59

Bayesian networks

Bayesian networks graphically display the dependence among random variables 9

  • Children nodes are on the LHS of conditioning symbols

  • Parent nodes are on the RHS of conditioning symbols

  • Nodes without a parent are expressed unconditionally

Pr(A,B)=Pr(A|B)Pr(B)

30 / 59

9 These rules extend directly from the rules of probability we already learned

We can generalize the simple model in this slide to more than two events, which we will call z_1, z_2, ..., z_n:

Pr(z_1, z_2, ..., z_n) = Pr(z_n | z_{n-1}, ..., z_1) ... Pr(z_3 | z_2, z_1) Pr(z_2 | z_1) Pr(z_1)

The order of conditioning (i.e., the dependencies in the graph) is determined by the biology, not the statistics

Bayesian networks

31 / 59

Bayesian networks

Pr(A,B,C)=Pr(A|B,C)Pr(B|C)Pr(C)

31 / 59

Bayesian networks

32 / 59

Bayesian networks

Pr(A,B,C,D)=Pr(A|C)Pr(B|C)Pr(C|D)Pr(D)

32 / 59

Bayesian networks

33 / 59

Bayesian networks

Pr(A,B,C,D,E)=Pr(A|C)Pr(B|C)Pr(C|D,E)Pr(D)Pr(E)

33 / 59

Bayesian networks

34 / 59

Bayesian networks

Pr(A,B,C,D)=Pr(A|B,C,D)Pr(B|C,D)Pr(C|D)Pr(D)

34 / 59

Bayesian networks

35 / 59

Bayesian networks

Pr(A,B,C,D)=Pr(A|B,C,D)Pr(C|D)Pr(D)Pr(B)

35 / 59

If we know that B is independent of C and D, we can simplify the conditional expressions because (for independent events):

Pr(z_1 | z_2) = Pr(z_1)

Properties of probability distributions

Because all unobserved quantities are treated as random variables governed by probability distributions, using and understanding Bayesian methods requires understanding probability distributions.

As ecologists, there are a number of very common probability distributions that we encounter and use regularly 10:

  • normal

  • Poisson

  • binomial

  • gamma

36 / 59

10We are not going to go over the properties of each of these distributions in lecture. Instead, I will talk about specific distributions as they come up in examples.

Even though I will discuss specific distributions as they come up, I highly recommend you read the chapter of Hobbs & Hooten on probability functions to familiarize yourself with the distributions we'll use throughout the semester. If you don't have that book, just google each distribution and read the wikipedia page.

Discrete vs. continuous distributions

37 / 59

Discrete vs. continuous distributions

Continuous random variables can take on an infinite number of values on a specific interval 11

  • Normal (-∞ to ∞)

  • Gamma (0 to ∞)

  • Beta (0 to 1)

  • Uniform (? to ?)

37 / 59

Discrete vs. continuous distributions

Continuous random variables can take on an infinite number of values on a specific interval 11

  • Normal (-∞ to ∞)

  • Gamma (0 to ∞)

  • Beta (0 to 1)

  • Uniform (? to ?)

Discrete random variables are those that take on distinct values, usually integers 12

  • Poisson (integers ≥ 0)

  • Bernoulli (0 or 1)

  • Binomial

  • Multinomial

37 / 59

11We usually encounter continuous variables in the form of regression coefficients (slope and intercepts), measurements (mass, lengths, etc.), and probabilities

12We usually encounter discrete variables in the form of counts (the number of individuals can only be positive integers, you can't have 4.234 individuals) or categories (alive vs. dead, present in location A vs. B vs. C)

Probability functions

Very often we want to know the probability that a random variable will take a specific value z

38 / 59

Probability functions

Very often we want to know the probability that a random variable will take a specific value z

Answering this question requires the use of probability functions, which we will denote [z]13

38 / 59

Probability functions

Very often we want to know the probability that a random variable will take a specific value z

Answering this question requires the use of probability functions, which we will denote [z]13

Probability functions differ between continuous and discrete distributions so we will discuss these separately

38 / 59

13Probability functions tell us [z]=Pr(z)

Probability mass functions

For discrete random variables, the probability that the variable will take a specific value z is defined by the probability mass function (pmf)

All pmf's share two properties:

0 ≤ [z] ≤ 1
\sum_{z ∈ S} [z] = 1

where S is the set of all z for which [z]>0 (the range of possible values of z).

39 / 59
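We can check these two properties numerically in R for a discrete distribution such as the Poisson (a sketch; the upper limit of 100 is arbitrary but large enough that the omitted probability mass is negligible for the λ used here):

pmf <- dpois(0:100, lambda = 10)  # [z] for z = 0, 1, ..., 100

all(pmf >= 0 & pmf <= 1)  # every value of [z] is between 0 and 1
## [1] TRUE
sum(pmf)                  # and the probabilities sum to 1
## [1] 1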

Probability mass functions

As an example, let's assume a random variable that follows a Poisson distribution

  • Poisson random variables can take any non-negative integer value (0, 1, 2, ...)

  • e.g., the number of individuals at a site or the number of seeds produced by a flower

40 / 59

Probability mass functions

As an example, let's assume a random variable that follows a Poisson distribution

  • Poisson random variables can take any non-negative integer value (0, 1, 2, ...)

  • e.g., the number of individuals at a site or the number of seeds produced by a flower

The shape of the Poisson distribution is determined by 1 parameter called λ

  • λ is the expected value (the most likely value) of a random variable generated from the Poisson distribution

  • larger λ means larger values of the variable

40 / 59

Probability mass functions

If λ=10, what is the probability that z will equal 10? Or 8? Or 15?

41 / 59

Probability mass functions

If λ=10, what is the probability that z will equal 10? Or 8? Or 15?

In R, probability mass is estimated using the dpois() function (or the equivalent for other discrete distributions)

  • takes two arguments: the value we are interested in estimating the probability of (z)14 and the expected value of our distribution (λ)

  • dpois(x = seq(0,25), lambda = 10)

41 / 59

14R will let us put in a vector of values so we can also do the following to estimate the probability of all values from 0 to 25: dpois(x = seq(0, 25), lambda = 10)
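A minimal, runnable version of that calculation (the plot at the end is optional but makes the shape of the pmf easy to see):

z <- seq(0, 25)
pr_z <- dpois(x = z, lambda = 10)  # Pr(z) for each value from 0 to 25

pr_z[z == 10]               # probability that z equals exactly 10
## [1] 0.12511
dpois(x = 8, lambda = 10)   # probability that z equals 8
dpois(x = 15, lambda = 10)  # probability that z equals 15

plot(z, pr_z, type = "h", xlab = "z", ylab = "Probability mass")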

Probability density functions

Probability mass functions provide the probability that a discrete random variable takes on a specific value z

For continuous variables, estimating probabilities is a little trickier because Pr(z)=0 for any specific value z

Why? Let's look at the probability distribution for a normal random variable with mean =0 and standard deviation =3:

42 / 59

Probability density functions

The probability density is the area under the curve for an interval between a and b, which we'll call Δz = (b - a).

For example, the shaded area below shows the probability density Pr(-2 ≤ z ≤ -1):

43 / 59

Probability density functions

This area can be approximated by multiplying the width times the (average) height of the rectangle:

Pr(a ≤ z ≤ b) ≈ Δz × [(a + b)/2]

By making the range Δz = b - a smaller and smaller, we get closer to Pr(z):

44 / 59
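Here is a sketch of that approximation in R for the normal distribution (mean = 0, standard deviation = 3) used in this example, comparing the rectangle approximation to the exact area obtained from the cumulative distribution function pnorm(), and then shrinking the interval:

a <- -2
b <- -1
dz <- b - a  # width of the interval

# Rectangle approximation: width times the density at the interval midpoint
dz * dnorm((a + b) / 2, mean = 0, sd = 3)                # ~0.117

# Exact area under the curve between a and b
pnorm(b, mean = 0, sd = 3) - pnorm(a, mean = 0, sd = 3)  # ~0.117

# Shrinking the interval: the probability shrinks toward 0...
pnorm(-1.49, 0, 3) - pnorm(-1.51, 0, 3)                  # ~0.0023
# ...but (probability / width) approaches the density dnorm(-1.5, 0, 3)
(pnorm(-1.49, 0, 3) - pnorm(-1.51, 0, 3)) / 0.02         # ~0.117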

Probability density functions

At a specific value z, Δz = 0, thus Pr(z) = 0

45 / 59

Probability density functions

At a specific value z, Δz = 0, thus Pr(z) = 0

However, we can use calculus to estimate the height of the line ([z]) as Δz approaches 0

45 / 59

Probability density functions

At a specific value z, Δz = 0, thus Pr(z) = 0

However, we can use calculus to estimate the height of the line ([z]) as Δz approaches 0

So for continuous random variables, the probability density of z is defined as the area under the curve between a ≤ z ≤ b as Δz approaches zero

45 / 59

Probability density functions

At a specific value z, Δz = 0, thus Pr(z) = 0

However, we can use calculus to estimate the height of the line ([z]) as Δz approaches 0

So for continuous random variables, the probability density of z is defined as the area under the curve between a ≤ z ≤ b as Δz approaches zero

Estimating probability density in R is the same as for discrete variables: dnorm() 15

45 / 59

15Now you know why the function starts with d!

Moments

46 / 59

Moments

Every probability distribution we will use in the course can be described by its moments

  • 1st moment is the expected value (i.e., mean)

  • 2nd moment is the variance

47 / 59

Expected value (i.e., the mean)

The first moment of a distribution describes its central tendency (denoted μ) or expected value (denoted E(z))

48 / 59

Expected value (i.e., the mean)

The first moment of a distribution describes its central tendency (denoted μ) or expected value (denoted E(z))

This is the most probable value of z

48 / 59

Expected value (i.e., the mean)

The first moment of a distribution describes its central tendency (denoted μ) or expected value (denoted E(z))

This is the most probable value of z

Think of this as a weighted average - the mean of all possible values of z weighted by the probability mass or density of each value [z]

48 / 59

Expected value (i.e., the mean)

The first moment of a distribution describes its central tendency (denoted μ) or expected value (denoted E(z))

This is the most probable value of z

Think of this as a weighted average - the mean of all possible values of z weighted by the probability mass or density of each value [z]

For discrete variables, the first moment can be calculated as

μ = E(z) = \sum_{z ∈ S} z[z]

48 / 59

Expected value (i.e., the mean)

The first moment of a distribution describes its central tendency (denoted μ) or expected value (denoted E(z))

This is the most probable value of z

Think of this as a weighted average - the mean of all possible values of z weighted by the probability mass or density of each value [z]

For discrete variables, the first moment can be calculated as

μ = E(z) = \sum_{z ∈ S} z[z]

For continuous variables, we need to use an integral instead of a sum:

μ = E(z) = ∫ z[z] dz

48 / 59
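For a discrete example, the first-moment sum can be evaluated directly in R using the Poisson pmf (a sketch; the sum is truncated at 100, which is large enough for λ = 10 that the omitted probability mass is negligible):

z <- 0:100
pmf <- dpois(z, lambda = 10)

# Expected value: each possible value of z weighted by its probability [z]
(mu <- sum(z * pmf))
## [1] 10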

Variance

The second moment of a distribution describes the variance - that is, the spread of the distribution around its mean

49 / 59

Variance

The second moment of a distribution describes the variance - that is, the spread of the distribution around its mean

On average, how far is a random value drawn from the distribution from the mean of the distribution?

49 / 59

Variance

The second moment of a distribution describes the variance - that is, the spread of the distribution around its mean

On average, how far is a random value drawn from the distribution from the mean of the distribution?

For discrete variables, variance can be estimated as the weighted average of the squared difference (squared to prevent negative values) between each value z and the mean μ of the distribution:

σ² = E((z - μ)²) = \sum_{z ∈ S} (z - μ)²[z]

49 / 59

Variance

The second moment of a distribution describes the variance - that is, the spread of the distribution around its mean

On average, how far is a random value drawn from the distribution from the mean of the distribution?

For discrete variables, variance can be estimated as the weighted average of the squared difference (squared to prevent negative values) between each value z and the mean μ of the distribution:

σ² = E((z - μ)²) = \sum_{z ∈ S} (z - μ)²[z]

and for continuous variables:

σ² = E((z - μ)²) = ∫ (z - μ)²[z] dz

49 / 59
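Continuing the Poisson sketch from the previous slide, the second moment can be evaluated the same way:

z <- 0:100
pmf <- dpois(z, lambda = 10)
mu <- sum(z * pmf)

# Variance: squared deviations from the mean, weighted by [z]
(sigma2 <- sum((z - mu)^2 * pmf))
## [1] 10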

Exercise: Estimating moments using Monte Carlo integration 16

One way to estimate moments is by simulating a large number of values from a probability distribution and then using these samples to calculate the first and second moments 17

This approach is very easy to do in R using the r class of functions (e.g., rnorm(), rpois(), etc.)

  • These functions generate a specified number of random draws (r for random) from a given probability distribution
50 / 59

16 Monte Carlo integration is a form of simulation where we draw many random samples from a probability distribution and then use those samples to learn about properties of the distribution

17This is a useful approach to understand because it is very similar to how we learn about parameter distributions in Bayesian analyses

Exercise: Estimating moments using Monte Carlo integration

Let's estimate the first and second moments of a gamma distribution 18

51 / 59

Exercise: Estimating moments using Monte Carlo integration

Let's estimate the first and second moments of a gamma distribution 18

The shape of the gamma distribution is governed by two parameters, α (referred to as the shape) and β (referred to as the rate or sometimes the scale) 19

51 / 59

Exercise: Estimating moments using Monte Carlo integration

Let's estimate the first and second moments of a gamma distribution 18

The shape of the gamma distribution is governed by two parameters, α (referred to as the shape) and β (referred to as the rate or sometimes the scale) 19

In R, we can generate and visualize a large number (e.g., 10000) random draws from the gamma distribution using the following code:

n <- 10000 # Sample size
samp <- rgamma(n, shape = 0.5, rate = 2)
51 / 59

18The gamma distribution is a continuous probability distribution that produces non-negative random variables

19Both α and β must be >0

Exercise: Estimating moments using Monte Carlo integration

Now let's use these samples to estimate the first moment (the mean) and the second moment (the variance) of the distribution

52 / 59

Exercise: Estimating moments using Monte Carlo integration

Now let's use these samples to estimate the first moment (the mean) and the second moment (the variance) of the distribution

We estimate the first moment by taking the arithmetic mean of our samples ((1/n) \sum_{i=1}^{n} z_i) and the variance as ((1/n) \sum_{i=1}^{n} (z_i - μ)²):

mu <- sum(samp)/n # mean of the sample
sigma2 <- sum((samp - mu)^2)/n # variance of the sample
52 / 59

Exercise: Estimating moments using Monte Carlo integration

How close are these values to the true moments? For the gamma distribution:

μ = α / β

σ² = α / β²

For our samples:

mu # Estimated mean
## [1] 0.2491
0.5/2 # True mean
## [1] 0.25
53 / 59

Your answer won't exactly match the ones here but they should be pretty close

Exercise: Estimating moments using Monte Carlo integration

How close are these values to the true moments? For the gamma distribution:

μ = α / β

σ² = α / β²

For our distribution:

sigma2 # Estimated variance
## [1] 0.1278
0.5/2^2 # True variance
## [1] 0.125
54 / 59

Exercise: Estimating moments using Monte Carlo integration

Try this on your own - simulate data from a Poisson distribution and see if the moments you estimate from the sample are close to the true moments

Hint - the Poisson distribution has a single parameter λ, which is both the mean and the variance of the distribution

55 / 59

Exercise: Estimating moments using Monte Carlo integration

Try this on your own - simulate data from a Poisson distribution and see if the moments you estimate from the sample are close to the true moments

Hint - the Poisson distribution has a single parameter λ, which is both the mean and the variance of the distribution

Change both λ and n. Does varying these values change how well your sample estimates the moments? 20

55 / 59

20 Question - in the above simulations, we use the arithmetic mean to estimate the first moment of the distribution. But in the definition of the moment, we defined the mean as the weighted average of the z's. Why don't we have to take the weighted average of our sample?

Moment matching

What if you know the mean and variance of a distribution and need the parameters?

56 / 59

Moment matching

What if you know the mean and variance of a distribution and need the parameters?

Rather than using simulation, each distribution has a set of formulas for converting between parameters and moments (called moment matching)

56 / 59

Moment matching

What if you know the mean and variance of a distribution and need the parameters?

Rather than using simulation, each distribution has a set of formulas for converting between parameters and moments (called moment matching)

Moment matching is very important because often we have the mean and variance of distributions but need to convert those summaries into the parameters of the underlying distribution 21,22

56 / 59

21If this is not obvious right now, don't worry. You'll see why later in the semester as we work through examples

22 Of course, this does not mean you need to memorize the moment equations - that's what google is for.
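As one concrete sketch (assuming the rate parameterization of the gamma distribution used earlier, where μ = α/β and σ² = α/β²), solving those two equations for the parameters gives α = μ²/σ² and β = μ/σ²:

mu <- 0.25       # a known mean (here, the true mean from the gamma exercise above)
sigma2 <- 0.125  # a known variance

# Moment matching for the gamma distribution (rate parameterization)
(alpha <- mu^2 / sigma2)
## [1] 0.5
(beta <- mu / sigma2)
## [1] 2

These recover the shape = 0.5 and rate = 2 used in the Monte Carlo exercise above.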

Moment matching

For the normal distribution, it is relatively easy to understand moments because the parameters of the distribution (the mean and the variance) are the first and second moments

57 / 59

Moment matching

The normal distribution has an interesting property - you can change the first moment without changing the second moment

58 / 59

Moment matching

The normal distribution has an interesting property - you can change the first moment without changing the second moment

This is not true of all probability distributions

58 / 59

Moment matching

The normal distribution has an interesting property - you can change the first moment without changing the second moment

This is not true of all probability distributions

For example, the beta distribution is a continuous distribution with values between 0 and 1 23,24. Its first and second moments are:

μ = α / (α + β)

σ² = αβ / ((α + β)²(α + β + 1))

58 / 59

23This makes it useful for modeling random variables that are probabilities (e.g., detection probability in an occupancy model)

24The shape of the beta distribution is governed by two parameters α and β

Moment matching

What if you know the mean of a beta distribution is 0.3 and the variance is 0.025? What are α and β?

59 / 59

Moment matching

What if you know the mean of a beta distribution is 0.3 and the variance is 0.025? What are α and β?

α = ((1 - μ) / σ² - 1 / μ) μ²

β = α (1 / μ - 1)

59 / 59

Moment matching

What if you know the mean of a beta distribution is 0.3 and the variance is 0.025? What are α and β?

α = ((1 - μ) / σ² - 1 / μ) μ²

β = α (1 / μ - 1)

For our model, that means 25:

(alpha <- ( (1 - 0.3)/0.025 - (1/0.3) )*0.3^2)
## [1] 2.22
(beta <- alpha * ( (1/0.3) - 1))
## [1] 5.18
59 / 59

25 On your own, use our simulation method to check that our estimates are correct:

samp <- rbeta(n, alpha, beta)
(mu <- sum(samp)/n)               # should be close to 0.3
(sigma2 <- sum((samp - mu)^2)/n)  # should be close to 0.025
