class: center, middle, inverse, title-slide

# Lecture 2
## Probability refresher (or introduction)
### WILD6900 (Spring 2021)
---
## Readings:

> Hobbs & Hooten 29-70

---
class: inverse, center, middle

**Warning:** The material presented in this lecture is tedious. But the concepts in this lecture are critical to everything that will follow in this course. So push through and try your best to understand these topics. You do not need to be an expert in probability at the end of this lecture - we will reinforce these concepts over and over again throughout the semester - but getting the gist now will help you grasp other topics as we move forward.

---
## Stochasticity and uncertainty in ecological models

#### At each level of our models, we differentiate between:

+ a *deterministic* model `\(\large g()\)`, and

+ a *stochastic* model `\(\large [a|b,c]\)`

<br/>

--

#### The deterministic portion of the model contains no uncertainty `\(^1\)`

--

#### *Stochastic* processes are different:

+ given an input, the model will not always return the same answer

+ the output of a stochastic process is *uncertain*

+ even though stochastic processes are inherently uncertain, they are not unpredictable

???

`\(^1\)` Given a certain input, the deterministic model will always return the same answer.

---
## Stochasticity and uncertainty in ecological models

##### In Bayesian models, all unobserved quantities are treated as **random variables**, that is, they can take on different values due to chance (i.e., they are stochastic)

##### Each random variable in our model is governed by a **probability distribution** `\(^2\)`

--

#### Our goal is to *use our data to learn about those distributions*

???

`\(^2\)` Because probability distributions form the basis of Bayesian methods, a good understanding of probability is critical to everything that will follow.

---
## Probability

#### Uncertain events are not necessarily unpredictable

--

#### *Probability* allows us to summarize how likely each possible value of a random variable is to occur

---
## Sample space

#### For any given random variable, the *sample space* `\(S\)` includes all of the possible values the variable can take

##### For example, for a single-species occupancy model, `\(S\)` would be present or absent. For a model of species abundance, `\(S\)` would be `\(\{0,1,2,3,...,\infty\}\)`.

<img src="S.png" width="450" style="display: block; margin: auto;" />

???

For a random process to truly be a probability, the sum of the probabilities of all possible outcomes must equal 1: `\(\sum_{i=1}^n Pr(A_i) = 1\)` (if the outcomes are continuous, we have to take the integral instead of the sum).

---
## Sample space

#### **Example**

##### Imagine an occupancy model in which we want to know if species `\(\Large x\)` is present at a given location

##### We will denote the occupancy status `\(\Large z_x\)` and the sample space includes just two possible values:

`$$\huge S_{z_x}=\{0, 1\}$$`

---
class: inverse, middle, center

# Probability of single events

---
## Probability of single events

#### The probability of `\(\large A\)` is the area of `\(\large A\)` divided by the area of `\(\large S\)` `\(^3\)`

`$$\Large Pr(A) = \frac{area\; of\; A}{area\; of \;S}$$`

<img src="SA.png" width="500" style="display: block; margin: auto;" />

???

`\(^3\)` Within the sample space, we can define a smaller polygon `\(A\)`, which represents one possible outcome. `\(A\)` is smaller than `\(S\)` because it does not contain all possible outcomes, just a subset.

What is the probability that `\(A\)` does not occur? It's the area outside of `\(A\)`:

`$$Pr(not \; A) = \frac{area\; of \;S - area\; of\; A}{area\; of \;S} = 1 - \frac{area\; of\; A}{area\; of \;S} = 1 - Pr(A)$$`
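---
## Probability of single events

A minimal simulation sketch of these two rules, using a hypothetical event with `\(Pr(A) = 0.25\)` (the value is made up purely for illustration): across many simulated trials, the proportion of trials in which `\(A\)` occurs approaches `\(Pr(A)\)`, and the proportion in which it does not occur approaches `\(1 - Pr(A)\)`

```r
n_trials <- 100000                            # number of simulated trials
A <- rbinom(n_trials, size = 1, prob = 0.25)  # 1 if A occurred, 0 if not

mean(A)      # proportion of trials where A occurred; should be close to 0.25
mean(1 - A)  # proportion of trials where A did not occur; close to 0.75 = 1 - Pr(A)
```

This frequency-based view of probability is the same idea we will use later in the lecture when we approximate distributions with Monte Carlo samples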
---
## Probability of single events

#### **Example**

#### In our example, let's say that the probability of occupancy for species `\(\large x\)` is:

`$$\LARGE Pr(z_x = 1) = 0.4$$`

--

#### This means that the probability that the site is not occupied is:

--

`$$\LARGE Pr(z_x = 0) =1-0.4=0.6$$`

---
class: inverse, middle, center

# Probability of multiple events

---
## Probability of multiple events

#### Often, we are not interested in the probability of a single event happening but rather in the probability of more than one event

--

#### The *joint* probability refers to the probability that two or more events occur and is usually denoted `\(\large Pr(A,B)\)` `\(^4\)`

<img src="SAB.png" width="500" style="display: block; margin: auto;" />

???

`\(^4\)` Estimating joint probabilities is more challenging than estimating the probability of single events but is critical to understanding the logic behind Bayesian methods

---
## Probability of multiple events

#### **Example**

To extend our simple example, let's imagine we are interested in the occupancy status of two species - `\(\large x\)` and `\(\large y\)`. Our sample space is now:

`$$\Large S_{z_x,z_y} = \{(0,0), (0,1), (1,0), (1,1)\}$$`

--

The question we now want to answer is: what is the probability that a site is occupied by both species?

`$$\Large Pr(z_x = 1, z_y = 1) = Pr(z_x, z_y)$$`

--

The answer to that question depends on the relationship between `\(\large Pr(z_x)\)` and `\(\large Pr(z_y)\)`

---
class: inverse, center, middle

# Conditional probability

---
## Conditional probability

#### In some cases, knowing the status of one random variable tells us something about the status of another random variable

Let's say we know that species `\(\large x\)` is present, that is `\(\large z_x=1\)`

Knowing that `\(\large z_x=1\)` does two things:

<br/>

--

1) It shrinks the possible range of the sample space (if `\(\large z_x=1\)` occurred, the remainder of our sample space (in this case `\(\large z_x=0\)`) did not occur)

<br/>

--

2) It effectively shrinks the area of `\(\large z_y\)` - we know that the area of `\(\large z_y\)` outside of `\(\large z_x\)` didn't occur

<br/>

--

You can see this very clearly in this [awesome visualization](http://setosa.io/conditional/)

---
## Conditional probability

`\(\large Pr(z_y|z_x)\)` is the area shared by the two events divided by the area of `\(\large z_x\)` (not `\(\large S\)`!) `\(^5\)`

`$$\large Pr(z_y|z_x) = \frac{area\; shared\; by\; z_x\; and\; z_y}{area\; of \;z_x} = \frac{Pr(z_x\; \cap\; z_y)}{Pr(z_x)}$$`

likewise,

`$$\large Pr(z_x|z_y) = \frac{Pr(z_x\; \cap\; z_y)}{Pr(z_y)}$$`

???

`\(^5\)` Read `\(Pr(z_y|z_x)\)` as "the probability of `\(z_y\)` conditional on `\(z_x\)`". `\(\cap\)` means "intersection" and it is the area shared by both `\(A\)` and `\(B\)`
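---
## Conditional probability

A minimal simulation sketch of this relationship. The conditional probabilities below are made up for illustration (species `\(y\)` is assumed to be more likely to be present when species `\(x\)` is present), but the marginal `\(Pr(z_x = 1) = 0.4\)` matches our earlier example:

```r
n <- 100000
z_x <- rbinom(n, 1, 0.4)                         # Pr(z_x = 1) = 0.4
z_y <- rbinom(n, 1, ifelse(z_x == 1, 0.7, 0.2))  # Pr(z_y = 1 | z_x) is 0.7 or 0.2

# Conditional probability as the joint probability divided by Pr(z_x = 1)
mean(z_x == 1 & z_y == 1) / mean(z_x == 1)  # should be close to 0.7

# Same quantity estimated directly from the trials where z_x = 1
mean(z_y[z_x == 1])                         # also close to 0.7
```

Dividing the joint frequency by the frequency of `\(z_x = 1\)` is exactly the area argument from the previous slide, just estimated from simulated outcomes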
---
## Conditional probability

#### For conditional events, the joint probability is:

`$$\LARGE Pr(z_y, z_x) = Pr(z_y|z_x)Pr(z_x) = Pr(z_x|z_y)Pr(z_y)$$`

---
## Probability of independent events

In some cases, the probability of one event occurring is *independent* of whether or not the other event occurs `\(^6\)`

--

In our example, the occupancy of the two species may be totally unrelated

+ if they occur together, it happens by complete chance `\(^7\)`

--

In this case, knowing that `\(\large z_x=1\)` gives us no new information about the probability of `\(\large z_y=1\)`

Mathematically, this means that:

`$$\large Pr(z_y|z_x) = Pr(z_y)$$`

and

`$$\large Pr(z_x|z_y) = Pr(z_x)$$`

--

Thus,

`$$\large Pr(z_x,z_y) = Pr(z_x)Pr(z_y)$$`

???

`\(^6\)` For example, the probability of a coin flip being heads is not dependent on whether or not the previous flip was heads.

`\(^7\)` This may be unlikely since even if they don't interact with each other, habitat preferences alone might lead to non-independence, but we'll discuss that in more detail shortly

---
## Disjoint events

A special case of conditional probability occurs when events are *disjoint*

<br/>

--

In our example, maybe species `\(\large x\)` and species `\(\large y\)` **never** occur together `\(^8\)`

<br/>

--

In this case, knowing that `\(\large z_x = 1\)` means that `\(\large z_y = 0\)`. In other words,

`$$\Large Pr(z_y|z_x) = Pr(z_x|z_y) = 0$$`

<img src="S_disjoint.png" width="300" style="display: block; margin: auto;" />

???

`\(^8\)` Perhaps they are such fierce competitors that they will exclude each other from their territories

---
## Probability of one event or the other

#### In some cases, we might want to know the probability that one event *or* the other occurs

--

#### For example, what is the probability that species `\(\large x\)` or species `\(\large y\)` (or both) is present?

--

#### This is the combined area of `\(\large z_x\)` and `\(\large z_y\)`, counting the area of overlap only once:

`$$\Large Pr(z_x \cup z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x,z_y)$$`

---
## Probability of one event or the other

When `\(\large z_x\)` and `\(\large z_y\)` are independent,

`$$\large Pr(z_x \cup z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x)Pr(z_y)$$`

--

If they are conditional,

`$$Pr(z_x \cup z_y) = Pr(z_x) + Pr(z_y) - Pr(z_x|z_y)Pr(z_y) = Pr(z_x) + Pr(z_y) - Pr(z_y|z_x)Pr(z_x)$$`

--

If they are disjoint,

`$$\Large Pr(z_x \cup z_y) = Pr(z_x) + Pr(z_y)$$`

---
## Marginal probability

A critical concept in Bayesian models is **marginal probability**, that is, the probability of one event happening regardless of the state of other events

--

Imagine that our occupancy model includes the effect of 3 different habitat types on the occupancy probability of species `\(\large x\)`, so:

`$$\Large Pr(z_x|H_i) = \frac{Pr(z_x \cap H_i)}{Pr(H_i)}$$`

--

What is the overall probability that species `\(\large x\)` occurs regardless of habitat type? That is, `\(\large Pr(z_x)\)`?

--

In this case, we *marginalize* over the different habitat types by summing the conditional probabilities weighted by the probability of each `\(\large H_i\)`:

`$$Pr(z_x) = \sum_{i=1}^3 Pr(z_x|H_i)Pr(H_i)$$`

--

Think of this as a weighted average - the probability that `\(\large z_x=1\)` in each habitat type weighted by the probability that each habitat type occurs.
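---
## Marginal probability

A minimal sketch of this weighted average in R, using made-up values for the conditional occupancy probabilities and the habitat probabilities (all of the numbers below are hypothetical, purely for illustration):

```r
# Hypothetical conditional occupancy probabilities Pr(z_x = 1 | H_i)
p_z_given_H <- c(H1 = 0.7, H2 = 0.4, H3 = 0.1)

# Hypothetical habitat probabilities Pr(H_i); these must sum to 1
p_H <- c(H1 = 0.2, H2 = 0.3, H3 = 0.5)

# Marginal probability Pr(z_x = 1): sum of Pr(z_x = 1 | H_i) * Pr(H_i)
(p_z <- sum(p_z_given_H * p_H))
```

```
## [1] 0.31
```

Note that the marginal probability is pulled toward the conditional probability of the most common habitat type (H3), exactly as a weighted average should be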
---
## Marginal probability

<img src="S_marginal1.png" width="600" style="display: block; margin: auto;" />

---
## Marginal probability

<img src="S_marginal2.png" width="600" style="display: block; margin: auto;" />

---
## Marginal probability

<table class="table table-striped table-hover table-condensed table-responsive" style="margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:center;"> </th>
<th style="text-align:center;"> H1 </th>
<th style="text-align:center;"> H2 </th>
<th style="text-align:center;"> H3 </th>
<th style="text-align:center;"> Total </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center;"> Occupied </td>
<td style="text-align:center;"> 60 </td>
<td style="text-align:center;"> 10 </td>
<td style="text-align:center;"> 10 </td>
<td style="text-align:center;"> 80 </td>
</tr>
<tr>
<td style="text-align:center;"> Unoccupied </td>
<td style="text-align:center;"> 20 </td>
<td style="text-align:center;"> 70 </td>
<td style="text-align:center;"> 250 </td>
<td style="text-align:center;"> 340 </td>
</tr>
<tr>
<td style="text-align:center;"> Total </td>
<td style="text-align:center;"> 80 </td>
<td style="text-align:center;"> 80 </td>
<td style="text-align:center;"> 260 </td>
<td style="text-align:center;"> 420 </td>
</tr>
</tbody>
</table>

???

This is the reason random or stratified sampling is so important if you want to know `\(Pr(z)\)` - if you do not sample habitats in proportion to `\(Pr(H_i)\)`, you will get biased estimates of `\(Pr(z)\)`!

---
class: inverse, middle, center

# Factoring joint probabilities

---
## Factoring joint probabilities

Many of the models you will work with as an ecologist will contain multiple random variables

`$$\bigg[z, \theta_p, \theta_o, \sigma^2_p, \sigma^2_s, \sigma^2_o, u_i\bigg|y_i\bigg] \propto [y_i|d(\theta_o,u_i), \sigma^2_o][u_i|z, \sigma^2_s][z|g(\theta_p, x), \sigma^2_p][\theta_p][\theta_o][\sigma^2_p][\sigma^2_s][\sigma^2_o]$$`

--

The rules of probability allow us to express these complex *joint* probabilities as a series of simpler *conditional* probabilities

- These concepts may feel a little abstract now but they will be very important when we learn about implementing Bayesian models

--

Determining the dependencies between parameters in these models is aided by **Bayesian network models**

---
## Bayesian networks

Bayesian networks graphically display the dependence among random variables

- Random variables are *nodes*

- Arrows point from *parents* to *children*

<img src="AB.png" width="20" style="display: block; margin: auto;" />

---
## Bayesian networks

Bayesian networks graphically display the dependence among random variables `\(^9\)`

- Child nodes are on the LHS of conditioning symbols

- Parent nodes are on the RHS of conditioning symbols

- Nodes without a parent are expressed unconditionally

<img src="AB.png" width="20" style="display: block; margin: auto;" />

--

`$$\LARGE Pr(A, B) = Pr(A|B)Pr(B)$$`

???

`\(^9\)` These rules extend directly from the rules of probability we already learned.

We can generalize the simple model in this slide to more than two events, which we will call `\(z_1, z_2,...,z_n\)`:

`$$Pr(z_1, z_2,...,z_n) = Pr(z_n|z_{n-1},...,z_1)...Pr(z_3|z_2, z_1)Pr(z_2|z_1)Pr(z_1)$$`

The order of conditioning (i.e., the dependencies in the graph) is determined by the biology, not the statistics
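---
## Bayesian networks

A minimal numerical sketch of the two-node factorization. The joint probabilities below are made up for illustration; the point is simply that the conditional times the marginal recovers the joint:

```r
# Hypothetical joint probabilities Pr(A, B); rows = A (0, 1), columns = B (0, 1)
joint <- matrix(c(0.25, 0.15,
                  0.05, 0.55),
                nrow = 2, byrow = TRUE)

pr_B <- colSums(joint)              # marginal probabilities Pr(B = 0), Pr(B = 1)
pr_A1_given_B <- joint[2, ] / pr_B  # conditional Pr(A = 1 | B)

# The factorization Pr(A = 1, B) = Pr(A = 1 | B) * Pr(B) recovers the joint
pr_A1_given_B * pr_B                # 0.05 0.55, identical to joint[2, ]
```

The same bookkeeping, applied one node at a time, is all that is happening in the longer factorizations on the next few slides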
---
## Bayesian networks

<img src="ABC.png" width="200" style="display: block; margin: auto;" />

--

`$$\Large Pr(A, B, C) = Pr(A|B,C)Pr(B|C)Pr(C)$$`

---
## Bayesian networks

<img src="ABCD.png" width="200" style="display: block; margin: auto;" />

--

`$$\Large Pr(A, B, C, D) = Pr(A|C)Pr(B|C)Pr(C|D)Pr(D)$$`

---
## Bayesian networks

<img src="ABCDE.png" width="200" style="display: block; margin: auto;" />

--

`$$\Large Pr(A, B, C, D, E) = Pr(A|C)Pr(B|C)Pr(C|D, E)Pr(D)Pr(E)$$`

---
## Bayesian networks

<img src="ABCD2.png" width="200" style="display: block; margin: auto;" />

--

`$$\Large Pr(A, B, C, D) = Pr(A|B,C,D)Pr(B|C,D)Pr(C|D)Pr(D)$$`

---
## Bayesian networks

<img src="ABCD3.png" width="200" style="display: block; margin: auto;" />

--

`$$\Large Pr(A, B, C, D) = Pr(A|B,C,D)Pr(C|D)Pr(D)Pr(B)$$`

???

If we know that `\(B\)` is independent of `\(C\)` and `\(D\)`, we can simplify the conditional expressions because (for independent events):

`$$Pr(z_1|z_2) = Pr(z_1)$$`

---
## Properties of probability distributions

Because all unobserved quantities are treated as random variables governed by probability distributions, using Bayesian methods requires a good understanding of those distributions.

As ecologists, we encounter and use a number of probability distributions regularly `\(^{10}\)`:

- normal

- Poisson

- binomial

- gamma

???

`\(^{10}\)` We are not going to go over the properties of each of these distributions in lecture. Instead, I will talk about specific distributions as they come up in examples.

Even so, I highly recommend you read the chapter of *Hobbs & Hooten* on probability functions to familiarize yourself with the distributions we'll use throughout the semester. If you don't have that book, just google each distribution and read the Wikipedia page.

---
## Discrete vs. continuous distributions

--

**Continuous** random variables can take on an infinite number of values on a specific interval `\(^{11}\)`

- Normal `\((-\infty\)` to `\(\infty)\)`

- Gamma (0 to `\(\infty)\)`

- Beta (0 to 1)

- Uniform (? to ?)

--

**Discrete** random variables are those that take on distinct values, usually integers `\(^{12}\)`

- Poisson (integers `\(\geq 0)\)`

- Bernoulli (0 or 1)

- Binomial

- Multinomial

???

`\(^{11}\)` We usually encounter continuous variables in the form of regression coefficients (slopes and intercepts), measurements (mass, lengths, etc.), and probabilities

`\(^{12}\)` We usually encounter discrete variables in the form of counts (the number of individuals can only be a non-negative integer, you can't have 4.234 individuals) or categories (alive vs. dead, present in location A vs. B vs. C)

---
## Probability functions

#### Very often we want to know the probability that a random variable will take a specific value `\(\large z\)`

--

#### Answering this question requires the use of probability functions, which we will denote `\(\large [z]^{13}\)`

--

#### Probability functions differ between *continuous* and *discrete* distributions so we will discuss these separately

???

`\(^{13}\)` Probability functions tell us `\([z]=Pr(z)\)`
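---
## Probability functions

In `R`, every distribution we will use comes with a family of built-in functions whose one-letter prefix tells you what is returned. A quick sketch using the normal distribution (we will use several of these later in this lecture):

```r
dnorm(x = 1, mean = 0, sd = 3)    # d = density/mass: [z] evaluated at z = 1
pnorm(q = 1, mean = 0, sd = 3)    # p = cumulative probability: Pr(z <= 1)
qnorm(p = 0.5, mean = 0, sd = 3)  # q = quantile: value of z below which 50% of the probability lies
rnorm(n = 3, mean = 0, sd = 3)    # r = random draws from the distribution
```

The same prefixes work for the other distributions (e.g., `dpois()`, `rpois()`, `dbeta()`, `rgamma()`)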
---
## Probability mass functions

For *discrete* random variables, the probability that the variable will take a specific value `\(\large z\)` is defined by the *probability mass function* (pmf)

#### All pmf's share two properties:

`$$\Large 0 \leq [z] \leq 1$$`

`$$\Large \sum_{z \in S}[z]=1$$`

where `\(\large S\)` is the set of all `\(\large z\)` for which `\(\large [z] > 0\)` (the range of possible values of `\(\large z\)`).

---
## Probability mass functions

As an example, let's assume a random variable that follows a Poisson distribution

- Poisson random variables can take any integer value `\(\geq 0\)` `\(\large (0, 1, 2,...)\)`

- e.g., the number of individuals at a site or the number of seeds produced by a flower

--

The shape of the Poisson distribution is determined by a single parameter called `\(\large \lambda\)`

- `\(\large \lambda\)` is the expected value (the mean) of a random variable generated from the Poisson distribution

- larger `\(\large \lambda\)` means larger values of the variable

---
## Probability mass functions

#### If `\(\large \lambda = 10\)`, what is the probability that `\(\large z\)` will equal 10? Or 8? Or 15?

--

In R, probability mass is estimated using the `dpois()` function (or the equivalent for other discrete distributions)

- it takes two arguments: the value we are interested in estimating the probability of `\((z)^{14}\)` and the expected value of our distribution `\((\lambda)\)`

- `dpois(x = seq(0,25), lambda = 10)`

<img src="Lecture2_files/figure-html/unnamed-chunk-15-1.png" width="432" style="display: block; margin: auto;" />

???

`\(^{14}\)` R will let us put in a vector of values so we can also do the following to estimate the probability of all values from 0 to 25: `dpois(x = seq(0, 25), lambda = 10)`

---
## Probability density functions

Probability mass functions provide the probability that a discrete random variable takes on a specific value `\(\large z\)`

For continuous variables, estimating probabilities is a little trickier because `\(\large Pr(z) = 0\)` for any specific value `\(\large z\)`

Why? Let's look at the probability distribution for a normal random variable with mean `\(\large =0\)` and standard deviation `\(\large =3\)`:

<img src="Lecture2_files/figure-html/unnamed-chunk-16-1.png" width="432" style="display: block; margin: auto;" />

---
## Probability density functions

The probability that `\(\large z\)` falls in an interval between `\(\large a\)` and `\(\large b\)` is the area under the curve over that interval, whose width we'll call `\(\large \Delta_z = (b - a)\)`.

For example, the shaded area below shows `\(\large Pr(-2 \leq z \leq -1)\)`:

<img src="Lecture2_files/figure-html/unnamed-chunk-17-1.png" width="432" style="display: block; margin: auto;" />
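---
## Probability density functions

As a quick sketch of what that shaded area represents, we can compute `\(\large Pr(-2 \leq z \leq -1)\)` for this Normal(0, 3) random variable directly in R, either from the cumulative distribution function `pnorm()` or by numerically integrating the density `dnorm()` (the two approaches agree):

```r
# Area under the Normal(0, 3) curve between -2 and -1, using the CDF
pnorm(-1, mean = 0, sd = 3) - pnorm(-2, mean = 0, sd = 3)  # roughly 0.12

# The same area by numerical integration of the density
integrate(dnorm, lower = -2, upper = -1, mean = 0, sd = 3)
```

Notice that we get a probability by integrating the density over an interval - a single point has zero width and therefore zero probability, which is where the next slide picks up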
---
## Probability density functions

This area can be approximated by multiplying the width of the interval times the height of the curve at its midpoint:

`$$\Large Pr(a \leq z \leq b) \approx \Delta_z [(a + b)/2]$$`

By making the range `\(\Delta_z = b-a\)` smaller and smaller, we get closer to `\(Pr(z)\)`:

<img src="Lecture2_files/figure-html/unnamed-chunk-18-1.png" width="432" style="display: block; margin: auto;" />

---
## Probability density functions

#### At `\(\large z\)`, `\(\large \Delta_z =0\)`, thus `\(\large Pr(z) = 0\)`

--

#### However, we can use calculus to estimate the height of the line `\(\large ([z])\)` as `\(\large \Delta_z\)` approaches 0

--

#### So for continuous random variables, the *probability density* of `\(\large z\)` is the area under the curve between `\(\large a \leq z \leq b\)`, divided by `\(\large \Delta_z\)`, as `\(\large \Delta_z\)` approaches zero

--

#### Estimating probability density in R is the same as for discrete variables: `dnorm()` `\(^{15}\)`

???

`\(^{15}\)` Now you know why the function starts with `d`!

---
class: inverse, center, middle

# Moments

---
## Moments

Every probability distribution we will use in the course can be described by its *moments*

- `\(1^{st}\)` moment is the expected value (i.e., mean)

- `\(2^{nd}\)` moment is the variance

<img src="Lecture2_files/figure-html/unnamed-chunk-19-1.png" width="432" style="display: block; margin: auto;" />

---
## Expected value (i.e., the mean)

#### The first moment of a distribution describes its central tendency (denoted `\(\large \mu\)`) or *expected value* (denoted `\(\large E(z)\)`)

--

#### This is the value of `\(\large z\)` we expect *on average*

--

#### Think of this as a weighted average - the mean of all possible values of `\(z\)` weighted by the probability mass or density of each value `\(\large [z]\)`

--

#### For discrete variables, the first moment can be calculated as

`$$\Large \mu = E(z) = \sum_{z \in S} z[z]$$`

--

#### For continuous variables, we need to use an integral instead of a sum:

`$$\Large \mu = E(z) = \int_{-\infty}^{\infty} z[z]dz$$`

---
## Variance

#### The second moment of a distribution describes the *variance* - that is, the spread of the distribution around its mean

--

#### On average, how far is a random value drawn from the distribution from the mean of the distribution?

--

#### For discrete variables, variance can be estimated as the weighted average of the squared difference (squared to prevent negative values) between each value `\(\large z\)` and the mean `\(\large \mu\)` of the distribution:

`$$\Large \sigma^2 = E((z - \mu)^2) = \sum_{z \in S} (z - \mu)^2[z]$$`

--

#### and for continuous variables:

`$$\Large \sigma^2 = E((z - \mu)^2) = \int_{-\infty}^{\infty} (z - \mu)^2[z]dz$$`

---
## Exercise: Estimating moments using *Monte Carlo integration* `\(^{16}\)`

One way to estimate moments is by simulating a large number of values from a probability distribution and then using these samples to calculate the first and second moments `\(^{17}\)`

This approach is very easy to do in `R` using the `r` class of functions (e.g., `rnorm()`, `rpois()`, etc.)

- These functions generate a specified number of random draws (`r` for random) from a given probability distribution

???

`\(^{16}\)` Monte Carlo integration is a form of simulation where we draw many random samples from a probability distribution and then use those samples to learn about properties of the distribution

`\(^{17}\)` This is a useful approach to understand because it is very similar to how we learn about parameter distributions in Bayesian analyses
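---
## Exercise: Estimating moments using *Monte Carlo integration*

As a warm-up sketch (separate from the gamma exercise that follows), here is the general recipe applied to the Normal(0, 3) distribution we plotted earlier, where we already know the answers:

```r
n <- 10000                          # number of Monte Carlo samples
samp <- rnorm(n, mean = 0, sd = 3)  # random draws from a Normal(0, 3)

mean(samp)  # should be close to the true first moment, 0
var(samp)   # should be close to the true second moment, 3^2 = 9
```

The more samples we draw, the closer these sample-based estimates get to the true moments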
---
## Exercise: Estimating moments using *Monte Carlo integration*

Let's estimate the first and second moments of a gamma distribution `\(^{18}\)`

<br/>

--

The shape of the gamma distribution is governed by two parameters, `\(\large \alpha\)` (referred to as the shape) and `\(\large \beta\)` (referred to as the rate; some sources instead use the scale, which is `\(1/\beta\)`) `\(^{19}\)`

<br/>
<br/>

--

In `R`, we can generate a large number (e.g., 10,000) of random draws from the gamma distribution using the following code:

```r
n <- 10000 # Sample size
samp <- rgamma(n, shape = 0.5, rate = 2)
```

???

`\(^{18}\)` The gamma distribution is a continuous probability distribution that produces non-negative random variables

`\(^{19}\)` Both `\(\alpha\)` and `\(\beta\)` must be `\(>0\)`

---
## Exercise: Estimating moments using *Monte Carlo integration*

Now let's use these samples to estimate the first moment (the mean) and the second moment (the variance) of the distribution

<br/>

--

We estimate the first moment by taking the arithmetic mean of our samples `\((\frac{1}{n}{\sum_{i=1}^{n}z_i})\)` and the variance as `\((\frac{1}{n}{\sum_{i=1}^{n}(z_i-\mu)^2})\)`:

```r
mu <- sum(samp)/n # mean of the sample
sigma2 <- sum((samp - mu)^2)/n # variance of the sample
```

---
## Exercise: Estimating moments using *Monte Carlo integration*

How close are these values to the true moments? For the gamma distribution:

`$$\Large \mu = \frac{\alpha}{\beta}$$`

`$$\Large \sigma^2 = \frac{\alpha}{\beta^2}$$`

For our samples:

```r
mu # Estimated mean
```

```
## [1] 0.2491
```

```r
0.5/2 # True mean
```

```
## [1] 0.25
```

???

Your answers won't exactly match the ones here but they should be pretty close

---
## Exercise: Estimating moments using *Monte Carlo integration*

How close are these values to the true moments? For the gamma distribution:

`$$\Large \mu = \frac{\alpha}{\beta}$$`

`$$\Large \sigma^2 = \frac{\alpha}{\beta^2}$$`

For our samples:

```r
sigma2 # Estimated variance
```

```
## [1] 0.1278
```

```r
0.5/2^2 # True variance
```

```
## [1] 0.125
```

---
## Exercise: Estimating moments using *Monte Carlo integration*

Try this on your own - simulate data from a Poisson distribution and see if the moments you estimate from the sample are close to the true moments

**Hint** - the Poisson distribution has a single parameter `\(\lambda\)`, which is both the mean and the variance of the distribution

<br/>

--

Change both `\(\large \lambda\)` and `\(\large n\)`. *Does varying these values change how well your sample estimates the moments?* `\(^{20}\)`

???

`\(^{20}\)` **Question** - in the above simulations, we use the arithmetic mean to estimate the first moment of the distribution. But in the definition of the moment, we defined the mean as the weighted average of the `\(z\)`'s. Why don't we have to take the weighted average of our sample?

---
## Moment matching

What if you know the mean and variance of a distribution and need the parameters?
<br/>

--

Rather than using simulation, each distribution has a set of formulas for converting between parameters and moments (called *moment matching*)

<br/>

--

Moment matching is very important because often we have the mean and variance of distributions but need to convert those summaries into the parameters of the underlying distribution `\(^{21,22}\)`

???

`\(^{21}\)` If this is not obvious right now, don't worry. You'll see why later in the semester as we work through examples

`\(^{22}\)` Of course, this does not mean you need to memorize the moment equations - that's what Google is for.

---
## Moment matching

For the normal distribution, it is relatively easy to understand moments because the parameters of the distribution (mean and standard deviation) map directly onto the first and second moments

<img src="Lecture2_files/figure-html/unnamed-chunk-24-1.png" width="360" style="display: block; margin: auto;" />

---
## Moment matching

The normal distribution has an interesting property - you can change the first moment without changing the second moment

<br/>

--

This is not true of all probability distributions

<br/>

--

For example, the beta distribution is a continuous distribution with values between 0 and 1 `\(^{23,24}\)`. Its first and second moments are:

`$$\Large \mu = \frac{\alpha}{\alpha + \beta}$$`

`$$\Large \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$`

???

`\(^{23}\)` This makes it useful for modeling random variables that are probabilities (e.g., detection probability in an occupancy model)

`\(^{24}\)` The shape of the beta distribution is governed by two parameters `\(\alpha\)` and `\(\beta\)`

---
## Moment matching

#### What if you know that the mean of a beta distribution is 0.3 and the variance is 0.025? What are `\(\large \alpha\)` and `\(\large \beta\)`?

--

`$$\Large \alpha = \bigg(\frac{1-\mu}{\sigma^2}- \frac{1}{\mu} \bigg)\mu^2$$`

`$$\Large \beta = \alpha \bigg(\frac{1}{\mu}-1\bigg)$$`

--

For our example, that means `\(^{25}\)`:

```r
(alpha <- ( (1 - 0.3)/0.025 - (1/0.3) )*0.3^2)
```

```
## [1] 2.22
```

```r
(beta <- alpha * ( (1/0.3) - 1))
```

```
## [1] 5.18
```

???

`\(^{25}\)` On your own, use our simulation method to check that these estimates are correct:

```r
samp <- rbeta(n, alpha, beta)
(mu <- sum(samp)/n)                # should be close to 0.3
(sigma2 <- sum((samp - mu)^2)/n)   # should be close to 0.025
```
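---
## Moment matching

The same idea works for other distributions. As a sketch, rearranging the gamma moments from earlier (`\(\mu = \alpha/\beta\)` and `\(\sigma^2 = \alpha/\beta^2\)`) gives `\(\beta = \mu/\sigma^2\)` and `\(\alpha = \mu^2/\sigma^2\)`. Plugging in the true moments of the gamma distribution we simulated from (mean 0.25, variance 0.125) recovers the parameters we started with:

```r
mu <- 0.25       # known mean
sigma2 <- 0.125  # known variance

(beta <- mu/sigma2)     # rate: 0.25/0.125 = 2
(alpha <- mu^2/sigma2)  # shape: 0.0625/0.125 = 0.5
```

Every distribution we will use this semester has an analogous set of conversions, which you can look up as needed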