data:image/s3,"s3://crabby-images/f06a5/f06a55e9ab99a836afd02e313041457454a4d5d2" alt="Training Systems Using Python Statistical Modeling"
Conjugate priors for proportions
So, let's see this in action. For data that takes values of either 0 or 1, we're going to use the beta distribution as our conjugate prior. The notation that is used to refer to the beta distribution is B(α, β).
α - 1 can be interpreted as imaginary prior successes, and β - 1 can be interpreted as imaginary prior failures. That is, the prior behaves as if you had already added α - 1 successes and β - 1 failures to your dataset.
If α = β = 1, then we interpret this as being no prior successes or failures; therefore, every probability of success, θ, is equally likely in some sense. This is referred to as an uninformative prior. Let's now implement this using the following steps:
- First, we're going to import the beta object from scipy.stats; this represents the beta distribution (not to be confused with the beta function in scipy.special). In addition to this, we will import the numpy library and the matplotlib library, as follows:
data:image/s3,"s3://crabby-images/a2f8c/a2f8c228b15ce8030314327726540ad09491f8b2" alt=""
- We're then going to plot the function and see how it looks, using the following code:
data:image/s3,"s3://crabby-images/61b4a/61b4ada57b57e9a5bec5518721645467cea60f82" alt=""
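A minimal sketch of what these two steps might look like, assuming a grid of 101 values of θ and illustrative variable names (`theta`, `uniform_density`) that are not necessarily those of the original listing:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

# Grid of candidate success probabilities theta in [0, 1].
theta = np.linspace(0, 1, 101)

# Density of the Beta(1, 1) prior evaluated at each grid point.
uniform_density = beta.pdf(theta, 1, 1)

# Draw the density curve; for a = b = 1 it should be a flat line.
fig, ax = plt.subplots()
ax.plot(theta, uniform_density)
ax.set_xlabel("theta")
ax.set_ylabel("density")
ax.set_title("Beta(1, 1) prior")
# plt.show()  # uncomment when running as a script rather than in a notebook
```

Here, `beta.pdf(x, a, b)` evaluates the density directly; `beta(a, b).pdf(x)` is an equivalent way to write the same call.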
This results in the following output:
data:image/s3,"s3://crabby-images/ee757/ee757da202b2c5687109eebf312f95cdba7bbb60" alt=""
So, when α = 1 and β = 1, the beta distribution reduces to a uniform distribution on [0, 1]. In some sense, each value of θ is equally likely.
- Now, we will use a=3 and b=3 (that is, α = 3 and β = 3) to indicate two imaginary successes and two imaginary failures, which gives us the following output:
data:image/s3,"s3://crabby-images/2d064/2d064f2121381b00684a43e800267ccddd63859e" alt=""
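As a rough sketch, the same plot for the B(3, 3) prior could be produced as follows. The names `prior`, `density`, and `peak` are illustrative assumptions, and the last lines simply check numerically where the density is concentrated:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

# Grid of candidate success probabilities theta in [0, 1].
theta = np.linspace(0, 1, 201)

# Frozen Beta(3, 3) distribution: two imaginary successes, two imaginary failures.
prior = beta(3, 3)
density = prior.pdf(theta)

fig, ax = plt.subplots()
ax.plot(theta, density)
ax.set_xlabel("theta")
ax.set_ylabel("density")
ax.set_title("Beta(3, 3) prior")
# plt.show()  # uncomment when running as a script rather than in a notebook

# Grid point where the density is highest, and the mean of the prior.
peak = theta[np.argmax(density)]
prior_mean = prior.mean()
```

Both `peak` and `prior_mean` come out at 0.5, since the distribution is symmetric when the two parameters are equal.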
Now, our prior distribution pulls our estimate of θ toward 0.5; in other words, before seeing any data, we believe that success and failure are about equally likely.
Given a sample of size N with M successes, if the prior is a beta distribution with parameters (α, β), then the posterior distribution is B(α + M, β + N - M). So, let's reconsider an earlier example: we have a website with 1,126 visitors, 310 of whom clicked on an ad purchased by a sponsor, and we want to know what proportion of individuals will click on the ad in general.
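The posterior formula stated above follows from multiplying the prior density by the likelihood of the data. A short sketch of that step, writing θ for the probability of success and dropping normalizing constants that do not depend on θ:

```latex
\begin{align*}
p(\theta) &\propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}
  && \text{prior } B(\alpha, \beta) \\
p(\text{data} \mid \theta) &\propto \theta^{M} (1 - \theta)^{N - M}
  && M \text{ successes in } N \text{ trials} \\
p(\theta \mid \text{data}) &\propto \theta^{\alpha + M - 1} (1 - \theta)^{\beta + N - M - 1}
  && \text{posterior } B(\alpha + M,\ \beta + N - M)
\end{align*}
```

The posterior has the same functional form as the prior, which is what makes the beta distribution a conjugate prior for this kind of data.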
- So, we're going to use our prior distribution, B(3, 3). This means that the posterior distribution will be a beta distribution with first parameter 3 + 310 = 313 and second parameter 3 + 1,126 - 310 = 819. This is what the prior distribution and posterior distribution look like when plotted against each other:
data:image/s3,"s3://crabby-images/faf1d/faf1d8395a4452dcaaced834c974b10e3fe13ff2" alt=""
The blue curve represents the prior distribution, and the red curve represents the posterior distribution.
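A plot like this one might be produced along the following lines. This is only a sketch: the helper `posterior_params` and the variable names are hypothetical, introduced here for illustration, and the colors are chosen to match the description above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta


def posterior_params(a, b, n, m):
    """Return the beta posterior parameters after m successes in n trials."""
    return a + m, b + n - m


# Prior B(3, 3) and the website data: 310 clicks among 1,126 visitors.
a_prior, b_prior = 3, 3
n_visitors, n_clicks = 1126, 310
a_post, b_post = posterior_params(a_prior, b_prior, n_visitors, n_clicks)

# Plot both densities on a fine grid of theta values.
theta = np.linspace(0, 1, 1001)
fig, ax = plt.subplots()
ax.plot(theta, beta.pdf(theta, a_prior, b_prior), color="blue", label="prior")
ax.plot(theta, beta.pdf(theta, a_post, b_post), color="red", label="posterior")
ax.set_xlabel("theta")
ax.set_ylabel("density")
ax.legend()
# plt.show()  # uncomment when running as a script rather than in a notebook

# Mean of the posterior, a / (a + b) = 313 / 1132, roughly 0.2765.
posterior_mean = beta.mean(a_post, b_post)
```

With this much data, the posterior is far narrower than the prior and sits close to the observed click rate of 310/1,126, so the two imaginary successes and failures have very little influence on the result.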