
Choosing the likelihood
Let's assume that only two outcomes are possible—heads or tails—and let's also assume that a coin toss does not affect other tosses, that is, we are assuming coin tosses are independent of each other. We will further assume all coin tosses come from the same distribution. Thus, the random variable coin toss is an example of an independent and identically distributed (iid) variable. I hope you agree these are very reasonable assumptions to make for our problem. Given these assumptions, a good candidate for the likelihood is the binomial distribution:

p(y | θ, N) = N! / (y! (N − y)!) θ^y (1 − θ)^(N − y)
This is a discrete distribution returning the probability of getting y heads (or in general, successes) out of N coin tosses (or in general, trials or experiments) given a fixed value of θ:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n_params = [1, 2, 4]  # Number of trials
p_params = [0.25, 0.5, 0.75]  # Probability of success
x = np.arange(0, max(n_params)+1)
f, ax = plt.subplots(len(n_params), len(p_params), sharex=True, sharey=True,
                     figsize=(8, 7), constrained_layout=True)
for i in range(len(n_params)):
    for j in range(len(p_params)):
        n = n_params[i]
        p = p_params[j]
        y = stats.binom(n=n, p=p).pmf(x)
        ax[i, j].vlines(x, 0, y, colors='C0', lw=5)
        ax[i, j].set_ylim(0, 1)
        ax[i, j].plot(0, 0, label="N = {:3.2f}\nθ = {:3.2f}".format(n, p),
                      alpha=0)
        ax[i, j].legend()
ax[2, 1].set_xlabel('y')
ax[1, 0].set_ylabel('p(y | θ, N)')
ax[0, 0].set_xticks(x)
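As a quick sketch of what `stats.binom` is computing under the hood, the pmf can also be evaluated directly from the formula above using `math.comb`; the helper name `binom_pmf` is just for illustration:

```python
import math
from scipy import stats

def binom_pmf(y, n, theta):
    """Binomial pmf computed directly from the formula:
    N! / (y! (N - y)!) * θ^y * (1 - θ)^(N - y)."""
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

# The direct computation agrees with scipy.stats.binom
for n, theta in [(1, 0.25), (2, 0.5), (4, 0.75)]:
    for y in range(n + 1):
        assert abs(binom_pmf(y, n, theta)
                   - stats.binom(n=n, p=theta).pmf(y)) < 1e-12
```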

The preceding figure shows nine binomial distributions; each subplot has its own legend indicating the values of the parameters. Notice that for this plot I did not omit the values on the y axis. I did this so you can check for yourself that if you sum the heights of all the bars you will get 1, that is, for discrete distributions, the height of the bars represents actual probabilities.
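This normalization is easy to verify numerically; a minimal check over the same parameter values used in the plot:

```python
import numpy as np
from scipy import stats

# For each (N, θ) pair in the figure, the bar heights sum to 1
for n in [1, 2, 4]:
    for p in [0.25, 0.5, 0.75]:
        heights = stats.binom(n=n, p=p).pmf(np.arange(n + 1))
        assert np.isclose(heights.sum(), 1.0)
```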
The binomial distribution is a reasonable choice for the likelihood. We can see that θ indicates how likely it is to obtain a head when tossing a coin (this is easier to see when N = 1, but it is valid for any value of N): just compare the value of θ with the height of the bar for y = 1 (heads).
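A quick numerical sketch of this correspondence, using the same `scipy.stats` calls as the plotting code:

```python
from scipy import stats

# With N = 1 (a single toss), p(y=1 | θ, N=1) = θ exactly,
# so the bar at y = 1 has height θ
for theta in [0.25, 0.5, 0.75]:
    assert abs(stats.binom(n=1, p=theta).pmf(1) - theta) < 1e-12
```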
OK, if we know the value of θ, the binomial distribution will tell us the expected distribution of heads. The only problem is that we do not know θ! But do not despair; in Bayesian statistics, every time we do not know the value of a parameter, we put a prior on it, so let's move on and choose a prior for θ.