Bayesian Analysis with Python

Independently and identically distributed variables

Many models assume that successive values of a random variable are all sampled from the same distribution and that those values are independent of each other. In such a case, we say that the variables are independently and identically distributed, or iid for short. In mathematical notation, two variables $x$ and $y$ are independent if $p(x, y) = p(x)\,p(y)$ for every value of $x$ and $y$.
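The factorization $p(x, y) = p(x)\,p(y)$ can be checked empirically for discrete variables. The following is a minimal sketch, not taken from the book: the helper `independence_gap` is hypothetical, and the die and biased coin are illustrative assumptions. It builds the empirical joint table, computes both marginals, and reports the largest difference between the joint and the product of the marginals. It only addresses independence; whether the values are also identically distributed is a separate question.

```python
import numpy as np

def independence_gap(x, y):
    """Largest |p(x, y) - p(x) p(y)| over the empirical joint table of x and y."""
    # Map each observed value to a row (for x) or column (for y) index
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)   # count co-occurrences of (x, y)
    joint /= len(x)                 # empirical joint p(x, y)
    p_x = joint.sum(axis=1)         # marginal p(x)
    p_y = joint.sum(axis=0)         # marginal p(y)
    return np.abs(joint - np.outer(p_x, p_y)).max()

rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(1, 7, size=n)      # hypothetical fair six-sided die: values 1..6
y = rng.binomial(1, 0.3, size=n)    # hypothetical biased coin, drawn independently of x
y_dep = (x > 3).astype(int)         # dependent variable: "the die shows more than 3"

print(independence_gap(x, y))       # close to 0: p(x, y) ~ p(x) p(y)
print(independence_gap(x, y_dep))   # close to 1/12: the factorization fails
```

For the dependent pair, a cell such as $(x = 1, y = 1)$ has joint probability 0 while $p(x)\,p(y) = \frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12}$, which is why the gap stays near $1/12$ no matter how many samples are drawn.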

A common example of non-iid variables is time series, where the temporal dependency in the random variable is a key feature that should be taken into account. Take, for example, the following data from http://cdiac.esd.ornl.gov, a record of atmospheric CO2 measurements from 1959 to 1997. We are going to load the data (included with the accompanying code) and plot it:

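import numpy as np
import matplotlib.pyplot as plt

# Column 0 holds the time (year), column 1 the CO2 concentration (ppmv)
data = np.genfromtxt('../data/mauna_loa_CO2.csv', delimiter=',')
plt.plot(data[:, 0], data[:, 1])
plt.xlabel('year')
plt.ylabel('$CO_2$ (ppmv)')
plt.savefig('B11197_01_02.png', dpi=300)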
Figure 1.2

Each data point corresponds to the measured level of atmospheric CO2 in a given month. The temporal dependency of the data points is easy to see in this plot. In fact, there are two trends here: a seasonal one, related to cycles of vegetation growth and decay, and a global one, indicating an increasing concentration of atmospheric CO2.
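One way to make the two trends concrete is to fit them explicitly. The sketch below is an illustration, not the book's method: it assumes a straight-line global trend (a simplification) plus a single annual sine/cosine pair for the seasonal cycle, fitted by ordinary least squares with `np.linalg.lstsq`. The function name `fit_trend_and_season` is hypothetical, and the series it is tried on is synthetic, built from known components, not the Mauna Loa record.

```python
import numpy as np

def fit_trend_and_season(t, y):
    """Least-squares fit of y ~ a + b*(t - mean(t)) + c*sin(2*pi*t) + d*cos(2*pi*t).

    t is assumed to be time in decimal years, so the sine/cosine pair has a
    period of exactly one year. Returns (a, b, c, d) and the fitted values.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([
        np.ones_like(t),          # intercept: level at the mean time
        t - t.mean(),             # global trend (a straight line, a simplification)
        np.sin(2 * np.pi * t),    # seasonal cycle, sine part
        np.cos(2 * np.pi * t),    # seasonal cycle, cosine part
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Synthetic monthly series (NOT the Mauna Loa data), built from known parts
t = 1960 + np.arange(12 * 40) / 12
y = 320 + 1.3 * (t - t.mean()) + 3.0 * np.sin(2 * np.pi * t) - 1.0 * np.cos(2 * np.pi * t)

coef, fitted = fit_trend_and_season(t, y)
print(np.round(coef, 3))  # recovers intercept 320, slope 1.3, sin 3.0, cos -1.0
```

Applied to the measurements loaded earlier, a call such as `fit_trend_and_season(data[:, 0], data[:, 1])` would separate the two trends, assuming the first column holds time in decimal years. The residual `y - fitted` then shows what a straight line plus one annual cycle leaves unexplained.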