
Piedmont municipalities: population density vs voter turnout

Voter turnout, i.e. the percentage of eligible voters who cast a ballot in an election, is related here to the population density of the 1206 municipalities of the Italian region of Piedmont.

The spots on the scatter plot above represent the 1206 municipalities. The x and y coordinates are, respectively, the logarithm of the population density (people/km^2) and the voter turnout in the 2013 Italian general election for the Chamber of Deputies. The area of each spot represents the population size of the corresponding municipality.
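As a minimal sketch of how one spot could be derived from the raw figures of a municipality, the function below computes the two coordinates and a radius whose disc area grows linearly with the population. The base-10 logarithm, the function name `spot`, its parameters and the scale factor `area_per_person` are assumptions made for illustration; the page does not state which logarithm base or scaling it uses.

```python
from math import log10, pi, sqrt


def spot(population, area_km2, ballots_cast, eligible_voters, area_per_person=0.01):
    """Return (x, y, radius) for one municipality.

    x: logarithm of the population density (base 10 is an assumption)
    y: voter turnout as a percentage of eligible voters
    radius: chosen so that the disc AREA, not the radius, scales with population
    """
    density = population / area_km2            # people per km^2
    x = log10(density)
    y = 100.0 * ballots_cast / eligible_voters
    radius = sqrt(area_per_person * population / pi)
    return x, y, radius


# Hypothetical municipality: 1000 inhabitants on 10 km^2, 750 ballots from 1000 voters.
x, y, radius = spot(1000, 10, 750, 1000)
```

Because the area encodes the population, a municipality four times as populous gets a spot with twice the radius.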

The analysis can be extended by switching the x-axis from the logarithm of the population density to the logarithm of the population size, and the y-axis to the 2008 and 2006 general elections.

The green straight line shows a linear regression on the set of points; the 95% confidence intervals on the y-intercept and slope are drawn on demand.

A slope greater than zero suggests a direct relationship between the logarithm of the population density (or size) and the voter turnout.

The voter uncertainty is calculated as the Shannon entropy of the partition defined by the votes assigned to every delegate, and it reflects the heterogeneity of the electoral preferences in a given municipality.

Municipalities with a larger population show a higher uncertainty in electoral preferences.


Data Source

Data on general elections: elezionistorico.interno.it, © Ministero dell'Interno. All rights reserved.
Population density and size of Piedmont municipalities: istat.it, Creative Commons Attribution 3.0 Italy (CC BY 3.0 IT)

Mathematical Methods

Linear Regression

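The following text is a brief summary of lectures 29, 30 and 31 given by Dmitry Panchenko at MIT.

Suppose we are given a set of observations $\{(X_1,Y_1),\ldots,(X_n,Y_n)\}$, where $X_i, Y_i \in \mathbb{R}$. We model the random variable $Y$ as a linear function of the random variable $X$ plus a random noise, i.e. we assume

$$Y_i = b_0 + b_1 X_i + \epsilon_i$$

where $\epsilon_i \sim N(0,\sigma^2)$ (that is, $\epsilon_i$ is assumed to have a normal distribution) and $b_0, b_1 \in \mathbb{R}$ and $\sigma^2$ are unknown parameters. In the following we estimate the unknown parameters and their confidence intervals, given the observations.

Let us think of the points $X_i$ as fixed and non-random, and deal only with the randomness that comes from the noise variables $\epsilon_i$. In other words, we deal only with the distribution $P(Y_i) = f(X_i, b_0, b_1, \epsilon_i)$, which is normal because the randomness comes from the normal variables $\epsilon_i$.

We want to find the estimates $\hat{b}_0$, $\hat{b}_1$ and $\hat{\sigma}^2$ that fit the observations best, and the quality of the fit can be measured in different ways. Here we use the maximum likelihood estimates, which maximize the probability

$$P(Y_1 \cap Y_2 \cap \cdots \cap Y_n) = P(Y_1)\,P(Y_2)\cdots P(Y_n)$$

of the event $Y_1$ AND $Y_2$ AND $\ldots$ AND $Y_n$ (the factorization assumes that the noise variables are independent). The maximum likelihood estimates are

$$\hat{b}_1 = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\overline{X^2} - \bar{X}^2}, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{b}_0 - \hat{b}_1 X_i\right)^2$$

where $\bar{X}$ is the mean of $X$, and $\overline{XY}$ and $\overline{X^2}$ are the means of $X_i Y_i$ and $X_i^2$ (note that $\hat{b}_0$ and $\hat{b}_1$ are the same as those found with the least-squares method).

Knowing the estimates is enough to draw the line that fits the observations, but important further information (the confidence) comes from the distribution of the estimates. The estimates have a probability distribution because they are functions of the $Y_i$, which have distributions $P(Y_i)$.

What are the confidence intervals? This becomes apparent with an example. It can be proved that the random variable $\hat{\sigma}^2$ is independent of $\hat{b}_0$ and $\hat{b}_1$ and that, once rescaled, it has a chi-squared distribution with $n-2$ degrees of freedom. In formulas,

$$\frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}.$$

Note that $\chi^2_{n-2}$ is a well-known distribution, i.e. we can calculate probabilities with it.

For example, let $\alpha = 0.05$. If we find the values $c_1$ and $c_2$ such that

$$\int_0^{c_1} \chi^2_{n-2}(x)\,dx = \alpha/2 \qquad \text{and} \qquad \int_{c_2}^{\infty} \chi^2_{n-2}(x)\,dx = \alpha/2,$$

then the probability of the remaining interval is

$$\int_{c_1}^{c_2} \chi^2_{n-2}(x)\,dx = 1 - \alpha = 0.95,$$

which means that

$$P\left(c_1 \le \frac{n\hat{\sigma}^2}{\sigma^2} \le c_2\right) = 0.95.$$

Solving for $\sigma^2$ we find the $1-\alpha$ confidence interval

$$\frac{n\hat{\sigma}^2}{c_2} \le \sigma^2 \le \frac{n\hat{\sigma}^2}{c_1}.$$

Note that the confidence interval can be computed from the observations alone. With the same meaning, the $1-\alpha$ confidence intervals of $b_1$ and $b_0$ are

$$\hat{b}_1 - c\sqrt{\frac{\hat{\sigma}^2}{(n-2)\left(\overline{X^2} - \bar{X}^2\right)}} \le b_1 \le \hat{b}_1 + c\sqrt{\frac{\hat{\sigma}^2}{(n-2)\left(\overline{X^2} - \bar{X}^2\right)}}$$

$$\hat{b}_0 - c\sqrt{\frac{\hat{\sigma}^2}{n-2}\left(1 + \frac{\bar{X}^2}{\overline{X^2} - \bar{X}^2}\right)} \le b_0 \le \hat{b}_0 + c\sqrt{\frac{\hat{\sigma}^2}{n-2}\left(1 + \frac{\bar{X}^2}{\overline{X^2} - \bar{X}^2}\right)}$$

where the value $c$ originates from the Student distribution with $n-2$ degrees of freedom: $t_{n-2}(-c, c) = 1 - \alpha$, i.e. a Student variable falls in $(-c, c)$ with probability $1-\alpha$.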
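The estimates and the confidence intervals on the slope and intercept described above can be sketched in plain Python as follows. The function names (`fit_line`, `confidence_intervals`, `approx_quantile`) are hypothetical and this is not the code behind the plot. One assumption deserves emphasis: the helper `approx_quantile` replaces the Student quantile $c$ with the standard normal quantile, which is only reasonable when $n$ is large (as with the 1206 municipalities); for small samples the exact Student quantile should be passed as `c` instead.

```python
from math import sqrt
from statistics import NormalDist


def fit_line(xs, ys):
    """Maximum likelihood estimates (b0, b1, sigma2) for Y = b0 + b1*X + noise."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    b1 = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    b0 = mean_y - b1 * mean_x
    # Mean squared residual: divided by n, not n - 2, as in the ML estimate.
    sigma2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / n
    return b0, b1, sigma2


def confidence_intervals(xs, ys, c):
    """Return ((b0_low, b0_high), (b1_low, b1_high)) for a given quantile c."""
    n = len(xs)
    b0, b1, sigma2 = fit_line(xs, ys)
    mean_x = sum(xs) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    half_b1 = c * sqrt(sigma2 / ((n - 2) * var_x))
    half_b0 = c * sqrt(sigma2 / (n - 2) * (1 + mean_x ** 2 / var_x))
    return (b0 - half_b0, b0 + half_b0), (b1 - half_b1, b1 + half_b1)


def approx_quantile(alpha):
    """ASSUMPTION: normal approximation to the Student quantile, valid for large n."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Made-up observations, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 4.0, 4.0]
b0, b1, sigma2 = fit_line(xs, ys)
b0_interval, b1_interval = confidence_intervals(xs, ys, c=approx_quantile(0.05))
```

With only four points the normal approximation is too optimistic; the example is meant to show the arithmetic, not to produce a trustworthy interval.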

Entropy Analysis

The uncertainty is calculated by means of the Shannon entropy on the number of votes assigned to every delegate. For example, given the municipality $x$, let's suppose that the three delegates $a$, $b$ and $c$ obtained $v_{xa}$, $v_{xb}$ and $v_{xc}$ votes (the total number of votes is $v_x = v_{xa} + v_{xb} + v_{xc}$). Then the uncertainty $H(x)$ is calculated as the Shannon entropy

$$H(x) = -\sum_{p \in \{a,b,c\}} \frac{v_{xp}}{v_x} \ln\frac{v_{xp}}{v_x} = -\frac{v_{xa}}{v_x}\ln\frac{v_{xa}}{v_x} - \frac{v_{xb}}{v_x}\ln\frac{v_{xb}}{v_x} - \frac{v_{xc}}{v_x}\ln\frac{v_{xc}}{v_x}$$

Note that the entropy is not normalized to the maximal entropy ($\ln v_x$), because the number of delegates is nearly the same for every municipality (such normalization would result in an entropy proportional to the population size).
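The entropy computation above can be sketched in a few lines of Python. The function name `vote_entropy` and the vote counts are hypothetical; delegates with zero votes are skipped, following the usual convention that $0 \ln 0 = 0$, which is an assumption not stated in the text.

```python
from math import log


def vote_entropy(votes):
    """Shannon entropy (natural logarithm) of the vote shares in one municipality.

    votes: list of vote counts, one entry per delegate.
    """
    total = sum(votes)
    # Delegates with zero votes contribute nothing (convention 0 * ln 0 = 0).
    return -sum(v / total * log(v / total) for v in votes if v > 0)


# Made-up counts for three delegates a, b and c.
h_split = vote_entropy([120, 80, 50])    # preferences spread over three delegates
h_unanimous = vote_entropy([250, 0, 0])  # all votes for one delegate: no uncertainty
```

The entropy is zero when every vote goes to the same delegate and reaches its largest value, the logarithm of the number of delegates, when the votes are split evenly.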

Disclaimer

The preceding analysis is not exhaustive, and the observations highlighted above need further research before they can be established with higher statistical confidence.

LICENSE CC BY 3.0