Tuesday, April 8, 2014

Public opinion polls

There will be elections in Brazil in 2014. This time we will elect the president, the state governors, part of the Senate, the Chamber of Deputies and the state assemblies. Under Brazilian law, open campaigning is still prohibited, but some movement has already started. For example, everybody knows that President Dilma Rousseff will run for reelection, and there are two declared "opposition" candidates. In this context, public opinion polls acquire great importance. According to these polls, President Dilma is the clear favorite to win the election in the first round, and her adversaries barely reach 16% and 12% of the "vote intentions".

A recently published poll gained considerable attention in Brazil. Two different polls organized by the same institute (IBOPE, short for "Brazilian Institute of Public Opinion and Statistics") gave different results for the same population: in one of the polls Pres. Dilma would have about 43% of vote intentions, while in the second this number would be 38%.

There was a fuss on the social networks about this result, first because the news services (most of which support the opposition) published headlines like "President Dilma fell in the IBOPE poll!". Pres. Dilma's supporters protested, alleging political use of the result (which is obvious), because the first poll covered a longer period and ended after the second poll (the one that gained full coverage from the press).

Apart from the political use of such results, one should look at the problem from a scientific point of view. What is a public opinion poll?

The answer to this question is: a statistical inference measure.

Let us consider what a poll in fact is. The process probes one population (the country's voters) by extracting a small sample and asking one question. Let us keep it simple and suppose the question is binary, with only two possible outcomes. As every math, physics or engineering student learns, this problem is equivalent to probing a box containing a large number of pebbles colored black and white. Supposing the fraction of white pebbles is p and the sample size is N, the number n of white pebbles in the sample follows the binomial probability distribution:

$$f(n) = \binom{N}{n} \, p^{n} \, (1-p)^{N-n}$$

Let us look at what this means for a typical sample size used in these polls (N=2600), and let us assume p=0.38 (that is, 38% of the population is white pebbles). The result is given by the red line in the figure below.

The fact that we measured p=0.38 means that we actually drew 988 white pebbles in the sample of 2600, but let us step back and ask, before the measurement, what the probability of drawing 988 white pebbles would be given p=0.38. This number is f=0.01612 (that is, 1.612%). Let us suppose now that our measurement of p is wrong and the true value is actually larger (it could be smaller too). I plotted in the figure (blue line) what the result would be if p=0.40. The probability of drawing 988 white pebbles in this case would be f=0.00182 (or 0.182%). It looks small, but it is actually more than 1/10 of the value for p=0.38. In fact, if we drew 100 samples of size N=2600 out of the box with p=0.38, any result between ~960 and ~1020 would likely be obtained.
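The numbers quoted above can be checked with a few lines of Python. This is only a minimal sketch, not the code used for the original figure: the helper name `binom_pmf` is my own, and the computation is done in log space because a term like 0.38^988 is too small for ordinary floating point.

```python
import math

def binom_pmf(n, N, p):
    """Binomial probability of n white pebbles in a sample of N (hypothetical helper)."""
    # Work with logarithms: 0.38**988 on its own would underflow to zero.
    log_coeff = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return math.exp(log_coeff + n * math.log(p) + (N - n) * math.log(1 - p))

N, n = 2600, 988
f_038 = binom_pmf(n, N, 0.38)  # probability of 988 white pebbles if p = 0.38
f_040 = binom_pmf(n, N, 0.40)  # probability of the same count if p = 0.40

# Total probability of landing anywhere in the 960..1020 window when p = 0.38.
window = sum(binom_pmf(k, N, 0.38) for k in range(960, 1021))

print(f"f(988 | p=0.38) = {f_038:.5f}")
print(f"f(988 | p=0.40) = {f_040:.5f}")
print(f"ratio           = {f_040 / f_038:.3f}")
print(f"P(960 <= n <= 1020 | p=0.38) = {window:.3f}")
```

The ratio line is the "more than 1/10" comparison made in the text.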

Everybody who works with statistical inference knows this fact. In a statistical measurement we will never be fully confident in the result. We always run the risk of committing two types of error. The first, called a "type I" error, means rejecting a proposition that is actually true; in the present case, rejecting p=0.38 when it is in fact correct. This is the origin of the "confidence interval" concept, which most laymen know as the "error margin". There is, however, also the type II error, which is accepting a proposition that is actually false; in the present case, keeping p=0.38 after drawing 988 white pebbles when the true value is p=0.40. The type II error is more difficult to control, and one can easily lose track of it when trying to confine the type I error to low probability values.
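To make the two kinds of error concrete, here is a rough sketch under assumptions that are not part of the analysis above: the hypothesis being tested is p=0.38, it is kept whenever the count of white pebbles falls in the 960 to 1020 window mentioned earlier, and the only alternative considered is p=0.40. The helper names are hypothetical.

```python
import math

def binom_pmf(n, N, p):
    """Binomial probability of n white pebbles in a sample of N (hypothetical helper)."""
    log_coeff = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return math.exp(log_coeff + n * math.log(p) + (N - n) * math.log(1 - p))

def prob_in_window(lo, hi, N, p):
    """Probability that the white-pebble count falls in lo..hi (inclusive)."""
    return sum(binom_pmf(k, N, p) for k in range(lo, hi + 1))

N = 2600
LO, HI = 960, 1020  # assumed acceptance window for the hypothesis p = 0.38

# Chance of throwing away p = 0.38 although it is the true fraction.
reject_true = 1.0 - prob_in_window(LO, HI, N, 0.38)

# Chance of keeping p = 0.38 although the true fraction is 0.40.
keep_false = prob_in_window(LO, HI, N, 0.40)

print(f"P(reject p=0.38 | p=0.38 is true) = {reject_true:.3f}")
print(f"P(keep p=0.38 | p=0.40 is true)   = {keep_false:.3f}")
```

Widening the window lowers the first probability but raises the second, which is the trade-off described above.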

The determination of the confidence interval for this problem usually requires approximating the binomial distribution by the normal distribution (which is a good hypothesis in the present case) or perhaps more sophisticated methods, like Bayesian inference, but one should take care not to use the central limit theorem here, since the number of samples is actually 1.
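As an illustration of the normal approximation, the sketch below computes the usual "error margin" for a measured fraction of 0.38 with N=2600. The 95% confidence level and the function name `normal_interval` are my assumptions; the polling institute may use a different level or method.

```python
import math
from statistics import NormalDist

def normal_interval(p_hat, N, confidence=0.95):
    """Normal-approximation interval for a measured fraction p_hat (hypothetical helper)."""
    # Two-sided quantile of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Standard deviation of the measured fraction under the binomial model.
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / N)
    return p_hat - margin, p_hat + margin

low, high = normal_interval(0.38, 2600)
margin = (high - low) / 2.0

print(f"interval: {low:.4f} to {high:.4f}")
print(f"error margin: +/- {100 * margin:.2f} percentage points")
```

With these assumptions the margin comes out a little under two percentage points.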

Therefore the result of a public opinion poll is simply an (educated) guess. Its results must be analysed with care and never in the way described above (the "conclusion" that President Dilma is falling in public preference). This use of statistics is simply political misuse. Naturally, all this analysis is based on one premise, namely, that the sample is extracted from a homogeneous population. In my opinion this is one of the largest weaknesses of public opinion polls. I believe it is possible to rig a public opinion poll by carefully choosing the place and the time at which the interviews are made. Of course there are protocols which have to be followed, but even so, I believe one can steer the answers according to the will of the institute. In a civilized country, a polling institute which resorted to this kind of strategy would end up losing credibility, but the one-sided nature of the Brazilian press will surely ensure that any misuse of public opinion polls goes unpunished.
