The right goes into the first debate of the Madrid campaign with an advantage. Dozens of polls have been published, and they currently place the PP as the most-voted party (around 41%-42% of the vote), followed by PSOE (23%), Más Madrid (13%-14%), Vox (9%), Unidas Podemos (7%) and Ciudadanos (4%).

In addition, there are some trends within the left. The PSOE has dropped one point in two weeks, and Unidas Podemos perhaps a few tenths, while Más Madrid has risen: Mónica García’s party has gone from 12.5% to 13.5%.

The outcome of the election hinges on two things: whether Ciudadanos wins seats and the exact balance of votes between the blocs. PP and Vox together have around 50% of the vote, six points ahead of the combined PSOE, Más Madrid and Unidas Podemos (44%). But let’s see how that translates into seats and probabilities of victory.

### The prediction of seats

The graph below shows our seat estimate based on the polling average. The PP would be around 60 deputies, followed by PSOE (33), Más Madrid (19), Vox (13), Unidas Podemos (10) and Ciudadanos (0 as the most likely result; 2 on average).

To make this estimate we use a statistical model and simulate the election 15,000 times. The model is fed by polls and incorporates a key piece of information: their historical accuracy. The result is wide ranges, but not arbitrary ones, because they reflect the precision that polls have had in the past. The detailed methodology can be found at the end of the text.

This makes the uncertainty easy to see. Polls are typically off by a couple of points per party, and errors of three or four points for some parties are normal. That explains why, for example, the PP's most likely result is 61 seats, but its 90% probability interval runs from 50 to 69 seats. In other words, one time in twenty we would see the PP above that range, and one time in twenty below it.
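As a rough illustration of how such an interval can be read off a set of simulations, here is a minimal sketch. The simulated seat counts are synthetic (a hypothetical bell-shaped spread around 61), not the model's actual output, and `central_interval` is a name introduced only for this example.

```python
import random

def central_interval(samples, coverage=0.90):
    """Return (low, high) bounds enclosing the central `coverage` share of samples."""
    ordered = sorted(samples)
    tail = (1.0 - coverage) / 2.0          # share of simulations left out on each side
    last = len(ordered) - 1
    low_idx = round(tail * last)
    high_idx = round((1.0 - tail) * last)
    return ordered[low_idx], ordered[high_idx]

# Hypothetical stand-in for 15,000 simulated PP seat counts (assumption, not real data).
rng = random.Random(0)
simulated_pp_seats = [round(rng.gauss(61, 5.5)) for _ in range(15_000)]

low, high = central_interval(simulated_pp_seats, coverage=0.90)
print(f"90% interval: {low} to {high} seats")
```

With a 90% interval, 5% of simulations fall above the upper bound and 5% below the lower one, which is the "one in twenty" on each side.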

### The key: who will reach the majority

The main advantage of having a prediction model is that it lets you assign probabilities to different outcomes, something that polls cannot do on their own. In this way we can answer the key question of this election: which parties have a chance of reaching the 69 deputies needed? The graph shows the summary:

**3 out of 4 times (77%) there will be a right-wing majority (PP and Vox).** Of the 15,000 simulations, that is how often the two parties together reach the 69 seats they need.

**1 out of 10 times (12%) there will be a left-wing majority (PSOE, MM and UP).** This is the probability that something shifts in their favor: a change in the next few days or an error in the polls. It includes the possibility that both Vox and Ciudadanos fall below 5% and Isabel Díaz Ayuso is left without partners.

**1 out of 10 times (8%) Ciudadanos will be decisive.** This is the combined probability of two things happening: (1) that Cs exceeds 5% of the vote and wins seats (20% probability), and (2) that both right and left need those seats.

**And... 1 in 50 times there will be a tie.** As the assembly has an even number of seats, PP-Vox and PSOE-MM-UP could tie at 68 seats each.
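The four scenarios above can be counted directly from simulated seat distributions. The sketch below assumes each simulation is a dictionary of seats per party totalling 136 (implied by the 69-seat majority and the possible 68-68 tie); the four sample simulations are made up for illustration, and `classify` and `outcome_probabilities` are hypothetical helper names.

```python
from collections import Counter

MAJORITY = 69  # seats needed for a majority, as stated in the article

def classify(seats):
    """Label one simulated election with the scenario it falls into."""
    right = seats["PP"] + seats["Vox"]
    left = seats["PSOE"] + seats["MM"] + seats["UP"]
    if right >= MAJORITY:
        return "right majority"
    if left >= MAJORITY:
        return "left majority"
    if seats["Cs"] > 0:
        return "Cs decisive"   # neither bloc reaches 69 without Ciudadanos
    return "tie"               # 68-68 with no seats for Ciudadanos

def outcome_probabilities(simulations):
    """Share of simulations that end in each scenario."""
    counts = Counter(classify(s) for s in simulations)
    return {label: n / len(simulations) for label, n in counts.items()}

# Made-up simulations for illustration only (each sums to 136 seats).
example_simulations = [
    {"PP": 61, "PSOE": 33, "MM": 19, "Vox": 13, "UP": 10, "Cs": 0},
    {"PP": 55, "PSOE": 35, "MM": 21, "Vox": 10, "UP": 12, "Cs": 3},
    {"PP": 56, "PSOE": 35, "MM": 21, "Vox": 12, "UP": 12, "Cs": 0},
    {"PP": 52, "PSOE": 36, "MM": 22, "Vox": 14, "UP": 12, "Cs": 0},
]
print(outcome_probabilities(example_simulations))
```

Run over the full set of 15,000 simulations, the same counting would yield percentages like those quoted above.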

This last graph summarizes what the polls forecast for this election, taking into account what they point to collectively and also their probability of being wrong. I will update it from today until the last polls are published.

*Subscribe here to Kiko Llaneras's newsletter, in which he analyzes and explains current events with data and graphs.*

### Methodology

Predictions are produced by a statistical model based on polls and their historical accuracy. The model is similar to the one we used in the elections of April and November 2019, and in Mexico, France, the United Kingdom, Andalusia and Catalonia. It works in three steps: 1) aggregate and average the polls, 2) incorporate the expected uncertainty, and 3) simulate 15,000 elections to distribute seats and calculate probabilities.

**Step 1. Average the polls.** Our average takes dozens of polls into account to improve its accuracy. It is weighted, giving each poll a different weight according to three factors: the sample size, the polling firm, and the date.
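A weighted average of this kind might look like the sketch below. The exact weighting formula is not given in the text, so everything here is an assumption: the square root of the sample size, a hypothetical per-firm `rating`, and an exponential decay with a 7-day half-life are illustrative choices only.

```python
import math

def poll_weight(sample_size, house_rating, days_old, half_life_days=7.0):
    """Combine the three factors into one weight (illustrative formula, an assumption)."""
    size_factor = math.sqrt(sample_size)                 # larger samples count more
    recency_factor = 0.5 ** (days_old / half_life_days)  # older polls count less
    return size_factor * house_rating * recency_factor

def weighted_average(polls):
    """Weighted mean vote share per party; assumes every poll reports every party."""
    total_weight = 0.0
    weighted_sums = {}
    for poll in polls:
        w = poll_weight(poll["n"], poll["rating"], poll["days_old"])
        total_weight += w
        for party, share in poll["shares"].items():
            weighted_sums[party] = weighted_sums.get(party, 0.0) + w * share
    return {party: s / total_weight for party, s in weighted_sums.items()}

# Hypothetical polls, not real data.
polls = [
    {"n": 1000, "rating": 1.0, "days_old": 1, "shares": {"PP": 41.0, "PSOE": 23.5}},
    {"n": 1600, "rating": 0.8, "days_old": 5, "shares": {"PP": 42.0, "PSOE": 22.5}},
]
print(weighted_average(polls))
```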

**Step 2. Incorporate the uncertainty of the polls.** This is the most complicated and important step. The expected precision of the polls needs to be estimated. How big are the usual errors? How likely are errors of 2, 3, or 5 points? To answer these questions, hundreds of polls in Spain and thousands of international ones are studied.

*Calibrate the expected errors.* First, the error of the polls in Spain is estimated. A database is built with all the elections since 1986. The mean absolute error (MAE) of the poll averages has been around 2 points per party. This means that deviations of 3 or 4 points were common and that the margin of error (at 95%) is close to seven points for parties with around 30% of the vote. These errors depend on at least two things: the size of the party and the proximity of the election. To take these two factors into account, the Jennings and Wlezien database, published in Nature, is used. The errors of more than 4,100 polls in 241 elections in 19 Western countries have been analyzed. From this, a simple model is built that estimates the MAE of the polling average for each party, taking into account: 1) its size (it is easier to estimate a party that is around 5% of the vote than one that exceeds 30%), and 2) the days remaining until the election (because polls improve toward the end).
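To relate an MAE to a 95% margin of error, one can use a normal approximation, under which the standard deviation equals the MAE times the square root of π/2. This is only a back-of-the-envelope sketch: the model itself uses heavier-tailed distributions, and the function names are introduced here for illustration.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal

def mae_to_sigma(mae):
    """Standard deviation of a zero-mean normal error with the given mean absolute error."""
    return mae * math.sqrt(math.pi / 2.0)

def margin95_from_mae(mae):
    """Approximate 95% margin of error implied by an MAE (normal approximation)."""
    return Z_95 * mae_to_sigma(mae)

def mae_from_margin95(margin):
    """Inverse: the MAE implied by a given 95% margin (normal approximation)."""
    return margin / Z_95 / math.sqrt(math.pi / 2.0)

print(round(margin95_from_mae(2.0), 1))   # about 4.9 points for an average party
print(round(mae_from_margin95(7.0), 2))   # about 2.85 points of MAE for a 7-point margin
```

Under this approximation, an average MAE of 2 points corresponds to a margin of about 5 points, while the seven-point margin quoted for parties near 30% would correspond to an MAE closer to 3 points, consistent with larger parties being harder to estimate.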

*Choice of the type of distribution.* To incorporate the uncertainty into the vote of each party in each simulation, a multivariate distribution is used. Student-t distributions are used instead of normal ones because they have heavier tails (higher kurtosis), which makes very extreme events more likely to happen. The advantages of that assumption are explained by Nate Silver. I have estimated the level of kurtosis with the database. Then I define the covariance matrix of these distributions so that the sum of the votes does not exceed 100% (an idea from Chris Hanretty). I incorporate the uncertainty with 53 distributions, one at the national level and another in each province. The first distribution introduces equal errors for the vote of a party in all of Spain. It is important to do so because, in general, polling errors are systematic and the same in all territories. If we assume them to be independent, the errors cancel out between provinces and the model fails due to overconfidence. This happened with some models of the US elections in 2016. The second part of the uncertainty I incorporate province by province. Finally, the amplitude of the covariance matrices must be scaled so that the resulting vote distributions have the MAE and the standard deviation expected according to the calibration.
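One simple way to draw heavy-tailed errors that leave the total vote unchanged is sketched below. It is an assumption-laden simplification, not the article's actual covariance construction: the errors are drawn as independent normals, centered so they sum to zero, and then divided by a shared chi-square factor, which turns them into a multivariate Student-t draw. The scales and the degrees of freedom are hypothetical.

```python
import math
import random

def draw_vote_errors(scales, df, rng):
    """One multivariate Student-t draw of per-party errors that sum to zero."""
    normals = [rng.gauss(0.0, s) for s in scales]
    mean = sum(normals) / len(normals)
    centered = [z - mean for z in normals]      # zero-sum: the total vote stays fixed
    chi2 = rng.gammavariate(df / 2.0, 2.0)      # chi-square with `df` degrees of freedom
    shared = math.sqrt(chi2 / df)               # one factor for all parties gives joint heavy tails
    return [c / shared for c in centered]

# Polling averages quoted in the article (midpoints where a range is given).
average = {"PP": 41.5, "PSOE": 23.0, "MM": 13.5, "Vox": 9.0, "UP": 7.0, "Cs": 4.0}
scales = [2.5, 2.0, 1.5, 1.2, 1.2, 1.0]         # hypothetical error scales per party

rng = random.Random(42)
errors = draw_vote_errors(scales, df=5, rng=rng)
simulated = {p: max(0.0, v + e) for (p, v), e in zip(average.items(), errors)}
print(simulated)
```

Centering shrinks each party's variance slightly, which is one reason the final distributions would still need rescaling against the calibrated MAE.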

**Step 3. Simulate.** The last step is to run the model 15,000 times. Each iteration is a simulation of the election with vote shares that vary according to the distribution defined in the previous step. The results of these simulations allow us to calculate the probability that each party has of winning a certain number of seats, reaching a majority, finishing first, and so on.
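Each simulated set of vote shares has to be turned into seats. The text does not name the allocation rule, so the sketch below assumes the D'Hondt method with the 5% threshold mentioned earlier and 136 seats (implied by the 69-seat majority and the possible 68-68 tie). The function name `dhondt` is introduced only for this example.

```python
def dhondt(votes, seats, threshold=5.0):
    """Allocate seats by the D'Hondt method among parties at or above the threshold.

    `votes` maps party -> vote share in percent (assumed input format).
    """
    eligible = {p: v for p, v in votes.items() if v >= threshold}
    allocation = {p: 0 for p in votes}
    for _ in range(seats):
        # The next seat goes to the party with the highest quotient votes / (seats won + 1).
        winner = max(eligible, key=lambda p: eligible[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation

# Polling averages quoted in the article (midpoints where a range is given).
average = {"PP": 41.5, "PSOE": 23.0, "MM": 13.5, "Vox": 9.0, "UP": 7.0, "Cs": 4.0}
print(dhondt(average, seats=136))
```

Applied to the polling averages themselves, this gives PP 61, PSOE 33, Más Madrid 19, Vox 13, Unidas Podemos 10 and Ciudadanos 0, in line with the most likely results cited above. In the full model the same allocation would be repeated for each of the 15,000 simulated vote distributions.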

**Why polls.** This model is based entirely on polls. There is a perception that polls are unreliable, but the truth is that they have not done badly lately. In the last two or three years they have been quite accurate in Spain, with some exceptions such as the Andalusian elections of 2018. Polls are rarely perfect, but no alternative has proven better.

elpais.com
