The Economist’s statistical forecast of American presidential elections has returned for its second contest in 2024. Developed with the assistance of a team of political scientists led by Andrew Gelman of Columbia University, our model calculates Joe Biden’s and Donald Trump’s probabilities of winning each individual state and the election overall. Its projections will be updated every day at this site. For readers interested in a fuller understanding of how our forecast works, below is a detailed methodological summary.
Poll position
The first step in our model is to generate a prediction for the national popular vote on election day. We use two main sources of information: national polls and “fundamentals”, the term in political science for structural factors that influence voter decisions. During the early months of election years, the public pays little attention to the race; campaign issues have yet to be defined; and voters who have a soft but consistent preference for one of the two major parties often say they are undecided or planning to vote for a third party. This makes polls in the first half of the year a surprisingly weak predictor of final results. For example, in June 1988, George H.W. Bush trailed Michael Dukakis by 12 percentage points in polling averages (he went on to win by eight). Exactly four years later, Mr Bush led Bill Clinton by ten percentage points, and wound up losing by seven.
In more recent years, polling errors have been a bit smaller—but they can still be substantial. In 2000 Mr Bush’s son George W. saw his ten-point lead over Al Gore in the popular vote turn into a deficit during the final three months of the campaign. It took the Electoral College and a disputed 537-vote victory in Florida to save his presidential bid. And notoriously, Hillary Clinton led Donald Trump by around eight points in June, August and even October of 2016, before she barely squeaked out a two-point edge in the popular vote.
[Chart: Distribution of polling errors. Democratic share of the two-party vote in each state minus predicted share, for each election from 2000 to 2020. Horizontal axis: polling error, percentage points, running from overestimating to underestimating Democrats.]
By contrast, fundamentals-based forecasts tend to be quite stable, and often foreshadow how voters are likely to change their minds once they tune in to politics and their dormant partisan leanings kick in. One of the best-known examples, a model called “Time for Change”, was designed by Alan Abramowitz, a political scientist at Emory University. It predicts the popular vote (excluding third parties) using solely the president’s net approval rating, GDP growth, and whether or not a first-term incumbent is running for re-election. Historically, its predictions of the share of the popular vote won by the president’s party have had an average error comparable to that of polls taken late in the campaign.
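To make the flavour of such a model concrete, here is a minimal sketch of a “Time for Change”-style equation in Python. The coefficients are invented placeholders for illustration, not Mr Abramowitz’s published values.

```python
# A "Time for Change"-flavoured linear equation. The coefficients below
# are invented placeholders, not Mr Abramowitz's published values.
def predicted_vote_share(net_approval: float, gdp_growth: float,
                         first_term_incumbent: bool) -> float:
    """Incumbent party's predicted share of the two-party vote, %."""
    return (47.3
            + 0.11 * net_approval            # president's net approval, points
            + 0.60 * gdp_growth              # GDP growth, %
            + 2.40 * first_term_incumbent)   # bonus for a first-term incumbent

# Example: an unpopular first-term incumbent in a growing economy
print(predicted_vote_share(net_approval=-10, gdp_growth=2.0,
                           first_term_incumbent=True))   # ~49.8
```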
[Chart: Predictions in 1992. Two-party vote share for George H.W. Bush, %, January to November, comparing raw polling, a fundamentals-based prediction and a polls-plus-fundamentals-based prediction against the result.]
On the regular
A common criticism of fundamentals models is that they are extremely easy to “over-fit”—the statistical term for deriving equations that provide a close match to historical data, but break down when used to predict the future. To avoid this risk, we borrow two techniques from the world of machine learning, with appropriately inscrutable names: “elastic-net regularisation” and “leave-one-out cross-validation”.
Elastic-net regularisation is a method of reducing the complexity of a model. In general, equations that are simpler—or more “parsimonious”, in statisticians’ lingo—tend to do a better job of predicting unseen data than convoluted ones do. “Regularisation” makes models less complicated, either by shrinking the impact of the variables used as predictors, or by removing weak ones entirely.
Next, in order to determine how much of this “shrinkage” to use, we deploy “leave-one-out cross-validation”. This technique involves chopping up a dataset into lots of pieces, training models on some chunks, and testing their performance on others. In this case, each chunk is one election year.
To test the accuracy resulting from one amount of shrinkage, we start by taking the data from the first post-war presidential election, held in 1948, and hiding it in a lockbox, where we can no longer see it. Next, we train a fundamentals model on the remaining elections, those held between 1952 and 2020. After simplifying the resulting equation using the amount of shrinkage whose performance we want to evaluate, we use this stripped-down model to predict what would have happened in 1948. We then repeat this process for the 18 remaining elections—fitting a simplified model on all years except 1952 and using it to predict 1952; fitting a simplified model on all years except 1956 and using it to predict 1956, and so on.
After completing this cycle, we are left with a list of 19 forecasts, one for each election year. Each prediction uses the same amount of shrinkage, and was generated solely using data from elections other than the one being projected—just as we will need to predict the results from 2024 without using data from that year. After recording the accuracy of the resulting predictions, we repeat this cycle 100 times, using a different degree of shrinkage each time. Whichever shrinkage factor proves most accurate is the winner.
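The procedure can be sketched in a few lines of Python with scikit-learn. The data here are randomly generated stand-ins for the real fundamentals, and the grid of 100 candidate shrinkage levels mirrors the cycle described above; this is an illustration of the technique, not our production code.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# One row per election, 1948-2020: random stand-ins for the real
# fundamentals (e.g. net approval, economic growth, a two-term penalty)
X = rng.normal(size=(19, 3))
y = 50 + X @ np.array([2.5, 1.5, -2.0]) + rng.normal(0, 2, size=19)

best_alpha, best_error = None, np.inf
for alpha in np.logspace(-3, 1, 100):       # 100 candidate shrinkage levels
    model = ElasticNet(alpha=alpha, l1_ratio=0.5)
    # Leave-one-out: fit on 18 elections, predict the held-out 19th
    errors = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                              scoring="neg_mean_absolute_error")
    if errors.mean() < best_error:
        best_alpha, best_error = alpha, errors.mean()

print(f"chosen shrinkage: {best_alpha:.3f}; mean abs. error: {best_error:.2f}")
```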
[Chart: How early predictions compare with election day. Predicted Democratic two-party vote share v actual, 1948-2020, for fundamentals-based and poll-based forecasts, both 150 days out and on election day, against the line of correct prediction.]
Using this method, we tested a wide range of combinations of potential predictors to use in a fundamentals-based projection. After applying the optimal amount of shrinkage, the one that did best when forecasting “held-out” elections was a close cousin to Mr Abramowitz’s venerable approach. Rather than granting a benefit to a first-term incumbent, we assigned a penalty to parties that had already been in power for at least two terms (in keeping with the spirit of the “Time for Change” brand). And rather than simply using second-quarter GDP growth, we used a blend of the changes during the past year in a broad range of economic indicators. We found that these economic metrics only seemed to affect voter behaviour when incumbents were running for re-election, suggesting that term-limited presidents do not bequeath their economic legacies to their parties’ heirs apparent. Moreover, the magnitude of this effect has shrunk in recent years because the electorate has become more polarised, meaning that there are fewer “swing voters” whose decisions are influenced by economic conditions.
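As a rough illustration of that interaction, consider the stylised function below. All coefficients are invented; the point is that the economic term switches off when no incumbent is on the ballot.

```python
# Stylised version of our fundamentals equation: a penalty after two or
# more terms in power, and an economic effect that applies only when an
# incumbent seeks re-election. All coefficients are invented.
def fundamentals_prediction(econ_index: float, incumbent_running: bool,
                            terms_in_power: int) -> float:
    """Incumbent party's predicted share of the two-party vote, %."""
    economic_effect = 1.5 * econ_index if incumbent_running else 0.0
    time_for_change = -2.0 if terms_in_power >= 2 else 0.0
    return 50.0 + economic_effect + time_for_change

print(fundamentals_prediction(econ_index=1.0, incumbent_running=True,
                              terms_in_power=1))    # 51.5
print(fundamentals_prediction(econ_index=1.0, incumbent_running=False,
                              terms_in_power=2))    # 48.0: economy ignored
```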
Moral victories are for losers
This analysis has focused exclusively on the national popular vote. However, as supporters of Mr Gore and Mrs Clinton will remember bitterly, getting more votes than your opponent provides no guarantee of occupying the White House. That’s because America elects its president through a unique “electoral-college” system, in which states, not people, do the actual voting. To predict results in the individual states whose electors determine the victor, we repeat the same process as above, with a twist. Instead of seeking to forecast each state’s absolute vote share, we project its “partisan lean”: how much more it favours Democrats or Republicans than America as a whole does, and thus how it would be expected to vote in the event of a nationwide tie. For example, Nevada, which a Republican has not won since 2004, has actually had a slight lean towards Republicans in the past two elections. Mr Biden won it by 2.4 percentage points, a smaller gap than his nationwide margin of victory of 4.5 percentage points.
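In code, the lean calculation is a one-liner, using the 2020 Nevada figures cited above:

```python
# Partisan lean, as defined above: a state's two-party margin minus the
# national margin. Figures are the 2020 Nevada numbers cited in the text.
national_margin = 4.5    # Mr Biden's national margin, percentage points
nevada_margin = 2.4      # Mr Biden's margin in Nevada
lean = nevada_margin - national_margin
print(f"Nevada's partisan lean: {lean:+.1f} points")   # -2.1: leans Republican
```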
To produce central estimates for each state’s partisan lean in each election, we use a state’s partisan lean in the previous two presidential elections; the home states of the presidential candidates and their running mates; its population density; the share of the nationwide electorate that has switched which party it supports for president in recent prior elections; and, crucially, the actual national popular vote in the year in question. Including this final predictor enables us to cast off the assumption of “uniform swing”—the notion that if a candidate gains or loses popularity nationally, that change will be reflected equally in every state—and allow its estimates of the impact of the national political environment on individual states’ preferences to vary. We also model the uncertainty in these predictions, based on estimates of the share of swing voters in the state and how far its central estimate is from 50/50 (vote shares in states that give lopsided margins to either side tend to be less predictable).
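The sketch below shows what dropping uniform swing means in practice: each state gets its own sensitivity, or “elasticity”, to the national environment. Both numbers are invented for illustration.

```python
# Relaxing "uniform swing": a state's vote responds to the national
# environment by its own state-specific amount. Values are invented.
def state_dem_share(national_dem_share: float, lean: float,
                    elasticity: float) -> float:
    """Democratic two-party share in a state, %."""
    return 50 + lean + elasticity * (national_dem_share - 50)

# A state leaning 2.1 points Republican that swings less than the nation
print(state_dem_share(52.0, lean=-2.1, elasticity=0.8))   # 49.5
```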
Back to Bayes-ics
[Chart: Three steps of Bayesian inference. Frequency of hypothetical Democratic vote shares, %, under (1) the “prior” prediction, (2) new data and (3) the “posterior” prediction.]
Readers acquainted with the workings of similar forecasting models may be surprised that horse-race polls from 2024 have not yet entered the equation. This exclusion is by design. Our model follows a logical structure first developed by Thomas Bayes, an 18th-century clergyman whose ideas have shaped a large and growing family of statistical techniques. His approach works in two stages. First, before conducting a study, researchers explicitly state what they believe to be true, and how confident they are in that belief. This is called a “prior”. Next, after acquiring data, they update this prior to reflect the new information—gaining more confidence if it confirms the prior, and generally becoming more uncertain if it refutes the prior (though not if the new numbers are so definitive that they leave little room for doubt). In this framework, the expected distribution of potential vote shares in each state derived above is the prior, and polls that trickle in during the course of the campaign are the new data. The result—a “posterior”, in Bayesian lingo—is our forecast.
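A toy normal-normal update shows the mechanics, with made-up numbers: the posterior sits between the prior and the data, with each weighted by its precision.

```python
# A toy Bayesian update with a normal prior and a normal "poll" likelihood.
# All numbers are invented for illustration.
prior_mean, prior_sd = 52.0, 3.0    # step 1: fundamentals-based prior
poll_mean, poll_sd = 49.0, 2.0      # step 2: new polling data

# Step 3: combine them, weighting each by its precision (1/variance)
post_var = 1 / (1 / prior_sd**2 + 1 / poll_sd**2)
post_mean = post_var * (prior_mean / prior_sd**2 + poll_mean / poll_sd**2)
print(f"posterior: {post_mean:.1f}% +/- {post_var**0.5:.1f}")   # 49.9% +/- 1.7
```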
Just as there is uncertainty in our prior (five months from the election, its 95% confidence interval for Donald Trump’s vote share in North Carolina might stretch from, say, 46% to 58%), so too is there uncertainty in polls. Readers will probably be familiar with the official “margin of error” that pollsters state when reporting their results, typically of a few percentage points. However, this number contemplates only one potential source of error: the risk that a perfectly random sample of a given size may not reflect the characteristics of the population as a whole (known as “sampling error”). In fact, the people who participate in any given survey are virtually never an idealised random sub-set of the population that will actually turn out to vote. Instead, they can differ from the eventual mix in important ways, which collectively are known as “non-sampling error”.
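That textbook margin of error is simple to compute; the figure below is for a hypothetical 1,000-person poll split evenly between the candidates.

```python
import math

# The standard 95% margin of error, which captures sampling error only
p, n = 0.5, 1000                         # reported share and sample size
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"+/-{100 * moe:.1f} percentage points")   # +/-3.1 for 1,000 people
```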
[Chart: North Carolina in a normal and a landslide year for the Republicans. Frequency of simulated Republican shares of the two-party vote, %, and their averages, when Republicans are predicted to win 51% and 55% of the national vote.]
First, polls are subject to the vagaries of voter turnout. Polls conducted among all adults will include the views of people who are ineligible or not registered to vote. Those limited to registered voters treat all respondents in this group as if they had an equal probability of showing up to vote, which they surely do not. And those that seek to filter out respondents unlikely to vote, or that grant more weight to the views of people who are more likely to show up, can get such calculations wrong. Although no two surveys are identical, ones that use a similar approach to predicting turnout are more likely to wind up with errors of a similar size and direction than are ones that handle it differently. In statistical terms, each of these different methods of turnout projection can produce a “bias”, which is likely to contaminate the results of all the pollsters that use it in a similar way.
The same is true of other sources of non-sampling error. The group of people pollsters can reach by using live telephone interviewers may have different voting intentions than those they can reach by automated phone calls, or via the internet. Individual pollsters may make methodological choices, such as weighting schemes, that consistently lead to more or less favourable results for a particular political party.
Ahead of time, it is impossible to know the direction or size of the bias that each of these characteristics may introduce. However, as the campaign goes on, different pollsters using different methods will wind up conducting surveys of the same place at similar times. By comparing the results of, say, all-adult versus likely-voter polls of Iowa taken in mid-May, and then comparing the results of all-adult versus likely-voter polls of Florida taken in early August, and repeating this process for all possible permutations of method, geography and time, our model estimates the impact of each of these factors on survey outcomes, and adjusts for them.
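One simple way to picture this estimation (a stripped-down stand-in for what the full model does jointly) is a regression of poll results on state fixed effects plus an indicator for each methodological choice, so that polls of the same place at similar times identify each method’s offset. The data below are invented.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented polls: each state surveyed once with and once without a
# likely-voter screen, so the screen's average offset is identifiable.
polls = pd.DataFrame({
    "dem_share":    [51.2, 49.8, 50.5, 48.9, 52.0, 50.7],
    "state":        ["IA", "IA", "FL", "FL", "MN", "MN"],
    "likely_voter": [0, 1, 0, 1, 0, 1],
})

fit = smf.ols("dem_share ~ C(state) + likely_voter", data=polls).fit()
print(fit.params["likely_voter"])   # ~ -1.4: likely-voter polls lean Republican
```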
The final step in our treatment of polls is pooling the information they provide. Battleground states tend to be polled regularly; less competitive ones may be surveyed infrequently or not at all. Even if we lack recent polling in a given state, however, we can make educated guesses about the current state of its residents’ political preferences based on polling from elsewhere.
The simplest form of such information-sharing is just an adjustment for overall national trends. Let’s say that the most recent poll in Minnesota was taken six weeks ago, and gave Democrats a six-point lead at a time when Democrats led national polls by four points. Now suppose that in the intervening six weeks, Republicans have surged nationwide, and now sit on a three-point overall lead. It is highly unlikely that Minnesota voters were immune to this shift. The most probable scenario is that Republicans have gained the same seven percentage points in Minnesota that they gained everywhere else, and thus that they are in fact up by around one point in the state.
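The arithmetic of that example is worth spelling out:

```python
# The Minnesota example above: apply the national swing to a stale poll.
# Positive numbers are Democratic leads; negative are Republican ones.
old_state_lead = +6      # Democratic lead in the six-week-old Minnesota poll
old_national_lead = +4   # Democratic national lead at the time
new_national_lead = -3   # Republican national lead today
swing = new_national_lead - old_national_lead    # -7 points
print(old_state_lead + swing)                    # -1: Republicans up by ~1
```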
However, we can also extend this method to state polls. Some states are quite similar, either because they are neighbours, because they have comparable demography, or both—think of pairings like Minnesota and Wisconsin, or Alabama and Mississippi—and others are quite different (think of Minnesota and Alabama, or Wisconsin and Mississippi). The more two states resemble each other, the better shifts in public opinion in one can predict those in the other. Our model thus allows every state poll to influence its estimate of where voter preferences stand in every other state, by varying amounts. The strength of this effect is determined by several factors: how a state voted in past presidential elections; its racial makeup and level of educational attainment; the median age of its residents; the number of people living within five miles of the average resident; and the share of its voters who are white evangelical Christians. The result is that the model will treat a poll of Wisconsin almost as if it were a poll of Minnesota, sharply updating its estimate of Minnesotans’ views based on data from a neighbouring state. However, such a poll would have little impact on its prediction of the vote in Alabama.
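One way such information-sharing can be implemented (a sketch of the general idea, not our model’s actual weighting) is to standardise each state’s traits and let the weight between two states decay with the distance between their feature vectors:

```python
import numpy as np

# Invented, approximate traits: past Dem vote share, white-evangelical
# share, college-degree share, median age. Purely illustrative.
features = {
    "MN": [0.53, 0.20, 0.37, 38.0],
    "WI": [0.51, 0.22, 0.31, 40.0],
    "AL": [0.37, 0.49, 0.27, 39.5],
}
names = list(features)
X = np.array([features[s] for s in names])
X = (X - X.mean(axis=0)) / X.std(axis=0)    # put every trait on one scale

def weight(a: str, b: str) -> float:
    """Information-sharing weight: decays as states grow less alike."""
    i, j = names.index(a), names.index(b)
    return float(np.exp(-np.linalg.norm(X[i] - X[j])))

print(weight("MN", "WI"))   # roughly five times larger...
print(weight("MN", "AL"))   # ...than the weight for this dissimilar pair
```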
Putting the pieces together
After making all of these adjustments to polls’ reported results, we are ready to use them to update our prior. Our method is an expansion of a technique first published by Drew Linzer, a political scientist, in 2013. It uses a statistical technique called Markov Chain Monte Carlo (MCMC), which explores thousands of different values for each parameter in our model, and evaluates both how well they explain the patterns in the data and how plausible they are given the expectations from our prior. For example, what would the election look like if all online pollsters over-estimated the Republicans’ vote share by five percentage points? How about if all national polls over-estimated Democrats by two? If state polls of Michigan are oscillating by ten percentage points at a time, the model will incorporate more uncertainty in its prediction of the vote there—and in its predictions of the vote in similar states, such as Ohio.
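To give a flavour of MCMC, here is a toy Metropolis sampler for a single made-up parameter, a hypothetical vote share with a normal prior and a normal likelihood. Real models explore thousands of parameters at once, but the accept-or-reject logic is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_posterior(theta: float) -> float:
    """Invented one-parameter posterior: prior times likelihood, logged."""
    log_prior = -0.5 * ((theta - 52) / 3) ** 2    # prior: 52% +/- 3
    log_lik   = -0.5 * ((theta - 49) / 2) ** 2    # polls: 49% +/- 2
    return log_prior + log_lik

theta, draws = 50.0, []
for _ in range(10_000):
    proposal = theta + rng.normal(0, 1)           # propose a nearby value
    # Accept moves to more plausible values; occasionally accept worse ones
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    draws.append(theta)

print(np.mean(draws[2_000:]))   # ~49.9, matching the analytic update above
```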
For every day that remains until the election, the MCMC process allows state polling averages to drift randomly by a small amount in each of its 10,001 simulations. Each step of this “random walk” can favour either Democrats or Republicans, but is more likely to be in the direction that the “prior” prediction would indicate than in the opposite one. These steps are correlated, so that a shift towards one candidate in a given state is likely to be mirrored by similar shifts in similar states. As the election draws near, there are fewer days left for this random drift to accumulate, reducing both the range of uncertainty surrounding the current polling average and the influence of the prior on the final forecast. In states that are heavily polled late in the race, the model will pay little attention to its prior forecast; conversely, it will emphasise the prior more early in the race or in thinly polled states (particularly ones for which it cannot make reliable assumptions based on polls of similar states).
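A stripped-down version of the random walk looks like this; the scale and drift are invented, and the real model correlates steps across states rather than simulating a single national series:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims, days_left = 10_001, 150
today_average = 50.6    # invented Democratic share of national polls, %
daily_drift = 0.002     # gentle pull in the direction the prior implies

# Each simulation accumulates small, random daily steps to election day
steps = rng.normal(daily_drift, 0.15, size=(n_sims, days_left))
final_shares = today_average + steps.sum(axis=1)

# Fewer remaining days would mean fewer steps, and thus less spread
print(final_shares.std())           # uncertainty from drift alone
print((final_shares > 50).mean())   # fraction of simulations Democrats win
```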
The ultimate result is a list of 10,001 hypothetical paths that the election could take. Some of them involve large nationwide, regional, or demographic polling errors benefiting one party or another. Some will show registered-voter polls suffering from a large bias in one direction; others little difference between types of survey populations or polling methods. The more likely a scenario, the more often it will appear in these simulations—but even highly improbable ones (such as Mr Biden winning the electoral college despite losing the popular vote) will show up every so often. The resulting probabilities of victory are simply the fraction of these simulations that a given candidate wins.
[Chart: Probability of specific electoral-college outcomes, at June 7th 2024. Probability, %, by number of Democratic electoral votes, with 270 needed to win; outcomes below that mark are Trump wins, above it Biden wins.]
Like all models, our forecast relies on the assumption that the historical relationships that have governed voter behaviour and pollster accuracy in the past will continue into the future. In politics, unlike physics, this is not guaranteed. Sooner or later, voters will do something that past precedents implied was exceedingly unlikely, and models like ours will be subjected to a fresh round of criticism. But as long as such “black-swan” events happen roughly as much as we expect them to—neither too often nor too infrequently—our model will be doing its job. And if our stated probabilities do wind up diverging from the results, we welcome the opportunity to learn from our mistakes and do better next time.■