The data doesn’t support the obsession with presidential prognostications that estimate the probability of winning an election

**COMMENT | JUSTIN GRIMMER |** Even as Joe Biden’s presidential candidacy teetered and polls showed him clearly losing to Donald Trump, the election forecasting site 538 was still estimating that Biden was likeliest to win. It was a conclusion based on odd modeling assumptions that led the site’s original founder, Nate Silver, to declare the 538 model “very obviously broken” and the site’s new chief to acknowledge an adjustment to its model when it relaunched with Kamala Harris’ candidacy.

The episode is notable not just for the skirmishing between rival forecasters — but because it revealed how little value should be placed in these projections at all.

I’m a political scientist who develops and applies machine learning methods, like forecasts, to political problems. The truth is we don’t have nearly enough data to know whether these models are any good at making presidential prognostications. And the data we do have suggests these models may have real-world negative consequences in terms of driving down turnout.

Statistical models that aggregate polling data and use it to estimate the probability of each candidate winning an election have become extremely popular in recent years. Proponents claim they provide an unbiased projection of what will happen in November and serve as antidotes to the ad hoc predictions of talking-head political pundits. And of course, we all want to know who is going to win.

But the reality is there’s far less precision and far more punditry than forecasters admit.

Election forecasts have a long history in political science, but they entered the political mainstream because of Silver’s accurate predictions in the 2008 and 2012 elections. Now, many news outlets offer probabilistic forecasts and use those models to declare that candidates have an expected number of Electoral College votes and probability of winning the election. See ABC News’ 538, The Economist and Silver Bulletin, among others.

Are these calculated probabilities any good? Right now, we simply don’t know. In a new paper I’ve co-authored with the University of Pennsylvania’s Dean Knox and Dartmouth College’s Sean Westwood, we show that even under assumptions very favorable to forecasters, we wouldn’t know the answer for decades, centuries, or maybe even millennia.

To see why, consider one way to evaluate the forecasts: calibration. A forecast is considered calibrated if the estimated probability of an event happening corresponds to how often the event actually happens. So, if a model gives Harris a 59 percent chance of winning, then for the model to be calibrated, candidates given a 59 percent chance should win about 59 out of every 100 such elections.
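A minimal sketch of a calibration check, assuming we had a long record of forecasts and outcomes (which, for presidential elections, we do not). The function name `calibration_table` and the data are hypothetical and only illustrate the definition above.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into probability bins and compare each bin's
    mean forecast with how often the event actually happened."""
    bins = defaultdict(list)
    for p, won in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, won))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(won for _, won in pairs) / len(pairs)
        table.append((mean_forecast, observed, len(pairs)))
    return table

# Hypothetical record: 100 races each forecast at 0.59, of which 59 were won.
forecasts = [0.59] * 100
outcomes = [1] * 59 + [0] * 41
table = calibration_table(forecasts, outcomes)
for mean_forecast, observed, count in table:
    print(f"forecast {mean_forecast:.2f}  observed {observed:.2f}  n={count}")
```

A calibrated forecaster's rows show forecast and observed values that roughly match. The catch is the `n` column: each bin needs many elections before the comparison means anything.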

In our paper, we show that even under best-case scenarios, determining whether one forecast is better calibrated than another can take 28 to 2,588 years. Focusing on accuracy — whether the candidate the model predicted to win actually wins — doesn’t lower the needed time either. Even focusing on state-level results doesn’t help much, because the results are highly correlated. Again, under best-case settings, determining whether one model is better than another at the state level can take at least 56 years — and in some cases would take more than 4,000 years’ worth of elections.

The reason it takes so long to evaluate forecasts of presidential elections is obvious: There is only one presidential election every four years. In fact, we are now having only our 60th presidential election in U.S. history.
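To make the sample-size problem concrete, here is a toy simulation. It is not the method used in our paper, and every number in it is an assumption chosen for illustration: one forecaster reports the true win probability, a second overstates it by 10 points, and we ask how often the better forecaster actually earns the better Brier score (mean squared error of the probabilities) over a given number of elections.

```python
import random

def brier(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)

def share_better_model_wins(n_elections, n_sims=2000, bias=0.10, seed=0):
    """Fraction of simulated histories in which the truly better forecaster
    has the lower Brier score. All settings are illustrative assumptions."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # Assumed true win probabilities for each election.
        true_p = [rng.uniform(0.3, 0.7) for _ in range(n_elections)]
        outcomes = [1 if rng.random() < p else 0 for p in true_p]
        good = true_p                          # reports the truth
        biased = [p + bias for p in true_p]    # overstates by `bias`
        if brier(good, outcomes) < brier(biased, outcomes):
            wins += 1
    return wins / n_sims

rate_60_years = share_better_model_wins(15)      # 15 elections
rate_1000_years = share_better_model_wins(250)   # 250 elections
print(rate_60_years, rate_1000_years)
```

In this toy setup the better forecaster frequently looks worse over 15 elections, and the ranking becomes reliable only with far more elections than any of us will live to see.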

Compare the information available when forecasting presidential elections to the amount of information used when predicting stock prices, forecasting the weather or targeting online advertising. In those settings, forecasters commonly use millions of observations, which might be collected almost continuously. Given the difference, it isn’t surprising that forecasters in other settings are more easily able to identify the best performing model.

The paucity of outcome data means that election forecasters have to make educated guesses about how to build their statistical models.

Consider how forecasters use polling information: They often calculate a moving average of polling results. To make this average, forecasters assign different weights to polling firms, make assumptions about the kinds of polling errors that are likely to occur and even how those errors are correlated across states. Or consider how forecasters use “fundamentals” — factors like the state of the economy, the party currently in the White House or the president’s approval rating. Forecasters have to decide what factors to include in their model and which prior presidential elections are relevant for fitting their model.
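The polling-average step can be sketched as follows. This is a simplified illustration, not any forecaster's actual model: the `Poll` fields, the pollster weights and the recency half-life are all hypothetical choices of the kind a forecaster has to make.

```python
from dataclasses import dataclass

@dataclass
class Poll:
    days_ago: int            # age of the poll in days
    candidate_share: float   # candidate's share, in percent
    pollster_weight: float   # hypothetical quality weight set by the forecaster

def weighted_poll_average(polls, half_life_days=14.0):
    """Average poll results, weighting by pollster quality and recency.
    A poll's recency weight halves every `half_life_days` (an assumption)."""
    numerator = 0.0
    denominator = 0.0
    for poll in polls:
        recency = 0.5 ** (poll.days_ago / half_life_days)
        weight = poll.pollster_weight * recency
        numerator += weight * poll.candidate_share
        denominator += weight
    if denominator == 0:
        raise ValueError("no polls with positive weight")
    return numerator / denominator

# Hypothetical polls: one from today at 50 percent, one from two weeks ago at 46.
polls = [Poll(0, 50.0, 1.0), Poll(14, 46.0, 1.0)]
avg_slow_decay = weighted_poll_average(polls, half_life_days=14.0)
avg_fast_decay = weighted_poll_average(polls, half_life_days=7.0)
print(round(avg_slow_decay, 2), round(avg_fast_decay, 2))
```

The same two polls produce different averages depending only on the half-life the forecaster picks, and there is no outcome data to say which setting is right.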

Because of the lack of outcome data, each of these assumptions is made based on what forecasters find plausible, whether based on history or on what produces seemingly useful predictions for this election. Either way, these are choices made by the forecasters.

Statistical models do offer the chance for forecasters to be clear about these assumptions, whereas pundits’ assumptions are often unstated or difficult to determine. But without data to evaluate how the assumptions affect calibration or accuracy, the public simply does not know whether the modeling decisions of one forecaster are better than those of another.

While we lack evidence that probabilistic forecasts are accurate, there is real evidence that they can create confusion and potentially deter voters from coming to the polls.

A large-scale survey experiment conducted by Westwood, New York University’s Solomon Messing and the University of Pennsylvania’s Yphtach Lelkes shows that forecasts are deeply confusing to Americans — causing them to mix up a candidate’s probability of winning with that candidate’s expected vote share.

In their experiment, they found that sometimes when people see a model forecast (say, a 58 percent chance of victory, or a 58 in 100 chance) they erroneously think that this means that a candidate will win 58 percent of the vote. Indeed, they write, “More than a third of people estimate a candidate’s likelihood of winning to be identical to her vote share, and on average people estimate that likelihood to be closer to the vote share than the probability of winning after they see both types of projections.”
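The gap between the two quantities can be illustrated with a deliberately simplified calculation. It assumes a single two-candidate popular-vote contest (ignoring the Electoral College) and a normally distributed polling error; the function name and the 3-point error are assumptions for illustration.

```python
from statistics import NormalDist

def win_probability(expected_share, error_sd):
    """Chance that the final vote share exceeds 50 percent, assuming the
    error around `expected_share` is normal with standard deviation `error_sd`."""
    return 1.0 - NormalDist(mu=expected_share, sigma=error_sd).cdf(50.0)

# A candidate expected to get 50.6 percent of the vote, with a 3-point error.
close_race = win_probability(50.6, 3.0)
# A candidate expected to get 58 percent of the vote, with the same error.
landslide = win_probability(58.0, 3.0)
print(round(close_race, 2), round(landslide, 2))
```

Under these assumptions, a roughly 58 percent chance of winning corresponds to an expected vote share of about 50.6 percent, a near tie, while an expected 58 percent vote share would be a near-certain win.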

These election forecasts may also create a false sense of security among some citizens about the odds of their side winning, which ultimately causes them not to vote because they feel it’s not necessary.

In a second experiment, Westwood, Messing and Lelkes examined what information people use when deciding whether to participate in a fictitious election. They found that their participants were very responsive to information when it was provided in terms of probability: a high probability that their side would win made them less likely to cast a ballot. But the same information, provided in terms of vote share, made little difference to their participation.

The bottom line: Probabilistic forecasts are often misinterpreted, and when they are, they may cause voters to stay home.

It’s still possible that these forecasts may end up being the best way to predict the outcome of presidential elections. But right now, we simply do not know if these models are particularly accurate. And we certainly do not know if small fluctuations in the probability of a candidate winning represent anything other than modeling error or meaningless random variation.

*****

Justin Grimmer is the Morris M. Doyle Centennial Professor of Public Policy in the Department of Political Science and Senior Fellow at the Hoover Institution, Stanford University.

Source: Politico