Well, I guess I picked the wrong year to get interested in forecasting presidential elections.
As we all know by now, Republican nominee Donald Trump has been elected the 45th President of the United States, with 306 pledged electoral votes and roughly 46% of the popular vote.
However, my statistical model, which was based on national and state-level polls, suggested Democratic nominee Hillary Clinton would be the likely winner. Going into election day, I estimated she had about a 75% chance of coming out on top.
But fortunately for me (and unfortunately for the entire field of public opinion research), I wasn’t alone in making this mistake. All of the major polls and forecasters got this election wrong, some by much wider margins than others.
Below is a screenshot from The New York Times’ blog The Upshot, showing election predictions from each of the major prediction websites and forecasters. Each prediction reflects an estimate of Hillary Clinton’s chance of winning the Presidency as of the morning of November 8, 2016.
Notably (and perhaps this is a small measure of vindication for me), my model gave Trump greater odds of winning than most of the major sites, with the exception of FiveThirtyEight. On the morning of the election, I pegged Trump’s odds of winning at around 3:1 against. He seemed to need a fairly substantial (though not entirely unprecedented) error in polling to pull off a victory, and that’s exactly what he got.
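Converting a win probability into "X:1 against" odds is simple arithmetic. If Clinton's chance was about 75%, Trump's was about 25% (assuming a two-way race), and the odds against him are (1 − 0.25) / 0.25 = 3, or 3:1 against. Here is a minimal Python sketch of that conversion. The function names are my own, chosen for illustration.

```python
def odds_against(p_win: float) -> float:
    """Return X in 'X:1 against' for an outcome with probability p_win."""
    if not 0.0 < p_win < 1.0:
        raise ValueError("p_win must be strictly between 0 and 1")
    return (1.0 - p_win) / p_win


def prob_from_odds_against(x: float) -> float:
    """Invert odds_against: 'x:1 against' corresponds to probability 1 / (1 + x)."""
    if x <= 0:
        raise ValueError("odds must be positive")
    return 1.0 / (1.0 + x)


clinton = 0.75          # the model's pre-election estimate for Clinton
trump = 1.0 - clinton   # assumes a two-way race: the remaining probability goes to Trump
print(f"Trump: {trump:.0%} chance, or about {odds_against(trump):.0f}:1 against")
```

Running it prints `Trump: 25% chance, or about 3:1 against`.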
So, what went so horribly wrong this election cycle?
The short – and probably overly simplistic – answer to that question seems to be that most public polling firms failed to include in their samples a large swath of likely voters who ultimately broke for Trump on election day. Specifically, pollsters seem to have underestimated Trump’s support among white non-college educated voters. Therefore, although the above models make different assumptions and use different weighting strategies, those technical differences were probably moot this year. The polls upon which the models were based systematically underestimated Trump’s support in key regions of the country. And a model is only as good as the data you put into it. “Garbage in, garbage out,” as they say.
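To make the under-sampling point concrete, here is a minimal Python sketch of post-stratification weighting, a common way pollsters correct for a group that is under-represented in their sample. All numbers below are hypothetical and invented for illustration; none come from an actual 2016 poll. In this toy example, under-sampling the group that favors the candidate pulls the raw estimate down to about 47%, and reweighting to the assumed population shares brings it back to about 49%. The sketch assumes that the respondents within each group vote like that group as a whole. If that assumption fails, for example because the non-college whites who answer polls differ from those who don't, no weighting scheme can recover the right answer. That is the "garbage in, garbage out" problem.

```python
# Hypothetical numbers for illustration only; none come from an actual 2016 poll.

# Share of each group among actual voters (assumed).
population_share = {"non_college_white": 0.45, "everyone_else": 0.55}
# Share of each group among poll respondents: the first group is under-sampled (assumed).
sample_share = {"non_college_white": 0.35, "everyone_else": 0.65}
# Candidate support within each group (assumed identical in sample and population).
support = {"non_college_white": 0.60, "everyone_else": 0.40}


def estimate(shares: dict[str, float], support: dict[str, float]) -> float:
    """Overall support as a share-weighted average of within-group support."""
    return sum(shares[g] * support[g] for g in shares)


def poststrat_weights(sample: dict[str, float], population: dict[str, float]) -> dict[str, float]:
    """Weight per respondent in each group: population share / sample share."""
    return {g: population[g] / sample[g] for g in sample}


raw = estimate(sample_share, support)
weights = poststrat_weights(sample_share, population_share)
# Applying the weights to the sample shares recovers the population shares.
reweighted_shares = {g: sample_share[g] * weights[g] for g in sample_share}
weighted = estimate(reweighted_shares, support)

print(f"unweighted: {raw:.1%}, weighted: {weighted:.1%}")
```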
It will likely take months of poring over data before we finally have a satisfying and definitive answer for what went wrong. In the meantime, I’m working on compiling a list of various websites and news outlets that have so far discussed this year’s surprising polling miss. I’ll continue to add to the list (below) in the days and weeks to come, as more details and answers emerge.
What websites and news agencies are saying about what went wrong
Geoff Garin, a veteran Democratic pollster who worked for the pro-Clinton super PAC Priorities USA, said many surveys had under-sampled non-college-educated whites, a group that Trump appealed to. He also argued there had been an over-emphasis on the belief that the country’s rising demographic diversity would put Clinton over the top.
While the errors were nationwide, they were spread unevenly. The more whites without college degrees were in a state, the more Trump outperformed his FiveThirtyEight polls-only adjusted polling average, suggesting the polls underestimated his support with that group. And the bigger the lead we forecast for Trump, the more he outperformed his polls. In the average state won by Trump, the polls missed by an average of 7.4 percentage points (in either direction); in Clinton states, they missed by an average of 3.7 points. It’s typical for polls to miss in states that aren’t close, though. The most important concentration of polling errors was regional: Polls understated Trump’s margin by 4 points or more in a group of Midwestern states that he was expected to mostly lose but mostly won: Iowa, Ohio, Pennsylvania, Michigan, Wisconsin and Minnesota.
From The Guardian:
The polls were wrong. And because we are obsessed with predicting opinions rather than listening to them, we didn’t see it coming.
From The Atlantic:
The problem with finding accurate and random samples of voters to poll has plagued polling since cell phones came into wide use. Prior to that technological development, the ubiquity of landline telephones made finding reasonably-random and representative samples easy, as pollsters could just pick random names out of phone books, call potential voters, and talk them through interviews, which supplied the kinds of rich context and human understanding necessary for properly analyzing their responses. That method also ensured reasonably high response rates and helped control nonresponse bias, by which the polls themselves become skewed by the kinds of people who tend to answer.
From the Washington Post:
It’s going to take more than a few days, weeks, or even months to sort out, conclusively, the roles sampling and weighting may have played in polling errors. However, one thing is clear: How pollsters determine who is actually going to vote is broken, regardless of the various approaches taken by public pollsters and the campaigns themselves. At the end of the day, tens of thousands, hundreds of thousands, or millions of people tagged as likely to vote didn’t bother to do so; and perhaps some deemed unlikely to get to the polls did vote.
As it has done in the last several elections, AAPOR (the American Association for Public Opinion Research) has already convened a panel of survey research and election polling experts to conduct a post-hoc analysis of the 2016 polls. The goal of this committee is to prepare a report that summarizes the accuracy of 2016 pre-election polling (for both primaries and the general election), reviews variation by different methodologies, and identifies differences from prior election years.
Brian Kurilla is a psychological scientist with a Ph.D. in cognitive psychology. You can follow Brian on Twitter @briankurilla