Well, the tallies from the 2016 Presidential Election are now finalized and the results certified. So, it’s official. Despite losing the popular vote to Hillary Clinton by approximately 2.9 million votes, Donald Trump will become the 45th President of the United States after securing 304 votes in the Electoral College.
I don’t think this outcome was ever seriously in doubt since the election on November 8th. So, let’s move on to the other reason why it matters that all the votes have now been counted up and certified.
For pollsters and numbers geeks such as myself, this is exciting because it means we can now finally go ahead and figure out, once and for all, just how badly election polls missed the mark this past year. Clearly, the polls underestimated support for Trump, but to what extent and where precisely?
Let’s dive in and take a look.
In a recent article in the Huffington Post, senior polling editor Natalie Jackson concluded – after comparing official election outcomes in each state to pre-election projections from major forecasters, such as FiveThirtyEight, HuffPost Pollster, and RealClearPolitics – that polls underestimated support for Donald Trump in approximately 33-37 states.
Ouch! So, the polls weren’t just randomly off this past election cycle. They were systematically off. And across a wide and diverse range of states.
Below is a chart showing the Huffington Post’s major findings.
Jackson describes the polling miss from this past year as follows:
The chart above illustrates just how lopsided the aggregate estimates were. The vertical black line at 0 represents the actual vote margin. Anything to the right of that line indicates that the aggregate polls missed in a way that underestimated Trump’s vote. Anything to the left of the line indicates that they underestimated Clinton.
In the few states where there seems to be no bar ― Virginia and Colorado, for example ― aggregates were very close to the actual vote.
The graph clearly shows there were far more Trump underestimates. But it also shows that the president-elect was underestimated by more than Clinton [emphasis added]. HuffPost Pollster underestimated Clinton most in California by about 6 points; FiveThirtyEight and Daily Kos underestimated her most in Hawaii by 8-9 points. Contrast that with the other end of the chart: HuffPost Pollster and Daily Kos underestimated Trump by more than 10 points in seven states, and FiveThirtyEight did the same in nine states.
Jackson’s analysis got me to thinking about how well my own model forecast the popular vote in each state, so I set out to perform a similar comparison. Perhaps unsurprisingly, the results of my own post-mortem are pretty similar to the findings reported above.
Errors in my popular vote projections in each state were also systematically off and lopsided in favor of Trump. Trump was underestimated more frequently than Clinton, and he was underestimated to a greater degree than Clinton (see below, though note that the values for “Discrepancy” on the x-axis are reversed relative to the Huffington Post chart).
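The comparison behind these charts is straightforward: subtract each state’s projected margin from its certified margin, and the sign of the difference tells you which candidate the polls shortchanged. Here’s a minimal sketch of that calculation; the state margins below are made-up placeholder numbers, not my actual projections or the certified results.

```python
# Hypothetical illustration of the discrepancy calculation described above.
# Margins are Trump minus Clinton, in percentage points; values are invented.
projections = {"OH": 1.9, "WI": -5.3, "NV": 1.2}   # pre-election projected margins
results     = {"OH": 8.1, "WI": 0.8,  "NV": -2.4}  # certified vote margins

# Discrepancy = actual margin minus projected margin.
# Positive -> polls underestimated Trump; negative -> underestimated Clinton.
discrepancy = {state: results[state] - projections[state] for state in results}

trump_underestimated   = [s for s, d in discrepancy.items() if d > 0]
clinton_underestimated = [s for s, d in discrepancy.items() if d < 0]
```

Tallying the lengths of those two lists across all fifty states (plus D.C.) is all it takes to reproduce the kind of underestimate counts reported above.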
Similar to the findings reported in the Huffington Post, my model underestimated Clinton’s performance in only eleven states, including the District of Columbia.
Meanwhile, the model underestimated Trump’s performance in a whopping 40 states. And for ten of these states, I underestimated the margin between Trump and Clinton by 10 points or more (so mostly on par with the other models and polling aggregator sites mentioned above).
For so-called battleground states, the situation was similar but errors were generally smaller.
As shown in the graphic above, my model’s projections underestimated Clinton’s performance in only a single battleground state, namely Nevada. For Trump, on the other hand, projections underestimated his performance in fourteen states, with the largest errors occurring in Missouri, Iowa, and Ohio. For Arizona and Colorado, the model’s estimates were relatively close, missing the mark by about 1.4 percentage points or less.
Now, none of this is to say that my model – or any other major forecasting model – incorrectly predicted who would eventually win in these states where poll numbers were off. In fact, most forecasters correctly predicted the winner in most states. And my model correctly predicted the winner in nine of fourteen battleground states, offering up incorrect predictions only for Michigan, Ohio, Pennsylvania, North Carolina, and Wisconsin.*
However, an error is an error, regardless of whether it changes one’s prediction about who will win in a particular state. And those in the polling and forecasting business shouldn’t fool themselves into thinking an error is any less important or meaningful simply because it happens to be in the right direction. If public polling research is to be improved, then we need to account for and explain all sources of error, not only those that change the outcome in a given state.
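The distinction being drawn here is easy to make precise: a margin error only flips the predicted winner when it changes the sign of the margin. A large miss can leave the call intact, and a small miss can flip it. A minimal sketch, using invented margins rather than any real state’s numbers:

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

# Hypothetical margins (Trump minus Clinton, percentage points).
projected, actual = 1.0, 9.0

error = actual - projected                       # an 8-point miss...
correct_call = sign(projected) == sign(actual)   # ...yet the winner call holds
```

The point of the passage above is that `correct_call` being true shouldn’t make an 8-point `error` any less worth explaining.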
And on that note, it’s worth yet again highlighting the conclusions of Natalie Jackson over at Huffington Post:
It’s noteworthy ― but not unusual ― that the largest poll misses were actually in states where polls accurately predicted the winner [emphasis added]. So while polls had the right candidate, they didn’t predict what a landslide the results would be in California and Hawaii on the Democratic side and West Virginia and Tennessee on the Republican side. This is a pattern we’ve known about for many election cycles. Due to undecideds and “other” categories, polls almost always underestimate the winner’s vote share when it’s an overwhelming win.
That tells us something interesting about polling imprecision: We ignore errors when they’re in the right direction, even when they’re really big. That has to stop if polls are to regain any of the credibility they lost this year [emphasis added]. We can learn just as much by studying how polls underestimated Clinton by 8 points in California as we can from studying Trump’s 6-point underestimate in Wisconsin.
Here’s hoping pollsters heed this recommendation and learn from 2016 before gearing up for the next round of elections.
*Note: I’ve ignored Maine here because Clinton and Trump split its electoral votes 3 to 1. Since my model did not use congressional district-level polls, it did not allow for electoral votes to be split between candidates in states where this is permitted.
Brian Kurilla is a psychological scientist with a Ph.D. in cognitive psychology. You can follow Brian on Twitter @briankurilla