The Problem isn’t that Polling is Unscientific. It’s that Democracy Itself is Unscientific

In the wake of Donald Trump’s surprising victory in the 2016 U.S. Presidential Election, there’s been a lot of discussion in the news about what might have gone wrong this year with public opinion polling, which clearly did not foresee this outcome.

Most polls showed Democratic nominee Hillary Clinton holding onto a slim but steady lead for most of the election cycle. During the week prior to the election, the Real Clear Politics polling average showed Clinton with roughly a 3-point lead in the national polls. Furthermore, on the morning of the election, most major forecasting sites pegged Clinton’s chances of winning at 85% or higher. (My forecast and that of FiveThirtyEight were two exceptions; we each gave Trump somewhat better chances, at around 25-30%.)

But a Clinton victory was not to be. For whatever reason (and we will likely not have a definitive and satisfying answer to this for months), polls underestimated Trump’s support in several key regions of the country. Specifically, Clinton lost in Pennsylvania, Michigan, and Wisconsin, despite the fact that polling averages showed her up in these states by 1.9 points, 3.4 points, and 6.5 points, respectively. Clinton does seem poised to win the popular vote, if that’s any consolation to her supporters. But she’ll become the 5th presidential candidate in history to lose the Electoral College despite winning the popular vote.

Although this was a surprising outcome to most, it was not the first major polling blunder to occur on the world’s stage. Nor was it even the first to occur this year. Most major polls also failed to foresee the outcome of the now-infamous U.K. Brexit vote from earlier this summer.

Democracy is a Funny (and Frustrating) Thing Because It’s Not Science

No doubt, both these surprises will bring about important changes to the way large-scale public opinion research is conducted. After all, precise and accurate pre-election polling is possible, but it’s difficult and expensive. And it’s made all the more difficult by the fact that not everyone votes come Election Day. Of course, polling firms know this full well. That’s why top-quality pollsters attempt to screen out respondents who are unlikely to vote and focus instead only on the opinions of those who are believed to be likely voters.

Yet, there’s an obvious problem with this approach. Nobody ever really knows for sure who is going to be a “likely voter.” And while a pollster can make an educated (and perhaps mostly accurate) guess based on screening questions, this still amounts to estimation, which will be subject to a certain margin of error.
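To put a number on just the textbook part of that error, here’s a quick back-of-the-envelope calculation in Python, using the standard formula for sampling error in a simple random sample. Note that this captures only random sampling error; any misjudgment about who actually counts as a likely voter adds error on top of this that the formula doesn’t see.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 respondents showing a candidate at 48% support:
print(f"+/- {margin_of_error(0.48, 1000):.1%}")  # roughly +/- 3.1 points
```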

To a scientist like me, elections can be rather frustrating because of the “unscientific” way in which they are conducted. In science, researchers work painstakingly to ensure that samples are representative of broader populations, and for good reason. If a sample of research participants is not representative of the population from which it was drawn, then you’ll most likely be unable to generalize your experimental results beyond the artificial laboratory setting. And if that’s the case, your results are most likely useless.

Pollsters also generally work hard to ensure that their survey samples are representative of state and national populations. But when you then have to go through the extra step of figuring out just how much to skew your sample of respondents to make it representative of the fraction of the electorate that will eventually turn out to vote, this adds yet another considerable source of error to estimates of voter preferences. In fact, there’s even some evidence that screening for likely voters occasionally leads to systematic biases in poll results.

Meanwhile, in a democratic election, no attempt at all is made to ensure that the sample of eligible voters who cast ballots on Election Day is actually representative of the broader electorate. Preferences for elected officials are not determined by a poll of random voters. Rather, elections are determined by a self-selected sample of voters – individuals who have (mostly) made the decision for themselves about whether or not to vote (setting aside, for the moment, strategies to drive down voter turnout, such as voter intimidation and voter ID laws). This is important to point out because when a sample is self-selected rather than purely random, it may or may not be representative of the broader population from which it was drawn.
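To see why self-selection matters, here’s a small simulation with invented numbers (the 50/50 split and the turnout rates are hypothetical, not real data): even a modest gap in turnout between two candidates’ supporters noticeably distorts the result relative to the electorate’s true preferences.

```python
import random

random.seed(42)

# Hypothetical electorate of 1,000,000 eligible voters, split exactly 50/50
# between candidates A and B, but with B's supporters likelier to turn out.
N = 1_000_000
TURNOUT = {"A": 0.55, "B": 0.65}  # invented turnout rates, for illustration

electorate = ["A"] * (N // 2) + ["B"] * (N // 2)
ballots = [v for v in electorate if random.random() < TURNOUT[v]]

share_b = ballots.count("B") / len(ballots)
print(f"B's true support: 50.0%; B's share of ballots cast: {share_b:.1%}")
# B wins roughly 54% of ballots cast, despite an evenly split electorate.
```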

So, although we should be confident that pollsters will refine their methods and develop new safeguards to minimize (though probably not entirely eliminate) future polling errors, I can’t help but think that there might be a simpler way to boost the reliability and accuracy of pre-election polling. And that is, quite simply, for more people to vote in elections. If more people simply decided to vote, this would ease the burden on pollsters of having to screen for and predict ahead of time who will be a “likely voter.”

But perhaps this is overly idealistic of me.

Indeed, according to preliminary estimates available at this time, 2016 is on track to have the lowest turnout in a U.S. Presidential Election since 1996. If that’s still true once election officials have finished tabulating all the votes, it will mean only about 55% of eligible voters cast ballots during the 2016 Presidential Election. In the end, Donald Trump may have been voted into the most powerful and influential office in the world by a mere 27% of eligible American voters.

So, Would the Outcome of the Election Have Been Any Different If More People Had Voted This Year?

No, I don’t think so. None of what I’ve written here so far is meant to suggest that Clinton lost the election merely because of low voter turnout.

In fact, if turnout had been uniformly higher across the board, Clinton likely still would have lost, and possibly by an even wider margin. This is the conclusion I reached after analyzing CNN exit poll data from 28 states.

The question I asked was fairly simple: What if the sample of people who decided to cast ballots in the 2016 Presidential Election actually resembled the broader American electorate? For instance, what if the percentage of men who voted in Pennsylvania was exactly the same as the percentage of men who are eligible to vote in Pennsylvania? And what if the percentage of college graduates who voted in North Carolina was exactly the same as the percentage of college graduates who are eligible to vote in North Carolina?

The specific demographic splits I chose to focus on were the following: male vs. female voters, younger voters (ages 18-44) vs. older voters (ages 45 and older), white vs. nonwhite voters, and college-educated vs. non-college-educated voters.
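In essence, the projection boils down to reweighting each group’s exit poll vote shares by the group’s share of the eligible population rather than its share of ballots cast. Here’s a minimal Python sketch of that arithmetic for a single demographic split; the numbers are illustrative, not actual exit poll figures, and the full analysis applied this across the four splits listed above.

```python
def reweighted_shares(groups):
    """Project two-way vote shares as if each demographic group's share of
    ballots cast matched its share of the eligible population.

    groups: list of (population_share, clinton_share, trump_share) tuples,
    where population shares sum to 1 and vote shares come from exit polls.
    """
    clinton = sum(pop * c for pop, c, _ in groups)
    trump = sum(pop * t for pop, _, t in groups)
    return clinton, trump

# Illustrative inputs for a hypothetical state (not real exit poll data):
groups = [
    (0.70, 0.42, 0.52),  # white: 70% of eligible population; Clinton 42%, Trump 52%
    (0.30, 0.74, 0.21),  # nonwhite: 30% of eligible population; Clinton 74%, Trump 21%
]
clinton, trump = reweighted_shares(groups)
print(f"Projected vote: Clinton {clinton:.1%}, Trump {trump:.1%}")
# Projected vote: Clinton 51.6%, Trump 42.7%
```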

The graphic below shows the projected outcome in each of the 28 states polled by CNN, after weighting the vote for each demographic listed above according to state population data.

[Figure: Projected election results in each of the 28 states after demographic weighting]

As you can see, even if voter turnout had been high, and even if the sample of voters who cast ballots had been demographically similar to the broader electorate, Clinton still likely would have lost the election (though I can’t say so definitively, because my analysis covered only 28 states).

Not only would she have lost the same key battleground states she actually lost, including Florida, Michigan, North Carolina, Pennsylvania, and Wisconsin, but she also would have lost Minnesota and New Hampshire, two states she went on to win by about 1-2 percentage points each.

In fact, as shown in the graphic below, Clinton would have performed worse in every state except South Carolina.

[Figure: Projected shift in Clinton’s margin in each state after demographic weighting]

In retrospect, this all makes sense, considering that, in general, Donald Trump won among majority groups: white voters, who comprise roughly 63-73% of the national population (depending on how Hispanic identity is counted), and voters without a college degree, who comprise about 70% of the national population.

Consider, as an example, the case of reliably Democratic California. According to CNN’s exit polls, nonwhite voters made up about 52% of the vote in California, even though only about 33% of the state’s population is nonwhite. Nonwhite voters were therefore over-represented among the sample of people who cast ballots and, as a result, carried a disproportionate influence. Once you adjust the sample of voters to make it consistent with the broader state population, that disproportionate influence becomes a proportionate one, shrinking Clinton’s victory in the Golden State from 29 percentage points (62% vs. 33%) to 26 percentage points (60% vs. 34%). That’s a relatively small shift, but a shift of that size is clearly enough to flip at least two states from blue to red.
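One standard way to express this kind of adjustment is post-stratification weighting: each ballot is weighted by the ratio of its group’s population share to that group’s share of ballots cast. Using the California figures cited above (and simplifying to the race split alone), the weights look like this:

```python
# Each ballot is weighted by (population share / share of ballots cast), so
# over-represented groups count less and under-represented groups count more.
for group, pop_share, ballot_share in [("nonwhite", 0.33, 0.52),
                                       ("white", 0.67, 0.48)]:
    print(f"{group}: weight = {pop_share / ballot_share:.2f}")
# nonwhite: weight = 0.63  (each ballot counts about a third less)
# white:    weight = 1.40  (each ballot counts about 40% more)
```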

So, What Should We Take Away From All This?

Between the U.S. Presidential Election and the U.K.’s Brexit vote, pre-election polling clearly suffered some major setbacks this year. But despite these high-profile misses, I would hesitate to conclude that the entire field has suddenly been thrown into crisis. Pre-election polls will be analyzed and methodologies will be re-evaluated, which will more than likely lead to important refinements and changes to minimize such misses in the future. The American Association for Public Opinion Research has already convened a panel of experts to carry out an analysis of the 2016 polls, as described in a recent press release:

Pre-election polling is critical to the industry. Such polling can support the democratic process and it offers a very public opportunity to showcase the benefits, and weaknesses, of survey research. Therefore, understanding and being able to articulate the overall outcomes of election polling, the changing methodologies being used, and the potential for variation in the accuracy of polls is vital for the industry.

As it has done in the last several elections, AAPOR has already convened a panel of survey research and election polling experts to conduct a post-hoc analysis of the 2016 polls. The goal of this committee is to prepare a report that summarizes the accuracy of 2016 pre-election polling (for both primaries and the general election), reviews variation by different methodologies, and identifies differences from prior election years.

But, as I stated above, public opinion polling is extremely difficult, as there are always innumerable sources of error in any poll that can never fully be eradicated. And the challenges facing pollsters and forecasters are only compounded by the fact that not everyone votes come Election Day.

So, as I see it, the simplest way to increase the reliability and validity of pre-election polling is for eligible voters to pull their weight and get out and vote. Doing so will ease the burden on pollsters of having to predict not only which candidate(s) will be elected, but also what fraction of the electorate will actually head to the polls to cast a ballot.

And even though higher turnout might not have changed the outcome of this year’s Presidential Election, it might nevertheless have helped pollsters and forecasters to see what was coming.

Data Sources:

  1. Data pertaining to Age & Gender in each state from:

Annual Estimates of the Resident Population for Selected Age Groups by Sex for the United States, States, Counties and Puerto Rico Commonwealth and Municipios: April 1, 2010 to July 1, 2015. U.S. Census Bureau, Population Division. Release Date: June 2016

  2. Data pertaining to Race and Educational Attainment in each state from:

U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates

Author

Brian Kurilla is a psychological scientist with a Ph.D. in cognitive psychology. You can follow Brian on Twitter @briankurilla 
