Hey gang! It’s been a while, so let’s just jump in, shall we?
Every now and again (every four years, apparently) the statistical monster in me lashes out, and I start thinking obsessively about polls. Full disclosure: while I’m a bleeding heart liberal through and through, I’m mostly motivated by how well polls and other forecasting approaches are able to predict the future; there’s nothing partisan about my approach (though perhaps something partisan in how I react viscerally to it). I should also warn you ahead of time that there’s a fair amount of weeds-adjacent wonkiness, so if you simply want to click on the picture above and play with the sliders, I wouldn’t blame you. That is the nature of the “technical” tag.
I’ve created an awesome new election widget at:
You can see a snapshot of it at the top of the page. It’s dynamically connected to a database which I keep synced with polling averages from the Huffington Post Pollster, a weighted estimate of statewide and national polling. I like Pollster because it weights by sample size and has a smooth window for inclusion.
The widget allows you to make whatever assumptions you like about the systematic biases in polling in general, or about the amount of polling error, and the probabilities for each state adjust automatically. States in which both candidates have more than a 10% probability of winning are considered “tossups” (admittedly, that part is a bit arbitrary). A scrollover on the map will give you some useful information like electoral votes, the 2012 outcome, and what the current statewide polling says.
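For the curious, here is a minimal sketch of how a polled margin might be turned into a win probability and a “tossup” label. It assumes Gaussian polling errors; the function names, the sign convention (positive margin means the candidate is ahead) and the default 3-point error are my own illustrative choices, not the widget’s actual code.

```python
from math import erf, sqrt

def win_probability(margin, bias=0.0, error=3.0):
    """Chance that a candidate wins a state, assuming Gaussian polling errors.

    margin: polled lead in percentage points (positive = candidate ahead)
    bias:   assumed systematic shift in points (hypothetical "bias slider")
    error:  assumed standard deviation of the polling error, in points
    """
    z = (margin + bias) / error
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def classify(margin, bias=0.0, error=3.0, threshold=0.10):
    """Label a state a tossup if BOTH candidates exceed the threshold."""
    p = win_probability(margin, bias, error)
    if min(p, 1.0 - p) > threshold:
        return "tossup"
    return "lead" if p > 0.5 else "trail"

# A 3-point lead with a 3-point error is a 1-sigma lead: roughly 0.84.
print(round(win_probability(3.0, error=3.0), 2))
print(classify(3.0), classify(10.0))
```

With these assumed defaults, a 3-point lead still counts as a tossup, while a 10-point lead does not.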
Knowing nothing else, I assume that 2016 will bear at least some resemblance to 2012. I therefore use as my starting point that state margins will be the same as in 2012, but corrected for whatever the national trend is. That is, if Clinton leads Trump by 7% in national polls, but Obama only led Romney at this point in time by 3%, then each state gets a 4% bump in the Democratic direction. Only a few states have polling at this time. For the moment I’m using the Real Clear Politics averages, but will move to Pollster once they start posting aggregates after the conventions. Even where state polls exist, I’m only giving them a 50% weight (also admittedly arbitrary) and averaging them with the national-trend estimate. Many polls still have small sample sizes, and results can vary wildly later in the cycle.
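The arithmetic in the paragraph above can be sketched in a few lines. The function and parameter names are hypothetical, margins are Democratic minus Republican in percentage points, and the 2-point state in the example is made up for illustration.

```python
def baseline_margin(margin_2012, national_now, national_then):
    """Shift a state's 2012 margin by the change in the national margin."""
    return margin_2012 + (national_now - national_then)

def blended_margin(baseline, state_poll=None, poll_weight=0.5):
    """Average the baseline with a state poll, if one exists.

    poll_weight=0.5 mirrors the (admittedly arbitrary) 50% weight.
    """
    if state_poll is None:
        return baseline
    return poll_weight * state_poll + (1.0 - poll_weight) * baseline

# Clinton +7 nationally now versus Obama +3 then: a 4-point bump.
base = baseline_margin(2.0, national_now=7.0, national_then=3.0)
print(base)                                    # 6.0
print(blended_margin(base, state_poll=10.0))   # 8.0
```

A state with no polling simply keeps its shifted 2012 margin.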
Some might argue that this entire project is a little premature as neither party is actually done selecting their candidate. I would disagree strongly on the Democratic side (where Sanders has run a great campaign, but would be required to win many states with large margins that he currently trails by double digits in the polls). Though the Republican side is a little less settled, Trump is currently the strong favorite, and the one whose candidacy is keeping me up nights. If I turn out to be wrong in either case, it’s easy enough to swap in two new candidates. Three, on the other hand, would be pretty unpredictable.
Random Errors and Outliers
How good are the polls?
Short answer: Collectively, they’re pretty damn good.
While my widget allows for setting random measurement errors or a systematic bias, it’s not self-evident how those numbers should be set or even if polling errors are normally distributed. If polls followed a normal or Gaussian distribution, we’d expect that about 68% of the time, the true state outcome would be within 1σ (one standard deviation) of the polled value, and 95% of the time, within 2σ.
To investigate, I looked at the results of the 2008 and 2012 presidential elections (data here). I didn’t compare every state. In 2012, Romney won Utah by 48 points, and Obama won Washington DC by 83! There would have been little use in extensively polling either, since the outcomes were never in dispute. Instead, I’m focusing on the states which most closely mirror the national outcome. That is, Obama won the national vote by 3.90% in 2012. He won Virginia by 3.87%, making it the closest bellwether of the states as a whole. I sorted by deviation from the national average and then took the top 23 states (after which the polling got sparse) and compared them to the Pollster averages leading up to the election:
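The state selection described above amounts to a sort on the distance from the national margin. Here is a small sketch; the function name is hypothetical, the three margins are the 2012 figures quoted in this post (Utah and DC rounded), and the real list would of course contain every state.

```python
def closest_to_national(state_margins, national_margin, n=23):
    """Return the n states whose margin is nearest the national margin."""
    ranked = sorted(
        state_margins,
        key=lambda state: abs(state_margins[state] - national_margin),
    )
    return ranked[:n]

# Margins are Democratic minus Republican, in percentage points.
margins_2012 = {
    "Utah": -48.0,
    "District of Columbia": 83.0,
    "Virginia": 3.87,
}
print(closest_to_national(margins_2012, national_margin=3.90, n=1))
```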
By visual inspection, the average of polls did a pretty good job. To quantify this somewhat, I made a histogram of the true versus the polled result:
Even the worst polled states only missed by a few percent. To give you a sense of the distribution:
- 2008: 0.7% bias for McCain (that is, Obama did slightly better than expected)
- 2008: 2.5% random error by state
- 2012: 1.9% bias for Romney
- 2012: 2.3% random error by state
The average of polls, especially in competitive states, was a very good estimate of the final outcome. I wouldn’t read too much into the very slight Republican bias in the polling, but note the relatively small scale.
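One plausible way to arrive at “bias” and “random error” numbers like these is to take the mean and the standard deviation of the state-by-state misses. The sketch below assumes exactly that; whether the sample or the population standard deviation is the right choice is a judgment call, and the three data points are invented purely to show the mechanics.

```python
from statistics import mean, stdev

def poll_error_summary(polled, actual):
    """Return (bias, random_error) for paired state margins.

    Margins are Democratic minus Republican, so a positive bias means the
    Democrat beat the polls (the polls leaned Republican).
    """
    errors = [a - p for p, a in zip(polled, actual)]
    bias = mean(errors)          # systematic offset
    random_error = stdev(errors)  # state-to-state scatter about that offset
    return bias, random_error

# Made-up margins for three states, for illustration only.
polled = [0.0, 0.0, 0.0]
actual = [1.0, 2.0, 3.0]
print(poll_error_summary(polled, actual))   # (2.0, 1.0)
```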
Finally, there’s the question of outliers. For 40 or so state results (over 2 elections), we’d expect approximately two “2σ” results (misses larger than two standard deviations), assuming a Gaussian distribution. Instead, there was only 1 (New Mexico in 2008). The statistical distribution of errors seems to be more or less Gaussian.
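If you want to check the “approximately two” figure yourself, the expected number of outliers is just the number of results times the two-sided Gaussian tail fraction. A quick sketch, with function names of my own choosing:

```python
from math import erf, sqrt

def tail_fraction(k):
    """Fraction of a Gaussian more than k standard deviations from the
    mean, counting both tails."""
    return 1.0 - erf(k / sqrt(2.0))

def expected_outliers(n_results, k=2.0):
    """Expected count of results that miss by more than k sigma."""
    return n_results * tail_fraction(k)

print(round(1.0 - tail_fraction(1.0), 3))  # share within 1 sigma, about 0.683
print(round(1.0 - tail_fraction(2.0), 3))  # share within 2 sigma, about 0.954
print(round(expected_outliers(46), 1))     # 23 states x 2 elections
```

For 40 to 46 results this comes out at about two, so seeing one is entirely unremarkable.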
Yes, the polling could be very off in this cycle because of how unusual the election is, but given the Obama candidacy in 2008 and the then-unknown effect that race might have on the outcome, it seems strange that the pollsters could have gotten it so right back then, but not this time.
The World of Prediction
These days, there’s a whole cottage industry around trying to predict exactly what will happen on election day in every state. And with due respect to Nate Silver at 538, Sam Wang at the Princeton Election Consortium, or even my dear friends Rich Gott and Wes Colley, there’s a degree to which trying to predict each state with certainty smacks of trying to retroactively get the 2000 election right.
There’s an important question here. If an algorithm suggests that there’s a 55/45 chance of a state going to, say, Clinton over Trump, does it really make sense to make a prediction? To be fair to Silver and Wang, they do give confidence estimates, but the reality is that people tend to focus only on the bottom-line predicted winner.
Rather, I think the most important question is: who will be the next president and (to a much lesser degree) will she/he win by a large enough margin to have real coattails and/or claim a mandate?
There’s something almost pathological about posting the odds this far out. One thing that I haven’t really included is the variability between now and the election as the pictures of the candidates crystallize in the minds of the electorate (though the bias slider allows you to explore that manually). I will say that both Clinton and Trump have been national figures with enormous media attention for a very long time, so perhaps people’s views are already settled. On the other hand, based especially on the Republican primary and initial signs from the general, this promises to be a particularly nasty election. Opinions may swing dramatically.
In either case, I’ll close with a little more historical data. At this point in 2012, Obama led Romney nationally by 3.3% in the Pollster average. That gap was 1.5% by the end, with Romney never leading past this point, and with a maximum gap of only 4.2%. Obama, as I’ve noted, won the national vote by 3.9%. In other words, in 2012, the March snapshot told you almost everything you needed to know about the final state of the race.
On the other hand, at this point in 2008, Obama and McCain were statistically tied, but Obama led consistently after that, with the exception of a few days in August after the initial excitement about Palin’s nomination as VP.
There are some comparisons to be made between the 2008 race and this one. By March of 2008, Clinton and Obama were still in a tight race, with Obama up by about 100 delegates. Clinton didn’t concede the nomination until June 7 of that year. Clinton has a much larger lead over Sanders (about 300 delegates) than Obama did at a similar point in the 2008 nominating contest, so an argument could be made that the candidates are more settled now than they were at the same point in 2008. Regardless, Obama held a typical lead of about 5%, with a final polling advantage of 7.6%. Obama won the actual popular vote by 7.3%.
Even in 2008, a year which saw the first female Republican VP candidate, the first African-American presidential candidate and president, a huge upheaval in the stock market, and much else, the variation in the polling at this point was considerably less than 10 points. I wouldn’t bet the farm, but a Clinton win in November looks very, very likely. Incidentally, the electronic futures markets agree. At present, the Iowa Electronic Markets give the Democrats a 70-30 chance of winning the election.