On Monday, we released our final estimate of the Virginia Gubernatorial Primary: 46% Perriello, 41% Northam, 13% undecided. On Tuesday, the final results came in: 55.92% Northam, 44.08% Perriello, a 16.84-point gap between our estimated margin (Perriello +5) and the actual one (Northam +11.84). What happened?
It bears repeating: polling is not an exact science. A sample of a population is used to estimate what the whole thinks, and many factors can make things hard to predict. A sample may not be representative, for example because one candidate's supporters were oversampled. Respondent bias is another: if 50 of the 500 people you poll are misleading you, the result can swing several percentage points in either direction. Anyone who hits an exact margin is doing so by chance. Long, drawn-out elections played out on a national scale, with many more people at much higher engagement, are easier to predict, like a presidential election. An off-year election is harder to predict. An off-year primary is even harder. On top of that, different assumptions about the composition of the electorate produce vastly different final results. One recent example: UK pollsters down-weighted the youth voting surge in their own data because it looked unbelievable.
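The sampling-error point above can be made concrete. A rough sketch of the 95% margin of error for a simple random sample, alongside the misleading-respondent arithmetic (the numbers here mirror the illustrative 500-person example, not our actual crosstabs):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 500                        # respondents, as in the example above
moe = margin_of_error(0.5, n)  # worst case at p = 0.5
print(f"+/- {moe * 100:.1f} points")  # prints "+/- 4.4 points"

# 50 misleading respondents out of 500 shift a candidate's raw share
# by 50 / 500 = 10 points -- far more than sampling error alone.
print(f"{50 / n * 100:.0f} points")   # prints "10 points"
```

Note that this is sampling error only; the electorate-composition problems discussed below sit on top of it.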
This may not sound like it matters very much, but knowing what the electorate will look like really does. Ahead of the 2016 presidential election, The New York Times conducted a poll of Florida voters. They computed the results themselves, but also handed the raw data to other polling firms to see how each would interpret it.
Four respected pollsters, given the same raw polling data, produced different outcomes; incidentally, the outlier of Trump +1 happened to be closest to the final result in Florida. It's easy to dismiss every result except the one that had Trump winning, but the others weren't the product of bad math or faulty logic. The differences come from each firm's assumptions about the composition of the November electorate, likely informed by its own proprietary voter-tracking data.
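The effect of those composition assumptions can be sketched in a few lines. The age groups, support levels, and turnout mixes below are invented purely for illustration; they are not the Times' Florida data:

```python
# Hypothetical raw crosstabs: share of decided voters for (candidate A, candidate B)
# within each age group. Illustrative numbers only.
support = {
    "18-34": (0.65, 0.35),
    "35-64": (0.48, 0.52),
    "65+":   (0.40, 0.60),
}

def weighted_margin(turnout_mix: dict) -> float:
    """Candidate A's margin (in points) under an assumed electorate composition."""
    a = sum(turnout_mix[g] * support[g][0] for g in support)
    b = sum(turnout_mix[g] * support[g][1] for g in support)
    return (a - b) * 100

# Two pollsters, identical raw data, different turnout assumptions:
young_electorate = {"18-34": 0.30, "35-64": 0.45, "65+": 0.25}
older_electorate = {"18-34": 0.15, "35-64": 0.45, "65+": 0.40}

print(round(weighted_margin(young_electorate), 1))  # A +2.2
print(round(weighted_margin(older_electorate), 1))  # A -5.3
```

Same data, same arithmetic: one turnout assumption shows candidate A winning, the other shows a clear loss.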
Now, let’s look at Virginia. We noted that Perriello’s numbers were a best-case scenario, resting on the assumption that low-propensity, low-engagement voters (youth voters, in this case) would make up a larger-than-normal share of the Virginia primary electorate and propel him to victory. Tracking data we compiled before releasing the Virginia poll had indicated that older voters would form a smaller share of the electorate than we would have expected in a previous primary year. For Perriello to win, he would have needed a very strong turnout operation across the state. So while we reported Perriello at 46%, once you subtract the voters who would end up staying home, his support was closer to 40% against Northam’s 41%. Northam’s voters were older, more consistent voters who almost always show up on election day.
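The turnout adjustment described above is simple arithmetic. The split between high- and low-propensity supporters and the stay-home rate are assumptions for illustration; only the 46%, 41%, and roughly 40% figures come from our estimates:

```python
# Topline shares from our final estimate.
perriello, northam = 46.0, 41.0

# Illustrative assumption: roughly 12 of Perriello's 46 points came from
# low-propensity (young, low-engagement) voters, and only about half of
# those actually show up. Northam's older supporters vote reliably.
low_propensity = 12.0
stay_home_rate = 0.5

adjusted_perriello = perriello - low_propensity * stay_home_rate
print(adjusted_perriello)  # prints 40.0 -- now trailing Northam's 41
```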
— FiveThirtyEight (@FiveThirtyEight) June 13, 2017
Pollsters were heavily divided on this race.
Virginia also allows any voter to participate in either party’s primary. The state is trending steadily more Democratic, and as such the Democratic primary drew far more attention (and voters) than the GOP one. It’s certainly possible, even likely, that moderate Republicans and right-leaning centrists ended up voting for Northam in the Democratic primary.
So after all that, what led to the large polling miss? Some combination of an unpredictable electorate, Republicans crossing over, ordinary sampling error, and a flood of undecideds breaking for Northam. Despite missing the margin badly, we were still the closest of any pollster to Perriello’s actual vote share: our 46% against his final 44.08%.
Stay tuned for our GA-6 poll; we’re going to try to release a few so we can track the state of the race leading up to Tuesday!