If you’re watching the latest polls, make a note of something called “herding.” It could be relevant for discussions of polling after the election. The concept is straightforward. In the final days of an election, poll results tend to trend toward consensus. One possibility is that everyone is finally making up their minds and the picture is coming into focus. But that’s not the only possibility. For a mix of good faith and maybe less than good faith reasons, pollsters can become increasingly leery of publishing an outlier poll. There’s a tendency to “herd” together for extra-statistical reasons.
Let’s say you’re five days out from the election and the polling averages say candidate Jones is up 2 points and you’ve got a poll which says candidate Smith is up 3 points. (Pardon my defaulting to anglo surnames.) Everyone has an outlier result sometimes. But do you really want your final poll to be a weird outlier? In the modern era with aggregators, pollsters are often graded on the predictive accuracy of their final polls. So it kind of matters. If you’re a bit shady maybe you just tweak your numbers and get them closer to the average. If you’re more on the level maybe you take a closer look at the data and find something that really looks like it needs adjusting. Maybe you just decide that you’re going to hold this one poll back.
We don’t know a lot about how this works in practice since no one’s got much interest in being forthcoming about it. I’m an outsider to the polling world. I’ll simply say that it’s treated as a given in that world that this does happen sometimes. I’ll take their word for it. The only thing I would add is that while some shops may just nudge the numbers, there are probably pressures that are kind of hard to resist. By which I mean, for a pollster it may be kind of hard in some cases to completely know whether you’re herding.
It certainly looks like there’s some herding going on at the moment on the national popular vote polls and on polls out of Pennsylvania. There are a lot of tied polls.
Now, maybe it’s just tied. And yes, that totally may be the case. What leads people to be suspicious is this: Let’s say we have perfect visibility into the souls of all Americans about to vote. And with this perfect visibility it’s actually 50-50. If you run a bunch of polls and they’re well designed you’re going to get a bunch of polls clustering around 50-50. They won’t all be 50-50. That’s just not how polls work. There’s a degree of variance involved. And right now in those two data sets, nationwide and in Pennsylvania, we’re getting maybe a bit too many 50-50s. There’s not enough variance.
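The variance argument above can be sketched with a quick simulation. This is purely illustrative: the sample size, number of polls, and exact tie threshold are my assumptions, not anyone’s actual methodology. Assume the race really is 50-50 and each poll is a clean random sample, then see how often an honest poll lands on an exact tie.

```python
import random

random.seed(42)
TRUE_SHARE = 0.50    # assume a perfectly tied electorate
N_RESPONDENTS = 800  # a typical-ish poll sample size (assumption)
N_POLLS = 20

margins = []
for _ in range(N_POLLS):
    # Each respondent independently picks Smith with probability 0.50.
    smith = sum(random.random() < TRUE_SHARE for _ in range(N_RESPONDENTS))
    # Smith-minus-Jones margin, rounded to whole points as published polls are.
    margins.append(round(100 * (2 * smith / N_RESPONDENTS - 1)))

ties = sum(1 for m in margins if m == 0)
print(sorted(margins))
print(f"exact ties: {ties} of {N_POLLS}")
```

With these numbers the standard error of the margin is about 3.5 points, so even a genuinely tied race should produce polls scattered several points in each direction, with only a minority landing on an exact tie. If most published polls show a tie, that clustering itself is the suspicious part.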
Does this matter? Not that much. Maybe not at all. But if there’s a miss in either direction it’s possible that pollsters were sitting on outlier polls that just didn’t seem to make sense. We’ll know in a few days.
Postscript: Here’s an interesting Twitter thread questioning not the possibility of herding but some of the assumptions that lead people to conclude it’s happening. The gist is that pollsters really aren’t doing random sampling any more. At least not all of them. And even when they do, they’re doing so much weighting that the randomness is kind of broken. This reminds me of a different, though possibly related, point I’ve seen made. Many have commented on the relative stability of this race going back many months, even back before Biden dropped out of the race. One simple explanation is that Americans are very divided into their camps. Swing voters these days are a very small part of the population whose defining characteristic is neither knowing nor caring much about politics. But some wonder whether another contributing factor could be the growth of sample weighting. As it’s gotten harder and harder to get random samples, there’s been more and more reliance on weighting to get a representative sample. Perhaps you’re weighting so much that you’re removing a lot of the movement in the polls. I have no ability to judge whether this is true. But it’s an interesting possibility.
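To make that last hypothesis concrete, here is a toy sketch of how heavy weighting can dampen poll-to-poll movement. Everything here is an assumption for illustration: the fixed party-ID weighting targets, the vote rates by party, and the idea that raw samples drift in composition from poll to poll. The sketch compares the spread of unweighted estimates against estimates raked to fixed targets.

```python
import random
import statistics

random.seed(7)

PARTY_TARGETS = {"D": 0.35, "R": 0.35, "I": 0.30}  # fixed weighting targets (assumed)
P_DEM_VOTE = {"D": 0.95, "R": 0.05, "I": 0.50}     # assumed vote rates by party
N = 1000

def one_poll():
    # Response rates drift poll to poll, so each raw sample's party mix wobbles.
    d_share = random.uniform(0.25, 0.45)
    parties = random.choices(["D", "R", "I"],
                             weights=[d_share, 0.70 - d_share, 0.30], k=N)
    votes = [random.random() < P_DEM_VOTE[p] for p in parties]
    raw = sum(votes) / N  # unweighted Dem share: swings with the sample mix
    # Weight each respondent so party groups hit the fixed targets.
    counts = {p: parties.count(p) for p in PARTY_TARGETS}
    weights = [PARTY_TARGETS[p] / counts[p] for p in parties]
    weighted = sum(w * v for w, v in zip(weights, votes)) / sum(weights)
    return raw, weighted

polls = [one_poll() for _ in range(200)]
raw_sd = statistics.stdev(m for m, _ in polls)
wtd_sd = statistics.stdev(m for _, m in polls)
print(f"raw stdev: {raw_sd:.4f}, weighted stdev: {wtd_sd:.4f}")
```

In this toy setup the unweighted estimates bounce around with the changing sample composition, while the weighted estimates barely move, because once you pin party composition to fixed targets, most of the apparent movement disappears. Whether real-world weighting behaves like this is exactly the open question.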