Forecasting the 2020 US Presidential Election and the Impact of Differential Turnout

A Biden landslide or a redux of 2016?

Joseph Richards
Oct 28, 2020

Source code for this project can be found on GitHub here.

One week out from the 2020 US presidential election, the question on everyone’s mind is: does Biden have this one in the bag, as the polls would suggest, or will Trump ride the wave of voter suppression and turnout to pull it out once again?

Way back in 2012, the final project in a Bayesian Statistics course I taught called for the students to use the tools they had learned in class to predict the outcome of the 2012 presidential election between Barack Obama and Mitt Romney. The students’ approaches varied quite widely, but those who relied on the state-level polls pretty much nailed the result (Obama won with 332 electoral votes). The model that I coded up (more on this later) predicted every state correctly.

Flash forward to 2016. Applying the same model that worked so well in 2012 to the race between Hillary Clinton and Donald Trump yielded horrible results.

That model got seven states wrong, all of which ended up going to Trump. Those states included three that were deemed “Clinton safe” by the model (Michigan, Pennsylvania and Wisconsin), two “likely Clinton” (Florida and North Carolina) and two toss-ups that had a 60-70% chance of Clinton winning (Iowa and Ohio). Yikes!

So, what went wrong? How could a model that forecasted the 2012 election so well miss the mark so badly in 2016? Certainly, I wasn’t the only statistician whose model went off the rails in 2016 (embarrassing, I know!).

The primary culprit, I believe, is differential turnout in favor of Trump. By differential turnout, I mean discrepancies between the people who turned out to vote in 2016 and those the pollsters thought would vote (their “likely voters”). And in 2016 those discrepancies occurred primarily along party lines: Democrats who usually vote stayed home (or voted third party) and Republicans who rarely (or never) vote came out in droves. Because pollsters use lists of “likely voters” to construct their polls, the end result was that their polling did not reflect reality.

So how much differential turnout was there? Luckily we can estimate that using the deviation between our 2016 model forecasts and the actual election result. For the seven swing states that the model got wrong, Trump’s actual vote share was on average 4 percentage points higher than what the model predicted. In fact, applying an R+4 percentage point adjustment to every state’s predicted vote share results in a forecast that would have correctly called every single state with the exception of Nevada (which was actually won by Clinton).

This means that in 2016 the impact of differential turnout from the “likely voter” polling model was a roughly 4 percentage point swing in favor of Trump. This was just enough to swing a “solid Clinton” forecast to a Trump victory.
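To make the arithmetic concrete, here is a minimal Python sketch of the two steps described above: estimating the average polling miss, then applying it as a uniform shift and re-calling each state. It is not taken from the project's R code. The function names, the use of two-party vote share in percent, and the state figures are all hypothetical placeholders chosen for illustration.

```python
def estimate_shift(predicted_rep_share, actual_rep_share, states):
    """Average of (actual - predicted) Republican share over the given states."""
    misses = [actual_rep_share[s] - predicted_rep_share[s] for s in states]
    return sum(misses) / len(misses)


def apply_turnout_shift(rep_share, shift=4.0):
    """Add `shift` percentage points to every state's predicted Republican share."""
    return {state: share + shift for state, share in rep_share.items()}


def call_states(rep_share):
    """Call each state for whichever side holds more than 50% of the two-party share."""
    return {state: "R" if share > 50.0 else "D" for state, share in rep_share.items()}


# Hypothetical placeholder figures, NOT real 2016 forecasts or results.
predicted = {"State A": 47.5, "State B": 44.0, "State C": 52.0}
actual = {"State A": 51.0, "State B": 48.5, "State C": 56.0}

shift = estimate_shift(predicted, actual, ["State A", "State B"])
adjusted = apply_turnout_shift(predicted, shift=4.0)

print(shift)
print(call_states(predicted))
print(call_states(adjusted))
```

In this toy example the shift flips only the state that was within 4 points of the 50% line, which is the same mechanism that turns a comfortable polling lead into a loss.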

Applying these lessons to the 2020 election

Now, back to the election at hand. To get a handle on how the 2020 election will turn out, there are two questions to look at:

  1. What does our model forecast for the 2020 outcome? (Spoiler alert: Biden landslide).
  2. If we apply an R+4 differential turnout to the model, how might the outcome change?

2020 Presidential Election Forecast

Before showing the forecast for the 2020 election, a few words on my forecasting model and why it is a sound approach. The model is primarily motivated by a 2010 paper by Lock & Gelman. In a nutshell, the model:

  1. Considers all state- and national-level polls, adapting its state-wise forecast to fit the combined set of polls in aggregate.
  2. Automatically estimates and adjusts for any pollster-level bias.
  3. Down-weights older polls compared to more recent ones.
  4. Uses the most recent presidential election results as prior information about each state’s voting tendency.
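As a rough illustration of items 3 and 4 only, the sketch below computes a recency-weighted poll average and blends it with the previous election result as a prior. This is a deliberately simplified stand-in, not the Lock & Gelman model and not the project's JAGS code: the exponential half-life weighting, the `prior_weight` parameter, and all numbers are assumptions made for the example. Items 1 and 2 (pooling state and national polls, and estimating pollster bias) need a full hierarchical model and are not attempted here.

```python
def recency_weight(age_days, half_life_days=14.0):
    """Weight that halves every `half_life_days` (hypothetical decay scheme)."""
    return 0.5 ** (age_days / half_life_days)


def blended_estimate(polls, prior_share, prior_weight=0.5, half_life_days=14.0):
    """Weighted mean of poll shares and a prior.

    polls: list of (dem_share_pct, age_days) tuples.
    prior_share: the state's Democratic share in the previous election.
    prior_weight: how much the prior counts relative to a brand-new poll.
    """
    numerator = prior_weight * prior_share
    denominator = prior_weight
    for share, age_days in polls:
        w = recency_weight(age_days, half_life_days)
        numerator += w * share
        denominator += w
    return numerator / denominator


# Hypothetical polls: one taken today, one taken two weeks ago.
polls = [(52.0, 0), (48.0, 14)]
estimate = blended_estimate(polls, prior_share=50.0)
print(estimate)
```

With a 14-day half-life, the two-week-old poll counts half as much as the fresh one, and with no polls at all the estimate falls back to the prior.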

Source code for the model can be found on GitHub here. (The model code is written in R and uses the JAGS package.)

Applying this model to the 2020 presidential polls (from Real Clear Politics) up to and including October 24 yields the following forecast:

This is a Biden landslide of 358 electoral votes, with no reasonable path to victory for Trump.

The Impact of Differential Turnout on the 2020 Forecast

For anybody who has followed the polls this election cycle, the above is probably not too surprising. Just like in 2016, the polls have shown a consistent lead for the Democratic candidate. In fact, this time around, the lead for Biden is even bigger than Clinton’s lead in 2016.

The more interesting question, then, is how differential turnout could impact the election. Here’s the map applying an R+4 differential turnout to the model forecasts:

A much different story! In fact, this is exactly the same map as 2016. The path for Biden in this scenario is incredibly narrow (a 2% chance of winning).
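A win probability like the one quoted above can be approximated by simulating many elections and counting how often a candidate reaches a majority of electoral votes. The Python sketch below shows the general idea under stated assumptions. It is not the project's Bayesian model: the normal error terms, their standard deviations, the function name, and the three toy states are all hypothetical.

```python
import random


def win_probability(states, ev_to_win, shift=0.0, national_sd=2.0,
                    state_sd=3.0, n_sims=2000, seed=0):
    """Fraction of simulations in which the Democrat reaches `ev_to_win`.

    states: dict of name -> (dem_two_party_share_pct, electoral_votes).
    shift: percentage points moved from the Democrat to the Republican.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # One error shared by every state, plus an independent error per state.
        national_error = rng.gauss(0.0, national_sd)
        dem_ev = 0
        for share, ev in states.values():
            simulated = share - shift + national_error + rng.gauss(0.0, state_sd)
            if simulated > 50.0:
                dem_ev += ev
        if dem_ev >= ev_to_win:
            wins += 1
    return wins / n_sims


# Toy map with 30 electoral votes in total, so 16 wins. Not real forecasts.
toy_states = {"State A": (53.0, 10), "State B": (51.0, 10), "State C": (45.0, 10)}

p_base = win_probability(toy_states, ev_to_win=16)
p_shifted = win_probability(toy_states, ev_to_win=16, shift=4.0)
print(p_base, p_shifted)
```

Because both calls use the same seed, they draw the same simulated errors, so the shifted probability can never exceed the baseline. For the real map, `ev_to_win` would be 270 of 538.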

So, even though so much has changed in the last four years, the result of this election will come down to the deciding factor of the last election: turnout! The early indication is that the enthusiasm gap is much smaller this election cycle than in 2016 (and may actually be in the Democrats’ favor, given the reports on early voting turnout). Additionally, because of the polling catastrophe of 2016, pollsters have ramped up their efforts to increase the fidelity of their likely voter models, so presumably the polls (and thus my model) already account for differential turnout. However, once again we are in unprecedented times, with factors like COVID, court decisions, foreign interference, and limited in-person voting potentially working to suppress and bias vote counts.

I suppose that the lesson from all of this is to make sure you VOTE! Just a few percentage points of voter turnout one way or another can flip elections. Given the historic turnout levels that we’ve already seen in early voting, my prediction is that the differential turnout will return to a typical level around zero. If this happens, then we’ll have a result closer to the model forecast.

And because I can’t write a post about election forecasting and not actually give a final forecast, my prediction: Biden 334-Trump 204.

Joseph Richards

Statistician by training and cross-disciplinary by nature. Current COO of Down to Cook (downtocookfoods.com). Past co-founder & Head of Data Science at Wise.io.