
US Election Forecast Methodology

Last updated: · Model: US-V2.1

Overview

The US election forecast is a comprehensive four-layer system that combines daily polling aggregation, Bayesian hierarchical modeling, and Monte Carlo simulation to produce probabilistic forecasts of US Senate and House elections. The model is designed to update daily and reflect current polling conditions while maintaining statistical rigor.

The model ingests polls from pollsters, with the most recent poll conducted on . The database currently contains individual polls across all 2026 races (Senate, House, and generic ballot).

Unlike simple polling averages, this model accounts for correlation between races, historical patterns, and uncertainty in polling data. The result is a set of credible intervals and control probabilities that reflect both current opinion and the irreducible uncertainty in measuring public preference.

Layer 1: Daily Polling Aggregation

The first layer consumes raw polling data and produces a weighted average for each race. This is the traditional "poll of polls" approach, but with careful attention to data quality and weighting methodology.

Data ingestion

Polls are sourced from publicly available records, primarily from academic polling databases and news aggregators. Each poll is validated for completeness: we require pollster name, fieldwork dates, sample size, and voting intention figures. Polls that fail validation are flagged and excluded.

Weighting: recency

Polls are weighted by how recently their fieldwork was conducted, using exponential decay with a 21-day half-life. This means a poll conducted 21 days ago receives half the weight of a poll conducted today. There is no hard cutoff—the exponential decay ensures old polls contribute negligibly (a 60-day-old poll has just 13% of a fresh poll's weight).

Formula: w_recency = exp(−ln(2) × days_ago / 21)

Weighting: sample size

Larger polls receive more weight, since they provide more precise estimates. The weight is proportional to the square root of the sample size, normalised to a reference poll of 1,000 respondents. This is capped at 3× to prevent very large polls from dominating the average.

Formula: w_sample = min(√(n / 1000), 3.0)

Final aggregate weight

The final weight for each poll is the product of recency and sample size: w_recency × w_sample. All US pollsters are treated equally in the current version (v2.1). Future versions may introduce pollster reliability weighting based on historical accuracy.
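The weighting scheme above can be sketched in a few lines of Python. The constants come directly from the text (21-day half-life, 1,000-respondent reference, 3× cap); the example polls are hypothetical and purely illustrative.

```python
import math

HALF_LIFE_DAYS = 21   # recency half-life (from the methodology)
REF_SAMPLE = 1000     # reference sample size for normalisation
SAMPLE_CAP = 3.0      # cap on the sample-size weight

def poll_weight(days_ago: float, n: int) -> float:
    """Combined weight: exponential recency decay x capped sqrt sample-size factor."""
    w_recency = math.exp(-math.log(2) * days_ago / HALF_LIFE_DAYS)
    w_sample = min(math.sqrt(n / REF_SAMPLE), SAMPLE_CAP)
    return w_recency * w_sample

def weighted_average(polls):
    """polls: list of (days_ago, sample_size, share) tuples; returns weighted mean share."""
    total_w = sum(poll_weight(d, n) for d, n, _ in polls)
    return sum(poll_weight(d, n) * share for d, n, share in polls) / total_w

# Hypothetical polls: (days_ago, sample_size, Democratic share in %)
polls = [(0, 1000, 48.0), (21, 1000, 50.0), (60, 4000, 52.0)]
avg = weighted_average(polls)
```

Note how the 21-day-old poll contributes exactly half the weight of the fresh poll, while the large 60-day-old poll is boosted by its sample size but heavily discounted for age.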

Layer 2: Bayesian Hierarchical Estimation

The second layer takes the weighted polling averages and fits a Bayesian hierarchical model to estimate the true underlying support for each party in each race. This accounts for several sources of uncertainty: polling error, sample variation, and genuine but temporary fluctuations in opinion.

The model structure

For each race, we assume observed polling data come from a normal distribution with mean equal to the true support and standard deviation equal to the polling error. The polling error is estimated from the weighted standard deviation of polls in the recency window, with a floor of 2 percentage points applied to reflect irreducible measurement uncertainty.
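A minimal sketch of this error estimate, using the 2-point floor stated above; the poll shares and weights passed in are whatever Layer 1 produced:

```python
import math

ERROR_FLOOR = 2.0  # percentage points: irreducible measurement uncertainty

def polling_error(shares, weights):
    """Weighted standard deviation of poll shares, floored at 2 points."""
    wsum = sum(weights)
    mean = sum(w * s for w, s in zip(weights, shares)) / wsum
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, shares)) / wsum
    return max(math.sqrt(var), ERROR_FLOOR)
```

When polls agree closely, the floor binds and the model still carries 2 points of error; when polls disagree, the weighted spread takes over.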

Hierarchical priors

The true support in each race is modeled using a hierarchical prior. The Senate and House both have national-level priors that reflect the "generic ballot" (national Democratic vs. Republican preference), plus regional deviations that account for state and district heterogeneity. This allows races to influence each other while still maintaining district-level specificity.
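The structure can be illustrated with a small generative sketch: every race shares one national "generic ballot" draw, plus its own regional deviation. The specific standard deviations here are assumed for illustration, not the model's actual hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_race_support(n_races, national_mean=50.0, national_sd=3.0, regional_sd=5.0):
    """Generative sketch of the hierarchical prior: a shared national level
    plus an independent regional deviation for each race."""
    national = rng.normal(national_mean, national_sd)         # shared national swing
    deviations = rng.normal(0.0, regional_sd, size=n_races)   # race-specific offsets
    return national + deviations
```

Because the national draw is shared, a single high or low generic-ballot draw shifts every race in the same direction, which is exactly how the hierarchy lets races "influence each other."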

Posterior inference

The posterior distribution for each race is computed using Markov chain Monte Carlo (MCMC) sampling. This produces a full distribution of plausible values for Democratic and Republican support, not just point estimates. The median of the posterior (50th percentile) is reported as the central estimate, and the 5th and 95th percentiles form the 90% credible interval.
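Extracting these summaries from a set of MCMC draws is a one-liner with percentiles; `samples` here stands in for the posterior draws of one race's support:

```python
import numpy as np

def summarize_posterior(samples):
    """Central estimate (median) and 90% credible interval from posterior draws."""
    lo, med, hi = np.percentile(samples, [5, 50, 95])
    return {"median": med, "ci90": (lo, hi)}
```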

Layer 3: Outcome Simulation

The third layer uses the posterior distributions from Layer 2 to simulate many possible election outcomes. Each simulation draws a random sample from the posterior of every race, then counts the total Democratic and Republican seats. Running 40,000 simulations produces an empirical distribution of outcomes.

Correlation structure

The simulations respect correlations between races. All races are drawn from a multivariate normal distribution with a correlation structure estimated from historical election results and the hierarchical model. This means that if a national Democratic surge occurs, it affects Democratic chances in all races simultaneously, as we observe in real elections.
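A toy version of these correlated draws, with three hypothetical races and an assumed pairwise correlation of 0.6 (the real model estimates this structure from historical results):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative 3-race setup: posterior medians, uncertainties, and correlations
means = np.array([48.0, 51.0, 53.0])   # Dem share per race (hypothetical)
sds = np.array([3.0, 3.0, 3.0])
corr = np.array([[1.0, 0.6, 0.6],
                 [0.6, 1.0, 0.6],
                 [0.6, 0.6, 1.0]])
cov = corr * np.outer(sds, sds)        # covariance from correlations and sds

# One row per simulation; columns move together because of the correlation
draws = rng.multivariate_normal(means, cov, size=40_000)
```

In the correlated draws, a simulation where race 1 runs hot for Democrats tends to have races 2 and 3 running hot as well, which is the "national surge" behaviour described above.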

Outputs

From the 40,000 simulations, we extract:

  • Probability that Democrats control the Senate/House (percentage of simulations where Democrats win 51+ Senate seats or 218+ House seats)
  • Distribution of seat counts (percentiles: 10th, 50th, and 90th)
  • Correlation structure in outcomes (e.g., probability that Democrats win both chambers)
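Given a matrix of simulated vote shares, the outputs above reduce to counting. This sketch fabricates a correlated 40,000 × 100 Senate-style simulation (all distributional parameters are assumed) and then extracts the control probability and seat-count percentiles exactly as described:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical simulation: 40,000 draws of Dem vote share in 100 races,
# with a shared national swing per simulation inducing correlation.
national = rng.normal(0.0, 2.0, size=(40_000, 1))           # national swing
baseline = rng.normal(50.0, 3.0, size=(1, 100))             # per-race baselines
noise = rng.normal(0.0, 2.5, size=(40_000, 100))            # race-level noise
shares = baseline + national + noise

seats = (shares > 50.0).sum(axis=1)                         # Dem seats per simulation
p_control = (seats >= 51).mean()                            # share of sims with 51+ seats
p10, p50, p90 = np.percentile(seats, [10, 50, 90])          # seat-count distribution
```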

Layer 4: Uncertainty Quantification

The fourth layer provides transparent reporting of uncertainty. All estimates come with credible intervals and coverage diagnostics. We distinguish between different sources of uncertainty:

  • Polling error: Measurement uncertainty in the polls themselves (2–5 percentage points per race)
  • Estimation uncertainty: Posterior uncertainty about the true support level given the polls observed to date
  • Fundamental uncertainty: The irreducible unpredictability of elections (campaigns, surprises, turnout variation)

Key assumptions and limitations

Like all forecasts, this model rests on several assumptions. The key ones:

  • Polls are unbiased. We assume published polls measure voting intention accurately on average. If a systematic polling bias exists (e.g., shy Trump voters), the model will reflect that bias.
  • Correlations are stable. We estimate race-to-race correlations from historical elections and apply them forward. If 2026 has an unusual correlation structure, forecasts will be off.
  • No exogenous shocks. The model does not account for major unforeseen events (wars, economic crises, scandals). These are genuine possibilities that shift elections unpredictably.
  • Turnout follows historical patterns. We do not model turnout explicitly. If turnout differs significantly from 2020/2022, outcomes will differ from the forecast.

Interpreting probabilities

A 60% probability for Democrats does NOT mean "we are 60% confident Democrats will win." Rather, it means: "In an ensemble of 100 elections held under identical current conditions, Democrats would win about 60 of them."

Probabilities between 45% and 55% indicate a genuinely competitive race where either party could win. Probabilities above 65% or below 35% indicate a clear lean, but not certainty. No forecast has probability 100% or 0%—elections are inherently uncertain.

Accuracy and track record

This is the v2.1 model launched in March 2026. Historical accuracy metrics will accumulate as we move toward election day. The model's performance will be evaluated on:

  • Calibration: Among elections given a 60% probability, do 60% actually occur?
  • Interval coverage: Do 90% credible intervals contain the true outcome 90% of the time?
  • Point estimates: How close are median forecasts to actual seat counts?
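The calibration check in particular is mechanical once outcomes are known: bin forecasts by stated probability and compare each bin's average forecast with its observed win rate. A minimal sketch:

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Per bin: (mean forecast probability, observed frequency, count)."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows
```

For a well-calibrated forecaster, the first two columns track each other: races given ~60% really do resolve that way about 60% of the time.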

Version history

v2.1 (March 2026) — Current release. Four-layer system: (1) daily polling aggregation with 21-day half-life exponential decay and sample-size weighting; (2) Bayesian hierarchical estimation with multivariate priors; (3) Monte Carlo simulation (40,000 runs) with race correlation structure; (4) transparent uncertainty quantification. Ingests polls from pollsters ( total polls). Produces daily updates for Senate, House, and generic ballot with 90% credible intervals and control probabilities.

Questions about the methodology? Contact [email protected]