Methodology
Last updated: — · Reliability scoring: v6.0
Overview
This tracker produces a daily weighted polling average of UK voting intention. It is not a prediction. It is an estimate of where public opinion currently stands, based on the best available polling data.
The model ingests polls from — pollsters, covering data from — to —. The database currently contains — individual polls.
What the model does
For each day, the model considers all published UK voting intention polls, weighting each by how recently it was conducted. Each poll receives a composite weight based on three factors: how recent it is, how large its sample was, and how reliable its pollster has historically been. The model then computes a weighted average for each party.
This approach is sometimes called a "poll of polls" or polling aggregate. It is designed to smooth out the noise from individual polls while still responding to genuine shifts in opinion.
The model reflects the polls. If polls are systematically biased — as they were in the 2024 general election, where they collectively overestimated Labour by approximately 7 percentage points — the model will reflect that bias. We do not apply ad-hoc corrections for past biases, because the direction and magnitude of polling error is unknown in advance. The model aggregates what pollsters report; it does not second-guess them.
Weighting factors
1. Recency
Polls are weighted by how recently their fieldwork was conducted, using exponential decay with a 14-day half-life. This means a poll conducted 14 days ago receives half the weight of a poll conducted today. There is no hard cutoff — the exponential decay ensures old polls contribute negligibly (a 60-day-old poll has about 5% of a fresh poll's weight). This parameter was validated by backtesting against the 2024 general election result.
Formula: w_recency = exp(−ln(2) × days_ago / 14)
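As a sketch in Python (the function name is illustrative; only the formula and the 14-day half-life come from the methodology):

```python
import math

HALF_LIFE_DAYS = 14

def recency_weight(days_ago: float) -> float:
    """Exponential decay with a 14-day half-life."""
    return math.exp(-math.log(2) * days_ago / HALF_LIFE_DAYS)

recency_weight(0)   # 1.0: a fresh poll gets full weight
recency_weight(14)  # 0.5: half weight after one half-life
recency_weight(60)  # ~0.05: a 60-day-old poll contributes little
```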
2. Sample size
Larger polls receive more weight, since they provide more precise estimates. The weight is proportional to the square root of the sample size, normalised to a reference poll of 1,000 respondents. This is capped at 3× to prevent very large polls from dominating the average.
Formula: w_sample = min(√(n / 1000), 3.0)
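The same formula as a sketch (function name illustrative):

```python
import math

def sample_weight(n: int, reference: int = 1000, cap: float = 3.0) -> float:
    """Square-root scaling against a 1,000-respondent reference, capped at 3x."""
    return min(math.sqrt(n / reference), cap)

sample_weight(1000)   # 1.0: the reference poll
sample_weight(4000)   # 2.0: four times the sample, twice the weight
sample_weight(20000)  # 3.0: capped; sqrt(20) would otherwise give ~4.47
```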
3. Pollster reliability
Each pollster has a reliability score bounded between 0.30 and 0.70, calculated from four inputs: accuracy across multiple elections with systemic bias deflation (40% weight), consensus alignment measuring correlation with peer movements (20%), empirically derived methodological trust factors (25%), and house effect magnitude (15%).
The bounded range means the most trusted pollster gets 2.3× the weight of the least trusted — strong enough to meaningfully differentiate quality without destabilising the average. See the Pollster Ratings page for individual scores and breakdowns.
The final weight for each poll is the product of all three factors: recency × sample size × reliability.
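Putting the three factors together — a minimal sketch, where the poll data is invented for illustration:

```python
import math

def poll_weight(days_ago: float, n: int, reliability: float) -> float:
    """Composite weight: recency x sample size x pollster reliability."""
    w_recency = math.exp(-math.log(2) * days_ago / 14)
    w_sample = min(math.sqrt(n / 1000), 3.0)
    return w_recency * w_sample * reliability

# hypothetical polls: (days_ago, sample_size, reliability, Labour share %)
polls = [(1, 2000, 0.65, 30.0), (7, 1100, 0.50, 28.0), (20, 1500, 0.40, 27.0)]
total = sum(poll_weight(d, n, r) for d, n, r, _ in polls)
labour_avg = sum(poll_weight(d, n, r) * s for d, n, r, s in polls) / total
# the fresh, large, reliable poll pulls the average toward its 30.0 reading
```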
Uncertainty bands
The tracker displays 90% uncertainty bands around each party estimate. These are calculated as the weighted standard deviation of polls in the recency window, multiplied by 1.645 (the z-score for a 90% interval). We call these "uncertainty bands" rather than "credible intervals" because the model is not fully Bayesian — these bands reflect pollster disagreement, not a formal posterior distribution.
Wide bands mean pollsters disagree significantly. Narrow bands mean there is strong consensus. A minimum band of ±1.5 percentage points is applied to every estimate, because even when pollsters agree, there is always some irreducible uncertainty in measuring public opinion.
Smaller parties — particularly the Greens — often show wider uncertainty bands than Labour or the Conservatives. This is expected: pollsters disagree more on smaller-party vote share due to differences in turnout assumptions, question prompting, and the greater sampling variability that comes with measuring a smaller proportion of the electorate.
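The band calculation described above can be sketched as follows; the exact weighted-variance formula is an assumption about the implementation, while the 1.645 multiplier and ±1.5pp floor come from the text:

```python
import math

Z_90 = 1.645     # z-score for a 90% interval
MIN_BAND = 1.5   # irreducible-uncertainty floor, in percentage points

def uncertainty_band(shares, weights):
    """90% band from the weighted standard deviation of poll readings."""
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, shares)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, shares)) / total
    return max(Z_90 * math.sqrt(var), MIN_BAND)

uncertainty_band([30, 28, 33, 29], [1, 1, 1, 1])  # pollsters disagree: wide band
uncertainty_band([30, 30, 30], [1, 1, 1])         # 1.5: the floor applies
```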
Pollster reliability scoring
Rather than treating all pollsters equally, the model assigns each a reliability score that reflects how much trust to place in their readings. This is calculated from four inputs, each normalised to a 0–1 scale and combined with fixed weights:
Election accuracy (40% weight)
Each pollster's pre-election polls are compared against election results using vote-share-weighted RMSE across five major parties. The framework supports multiple elections (2024, 2019, 2017) with exponential cross-election decay — a 5-year half-life ensures recent performance dominates while historical accuracy still contributes. When industry-wide polling misses occur (systemic bias), a deflator reduces individual penalties proportionally — a pollster who missed Labour by 5pp when the whole industry missed by 4pp is penalised less than one who missed in isolation.
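A sketch of the two core pieces, vote-share-weighted RMSE and the systemic bias deflator. The deflation factor of 0.5 and the exact functional form are assumptions; the methodology specifies the behaviour, not the formula:

```python
import math

def vote_share_weighted_rmse(poll: dict, result: dict) -> float:
    """RMSE where each party's squared error is weighted by its actual
    vote share, so misses on large parties matter more."""
    total = sum(result.values())
    sq = sum((result[p] / total) * (poll[p] - result[p]) ** 2 for p in result)
    return math.sqrt(sq)

def deflated_miss(individual_pp: float, industry_pp: float,
                  deflation: float = 0.5) -> float:
    """Reduce the penalty for the portion of a miss shared by the whole
    industry; a pollster who missed alone keeps the full penalty."""
    shared = min(abs(individual_pp), abs(industry_pp))
    return abs(individual_pp) - deflation * shared

deflated_miss(5, 4)  # 3.0: most of the miss was industry-wide
deflated_miss(5, 0)  # 5.0: missed in isolation, full penalty
```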
Consensus alignment (20% weight)
This measures whether a pollster's movements over time track the direction of the underlying opinion trend, or are idiosyncratic noise. For each consecutive pair of polls from the same pollster, we compute the change for each party and divide by the number of days between polls. We then compare against an exponentially smoothed (EMA) trend estimate computed from all other pollsters — this replaces the raw peer average used in earlier versions and provides a cleaner, latent-trend baseline that filters out single-poll noise. The Spearman rank correlation between pollster changes and trend changes becomes the alignment score. A graduated anti-herding penalty applies when correlation is suspiciously high and house effect stability is very low.
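A minimal sketch of the alignment score; tie handling and the EMA trend construction are omitted for brevity, and the delta values are invented:

```python
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    for rank, idx in enumerate(order):
        out[idx] = float(rank)
    return out

def spearman(a, b):
    """Spearman rank correlation (no tie correction, for brevity)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# per-day changes between consecutive polls vs. the EMA trend's changes
pollster_deltas = [0.3, -0.1, 0.5, 0.2]
trend_deltas = [0.2, -0.2, 0.4, 0.1]
spearman(pollster_deltas, trend_deltas)  # 1.0: movements track the trend
```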
Methodological trust (25% weight)
This combines four sub-factors: BPC membership and transparency signals (30%), track record depth measured by polling volume (25%), sample quality combining average sample size and methodology type (25%), and recency of activity with exponential decay (20%). Method type scores are now empirically derived from 2024 election accuracy data — the hierarchy is grounded in observed RMSE by method, not expert assumption. MRP pollsters are flagged distinctly, acknowledging that model-based methodologies have different error properties.
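The sub-factor composition itself is a straightforward weighted sum, sketched here with each input assumed pre-normalised to 0–1:

```python
def methodological_trust(bpc_transparency: float, track_record: float,
                         sample_quality: float, activity: float) -> float:
    """Combine the four sub-factors with the stated 30/25/25/20 weights."""
    return (0.30 * bpc_transparency + 0.25 * track_record
            + 0.25 * sample_quality + 0.20 * activity)
```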
House effect quality (15% weight)
This measures both the magnitude and stability of a pollster's systematic bias relative to consensus. A pollster with a consistent +2pp Labour bias is scored more favourably than one whose bias swings between +4pp and −2pp — consistent effects can be modelled, volatile ones cannot. Magnitude carries 60% of the sub-score weight and stability carries 40%.
Per-input Bayesian shrinkage
Each scoring input is shrunk independently toward an informed prior before composition, using input-specific shrinkage constants calibrated to the information content of each observation type: k=8 for election accuracy (sparse, high-information), k=15 for consensus alignment (many low-information pairs), and k=12 for house effects (moderate density). The prior is conditioned on BPC membership and methodology type — a BPC member with MRP starts from a higher prior (0.58) than an unknown non-BPC pollster (0.38).
Bounding and impact
The shrunk composite score is mapped to a bounded range of 0.30 to 0.70. No pollster is ever excluded from the average (the floor is 0.30), and no single pollster can dominate it (the ceiling is 0.70). The most trusted pollster receives 2.3× the weight of the least trusted. Pollsters with insufficient data for any input receive a differentiated default — new pollsters get a more generous default (0.40) than dormant ones (0.30), reflecting that a new entrant deserves a fairer starting position than an inactive one. View all scores on the Pollster Ratings page.
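The shrinkage and bounding steps can be sketched as follows. The pseudo-count form of the shrinkage and the linear mapping into the bounded range are assumptions; the methodology gives the k values, the priors, and the 0.30–0.70 bounds, but not the exact formulas:

```python
def shrink(observed: float, prior: float, n_obs: int, k: int) -> float:
    """Pull an observed input score toward its prior; fewer observations
    (relative to pseudo-count k) means stronger shrinkage."""
    return (n_obs * observed + k * prior) / (n_obs + k)

def bound(composite: float, lo: float = 0.30, hi: float = 0.70) -> float:
    """Map a 0-1 composite score into the bounded reliability range."""
    return lo + composite * (hi - lo)

# a pollster with only 2 scored elections barely moves off the prior
shrink(0.90, 0.58, n_obs=2, k=8)   # 0.644
shrink(0.90, 0.58, n_obs=20, k=8)  # ~0.81: more data, less shrinkage
bound(1.0)                         # 0.70: the ceiling
```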
House effects
A house effect is a pollster's systematic tendency to read a party higher or lower than the consensus of other pollsters at the same point in time. We calculate house effects by comparing each pollster's readings against the average of all other pollsters within a ±14 day window around each of their polls.
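A sketch of the window comparison, with poll dates and readings invented:

```python
from datetime import date, timedelta

WINDOW = timedelta(days=14)

def house_effect(poll_date, reading, other_polls):
    """Reading minus the average of other pollsters' readings for the same
    party within +-14 days; None when no peer polls fall in the window."""
    peers = [r for d, r in other_polls if abs(d - poll_date) <= WINDOW]
    if not peers:
        return None
    return reading - sum(peers) / len(peers)

others = [(date(2026, 2, 20), 30.0), (date(2026, 3, 10), 28.0),
          (date(2026, 1, 1), 40.0)]  # the January poll falls outside the window
house_effect(date(2026, 3, 1), 32.0, others)  # 3.0: reads 3pp above nearby peers
```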
House effects are displayed on the Pollster Ratings page for transparency. In the current model version, house effects are reported but not used to adjust the polling average. Future versions may incorporate house-effect corrections.
What this model does not do
This model does not predict elections. It does not account for likely voter turnout, tactical voting, or campaign effects. It simply reports what current polls suggest about national voting intention.
House effects are calculated and displayed for transparency, but the model does not apply house-effect adjustments to the polling average in the current version. It does not use a Bayesian framework or state-space model — those are planned for future versions.
The tracker includes a simple seat projection based on uniform national swing (UNS). This applies the same polling change to every constituency and does not account for regional variation, tactical voting, or local factors. It should be treated as illustrative, not as a forecast.
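A minimal UNS sketch (constituency figures invented):

```python
def uns_projection(constituencies, national_change):
    """Apply the same national change in each party's share to every
    constituency, then count which party leads in each."""
    seats = {}
    for result in constituencies:
        projected = {p: s + national_change.get(p, 0.0) for p, s in result.items()}
        winner = max(projected, key=projected.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats

constituencies = [{"Lab": 45.0, "Con": 40.0}, {"Lab": 38.0, "Con": 44.0}]
uns_projection(constituencies, {"Lab": -8.0, "Con": 2.0})  # {'Con': 2}
```

Because the same swing is applied everywhere, a marginal seat flips the moment the national gap closes — which is exactly why the projection is illustrative rather than a forecast.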
Data sources
Polling data is sourced from publicly available records, primarily Wikipedia's UK opinion polling tables, which aggregate polls published by British Polling Council members and other pollsters. Each poll entry includes pollster name, client, fieldwork dates, sample size, and party vote share figures.
The system stores both raw scraped data and a cleaned canonical version. All entries include source attribution. Polls that fail validation checks (e.g. party totals exceeding 105%, very small samples) are flagged and excluded from the model.
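The validation step can be sketched as follows; the 105% threshold comes from the text, while the minimum-sample cutoff of 400 is an assumed illustration:

```python
def validation_issues(poll, max_total=105.0, min_sample=400):
    """Return a list of reasons a poll should be flagged and excluded."""
    issues = []
    if sum(poll["shares"].values()) > max_total:
        issues.append("party totals exceed %.0f%%" % max_total)
    if poll["sample_size"] < min_sample:
        issues.append("sample smaller than %d" % min_sample)
    return issues

validation_issues({"shares": {"Lab": 60.0, "Con": 50.0}, "sample_size": 1000})
```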
Known limitations
All polling aggregates have limitations. The key ones for this model are:
- It relies on publicly published polls. Polls that are not published or are embargoed will not appear.
- If a source changes its format, the automated scraper may fail until manually fixed.
- The multi-election framework currently has full data for 2024 only. Historical elections (2019, 2017) will auto-activate when pre-election polling data is ingested.
- The model treats all polls as measuring the same thing, even though question wording, turnout filtering, and weighting methods differ between pollsters.
- GB-only and UK-wide polls are currently combined, which may slightly affect SNP and Plaid Cymru figures.
Version history
v6.0 (March 2026) — Fourth iteration from external expert review (ChatGPT 88, Claude 90, Gemini 97 — avg 91.7). Seven improvements: per-input shrinkage k-values calibrated to information content per observation type (k=8 accuracy, k=15 consensus, k=12 house effects), empirically derived method scores grounded in 2024 RMSE by method type, comprehensive 7-parameter sensitivity analysis (weights, k, EMA α, herding onset/max, consensus window, HE thresholds), systemic bias deflator reducing individual penalties when industry-wide misses occur, MRP separation with distinct flagging and annotation, exponential cross-election decay (5-year half-life replacing fixed weights), and downstream impact measurement proving reliability weighting improves the aggregate vs flat weights.
v5.0 (March 2026) — Multi-election architecture, EMA-smoothed consensus trend, graduated anti-herding penalty, built-in sensitivity analysis, distribution-calibrated grade boundaries, cross-validated shrinkage constant, method-era awareness.
v4.0 (March 2026) — Per-input Bayesian shrinkage, informed hierarchical prior, Spearman rank correlation, frequency-capped consensus, anti-herding penalty, empirically calibrated thresholds, all pre-election polls with recency weighting, vote-share-weighted relative house effects.
v3.0 (March 2026) — First iteration from external review. Vote-share-weighted RMSE, time-normalised consensus deltas, differentiated missing data defaults, composite Bayesian shrinkage, house effect stability, methodology consolidation.
v2.0 (March 2026) — Pollster reliability scoring calibrated against 2024 election results. Four-input framework: election accuracy (40%), consensus alignment (20%), methodology (25%), house effect magnitude (15%). Scores bounded to 0.30–0.70 range. Consensus alignment replaces naive std-dev consistency — measures correlation with peer movements rather than raw volatility. Missing data defaults lowered from 0.50 to 0.35 (penalises unknowns). New Pollster Ratings page with full transparency.
v1.0 (March 2026) — Initial release. Weighted rolling average with recency decay (14-day half-life, decay-only, no hard cutoff), sample-size weighting, and pollster reliability framework (all pollsters at default in v1). 90% uncertainty bands. Data from Wikipedia polling tables. Parameters validated by backtesting against the 2024 general election: mean absolute error of 2.4 percentage points across all five major parties.
Questions about the methodology? Contact [email protected]