of better decisions
The intellectual history of pairwise comparison begins with a practical problem: how do you measure things that have no natural unit?
The answer, proposed by psychologist Louis Leon Thurstone at the University of Chicago, was to compare things directly, in pairs. His 1927 paper, "A Law of Comparative Judgment", formalised this idea into a precise mathematical model.
Thurstone's insight was grounded in an empirical observation: people are significantly better at answering "which of these two is louder?" than "how loud is this, on a scale of one to ten?"
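When you ask someone to rate the importance of a strategic priority on a scale of one to ten, you are asking them to perform several cognitive operations at once: decide what each point on the scale means, form an absolute judgement of the item, and translate that judgement into a number. Each of these steps introduces systematic error.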
The most damaging of these errors is anchoring bias. The first item rated tends to influence all subsequent ratings. Research by Tversky and Kahneman (1974) demonstrated that anchoring effects are both large and remarkably resistant to correction.
A second source of error is scale compression. In practice, most respondents avoid the extreme ends of scales, clustering responses in the middle.
Given n items to rank, each item is compared against every other item exactly once. The total number of comparisons required is n(n−1)/2.
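To give a feel for how the workload grows, here is a minimal sketch that enumerates every pair for a short list and checks the count against n(n−1)/2. The item names and the helper functions `all_pairs` and `comparison_count` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def all_pairs(items):
    """Return every unordered pair of items, each pair exactly once."""
    return list(combinations(items, 2))


def comparison_count(n):
    """Number of comparisons needed for n items: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical strategic priorities, used only as sample data.
items = ["Growth", "Retention", "Cost", "Hiring"]
pairs = all_pairs(items)

# Four items need 4 * 3 / 2 = 6 comparisons.
assert len(pairs) == comparison_count(len(items)) == 6
```

The count grows quadratically: 10 items need 45 comparisons and 20 items need 190, which is worth keeping in mind when deciding how long a list to put in front of participants.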
Aggregating individual responses
For individual ranking sessions, the result is derived directly from the win count: each item receives one point for each comparison it wins. The final ranked list orders items from most wins to fewest.
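The win-count rule can be sketched in a few lines. This is an illustration rather than a reference implementation: the function name `rank_by_wins` and the sample data are hypothetical, and the tie-breaking rule (tied items keep their original order) is an assumption, since the description above does not say how ties are resolved.

```python
from collections import Counter


def rank_by_wins(items, winners):
    """Order items from most wins to fewest.

    `winners` lists the winning item of each comparison.
    Assumption: tied items keep their original order in `items`
    (Python's sort is stable).
    """
    wins = Counter(winners)  # items with no wins count as 0
    return sorted(items, key=lambda item: -wins[item])


items = ["A", "B", "C"]
# Three comparisons: A beat B, A beat C, C beat B.
winners = ["A", "A", "C"]

ranking = rank_by_wins(items, winners)
assert ranking == ["A", "C", "B"]  # A has 2 wins, C has 1, B has 0
```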
Aggregating group responses
For group sessions, each participant's comparisons are aggregated before the final ranking is calculated. The group win total for any comparison is the sum of individual votes across all participants.
Crucially, the aggregated result also exposes the shape of disagreement. Items that divide the group are visible in the data, and this is often the most valuable output of a group session.
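One way to picture group aggregation is the sketch below. It sums each participant's votes per pair, ranks items by total votes received, and scores how evenly each pair split the group. The data layout, the function names, and the `split_score` measure are all assumptions made for illustration, not a description of any particular tool.

```python
from collections import defaultdict


def aggregate_votes(ballots):
    """Sum votes per pair across participants.

    `ballots` has one dict per participant, mapping a pair (a, b)
    to the item that participant preferred.
    Returns {pair: {item: votes}}.
    """
    tally = defaultdict(lambda: defaultdict(int))
    for ballot in ballots:
        for pair, winner in ballot.items():
            tally[pair][winner] += 1
    return {pair: dict(votes) for pair, votes in tally.items()}


def group_ranking(tally):
    """Rank items by total votes received across all pairs."""
    totals = defaultdict(int)
    for pair, votes in tally.items():
        for item in pair:
            totals[item] += votes.get(item, 0)
    return sorted(totals, key=lambda item: -totals[item])


def split_score(tally, pair):
    """Hypothetical disagreement measure: 0.0 = unanimous, 1.0 = evenly split."""
    a, b = pair
    votes_a = tally[pair].get(a, 0)
    votes_b = tally[pair].get(b, 0)
    total = votes_a + votes_b
    return 0.0 if total == 0 else 1 - abs(votes_a - votes_b) / total


# Four hypothetical participants comparing items A, B and C.
ballots = [
    {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "B"},
    {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "C"},
    {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "B"},
    {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "C"},
]

tally = aggregate_votes(ballots)
ranking = group_ranking(tally)
```

In this sample the group agrees unanimously that A beats B, but splits two against two on B versus C. The ranking alone would hide that split; the per-pair tally makes it visible.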
Conventional group decision-making methods share a fundamental flaw: they are sensitive to the order in which opinions are expressed. The first person to speak anchors the group.
This is not a failure of individuals. It is a predictable consequence of group dynamics. Research has consistently found that group discussions produce less accurate collective judgements than the aggregation of independent private judgements. A closely related finding is the hidden profile problem (Stasser and Titus, 1985): discussion tends to dwell on information everyone already shares, so knowledge held by only one member often never surfaces.
Strategic planning and management
The Analytic Hierarchy Process has been used in high-stakes strategic decisions in defence, infrastructure, healthcare, and government. A 2008 review identified over 900 published applications across 19 different fields.
Machine learning and AI
Reinforcement learning from human feedback (RLHF), a technique widely used to fine-tune modern large language models, relies heavily on pairwise comparison. Human evaluators are shown two model outputs and asked to select the better one, and those choices are used to train a reward model that guides further training.
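A common way to turn such choices into a training signal is to treat each one as a Bradley–Terry style comparison between two reward scores, the general approach taken in Christiano et al. (2017). The sketch below shows the idea with plain numbers. The function names are hypothetical, and real systems compute the rewards with a neural network rather than passing them in by hand.

```python
import math


def preference_probability(reward_a, reward_b):
    """P(A is preferred over B): a logistic function of the reward gap."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def preference_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the evaluator's actual choice."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Equal rewards mean the model is indifferent between the two outputs.
assert preference_probability(0.0, 0.0) == 0.5

# The loss is small when the chosen output already scores higher,
# and large when the model ranks the pair the wrong way round.
agrees = preference_loss(2.0, 0.0)
disagrees = preference_loss(0.0, 2.0)
assert agrees < disagrees
```

Minimising this loss over many comparisons pushes the reward model to score preferred outputs above rejected ones, without anyone ever having to assign an absolute score.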
Sports and competitive ranking
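The Elo rating system, used in chess, football, and dozens of other competitive domains, is closely related to the Bradley–Terry pairwise model: every game is treated as a pairwise comparison, and ratings move according to how surprising the result was.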
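The standard Elo update makes the pairwise logic concrete. The sketch below uses the usual logistic expected-score formula with a 400-point scale. The K-factor of 32 is an assumption: it is a common choice, but different federations and leagues use different values.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a, rating_b, score_a, k=32):
    """Return the new ratings after one game.

    `score_a` is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    `k` (assumed to be 32 here) controls how far a single result
    moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated players: each is expected to score 0.5,
# so a win moves the winner up by k / 2 = 16 points.
new_a, new_b = update_ratings(1500, 1500, score_a=1)
assert (new_a, new_b) == (1516.0, 1484.0)
```

An upset moves ratings further than an expected result, because the gap between the actual and expected score is larger.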
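Thurstone, L. L. (1927). A Law of Comparative Judgment. Psychological Review, 34(4), 273–286.
Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika, 39(3/4), 324–345.
Saaty, T. L. (1977). A Scaling Method for Priorities in Hierarchical Structures. Journal of Mathematical Psychology, 15(3), 234–281.
Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124–1131.
Stasser, G., & Titus, W. (1985). Pooling of Unshared Information in Group Decision Making: Biased Information Sampling During Discussion. Journal of Personality and Social Psychology, 48(6), 1467–1478.
Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep Reinforcement Learning from Human Preferences. NeurIPS 2017.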