Improving the Elo rating system

Neel Somani - February 2, 2022

As a side project, I've been working on a new dating app called Cuffed. Just like Tinder, we have to somehow decide which profiles to show to a user who is swiping. Tinder's original solution involved computing an "Elo rating" for each profile, so this provides a natural starting point.

But the problem with the Elo rating system is that it only updates two players' ratings after they play a game with each other. We don't update the ratings for any previous competitors that faced these players. Today I thought a bit about what a version of Elo would look like if we relaxed this assumption.

The Elo rating system

The Elo rating system starts with the principle that if player $i$ has a score of 400 points greater than player $j$, then player $i$ should have 10 times higher odds of winning the game. Since the derivation is well-known, I'll just state the resulting update operation:

Let $R_{1}, \dots, R_{n} \in \mathbb{R}$ represent the ratings for players $1$ through $n$ in a tournament.

Define a game $g \in \{(i, j) | i < j\}$, where $(i, j)$ represents a game between players $i$ and $j$. Each game $g$ is associated with an outcome $o_{g} \sim \mathrm{Bernoulli}$ which represents whether player $i$ won the game.

Then after a game $g = (i, j)$ is completed, we can apply the $\mathrm{SingleElo}$ operation to calculate the updated ratings $R'_{i}$ and $R'_{j}$:

$$R'_{i} = R_{i} + (o_{g} - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}})$$ $$R'_{j} = R_{j} + ((1 - o_{g}) - \frac{10^\frac{R_{j}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}})$$

The quantity $\frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}$ is meant to represent Elo's estimate of the probability that $i$ wins the game against $j$, which leads to the desirable property:

$$E[R'_{i} - R_{i}] = E[o_{g}] - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}$$

meaning that if our estimated probability of $i$ winning is too low, then in expectation, $R_{i}$ should increase and vice-versa.

AverageElo: ignore the order of games

After a set of $k$ games $\vec{g} = [g_{1}, ... g_{m}]^\intercal$, we can apply the $\mathrm{SingleElo}$ update operation for each game sequentially. We define the resulting Elo score for a player $i$ as $\mathrm{Elo}_{i}(\vec{g})$.

To make it so the order of games $\vec{g}$ doesn't matter, we can iterate over each of the permutations $\pi \in S(\vec{g})$ and calculate the average Elo rating for player $i$:

$$\mathrm{AverageElo}_{i}(\vec{g}) := \frac{\sum_{\pi \in S(\vec{g})}{\mathrm{Elo}_{i}(\pi(\vec{g}))}}{|S(\vec{g})|}$$

We can approximate $\mathrm{AverageElo}$ by sampling $k$ permutations $\alpha$ uniformly at random from $S(\vec{g})$:

$$\mathrm{SampleElo}_{i}(\vec{g}, k) := \frac{\sum_{j=1}^{k}{\mathrm{Elo}_{i}(\alpha_{j}(\vec{g}))}}{k}$$

$\mathrm{SampleElo}$ is pretty noisy and slow to evaluate, so I wanted a faster solution.

Exact solution for AverageElo

Let $R^{*}_{i}$ represent the final rating for player $i$ that comes out of the $\mathrm{AverageElo}$ algorithm. Then for a given game $g = (i, j)$:

$$\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = \mathrm{E}[o_{g}] - \frac{10^\frac{R^{*}_{i}}{400}}{10^\frac{R^{*}_{i}}{400} + 10^{\frac{R^{*}_{j}}{400}}}$$

This update should not change $R^{*}_{i}$ in expectation, so $\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = 0$. Our best estimate of $\mathrm{E}[o_{g}]$ is $\frac{|\text{games }i\text{ beat }j|}{|\text{games }i\text{ played }j|}$, which through simple algebra leads to the necessary condition for each pair of players $(i, j)$:

$$\frac{R^{*}_{j}}{400} - \frac{R^{*}_{i}}{400} =$$ $$\log_{10}(|\text{games }j\text{ beat }i|) - \log_{10}(|\text{games }i\text{ beat }j|)$$

which is another way of writing the Elo principle that each difference of 400 points should result in a 10 times greater likelihood of winning the game. Because Elo only cares about the differences between the players' scores, I add an additional equation for $R^{*}_{1} = 1200$. I throw all of these equations into a least squares regression and use the resulting scores to determine the relative strengths of the players.

Results

I compared the performance of this system of equations to three well-known rating systems: the original Elo algorithm, Glicko, and PageRank.

I generated five players with varying strengths and randomly produced a series of games. With this series of games, I calculated the ratings for each player as determined by each rating system. I calculated the Spearman's rank correlation coefficient for each set of ratings vs. the true relative strengths of the players. I repeated this process 400 times to produce an average Spearman's rank correlation coefficient for each rating system:

Rating System	Average Coefficient
Standard Elo	0.1685
System of Equations	0.60275
PageRank	0.47125
Glicko	0.59425

The system of equations performs very similarly to Glicko, probably within the margin of error. I also ran $\mathrm{SampleElo}$, but it took a while to evaluate and was clearly performing worse than the system of equations approach.

You can replicate these results here: GitHub gist

The Elo rating system

AverageElo: ignore the order of games

Exact solution for AverageElo

Results

About the Author