Improving the Elo rating system

-

As a side project, I've been working on a new dating app called Cuffed. Just like Tinder, we have to somehow decide which profiles to show to a user who is swiping. Tinder's original solution involved computing an "Elo rating" for each profile, so this provides a natural starting point.

But the problem with the Elo rating system is that it only updates two players' ratings after they play a game with each other. We don't update the ratings for any previous competitors that faced these players. Today I thought a bit about what a version of Elo would look like if we relaxed this assumption.

The Elo rating system

The Elo rating system starts with the principle that if player \(i\) has a score of 400 points greater than player \(j\), then player \(i\) should have 10 times higher odds of winning the game. Since the derivation is well-known, I'll just state the resulting update operation:

Let \(R_{1}, \dots, R_{n} \in \mathbb{R}\) represent the ratings for players \(1\) through \(n\) in a tournament.

Define a game \(g \in \{(i, j) | i < j\}\), where \((i, j)\) represents a game between players \(i\) and \(j\). Each game \(g\) is associated with an outcome \(o_{g} \sim \mathrm{Bernoulli}\) which represents whether player \(i\) won the game.

Then after a game \(g = (i, j)\) is completed, we can apply the \(\mathrm{SingleElo}\) operation to calculate the updated ratings \(R'_{i}\) and \(R'_{j}\):

$$R'_{i} = R_{i} + (o_{g} - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}})$$ $$R'_{j} = R_{j} + ((1 - o_{g}) - \frac{10^\frac{R_{j}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}})$$

The quantity \(\frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}\) is meant to represent Elo's estimate of the probability that \(i\) wins the game against \(j\), which leads to the desirable property:

$$E[R'_{i} - R_{i}] = E[o_{g}] - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}$$

meaning that if our estimated probability of \(i\) winning is too low, then in expectation, \(R_{i}\) should increase and vice-versa.

AverageElo: ignore the order of games

After a set of \(k\) games \(\vec{g} = [g_{1}, ... g_{m}]^\intercal\), we can apply the \(\mathrm{SingleElo}\) update operation for each game sequentially. We define the resulting Elo score for a player \(i\) as \(\mathrm{Elo}_{i}(\vec{g})\).

To make it so the order of games \(\vec{g}\) doesn't matter, we can iterate over each of the permutations \(\pi \in S(\vec{g})\) and calculate the average Elo rating for player \(i\):

$$\mathrm{AverageElo}_{i}(\vec{g}) := \frac{\sum_{\pi \in S(\vec{g})}{\mathrm{Elo}_{i}(\pi(\vec{g}))}}{|S(\vec{g})|}$$

We can approximate \(\mathrm{AverageElo}\) by sampling \(k\) permutations \(\alpha\) uniformly at random from \(S(\vec{g})\):

$$\mathrm{SampleElo}_{i}(\vec{g}, k) := \frac{\sum_{j=1}^{k}{\mathrm{Elo}_{i}(\alpha_{j}(\vec{g}))}}{k}$$

\(\mathrm{SampleElo}\) is pretty noisy and slow to evaluate, so I wanted a faster solution.

Exact solution for AverageElo

Let \(R^{*}_{i}\) represent the final rating for player \(i\) that comes out of the \(\mathrm{AverageElo}\) algorithm. Then for a given game \(g = (i, j)\):

$$\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = \mathrm{E}[o_{g}] - \frac{10^\frac{R^{*}_{i}}{400}}{10^\frac{R^{*}_{i}}{400} + 10^{\frac{R^{*}_{j}}{400}}}$$

This update should not change \(R^{*}_{i}\) in expectation, so \(\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = 0\). Our best estimate of \(\mathrm{E}[o_{g}]\) is \(\frac{|\text{games }i\text{ beat }j|}{|\text{games }i\text{ played }j|}\), which through simple algebra leads to the necessary condition for each pair of players \((i, j)\):

$$\frac{R^{*}_{j}}{400} - \frac{R^{*}_{i}}{400} =$$ $$\log_{10}(|\text{games }j\text{ beat }i|) - \log_{10}(|\text{games }i\text{ beat }j|)$$

which is another way of writing the Elo principle that each difference of 400 points should result in a 10 times greater likelihood of winning the game. Because Elo only cares about the differences between the players' scores, I add an additional equation for \(R^{*}_{1} = 1200\). I throw all of these equations into a least squares regression and use the resulting scores to determine the relative strengths of the players.

Results

I compared the performance of this system of equations to three well-known rating systems: the original Elo algorithm, Glicko, and PageRank.

I generated five players with varying strengths and randomly produced a series of games. With this series of games, I calculated the ratings for each player as determined by each rating system. I calculated the Spearman's rank correlation coefficient for each set of ratings vs. the true relative strengths of the players. I repeated this process 400 times to produce an average Spearman's rank correlation coefficient for each rating system:

Rating SystemAverage Coefficient
Standard Elo0.1685
System of Equations0.60275
PageRank0.47125
Glicko0.59425

The system of equations performs very similarly to Glicko, probably within the margin of error. I also ran \(\mathrm{SampleElo}\), but it took a while to evaluate and was clearly performing worse than the system of equations approach.

You can replicate these results here: GitHub gist

Tags: elo rating glicko linear algebra algorithms

See also: Three Controversial Beliefs About Living Things

Back to all posts

Neel Somani

About the Author

I'm the founder of Eclipse. You can follow me on Twitter.