# Improving the Elo rating system


As a side project, I've been working on a new dating app called Cuffed. Just like Tinder, we have to somehow decide which profiles to show to a user who is swiping. Tinder's original solution involved computing an "Elo rating" for each profile, so this provides a natural starting point.

But the problem with the Elo rating system is that it only updates two players' ratings after they play a game with each other. We don't update the ratings for any previous competitors that faced these players. Today I thought a bit about what a version of Elo would look like if we relaxed this assumption.

## The Elo rating system

The Elo rating system starts from the principle that if player $$i$$'s rating is 400 points greater than player $$j$$'s, then player $$i$$ should have 10 times the odds of winning the game. Since the derivation is well-known, I'll just state the resulting update operation:

Let $$R_{1}, \dots, R_{n} \in \mathbb{R}$$ represent the ratings for players $$1$$ through $$n$$ in a tournament.

Define a game $$g \in \{(i, j) \mid i < j\}$$, where $$(i, j)$$ represents a game between players $$i$$ and $$j$$. Each game $$g$$ is associated with an outcome $$o_{g} \sim \mathrm{Bernoulli}$$, equal to $$1$$ if player $$i$$ won the game and $$0$$ otherwise.

Then after a game $$g = (i, j)$$ is completed, we can apply the $$\mathrm{SingleElo}$$ operation to calculate the updated ratings $$R'_{i}$$ and $$R'_{j}$$:

$$R'_{i} = R_{i} + \left(o_{g} - \frac{10^{\frac{R_{i}}{400}}}{10^{\frac{R_{i}}{400}} + 10^{\frac{R_{j}}{400}}}\right)$$

$$R'_{j} = R_{j} + \left((1 - o_{g}) - \frac{10^{\frac{R_{j}}{400}}}{10^{\frac{R_{i}}{400}} + 10^{\frac{R_{j}}{400}}}\right)$$

The quantity $$\frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}$$ is meant to represent Elo's estimate of the probability that $$i$$ wins the game against $$j$$, which leads to the desirable property:

$$E[R'_{i} - R_{i}] = E[o_{g}] - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}$$

meaning that if our estimated probability of $$i$$ winning is too low, then in expectation, $$R_{i}$$ should increase and vice-versa.
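In code, the update is only a few lines. Here is a minimal Python sketch of the formulas above (the function names are mine, not from the post's gist):

```python
def expected_score(r_i, r_j):
    """Elo's estimate of the probability that player i beats player j."""
    return 10 ** (r_i / 400) / (10 ** (r_i / 400) + 10 ** (r_j / 400))


def single_elo(r_i, r_j, outcome):
    """The SingleElo update for a game between players i and j.

    outcome is 1 if player i won, 0 otherwise. Note that this post's
    formulation carries no explicit K-factor (equivalently, K = 1),
    unlike the chess convention of K = 16 or 32.
    """
    p_i = expected_score(r_i, r_j)
    return r_i + (outcome - p_i), r_j + ((1 - outcome) - (1 - p_i))
```

Because the two correction terms are negatives of each other, rating points flow from the loser to the winner and the total rating in the pool is conserved.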

## AverageElo: ignore the order of games

After a sequence of $$m$$ games $$\vec{g} = [g_{1}, \dots, g_{m}]^\intercal$$, we can apply the $$\mathrm{SingleElo}$$ update operation to each game in order. We define the resulting Elo rating for player $$i$$ as $$\mathrm{Elo}_{i}(\vec{g})$$.

To make it so the order of games $$\vec{g}$$ doesn't matter, we can iterate over each of the permutations $$\pi \in S(\vec{g})$$ and calculate the average Elo rating for player $$i$$:

$$\mathrm{AverageElo}_{i}(\vec{g}) := \frac{\sum_{\pi \in S(\vec{g})}{\mathrm{Elo}_{i}(\pi(\vec{g}))}}{|S(\vec{g})|}$$

We can approximate $$\mathrm{AverageElo}$$ by sampling $$k$$ permutations $$\alpha_{1}, \dots, \alpha_{k}$$ uniformly at random from $$S(\vec{g})$$:

$$\mathrm{SampleElo}_{i}(\vec{g}, k) := \frac{\sum_{j=1}^{k}{\mathrm{Elo}_{i}(\alpha_{j}(\vec{g}))}}{k}$$

$$\mathrm{SampleElo}$$ is pretty noisy and slow to evaluate, so I wanted a faster solution.
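To make the sampling estimator concrete, here is a self-contained Python sketch of $$\mathrm{SampleElo}$$ under the assumptions above (games as `(i, j, outcome)` tuples and an initial rating of 1200; all names are mine):

```python
import random


def expected_score(r_i, r_j):
    """Elo's estimate of the probability that player i beats player j."""
    return 10 ** (r_i / 400) / (10 ** (r_i / 400) + 10 ** (r_j / 400))


def elo(games, n_players, initial=1200.0):
    """Run the SingleElo update sequentially over (i, j, outcome) games."""
    ratings = [initial] * n_players
    for i, j, outcome in games:
        p = expected_score(ratings[i], ratings[j])
        ratings[i] += outcome - p
        ratings[j] += (1 - outcome) - (1 - p)
    return ratings


def sample_elo(games, n_players, k, seed=0):
    """Approximate AverageElo by averaging Elo over k random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_players
    for _ in range(k):
        shuffled = list(games)
        rng.shuffle(shuffled)
        for p, r in enumerate(elo(shuffled, n_players)):
            totals[p] += r
    return [t / k for t in totals]
```

Each of the $$k$$ samples replays the entire game history, which is why this estimator gets slow as the number of games grows.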

## Exact solution for AverageElo

Let $$R^{*}_{i}$$ represent the final rating for player $$i$$ that comes out of the $$\mathrm{AverageElo}$$ algorithm. Then for a given game $$g = (i, j)$$:

$$\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = \mathrm{E}[o_{g}] - \frac{10^\frac{R^{*}_{i}}{400}}{10^\frac{R^{*}_{i}}{400} + 10^{\frac{R^{*}_{j}}{400}}}$$

This update should not change $$R^{*}_{i}$$ in expectation, so $$\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = 0$$. Our best estimate of $$\mathrm{E}[o_{g}]$$ is $$\frac{|\text{games }i\text{ beat }j|}{|\text{games }i\text{ played }j|}$$. Setting this estimate equal to the expected-score formula and taking logarithms of the resulting odds ratio leads to the necessary condition for each pair of players $$(i, j)$$:

$$\frac{R^{*}_{j}}{400} - \frac{R^{*}_{i}}{400} = \log_{10}(|\text{games }j\text{ beat }i|) - \log_{10}(|\text{games }i\text{ beat }j|)$$

which is another way of writing the Elo principle that each difference of 400 points should correspond to 10 times greater odds of winning the game. Because Elo only cares about the differences between the players' scores, I add the additional equation $$R^{*}_{1} = 1200$$ to pin down the absolute scale. I throw all of these equations into a least squares regression and use the resulting scores to determine the relative strengths of the players.

## Results

I compared the performance of this system of equations to three well-known rating systems: the original Elo algorithm, Glicko, and PageRank.

I generated five players with varying strengths and randomly produced a series of games. With this series of games, I calculated the ratings for each player as determined by each rating system. I calculated the Spearman's rank correlation coefficient for each set of ratings vs. the true relative strengths of the players. I repeated this process 400 times to produce an average Spearman's rank correlation coefficient for each rating system:
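The per-trial comparison only needs a rank correlation between the estimated ratings and the true strengths. A minimal sketch of Spearman's coefficient as the Pearson correlation of ranks (my own helper; it ignores ties, which is fine when the true strengths are distinct):

```python
import numpy as np


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    # argsort of argsort converts values to 0-based ranks (no tie handling).
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```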

| Rating System | Average Coefficient |
| --- | --- |
| Standard Elo | 0.1685 |
| System of Equations | 0.60275 |
| PageRank | 0.47125 |
| Glicko | 0.59425 |

The system of equations performs very similarly to Glicko, probably within the margin of error. I also ran $$\mathrm{SampleElo}$$, but it took a while to evaluate and was clearly performing worse than the system of equations approach.

You can replicate these results here: GitHub gist

Tags: elo, rating, glicko, linear algebra, algorithms