As a side project, I've been working on a new dating app called Cuffed. Just like Tinder, we have to somehow decide which profiles to show to a user who is swiping. Tinder's original solution involved computing an "Elo rating" for each profile, so this provides a natural starting point.
But the problem with the Elo rating system is that it only updates two players' ratings after they play a game with each other. We don't update the ratings for any previous competitors that faced these players. Today I thought a bit about what a version of Elo would look like if we relaxed this assumption.
The Elo rating system
The Elo rating system starts with the principle that if player \(i\) has a score of 400 points greater than player \(j\), then player \(i\) should have 10 times higher odds of winning the game. Since the derivation is well-known, I'll just state the resulting update operation:
Let \(R_{1}, \dots, R_{n} \in \mathbb{R}\) represent the ratings for players \(1\) through \(n\) in a tournament.
Define a game \(g \in \{(i, j) | i < j\}\), where \((i, j)\) represents a game between players \(i\) and \(j\). Each game \(g\) is associated with an outcome \(o_{g} \sim \mathrm{Bernoulli}\) which represents whether player \(i\) won the game.
Then after a game \(g = (i, j)\) is completed, we can apply the \(\mathrm{SingleElo}\) operation to calculate the updated ratings \(R'_{i}\) and \(R'_{j}\):
$$R'_{i} = R_{i} + (o_{g} - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}})$$ $$R'_{j} = R_{j} + ((1 - o_{g}) - \frac{10^\frac{R_{j}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}})$$The quantity \(\frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}\) is meant to represent Elo's estimate of the probability that \(i\) wins the game against \(j\), which leads to the desirable property:
$$E[R'_{i} - R_{i}] = E[o_{g}] - \frac{10^\frac{R_{i}}{400}}{10^\frac{R_{i}}{400} + 10^{\frac{R_{j}}{400}}}$$meaning that if our estimated probability of \(i\) winning is too low, then in expectation, \(R_{i}\) should increase and vice-versa.
AverageElo: ignore the order of games
After a set of \(k\) games \(\vec{g} = [g_{1}, ... g_{m}]^\intercal\), we can apply the \(\mathrm{SingleElo}\) update operation for each game sequentially. We define the resulting Elo score for a player \(i\) as \(\mathrm{Elo}_{i}(\vec{g})\).
To make it so the order of games \(\vec{g}\) doesn't matter, we can iterate over each of the permutations \(\pi \in S(\vec{g})\) and calculate the average Elo rating for player \(i\):
$$\mathrm{AverageElo}_{i}(\vec{g}) := \frac{\sum_{\pi \in S(\vec{g})}{\mathrm{Elo}_{i}(\pi(\vec{g}))}}{|S(\vec{g})|}$$We can approximate \(\mathrm{AverageElo}\) by sampling \(k\) permutations \(\alpha\) uniformly at random from \(S(\vec{g})\):
$$\mathrm{SampleElo}_{i}(\vec{g}, k) := \frac{\sum_{j=1}^{k}{\mathrm{Elo}_{i}(\alpha_{j}(\vec{g}))}}{k}$$\(\mathrm{SampleElo}\) is pretty noisy and slow to evaluate, so I wanted a faster solution.
Exact solution for AverageElo
Let \(R^{*}_{i}\) represent the final rating for player \(i\) that comes out of the \(\mathrm{AverageElo}\) algorithm. Then for a given game \(g = (i, j)\):
$$\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = \mathrm{E}[o_{g}] - \frac{10^\frac{R^{*}_{i}}{400}}{10^\frac{R^{*}_{i}}{400} + 10^{\frac{R^{*}_{j}}{400}}}$$This update should not change \(R^{*}_{i}\) in expectation, so \(\mathrm{E}[R^{*}_{i}{'} - R^{*}_{i}] = 0\). Our best estimate of \(\mathrm{E}[o_{g}]\) is \(\frac{|\text{games }i\text{ beat }j|}{|\text{games }i\text{ played }j|}\), which through simple algebra leads to the necessary condition for each pair of players \((i, j)\):
$$\frac{R^{*}_{j}}{400} - \frac{R^{*}_{i}}{400} =$$ $$\log_{10}(|\text{games }j\text{ beat }i|) - \log_{10}(|\text{games }i\text{ beat }j|)$$which is another way of writing the Elo principle that each difference of 400 points should result in a 10 times greater likelihood of winning the game. Because Elo only cares about the differences between the players' scores, I add an additional equation for \(R^{*}_{1} = 1200\). I throw all of these equations into a least squares regression and use the resulting scores to determine the relative strengths of the players.
Results
I compared the performance of this system of equations to three well-known rating systems: the original Elo algorithm, Glicko, and PageRank.
I generated five players with varying strengths and randomly produced a series of games. With this series of games, I calculated the ratings for each player as determined by each rating system. I calculated the Spearman's rank correlation coefficient for each set of ratings vs. the true relative strengths of the players. I repeated this process 400 times to produce an average Spearman's rank correlation coefficient for each rating system:
Rating System | Average Coefficient |
Standard Elo | 0.1685 |
System of Equations | 0.60275 |
PageRank | 0.47125 |
Glicko | 0.59425 |
The system of equations performs very similarly to Glicko, probably within the margin of error. I also ran \(\mathrm{SampleElo}\), but it took a while to evaluate and was clearly performing worse than the system of equations approach.
You can replicate these results here: GitHub gist