@justaz said in #62:
> Ordo calculates ± intervals for ratings. To do that it runs a bunch of simulations, which is fine when you have 50 engines, but 1 million players, not so much.
Right, so it's a Monte Carlo simulation of some kind, combined with some machine learning. Interesting.
@justaz said in #62:
> Surely there is a smart way to calculate the ± values for every player.
There probably is, but as you said it would likely end up very similar to what Glicko-1/2 already do. The most mathematically correct way to do it might not be the best engineering choice either, both from a practical development perspective (it probably wouldn't scale computationally to large numbers of players) and in terms of how well the mathematical modelling holds up against the messy real world. A robust statistical approach that quantifies uncertainty and has a simple updating procedure is the way to go.
But if you want a smart way, I would suggest using a simple approximation for the error based on the number of games (n) played by the given player in the period of time you are interested in, which would greatly reduce the computation needed. I would recommend 130/sqrt(n), since errors typically decay with a 1/sqrt(n) relationship to sample size because of the way averaging affects uncertainty. The thing is, the only way to bring down the uncertainty is with more data, and it will never reach 0, so calling them "exact ratings" can be a bit misleading.
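To make that concrete, here is a minimal sketch of that approximation (the constant 130 is just my suggested starting point, not something fitted to real data):

```python
import math

def rating_error(n_games: int, k: float = 130.0) -> float:
    """Rough ± half-width for a rating after n_games, assuming the
    error shrinks like 1/sqrt(n). The constant k = 130 is a suggested
    starting point, not a value fitted to real data."""
    return k / math.sqrt(n_games)

# e.g. roughly ±26 after 25 games, ±13 after 100 games
print(rating_error(25), rating_error(100))
```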
@justaz said in #62:
> Let's imagine a simple case: Player A has rating 2000 ± 100 (SD = 50, ± is 95% confidence) Player B has only played A and scored +64-36=0 giving him a rating of 2100 ± x. What would be x here? Even simpler, let's imagine there are no draws.
This is definitely solvable with some statistical error propagation; these sorts of problems come up all the time in physics experiments. The crucial piece of information missing is the sample size, i.e. the number of games (n): without it there is no meaningful notion of uncertainty regarding the game outcomes.

I would solve this problem by first considering the uncertainty on player B's expected score per game (p), based on all the real game outcomes. This can be modelled using a beta distribution, which in this case is a binomial distribution rearranged to represent the distribution of the probability parameter (the inclusion of a small number of draws will not cause too many problems with this choice of model). The error on this expected score is sqrt(p(1-p)/n).

This error is then propagated (along with p) through the sigmoid curve used to determine expected scores from the rating difference. The propagation is simple dy/dx calculus: the gradient converts uncertainty in one variable into uncertainty in the other, assuming the errors are small and the curve is locally smooth and linear. This gives you a value for the rating difference between the two players along with a ± error on that difference.

The difference is then added to A's rating, and the error on A's rating is combined with the error on the rating difference by "adding in quadrature", which just means summing the squares and taking the square root. This is the correct way to combine uncorrelated errors, since when two uncorrelated random variables add together you find the mean and variance by adding the component means and variances, NOT the standard deviations.
This method would not scale well at all to larger numbers of players, but it can give you an idea of what the analytical approach would look like.
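Here is a rough Python sketch of that worked example. It assumes the +64 -36 =0 score means 64 wins and 36 losses (so n = 100) and uses the standard Elo logistic curve E = 1/(1 + 10^(-d/400)); both the choice of n and the use of that particular curve are my assumptions for the sake of illustration.

```python
import math

def elo_diff_and_error(p: float, n: int) -> tuple[float, float]:
    """Rating difference implied by a score fraction p over n games,
    plus its standard error, propagated through the Elo logistic curve.

    Uses d = 400*log10(p/(1-p)), the inverse of E = 1/(1 + 10^(-d/400))."""
    # Standard error on the observed score fraction (binomial / beta approximation)
    sigma_p = math.sqrt(p * (1 - p) / n)
    # Rating difference implied by p
    d = 400.0 * math.log10(p / (1 - p))
    # Local gradient dd/dp, used for linear (dy/dx) error propagation
    dd_dp = (400.0 / math.log(10)) * (1.0 / p + 1.0 / (1.0 - p))
    return d, dd_dp * sigma_p

def combine_in_quadrature(*sigmas: float) -> float:
    """Combine uncorrelated standard deviations: variances add, not SDs."""
    return math.sqrt(sum(s * s for s in sigmas))

# Worked example from the thread, assuming +64 -36 =0 means n = 100 games
p = 0.64                       # B's score fraction against A
n = 100                        # assumed number of games (the missing piece of information)
rating_a, sd_a = 2000.0, 50.0  # A's rating and its standard deviation

d, sd_d = elo_diff_and_error(p, n)        # ~ +100 rating points, SD ~ 36
sd_b = combine_in_quadrature(sd_a, sd_d)  # ~ 62
print(f"B ≈ {rating_a + d:.0f} ± {2 * sd_b:.0f} (95%)")  # ~ 2100 ± 123
```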