Exact Ratings for Everyone on Lichess

@eel9 said in #51:
> mfw my real rating is 250 points lower than shown on lichess

Try playing rated games against bots rated 300 points below you, and you will see, e.g. @Boris-Trapsky @matmoi @jangine @zeekat

But I warn you, the truth hurts (rating and ego).

bubbleboy

#72

@justaz said in #10:
> I picked Ordo because it is standard in computer chess, and I assumed those guys know what they're doing. Check out www.talkchess.com/forum3/viewtopic.php?f=2&t=44180
>
>
> Cool! I'll look into this
>
>
> The key factor in Ordo being slow is that it's doing gradient descent on a single CPU thread. If I use my brain maybe I'll manage to put it on the GPU with Torch or something, and get a 1000x speedup. As for the period size, I'll look into it when I compute the whole backlog of ratings; maybe 3 months is perfect, maybe 3 weeks, we will see. Keep an eye out for my next rating release when the December database drops!

Hey @justaz, I'm a mathematical physicist more than a statistician by training, so it would be great if, in this or future blogs, you included a reference on how Ordo works, as it would make it easier for casual/new readers like me to jump into the content faster :)

From a logical perspective, I would have many of the same comments as @MyPoorRook about hand-wavy statements, but I am happy to see your open and measured responses :) One additional comment, from a presentation perspective: at least on my lichess screen, the first graph, with 1+ million points, is hard to read and get something from. I understand that no graphing program is going to enjoy trying to represent all of the points in that case, so would it be possible to present representative envelopes for the points, or a trend line against lichess ratings, to show that the general clump isn't well-modelled by slope 1 or something?
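To make the envelope and trend-line idea concrete, here is a minimal sketch in Python. It assumes nothing about the blog's actual data or plotting code: the ratings are synthetic, and `binned_envelope`, `trend_line`, the 100-point bin width and the 10th/90th percentile band are all hypothetical choices for illustration.

```python
import random
import statistics

def binned_envelope(xs, ys, bin_width=100.0):
    """Group points into x-bins and summarise y inside each bin.

    Returns (bin_centre, p10, median, p90, count) tuples, sorted by bin.
    """
    bins = {}
    for x, y in zip(xs, ys):
        bins.setdefault(int(x // bin_width), []).append(y)
    rows = []
    for b in sorted(bins):
        vals = bins[b]
        if len(vals) < 2:
            continue  # quantiles need at least two points
        deciles = statistics.quantiles(vals, n=10)  # nine cut points
        rows.append(((b + 0.5) * bin_width, deciles[0],
                     statistics.median(vals), deciles[-1], len(vals)))
    return rows

def trend_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic stand-in data (an assumption, not the blog's data):
# a second rating that follows the lichess rating with slope 0.8 plus noise.
random.seed(0)
lichess = [random.uniform(800, 2400) for _ in range(20000)]
other = [0.8 * r + 100 + random.gauss(0, 80) for r in lichess]

slope, intercept = trend_line(lichess, other)
envelope = binned_envelope(lichess, other)

print(f"trend line: y = {slope:.3f} * x + {intercept:.1f}")
for centre, p10, med, p90, n in envelope:
    print(f"{centre:6.0f}  p10={p10:7.1f}  median={med:7.1f}  p90={p90:7.1f}  n={n}")
```

Plotting the three columns (p10, median, p90) against the bin centre gives a band of about 16 rows instead of a million dots, and a fitted slope clearly different from 1 would show up directly in the printed trend line.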

Interesting topic and thanks for posting.

#73

Wesz808

#74

Really enjoyed reading this post. Was able to follow it far better than I expected.
Hope you'll post some more in the future.
All those "10 self-help steps to buy my course" blogs are just rubbish. This was quality content.

#75

professionalpatzer1

#76

There are other dynamics that affect online chess ratings. It seems important to note that an online rating is a particular type of rating and never equates to an over-the-board rating. That is the first flaw: people thinking an online rating reflects their OTB ability.

For example, if only I had back all the points I have lost on mouse slips. And how does lichess catch cheaters who use electronic assistance in the opening? There are many factors that make online chess ratings for anything slower than bullet highly artificial. It is for these reasons that I do not care about my lichess ratings at all. Unless someone is at the master to GM level, they have zero meaning.

dboing

#77

The point of rating here on lichess is roughly to have interesting games.

So it really does not matter at all what true, exact, eternal, ultimate wooden OTB rating one would have if processed through real OTB chess. What matters here is that we are pooled together to have relatively meaningful ratings for game pairing.

Just repeating what the FAQ says, and what I know about competitive pairing rating systems. There is no magic true-strength revelation to be had there... sorry, this is not an ego mirror to ask about one's own chess strength.

futile pursuit.

dboing

#78

Also, about the distribution: I prefer the truth of the distribution to a fake distribution that reflects the hand-crafted, few-parameter population distribution of Elo-minded rating systems.

The nice curve is a fake curve in that sense. It gives a false sense of individual rating significance.

The asperities in the lichess distributions are informative. I should have commented on this first.

And as one poster just mentioned, all your dense clouds actually argue for a general effect, in that a line through the cloud would be significant.. (if no ratio has been buried in the implementation code, only documentation so far). Searching for ORDO and chess rating, I could find a GitHub repository, so some code (even some C code, many files) but only a promise of documentation. No doc.

I am reacting this late because I think people really don't get what a rating system can and can't do. And that Elo, from mighty real OTB land, is a shallow concept house, one that is the worst at scrambling internal population fluxes onto all the individuals, from fudging the information surges all over to maintain the fixed, controlled, few-parameter population distribution as a premise.

So: futile pursuit, and worn-out method. However, I actually appreciate the cloud plot. It speaks against the need for something else on lichess. Its density, and its well-packed envelope, are rather comforting actually. Statistics are statistics, but not having a fully dispersed cloud is saying the bulk is the same.. And I would suggest the lichess variations are the true information.

dboing

#79

One can also do the statistics of regressions without representing all the points. These can be graphed, and confidence intervals or error bars would represent the variation around the averages of such a conditional regression function.

Non-linear regression algorithms might also help. It does look like a straight line, with some variation around it.
A non-linear regression model might tell us if higher polynomial degrees are warranted.
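As a rough illustration of the "are higher polynomial degrees warranted" question, the sketch below fits polynomials of degree 1, 2 and 3 by least squares and compares them. Everything here is an assumption for illustration: the data are synthetic and truly linear, the x-axis is a rating rescaled to [-1, 1] to keep the arithmetic well-conditioned, and the comparison uses the Akaike information criterion, which is one common choice and not something proposed in this thread.

```python
import math
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns (coeffs, rss) where coeffs[k] multiplies x**k and rss is the
    residual sum of squares.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    rss = sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    return coeffs, rss

def aic(rss, n_points, n_params):
    """Akaike information criterion for Gaussian residuals (lower is better)."""
    return n_points * math.log(rss / n_points) + 2 * n_params

# Synthetic, truly linear data (an assumption): x is a rescaled rating.
random.seed(1)
xs = [random.uniform(-1, 1) for _ in range(5000)]
ys = [1600 + 640 * x + random.gauss(0, 80) for x in xs]

results = {d: fit_polynomial(xs, ys, d) for d in (1, 2, 3)}
for d, (coeffs, rss) in results.items():
    print(f"degree {d}: rss={rss:.0f}  aic={aic(rss, len(xs), d + 1):.1f}")
```

A higher degree always lowers the residual sum of squares a little, so the raw fit error alone says nothing; a degree is only "warranted" if the improvement is large enough to survive a penalty for the extra parameters.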

Then I would look at the asperities in lichess's own statistics, with the weekly distribution.. A much more fine-grained peek into its own data.

The asperities of each time control's distribution, and their persistence at the same bin values from week to week, kind of instill confidence that something of informative value has been captured.

There are even hypotheses** around about people not cheating at all, just taking an accomplishment vacation.... on some round values... now where are those. I think from the total population live on lichess numbers one can reach those... I prefer the population assumptions to be minimal, so not pushing on the cover of some pressure cooker... to even out the temperature in the gas... let the gas go where it wants to.....

en.wikipedia.org/wiki/Asperity_(materials_science)

** a reddit thread, and a few recurring forum questions (one of mine, if I recall). If anyone wants the links, I might try to find them.

dboing

edited

#80

lichess.org/stat/rating/distribution/blitz
4 bins per hundred points.. or is that 5.. being intervals of floats (and of the real line in the mathematical model).

Anyway, however they split the continuum of the rating values, 4 or 5 points, we see the obvious psychological sub-population that might be looking less at the chessboard and more at their "exact" rating effects.

I think this should be kept.... If wanting to go there, we might make unfitting experiments with controlled smooth distributions, and try to estimate some relative size of those significant bumps... compared to the bulk of the population.

Or it could be the integral, or surface under the curve, from splines with various smoothing constraints (keep some flexibility). If statisticians more qualified than me could vet such visual thinking.
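A very rough sketch of that bump-size estimate, in Python. To be clear about the assumptions: the histogram is synthetic (a smooth bell curve with an artificial bump planted at 2000), the 25-point bin width is a guess, and a plain average of neighbouring bins stands in for the smoothing splines, which is a much cruder baseline. `local_baseline` and `bump_excess` are hypothetical helper names.

```python
import math

BIN_WIDTH = 25  # assumed bin width, i.e. 4 bins per hundred points

def local_baseline(counts, index, half_window=2):
    """Mean of the neighbouring bins, leaving out the bin itself."""
    lo = max(0, index - half_window)
    hi = min(len(counts), index + half_window + 1)
    neighbours = [counts[i] for i in range(lo, hi) if i != index]
    return sum(neighbours) / len(neighbours)

def bump_excess(counts, half_window=2):
    """How far each bin sits above (or below) its smooth local baseline."""
    return [c - local_baseline(counts, i, half_window)
            for i, c in enumerate(counts)]

# Synthetic histogram (an assumption): smooth bell curve over the bins...
ratings = list(range(600, 2900, BIN_WIDTH))
counts = [round(1000 * math.exp(-0.5 * ((r - 1500) / 300) ** 2))
          for r in ratings]
# ...plus a planted bump of 150 players sitting on a round value.
counts[ratings.index(2000)] += 150

excess = bump_excess(counts)
top = max(range(len(counts)), key=lambda i: excess[i])
bump_rating = ratings[top]
bump_size = excess[top]
bump_share = bump_size / sum(counts)

print(f"largest bump at {bump_rating}: about {bump_size:.0f} players, "
      f"{100 * bump_share:.2f}% of the population")
```

The neighbouring-bin average is biased wherever the curve bends, so the recovered size is only approximate; a spline fit with a smoothing constraint would be the more careful version of the same subtraction.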