August 17, 2011

Computerized Statistical Analysis of Chessplayer Ratings over Time


For many decades, chess players have incorporated computers into their game. We have chess-playing software, of course, to provide a computerized opponent; this software has improved to the point where it is now useful as a training tool at even the highest level of chess, as I wrote about before.


Chess players have also, for many years, recorded their games in a machine-readable format. The most popular chess notation format, I believe, is Portable Game Notation, an elaboration of the older algebraic notation invented by Philipp Stamma more than 250 years ago.
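To give a feel for the format: a PGN file is plain text, with a block of bracketed tag pairs followed by the moves in algebraic notation. Here is a minimal Python sketch of reading one. The sample game, the player names, and the helper parse_pgn are all made up for illustration, and the sketch ignores the comments, variations, and annotations that real PGN files can contain, so treat it as a toy rather than a real parser.

```python
import re

# Hypothetical sample in the style of a PGN file: tag pairs, a blank line,
# then the movetext. The event and player names are invented.
SAMPLE_PGN = '''[Event "Example Open"]
[White "Player, A"]
[Black "Player, B"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1-0
'''

# A tag pair looks like: [Name "value"]
TAG_PAIR = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

# Game-termination markers that can close the movetext.
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_pgn(text):
    """Return (tags, moves) for a single, very simple PGN game."""
    tags = {}
    movetext_lines = []
    for line in text.splitlines():
        match = TAG_PAIR.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            movetext_lines.append(line.strip())

    # Drop move numbers such as "1." and the final result marker,
    # keeping only the moves themselves.
    tokens = " ".join(movetext_lines).split()
    moves = [t for t in tokens
             if not re.fullmatch(r"\d+\.+", t) and t not in RESULTS]
    return tags, moves


tags, moves = parse_pgn(SAMPLE_PGN)
print(tags["White"], "vs", tags["Black"], "-", tags["Result"])
print(moves)
```

Because every game carries its players, result, and moves in this predictable shape, it is easy to see how millions of games can be loaded into a database and analyzed in bulk.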

Portable Game Notation is nearly 20 years old now, and the global accumulation of recorded chess games has become substantial. The ChessBase company sells a database of almost five million games for under $100. (Although the ChessBase format is not the same as PGN, the two formats are similar and there are programs to convert back and forth.)

Recently, a team of chess-playing mathematics students at the University at Buffalo, under the guidance of Professor Ken Regan, conducted a statistical analysis of such a database of games, with the goal of determining whether there has been "grade inflation" in the Elo chess player rating system (you can learn more about Professor Arpad Elo's rating system here). The team has published their paper: Intrinsic Chess Ratings. The authors note that they were interested in several questions:

  1. Has there been 'inflation' -- or deflation -- in the chess Elo rating system over the past forty years?
  2. Were the top players of earlier times as strong as the top players of today?
  3. Does a faster time control markedly reduce the quality of play?
  4. Can recorded games from tournaments where high results by a player are suspected as fraudulent reveal the extent to which luck or collusion played a role?

In many sporting competitions, it is nearly impossible to compare contestants of different eras: how would Babe Ruth hit against Tim Lincecum? Would Jim Thorpe have scored against the 1985 Chicago Bears defense? These questions, although fun to debate, are practically unanswerable because so many things about their respective games have changed: rules, styles, team formations, training techniques, equipment, etc. But with chess, the rules and equipment have remained essentially unchanged for many hundreds of years, so the question of analyzing chess play over the years is much more tractable. The authors, in their study, conclude that:

there has been little or no ‘inflation’ in ratings over time—if anything there has been deflation. This runs counter to conventional wisdom, but is predicted by population models on which rating systems have been based [Gli99]. The results also support a no answer to question 2. In the 1970’s there were only two players with ratings over 2700, namely Bobby Fischer and Anatoly Karpov, and there were years as late as 1981 when no one had a rating over 2700 (see [Wee00]). In the past decade there have usually been thirty or more players with such ratings. Thus lack of inflation implies that those players are better than all but Fischer and Karpov were.

So, the next time you're at a party, and somebody engages you in a discussion about whether Sergey Karjakin or Hikaru Nakamura could hold their own against Mikhail Botvinnik or Tigran Petrosian, you can now look them in the eye and say:

Yes, I think they could. In fact, I think they would win 4 out of 7.

and be able to back it up!
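If you want a number to go with that party answer, the standard Elo expected-score formula is a reasonable place to start. The sketch below is my own back-of-the-envelope illustration, not a calculation from the paper: the function names and the ratings of 2750 and 2700 are hypothetical, and I am reading "4 out of 7" as scoring 4 points out of 7, with a draw counting as half a point.

```python
import math


def expected_score(rating_a, rating_b):
    """Expected score per game for player A against player B under the
    usual logistic Elo formula (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for_score(score):
    """Rating difference that corresponds to a given expected score
    (the inverse of expected_score). Requires 0 < score < 1."""
    return 400.0 * math.log10(score / (1.0 - score))


# Hypothetical ratings, chosen only to illustrate the arithmetic.
modern, classic = 2750, 2700

per_game = expected_score(modern, classic)
print(f"Expected score per game: {per_game:.3f}")
print(f"Expected points over 7 games: {7 * per_game:.2f}")
print(f"Rating gap implied by 4/7: {rating_gap_for_score(4 / 7):.1f}")
```

Under these assumptions, a 4-out-of-7 result corresponds to a rating edge of roughly 50 points, which is the sort of gap the formula treats as a modest but real advantage.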