Backgammon Match Archives

Match Archives

Big Brother Statistics

From:   Peter Fankhauser
Address:   fankhaus@darmstadt.gmd.de
Date:   19 January 1996
Subject:   Some statistics on the BigBrother Matches (till November)
Forum:   rec.games.backgammon
Google:   4dofu0$slm@omega.gmd.de

Hi all,

Ever since Mark Damish's bot Big_Brother started recording matches between the top 150 players on fibs, I have wanted to extract some statistics from them. Now that the database at Patti Beadle's ftp site contains over 1000 matches from August 95 to October 95, there are enough data to derive something statistically significant. So I wrote a small awk script to derive match equities, initial cube actions, and the influence of the fibs rating difference on the outcome of matches at certain lengths. As a fairly homogeneous control set for the script I used the collection of 100 mloner vs. idiot 5pters from their recent battle on fibs (mloner won 55 to 45).

In part the results are surprising. My first explanation for the deviations from the "official" data was of course bugs in my script. But even after extensive debugging (there is no bug-free software), the data don't behave as they should. The explanations left may be a combination of lack of examples, undetected noise in the matches (I did some consistency checks and the data seem to be of good quality - I had to rule out just 2 matches out of 1035), fundamental problems in methodology, or a significant difference between real life backgammon and that virtual world of fibs. I don't know. Well, maybe we're really testbunnies after all - and the data is completely skewed by the real Big Brothers :-)

Anyway, here are the data:

(A) Match Equities

To compute the match equities I simply counted the number of wins from a particular score (in terms of m-away,n-away), and the number of losses. The equity[m-away,n-away], that is, the average number of matches won (or lost) from a certain score, can then be derived by

    (win[m-away,n-away] - loss[m-away,n-away]) / (win[m-away,n-away] + loss[m-away,n-away])

I did not count post-Crawford scores. This leads to the following table (Table 1).
Match Equities (Table 1)

         1       2       3       4       5       6       7
1:   0.000   0.333   0.526   0.590   0.765   0.826   0.824
2:  -0.333   0.000   0.143   0.222   0.526   0.421   0.622
3:  -0.526  -0.143   0.000   0.116   0.269   0.565   0.535
4:  -0.590  -0.222  -0.116   0.000   0.170   0.282   0.309
5:  -0.765  -0.526  -0.269  -0.170   0.000   0.296   0.267
6:  -0.826  -0.421  -0.565  -0.282  -0.296   0.000   0.062
7:  -0.824  -0.622  -0.535  -0.309  -0.267  -0.062   0.000

Or in more familiar terms of the percentage of matches won (Table 2):

Match Percentages (Table 2)

        1      2      3      4      5      6      7
1:   50.0   66.7   76.3   79.5   88.2   91.3   91.2
2:   33.3   50.0   57.1   61.1   76.3   71.1   81.1
3:   23.7   42.9   50.0   55.8   63.4   78.3   76.7
4:   20.5   38.9   44.2   50.0   58.5   64.1   65.5
5:   11.8   23.7   36.6   41.5   50.0   64.8   63.3
6:    8.7   28.9   21.7   35.9   35.2   50.0   53.1
7:    8.8   18.9   23.3   34.5   36.7   46.9   50.0

While the table seems to be fairly consistent in itself, that is, the chance of winning a match increases with the lead in the match, some of the actual numbers differ significantly from Kit Woolsey's table derived from Hal Heinrich's database of real life experts (to my knowledge first published in Inside Backgammon/Vol.2/No.2/March-April 1992). For example, for (1-away,2-away) the fibs table gives just a 67% chance, while Woolsey's table gives a 70% chance. Even more striking, the percentage for (2-away,4-away) differs by a whopping 9% from the figure in Woolsey's table.

Mark Damish, to whom I sent the table a few days ago, suggested that this indicates that the fibs experts double way too late when trailing at 4-away/2-away. Admittedly, I did not know the proper cube action at this score (but I wouldn't consider myself an expert), so this might well be (see also below - Table 4).

The other reason for the deviations may of course be a lack of examples. However, at least up to 7-away/7-away the number of examples almost reaches the numbers in Heinrich's database (see Table 3; divide the numbers for n-away/n-away by 2).
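The relationship between Tables 1 and 2, and the effect of the sample size, can be illustrated with a short sketch. It is written in Python, not awk, and is not the script used for this post. The counts of 38 wins and 19 losses are an assumption, chosen only because they are consistent with the 57 examples listed for (1-away,2-away) and with the 0.333 / 66.7 entries above. The interval uses a simple normal approximation, which is one common choice and not necessarily the one the author had in mind.

```python
import math

def equity(wins, losses):
    """Match equity from a score: (wins - losses) / (wins + losses)."""
    return (wins - losses) / (wins + losses)

def win_percentage(eq):
    """Convert an equity in [-1, 1] to a percentage of matches won."""
    return 50.0 * (1.0 + eq)

def approx_95_interval(wins, losses):
    """Rough 95% interval (normal approximation) for the win percentage."""
    n = wins + losses
    p = wins / n
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)

# Hypothetical counts, consistent with 57 examples and a 66.7% win rate.
wins, losses = 38, 19
eq = equity(wins, losses)
low, high = approx_95_interval(wins, losses)
print(f"equity {eq:.3f}, percentage {win_percentage(eq):.1f}")
print(f"approximate 95% interval: {low:.1f} to {high:.1f}")
```

Under these assumed counts the interval spans roughly 54% to 79%, so a sample of this size would not by itself separate 67% from 70%.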
(I started with determining the degree of error for a 95% confidence interval - but I still have to freshen up my poor background in statistics to give any hard data here.)

Number of examples for a particular score (Table 3)

       1     2     3     4     5     6     7
1:   240    57   152    39   119    23    34
2:    57   212   203    90   114    38    37
3:   152   203   550   138   227    46    86
4:    39    90   138   118   212    39    55
5:   119   114   227   212   872    54   120
6:    23    38    46    39    54    62   128
7:    34    37    86    55   120   128   554

For comparison, Table 2a gives the percentages from the mloner vs. idiot matches:

Match Percentages from mloner vs. idiot (Table 2a)

        1      2      3      4      5
1:      -   50.0   83.3   90.0   96.3
2:   50.0   50.0   68.8   88.9   66.7
3:   16.7   31.2   50.0   48.1   70.5
4:   10.0   11.1   51.9   50.0   63.3
5:    3.7   33.3   29.5   36.7   50.0

Obviously, these figures even increase the observed trend in deviation. The leader appears to have much better chances than usual. However, 100 matches do not give that many examples (see Table 3a):

Number of examples for a particular score from mloner vs. idiot (Table 3a)

       1     2     3     4     5
1:     0     4     6    10    27
2:     4    12    16    18    12
3:     6    16    24    27    44
4:    10    18    27    16    49
5:    27    12    44    49   200

(B) Cube action

(B.A) Initial Cubes depending on matchscore

The next thing I investigated was the number of initial cubes issued by the leader vs. the number of initial cubes issued by the trailer in a match (see Table 4). Again I excluded post-Crawford games, so there are no figures for 1-away scores.

Initial Cubes by Leader depending on the match-score (Table 4)

        2      3      4      5      6      7
2:      -   26.3   22.6   22.2   19.4   22.9
3:   73.7      -   48.1   41.8   45.5   43.4
4:   77.4   51.9      -   48.1   43.2   50.0
5:   77.8   58.2   51.9      -   50.0   41.2
6:   80.6   54.5   56.8   50.0      -   56.3
7:   77.1   56.6   50.0   58.8   43.7      -

The table seems to be fairly consistent in itself, that is, the larger the lead, the fewer initial cubes the leader issues. If Mark Damish's hypothesis is true, the percentage for 4-away/2-away should probably be larger. The inconsistency for some 6-away or 7-away scores may be due to the lack of examples (see Table 5 - again divide n-away/n-away by 2).
Overall Number of Initial Cubes (Table 5)

       2     3     4     5     6     7
2:   208   190    84   108    36    35
3:   190   534   133   225    44    83
4:    84   133   114   212    37    54
5:   108   225   212   848    54   119
6:    36    44    37    54    62   126
7:    35    83    54   119   126   546

And here are the data from the mloner vs. idiot matches.

Initial Cubes by Leader from mloner vs. idiot (Table 4a)

        2      3      4      5
2:      -   31.2   25.0   16.7
3:   68.8      -   48.1   39.5
4:   75.0   51.9      -   58.3
5:   83.3   60.5   41.7      -

Overall Number of Initial Cubes (Table 5a)

       2     3     4     5
2:    12    16    16    12
3:    16    24    27    43
4:    16    27    16    48
5:    12    43    48   196

(B.B) Individual Cube Action

The mere number of initial cubes tells only part of the story. Following a method developed by Kit Woolsey (see again Inside Backgammon/Vol.2/No.2/March-April 1992 and Inside Backgammon/Vol.2/No.3/May-June 92) I counted the number of takes vs. the number of passes of all initial doubles at scores >= 4-away/4-away. In his evaluation of Heinrich's database Woolsey only considered scores >= 15-away/15-away and some close scores >= 7-away, but as the maximum length of matches in Mark's fibs collection is 11 pts I simply had to lower this threshold.

Furthermore, to get an impression of the effectiveness of the cube action, I determined the equity on takes as follows (again following Woolsey). For every game doubled by Player 1, taken by Player 2, and played to conclusion I counted the number of points won or lost by Player 2. Because of the limited length of the matches I counted backgammons as gammons. For every taken recube I gave Player 2 a settlement of 1.6 points, and for every recube/pass Player 2 got 2 points.
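The bookkeeping described above can be sketched as follows. This is a Python illustration of one reading of that description, not the awk script itself. The event encoding (`"played"`, `"recube_take"`, `"recube_pass"`) and the sample games are hypothetical, and the sketch assumes that points are counted from the taker's side with the cube on 2 after the initial double.

```python
RECUBE_TAKE_SETTLEMENT = 1.6  # points credited to the taker for a taken recube
RECUBE_PASS_POINTS = 2.0      # points won by the taker when the recube is passed

def taker_points(event):
    """Points credited to Player 2 (the taker) for one taken initial double.

    event is a hypothetical tuple:
      ("played", won, kind)  game played to conclusion on a 2-cube;
                             won is True if the taker won,
                             kind is "single", "gammon" or "backgammon"
      ("recube_take",)       taker redoubled and the recube was taken
      ("recube_pass",)       taker redoubled and the recube was passed
    """
    tag = event[0]
    if tag == "recube_take":
        return RECUBE_TAKE_SETTLEMENT
    if tag == "recube_pass":
        return RECUBE_PASS_POINTS
    if tag == "played":
        _, won, kind = event
        # Backgammons are counted as gammons.
        multiplier = 1 if kind == "single" else 2
        points = 2 * multiplier
        return points if won else -points
    raise ValueError(f"unknown event: {event!r}")

def take_equity(events):
    """Average points per taken double, from the taker's point of view."""
    return sum(taker_points(e) for e in events) / len(events)

# Hypothetical sample of five taken initial doubles.
sample = [
    ("played", False, "single"),      # taker loses 2
    ("played", False, "backgammon"),  # taker loses 4 (counted as a gammon)
    ("played", True, "single"),       # taker wins 2
    ("recube_take",),                 # settlement of 1.6
    ("recube_pass",),                 # taker wins 2
]
print(round(take_equity(sample), 2))
```

Since passing an initial double costs exactly 1 point, an average below -1.00 from this function means the takes did worse than passing would have.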
For each player who has issued at least one initial cube this procedure gave entries like the following:

Examples of individual cube actions (Table 6)

                 Player doubles                       Opponent doubles
          Take   Eq-opp       Pass     Total    Take   Eq-play      Pass     Total
mloner :   94 ( 69%, -0.55)   42 ( 30%)   136     52 ( 48%, -0.49)   55 ( 51%)   107
idiot  :   11 ( 47%, -0.95)   12 ( 52%)    23     17 ( 77%, -0.80)    5 ( 22%)    22
funk   :   20 ( 60%, -1.56)   13 ( 39%)    33     23 ( 71%, -0.78)    9 ( 28%)    32
.........
Sum    :  843 ( 55%, -0.71)  680 ( 44%)  1523    843 ( 55%, -0.71)  680 ( 44%)  1523

The first column gives the number of takes by the opponent, the second the percentage of takes, the third the opponent's equity on a take, the 4th the number of passes, the 5th their percentage, and the 6th the total number of initial doubles issued by the player (columns 7 through 12 give the same figures for the initial doubles by the opponent). Obviously take-equities below -1.00 indicate that the opponent or player has taken too much, and take-equities close to 0 or even above indicate that the player or opponent tends to double prematurely. So I certainly like my cube action here, but the bots aren't doing that badly either :). For the full list see the Appendix.

To get a feeling for the influence of the fairly small match length on these data I computed the summary data for slightly larger scores too:

>=4 :  843 ( 55%, -0.71)  680 ( 44%)  1523  (see above)
>=5 :  649 ( 56%, -0.80)  498 ( 43%)  1147
>=6 :  297 ( 56%, -0.72)  233 ( 43%)   530
>=7 :  200 ( 54%, -0.67)  165 ( 45%)   365

Applying the same procedure to the mloner vs. idiot matches led to very surprising data (see Table 6a).

Individual cube actions from mloner vs. idiot (Table 6a)

            Mloner doubles                        Idiot doubles
 Take   Eq-idiot     Pass     Total    Take   Eq-mloner    Pass     Total
   49 ( 60%, -0.78)   32 ( 39%)    81     28 ( 38%,  0.29)   45 ( 61%)    73

It certainly looks like idiot has found its master. mloner seems to take very conservatively (only 38%) but manages to get a positive equity on these takes!
(C) The Fibs-rating or "How much of a favorite is the favorite?"

Finally, I took a look at the relationship between rating difference, match length, and outcome of the match. These data I find the hardest to believe, so I'm still searching hard for the bug in my script.

First I counted for each match length (ml) the number of matches (nm), the number of points played (np = nm*ml), and the number of points won by the favorite minus the number of points won by the player with the smaller rating (tp). The size of the win was of course not regarded, so every match counts ml points for its winner. tp/np then gives the average number of points won by the favorite per point played.

For comparison, I also computed the total rating gain (tg) for the favorite based on the fibs rating formula. In order to make the rating changes comparable over the various match lengths, I also accumulated all rating changes divided by sqrt(ml), giving a total rating gain average (tga), and divided this number by the number of matches played for each length, giving tga/nm. And here are the results:

Gain/Loss of favorite according to matchlength (Table 7)

 ml:    nm  (  np)    tp   tp/np;       tg  tga/nm
  1:   120  ( 120)    24   0.200;    36.76   0.306
  2:    10  (  20)    -4  -0.200;    -6.81  -0.482
  3:   185  ( 555)    27   0.049;     3.42   0.011
  4:     3  (  12)   -12  -1.000;   -12.62  -2.104
  5:   405  (2025)   -35  -0.017;  -146.05  -0.161
  6:     1  (   6)    -6  -1.000;    -5.11  -2.086
  7:   270  (1890)   -14  -0.007;  -143.61  -0.201
  8:     1  (   8)    -8  -1.000;    -5.83  -2.062
  9:    25  ( 225)   -45  -0.200;   -43.51  -0.580
 11:    13  ( 143)   -33  -0.231;   -28.17  -0.653
Sum:  1033  (5004)  -106  -0.103;  -351.53  -0.115

It looks like for all matches longer than 3 pts the favorite is not the favorite at all! (And the result for the 1-pters may mostly be due to one_pointer :)) If this were only true for the rating changes, one might argue that the rating formula is unjust, but it also holds for the total points won or lost. And for those the size of the rating difference is not taken into account at all. I'm totally stuck here - is there any fundamental flaw in the evaluation?
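The quantities nm, np, tp, tg and tga/nm defined above can be sketched in a few lines. This is a Python illustration and not the awk script used for the post. The post does not spell out the rating formula, so the sketch assumes the commonly documented fibs formula, in which the underdog's winning chance is 1/(10^(D*sqrt(ml)/2000)+1) for a rating difference D and the winner's gain is 4*sqrt(ml) times the loser's winning chance. It also assumes an experience factor of 1 for every player. The function names and the sample matches are hypothetical.

```python
import math

def upset_probability(rating_diff, match_length):
    """Underdog's winning chance under the assumed fibs formula."""
    exponent = abs(rating_diff) * math.sqrt(match_length) / 2000.0
    return 1.0 / (10.0 ** exponent + 1.0)

def favorite_rating_change(rating_diff, match_length, favorite_won):
    """Rating change of the favorite for one match (experience factor assumed 1)."""
    p_upset = upset_probability(rating_diff, match_length)
    scale = 4.0 * math.sqrt(match_length)
    return scale * p_upset if favorite_won else -scale * (1.0 - p_upset)

def summarize(matches):
    """Summarize matches given as (rating_diff, match_length, favorite_won)."""
    nm = len(matches)
    np_ = sum(ml for _, ml, _ in matches)
    # The size of the win is ignored: a match counts ml points for its winner.
    tp = sum(ml if won else -ml for _, ml, won in matches)
    changes = [(favorite_rating_change(rd, ml, won), ml) for rd, ml, won in matches]
    tg = sum(change for change, _ in changes)
    tga = sum(change / math.sqrt(ml) for change, ml in changes)
    return {
        "nm": nm,
        "np": np_,
        "tp": tp,
        "tp_per_point": tp / np_,
        "tg": tg,
        "tga_per_match": tga / nm,
    }

# Hypothetical sample: three 5-point matches with a 50-point rating difference.
sample = [(50, 5, True), (50, 5, False), (50, 5, True)]
for name, value in summarize(sample).items():
    print(name, round(value, 3))
```

Dividing each rating change by sqrt(ml) removes the match-length scaling of the stakes, which is why tga/nm can be compared across rows while tg cannot.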
Maybe Table 8 sheds some further light on this. It gives the same data categorized according to the size of the rating difference (rd) in steps of 10 pts. (The fact that tg = 9.93 > 0 whereas tga < 0 for rating differences between 0 and 10 pts comes from dividing the individual rating changes by sqrt(matchlength).)

Gain/Loss of favorite according to rating difference (Table 8)

 rd:    nm  (  np)    tp   tp/np;       tg  tga/nm
  0:   142  ( 709)    25   0.035;     9.93  -0.017
 10:   146  ( 701)   -37  -0.053;   -32.84  -0.063
 20:   104  ( 480)   -26  -0.054;   -26.80  -0.060
 30:   106  ( 528)    32   0.061;     3.29   0.025
 40:    86  ( 435)   -73  -0.168;   -78.72  -0.345
 50:    78  ( 390)   -66  -0.169;   -86.64  -0.499
 60:    63  ( 297)     1   0.003;   -16.10  -0.061
 70:    65  ( 321)   -15  -0.047;   -31.62  -0.155
 80:    45  ( 200)    -6  -0.030;   -22.21  -0.152
 90:    31  ( 130)    36   0.277;    21.60   0.371
100:    40  ( 210)     2   0.010;   -25.14  -0.261
110:    15  (  77)    15   0.195;     2.80   0.121
120:    13  (  67)   -17  -0.254;   -22.57  -0.758
130:    17  ( 101)    53   0.525;    27.83   0.699
140:    11  (  57)   -25  -0.439;   -25.34  -0.890
150:    10  (  46)     4   0.087;    -5.50  -0.353
160:    15  (  73)   -27  -0.370;   -29.56  -0.508
170:     9  (  43)     5   0.116;    -0.52   0.267
180:     9  (  43)   -11  -0.256;   -15.71  -0.646
190:    10  (  32)     6   0.188;     1.91   0.449
200:     6  (  24)    12   0.500;     4.13   0.241
210:     3  (  13)    -3  -0.231;    -5.92  -1.135
220:     3  (  15)    -1  -0.067;    -5.71  -1.188
240:     2  (   2)     0   0.000;    -0.56  -0.278
250:     3  (   3)     3   1.000;     5.15   1.718
260:     1  (   7)     7   1.000;     3.29   1.242
Sum:  1033  (5004)  -106  -0.103;  -351.53  -0.115

Food for thought... Further interpretations and suggestions for other kinds of summaries are welcome. As soon as I've got the time I intend to do some analysis of checker plays and gammon rates at certain match scores.

Enjoy,
Peter Fankhauser


