Hi all,
Ever since Mark Damish's bot Big_Brother began recording
matches between the top 150 players on fibs, I have wanted to extract
some statistics from them. Now that the database at Patti Beadle's
ftp site contains over 1000 matches from August 95 to October 95,
there is enough data to derive something statistically significant.
So I wrote a small awk script to derive match equities, initial cube
actions, and the influence of the fibs rating difference on the outcome of
matches of various lengths. As a fairly homogeneous control set for the
script I used the collection of 100 mloner vs. idiot 5-pointers from their
recent battle on fibs (mloner won 55 to 45).
Some of the results are surprising. My first explanation for the
deviations from the "official" data was, of course, bugs in my
script. But even after extensive debugging (there is no bug-free
software), the data don't behave as they should. The remaining explanations
may be some combination of too few examples, undetected noise in the
matches (I did some consistency checks and the data seem to be of good
quality - I had to rule out just 2 of 1035 matches),
fundamental problems in my methodology, or a significant
difference between real-life backgammon and the virtual world of
fibs. I don't know. Well, maybe we really are test bunnies after all,
and the data are completely skewed by the real Big Brothers :-)
Anyway, here are the data:
(A) Match Equities
To compute the match equities I simply counted the number of wins and
the number of losses from each score (in terms of m-away,n-away). The
equity[m-away,n-away], that is, the average net result of the matches
played from that score (+1 per win, -1 per loss), is then
(win[m-away,n-away]-loss[m-away,n-away])/
(win[m-away,n-away]+loss[m-away,n-away]).
I did not count post-Crawford scores. This leads to the following table
(Table 1).
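The counting above can be sketched in a few lines. This is not the original awk script: it assumes a hypothetical input format of one `(my_away, opp_away, i_won)` triple per player for every pre-Crawford score reached in a match, so each match is seen once from each side (which would also explain why the n-away/n-away counts in Table 3 are doubled). The example input is a 38:19 split at 1-away/2-away, chosen to be consistent with Tables 2 and 3.

```python
from collections import defaultdict


def match_equities(observations):
    """Map (m_away, n_away) -> (equity, percent of matches won).

    observations: iterable of (my_away, opp_away, i_won) triples
    (hypothetical format; post-Crawford scores assumed already removed).
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    for my_away, opp_away, i_won in observations:
        if i_won:
            wins[(my_away, opp_away)] += 1
        else:
            losses[(my_away, opp_away)] += 1

    table = {}
    for score in set(wins) | set(losses):
        w, l = wins[score], losses[score]
        equity = (w - l) / (w + l)      # (win - loss) / (win + loss)
        percent = 100.0 * w / (w + l)   # same thing as 50 * (1 + equity)
        table[score] = (equity, percent)
    return table


# 57 examples at 1-away/2-away, 38 of them won by the 1-away leader
obs = [(1, 2, True)] * 38 + [(1, 2, False)] * 19
obs += [(2, 1, False)] * 38 + [(2, 1, True)] * 19  # same matches, other side
table = match_equities(obs)
print(table[(1, 2)])  # equity about 0.333, about 66.7 percent
```

The percentage column of Table 2 is just a rescaling of the equity in Table 1, so either table can be recovered from the other.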
Match Equities (Table 1)
(rows: player's score, m-away; columns: opponent's score, n-away)
         1       2       3       4       5       6       7
1:   0.000   0.333   0.526   0.590   0.765   0.826   0.824
2:  -0.333   0.000   0.143   0.222   0.526   0.421   0.622
3:  -0.526  -0.143   0.000   0.116   0.269   0.565   0.535
4:  -0.590  -0.222  -0.116   0.000   0.170   0.282   0.309
5:  -0.765  -0.526  -0.269  -0.170   0.000   0.296   0.267
6:  -0.826  -0.421  -0.565  -0.282  -0.296   0.000   0.062
7:  -0.824  -0.622  -0.535  -0.309  -0.267  -0.062   0.000
Or, in the more familiar terms of the percentage of matches won (Table 2):
Match Percentages (Table 2)
        1      2      3      4      5      6      7
1:   50.0   66.7   76.3   79.5   88.2   91.3   91.2
2:   33.3   50.0   57.1   61.1   76.3   71.1   81.1
3:   23.7   42.9   50.0   55.8   63.4   78.3   76.7
4:   20.5   38.9   44.2   50.0   58.5   64.1   65.5
5:   11.8   23.7   36.6   41.5   50.0   64.8   63.3
6:    8.7   28.9   21.7   35.9   35.2   50.0   53.1
7:    8.8   18.9   23.3   34.5   36.7   46.9   50.0
While the table seems to be fairly consistent in itself, that is, the
chance of winning a match increases with the lead in the match,
some of the numbers differ significantly from Kit
Woolsey's table derived from Hal Heinrich's database of real-life
experts (to my knowledge first published in Inside
Backgammon/Vol.2/No.2/March-April 1992). For example, for
(1-away,2-away) the fibs table gives just a 67% chance, while Woolsey's
table gives 70%. Even more striking, the percentage for
(2-away,4-away) differs by a whopping 9 percentage points from the figure
in Woolsey's table. Mark Damish, to whom I sent the table a few days ago,
suggested that this indicates that the fibs experts double way too late
when trailing 4-away/2-away. Admittedly, I did not know the
proper cube action at this score (but then I wouldn't consider myself an
expert), so this may well be the case (see also Table 4 below).
Another possible reason for the deviations is, of course, a lack of examples.
However, at least up to 7-away/7-away the number of examples
almost reaches the numbers in Heinrich's database (see Table 3; divide
the numbers for n-away/n-away by 2). (I started determining the error
margin for a 95% confidence interval, but I still have to brush up my
poor background in statistics before I can give any hard data here.)
Number of examples for a particular score (Table 3)
        1      2      3      4      5      6      7
1:    240     57    152     39    119     23     34
2:     57    212    203     90    114     38     37
3:    152    203    550    138    227     46     86
4:     39     90    138    118    212     39     55
5:    119    114    227    212    872     54    120
6:     23     38     46     39     54     62    128
7:     34     37     86     55    120    128    554
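For the still-missing 95% confidence interval, one option is the Wilson score interval for a binomial proportion. That choice is mine, not something the post specifies, and it assumes every counted score is an independent win/loss trial. With the 90 examples at 2-away/4-away and the 61.1% of Table 2 (55 wins for the 2-away leader), this sketch gives an interval of about 50.8% to 70.5%.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the win proportion wins/n (z=1.96 -> ~95%)."""
    if n <= 0:
        raise ValueError("need at least one example")
    p = wins / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 2-away/4-away: 90 examples, 55 of them won by the 2-away leader (61.1%)
lo, hi = wilson_interval(55, 90)
print(f"{100 * lo:.1f}% .. {100 * hi:.1f}%")
```

The Wilson interval is used here instead of the simpler "p plus or minus 1.96 standard errors" because it behaves better for the small counts found at many scores in Table 3.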
For comparison, Table 2a gives the percentages from the mloner vs. idiot
matches:
Match Percentages from mloner vs. idiot (Table 2a)
        1      2      3      4      5
1:      -   50.0   83.3   90.0   96.3
2:   50.0   50.0   68.8   88.9   66.7
3:   16.7   31.2   50.0   48.1   70.5
4:   10.0   11.1   51.9   50.0   63.3
5:    3.7   33.3   29.5   36.7   50.0
These figures amplify the observed deviation even further: the
leader appears to have much better chances than usual. However,
100 matches do not give that many examples (see Table 3a):
Number of examples for a particular score from mloner vs. idiot (Table 3a)
        1      2      3      4      5
1:      0      4      6     10     27
2:      4     12     16     18     12
3:      6     16     24     27     44
4:     10     18     27     16     49
5:     27     12     44     49    200
(B) Cube action
(B.A) Initial Cubes depending on matchscore
The next thing I investigated was the number of initial cubes issued by
the leader vs. the number of initial cubes issued by the trailer in a
match (see Table 4). Again I excluded post-Crawford games, so
there are no figures for 1-away scores.
Initial Cubes by Leader depending on the match-score (Table 4)
(percentage of the initial cubes at each score issued by the row player;
above the diagonal the row player is the leader)
        2      3      4      5      6      7
2:      -   26.3   22.6   22.2   19.4   22.9
3:   73.7      -   48.1   41.8   45.5   43.4
4:   77.4   51.9      -   48.1   43.2   50.0
5:   77.8   58.2   51.9      -   50.0   41.2
6:   80.6   54.5   56.8   50.0      -   56.3
7:   77.1   56.6   50.0   58.8   43.7      -
The table seems to be fairly consistent in itself, that is, the
larger the lead, the fewer initial cubes the leader issues. If Mark Damish's
hypothesis is true, the trailer's percentage at 4-away/2-away should probably
be larger. The inconsistencies at some 6-away or 7-away scores may be due
to the lack of examples (see Table 5; again, divide n-away/n-away by 2).
Overall Number of Initial Cubes (Table 5)
        2      3      4      5      6      7
2:    208    190     84    108     36     35
3:    190    534    133    225     44     83
4:     84    133    114    212     37     54
5:    108    225    212    848     54    119
6:     36     44     37     54     62    126
7:     35     83     54    119    126    546
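Tables 4 and 5 can both come from a single tally of who issued each initial double. The sketch assumes a hypothetical list of `(doubler_away, opponent_away)` pairs, one per initial double, and the example input (50 of the 190 cubes at 2-away/3-away by the leader, 424 cubes at 5-away/5-away) is made up to be consistent with the tables. With this counting the (m, n) and (n, m) shares add up to 100%, and each n-away/n-away total comes out doubled, consistent with the advice to divide those figures by 2.

```python
from collections import Counter


def cube_tables(initial_doubles):
    """Return (share, total) dicts keyed by (row_away, col_away).

    initial_doubles: iterable of (doubler_away, opponent_away) pairs,
    one per initial double (hypothetical format, post-Crawford excluded).
    share[(m, n)]: percent of the initial cubes at this score issued by the
    player who is m-away (Table 4 style).
    total[(m, n)]: all initial cubes at this score (Table 5 style).
    """
    counts = Counter(initial_doubles)
    scores = {tuple(sorted(s)) for s in counts}
    share, total = {}, {}
    for a, b in scores:
        for m, n in ((a, b), (b, a)):
            t = counts[(m, n)] + counts[(n, m)]  # equal scores counted twice
            total[(m, n)] = t
            if m != n:
                share[(m, n)] = 100.0 * counts[(m, n)] / t
    return share, total


doubles = [(2, 3)] * 50 + [(3, 2)] * 140 + [(5, 5)] * 424
share, total = cube_tables(doubles)
print(round(share[(2, 3)], 1), round(share[(3, 2)], 1), total[(5, 5)])
```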
And here are the data from the mloner vs. idiot matches.
Initial Cubes by Leader from mloner vs. idiot (Table 4a)
        2      3      4      5
2:      -   31.2   25.0   16.7
3:   68.8      -   48.1   39.5
4:   75.0   51.9      -   58.3
5:   83.3   60.5   41.7      -
Overall Number of Initial Cubes from mloner vs. idiot (Table 5a)
        2      3      4      5
2:     12     16     16     12
3:     16     24     27     43
4:     16     27     16     48
5:     12     43     48    196
(B.B) Individual Cube Action
The mere number of initial cubes tells only part of the story.
Following a method developed by Kit Woolsey (see again Inside
Backgammon/Vol.2/No.2/March-April 1992 and Inside
Backgammon/Vol.2/No.3/May-June 1992), I counted the number of takes
vs. the number of passes for all initial doubles at scores where both
players were at least 4-away. In his evaluation of Heinrich's database
Woolsey only considered scores of 15-away/15-away or more and some close
scores of 7-away or more, but as the longest match in
Mark's fibs collection is 11 points, I had to lower this threshold.
Furthermore, to get an impression of the effectiveness of
the cube action, I determined the equity on takes as follows
(again following Woolsey).
For every game doubled by Player 1, taken by Player 2, and played
to conclusion, I counted the number of points won or lost by Player 2.
Because of the limited length of the matches I counted backgammons
as gammons. For every taken recube I gave Player 2 a settlement of
1.6 points, and for every recube/pass Player 2 got 2 points.
For each player who issued at least one initial cube, this procedure
gave entries like the following:
Examples of individual cube actions (Table 6)
         Player doubles                           Opponent doubles
         Take (   %,  Eq-opp) Pass (   %) Total   Take (   %, Eq-play) Pass (   %) Total
mloner :   94 ( 69%,   -0.55)   42 ( 30%)   136     52 ( 48%,   -0.49)   55 ( 51%)   107
idiot  :   11 ( 47%,   -0.95)   12 ( 52%)    23     17 ( 77%,   -0.80)    5 ( 22%)    22
funk   :   20 ( 60%,   -1.56)   13 ( 39%)    33     23 ( 71%,   -0.78)    9 ( 28%)    32
.........
Sum    :  843 ( 55%,   -0.71)  680 ( 44%)  1523    843 ( 55%,   -0.71)  680 ( 44%)  1523
The first column gives the number of takes by the opponent, the second the
percentage of takes, the third the opponent's equity on a take, the 4th
the number of passes, the 5th their percentage, and the 6th the total
number of initial doubles issued by the player (columns 7 through 12 give
the same figures for the initial doubles by the opponent). Since a pass
costs exactly 1 point, take equities below -1.00 indicate that the
opponent (or player) has taken too often, and take equities close to 0 or
even above indicate that the player (or opponent) tends to double
prematurely. So I certainly like my cube action here, but the bots aren't
doing that badly either :).
For the full list see the Appendix.
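The take-equity column follows the bookkeeping described above: each taken initial double is scored from the taker's side at a cube of 2, backgammons are capped at gammons, a taken recube is settled at +1.6 points and a passed recube at +2 points for the taker. The `TakenDouble` record is a hypothetical layout, not the format of Mark's collection.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TakenDouble:
    """One taken initial double, seen from the taker (hypothetical layout)."""
    result: int = 0               # +1/-1 single, +2/-2 gammon, +3/-3 backgammon
    recube: Optional[str] = None  # None, "taken" or "passed"


def take_equity(takes: Iterable[TakenDouble]) -> float:
    """Average points won by the taker per taken initial double."""
    total, n = 0.0, 0
    for t in takes:
        n += 1
        if t.recube == "taken":
            total += 1.6                       # settlement for a taken recube
        elif t.recube == "passed":
            total += 2.0                       # doubler passed the redouble
        else:
            level = max(-2, min(2, t.result))  # count backgammons as gammons
            total += 2 * level                 # game ended with the cube at 2
    if n == 0:
        raise ValueError("no takes")
    return total / n


games = [TakenDouble(-1), TakenDouble(-3), TakenDouble(1),
         TakenDouble(recube="passed")]
print(take_equity(games))  # (-2 - 4 + 2 + 2) / 4 = -0.5
```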
To get a feeling for how the fairly short match lengths influence
these data, I also computed the summary figures for higher minimum scores:
>=4    :  843 ( 55%,   -0.71)  680 ( 44%)  1523   (see above)
>=5    :  649 ( 56%,   -0.80)  498 ( 43%)  1147
>=6    :  297 ( 56%,   -0.72)  233 ( 43%)   530
>=7    :  200 ( 54%,   -0.67)  165 ( 45%)   365
Applying the same procedure to the mloner vs. idiot matches leads to very
surprising data (see Table 6a):
Individual cube actions from mloner vs. idiot (Table 6a)
         Mloner doubles                             Idiot doubles
         Take (   %,  Eq-idiot) Pass (   %) Total   Take (   %, Eq-mloner) Pass (   %) Total
           49 ( 60%,     -0.78)   32 ( 39%)    81     28 ( 38%,      0.29)   45 ( 61%)    73
It certainly looks like idiot has found its master. mloner seems to take
very conservatively (only 38%) but manages to get a positive equity on
these takes!
(C) The Fibs-rating or "How much of a favorite is the favorite?"
Finally, I took a look at the relationship between rating difference,
match length, and outcome of the match. These are the data I find hardest
to believe, so I'm still searching hard for the bug in my script.
First, for each match length (ml) I counted the number of matches (nm),
the number of points played (np = nm*ml), and the net number of points won
by the favorite (tp): the points of the matches won by the favorite minus
the points of the matches won by the player with the lower rating, where
each match is worth ml points to its winner regardless of the size of the
win. tp/np then gives the favorite's average net gain per point played.
For comparison, I also computed the total rating gain (tg) for the
favorite using the fibs rating formula. To make the rating changes
comparable across match lengths, I also accumulated every rating change
divided by sqrt(ml), giving a normalized total rating gain (tga), and
divided this by the number of matches played at each length, giving tga/nm.
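The bookkeeping for Tables 7 and 8 can be written as one grouping function in which only the grouping key changes: the match length for Table 7, a 10-point rating-difference bin for Table 8. The `Match` record, its field names and the sample matches are hypothetical, and the rating change is taken as given rather than recomputed. Since each match is worth ml points to its winner, tp/np at a fixed length is simply (matches won - matches lost)/nm.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Match:
    """Hypothetical record of one finished match, seen from the favorite."""
    length: int           # ml
    rating_diff: float    # favorite's rating minus the underdog's (>= 0)
    favorite_won: bool
    rating_change: float  # favorite's signed rating change


def summarize(matches: Iterable[Match],
              key: Callable[[Match], int]) -> Dict[int, dict]:
    acc = defaultdict(lambda: {"nm": 0, "np": 0, "tp": 0, "tg": 0.0, "tga": 0.0})
    for m in matches:
        row = acc[key(m)]
        row["nm"] += 1
        row["np"] += m.length
        row["tp"] += m.length if m.favorite_won else -m.length  # margin ignored
        row["tg"] += m.rating_change
        row["tga"] += m.rating_change / math.sqrt(m.length)
    for row in acc.values():
        row["tp/np"] = row["tp"] / row["np"]
        row["tga/nm"] = row["tga"] / row["nm"]
    return dict(acc)


def by_length(m: Match) -> int:         # Table 7
    return m.length


def by_rating_bin(m: Match) -> int:     # Table 8: lower bound of 10-pt bin
    return 10 * int(m.rating_diff // 10)


matches = [Match(5, 34, True, 2.0), Match(5, 31, False, -3.0),
           Match(1, 12, True, 1.0)]
print(summarize(matches, by_length)[5])
print(sorted(summarize(matches, by_rating_bin)))
```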
And here are the results:
Gain/Loss of favorite according to matchlength (Table 7)
 ml:   nm (  np)    tp   tp/np;       tg  tga/nm
  1:  120 ( 120)    24   0.200;    36.76   0.306
  2:   10 (  20)    -4  -0.200;    -6.81  -0.482
  3:  185 ( 555)    27   0.049;     3.42   0.011
  4:    3 (  12)   -12  -1.000;   -12.62  -2.104
  5:  405 (2025)   -35  -0.017;  -146.05  -0.161
  6:    1 (   6)    -6  -1.000;    -5.11  -2.086
  7:  270 (1890)   -14  -0.007;  -143.61  -0.201
  8:    1 (   8)    -8  -1.000;    -5.83  -2.062
  9:   25 ( 225)   -45  -0.200;   -43.51  -0.580
 11:   13 ( 143)   -33  -0.231;   -28.17  -0.653
Sum: 1033 (5004)  -106  -0.021;  -351.53  -0.115
It looks like for all matches longer than 3 points the favorite is not the
favorite at all! (And the 1-point result may be mostly due to one_pointer :))
If this were true only for the rating changes, one might argue that the
rating formula is unfair, but it also holds for the total points won or
lost, and for those the size of the rating difference is not taken into
account at all. I'm totally stuck here - is there any fundamental flaw in
the evaluation?
Gain/Loss of favorite according to rating difference (Table 8)
(rd: rating difference in 10-point bins; each label is the bin's lower bound)
 rd:   nm (  np)    tp   tp/np;       tg  tga/nm
  0:  142 ( 709)    25   0.035;     9.93  -0.017
 10:  146 ( 701)   -37  -0.053;   -32.84  -0.063
 20:  104 ( 480)   -26  -0.054;   -26.80  -0.060
 30:  106 ( 528)    32   0.061;     3.29   0.025
 40:   86 ( 435)   -73  -0.168;   -78.72  -0.345
 50:   78 ( 390)   -66  -0.169;   -86.64  -0.499
 60:   63 ( 297)     1   0.003;   -16.10  -0.061
 70:   65 ( 321)   -15  -0.047;   -31.62  -0.155
 80:   45 ( 200)    -6  -0.030;   -22.21  -0.152
 90:   31 ( 130)    36   0.277;    21.60   0.371
100:   40 ( 210)     2   0.010;   -25.14  -0.261
110:   15 (  77)    15   0.195;     2.80   0.121
120:   13 (  67)   -17  -0.254;   -22.57  -0.758
130:   17 ( 101)    53   0.525;    27.83   0.699
140:   11 (  57)   -25  -0.439;   -25.34  -0.890
150:   10 (  46)     4   0.087;    -5.50  -0.353
160:   15 (  73)   -27  -0.370;   -29.56  -0.508
170:    9 (  43)     5   0.116;    -0.52   0.267
180:    9 (  43)   -11  -0.256;   -15.71  -0.646
190:   10 (  32)     6   0.188;     1.91   0.449
200:    6 (  24)    12   0.500;     4.13   0.241
210:    3 (  13)    -3  -0.231;    -5.92  -1.135
220:    3 (  15)    -1  -0.067;    -5.71  -1.188
240:    2 (   2)     0   0.000;    -0.56  -0.278
250:    3 (   3)     3   1.000;     5.15   1.718
260:    1 (   7)     7   1.000;     3.29   1.242
Sum: 1033 (5004)  -106  -0.021;  -351.53  -0.115
Maybe Table 8 sheds some further light on this. It gives the same data
categorized by the size of the rating difference in steps of 10 points.
(The fact that tg = 9.93 > 0 whereas tga < 0 for rating differences
between 0 and 10 points comes from dividing the individual rating changes
by sqrt(ml).)
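Why tg and tga can have opposite signs is easy to see with a small example. The sketch assumes the commonly quoted FIBS rating formula for experienced players, P_upset = 1/(10^(D*sqrt(N)/2000) + 1), with the winner gaining 4*sqrt(N) times the probability that the winner would have lost; the extra multiplier applied to new accounts is ignored. Every change scales with sqrt(N), so dividing by sqrt(ml) puts all lengths on the same footing, and a favorite who wins one long match but loses two short ones can end up with a positive tg and a negative tga.

```python
import math


def favorite_rating_change(diff: float, length: int, favorite_won: bool) -> float:
    """Favorite's rating change under the commonly quoted FIBS formula.

    Assumption: the experience ("ramp-up") multiplier for new players is ignored.
    """
    p_upset = 1.0 / (10 ** (diff * math.sqrt(length) / 2000) + 1)
    k = 4 * math.sqrt(length)
    # The winner gains k times the probability that the winner would have lost.
    return k * p_upset if favorite_won else -k * (1 - p_upset)


# Favorite rated 5 points higher: wins a 9-pointer, loses two 1-pointers.
results = [(9, True), (1, False), (1, False)]
changes = [(n, favorite_rating_change(5, n, won)) for n, won in results]
tg = sum(c for _, c in changes)
tga = sum(c / math.sqrt(n) for n, c in changes)
print(f"tg = {tg:+.2f}, tga = {tga:+.2f}")  # tg = +1.94, tga = -2.03
```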
Food for thought...
Further interpretations and suggestions for other kinds of summaries are
welcome. As soon as I have the time, I intend to do some analysis of
checker plays and gammon rates at certain match scores.
Enjoy,
Peter Fankhauser