David Montgomery (monty@cs.umd.edu) wrote:
> +24-23-22-21-20-19-+---+18-17-16-15-14-13-+
> | X X O O O | | X O |
> | O O O | | X O |
> | | | O |
> | | | O |
> | | | |
> | | | | [1]
> | | | |
> | | | O |
> | | | X O |
> | | | X O |
> | X X X | | X O |
> | X X X | | X X O |
> +-1--2--3--4--5--6-+---+-7--8--9-10-11-12-+
> Money game. X to play 6-3.
>
> Thanks for all the responses to this problem.
> This is a position from _Costa Rica 1993_.
> Wilcox Snellings played 22/13.
>
> My preference before seeing any rollouts
> or analysis was for 11/5 7/4. This was also
> the choice of Herb Gurland, a top Boston player.
> The authors of _Costa Rica 1993_ also preferred
> 11/5 7/4.
>
> Wilcox Snellings rolled two plays out by hand
> 108 times with the following results:
>
> 11/5 7/4 -.42
> 22/13 -.50
>
> The authors rolled another play out 108 times:
>
> 11/8 24/18 -.50
>
> I rolled all of these plays out (and several others)
> 3888 times on Jellyfish (no truncation, duplicate
> dice, 3 sets of 1296 with seeds 2430, 2431, and 2432).
>
> Jellyfish cubeless equities:
>
> 22/13 -.363
> 22/16 11/8 -.405
> 22/16 7/4 -.416
> 24/18 11/8 -.446
> 11/5 7/4 -.449
> 24/18 7/4 -.456
> 24/15 -.458
>
> I wasn't really that surprised that 22/13 came out
> on top, although it wasn't the play that I would
> have made. But I was *very* surprised that it came
> out right by so much. This mistake actually costs
> about 2/10 of a point when the cube is figured in.
>
> I would welcome any further illumination on why
> 22/13 is so much better than the other plays,
> especially 11/5 7/4.
>
> David Montgomery
> monty on FIBS

While Jellyfish rollouts are usually accurate and quite informative,
occasionally they can give us wrong information. One of the dangers is
that the program is simply misplaying the position, and this affects one
of the plays being rolled out more than the other. Keep in mind that for
the rollout the program is playing at only 1-ply (the same as level 5).
This is necessary for speed -- using 2-ply would make the rollouts take
far longer. The program still plays pretty well at 1-ply, but not nearly
as well as at 2-ply, and is therefore more likely to be doing something
wrong in the play. Most of the time this will not matter (particularly
in play vs. play problems), since these errors tend to cancel out and
generally are not huge anyway. Occasionally, though, the two plays being
tested lead to different types of positions, where one play gives the
program a chance to make an error which the other play doesn't.
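
The cancelling argument above can be put in toy arithmetic. This is only a sketch in Python, not anything Jellyfish computes: it assumes the rollout player's errors simply subtract their average cost from the reported equity, and every number and name in it (`observed_equity`, `observed_margin`, the equities, the error chances) is hypothetical.

```python
# Toy model, not Jellyfish: every number below is hypothetical.

def observed_equity(true_equity, error_chance, error_cost):
    """Equity a rollout would report if the rollout player, in games
    following this play, errs with probability `error_chance` and each
    such error gives up `error_cost` points on average."""
    return true_equity - error_chance * error_cost


def observed_margin(play_a, play_b):
    """Reported equity of play A minus reported equity of play B.

    Each play is a (true_equity, error_chance, error_cost) tuple."""
    return observed_equity(*play_a) - observed_equity(*play_b)


if __name__ == "__main__":
    # Same error profile after both plays: the errors cancel in the margin.
    same = observed_margin((-0.40, 0.5, 0.10), (-0.42, 0.5, 0.10))
    # Errors arise only after play A: the margin shifts and here flips sign.
    skewed = observed_margin((-0.40, 0.5, 0.10), (-0.42, 0.0, 0.10))
    print(f"errors after both plays:  A - B = {same:+.3f}")
    print(f"errors only after play A: A - B = {skewed:+.3f}")
```

With these made-up inputs, play A keeps its small lead when both plays lead to the same kind of error, and falls behind play B when only A's positions invite the error.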

When I saw David's results, I thought this might be happening. I thought
after playing 11/5, 7/4 the program might be making the defensive three
point if it rolled a two. I also thought this might be the wrong
strategy -- hanging back on the ace point with the back man and springing
the other checker could be better. So, I decided to run a test. I had X
play 11/5, 7/4 with the 6-3, and gave O a 6-1 (played 13/6). This left
the following position:

13 14 15 16 17 18 19 20 21 22 23 24
+------------------------------------------+
| O X | | O O O X X |
| O X | | O O O |
| O | | O |
| O | | |
| | | |
| | | |
| O | | |
| O X | | X X |
| O X | | X X X |
| O X | | X X X |
+------------------------------------------+
12 11 10 9 8 7 6 5 4 3 2 1

Now I gave X a 4-2 to play, and looked at Jellyfish's 1-ply opinion. I
also rolled out the three logical plays 2952 times each, duplicate dice.
These were the results:

Play            1-ply    Rollout

24/22, 7/3      -.428    -.514
7/3, 5/3        -.501    -.486
22/18, 5/3      -.510    -.412
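
The comparison in the table can be restated as a small calculation: take the play the 1-ply evaluation ranks highest, and look up what the rollout says that play gives up against the rollout's own best play. Here is a minimal Python sketch using only the numbers from the table. The names `results`, `best_by`, and `cost_of_1ply_choice` are made up for illustration and are not part of Jellyfish.

```python
# Equities for X's three candidate 4-2 plays, copied from the table above.
# Each entry is (1-ply evaluation, rollout result).
results = {
    "24/22, 7/3": (-0.428, -0.514),
    "7/3, 5/3": (-0.501, -0.486),
    "22/18, 5/3": (-0.510, -0.412),
}

ONE_PLY, ROLLOUT = 0, 1


def best_by(column):
    """Return the play with the highest equity in the given column."""
    return max(results, key=lambda play: results[play][column])


def cost_of_1ply_choice():
    """Rollout equity given up by making the play that 1-ply prefers."""
    chosen = best_by(ONE_PLY)
    best = best_by(ROLLOUT)
    return round(results[best][ROLLOUT] - results[chosen][ROLLOUT], 3)


if __name__ == "__main__":
    print("1-ply picks:  ", best_by(ONE_PLY))
    print("rollout picks:", best_by(ROLLOUT))
    print("cost of the 1-ply play per the rollout:", cost_of_1ply_choice())
```

On these figures the play that 1-ply ranks first is the one the rollout ranks last, about a tenth of a point behind 22/18, 5/3.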

These results confirmed my suspicions. Jellyfish was thematically
misplaying the position in its rollouts after playing 11/5, 7/4 with the
original 6-3. After playing 22/13 with the 6-3, however, the program
didn't have the opportunity to make this sort of misplay: there was no
way to make the 22 point, so there was no incentive to move the back
checker. This misplay might be sufficient to turn the rollout results of
the 6-3 around, and certainly explains why 22/13 came out so much better
than 11/5, 7/4 in David's rollout.

Any time you are suspicious about the results of a rollout, it is vital
to examine how the program is playing at least the next couple of rolls
before accepting the results of the rollout as gospel. The rollouts are
good, but we still have to keep our eyes open or we may fall into some
unexpected traps.

Kit