Using rollouts

From:   Michael J. Zehr
Address:   michaelz@michaelz.com
Date:   14 October 1998
Subject:   Re: Rollouts
Forum:   rec.games.backgammon
Google:   362557C5.86D36F13@michaelz.com

Andrew Bokelman wrote:
> ... let me ask you this. Given the position below, how would you roll
> out what Snowie listed as the top three choices.
>
> --------------------------------------------------------------------
> | AndrewB (X) vs. Snowie (O) |
> | 9 point Match |
> --------------------------------------------------------------------
>
> Match to 9. Score X-O: 3-7
>
> -------------------------- Move 16 O -------------------------
> O to play (5 1)
> +-1--2--3--4--5--6--------7--8--9-10-11-12-+
> | O X O O O | | O O O |
> | X O O O | | O O O |
> | X | | O | S
> | | | | n
> | | | | o
> | |BAR| | w
> | 6 | O | | i
> | X X | | | e
> | X X | | |
> | X X X | | |
> | X X X | | |
> +24-23-22-21-20-19-------18-17-16-15-14-13-+
> Pipcount X: 88 O: 113 X-O: 3-7/9 (6)
> CubeValue: 1
>
> * 1.  2  bar/20 7/6   1.028
>   2.  2  bar/20 9/8   0.953 (-0.076)
>   3.  2  bar/19       0.943 (-0.086)
>   4.  1  bar/20 8/7   0.872 (-0.157)
>   5.  1  bar/20 6/5   0.839 (-0.189)

The first thing I always do in analyzing a position is to get the best evaluation possible with the tool, and only then decide if it needs rollouts. If I have a second tool in my toolbox (Jellyfish), I usually use that as well:

                    S 3-ply   J L7
  1. bar/20 7/6      .991     .862
  2. bar/20 9/8      .989     .883
  3. bar/19          .976     .888
  4. bar/20 6/5      .911     .744
  5. bar/20 8/7      .888     .800
  6. bar/20 5/4      .873     .796

Most of the time I'd call these differences too small to be worth doing a rollout. However this position is different because there are two very different strategies: come home safely and break points eventually; or voluntarily break the prime and try to force X off the anchor, hoping to close out 1-4 checkers eventually. (In other words, you gave us a great sample position!)

This might make a good reference position for these strategies since the evaluations are close. (The rollouts might show huge differences however.) But we have to be aware that rollouts are not always the best tool for such analysis.
If one strategy is better than another and the neural nets evaluate it incorrectly, the best we can do is force a neural net to make the desired play on the first move. For example, Jellyfish might remake the 7 point on the next play when we roll out bar/20 7/6, and Snowie might break the 7 point next turn when we roll out bar/20 9/8.

The next step in the analysis would be to pick one play from each of the candidate strategies. Later I might want to compare the choices for each strategy, but I can do that after determining which strategy I want to examine more closely. Just eyeballing the plays I'd choose bar/20 9/8 and bar/20 7/6.

(If "eyeballing" isn't a good enough justification, consider these reasons: 9/8 prepares to eventually break the 9 without having to break a point with a 6, so that builder is in a better position on the 8 than the 9. That's why I'd prefer that to bar/19. Either one leaves one slightly awkward roll (55 or 66). For the other choice, I wouldn't break the 5 or 6 because I want those for when I hit X. Given that I don't want to break the 5 or 6 but I'm willing to break an outside point, 7/6 gives me an extra builder. Furthermore 7/6 gives a chance to hit a 4th blot on a 42, whereas 51 doesn't expose a blot after 8/7. After 7/6 X can't get a blot to safety on a 44, but after 8/7 he can on a 55. (It might not be right to run a blot all the way, but unless I'm positive it's wrong, why give X that choice?))

As part of analyzing this I note that the 2-ply and 3-ply evaluations show a big change. Because the ordering of plays is the same, a 2-ply rollout is probably okay, but I make a mental note to consider a 3-ply rollout later.

And since I'm still in the early stages of deciding how to do this, I'll do some short rollouts first before doing the longer ones. (I agree with Chuck's statements about 1296 being a minimum length rollout for any serious analysis. But I believe the real art and science of doing rollouts is interpreting the results afterwards.
Anyone can hit a problem over the head with a 1296-pound hammer, but knowing whether you have data or information (i.e. whether the results are only precise or if they're accurate) is much more difficult and requires a lot of experience.)

I'll start with short (216 game) rollouts just to get a feel for whether I'm getting anything useful out of the computer before running an overnight job:

  JF L6
  pure:    .950 (0.5 18.3 88.3 11.7 0.3 0.0) std dev .014
  impure:  .915 (0.4 20.1 85.8 14.2 0.5 0.0) std dev .016

If I were using only Jellyfish I'd have to do my own analysis of how the cube changes things (since JF doesn't have true live cube rollouts). With the pure play JF says O is too good to double, but no longer has a double after 7/6 and X rolling a 4. Since after the pure play the equity will change very slowly, O is likely to be able to cash a fair number of the 11.7% losses. But after the impure play, many of the 14.2% losses come after X rolls an immediate 4, and now it's X that gets the benefit of the cube. So I would expect to see the pure play be that much more ahead of the impure play with the cube live.

Since the difference in equity between the two plays is .035, which is greater than the sum of the std devs (.030), I don't expect to see the order of the plays change after longer rollouts (Chuck could tell you exactly how likely it would be for the order to change), but I'll do a JF 1296-game rollout anyway.

Meanwhile I'll check the short Snowie rollout (2-ply, large search space):

  pure:    1.069 (1.4 20.9 87.4 12.6 0.9 0.0)
  impure:  1.073 (2.2 23.4 86.0 14.0 1.6 0.0)

Snowie agrees with Jellyfish that O is too good to double before rolling the 51, but thinks O has a double/pass after the impure play followed by X rolling any 4 but 44 or 42 (no double after 44, too good after 42). This means we can expect the cubeful results to be quite different from the cubeless ones.
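
The comparison of the 216-game Jellyfish results above (difference in equity versus the standard deviations) can be made a little more concrete. The following is only a rough sketch, not something from the original post: it assumes the reported "std dev" is the standard error of each rollout's mean equity, that the two rollouts are independent, and that the means are approximately normal. The function name `order_flip_probability` is made up for illustration.

```python
import math

def order_flip_probability(eq_a, sd_a, eq_b, sd_b):
    """Rough chance that the true order of two rollout results is reversed.

    Assumptions (not from the post): sd_a and sd_b are standard errors of
    the mean equities, the rollouts are independent, and the means are
    approximately normally distributed.
    """
    diff = abs(eq_a - eq_b)
    # Standard error of the difference of two independent means.
    combined_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    z = diff / combined_sd
    # One-sided normal tail probability P(Z > z).
    p_flip = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_flip

# JF L6, 216 games: pure .950 (std dev .014), impure .915 (std dev .016)
z, p = order_flip_probability(0.950, 0.014, 0.915, 0.016)
print(f"z = {z:.2f}, rough chance the order is wrong = {p:.1%}")
```

Under these assumptions the difference is about 1.6 combined standard deviations, which corresponds to roughly a 5% chance that the order is wrong. Rollouts that share dice sequences are not independent, so the real figure could differ.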
Back to the JF results of 1296 games:

  pure:    .923 (0.4 17.6 87.4 12.6 0.5 0.0) std dev .006
  impure:  .907 (0.5 19.5 85.7 14.3 0.8 0.0) std dev .006

The difference in the plays is still only a little bit greater than the sum of the standard deviations, so the chances that an even longer rollout would change the order haven't gone down much.

Before we do level 5 rollouts with settlements we need to do some thinking with the cube to set the settlement limits, which might be different for each side, complicating things. But you can also do an L5 rollout and check the cubeless results against the L6. If they disagree, then no matter where you set the settlement limit the results are going to be suspect.

  JF L5, 1296 games:
  pure:    .903 (0.5 15.9 87.1 12.9 0.4 0.0) std dev .023
  impure:  .846 (0.5 15.8 84.5 15.5 0.7 0.0) std dev .025

From this we can conclude that Jellyfish level 5 plays the pure position pretty well, but the impure position is harder to play. (That's not too surprising -- the choices are much more complicated after breaking the prime.) Because we can't get a reliable result out of L5 for the impure play, it isn't worth figuring out the correct settlement limits (other than as an example for other problems where L5 plays well enough).

Before starting a long Snowie rollout we need to decide if 2-ply is adequate or if we should be using 3-ply. We can check this by doing a shorter 3-ply rollout (36 games) and comparing that to the 216 game 2-ply rollout:

  pure:    1.056 (2.1 18.4 88.6 11.4 0.7 0.0)
  impure:   .997 (1.4 20.1 85.1 14.9 2.3 0.1)

This is a bit surprising because we expect the impure play to be harder to play and hence gain for O when played 3-ply instead of 2-ply. However that might just be the shorter rollout, or it might be something else -- the correct play might be to play pure now and then break the 7 point next turn when the extra builder is placed better.
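
For readers unfamiliar with the six bracketed numbers in the Jellyfish results, here is a small sketch of how they appear to relate to the quoted equity. The field order used here (backgammon wins, gammon wins, wins, losses, gammon losses, backgammon losses, all cumulative percentages) is inferred from the figures and is an assumption, not something stated in the post; `cubeless_equity` is an illustrative name.

```python
def cubeless_equity(bg_win, g_win, win, loss, g_loss, bg_loss):
    """Cubeless money equity from cumulative percentages.

    Assumed field order (inferred, not stated in the post): backgammon
    wins, gammon-or-better wins, all wins, all losses, gammon-or-better
    losses, backgammon losses.
    """
    return (win - loss + g_win - g_loss + bg_win - bg_loss) / 100.0

# Reported equity and percentages from the 1296-game Jellyfish rollouts.
jf_results = {
    "L6 pure":   (0.923, (0.4, 17.6, 87.4, 12.6, 0.5, 0.0)),
    "L6 impure": (0.907, (0.5, 19.5, 85.7, 14.3, 0.8, 0.0)),
    "L5 pure":   (0.903, (0.5, 15.9, 87.1, 12.9, 0.4, 0.0)),
    "L5 impure": (0.846, (0.5, 15.8, 84.5, 15.5, 0.7, 0.0)),
}

for label, (reported, probs) in jf_results.items():
    computed = cubeless_equity(*probs)
    print(f"{label}: reported {reported:.3f}, computed {computed:.3f}")
```

The computed values agree with the reported Jellyfish equities to within rounding. The Snowie equities quoted in this post do not follow this formula, presumably because they take the match score into account.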
It would take a long time to get reliable 3-ply cubeful results, so the next step is to do a 1296 game 2-ply cubeful rollout. Depending on the results of that we might do more analysis at 3-ply. Stay tuned for the next part....

Michael J. Zehr writes:

The next step is to look at the 2-ply cubeful results. (As described in the first post, having a live cube is likely to make a big difference. In general O can take risks to increase gammon chances provided O still has a cash if the risk doesn't pay off. So with a live cube, O can make a play that decreases cubeless equity but increases cubeful equity.)

  Snowie 2-ply, 1296 games
  pure:    89.85% match winning chances
  impure:  89.93% match winning chances

Because the live cube increases variance, these values are too close to be statistically significant. (By way of reference, Snowie has a value of 89.1% for O winning 1 point (and of course 100% for winning 2 points), so 89.85 is like winning 1 point and winning .75/10.9 = .07 of a second point, for an equivalent equity of 1.07. 89.93 would be equivalent to 1.08. But standard deviation on 1296 games with a cube is greater than .01.)

We have already seen that the 2-ply and 3-ply evaluations are quite different, so the picture might change if we do 3-ply rollouts:

  Snowie 3-ply, 1296 games
  pure:    89.85% mwc
  impure:  90.06% mwc

Now we have equivalent equities of 1.07 and 1.09, but that isn't statistically significant either.

The simple conclusion to draw is that there isn't much of a difference between the strategies. However this is where one has to delve deeper into computer results to ferret out what is really happening. The strategies we're comparing are keeping the prime and playing purely vs. trying to force X off the anchor to win a gammon. If you look at how Snowie plays the position, it often breaks the 7 point on the second play. So we're not comparing two different strategies. We're comparing trying to force X off the anchor this turn vs. waiting until next turn. That's part of the explanation for why the plays are so close in equity.

With computer rollouts, not only is it important to have a consistent method for doing the rollouts, but it's also important to know how to interpret the results.
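
The conversion from match winning chances to an equivalent equity used earlier in this post is a linear interpolation between the value of winning 1 point (89.1%) and winning 2 points (100%). A minimal sketch of that arithmetic follows; the function and parameter names are illustrative, and it only makes sense for values between those two reference points.

```python
def equivalent_equity(mwc, mwc_win_1=89.1, mwc_win_2=100.0):
    """Convert a match winning chance (percent) to an equivalent equity.

    Linear interpolation between the mwc after winning 1 point and the
    mwc after winning 2 points, as described in the post. Only meaningful
    for mwc values between those two reference points.
    """
    return 1.0 + (mwc - mwc_win_1) / (mwc_win_2 - mwc_win_1)

rollouts = [
    ("2-ply pure", 89.85),
    ("2-ply impure", 89.93),
    ("3-ply pure", 89.85),
    ("3-ply impure", 90.06),
]

for label, mwc in rollouts:
    print(f"{label}: {mwc:.2f}% mwc = equity {equivalent_equity(mwc):.2f}")
```

This reproduces the equivalent equities quoted in the post: 1.07 and 1.08 for the 2-ply rollouts, 1.07 and 1.09 for the 3-ply rollouts.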
One of the real challenges in doing computer rollouts is to convince the computer to try one strategy over another and follow through on that strategy for enough moves that you can get a valid comparison. In a position like this if the computer's evaluation picks 7/6 over 9/8, then rolling out the 9/8 position only forces the computer to keep the prime for one turn, and then it will break the prime anyway. On the other hand if the computer's evaluation picks 9/8 over 7/6 then you can get a better result. One third of the time the computer is forced into playing on with a broken prime, though after a miss by X it might try to remake the prime.

Five years ago Kent Goulding wrote that computers weren't going to replace human analysis any time soon, and despite the huge advances in neural net technology since then, it's still true. Computers can give us lots of data, but it still takes humans to turn that into information.

-Michael J. Zehr
