Snowie

Using rollouts

From:   Michael J. Zehr
Address:   michaelz@michaelz.com
Date:   14 October 1998
Subject:   Re: Rollouts
Forum:   rec.games.backgammon
Google:   362557C5.86D36F13@michaelz.com

Andrew Bokelman wrote:
> ... let me ask you this.  Given the position below, how would you roll
> out what Snowie listed as the top three choices.
>
> --------------------------------------------------------------------
> |                    AndrewB (X) vs. Snowie (O)                    |
> |                          9 point Match                           |
> --------------------------------------------------------------------
>
> Match to 9. Score X-O: 3-7
>
>   -------------------------- Move 16 O -------------------------
>           O to play (5 1)
>           +-1--2--3--4--5--6--------7--8--9-10-11-12-+
>           | O     X  O  O  O |   |  O  O  O          |
>           |       X  O  O  O |   |  O  O  O          |
>           |       X          |   |        O          | S
>           |                  |   |                   | n
>           |                  |   |                   | o
>           |                  |BAR|                   | w
>           |    6             | O |                   | i
>           | X  X             |   |                   | e
>           | X  X             |   |                   |
>           | X  X  X          |   |                   |
>           | X  X  X          |   |                   |
>           +24-23-22-21-20-19-------18-17-16-15-14-13-+
>           Pipcount  X:  88  O: 113  X-O: 3-7/9 (6)
>           CubeValue:  1
>
>         * 1. 2  bar/20 7/6                         1.028
>           2. 2  bar/20 9/8                         0.953 (-0.076)
>           3. 2  bar/19                             0.943 (-0.086)
>           4. 1  bar/20 8/7                         0.872 (-0.157)
>           5. 1  bar/20 6/5                         0.839 (-0.189)

The first thing I always do in analyzing a position is to get the best
evaluation possible with the tool, and only then decide if it needs
rollouts.  If I have a second tool in my toolbox (Jellyfish), I
usually use that as well:

               Snowie  Jellyfish
               3-ply   level 7
1. bar/20 7/6  .991    .862
2. bar/20 9/8  .989    .883
3. bar/19      .976    .888
4. bar/20 6/5  .911    .744
5. bar/20 8/7  .888    .800
6. bar/20 5/4  .873    .796

Most of the time I'd call these differences too small to be worth doing
a rollout.  However this position is different because there are two
very different strategies: come home safely and break points eventually;
or voluntarily break the prime and try to force X off the anchor, hoping
to close out 1-4 checkers eventually.  (In other words, you gave us a
great sample position!)

This might make a good reference position for these strategies since the
evaluations are close.  (The rollouts might show huge differences
however.)  But we have to be aware that rollouts are not always the best
tool for such analysis.  If one strategy is better than another and the
neural nets evaluate it incorrectly, the best we can do is force a
neural net to make the desired play on the first move.  For example,
Jellyfish might remake the 7 point on the next play when we roll out
bar/20 7/6, and Snowie might break the 7 point next turn when we roll
out bar/20 9/8.

Next step in the analysis would be to pick one play from each of the
candidate strategies.  Later I might want to compare the choices for
each strategy, but I can do that after determining which strategy I want
to examine more closely.  Just eyeballing the plays I'd choose bar/20
9/8 and bar/20 7/6.  (If "eyeballing" isn't a good enough justification,
consider these reasons: 9/8 prepares to eventually break the 9 without
having to break a point with a 6, so that builder is in a better
position on the 8 than the 9.  That's why I'd prefer that to bar/19.
Either one leaves one slightly awkward roll (55 or 66).  For the other
choice, I wouldn't break the 5 or 6 because I want those for when I hit
X.  Given that I don't want to break the 5 or 6 but I'm willing to break
an outside point, 7/6 gives me an extra builder.  Furthermore 7/6 gives
a chance to hit a 4th blot on a 42, whereas 51 doesn't expose a blot
after 8/7.  After 7/6 X can't get a blot to safety on a 44, but after
8/7 he can on a 55.  (It might not be right to run a blot all the way,
but unless I'm positive it's wrong, why give X that choice?))

As part of analyzing this I note that the 2-ply and 3-ply evaluations
show a big change.  Because the ordering of plays is the same, a 2-ply
rollout is probably okay, but I make a mental note to consider a 3-ply
rollout later.  And since I'm still in the early stages of deciding how
to do this I'll do some short rollouts first before doing the longer
ones.  (I agree with Chuck's statements about 1296 being a minimum
length rollout for any serious analysis.  But I believe the real art and
science of doing rollouts is interpreting the results afterwards.
Anyone can hit a problem over the head with a 1296-pound hammer, but
knowing whether you have data or information (i.e. whether the results
are only precise or if they're accurate) is much more difficult and
requires a lot of experience.)

I'll start with short (216 game) rollouts just to get a feel for whether
I'm getting anything useful out of the computer before running an
overnight job:

JF L6 pure:   .950 (0.5  18.3  88.3  11.7  0.3  0.0) std dev .014
JF L6 impure: .915 (0.4  20.1  85.8  14.2  0.5  0.0) std dev .016
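
The equity figures on these Jellyfish lines can be reproduced from the
six percentages.  The sketch below rests on an assumption inferred from
the numbers rather than stated in the post: that the columns are O's
backgammons, gammons and wins, followed by X's wins, gammons and
backgammons, with each gammon figure already including backgammons.
The function name is made up for illustration.

```python
def cubeless_equity(bg_w, g_w, w, l, g_l, bg_l):
    """Cubeless money equity from six percentages.

    Assumed column order: backgammons, gammons, wins for the side on roll,
    then wins, gammons, backgammons for the opponent.  Gammon figures are
    assumed to include backgammons.
    """
    # One point per win, one extra per gammon, one more per backgammon.
    return ((w - l) + (g_w - g_l) + (bg_w - bg_l)) / 100.0


pure = cubeless_equity(0.5, 18.3, 88.3, 11.7, 0.3, 0.0)
impure = cubeless_equity(0.4, 20.1, 85.8, 14.2, 0.5, 0.0)
print(f"pure {pure:.3f}  impure {impure:.3f}")
```

Both values land within about .001 of the quoted .950 and .915, which
is consistent with rounding in the displayed percentages.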

If I were using only Jellyfish I'd have to do my own analysis of how
the cube changes things (since JF doesn't have true live cube
rollouts).  With the pure play JF says O is too good to double, but no
longer has a double after 7/6 and X rolling a 4.  Since after the pure
play the equity will change very slowly, O is likely to be able to cash
a fair number of the 11.7% losses.  But after the impure play, many of
the 14.2% losses come after X rolls an immediate 4, and now it's X that
gets benefit of the cube.  So I would expect to see the pure play be
that much more ahead of the impure play with the cube live.

Since the difference in equity between the two plays is .035, which is
greater than the sum of the std devs (.030), I don't expect to see the
order of the plays change after longer rollouts (Chuck could tell you
exactly how likely it would be for the order to change), but I'll do a
JF 1296-game rollout anyway.
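
One rough way to put a number on how likely the order is to change is
sketched below.  It assumes the two rollouts are independent, that
their means are approximately normal, and that the reported "std dev"
is the standard error of each mean; none of this is stated by
Jellyfish here, and the function name is hypothetical.  Under those
assumptions the errors combine in quadrature, which is a little less
conservative than simply adding the two standard deviations.

```python
import math


def prob_order_reversed(eq_a, sd_a, eq_b, sd_b):
    """Approximate chance that the play with the higher rollout equity
    is really the worse one.

    Assumes independent, normally distributed rollout means whose
    reported std dev is the standard error of the mean.
    """
    diff = abs(eq_a - eq_b)
    se_diff = math.sqrt(sd_a ** 2 + sd_b ** 2)  # errors add in quadrature
    z = diff / se_diff
    # Lower tail of the standard normal distribution, Phi(-z).
    return 0.5 * math.erfc(z / math.sqrt(2))


p = prob_order_reversed(0.950, 0.014, 0.915, 0.016)
print(f"{p:.3f}")
```

For the 216-game figures above this comes out at roughly one chance in
twenty, so the ordering is suggestive but far from settled.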

Meanwhile I'll check the short Snowie rollout (2-ply, large search
space):

pure:   1.069  (1.4 20.9 87.4 12.6 0.9 0.0)
impure: 1.073  (2.2 23.4 86.0 14.0 1.6 0.0)

Snowie agrees with Jellyfish that O is too good to double before rolling
the 51, but thinks O has a double/pass after the impure play followed by
X rolling any 4 but 44 or 42 (no double after 44, too good after 42).
This means we can expect the cubeful results to be quite different from
the cubeless ones.

Back to the JF results of 1296 games:
pure:   .923  (0.4 17.6 87.4 12.6 0.5 0.0) std dev .006
impure: .907  (0.5 19.5 85.7 14.3 0.8 0.0) std dev .006


The difference in the plays is still only a little bit greater than the
sum of the standard deviations, so the chances that an even longer
rollout would change the order haven't gone down much.
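
Going from 216 to 1296 games is six times as many games, so the
standard deviation should shrink by about sqrt(6), or 2.45, and the
drop from .014 to .006 fits that.  The same scaling gives a rough
estimate of how long a rollout would have to be before a difference of
this size is comfortably clear of the noise.  The sketch below assumes
the standard error falls as 1/sqrt(games) and that the observed
difference stays put, which a longer rollout may well contradict; the
function name and the 3-standard-error target are illustrative choices.

```python
import math


def games_needed(n_now, sd_a, sd_b, diff, z_target):
    """Games per play needed for `diff` to span `z_target` standard errors.

    Assumes each rollout's standard error shrinks as 1/sqrt(games) and
    that the observed equity difference does not change.
    """
    se_diff_now = math.sqrt(sd_a ** 2 + sd_b ** 2)
    z_now = diff / se_diff_now
    return math.ceil(n_now * (z_target / z_now) ** 2)


# 1296-game JF results: .923 vs .907, std dev .006 each.
n = games_needed(1296, 0.006, 0.006, 0.923 - 0.907, z_target=3.0)
print(n)
```

With these inputs the answer is a bit over 3000 games per play, about
two and a half times the 1296 already rolled out.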

Before we do level 5 rollouts with settlements we need to do some
thinking with the cube to set the settlement limits, which might be
different for each side, complicating things.  But you can also do an L5
rollout and check the cubeless results against the L6.  If they disagree
then no matter where you set the settlement limits, the results are
going to be suspect.

JF L5, 1296 games:
pure:    .903  (0.5 15.9 87.1 12.9 0.4 0.0) std dev .023
impure:  .846  (0.5 15.8 84.5 15.5 0.7 0.0) std dev .025

From this we can conclude that Jellyfish level 5 plays the pure position
pretty well, but the impure position is harder to play.  (That's not too
surprising -- the choices are much more complicated after breaking the
prime.)

Because we can't get a reliable result out of L5 for the impure play, it
isn't worth figuring out the correct settlement limits (other than as an
example for other problems where L5 plays well enough).

Before starting a long Snowie rollout we need to decide if 2-ply is
adequate or if we should be using 3-ply.  We can check this by doing a
shorter 3-ply rollout (36 games) and comparing that to the 216 game
2-ply rollout:

pure:   1.056  (2.1 18.4 88.6 11.4 0.7 0.0)
impure:  .997  (1.4 20.1 85.1 14.9 2.3 0.1)

This is a bit surprising because we expect the impure play to be harder
to play and hence gain for O when played 3-ply instead of 2-ply.
However that might just be the shorter rollout, or it might be something
else -- the correct play might be to play pure now and then break the 7
point next turn when the extra builder is placed better.

It would take a long time to get reliable 3-ply cubeful results, so the
next step is to do a 1296 game 2-ply cubeful rollout.  Depending on the
results of that we might do more analysis at 3-ply.

Stay tuned for the next part....

Michael J. Zehr  writes:

The next step is to look at the 2-ply cubeful results.  (As described in
the first post, having a live cube is likely to make a big difference.
In general O can take risks to increase gammon chance provided O still
has a cash if the risk doesn't pay off.  So with a live cube, O can make
a play that decreases cubeless equity but increases cubeful equity.)

Snowie 2-ply, 1296 games
pure:   89.85% match winning chances
impure: 89.93% match winning chances

Because the live cube increases variance, these values are too close to
be statistically significant.  (By way of reference, Snowie has a value
of 89.1% for O winning 1 point (and of course 100% for winning 2 points),
so 89.85 is like winning 1 point and winning .75/10.9 = .07 of a second
point, for an equivalent equity of 1.07.  89.93 would be equivalent to
1.08.  But standard deviation on 1296 games with a cube is greater than
.01.)
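
The conversion in the parenthetical above can be written out as a short
sketch.  The 89.1% and 100% reference values are the ones quoted from
Snowie; the function and parameter names are made up for illustration,
and the linear interpolation between "win 1 point" and "win 2 points"
is simply the arithmetic used in the text.

```python
def equivalent_equity(mwc, mwc_win_1pt=89.1, mwc_win_2pt=100.0):
    """Map match winning chances (percent) onto a 1-point/2-point scale."""
    fraction_of_second_point = (mwc - mwc_win_1pt) / (mwc_win_2pt - mwc_win_1pt)
    return 1.0 + fraction_of_second_point


for mwc in (89.85, 89.93):
    print(mwc, round(equivalent_equity(mwc), 2))
```

This reproduces the 1.07 and 1.08 figures for the pure and impure
plays.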

We have already seen that the 2-ply and 3-ply evaluations are quite
different, so the picture might change if we do 3-ply rollouts:

Snowie 3-ply, 1296 games
pure:   89.85% mwc
impure: 90.06% mwc

Now we have equivalent equities of 1.07 and 1.09, but that isn't
statistically significant either.

The simple conclusion to draw is that there isn't much of a difference
between the strategies.  However this is where one has to delve deeper
into computer results to ferret out what is really happening.  The
strategies we're comparing are keeping the prime and playing purely
vs. trying to force X off the anchor to win a gammon.

If you look at how Snowie plays the position, it often breaks the 7 point
on the second play.  So we're not comparing two different strategies.
We're comparing trying to force X off the anchor this turn vs. waiting
until next turn.  That's part of the explanation for why the plays are
so close in equity.


With computer rollouts, not only is it important to have a consistent
method for doing the rollouts, but it's also important to know how to
interpret the results.  One of the real challenges in doing computer
rollouts is to convince the computer to try one strategy over another
and follow through on that strategy for enough moves that you can get a
valid comparison.

In a position like this if the computer's evaluation picks 7/6 over
9/8, then rolling out the 9/8 position only forces the computer to keep
the prime for one turn, and then it will break the prime anyway.  On the
other hand if the computer's evaluation picks 9/8 over 7/6 then you can
get a better result.  One third of the time the computer is forced into
playing on with a broken prime, though after a miss by X it might try to
remake the prime.

Five years ago Kent Goulding wrote that computers weren't going to
replace human analysis any time soon, and despite the huge advances in
neural net technology since then, it's still true.  Computers can give
us lots of data, but it still takes humans to turn that into
information.

-Michael J. Zehr
 