Error rate--Why count forced moves?

From:   Ian Shaw
Address:   ian.shaw@iee.org
Date:   28 April 2009
Subject:   Scale of Open/Intermediate Players
Forum:   BGonline.org Forums

> GNU reports error rate far more sensibly than Snowie does. It reports
> your average ER in milli-EMG points per nonforced move.

Which method is better is a complex question. Gnubg's method seems
more intuitive to most people. However, Douglas Zare wrote a GammonVillage
article some while ago explaining why he thought the Snowie method was
superior. I don't recall his reasons, but I have a lot of respect for his
opinions.

http://www.bkgm.com/articles/Zare/NormalizingErrors/

Jason Lee  writes:

    D. Zare: "Another problem with gnu's method is that your error rate can
    be ridiculously high if you encounter few unforced decisions in a
    match. This happens frequently with the cube decisions, which makes
    gnu's cube error rate unreliable. A high error rate often indicates
    that someone had few decisions rather than that the player lost a lot
    of equity. In fact, a low cube error rate often means the player had no
    real decisions, but gnu thought the player had many unforced cube
    decisions."

Alright, technically, Doug is right.

Here's my counter argument: Please name for me the last time you heard
somebody report to you their ER on cubes in a particular match. That's
right, you've never heard it.

I keep tabs on my ER in aggregate -- so I'm not measuring averages of
averages. I keep track of checker play errors, # of checker plays, cube
errors, # of cube decisions. Then in sum, I can determine my overall error
rate appropriately. Thus, it is irrelevant if I had a fat 0.086 cube error
in just one decision in the match -- those 86 millipoints per cube decision
are swallowed up in the sea of cube decisions.
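
To illustrate the difference between pooling the totals and averaging
per-match averages, here is a minimal sketch. The match records and the
function names are made up for the example; only the 0.086 single-decision
case comes from the post above.

```python
# Hypothetical cube records, one tuple per match:
# (total cube equity lost in points, number of unforced cube decisions)
matches = [
    (0.086, 1),   # one decision, one fat error (the case described above)
    (0.120, 12),
    (0.050, 10),
]


def mean_of_match_rates(records):
    """Average of per-match error rates (an 'average of averages')."""
    rates = [lost / n for lost, n in records if n > 0]
    return sum(rates) / len(rates)


def pooled_rate(records):
    """Total equity lost divided by total decisions across all matches."""
    total_lost = sum(lost for lost, _ in records)
    total_decisions = sum(n for _, n in records)
    return total_lost / total_decisions


naive = mean_of_match_rates(matches)
pooled = pooled_rate(matches)
print(f"mean of match rates: {1000 * naive:.1f} millipoints per decision")
print(f"pooled rate:         {1000 * pooled:.1f} millipoints per decision")
```

With these made-up numbers the one-decision match dominates the average of
averages, while in the pooled figure it counts as just one decision out of 23.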

    D. Zare: "If forced moves were to happen at random, and were not
    affected by previous plays, then considering only unforced moves would
    be a clear improvement. However, upon closer inspection, the
    distinction we would like to draw is not between forced and unforced
    moves. It is between trivial moves and nontrivial moves. You can't make
    an error while dancing, but you also have little real opportunity to
    make an error when you are moving your checkers around the board after
    getting closed out, when you have negligible chances to win or to get
    gammoned. It is also not a real opportunity to err if you roll an
    opening 3-1."

This problem is smoothed out in the aggregate as well. I've played over 30
practice matches against a colleague over the last year, and recorded them
all. We put them into GNU and check our error rates. Because I'm a stronger
player, I've gotten the lower ER every single time. In one memorable match,
we both got absurdly low (for us) ERs, but mine was still better. He
lamented that the one time he got a great error rate, I still outplayed
him. I pointed out that part of the reason was that the entire match was
played with fairly simple positions -- for example, every single game
happened to be played to conclusion, so we "benefited" from making lots of
racing checker plays where it's hard to err. Our ERs "lucked out" from
easy-to-play positions.

So the argument against GNU style error rates is that they can be
misleading -- error rates for a single match are misleading anyway, for the
reason I stated above and because there's a sample size problem!

GNU's ER formula is not perfect, but it specifically eliminates some of the
peculiarities of Snowie's ER.

Maik Stiebler  writes:

> So the argument against GNU style error rates is that they can be
> misleading -- error rates for a single match are misleading anyway, for
> the reason I stated above and because there's a sample size problem!

Agreed. But if we are talking about aggregate numbers, is there any
advantage to the gnubg method? I can't see that. Average gnubg error rates
should correlate very strongly with average Snowie error rates, unless we
are looking at players who are especially talented at avoiding or steering
into forced decisions.

If the evaluation were perfect, the gnubg method would give an unbiased
estimate of the equity given up per unforced play. The question is, why
would that be more helpful (as an indicator of skill, or more precisely:
successful play) than the equity given up per play? Steering into positions
that you know how to play well is part of your overall skill, and positions
with forced decisions are an extreme example of those. Or, as Zare puts it:

    "If you blitz your opponent too often at DMP and often achieve strong 5
    point boards and closeouts, then you will have many more unforced moves
    than your opponent will. It is possible to give up more equity overall,
    while gnu reports that you are giving up less equity per unforced
    move."

In this sense, the gnubg method is biased against playing styles that tend
to lead to more forced decisions. Those are more successful in the long run
than the gnubg method predicts.

Jason Lee  writes:

> Steering into positions that you know how to play well is part of your
> overall skill,

No argument there.

> and positions with forced decisions are an extreme example of those.

This might be true if a play with a forced decision is a good thing, but
many examples of forced decisions that come to mind generally are not a
good thing: dancing, trashing your board with some gawdawful double fours,
leaving double shots in bearoffs. When your decision is forced, you're
entirely at the whim of the dice, which don't think about the quality of
your position.

Rolling something for which there are many legal moves means (a) you're
going to have a harder time finding the best move, since there are so many
options, but (b) there is more likely to be a play that improves your
equity from where you were.

So, your premise is true only if, on average, rolling a number which has a
forced decision improves your equity.

Maik Stiebler  writes:

Alice and Bob play a long series of DMP matches. It turns out Bob is a
blitzer and makes Alice dance more often in the long run than the other way
round. Bob has a slightly higher Snowie style error rate, whereas Alice
has, due to her lower number of unforced decisions, the slightly higher
gnubg style error rate. Do you back Bob or Alice?
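
The Alice and Bob scenario can be made concrete with a small sketch. All
numbers below are invented, and "per move" is used as a simplified stand-in
for the Snowie-style normalization (equity given up per play, forced moves
included); it is not claimed to be Snowie's exact formula.

```python
from dataclasses import dataclass


@dataclass
class PlayerTotals:
    equity_lost: float    # total equity given up over the series (points)
    moves: int            # all moves, forced ones included
    unforced_moves: int   # moves where more than one legal play existed

    def per_move(self) -> float:
        """Equity lost per move, counting forced moves (simplified Snowie-style)."""
        return self.equity_lost / self.moves

    def per_unforced_move(self) -> float:
        """Equity lost per unforced move (gnubg-style)."""
        return self.equity_lost / self.unforced_moves


# Hypothetical totals: Bob blitzes, so Alice dances more often and has
# fewer unforced decisions over the same number of moves.
alice = PlayerTotals(equity_lost=7.0, moves=1000, unforced_moves=700)
bob = PlayerTotals(equity_lost=8.1, moves=1000, unforced_moves=900)

for name, p in (("Alice", alice), ("Bob", bob)):
    print(f"{name}: {1000 * p.per_move():.1f} per move, "
          f"{1000 * p.per_unforced_move():.1f} per unforced move (millipoints)")
```

Under these assumptions Bob gives up more equity in total and per move, yet
shows the lower rate per unforced move, which is the ranking reversal the
question is about.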

Jason Lee  writes:

I don't believe you've addressed your premise, which is that steering
towards positions with forced decisions is good. The most common forced
decision is being on the bar against a closed board. I've never heard of
any strategy discussions that have instructed me to try to get up on the
roof against a closed board under the premise that I can't err while
sitting on my duff.

Maik Stiebler  writes:

My premise is that steering towards positions with forced decisions is
better than the gnubg method says it is.

Jason Lee  writes:

I'm going to take "steering" to mean intentionally giving up equity to make
forced decisions later more likely. If you replace "forced" with "easy",
then I think this is a good thing. But if you insist on keeping "forced",
then I think you're giving up equity now in order to give up more equity
later.

Maik Stiebler  writes:

My point is: Steering (and even blundering) towards positions with forced
decisions is better than the gnubg method says it is. Which does not
necessarily mean it is good.
 