Backgammon Theory

Theory

Error rate--Why count forced moves?

From:   Ian Shaw
Address:   ian.shaw@iee.org
Date:   28 April 2009
Subject:   Scale of Open/Intermediate Players
Forum:   BGonline.org Forums

> GNU reports error rate far more sensibly than Snowie does. It reports
> your average ER in milli-EMG points per nonforced move.

The discussion of which method is better is complex. Gnubg's method seems more intuitive to most. However, Douglas Zare wrote a GammonVillage article some while ago explaining why he thought the Snowie method was superior. I don't recall his reasons, but I have a lot of respect for his opinions.

http://www.bkgm.com/articles/Zare/NormalizingErrors/

Jason Lee writes:

D. Zare: "Another problem with gnu's method is that your error rate can be ridiculously high if you encounter few unforced decisions in a match. This happens frequently with the cube decisions, which makes gnu's cube error rate unreliable. A high error rate often indicates that someone had few decisions rather than that the player lost a lot of equity. In fact, a low cube error rate often means the player had no real decisions, but gnu thought the player had many unforced cube decisions."

Alright, technically, Doug is right. Here's my counterargument: Please name for me the last time you heard somebody report to you their ER on cubes in a particular match. That's right, you've never heard it.

I keep tabs on my ER in aggregate -- so I'm not measuring averages of averages. I keep track of checker play errors, # of checker plays, cube errors, # of cube decisions. Then in sum, I can determine my overall error rate appropriately. Thus, it is irrelevant if I had a fat 0.086 cube error in just one decision in the match -- that 86 millipoints per cube decision is swallowed up in the sea of cube decisions.

D. Zare: "If forced moves were to happen at random, and were not affected by previous plays, then considering only unforced moves would be a clear improvement. However, upon closer inspection, the distinction we would like to draw is not between forced and unforced moves. It is between trivial moves and nontrivial moves. You can't make an error while dancing, but you also have little real opportunity to make an error when you are moving your checkers around the board after getting closed out, when you have negligible chances to win or to get gammoned. It is also not a real opportunity to err if you roll an opening 3-1."

This problem is smoothed out in the aggregate as well. I've played over 30 practice matches against a colleague over the last year, and recorded them all. We put them into GNU and check our error rates. Because I'm a stronger player, I've gotten the lower ER every single time.

In one memorable match, we both got absurdly low (for us) ERs, but mine was still better. He lamented that the one time he got a great error rate, I still outplayed him. I pointed out that part of the reason was that the entire match was played with fairly simple positions -- for example, every single game happened to be played to conclusion, so we "benefited" from making lots of racing checker plays where it's hard to err. Our ERs "lucked out" from easy-to-play positions.

So the argument against GNU style error rates is that they can be misleading -- error rates for a single match are misleading anyway, for the reason I stated above and because there's a sample size problem! GNU's ER formula is not perfect, but it specifically eliminates some of the peculiarities of Snowie's ER.
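The aggregate bookkeeping described above can be illustrated with a minimal sketch. It assumes a hypothetical `MatchRecord` tally per match and uses made-up numbers; it is not how gnubg or Snowie store or report their statistics. It only contrasts pooling the totals with averaging the per-match rates.

```python
from dataclasses import dataclass


@dataclass
class MatchRecord:
    """Hypothetical per-match tally for one kind of decision (e.g. cube)."""
    equity_lost: float  # total equity given up in the match
    decisions: int      # number of decisions counted in the match


def pooled_error_rate(matches):
    """Total equity lost divided by total decisions across all matches."""
    total_lost = sum(m.equity_lost for m in matches)
    total_decisions = sum(m.decisions for m in matches)
    return total_lost / total_decisions if total_decisions else 0.0


def mean_of_match_rates(matches):
    """Average of the per-match error rates (the 'average of averages')."""
    rates = [m.equity_lost / m.decisions for m in matches if m.decisions]
    return sum(rates) / len(rates) if rates else 0.0


# Made-up cube tallies: the first match had a single 0.086 cube error.
cube_matches = [
    MatchRecord(equity_lost=0.086, decisions=1),
    MatchRecord(equity_lost=0.050, decisions=10),
    MatchRecord(equity_lost=0.040, decisions=9),
]

pooled = pooled_error_rate(cube_matches)
averaged = mean_of_match_rates(cube_matches)
print(f"pooled: {pooled:.4f}  mean of match rates: {averaged:.4f}")
```

With these invented tallies, the one-decision match dominates the average of per-match rates but barely moves the pooled figure, which is the effect the post relies on.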

Maik Stiebler writes:

> So the argument against GNU style error rates is that they can be
> misleading -- error rates for a single match are misleading anyway, for
> the reason I stated above and because there's a sample size problem!

Agreed. But if we are talking about aggregate numbers, is there any advantage to the gnubg method? I can't see that. Average gnubg error rates should correlate very strongly with average Snowie error rates, unless we are looking at players who are especially talented at avoiding or steering into forced decisions.

If the evaluation were perfect, the gnubg method would give an unbiased estimate of the equity given up per unforced play. The question is, why would that be more helpful (as an indicator of skill, or more precisely: successful play) than the equity given up per play?

Steering into positions that you know how to play well is part of your overall skill, and positions with forced decisions are an extreme example of those. Or, as Zare puts it:

"If you blitz your opponent too often at DMP and often achieve strong 5 point boards and closeouts, then you will have many more unforced moves than your opponent will. It is possible to give up more equity overall, while gnu reports that you are giving up less equity per unforced move."

In this sense, the gnubg method is biased against playing styles that tend to lead to more forced decisions. Those are more successful in the long run than the gnubg method predicts.

Jason Lee writes:

> Steering into positions that you know how to play well is part of your
> overall skill,

No argument there.

> and positions with forced decisions are an extreme example of those.

This might be true if a play with a forced decision is a good thing, but many examples of forced decisions that come to mind generally are not a good thing: dancing, trashing your board with some gawdawful double fours, leaving double shots in bearoffs.

When your decision is forced, you're entirely at the whim of the dice, which don't think about the quality of your position. Rolling something for which there are many legal moves means (a) you're going to have a harder time finding the best move, since there are so many options, but (b) there is more likely to be a play that improves your equity from where you were.

So, your premise is true only if, on average, rolling a number which has a forced decision improves your equity.

Maik Stiebler writes:

Alice and Bob play a long series of DMP matches. It turns out Bob is a blitzer and makes Alice dance more often in the long run than the other way round. Bob has a slightly higher Snowie style error rate, whereas Alice has, due to her lower number of unforced decisions, the slightly higher gnubg style error rate. Do you back Bob or Alice?
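The Alice and Bob scenario can be made concrete with a small sketch. The tallies are invented, `PlayerTally` is a hypothetical structure, and the two rate functions are deliberately simplified stand-ins (equity lost per play versus equity lost per unforced play), not the exact formulas either program uses.

```python
from dataclasses import dataclass


@dataclass
class PlayerTally:
    """Hypothetical long-run totals for one player."""
    equity_lost: float  # total equity given up over the series
    plays: int          # all plays, forced or not
    forced_plays: int   # plays with no choice (dancing, one legal move)


def per_play_rate(t):
    """Simplified Snowie-style rate: equity lost per play."""
    return t.equity_lost / t.plays if t.plays else 0.0


def per_unforced_rate(t):
    """Simplified gnubg-style rate: equity lost per unforced play."""
    unforced = t.plays - t.forced_plays
    return t.equity_lost / unforced if unforced else 0.0


# Made-up numbers: Bob blitzes, so Alice dances (is forced) more often.
alice = PlayerTally(equity_lost=10.0, plays=1000, forced_plays=120)
bob = PlayerTally(equity_lost=10.5, plays=1000, forced_plays=50)

for name, tally in (("Alice", alice), ("Bob", bob)):
    print(f"{name}: per play {per_play_rate(tally):.5f}, "
          f"per unforced play {per_unforced_rate(tally):.5f}")
```

Under these assumptions Bob gives up more equity in total and per play, yet Alice shows the higher rate per unforced play, because her equity loss is divided by fewer decisions.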

Jason Lee writes:

I don't believe you've addressed your premise, which is that steering towards positions with forced decisions is good. The most common forced decision is being on the bar against a closed board. I've never heard of any strategy discussions that have instructed me to try to get up on the roof against a closed board under the premise that I can't err while sitting on my duff.

Maik Stiebler writes:

My premise is that steering towards positions with forced decisions is better than the gnubg method says it is.

Jason Lee writes:

I'm going to take "steering" to mean intentionally giving up equity to make forced decisions later more likely. If you replace "forced" with "easy", then I think this is a good thing. But if you insist on keeping "forced", then I think you're giving up equity now in order to give up more equity later.

Maik Stiebler writes:

My point is: Steering (and even blundering) towards positions with forced decisions is better than the gnubg method says it is. Which does not necessarily mean it is good.
