Backgammon Programming

Variance reduction of rollouts

From:   Jim Williams
Address:   jimw@giga-net.com
Date:   11 June 1997
Subject:   Proposed Algorithm for Roll Outs
Forum:   rec.games.backgammon
Google:   339F21F8.3585@giga-net.com

The classic algorithm for doing roll outs is to start in the position which is to be analyzed. From this position, the game is repeatedly rolled out to completion. An equity value is assigned to each completion result. The equity of the starting position is approximated by the average value of the completion equities. The implicit assumption is that the best move is always made, but subject to this assumption the average value of the completion equities will converge stochastically to the true equity of the starting position.

The proposed algorithm will converge to the same limit, but can potentially converge much faster. This algorithm assumes the existence of an evaluation function which approximates the equity of a given position. The important feature of this algorithm is that the better the evaluation function is, the faster the convergence will be, but no matter how inaccurate the evaluation function is, the convergence limit is unaffected.

For any given position, the "disparity" of that position is computed as follows. For each possible dice roll, the best move is determined and the evaluation function is applied to the resulting position. The disparity is then equal to the weighted average of the evaluations of all these resulting positions (each roll weighted by its probability) minus the evaluation of the original position. The better the evaluation function is, the smaller the disparity will be.

The roll out proceeds normally, but the result of each trial is equal to the equity value determined by applying the evaluation function to the original position plus the sum of the disparities of all the positions that occurred in the roll out (including the original position).

The proof of correctness is inductive. It is assumed that the evaluation function correctly evaluates a "game over" position, so the result of a roll out that starts in a "game over" position is the correct equity. For any other position, the expected value of the roll out result is equal to the weighted average of the expected values of the roll outs of each of the next possible positions, plus the evaluation of the current position, minus the weighted average of the evaluations of the next positions, plus the disparity of the current position. By the definition of the disparity, the last three terms cancel, so this in turn is equal to just the weighted average of the expected values of the roll outs of the next positions. These are correct by the induction hypothesis.
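
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the original post: it swaps backgammon for a hypothetical toy race game (one die per turn, no move choice, first player to bear off all pips wins) so that the true equity can be computed exactly for comparison. The names exact_equity, crude_eval, disparity, rollout_trial and rollout are all made up for this sketch.

```python
import random
from functools import lru_cache

DIE = range(1, 7)  # one six-sided die per turn; every roll has weight 1/6

# Toy race game (an assumption for illustration, not backgammon).
# A position is (a, b, a_to_move): pips left for players A and B, and whose turn it is.
# The player on turn rolls one die and moves that many pips; the first to reach 0 wins.
# Equity is +1 if A wins and -1 if B wins. There is no move choice, so the
# "best move" step of the real algorithm is trivial here.

def is_over(pos):
    a, b, _ = pos
    return a <= 0 or b <= 0

def final_equity(pos):
    a, _, _ = pos
    return 1.0 if a <= 0 else -1.0

def next_position(pos, roll):
    a, b, a_to_move = pos
    return (a - roll, b, False) if a_to_move else (a, b - roll, True)

@lru_cache(maxsize=None)
def exact_equity(pos):
    """True equity by exhaustive recursion; feasible only because the toy game is tiny."""
    if is_over(pos):
        return final_equity(pos)
    return sum(exact_equity(next_position(pos, r)) for r in DIE) / 6.0

def crude_eval(pos):
    """A deliberately rough evaluation: scaled pip difference, exact only at game over."""
    if is_over(pos):
        return final_equity(pos)
    a, b, _ = pos
    return max(-1.0, min(1.0, (b - a) / 10.0))

def disparity(pos, evaluate):
    """Weighted average evaluation over all rolls minus the evaluation of pos itself."""
    if is_over(pos):
        return 0.0
    avg_next = sum(evaluate(next_position(pos, r)) for r in DIE) / 6.0
    return avg_next - evaluate(pos)

def rollout_trial(start, evaluate, rng):
    """One trial: evaluation of the start plus the disparity of every position visited."""
    result = evaluate(start)
    pos = start
    while not is_over(pos):
        result += disparity(pos, evaluate)
        pos = next_position(pos, rng.choice(DIE))
    return result

def rollout(start, evaluate, trials, seed=0):
    """Average of many trials; the seed only makes the run repeatable."""
    rng = random.Random(seed)
    return sum(rollout_trial(start, evaluate, rng) for _ in range(trials)) / trials
```

Calling rollout((20, 20, True), crude_eval, 5000) and comparing it with exact_equity((20, 20, True)) shows the two agreeing to within sampling error even though crude_eval is a poor estimator. If exact_equity itself is passed as the evaluation function, every disparity is zero (up to rounding), so a single trial already returns the true equity, which is the limiting case of "the better the evaluation function, the faster the convergence".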
