On the Construction of Evaluation Functions for Large Domains
Originally published in Proceedings of the 6th IJCAI, Tokyo (1979), pp. 53–55. This work was sponsored by the Defense Advanced Research Projects Agency (DOD), Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-78-C-1551.
Abstract. We present the SNAC method of encoding knowledge in a polynomial function. The most common use for such a function is to evaluate competing alternatives in a problem-solving situation. The SNAC method was discovered during research on a program that plays backgammon. It has produced highly consistent, skilled performance and has made adding new knowledge very easy, resulting in large performance increments in the backgammon program.
We show how to create sensitive evaluation functions, and how to avoid stability problems in nonlinear functions. We also demonstrate two effects not previously found in the literature: the suicide construction and the blemish effect.
When it is not reasonable to expect the solving process to search toward a terminal state, we must have recourse to knowledge to lead the way to a goal. The knowledge is in the form of “properties” or “features” that can be used to describe any state of the domain. Each such feature can take on a range of values and thus defines a dimension in a hyper-space of features. A polynomial that is the sum of various functions on these features is used to assign values to nodes and thus locate them in the hyper-space. These values are then used to order the nodes with respect to their goodness (closeness to goals of the domain). The AI literature does not contain much information about how to construct evaluation functions; only the work of Samuel (1959 and 1967) attempts to shed light on how the construction characteristics of the function (rather than the content) bear on the performance of the program using the function.
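As an illustration only, the following sketch shows the basic shape of such a linear evaluation polynomial and its use in ordering successor states. The feature names and weights are hypothetical; they are not the features of any program discussed in this paper.

```python
# A minimal sketch of a linear evaluation polynomial.
# Feature functions and weights are hypothetical, purely for illustration.

def blot_exposure(state):
    # Number of exposed single checkers (hypothetical feature).
    return state["blots"]

def pip_lead(state):
    # Lead in the running game (hypothetical feature).
    return state["opp_pips"] - state["own_pips"]

WEIGHTED_FEATURES = [(-4.0, blot_exposure), (1.0, pip_lead)]

def evaluate(state):
    # V = w1*f1(state) + w2*f2(state) + ... : a point in the feature
    # hyper-space collapsed to one number used to order candidate states.
    return sum(w * f(state) for w, f in WEIGHTED_FEATURES)

def best_successor(states):
    return max(states, key=evaluate)
```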
We have been developing a backgammon program since 1974 (Berliner 1977). During this time, we ran into many construction problems dealing with the lack of context provided by a linear polynomial (as Samuel had indicated), and problems of occasional erratic behavior when certain types of nonlinearity are introduced. Recently, we have developed a method which we call SNAC (for Smoothness, Nonlinearity, and Application Coefficients) which appears to have remedied all previous problems. We present details of the method below.
A linear function is very well behaved; a nonlinear one need not be. Arithmetically combining two variables, each of which could have a range of 0–50, produces a resulting range of 0–2500 when multiplication is used. The contribution of such a term to the evaluation could vary widely, causing stability problems for the value of the polynomial.
Another type of problem with nonlinear functions is the following. Say we have some advice to the system in the form of S = I × D, where S is suffering, I is the intensity of pain, and D is the duration of pain. The object is to minimize the value of S (as there will be a minus sign in front of this term in the polynomial). This seems to be a well-formed piece of advice. People in general understand that the idea is to reduce both I and D. However, a program that is allowed to manipulate D, when faced with excruciating pain that is difficult to remove, may well recommend suicide. This usually does not qualify as a solution. However, such advice can be forthcoming even when some other term of the polynomial places a large premium on staying alive. We therefore term a relation where there exists the potential to manipulate one of the variables in a generally undesirable direction, a suicide construction.
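To make the suicide construction concrete, here is a toy numerical sketch, with all numbers assumed for illustration: even with a large term for staying alive, a duration variable that the search may manipulate lets the program score the ending of life above enduring the pain.

```python
# Toy illustration of the suicide construction (numbers are assumed).
# The polynomial contains a bonus for staying alive and the term -S,
# where S = I * D (intensity of pain times duration of pain).

ALIVE_BONUS = 1000.0

def value(intensity, duration):
    alive_term = ALIVE_BONUS if duration > 0 else 0.0
    return alive_term - intensity * duration

# Excruciating pain (I = 500) that is difficult to remove:
print(value(500.0, 10.0))   # endure for 10 units: 1000 - 5000 = -4000.0
print(value(500.0, 0.0))    # set D to 0:             0 -    0 =     0.0
# The second alternative scores higher, so a program free to manipulate
# D "recommends suicide" despite the premium on staying alive.
```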
Nonlinearity is desirable because of the increased sensitivity it provides, while care must be taken to control volatility and avoid the suicide construction.
The above may seem so obvious that the reader would feel that no one could possibly overlook such a thing. However, this is far from the case. There is very little published on the structure of evaluation polynomials that are used in game-playing programs. One exception is the work of Samuel (1959 and 1967). Samuel investigates how to achieve nonlinearity without creating an unstable function. His solution, in both cited works, is to map a variable with a large range onto a small range. In the earlier work, he creates "binary connective terms," which are a reduction to a binary value of a range that was large to start with. Clearly, this will cause the blemish effect (a step discontinuity in the evaluation, near which the program distorts its play in order to stay on the favorable side of the step) in the vicinity where the value of the variable changes from 0 to 1. In the later work, large ranges are reduced to ranges of from 3 to 7 values in order to fit more easily into a "signature table" of limited size. Again, the blemish effect will occur near the locations where the value changes occur. We conjecture that the reason Samuel's program did not perform better after learning nonlinear functions is that the blemish effect caused it to commit serious errors occasionally. We are now able to observe the blemish effect in the performance of older versions of our backgammon program.
Whenever the coefficient of a nonsmooth term is under the control of the program, the blemish effect can occur. Consider a chess-evaluation term that says “in an endgame the king should be centrally located, and at other times it should be located near the corners.” Let us assume the king is presently in a corner, and the “endgame” is defined as being those positions where there is less than a certain amount of material on the board. The program may then avoid swaps in material in order to consider its king’s position as “good” (non-endgame) even though the endgame is imminent. Here a step is created by the coefficient that defines endgame, and thus acts to the program’s detriment. A correct definition of endgame would be a smooth function from early middle game to late endgame. In this way, the degree of endgameness increases with each swap of material.
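The chess example can be sketched numerically as follows; the material scale, threshold, and weights are all assumptions made for illustration. The step coefficient creates the blemish, while the smooth coefficient does not.

```python
# Step vs. smooth definitions of "endgame" gating a king-placement term.
# All numbers (threshold, total material, centrality scale) are assumed.

def endgame_step(material):
    # Nonsmooth: endgame iff material drops below a threshold.
    return 1.0 if material < 20 else 0.0

def endgame_smooth(material):
    # Smooth "degree of endgameness", growing as material comes off.
    total = 78  # assumed material count at the start of the game
    return max(0.0, min(1.0, 1.0 - material / total))

def king_term(material, centrality, endgameness):
    e = endgameness(material)
    # Reward a central king in the endgame, a corner king otherwise.
    return e * centrality + (1.0 - e) * (1.0 - centrality)

# King sitting in a corner (centrality 0.0), one material swap apart:
print(king_term(21, 0.0, endgame_step))    # 1.00  corner still looks "good"
print(king_term(19, 0.0, endgame_step))    # 0.00  a sudden step; swaps get avoided
print(king_term(21, 0.0, endgame_smooth))  # ~0.27
print(king_term(19, 0.0, endgame_smooth))  # ~0.24  gradual change, no blemish
```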
There are two ways of achieving such smooth, stable coefficients: (1) by fixing the value of the variable to be that in the original problem situation (frozen variable), i.e., not recomputing it for each new node; and (2) by choosing variables that vary very slowly (application coefficients). We have used both methods successfully.
Using the value of a frozen variable gives the problem solving process a global outlook, where all terms using this variable are viewed as they would have been in the original situation. This has the advantage of not letting small variations create too much of an effect, and suppressing volatility for large variations. It has the disadvantage of making the process insensitive to certain kinds of changes. This method is good for functions that require some discrimination to determine the degree to which they apply, but are not required to discriminate minimal changes. This method has been used previously for efficiency reasons, when the cost of recomputing the variable at each new node is high.
The other method is to use application coefficients. An application coefficient is a variable that tends to vary slowly with operator applications (moves) due to the nature of the domain. Thus for a set of nodes that are a few operations apart, the value of the variable will tend to be relatively constrained. This results in a coefficient that will provide sensitivity without volatility. We give examples of typical application coefficients below.
Our typical evaluation polynomial is of the form V = A1F1 + A2F2 + · · · + AnFn, where the Ai's are application coefficients or frozen variables, and the Fi's are functions of features of the domain. We have found that while there are usually many dozens of useful Fi's in a domain, on the order of six or fewer application coefficients appear to emerge. These are strongly related to ideas such as game phase, ease of winning, readiness for attack, etc.
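A sketch of this arrangement is given below, under the stated assumptions that the slowly varying coefficients are computed once in the root situation (frozen) and the feature functions are recomputed at every node. All coefficient and feature names are hypothetical.

```python
# Sketch of V = A1*F1 + ... + An*Fn with frozen application coefficients.
# Coefficient and feature names are hypothetical.

def game_phase(state):          # slowly varying: e.g. 0.0 early, 1.0 late
    return state["phase"]

def winning_degree(state):      # slowly varying: how far ahead we are
    return state["win_degree"]

def f_blockade(state):          # fast-varying feature function
    return state["blockade"]

def f_race_lead(state):         # fast-varying feature function
    return state["race_lead"]

def make_evaluator(root):
    # Compute the application coefficients once, in the root situation,
    # and reuse them for every node generated during the search.
    a1 = 1.0 - game_phase(root)
    a2 = winning_degree(root)

    def evaluate(node):
        return a1 * f_blockade(node) + a2 * f_race_lead(node)

    return evaluate
```

Because such coefficients change slowly from move to move, freezing them at the root costs little sensitivity, while it prevents the search from stepping across a coefficient boundary and triggering the blemish effect or a suicide construction.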
In chess, typical application coefficients are: (1) what stage the game is in, as denoted by the amount of material on the board, and the degree to which the pieces are still in their original positions; (2) the degree to which the winning side is winning, indicated by (W − V)/(W + V), where W is the score of the winning side and V that of the losing side; and (3) the ease of being able to win, which is a function of the number of pawns left for the winning side and the ease with which they may be exchanged, and the degree to which the position is blockaded. Similar application coefficients exist in backgammon.
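As a small worked illustration of the second coefficient (assumed piece values and a simplified material count; W and V here are the winning and losing sides' material scores, as in the text):

```python
# The winning-margin coefficient (W - V) / (W + V) from a material count.
# Piece values and the tiny position below are assumptions.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material(pieces):
    return sum(PIECE_VALUES[p] for p in pieces)

def winning_margin(side_a, side_b):
    w, v = material(side_a), material(side_b)
    w, v = max(w, v), min(w, v)     # W: winning side, V: losing side
    return (w - v) / (w + v) if (w + v) else 0.0

# A rook up in a simplified pawn ending:
print(winning_margin(["R", "P", "P"], ["P", "P"]))   # (7 - 2) / (7 + 2) ≈ 0.56
```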
Such application coefficients provide a program with a great deal of context for making decisions. Thus a program that understands the stage of the game in chess, as a function of amount of material on the board, will allow its king to gradually achieve a more central position as the amount of material diminishes. Further, the suicide construction can be avoided by using a frozen variable in place of one that can be varied adversely. In the example quoted earlier, the duration of life of the subject becomes frozen, and that value must be used in all functions that could otherwise be subject to the suicide construction.
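Continuing the earlier toy suffering sketch under the same assumed numbers: once the duration is frozen at its value in the original situation, the only way the search can raise the score is to reduce the pain, not the lifetime.

```python
# Freezing D removes the suicide construction from the toy example above.

def make_value(root_duration):
    frozen_d = root_duration          # frozen variable, fixed at the root
    def value(intensity):
        return 1000.0 - intensity * frozen_d
    return value

value = make_value(10.0)
print(value(500.0))   # -4000.0: the original excruciating pain
print(value(300.0))   # -2000.0: improvement now comes only from reducing I
```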
Our previous best version before using the SNAC method was called BKG 8.0. This program was the result of about 30 man-months of effort. The present version is BKG 9.7, the result of about 8 additional man-months of work. In tests on the problem book, BKG 8.0 achieved a score of 45% based on 74 problems that it could attempt. Without any pretesting, BKG 9.7 achieved 66% on the full set of 77 problems.
Against the best commercially available backgammon micro-processor, BKG 8.0 achieved 56% of the points, while BKG 9.5 (considerably inferior to BKG 9.7) scored 78% of the points in a set of 100 games. BKG 9.7 is now much too good to test against the micro-processor. Our current tests pit BKG 8.0 vs. BKG 9.7, with BKG 9.7 scoring 64% of the points.
BKG 8.0 played in the Carnegie-Mellon University backgammon championships in the spring of 1978 and lost its first two matches, thus being eliminated. In May 1979, BKG 9.7 played in a tournament of intermediate players in Portola Valley, California, and won its first two matches before losing to the ultimate winner of the tourney in the third round. The competition in the California tournament was somewhat better than that in the earlier tourney.
BKG 8.0 has been available on PDP-10 machines for some time and has been regarded as a good game-playing program, and by far the best backgammon program around. Yet, as the above results indicate, the SNAC method has resulted in a rather significant improvement in the program. In evaluating the above, the reader should bear in mind that in backgammon, chance plays a significant role. Professional backgammon players will tell you that a difference of a few percentage points in skill is all they need to win consistently from an opponent. A 60% edge is quite extreme.
The recent improvement of the program came from the SNAC method, which made it possible to: (1) organize existing knowledge so that the functions are sensitive to local conditions (nonlinearity) without being subject to significant volatility; (2) avoid the blemish effect (which used to cause occasional serious errors); and (3) add new smooth knowledge functions that contribute their part without creating opportunities for new blemish-effect situations.