Backgammon Programming

Programming

Writing a backgammon program

From:   Gary Wong
Address:   gary@cs.arizona.edu
Date:   20 January 1999
Subject:   Re: Writing a bg program
Forum:   rec.games.backgammon
Google:   wtemopth5j.fsf@brigantine.CS.Arizona.EDU

Whigdon2 writes: > My intuition tells me to start at the end, and work backwards, i.e. > develop a program that plays a near perfect end game strategy, and > is knowledgeable with the cube, and then work my way backwards. I > think that I would like the program to make decisions based upon > it's own simulations, where many of the moves are based upon prior > simulations. I realize that this may be a bit idealistic of me, and > that I am likely to face some memory constraints. I believe this approach is feasible; I am working along similar lines myself. The best finished player I have is only of intermediate strength (it plays as Costello on FIBS, rated around 1600 +/- 50) but I have bits and pieces of a newer player which I expect will be significantly stronger. Using strong evaluations of subsequent positions to train evaluators of positions early in the game appears to work well in practice. In some sense, TD training can be considered as doing this; a more explicit approach I am investigating is to break the set of all positions into fine-grained classes, use a distinct evaluator for each class, establish a partial ordering on those classes, roll out a large number of positions in those classes according to the partial ordering, and use supervised training to obtain the evaluators of each other class in turn. "Knowledgeable with the cube" may or may not be important: the standard heuristics for deriving cube behaviour given cubeless evaluations seem to work adequately if you are careful. Ideally it would be nice to have a genuinely cubeful evaluator, but at present it doesn't appear to be worth the headaches (ie. correcting the absolute error in current evaluators still appears to be more fruitful than making those evaluators cubeful). Memory constraints have not been a problem in my experience: for bearoff classes I use a couple of databases mmap()ed into memory, though this could just as easily be read off disk as required; for most other classes I use neural net evaluators which need of the order of 100KB each. > I am hoping to borrow from the experiences of others. Please share your > references, ideas, or suggestions. References: ----------- There isn't a huge amount in the academic literature on backgammon playing programs (compared to chess or othello for instance), but it would be worth your time tracking down papers by Tesauro (for applying TD training to backgammon) and Berliner (general ideas about what metrics to derive from positions). There have been many articles posted here which are collectively invaluable; search on Deja News or look for archived posts at: http://www.bkgm.com/rgb/rgb.cgi?menu+programming More references on the Web are available at: http://forum.swarthmore.edu/~jay/learn-game/systems/gammon.html Some of my code can be grabbed from: ftp://ftp.cs.arizona.edu/people/gary/ (still very badly organised, sorry -- may or may not compile depending when you look at it :-) Ideas: ------ In the long term I want to take advantage of the fact that supervised training is highly parallelisable (in that almost all of the time is spent acquiring accurate training data by rolling out positions; once the training data is there it only takes a few hours to train a net from it). So far I have been generating all training data myself, but once I am able to release a player strong enough to generate accurate rollout data to anybody who wants it, I want to collect the results from those who can spare the CPU time to roll out what they can; distribute subsequent evaluators trained on that data; use them to generate subsequent results, etc. etc. Suggestions: ------------ If you want to get started quickly it might be worth taking a look at my code for administrative fluff like move generation, bearoff databases, generic neural net code etc. so you can spend more time on interesting problems. Grab a few free players (eg. xgammon, Tesauro's pubeval, mine) and use them to benchmark your player against. If using a neural net evaluator, the approach I take to training (which may be a long way from optimal, but it's very slow trying different things to see what works best!) is to start with random weights; use TD training until the network stops improving; then generate LOTS (hundreds of thousands) of rollouts and start supervised training on those. I find it very hard to manage the quantity/quality tradeoff. My approach so far has been not to go overboard on the number of trials per position: estimate the standard error you'd expect your evaluator to have after training and don't bother reducing the standard deviation of your rollouts too far below that; I suspect you'd be better spending your time rolling out more varied positions. In the past I have used no lookahead in the rollouts just so I could get a large amount of data in reasonable time; now I'm looking at using 1-ply lookahead with variance reduction, mainly to lower the _systematic_ error in the data. Neural nets are very good at ignoring statistical noise in training data, but are hurt by systematic error. My rough guess is that a standard deviation of ~0.01 from 36 or 72 rollouts per position is more than sufficient. I hope some of the above has been useful -- yell out of you want more detail in any area. Cheers, Gary. -- Gary Wong, Department of Computer Science, University of Arizona gary@cs.arizona.edu http://www.cs.arizona.edu/~gary/

Did you find the information in this article useful?

Do you have any comments you'd like to add?