Building and training a neural-net player

From:   Brian Sheppard
Address:   bsheppard@hasbro.com
Date:   27 August 1998
Subject:   Re: 5-prime vs. 5-prime
Forum:   rec.games.backgammon
Google:   6s3s0k$dph$1@nnrp1.dejanews.com

Gary Wong wrote:
> It's only a little stronger (of the order of 0.1ppg cubeless after
> 100,000 training games).

PUBEVAL is surprisingly resilient. Of course you can win almost 1.0 ppg
against it if you adopt intelligently defined anti-PUBEVAL tactics, but if
you just train a neural network to play well and pit it against PUBEVAL you
won't do as well.

My network scores around 0.6 ppg (cubeless) versus PUBEVAL. I believe this
is about as well as you can do. (If you read the literature you may have
read that HCGAMMON wins 85% against PUBEVAL, but that turns out to be
erroneous.)

> I haven't yet been patient enough to give it a decent amount of training
> because I keep coming up with ideas for new inputs and end up starting
> again...

This is a reasonable plan, but just be sure that your inputs really make a
difference. You need nothing beyond the raw input description (as defined
by Tesauro), priming (as defined by Berliner), and shot-counting. These
alone are sufficient to play at championship caliber when combined with a
1- or 2-ply search engine.
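
For concreteness, here is a minimal Python sketch of a Tesauro-style raw
encoding. The sign convention for the board and the exact unit counts are
assumptions for illustration, not a claim about TD-Gammon's precise scheme:

    import numpy as np

    def encode_board(points, bar, off):
        """Tesauro-style raw encoding (sketch).

        points: 24 signed counts; positive = player on roll,
                negative = opponent (an assumed convention).
        bar:    (my_bar, opp_bar) checkers on the bar.
        off:    (my_off, opp_off) checkers borne off.
        """
        units = []
        for sign in (+1, -1):              # player on roll, then opponent
            for p in points:
                n = sign * p if sign * p > 0 else 0
                units += [float(n >= 1), float(n >= 2), float(n >= 3),
                          max(n - 3, 0) / 2.0]   # truncated unary code per point
        my_bar, opp_bar = bar
        my_off, opp_off = off
        units += [my_bar / 2.0, opp_bar / 2.0,   # bar checkers
                  my_off / 15.0, opp_off / 15.0] # borne-off checkers
        return np.array(units, dtype=np.float32)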

My recommendation is to pare down your inputs to these alone, then debug
those terms exhaustively. Check the value of these terms in every
conceivable natural and unnatural situation. Then train your network to
exhaustion.

Then profile your program's play using test cases. Where does the program
make a mistake? It helps a lot to profile many test cases, so that you can
prioritize your attack on the most significant type of error. Once you
choose a representative test case to work on, you need a procedure that
often results in improving the program.
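
A sketch of the bookkeeping this implies; the test-case format and the
categorize_error helper are made up for illustration:

    from collections import Counter

    def profile_errors(test_cases, choose_move, categorize_error):
        """Tally the program's mistakes by category so the most common
        error type can be attacked first."""
        errors = Counter()
        for position, best_move in test_cases:
            move = choose_move(position)
            if move != best_move:
                errors[categorize_error(position, move, best_move)] += 1
        return errors.most_common()   # most frequent error type first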

Every mistaken checkerplay results in two positions to analyze, which I call
Predicted (the right move) and Actual (the program's move). Predicted and
Actual will differ in certain input terms. Verify that the input terms are
correctly computed in every case. Correct any bugs, and proceed either to
retrain the evaluation or to choose another test case, depending on the
extent of the changes you have made.
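
A small Python sketch of that check; compute_inputs and the list of input
names stand in for whatever the program already has:

    def differing_inputs(predicted, actual, compute_inputs, names, tol=1e-6):
        """Return the input terms that differ between the Predicted and
        Actual positions -- the first place to look for a miscomputed
        term or a missing input."""
        a = compute_inputs(predicted)
        b = compute_inputs(actual)
        return [(name, x, y)
                for name, x, y in zip(names, a, b)
                if abs(x - y) > tol]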

If the inputs are computed correctly, then you must extend the input space
in order to give the program a chance to correct its mistake. To design an
input term that has the best chance of making progress, follow these
guidelines:

  1) Never include a term that is a linear combination of existing terms.
  Such terms (e.g. inner board point count, number of men borne off, pip
  count, number of anchors, and many other backgammon concepts) can be
  synthesized by the neural network if they are relevant. You need to add
  something new to the program.

  2) Make sure that the term you add distinguishes Actual and Predicted. If
  it does not, then the chance you will solve this problem is very slight.

  3) Make sure that every term has a tactical basis in backgammon theory.
  The simplest way is to encapsulate something that would ordinarily
  require lookahead to discover. For example, you can create a term that
  estimates the number of return shots. Or blot exposure in bearoffs. Or
  racing equity.

  4) Make sure your term has broad generality, but no broader than
  appropriate.

  5) Give your term a natural scale. By this I mean that the term has a
  range roughly comparable to the other inputs in the system. (The downside
  of not doing this is that some weights in your network will train very
  slowly compared to others.) See the scaling sketch after this list.
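
As a concrete example of point 5, a term counting hitting rolls can be
divided by 36 (the number of distinct dice rolls) so that it lands in
roughly the same [0, 1] range as the indicator-style inputs. A one-line
sketch:

    def scaled_shot_count(shots):
        """Scale a count of hitting rolls to [0, 1] by dividing by 36,
        so the term's range is comparable to the other inputs."""
        return shots / 36.0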

> has anybody had any luck
> experimenting with somehow saving (at least some of) the information
> in previously trained weights while introducing new inputs (or hidden
> nodes, for that matter)?

Of course! I just initialize new weights to small random numbers.
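
A minimal sketch of what that looks like, assuming the input-to-hidden
weights are stored as a (hidden x inputs) matrix:

    import numpy as np

    def add_inputs(W_in, n_new, scale=0.01, seed=0):
        """Append columns for n_new extra inputs. Existing weights are
        kept; the new columns start as small random numbers so the
        already-trained behavior is barely disturbed at first."""
        rng = np.random.default_rng(seed)
        n_hidden, _ = W_in.shape
        new_cols = rng.normal(0.0, scale, size=(n_hidden, n_new))
        return np.hstack([W_in, new_cols])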

> Is initialising the new weights to random
> values while maintaining the existing weights, and starting training
> all over (back to the initial high alpha, etc.) reasonable?

Never restart training from scratch.

At the very least keep a record of the games you played while training the
first time, so that you can train those games without generating them.
Generating a game is 20 times as expensive as training on it.

Don't open alpha all the way back up. It might make sense to increase it a
little, but not much. You will probably undo your progress by opening it up
a lot.

Remember that increasing alpha has the effect of weighting future experience
more heavily than past experience. This makes sense if your network (which
encapsulates past experience) is inexperienced. But when you make a minor
change to an experienced network, it is probably a bad idea to weight the
future much more heavily.

> Does maintaining different values of alpha for different weights help?

Different learning rates help in theory. One point is that if two inputs
differ radically in scale you can use a large learning rate for the input
that has a bigger scale. But in practice it matters little provided that
you give your inputs roughly comparable scale.

Another way that varying learning rates helps is in identifying inputs that
do not matter. For example, if you have an adaptive algorithm that varies
learning rates according to the accuracy gained from adjusting that input,
then the learning rates for irrelevant inputs are rapidly driven to 0. Sutton
wrote a paper entitled "Gain Adaptation beats Least Squares" (or something
like that) in which he describes such an algorithm. However, it doesn't
work in practice because it is much slower than backprop. Backpropagation
is fast; adaptive-gain algorithms can take many times as long (Sutton's
algorithm takes 13 times as much time) and that is a huge disadvantage.
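
For reference, a toy sketch of per-weight gain adaptation in this spirit
(a simplified sign-agreement rule, not Sutton's exact algorithm; grad is
the loss gradient supplied by the caller):

    import numpy as np

    def gain_adapted_step(w, grad, alphas, prev_grad, up=1.05, down=0.7):
        """Grow a weight's learning rate while successive gradients agree
        in sign, shrink it when they flip; rates for inputs that never
        help tend toward zero."""
        agree = np.sign(grad) == np.sign(prev_grad)
        alphas = np.where(agree, alphas * up, alphas * down)
        w = w - alphas * grad
        return w, alphas, grad        # grad becomes prev_grad next call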

The concept was so attractive that I also tried a variant of RProp. It,
too, proved slower than backprop.

My judgement: discard the whole idea.

> How much training does it really take to get an idea of how effective
> your inputs are -- is it reasonable to train different networks (ie.
> networks with different inputs) with a limited number of hidden nodes
> over a short number of training games, and then take the "best" inputs
> and expect them to perform well in a network with lots of hidden nodes
> and extensive training?

I think the procedure you described will work, but it can't be the fastest
way to make progress. Surely you are better off using your expert judgment!

There are descriptions in the literature of constructive methods for
identifying neural network inputs. I haven't found any of them to be
useful.

The method you are describing sounds a lot like coevolution or genetic
engineering of neural inputs. This may work, but you are certainly better
off on your own.

> What's a good value for lambda?

Zero.

Theoretical models suggest that lambda should be 1/branching-factor, which
would be zero to 6%, depending on how you measure branching factor. Anyway,
it hardly matters because lambda==0 is a good approximation to
lambda==0.05.
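
To make the lambda==0 case concrete, here is a generic TD(lambda) update
with an eligibility trace (a sketch only; grad_v and td_error come from
whatever value network and game loop you already have, and the discount
is taken as 1, as usual for backgammon):

    def td_lambda_step(w, grad_v, td_error, trace, alpha=0.1, lam=0.0):
        """One TD(lambda) weight update. With lam == 0 the trace is just
        the current gradient, i.e. plain one-step TD."""
        trace = lam * trace + grad_v        # decay old credit, add new gradient
        w = w + alpha * td_error * trace    # move weights along the trace
        return w, trace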

> Does combining MC and TD training (ie. TD(lambda) at each step followed
> by a TD(1) on the terminal step) buy you anything?  What about MC vs. TD
> in general?

TD is better. MC does not learn enough about intermediate stages.

Don't combine MC and TD. The impact of MC is simply to weaken training.

> Is using lookahead evaluations during training ever worth
> the expense?

The answer is no if you mean that you train each position on the 1-ply
lookahead value. Doing a 1-ply lookahead costs you a factor of 21 in
training speed, but you only gain about a factor of 5 in reduction of
training noise over TD. A clear loss.

However, if you reuse the 1-ply search value for at least 5 training
iterations, then you have a clear gain. In practice this is easy to do. But
then you are not using TD, but rather supervised training. I maintain that
supervised training is the right way to go as soon as you have a player of
expert caliber that you can use in a 1-ply search.
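
A sketch of that supervised setup, with the helper names (one_ply_value,
net.train_on) standing in for the program's own code:

    def build_training_set(positions, one_ply_value):
        """Label each position once with the expensive 1-ply search value,
        so the label can be reused for several cheap passes."""
        return [(pos, one_ply_value(pos)) for pos in positions]

    def supervised_passes(net, data, epochs=5, alpha=0.05):
        """Several passes over the cached labels amortize the cost of the
        lookahead, per the argument above."""
        for _ in range(epochs):
            for pos, target in data:
                net.train_on(pos, target, alpha)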

> It only takes 10 seconds to come up with another question,
> but days or weeks to answer it...

I can answer your questions in a day, since I have made all these mistakes
myself!

> > In fact, if you choose checker plays using rollouts from the weakest
> > neural network you achieve a skill level at least equal to 3-ply
> > searching.
>
> What kind of positions does that apply to?  In non-contact positions I
> believe you unconditionally.  But in positions the net misplays
> (especially if it plays one side worse than the other), how much faith do
> you need to have in the original evaluator before you have reasonable
> trust in its rollouts?

Pretty much any position.

You need a very good checkerplayer to have confidence in the absolute equity
value returned by a rollout, but the same does not apply to checkerplay
decisions. The great majority of checkerplays are decided by near-term
phenomena that will be apparent in a rollout.
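
A rough sketch of rollout-based checkerplay selection; legal_moves,
apply_move, and rollout_game are placeholders for the program's own
machinery:

    def choose_move_by_rollout(position, legal_moves, apply_move,
                               rollout_game, n_trials=100):
        """Pick the play whose resulting position has the best average
        cubeless rollout result over n_trials games."""
        best_move, best_equity = None, float("-inf")
        for move in legal_moves(position):
            after = apply_move(position, move)
            equity = sum(rollout_game(after)
                         for _ in range(n_trials)) / n_trials
            if equity > best_equity:
                best_move, best_equity = move, equity
        return best_move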

Warm regards,
Brian Sheppard
 