Applied Math & Computer Science Lab
Data Analysis, Optimization & Mathematical Modeling, Artificial Intelligence, Neural Net For Everyday Life Applications

Reinforcement Learning for Tic-Tac-Toe



Introduction

This project uses reinforcement learning to implement a Tic-Tac-Toe game. There are two scripts: tt.cgi for learning and play.cgi for playing. The playing script uses the values obtained during learning and saved in a text file. You can visit the working demo at Tic-Tac-Toe Game. Here is the latest version of the Tic Tac Toe game program.

A Tic Tac Toe game widget can now be created online within a few minutes. Try Create TTT Game Widget now.

Many visual settings can be customized and tested without any programming. Some other options can be customized as well. For example, you can create and train a game on a 4x4 board.
In this way you can create many different games.

You can also train the game online. The created game can be used as a widget on your site or posted to the list on this site. Here are some examples: Tic Tac Toe Games

Learning

Learning is done by running the Perl script tt.cgi from the command prompt.
After the results of learning are saved and transferred to the web server, anyone can play against the program over the web.
Learning uses a simple update formula:
V(s) = V(s) + alpha*(V(s') - V(s))
where s is the state before the move, s' is the state after the move, and alpha is the learning rate.
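The update rule can be sketched in a few lines. The snippet below is written in Python rather than the project's Perl and is an illustration only, not the code from tt.cgi. It assumes that the value table is a dictionary keyed by 9-digit position strings, that s is the earlier position and s' the later one, and that positions not yet in the table start at a hypothetical default value of 0.5.

```python
def td_update(values, s, s_next, alpha, default=0.5):
    """Move V(s) a fraction alpha of the way toward V(s_next).

    `default` is an assumed starting value for unseen positions;
    the original script may initialise its table differently.
    """
    v_s = values.get(s, default)
    v_next = values.get(s_next, default)
    values[s] = v_s + alpha * (v_next - v_s)
    return values[s]


# Hypothetical example: the later position is a win for the program (value 1.0).
values = {"111022220": 1.0}
td_update(values, "112022220", "111022220", alpha=0.4)
```

With alpha close to 0 the stored value changes slowly. With alpha close to 1 it jumps almost all the way to the value of the later position.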
The learned values are saved in the file level1.txt in the format s:V(s).
The board position s is represented by a string of 9 digits, where 0 stands for O, 1 for X (the RL program), and 2 for an empty cell.
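To make the file format and the board encoding concrete, here is a small Python sketch (not the project's Perl code). It assumes one s:V(s) entry per line and that the 9 digits list the cells row by row from the top left. Neither detail is stated on this page, and all function names are hypothetical.

```python
def parse_values(lines):
    """Read 's:V(s)' entries into a dict, skipping blank lines."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        state, _, value = line.partition(":")
        values[state] = float(value)
    return values


def format_values(values):
    """Turn the dict back into 's:V(s)' lines, sorted by position string."""
    return [f"{state}:{value}" for state, value in sorted(values.items())]


def show_board(state):
    """Render a 9-digit position; assumes row-major cell order."""
    symbols = {"0": "O", "1": "X", "2": "."}
    rows = [state[i:i + 3] for i in (0, 3, 6)]
    return "\n".join(" ".join(symbols[c] for c in row) for row in rows)


table = parse_values(["112022220:0.7", ""])
print(show_board("112022220"))
```

Reading a real file would amount to passing the lines of level1.txt to parse_values.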
During learning the program counts the number of wins in each block of 500 games and prints all the results at the end. That number is always larger at the end than at the beginning, because the program starts without any knowledge of how to play. (See the table in the Observations section.)
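The per-block win count can be computed as in the following Python sketch. It assumes the outcome of every training game is recorded as one of the hypothetical labels "win", "loss" or "draw", from the program's point of view. The original script may keep a running counter instead.

```python
def wins_per_block(outcomes, block_size=500):
    """Count wins in consecutive blocks of `block_size` games."""
    counts = []
    for start in range(0, len(outcomes), block_size):
        block = outcomes[start:start + block_size]
        counts.append(sum(1 for result in block if result == "win"))
    return counts


# Tiny hypothetical run, using blocks of 3 games instead of 500.
sample = ["loss", "loss", "win", "win", "draw", "win"]
block_counts = wins_per_block(sample, block_size=3)
```

A rising sequence of counts suggests that the agent is improving during training.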

Playing

After learning is completed, the file with the values should be copied to the directory that contains the script play.cgi. The player always moves first by clicking on a cell. The script marks that cell and then makes its own move.
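One plausible way for the playing script to choose its reply is to try every empty cell and keep the resulting position with the highest learned value. The Python sketch below illustrates that greedy choice. It is an assumption about how play.cgi selects a move, not a transcription of it, and the function names and the default value of 0.5 for unknown positions are hypothetical.

```python
def legal_moves(state):
    """Indices of empty cells (digit 2) in a 9-digit position string."""
    return [i for i, c in enumerate(state) if c == "2"]


def place(state, index, mark):
    """Return a new position with `mark` ('0' or '1') written at `index`."""
    return state[:index] + mark + state[index + 1:]


def best_reply(state, values, default=0.5):
    """Pick the empty cell whose resulting position has the highest value."""
    best_index, best_value = None, float("-inf")
    for index in legal_moves(state):
        candidate = place(state, index, "1")  # 1 marks the RL program
        value = values.get(candidate, default)
        if value > best_value:
            best_index, best_value = index, value
    return best_index  # None when the board is full


# Hypothetical turn: the player clicks cell 8, then the program replies.
learned = {"111022220": 1.0}
after_click = place("112022222", 8, "0")
reply = best_reply(after_click, learned)
```

Ties are broken in favour of the lowest cell index here. A real implementation might break ties at random.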



Observations

Increasing the number of training games improves the learning results, but only up to a limit. After 40000 games, the number of wins per 500 games stays almost the same during learning. However, the performance of such an agent against a real opponent is poor. The demo uses an agent trained on 400000 games, and its performance is better. Learning can also be improved by changing alpha. Here are example learning results for alpha=0.02, alpha=0.4 and alpha=0.07-0.06*n/N:

Number of wins in each block of 500 games, by number of games played N:

N        alpha=0.02   alpha=0.4   alpha=0.07-0.06*n/N
500      272          290         271
1000     317          339         318
1500     346          374         391
2000     375          380         438
2500     398          376         438
3000     437          383         434
3500     452          380         439
4000     457          406         462
4500     448          394         440
5000     447          410         447
5500     455          405         451
6000     445          412         447
6500     444          414         444
7000     455          411         448
7500     455          426         456
8000     446          405         450
8500     454          410         440
9000     452          395         447
9500     438          412         456
10000    455          404         457
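The decreasing learning rate alpha=0.07-0.06*n/N can be written as a small Python function. This sketch assumes that n is the index of the current training game and N the total number of games in the run. The page does not define these symbols, so that reading is a guess, and the function and parameter names are hypothetical.

```python
def decaying_alpha(n, total_games, start=0.07, drop=0.06):
    """Linear schedule alpha = start - drop * n / total_games.

    Assumes n counts games played so far and total_games is the
    planned length of the run.
    """
    return start - drop * n / total_games


first = decaying_alpha(0, 10000)
middle = decaying_alpha(5000, 10000)
last = decaying_alpha(10000, 10000)
```

Under this reading alpha falls linearly from 0.07 at the start of training to 0.01 at the end, so early games change the values quickly and later games only fine-tune them.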
This example shows how reinforcement learning can be used to train an agent to play a game.

Perl Source Code and Related Links for Tic Tac Toe Game

Data file (learned values from training to play)
Perl Source Code for Tic-Tac-Toe Game
Working Demo: Tic-Tac-Toe Game
Get Tic-Tac-Toe Widget Now
Comments, suggestions

Links

Applied Math and Computer Science Lab



