This program attempts to use Game Tree Search to build an AI for a simple game called Pawns. Pawns is a simple variant of chess, consisting of 6 pawns on each team on a 6x6 board. Each pawn can move one space forward, or attack diagonally to either side, just like in chess. Unlike in chess, the pawns cannot move two spaces forward on their first move. The game of Pawns ends either when one pawn reaches the opposite side of the board, or when there are no more moves that can be made. If no moves can be made, the winner is the player with the most pawns still in play, or the game is a stalemate if the pawn counts are equal.
The main method used for this AI is Minimax Search. Minimax search is a common technique for representing an intelligent agent in a competitive game. It is especially effective in a zero-sum game like Pawns, where there is exactly one winner and one loser. Minimax works as follows: each turn, the AI takes a snapshot of the current state of the game. From there, it examines each possible move it can make in order to pick the best one. To pick the best next move, the program has to consider the counter moves the opponent could make against each option. To do this, for each potential move, the program generates all possible counter moves and assumes the opponent will choose the most effective one. The AI then simply selects the potential move whose best counter move is least effective.
These potential moves are implemented in code using a tree structure. Each node in the tree represents a potential state the game could be in. The tree is made up of alternating max and min layers. The max layers evaluate the moves the player can make, looking for the best one, while the min layers evaluate the opponent's potential counter moves, conservatively assuming the opponent will pick the one that is worst for the player. By putting a max layer at the top and alternating down, a Minimax tree is an effective way to find the best move to make in any situation.
Minimax is typically implemented using a depth limit, in order to ensure that the program won’t spend too much time processing. The depth limit basically represents how many turns ahead the program will consider. A depth limit of one will have a single max layer, and a single min layer, and then it will use a utility function to calculate the utility of the resulting states. As the depth limit increases, the program will be able to think further and further ahead. The AI’s performance will increase with the depth limit, but the processing time will as well.
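As a concrete illustration, the core recursion can be sketched as follows. This is a minimal depth-limited Minimax in Java; the `GameState` interface and its methods are hypothetical stand-ins for the project's actual classes, not its real API.

```java
import java.util.List;

// Hypothetical stand-in for the project's game-state class.
interface GameState {
    List<GameState> successors(boolean maxPlayer); // legal next states for the mover
    int utility();                                 // heuristic estimate of the state
    boolean isTerminal();                          // true if the game has ended
}

final class Minimax {
    // Returns the minimax value of `state`, looking `depth` plies ahead.
    static int minimax(GameState state, int depth, boolean maxPlayer) {
        if (depth == 0 || state.isTerminal()) {
            return state.utility();
        }
        int best = maxPlayer ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (GameState next : state.successors(maxPlayer)) {
            int value = minimax(next, depth - 1, !maxPlayer);
            best = maxPlayer ? Math.max(best, value) : Math.min(best, value);
        }
        return best;
    }
}
```

At depth 0 (or at a finished game) the recursion bottoms out and the utility estimate is returned, exactly as described above.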
One way to decrease the computational cost of Minimax search is to use alpha-beta pruning. Alpha-beta pruning is essentially a method of removing unneeded branches from the search tree, so that the algorithm doesn't waste time examining sub-optimal branches. It allows the program to run faster while still maintaining completeness and optimality. It works by storing two variables: alpha and beta. Alpha represents the value of the highest max node encountered so far, while beta represents the lowest min node. Alpha starts out at the lowest possible utility value and beta at the highest, and these values are passed as parameters to every node in the tree. Because they are passed as parameters, every layer in the tree has its own copy of alpha and beta, given to it by its parent node. As the state tree is searched, any time a max node's value is greater than alpha, alpha is set locally to that node's value. Similarly, whenever a min node's value is less than beta, beta is set to that value. Then, whenever alpha is greater than or equal to beta, the current branch can be pruned immediately. At these points, we know the value of the current node can no longer affect the choice made by its ancestors, so its remaining children have no chance of being selected. In this way, unneeded nodes can be skipped entirely. At a high level, we can think of a pruned min node as a possible move with an obvious counter move: once we have found one good counter move, we don't need to keep looking for others, because we already know that our move isn't a good decision.
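The same recursion with alpha and beta threaded through as parameters might look like the sketch below; the cutoff check is the only real change from plain Minimax. The `GameState` interface is again a hypothetical stand-in for the project's actual classes.

```java
import java.util.List;

// Hypothetical stand-in for the project's game-state class.
interface GameState {
    List<GameState> successors(boolean maxPlayer);
    int utility();
    boolean isTerminal();
}

final class AlphaBeta {
    // Depth-limited search with alpha-beta cutoffs. Each recursive call
    // receives its own copy of alpha and beta from its parent.
    static int search(GameState s, int depth, int alpha, int beta, boolean maxPlayer) {
        if (depth == 0 || s.isTerminal()) return s.utility();
        if (maxPlayer) {
            int best = Integer.MIN_VALUE;
            for (GameState next : s.successors(true)) {
                best = Math.max(best, search(next, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;   // prune: this branch cannot be selected
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (GameState next : s.successors(false)) {
                best = Math.min(best, search(next, depth - 1, alpha, beta, true));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;   // prune: an obvious counter move was found
            }
            return best;
        }
    }
}
```

The initial call passes `Integer.MIN_VALUE` for alpha and `Integer.MAX_VALUE` for beta, matching the "lowest possible" and "highest possible" starting values described above.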
Alpha-beta pruning can be very effective at trimming down the size of large search trees, but the order in which nodes are visited has a huge impact on the effectiveness of the pruning. At a max node, if the largest value among the potential states is found early on, the remaining values can be ignored. If the largest value is found at the very end, nothing can be pruned, because every value has already been visited. The same logic applies to min nodes and lower values. This problem can be addressed by adding a heuristic function to the program. This function looks at a possible state and gives a quick estimate of its utility. The list of next states is sorted according to this function (lowest first for min nodes, highest first for max nodes), allowing the AI to prune more branches than it would with an arbitrary ordering. Some examples of heuristic functions are the number of pawns on the board, the distance of the furthest pawn, or anything else that gives a rough estimate of how well a player is doing.
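The sorting step described above can be sketched as a small generic helper. The `heuristic` parameter is a hypothetical quick-estimate function (not the project's actual method name); higher values are assumed to be better for the maximizing player.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

final class MoveOrdering {
    // Sort candidate states so the most promising are visited first:
    // highest estimate first at max nodes, lowest first at min nodes.
    static <S> List<S> order(List<S> states, ToIntFunction<S> heuristic, boolean maxNode) {
        List<S> sorted = new ArrayList<>(states);
        Comparator<S> byEstimate = Comparator.comparingInt(heuristic);
        sorted.sort(maxNode ? byEstimate.reversed() : byEstimate);
        return sorted;
    }
}
```

The search loop would then iterate over `order(successors, heuristic, maxPlayer)` instead of the raw successor list.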
The search space in this program is the space of all valid movements of the pawns. The state space includes all combinations of all positions each pawn can reach by moving forward and attacking diagonally. The board starts out in the initial state, in which the white pawns fill the entire top row, and the black pawns fill the entire bottom row. From there, a successor function returns all possible successor states for a specified player. After selecting a movement, the successor function is called again to find the moves available to the opposing player from the new state. In this way, all possible valid pawn configurations can be considered, but many will be discarded by alpha-beta pruning.
The successor function used is relatively simple. It loops through all surviving pawns on the board. For each pawn, it calls a helper function to retrieve a list of available moves. The helper does a simple test on the squares ahead of the specified pawn: if the square directly in front is empty, the pawn is free to move one space forward, and if there is an enemy diagonally in front of the pawn, it can move onto the enemy's square. If the pawn is blocked in front, or there is no enemy to attack, those options are not returned by the helper. After retrieving the list of positions the pawn can move to, a new board state is generated for each one, and the successor function returns them all in a list.
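A sketch of such a successor routine follows. The character-grid board encoding (`'W'` for white, `'B'` for black, `'.'` for empty) and the assumption that white advances down the board from the top row are illustrative choices, not necessarily the project's actual representation.

```java
import java.util.ArrayList;
import java.util.List;

final class Successors {
    // Returns one new board per legal move available to `player`.
    static List<char[][]> successors(char[][] board, char player) {
        int n = board.length;
        char enemy = (player == 'W') ? 'B' : 'W';
        int dir = (player == 'W') ? 1 : -1;   // assumed: white starts on the top row
        List<char[][]> result = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                if (board[r][c] != player) continue;
                int nr = r + dir;
                if (nr < 0 || nr >= n) continue;
                // one space forward if the square ahead is empty
                if (board[nr][c] == '.') result.add(move(board, r, c, nr, c));
                // diagonal capture to either side if an enemy is there
                for (int dc : new int[]{-1, 1}) {
                    int nc = c + dc;
                    if (nc >= 0 && nc < n && board[nr][nc] == enemy)
                        result.add(move(board, r, c, nr, nc));
                }
            }
        }
        return result;
    }

    // Copies the board and applies a single move, leaving the original intact.
    private static char[][] move(char[][] b, int r, int c, int nr, int nc) {
        char[][] copy = new char[b.length][];
        for (int i = 0; i < b.length; i++) copy[i] = b[i].clone();
        copy[nr][nc] = copy[r][c];
        copy[r][c] = '.';
        return copy;
    }
}
```

Copying the board for each move keeps every tree node independent, which is what allows the search to explore siblings without undoing moves.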
The heuristic function was a very important part of the project. It is used to order the nodes in the Minimax tree, which determines the order in which they are visited, and therefore the effectiveness of alpha-beta pruning. Initially I used relatively simple heuristics, such as the number of living pawns or the distance to the goal. I quickly found, however, that on a 6x6 game board, Minimax couldn't reach the depth necessary to distinguish between states early in the game, so the opening moves were essentially random.
After becoming more familiar with the game, I created a much more complex and robust heuristic function. This function consists of two parts. First, the relative pawn distribution is calculated: the living pawns on both teams are counted, and the enemy's count is subtracted from the player's. Then, to avoid negative numbers, n is added to the difference, where n is the number of pawns on each side (equal to the board width). This results in a value between 0 and 2n: 2n means the player has all n pawns and the enemy has none, n represents a tie, and 0 means the enemy has all its pawns and the player has none. This value is then multiplied by 1000 and added to the second part.
The second part of the heuristic function counts how many of the player's pawns are defended. This is calculated by looking at each pawn and checking whether it has a teammate diagonally behind it on either side. If it does, then if an enemy captures that pawn, one of its teammates is set up to recapture. This value ranges between 0 and n, where n is the total number of pawns. The two values are added together, so the pawn count occupies the higher digits of the utility value and the defense count the lower digits. The resulting behavior is that if the AI can find a new state where it has more pawns, it will prefer that state; if it can't take any enemy pawns, it will focus on building defensive positioning.
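Putting the two parts together, the heuristic might be sketched as follows. The character-grid board encoding and the direction convention (white advancing down from the top row) are assumptions for illustration, not the project's actual code.

```java
final class Heuristic {
    // Two-part estimate: (pawn difference shifted into 0..2n) * 1000,
    // plus the number of this player's pawns that have a defender.
    static int evaluate(char[][] board, char player) {
        int n = board.length;
        char enemy = (player == 'W') ? 'B' : 'W';
        int dir = (player == 'W') ? 1 : -1;   // assumed direction of advance
        int mine = 0, theirs = 0, defended = 0;
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                if (board[r][c] == enemy) { theirs++; continue; }
                if (board[r][c] != player) continue;
                mine++;
                // defended if a teammate sits diagonally behind this pawn
                int br = r - dir;
                if (br >= 0 && br < n &&
                    ((c > 0 && board[br][c - 1] == player) ||
                     (c < n - 1 && board[br][c + 1] == player)))
                    defended++;
            }
        }
        // pawn difference weighted into the high digits, defense into the low digits
        return (mine - theirs + n) * 1000 + defended;
    }
}
```

Because the first term is scaled by 1000, any state with a better pawn count beats any amount of defensive positioning, matching the priority described above.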
The heuristic value is used as an estimate of the desirability of a state, but it is also the main metric in the final utility calculation. The only difference between the heuristic function and the computation at a terminal node is that the terminal node first checks whether the game is in an end state. If so, the utility function returns a constant depending on whether the game was won (Int.Max), lost (0), or a stalemate (Int.Max - 1). The stalemate value was chosen because if the AI can't find a winning state, it should prefer forcing a tie to any other option.
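The terminal-value logic can be sketched as a small function. The `Outcome` enum here is a hypothetical stand-in for however the project actually detects end states; the constants mirror the ones described above.

```java
final class Utility {
    enum Outcome { WON, LOST, STALEMATE, ONGOING }   // hypothetical end-state flag

    // Terminal states get fixed constants; otherwise fall back to the heuristic.
    static int utility(Outcome outcome, int heuristicValue) {
        switch (outcome) {
            case WON:       return Integer.MAX_VALUE;      // best possible result
            case LOST:      return 0;                      // worst possible result
            case STALEMATE: return Integer.MAX_VALUE - 1;  // prefer a tie over any non-winning line
            default:        return heuristicValue;
        }
    }
}
```

Since the stalemate constant is below only the winning constant, the search will force a tie whenever no winning line exists.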
While working on this project, I quickly found that debugging is an important part of the process. It is important to see what's going on in the AI's decision-making process, but that is difficult to inspect in recursive functions: whenever a breakpoint is set to check the values of a tree node, it will also be hit many times by every child node. For this reason, I implemented a function to print the entire Minimax tree to the console, and made each node cache its own value. While a seemingly small part of the project, this had a significant effect on my workflow, allowing me to create a much better project.
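A minimal sketch of such a tree printer, assuming a hypothetical `Node` class that caches its computed value (the project's real node class will differ):

```java
import java.util.ArrayList;
import java.util.List;

final class DebugTree {
    // Hypothetical tree node that caches its minimax value for inspection.
    static final class Node {
        int cachedValue;
        final List<Node> children = new ArrayList<>();
    }

    // Renders the whole subtree, indenting two spaces per depth level.
    static String print(Node node, int indent) {
        StringBuilder sb = new StringBuilder();
        sb.append("  ".repeat(indent)).append(node.cachedValue).append('\n');
        for (Node child : node.children) sb.append(print(child, indent + 1));
        return sb.toString();
    }
}
```

Dumping the whole tree at once avoids the breakpoint problem: every node's value is visible in a single pass instead of being inspected one recursive call at a time.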
My AI for Pawns appears to play the game very well. It was able to beat me every time I played it, although that's not a particularly impressive metric. The depth limit can be pushed to 8 at the highest, resulting in a roughly two-minute wait between turns on my personal laptop. If the depth is lowered to 7, the delay is reduced to just 15 seconds. At depths of 5 or 6, the delay is nearly instantaneous, and the AI's play is still top notch. The depth figure represents how many turns ahead the program considers, so larger depth values improve the quality of play, but cause larger computation times. More detailed speed and quality data can be found in the table at the bottom of this section.
When playing two instances of the AI against each other, with larger depth values, the white player tends to win. That's because the white player gets the first move, so it has more time to execute its strategy. One unexpected finding is that the black player would occasionally win at certain depth values. I believe this is because of the defensive positioning heuristic. The AI is always attempting to position itself with as many defenders as possible, but defensive positioning can't be maintained when every turn forces a move forward. What I believe is going on is that both players play extremely defensively, trying to keep as many pawns defending each other as possible. Because neither player can find an unprotected pawn to attack, they play defensively until one of their pawns is forced to break formation, and then the opponent attacks. Because white goes first, it is more likely to be forced into this position, which in some cases removes its first-player advantage. Black is especially likely to win at lower depth levels, where the players are only optimizing 2 or 3 turns ahead, which can cause moves that are suboptimal in the long term.
While working on this project, I came to appreciate the importance of a good heuristic function. The heuristic essentially controls the program's tradeoffs between speed, depth, and the quality of its guesses. A good heuristic function will ensure that the pawns end up in a favorable state if a winning state can't be found in time. The heuristic function is called many times while sorting states, however, so it is important that it is efficient. I realized early on that if the state space is small, it is often better to have a simple, lightweight heuristic and maximize the depth that can be reached. I found that on a 5x5 board, if I could make the program run efficiently enough to reach depth 13, the white player would always find its winning move. In that case, I simply used the number of pawns on the board as the heuristic, for efficiency reasons. When I changed the board size to 6x6, however, I found there was no way to search to an end state early in the game, and my simple heuristic was leaving me with a dumb AI. In this case, I had to improve the heuristic to consider more complex strategic details, which slowed down the computation and forced me to search to a shallower depth.
After experimenting with this game for a while, I believe the game is likely a guaranteed win for white, if the right sequence of moves is used. This hypothesis is based on the fact that the game is an automatic win on a 5x5 board at a depth of 13. This depth figure is deceptively low, because I count each depth level as including both a max layer and a min layer, so a depth of 13 produces a tree 26 levels deep. The branching factor for the problem varies between 0 and 15, typically averaging around 5. On a 6x6 board, on the other hand, the branching factor can be between 0 and 18, typically averaging around 6. The AI would also have to plan many more moves ahead to find a perfect game, requiring a much greater depth. Because the size of the tree grows exponentially with depth, I don't believe a perfect solution can feasibly be computed with the resources I have available.
To run the program, run the jar file from the command line: $ java -jar ./Minimax.jar