Minimax algorithm
The agent follows this algorithm to decide its next move deterministically:
-
If a winning move is available, it plays the winning move.
-
If a winning move is not available, if a winning move is available for the opponent, it blocks that.
-
If winning move is available for neither, it checks all valid moves. If a valid move leads to a winning move for opponent in immediate next move, it removes that from consideration. If all valid moves have been removed from consideration, it backtracks and plays a random move.
-
Among the remaining valid moves, it computes the heuristic of each one with minimax and selects the ones with the highest heuristic score.
-
Among the moves with the highest heuristic score, it plays the one closest to the center column.
The heuristic consists of six components - number of times pieces occur twice, thrice and four times in a row for both the agent and the opponent.
The agent's piece occuring four times in a row is given the highest score because it is a winning position.
The opponent's pieces occuring four times in a row is given the lowest score because it is a losing position.
Function to apply minimax:
def minimax(node, depth, maximizingPlayer, mark, config):
if depth == 0 or is_terminal_node(node, config):
return get_heuristic(node, mark, config)
valid_moves = [c for c in range(config.columns) if node[0][c] == 0]
if maximizingPlayer:
value = -np.Inf
for col in valid_moves:
child = drop_piece(node, col, mark, config)
value = max(value, minimax(child, depth - 1, False, mark, config))
else:
value = np.Inf
for col in valid_moves:
child = drop_piece(node, col, 3 - mark, config)
value = min(value, minimax(child, depth - 1, True, mark, config))
return value