Minimax algorithm

The agent follows this algorithm to decide its next move deterministically:

If a winning move is available, it plays the winning move.
If a winning move is not available, if a winning move is available for the opponent, it blocks that.
If winning move is available for neither, it checks all valid moves. If a valid move leads to a winning move for opponent in immediate next move, it removes that from consideration. If all valid moves have been removed from consideration, it backtracks and plays a random move.
Among the remaining valid moves, it computes the heuristic of each one with minimax and selects the ones with the highest heuristic score.
Among the moves with the highest heuristic score, it plays the one closest to the center column.

The heuristic consists of six components - number of times pieces occur twice, thrice and four times in a row for both the agent and the opponent.

The agent's piece occuring four times in a row is given the highest score because it is a winning position.

The opponent's pieces occuring four times in a row is given the lowest score because it is a losing position.

Function to apply minimax:

def minimax(node, depth, maximizingPlayer, mark, config):
    if depth == 0 or is_terminal_node(node, config):
        return get_heuristic(node, mark, config)
    valid_moves = [c for c in range(config.columns) if node[0][c] == 0]
    if maximizingPlayer:
        value = -np.Inf
        for col in valid_moves:
            child = drop_piece(node, col, mark, config)
            value = max(value, minimax(child, depth - 1, False, mark, config))
    else:
        value = np.Inf
        for col in valid_moves:
            child = drop_piece(node, col, 3 - mark, config)
            value = min(value, minimax(child, depth - 1, True, mark, config))
    return value