Basic Agent Script

The central element of a submission is the agent python script. Its structure is always composed by two main parts: the preparation step, where the agent and the environment setup is completed, and the interaction loop, where the classical agent-environment interaction happens.

After a first call to the reset method, you start iterating alternating action selection and environment stepping, resetting the environment when the episode is done. That’s it, we take care of the rest.

There is one thing that is worth noticing: since we want your submission to be the same no matter how many episodes are needed to evaluate it, you need to implement the while loop in a way that it keeps iterating indefinitely (while True:) and only exits (the break statement) when the value info["env_done"] is true. This value is set by us and used to let the agent know that the evaluation has been completed. In this way, the same script can be used to run evaluations made of 3, 5, 10 or whatever number of episodes you want, without changing a single line.

#!/usr/bin/env python3
import diambra.arena
from diambra.arena import SpaceTypes, Roles, EnvironmentSettings
from diambra.arena.utils.gym_utils import available_games
import random
import argparse

def main(game_id="random", test=False):
    game_dict = available_games(False)
    if game_id == "random":
        game_id = random.sample(game_dict.keys(),1)[0]
        game_id = opt.gameId if opt.gameId in game_dict.keys() else random.sample(game_dict.keys(),1)[0]

    # Settings
    settings = EnvironmentSettings()
    settings.step_ratio = 6
    settings.frame_shape = (128, 128, 1)
    settings.role = Roles.P2
    settings.difficulty = 4
    settings.action_space = SpaceTypes.MULTI_DISCRETE

    env = diambra.arena.make(game_id, settings)
    observation, info = env.reset()

    while True:
        action = env.get_no_op_action()
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated or truncated:
            observation, info = env.reset()
            if info["env_done"] or test is True:

    # Close the environment

    # Return success
    return 0

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--gameId', type=str, default="random", help='Game ID')
    parser.add_argument('--test', type=int, default=0, help='Test mode')
    opt = parser.parse_args()

    main(opt.gameId, bool(opt.test))