The source code of all examples described in this section is available in our DIAMBRA Agents repository.
We strongly recommend using virtual environments to isolate your Python installations and avoid dependency conflicts. In what follows we use Conda, but any other tool should work too.
Create and activate a new dedicated virtual environment:
conda create -n diambra-arena-ray python=3.8
conda activate diambra-arena-ray
Install DIAMBRA Arena with Ray RLlib interface:
pip install diambra-arena[ray-rllib]
This should be enough to prepare your system to execute the following examples. You can refer to the official Ray RLlib documentation or reach out on our Discord server for specific needs.
All the examples presented below are available here: DIAMBRA Agents - Ray RLlib. They have been created following the high-level approach found on the Ray RLlib examples page and its related repository collection, making them easy to extend and to understand in terms of how they interface with the different components.
These examples only aim to demonstrate the core functionalities and high-level aspects; they will not generate well-performing agents, even if the training time is extended to cover a large number of training steps. The user will need to build upon them, exploring aspects such as policy network architecture, algorithm hyperparameter tuning, observation space tweaking, reward wrapping and other similar ones.
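As a pointer for such extensions, Ray RLlib exposes the policy network and training hyperparameters through standard config keys that can be added to the config dictionaries used in the examples below. The fragment here is only a sketch: the keys ("model", "conv_filters", "post_fcnet_hiddens", "gamma", "lr") are standard RLlib options, but the values are illustrative, not tuned recommendations.

```python
# Illustrative fragment: customizing the policy network and PPO
# hyperparameters through standard RLlib config keys.
# Values are examples only, not tuned recommendations.
config_extensions = {
    "model": {
        # Convolutional layers: [out_channels, kernel, stride] per layer
        "conv_filters": [[16, [8, 8], 4], [32, [4, 4], 2], [256, [11, 11], 1]],
        # Fully connected layers appended after the conv stack
        "post_fcnet_hiddens": [256],
    },
    "gamma": 0.99,  # discount factor
    "lr": 5e-5,     # learning rate
}
```

Merging such keys into the example configs (before calling `preprocess_ray_config`) is one way to start exploring architecture and hyperparameter choices.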
The DIAMBRA Arena native interface with Ray RLlib covers a wide range of use cases, automating the handling of key aspects such as parallelization. In the majority of cases, users can directly import and use it, with no need for additional customization.
For the low-level details of the interface, users can review the corresponding source code here.
For all the basic examples, the environment is used in hardcore mode, so that the observation space is only of type Box, composed of the screen pixels, as in the majority of simple examples found in tutorials and docs. This allows it to be used directly, without further processing.
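As a quick mental model of this observation, the sketch below (plain Python, no DIAMBRA dependency) emulates a single frame with the `frame_shape = (84, 84, 1)` used throughout these examples: an 84x84 grayscale image with one channel.

```python
# Illustrative sketch: in hardcore mode the observation is a single pixel
# tensor with shape (height, width, channels). These values mirror the
# frame_shape setting used in the examples below.
height, width, channels = 84, 84, 1

# Emulate one frame as a nested list of zero-valued pixels
frame = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

print(len(frame), len(frame[0]), len(frame[0][0]))  # 84 84 1
```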
This example demonstrates how to:
It uses the PPO algorithm and, for demonstration purposes, trains it for only 200 steps, so the resulting agent will be far from optimal.
import diambra.arena
from diambra.arena.ray_rllib.make_ray_env import DiambraArena, preprocess_ray_config
from ray.rllib.algorithms.ppo import PPO

def main():
    # Settings
    settings = {}
    settings["hardcore"] = True
    settings["frame_shape"] = (84, 84, 1)

    config = {
        # Define and configure the environment
        "env": DiambraArena,
        "env_config": {
            "game_id": "doapp",
            "settings": settings,
        },
        "num_workers": 0,
        "train_batch_size": 200,
    }

    # Update config file
    config = preprocess_ray_config(config)

    # Create the RLlib Agent.
    agent = PPO(config=config)

    # Run it for n training iterations
    print("\nStarting training ...\n")
    for idx in range(1):
        print("Training iteration:", idx + 1)
        agent.train()
    print("\n .. training completed.")

    # Run the trained agent (and render each timestep output).
    print("\nStarting trained agent execution ...\n")
    env = diambra.arena.make("doapp", settings)
    observation = env.reset()
    while True:
        env.render()

        action = agent.compute_single_action(observation)

        observation, reward, done, info = env.step(action)

        if done:
            observation = env.reset()
            break
    print("\n... trained agent execution completed.\n")

    # Close the environment
    env.close()

    # Return success
    return 0

if __name__ == "__main__":
    main()
How to run it:
diambra run python basic.py
In addition to what was shown in the previous example, this one demonstrates how to:
The same algorithm, policy and training-step settings of the previous example are used here too.
from diambra.arena.ray_rllib.make_ray_env import DiambraArena, preprocess_ray_config
from ray.rllib.algorithms.ppo import PPO
from ray.tune.logger import pretty_print

def main():
    # Settings
    settings = {}
    settings["hardcore"] = True
    settings["frame_shape"] = (84, 84, 1)

    config = {
        # Define and configure the environment
        "env": DiambraArena,
        "env_config": {
            "game_id": "doapp",
            "settings": settings,
        },
        "num_workers": 0,
        "train_batch_size": 200,
        "framework": "torch",
    }

    # Update config file
    config = preprocess_ray_config(config)

    # Create the RLlib Agent.
    agent = PPO(config=config)
    print("Policy architecture =\n{}".format(agent.get_policy().model))

    # Run it for n training iterations
    print("\nStarting training ...\n")
    for idx in range(1):
        print("Training iteration:", idx + 1)
        results = agent.train()
    print("\n .. training completed.")
    print("Training results:\n{}".format(pretty_print(results)))

    # Save the agent
    checkpoint = agent.save()
    print("Checkpoint saved at {}".format(checkpoint))
    del agent  # delete trained model to demonstrate loading

    # Load the trained agent
    agent = PPO(config=config)
    agent.restore(checkpoint)
    print("Agent loaded")

    # Evaluate the trained agent (and render each timestep to the shell's
    # output).
    print("\nStarting evaluation ...\n")
    results = agent.evaluate()
    print("\n... evaluation completed.\n")
    print("Evaluation results:\n{}".format(pretty_print(results)))

    # Return success
    return 0

if __name__ == "__main__":
    main()
How to run it:
diambra run python saving_loading_evaluating.py
In addition to what was shown in the previous examples, this one demonstrates how to:
This example runs multiple environments. In order to execute it properly, the user needs to specify, via the DIAMBRA CLI, the correct number of environment instances to be created when running the script. In this particular case, 6 different instances are needed:
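The count of 6 follows from the parallelization settings used in the config below: 2 rollout workers with 2 environments each, plus 1 evaluation worker that also uses 2 environments. The helper function below is ours, not part of the DIAMBRA or RLlib APIs; it only sketches this arithmetic.

```python
# Hypothetical helper (not part of DIAMBRA or RLlib) showing how the
# number of required environment instances is derived from the config.
def required_env_instances(num_workers, num_envs_per_worker, evaluation_num_workers):
    rollout_envs = num_workers * num_envs_per_worker                # 2 * 2 = 4
    evaluation_envs = evaluation_num_workers * num_envs_per_worker  # 1 * 2 = 2
    return rollout_envs + evaluation_envs

print(required_env_instances(2, 2, 1))  # 6, matching "diambra run -s=6"
```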
from diambra.arena.ray_rllib.make_ray_env import DiambraArena, preprocess_ray_config
from ray.rllib.algorithms.ppo import PPO
from ray.tune.logger import pretty_print

def main():
    # Settings
    settings = {}
    settings["hardcore"] = True
    settings["frame_shape"] = (84, 84, 1)

    config = {
        # Define and configure the environment
        "env": DiambraArena,
        "env_config": {
            "game_id": "doapp",
            "settings": settings,
        },
        "train_batch_size": 200,
        # Use 2 rollout workers
        "num_workers": 2,
        # Use a vectorized env with 2 sub-envs.
        "num_envs_per_worker": 2,
        # Evaluate once per training iteration.
        "evaluation_interval": 1,
        # Run evaluation on (at least) two episodes
        "evaluation_duration": 2,
        # ... using one evaluation worker (setting this to 0 will cause
        # evaluation to run on the local evaluation worker, blocking
        # training until evaluation is done).
        "evaluation_num_workers": 1,
        # Special evaluation config. Keys specified here will override
        # the same keys in the main config, but only for evaluation.
        "evaluation_config": {
            # Render the env while evaluating.
            # Note that this will always only render the 1st RolloutWorker's
            # env and only the 1st sub-env in a vectorized env.
            "render_env": True,
        },
    }

    # Update config file
    config = preprocess_ray_config(config)

    # Create the RLlib Agent.
    agent = PPO(config=config)

    # Run it for n training iterations
    print("\nStarting training ...\n")
    for idx in range(2):
        print("Training iteration:", idx + 1)
        results = agent.train()
    print("\n .. training completed.")
    print("Training results:\n{}".format(pretty_print(results)))

    # Return success
    return 0

if __name__ == "__main__":
    main()
How to run it:
diambra run -s=6 python parallel_envs.py
The next example makes use of the complete observation space of our environments. This is of type Dict, in which the different elements are organized as key-value pairs and can be of different types.
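As an illustration of this structure, the sketch below (plain Python; the keys and values are made up for the example, not the exact DIAMBRA observation keys) shows how a dictionary observation groups heterogeneous elements under named keys, including nested dictionaries:

```python
# Illustrative dictionary observation; keys and values are examples only,
# not the exact DIAMBRA observation space.
observation = {
    "frame": [[0.0] * 84 for _ in range(84)],  # pixel data (Box-like)
    "own_health": 0.75,                        # continuous game variable
    "opp_character": 3,                        # discrete game variable
}

# Nested dictionaries can appear too (e.g. grouping per-player variables)
nested = {"P1": observation}

for key in sorted(observation):
    print(key, type(observation[key]).__name__)
```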
In addition to what was shown in the previous examples, this one demonstrates how to:
There are two main things to note in this example: how observation normalization and dictionary observations are handled. As can be seen from the snippet below, the normalization wrapper is applied to all elements, prescribing one-hot encoding for binary discrete observations too. This is usually neither needed nor recommended, but it is required by Ray RLlib to automatically handle this observation type. On the other hand, the library places no constraints on dictionary observation spaces, and is able to handle nested ones too.
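To make the one-hot requirement concrete, the sketch below (plain Python, not DIAMBRA or RLlib code) shows what one-hot encoding does to a binary discrete observation:

```python
def one_hot(value, num_values):
    # Encode a discrete value in [0, num_values) as a one-hot vector
    vec = [0.0] * num_values
    vec[value] = 1.0
    return vec

# A binary discrete observation (e.g. a pressed/not-pressed button flag)
# becomes a two-element vector with a single 1.0 entry
print(one_hot(0, 2))  # [1.0, 0.0]
print(one_hot(1, 2))  # [0.0, 1.0]
```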
The policy network is automatically generated, properly handling the different input types. The model architecture is then printed to the console, making it easy to identify all the different contributions.
from diambra.arena.ray_rllib.make_ray_env import DiambraArena, preprocess_ray_config
from ray.rllib.algorithms.ppo import PPO
from ray.tune.logger import pretty_print

def main():
    # Settings
    settings = {}
    settings["frame_shape"] = (84, 84, 1)
    settings["characters"] = ("Kasumi")

    # Wrappers Settings
    wrappers_settings = {}
    wrappers_settings["reward_normalization"] = True
    wrappers_settings["actions_stack"] = 12
    wrappers_settings["frame_stack"] = 5
    wrappers_settings["scale"] = True
    wrappers_settings["process_discrete_binary"] = True

    config = {
        # Define and configure the environment
        "env": DiambraArena,
        "env_config": {
            "game_id": "doapp",
            "settings": settings,
            "wrappers_settings": wrappers_settings,
        },
        "num_workers": 0,
        "train_batch_size": 200,
        "framework": "torch",
    }

    # Update config file
    config = preprocess_ray_config(config)

    # Create the RLlib Agent.
    agent = PPO(config=config)
    print("Policy architecture =\n{}".format(agent.get_policy().model))

    # Run it for n training iterations
    print("\nStarting training ...\n")
    for idx in range(1):
        print("Training iteration:", idx + 1)
        results = agent.train()
    print("\n .. training completed.")
    print("Training results:\n{}".format(pretty_print(results)))

    # Evaluate the trained agent (and render each timestep to the shell's
    # output).
    print("\nStarting evaluation ...\n")
    results = agent.evaluate()
    print("\n... evaluation completed.\n")
    print("Evaluation results:\n{}".format(pretty_print(results)))

    # Return success
    return 0

if __name__ == "__main__":
    main()
How to run it:
diambra run python dict_obs_space.py
Finally, after the agent training is completed, besides running it locally on your own machine, you may want to submit it to our competition platform! To do so, you can use the following script, which provides a ready-to-use, flexible example that can accommodate different models, games and settings.
To submit your trained agent to our platform, compete for the first leaderboard positions, and unlock our achievements, follow the simple steps described in the “How to Submit an Agent” section.
import argparse
import diambra.arena
from diambra.arena.ray_rllib.make_ray_env import DiambraArena, preprocess_ray_config
from ray.rllib.algorithms.ppo import PPO

# Reference: https://github.com/ray-project/ray/blob/ray-2.0.0/rllib/examples/inference_and_serving/policy_inference_after_training.py

"""This is an example agent based on RL Lib.

Usage:
diambra run python agent.py --trainedModel /absolute/path/to/checkpoint/ --envSpaces /absolute/path/to/environment/spaces/descriptor/
"""

def main(trained_model, env_spaces):
    # Settings
    settings = {}
    settings["frame_shape"] = (84, 84, 1)
    settings["characters"] = ("Kasumi")

    # Wrappers Settings
    wrappers_settings = {}
    wrappers_settings["reward_normalization"] = True
    wrappers_settings["actions_stack"] = 12
    wrappers_settings["frame_stack"] = 5
    wrappers_settings["scale"] = True
    wrappers_settings["process_discrete_binary"] = True

    config = {
        # Define and configure the environment
        "env": DiambraArena,
        "env_config": {
            "game_id": "doapp",
            "settings": settings,
            "wrappers_settings": wrappers_settings,
            "load_spaces_from_file": True,
            "env_spaces_file_name": env_spaces,
        },
        "num_workers": 0,
        "train_batch_size": 200,
        "framework": "torch",
    }

    # Update config file
    config = preprocess_ray_config(config)

    # Load the trained agent
    agent = PPO(config=config)
    agent.restore(trained_model)
    print("Agent loaded")

    # Print the agent policy architecture
    print("Policy architecture =\n{}".format(agent.get_policy().model))

    env = diambra.arena.make("doapp", settings, wrappers_settings)

    obs = env.reset()
    while True:
        env.render()

        action = agent.compute_single_action(observation=obs, explore=True, policy_id="default_policy")

        obs, reward, done, info = env.step(action)

        if done:
            obs = env.reset()
            if info["env_done"]:
                break

    # Close the environment
    env.close()

    # Return success
    return 0

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--trainedModel", type=str, required=True, help="Model path")
    parser.add_argument("--envSpaces", type=str, required=True, help="Environment spaces descriptor file path")
    opt = parser.parse_args()
    print(opt)

    main(opt.trainedModel, opt.envSpaces)
How to run it locally:
diambra run python agent.py --trainedModel /absolute/path/to/checkpoint/ --envSpaces /absolute/path/to/environment/spaces/descriptor/