
Task Description

The goal of this environment is for all agents to hover at their starting positions for as long as possible.


from PyFlyt.pz_envs import MAQuadXHoverEnv

env = MAQuadXHoverEnv(render_mode="human")
observations, infos = env.reset()

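# env.agents empties once every agent has terminated or been truncated
while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    observations, rewards, terminations, truncations, infos = env.step(actions)

# release the simulator and the render window
env.close()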

Environment Options

class PyFlyt.pz_envs.quadx_envs.ma_quadx_hover_env.MAQuadXHoverEnv(
    start_pos: ndarray = array([[-1., -1., 1.], [1., -1., 1.], [-1., 1., 1.], [1., 1., 1.]]),
    start_orn: ndarray = array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]),
    sparse_reward: bool = False,
    flight_mode: int = 0,
    flight_dome_size: float = 10.0,
    max_duration_seconds: float = 30.0,
    angle_representation: str = 'quaternion',
    agent_hz: int = 40,
    render_mode: None | str = None
)

Simple Multiagent Hover Environment.

Actions are vp, vq, vr, T, i.e. angular rates and thrust. The goal is for each agent to avoid crashing for as long as possible.
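
To make the ordering of the four action components explicit, here is a small sketch that names them and builds a per-agent action dictionary. `RateThrustAction` and `constant_policy` are hypothetical helpers, the agent names are placeholders, and the thrust value of 0.5 is an arbitrary illustration: the valid range and units of each component are not stated here and should be read from `env.action_space(agent)`.

```python
from typing import NamedTuple


class RateThrustAction(NamedTuple):
    """One action in the order described above: vp, vq, vr, T."""

    vp: float      # first angular rate component
    vq: float      # second angular rate component
    vr: float      # third angular rate component
    thrust: float  # T


def constant_policy(agents, action):
    """Give every listed agent the same action, as a plain list of four floats."""
    return {agent: list(action) for agent in agents}


# Zero angular rates with a placeholder thrust value (assumption, not a tuned setpoint).
level = RateThrustAction(vp=0.0, vq=0.0, vr=0.0, thrust=0.5)
actions = constant_policy(["uav_0", "uav_1"], level)
print(actions)
```

Before passing such a dictionary to `env.step`, each list would likely need converting to a numpy array with the dtype and bounds of the agent's action space.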


start_pos (np.ndarray): a (num_drones x 3) numpy array specifying the starting position of each agent.
start_orn (np.ndarray): a (num_drones x 3) numpy array specifying the starting orientation of each agent.
sparse_reward (bool): whether to use sparse rewards.
flight_mode (int): the flight mode of all UAVs.
flight_dome_size (float): size of the allowable flying area.
max_duration_seconds (float): maximum simulation time of the environment.
angle_representation (str): can be "euler" or "quaternion".
agent_hz (int): loop rate of the agent-to-environment interaction.
render_mode (None | str): can be "human" or None.
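
As a hedged sketch of how these options might be combined, the helper below lays out starting positions on a square grid and passes them to the constructor. `grid_start_positions` and `make_hover_env` are hypothetical names, and the assumption that the environment accepts any number of rows in `start_pos` is inferred only from the "(num_drones x 3)" description above. numpy and PyFlyt are imported lazily so the layout helper can be run on its own.

```python
import math


def grid_start_positions(num_drones, spacing=2.0, height=1.0):
    """Return num_drones [x, y, z] rows on a square grid centred on the origin."""
    side = math.ceil(math.sqrt(num_drones))
    offset = (side - 1) * spacing / 2.0
    positions = []
    for index in range(num_drones):
        row, col = divmod(index, side)
        positions.append([col * spacing - offset, row * spacing - offset, height])
    return positions


def make_hover_env(num_drones, **options):
    """Build the environment with a grid layout (requires numpy and PyFlyt)."""
    import numpy as np
    from PyFlyt.pz_envs import MAQuadXHoverEnv

    start_pos = np.array(grid_start_positions(num_drones))
    # All-zero orientations, matching the default start_orn in the signature.
    start_orn = np.zeros_like(start_pos)
    return MAQuadXHoverEnv(start_pos=start_pos, start_orn=start_orn, **options)


positions = grid_start_positions(4)
print(positions)
```

With four drones and the default spacing, the helper reproduces the default `start_pos` from the signature. A call such as `make_hover_env(6, sparse_reward=True, angle_representation="euler")` would then forward the remaining keyword options unchanged; the positions should stay well inside `flight_dome_size`.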