Agents
Acme includes a number of pre-built agents, listed below. All are provided as single-process agents, and we also include distributed implementations built with Launchpad. Distributed agents share exactly the same learning and acting code as their single-process counterparts and can be run either on a single machine (using the --lp_launch_type=[local_mt|local_mp] command-line flag for multi-threaded or multi-process execution) or in a multi-machine setup on GCP (--lp_launch_type=vertex_ai). For details, please refer to the Launchpad documentation.
We’ve listed the agents below in separate sections based on their different use cases; however, these distinctions are often subtle. For more information on each implementation, see the relevant agent-specific README.
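
The point that distributed agents reuse the single-process acting and learning code can be illustrated with a toy sketch. It does not use Acme or Launchpad: `Actor`, `Learner`, `run_single_process`, and `run_multi_threaded` are hypothetical names, and a standard-library queue stands in for whatever transport a real launcher would provide.

```python
import queue
import threading


class Actor:
    """Hypothetical actor: emits transitions (here, just step indices)."""

    def __init__(self, num_steps):
        self.num_steps = num_steps

    def run(self, emit):
        for step in range(self.num_steps):
            emit(step)  # stand-in for one environment transition


class Learner:
    """Hypothetical learner: consumes transitions and counts its updates."""

    def __init__(self):
        self.num_updates = 0
        self.total = 0

    def step(self, transition):
        self.total += transition
        self.num_updates += 1


def run_single_process(num_steps):
    # One loop: every transition is handed straight to the learner.
    learner = Learner()
    Actor(num_steps).run(learner.step)
    return learner


def run_multi_threaded(num_steps, num_actors):
    # The same Actor and Learner classes, connected through a queue instead.
    learner = Learner()
    buffer = queue.Queue()
    threads = [
        threading.Thread(target=Actor(num_steps).run, args=(buffer.put,))
        for _ in range(num_actors)
    ]
    for thread in threads:
        thread.start()
    for _ in range(num_steps * num_actors):
        learner.step(buffer.get())  # blocks until an actor has produced data
    for thread in threads:
        thread.join()
    return learner
```

Only the wiring differs between the two entry points; the actor and learner classes are untouched, which is the property the text describes.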
Continuous control
Acme has long had a focus on continuous control agents (i.e. settings where the action space is continuous). The following agents focus on this setting:
Agent | Paper | Code |
---|---|---|
Deep Deterministic Policy Gradient (DDPG) | Lillicrap et al., 2015 | |
Distributed Distributional Deep Deterministic Policy Gradient (D4PG) | Barth-Maron et al., 2018 | |
Maximum a posteriori Policy Optimisation (MPO) | Abdolmaleki et al., 2018 | |
Distributional Maximum a posteriori Policy Optimisation (DMPO) | - | |
Multi-Objective Maximum a posteriori Policy Optimisation (MO-MPO) | Abdolmaleki, Huang et al., 2020 | |
Discrete control
We also include a number of agents built with discrete action spaces in mind. Note that the distinction between these agents and the continuous agents listed above can be somewhat arbitrary. For example, IMPALA could be implemented for continuous action spaces as well, but here we focus on a discrete-action variant.
Agent | Paper | Code |
---|---|---|
Deep Q-Networks (DQN) | | |
Importance-Weighted Actor-Learner Architectures (IMPALA) | | |
Recurrent Replay Distributed DQN (R2D2) | | |
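
To make the difference between the discrete and continuous settings concrete, here is a minimal sketch of action selection in each. It is not taken from Acme: both function names are hypothetical, epsilon-greedy is used as a typical choice for DQN-style agents, and clipped Gaussian noise as one common choice for DDPG-style agents.

```python
import random


def select_discrete_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over a finite set of actions."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # Exploit: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def select_continuous_action(policy_output, noise_scale, low, high, rng):
    """Policy output plus Gaussian exploration noise, clipped to the bounds."""
    return [
        min(high, max(low, value + rng.gauss(0.0, noise_scale)))
        for value in policy_output
    ]


rng = random.Random(0)
greedy = select_discrete_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
clipped = select_continuous_action([0.5, 2.0, -3.0], 0.0, -1.0, 1.0, rng)
```

With `epsilon=0.0` and `noise_scale=0.0` both functions are deterministic: the first returns the index of the largest value, the second simply clips each component to `[low, high]`.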
Batch RL
The structure of Acme also lends itself quite nicely to “learner-only” algorithms for use in Batch RL (with no environment interactions). Implemented algorithms include:
Agent | Paper | Code |
---|---|---|
Behavior Cloning (BC) | - | |
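
As an illustration of what "learner-only" means, the sketch below fits a tabular behavior-cloning policy purely from a fixed dataset, with no environment in the loop. `TabularBCLearner` and its methods are hypothetical and are not Acme's BC implementation, which would typically train a neural network policy rather than count actions.

```python
from collections import Counter, defaultdict


class TabularBCLearner:
    """Hypothetical learner that only ever reads from a fixed dataset."""

    def __init__(self, batches):
        self._batches = iter(batches)        # batches of (observation, action)
        self._counts = defaultdict(Counter)  # observation -> action frequencies

    def step(self):
        """Consume one batch; return False once the dataset is exhausted."""
        batch = next(self._batches, None)
        if batch is None:
            return False
        for observation, action in batch:
            self._counts[observation][action] += 1
        return True

    def policy(self, observation):
        """Return the most frequently demonstrated action for an observation."""
        counts = self._counts.get(observation)
        if not counts:
            raise KeyError(f"no demonstrations for {observation!r}")
        return counts.most_common(1)[0][0]


batches = [
    [("s0", "left"), ("s0", "left")],
    [("s0", "right"), ("s1", "right")],
]
learner = TabularBCLearner(batches)
while learner.step():
    pass  # no actor and no environment: learning is driven by the data alone
```

After training, `learner.policy("s0")` is `"left"` (demonstrated twice, against once for `"right"`) and `learner.policy("s1")` is `"right"`.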
Learning from demonstrations
Acme also easily allows active data acquisition to be combined with data from demonstrations. Such algorithms include:
Agent | Paper | Code |
---|---|---|
Deep Q-Learning from Demonstrations (DQfD) | Hester et al., 2017 | |
Recurrent Replay Distributed DQN from Demonstrations (R2D3) | Gulcehre et al., 2020 | |
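
One simple way to combine the two data sources is to fill a fixed fraction of every training batch from the demonstrations and the rest from the agent's own experience. The sketch below shows that scheme only; `sample_mixed_batch` and `demo_ratio` are hypothetical names, and the agents listed above may mix data differently (for example through prioritized replay).

```python
import random


def sample_mixed_batch(agent_replay, demonstrations, batch_size, demo_ratio, rng):
    """Sample a batch in which roughly `demo_ratio` of items are demonstrations."""
    num_demo = round(batch_size * demo_ratio)
    num_agent = batch_size - num_demo
    # Sample with replacement from each source, then shuffle them together.
    batch = rng.choices(demonstrations, k=num_demo) + rng.choices(agent_replay, k=num_agent)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
demos = [("demo", i) for i in range(10)]
agent = [("agent", i) for i in range(10)]
batch = sample_mixed_batch(agent, demos, batch_size=8, demo_ratio=0.25, rng=rng)
```

Here the batch of 8 always contains 2 demonstration items and 6 agent items; only which items are drawn, and their order, depends on the random seed.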
Model-based RL
Finally, Acme also includes a variant of MCTS which can be used for model-based RL using a given or learned simulator:
Agent | Paper | Code |
---|---|---|
Monte-Carlo Tree Search (MCTS) | | |
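
For readers unfamiliar with tree search over a simulator, the following is a generic UCT-style sketch, assuming a given deterministic simulator and random rollouts. It is not Acme's MCTS variant: `Node`, `ucb_score`, `mcts`, and the toy simulator are all hypothetical, and the search assumes a non-terminal root and at least one simulation.

```python
import math
import random


class Node:
    """Search-tree node holding visit statistics for one state."""

    def __init__(self):
        self.visits = 0
        self.total_return = 0.0
        self.children = {}  # action -> Node


def ucb_score(parent, child, c):
    # Mean return plus an exploration bonus that shrinks as visits grow.
    mean = child.total_return / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(root_state, simulator, is_terminal, actions, num_simulations, rng, c=1.4):
    """Return the most-visited root action after `num_simulations` searches."""
    root = Node()
    for _ in range(num_simulations):
        node, state, path, total = root, root_state, [root], 0.0
        # Selection: descend by UCB until a new child has been expanded.
        while not is_terminal(state):
            untried = [a for a in actions if a not in node.children]
            if untried:
                action = rng.choice(untried)
                node.children[action] = Node()
            else:
                action = max(actions, key=lambda a: ucb_score(node, node.children[a], c))
            state, reward = simulator(state, action)
            total += reward
            node = node.children[action]
            path.append(node)
            if untried:
                break
        # Rollout: finish the episode with uniformly random actions.
        while not is_terminal(state):
            state, reward = simulator(state, rng.choice(actions))
            total += reward
        # Backup: credit the episode return to every node on the path.
        for visited in path:
            visited.visits += 1
            visited.total_return += total
    return max(root.children, key=lambda a: root.children[a].visits)


# Toy simulator (hypothetical): a state is the tuple of bits chosen so far,
# and earlier bits are worth more, so the best first action is 1.
def toy_simulator(state, action):
    weight = 4 >> len(state)  # 4, 2, 1 for the three steps
    return state + (action,), action * weight / 7.0


def toy_is_terminal(state):
    return len(state) == 3


best_action = mcts((), toy_simulator, toy_is_terminal, [0, 1], 200, random.Random(0))
```

In this toy problem every episode that starts with action 1 returns more than any episode that starts with action 0, so the search concentrates its visits on action 1. A learned simulator would be used in the same way by passing a model's prediction function as `simulator`.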