PPO Training

Proximal Policy Optimization (PPO) is a policy gradient method for reinforcement learning.

Overview

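PPO is known for:

  • Stable training: the clipped objective limits how far a single update can move the policy
  • Good sample efficiency relative to vanilla policy gradient, since each batch of experience is reused for several update epochs
  • Forgiving hyperparameters: common defaults often work without extensive tuning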

Basic Usage

from mesozoic import DinoEnv, PPOAgent

# Create the environment and the PPO agent that will act in it.
env = DinoEnv("trex")
agent = PPOAgent(env)

# Train for 2.6M steps.
agent.train(
    total_steps=2_600_000,
    learning_rate=3e-4,
    clip_ratio=0.2,
)
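
The clip_ratio argument above presumably corresponds to the epsilon in PPO's clipped surrogate objective. How PPOAgent applies it internally is not documented here, so the following is only a minimal sketch in plain Python, independent of mesozoic. The helper name clipped_surrogate and its arguments are hypothetical and chosen for illustration.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, clip_ratio=0.2):
    """Per-sample PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probability of the taken action under the
    current policy and under the policy that collected the data.
    advantage: estimated advantage of that action.
    NOTE: illustrative helper, not part of the mesozoic API.
    """
    # Probability ratio between the current and the data-collecting policy.
    ratio = math.exp(new_logp - old_logp)
    # Restrict the ratio to the interval [1 - clip_ratio, 1 + clip_ratio].
    clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage: the gain is capped at 1 + clip_ratio.
    print(clipped_surrogate(math.log(1.5), 0.0, advantage=1.0))
    # Ratio 0.5 with a negative advantage: the term is held at 1 - clip_ratio.
    print(clipped_surrogate(math.log(0.5), 0.0, advantage=-1.0))
```

With clip_ratio=0.2, a sample no longer adds to the objective once the probability ratio has moved more than 20% in the direction the advantage favors. This removes the incentive for a single update to push the policy far from the one that collected the data.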

Results

Model            Steps   Avg Reward   Time
Basic Dinosaur   2.6M    319.94       1:29:43

Coming Soon

Detailed PPO documentation is under development.