SAC Training

Soft Actor-Critic (SAC) is an off-policy algorithm that optimizes a stochastic policy with entropy regularization.
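For reference, the standard maximum-entropy objective that SAC optimizes (this is the general formulation, not anything specific to mesozoic) is:

```latex
J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

where α is the temperature weighting the entropy bonus against expected reward.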

Overview

SAC is known for:

  • Sample efficiency, from reusing transitions via off-policy replay
  • Automatic tuning of the temperature (entropy coefficient)
  • Stable exploration, driven by the maximum-entropy objective
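The automatic temperature tuning mentioned above can be sketched in a few lines. This is an illustrative NumPy sketch of the standard SAC temperature update, not mesozoic's internal implementation; the function name and learning rate are assumptions.

```python
import numpy as np

# Illustrative sketch (not mesozoic's API) of SAC's automatic temperature
# tuning: alpha is adjusted by gradient descent so the policy's entropy
# tracks a target, commonly -|A| (the negative action dimensionality).
def alpha_update(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One descent step on J(alpha) = E[-alpha * (log pi(a|s) + H_target)]."""
    alpha = np.exp(log_alpha)
    # dJ/d(log_alpha) = -alpha * mean(log_prob + target_entropy)
    grad = -alpha * np.mean(log_probs + target_entropy)
    return log_alpha - lr * grad

target_entropy = -12.0          # e.g. a 12-dimensional action space
log_alpha = 0.0
# Here the batch entropy (~1.75 nats) is above the target, so the
# update shrinks alpha, weakening the entropy bonus.
log_alpha = alpha_update(log_alpha, np.array([-2.0, -1.5]), target_entropy)
```

Optimizing over log α (rather than α directly) keeps the temperature positive without constraints.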

Basic Usage

from mesozoic import DinoEnv, SACAgent

env = DinoEnv("trex")
agent = SACAgent(env)

agent.train(
    total_steps=3_600_000,
    learning_rate=3e-4,
    buffer_size=1_000_000,
)
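Because SAC is off-policy, the `buffer_size` argument above controls the capacity of a replay memory. A minimal ring-buffer sketch of that idea (a generic illustration, not mesozoic's internal buffer; stores only observations, actions, and rewards for brevity):

```python
import numpy as np

# Minimal replay buffer sketch: a fixed-capacity ring buffer that
# overwrites the oldest transition once full, plus uniform sampling.
class ReplayBuffer:
    def __init__(self, capacity, obs_dim, act_dim):
        self.capacity = capacity
        self.idx = 0            # next write position
        self.size = 0           # number of stored transitions
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)

    def add(self, o, a, r):
        self.obs[self.idx] = o
        self.act[self.idx] = a
        self.rew[self.idx] = r
        self.idx = (self.idx + 1) % self.capacity   # wrap around when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng):
        i = rng.integers(0, self.size, size=batch_size)
        return self.obs[i], self.act[i], self.rew[i]
```

A larger buffer keeps older, more diverse experience at the cost of memory and potentially staler data.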

Results

Model            Steps   Avg Reward   Time (h:m:s)
Basic Dinosaur   3.6M    3091.3       14:36:59

In these experiments, SAC substantially outperformed PPO on dinosaur locomotion tasks.

Coming Soon

Detailed SAC documentation is under development.