Conservative Q-Learning (CQL) achieves a 25% higher mean return than Behavior Cloning (BC) in safety-critical environments.