Deterministic actions when calling play()
Summary
When playing a trained DRL agent to evaluate its policy, the policy is not set to deterministic mode. The agent therefore samples from the action distribution, which leads to varying outputs across runs and thus to an incorrect evaluation and performance assessment.
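A minimal sketch of the underlying issue, assuming a PyTorch Gaussian policy head (the tensors and names here are illustrative, not the library's actual code): sampling from the distribution gives different actions each call, whereas taking the distribution's mode (the mean, for a Gaussian) gives a reproducible, deterministic action for evaluation.

```python
import torch
from torch.distributions import Normal

# Hypothetical Gaussian policy output for one observation:
# mean and std would normally come from the policy network.
mean = torch.tensor([0.2, -0.5])
std = torch.tensor([0.1, 0.3])
dist = Normal(mean, std)

# Stochastic mode: each call draws a fresh sample,
# so repeated evaluations of the same observation differ.
a1 = dist.sample()
a2 = dist.sample()

# Deterministic mode: use the distribution's mean instead of sampling,
# so repeated evaluations always yield the same action.
d1 = dist.mean
d2 = dist.mean
print(torch.equal(d1, d2))  # the deterministic actions are identical
```

This is why evaluation code typically exposes a deterministic flag (e.g. stable-baselines3's `model.predict(obs, deterministic=True)`); play() should select the deterministic path when assessing a trained policy.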
Explain your context
Calling play() for a trained DRL agent in order to evaluate its learned policy.
What is the expected correct behavior? What was your goal?
Deterministic actions when calling play(), so that repeated evaluations of the same policy produce the same behavior.
What is the current bug behavior?
Non-deterministic actions when calling play(): the agent samples from its action distribution on every step.
Was there an error message?
No error message is produced; the agent silently behaves non-deterministically.
Steps to reproduce
Call play() for a trained agent several times on the same environment and compare the resulting actions: they differ between runs.