Deterministic actions when calling play()
Summary
When playing a trained DRL agent to evaluate its policy, the policy is not set to deterministic mode. The agent therefore samples from the action distribution, which leads to varying outputs across runs and thus to an incorrect evaluation and performance assessment.
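A minimal sketch of the underlying issue, assuming a PyTorch Gaussian policy head (the tensors and names here are illustrative, not the library's actual code): sampling from the distribution gives different actions each call, whereas taking the distribution's mode (the mean, for a Gaussian) gives a reproducible, deterministic action for evaluation.

```python
import torch
from torch.distributions import Normal

# Hypothetical Gaussian policy output for one observation:
# mean and std would normally come from the policy network.
mean = torch.tensor([0.2, -0.5])
std = torch.tensor([0.1, 0.3])
dist = Normal(mean, std)

# Stochastic mode: each call draws a fresh sample,
# so repeated evaluations of the same observation differ.
a1 = dist.sample()
a2 = dist.sample()

# Deterministic mode: use the distribution's mean instead of sampling,
# so repeated evaluations always yield the same action.
d1 = dist.mean
d2 = dist.mean
print(torch.equal(d1, d2))  # the deterministic actions are identical
```

This is why evaluation code typically exposes a deterministic flag (e.g. stable-baselines3's `model.predict(obs, deterministic=True)`); play() should select the deterministic path when assessing a trained policy.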
Explain your context
Calling play() for a trained DRL agent in order to evaluate its learned policy.
What is the expected correct behavior? What was your goal?
Deterministic actions when calling play(), so that repeated evaluations of the same policy produce the same behavior.
What is the current bug behavior?
Non-deterministic actions when calling play(): the agent samples from its action distribution on every step.
Was there an error message?
No error message is produced; the agent silently behaves non-deterministically.
Steps to reproduce
Call play() for a trained agent several times on the same environment and compare the resulting actions: they differ between runs.