Replies: 1 comment
This question relates to the algorithm rather than the environment, because the environment only provides information. If you want to see how algorithms handle observations from environments, check out OmniSafe, a comprehensive library for Safe RL.
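To sketch the idea the question is asking about: many Safe RL algorithms (e.g. Lagrangian variants of PPO, including those in OmniSafe) do not fold the cost into the reward function. Instead, a single policy maximizes a Lagrangian objective, reward minus a learned multiplier times cost, while the multiplier is updated by dual ascent on the constraint violation. The snippet below is a minimal illustration of that mechanism, not OmniSafe's actual code; the function name, learning rate, and episode statistics are placeholders chosen for the example.

```python
# Minimal sketch of the Lagrangian relaxation used by many Safe RL
# algorithms. This is NOT OmniSafe's implementation; episode returns
# below are synthetic placeholders.

def lagrangian_update(ep_reward, ep_cost, lam, cost_limit, lam_lr=0.05):
    """One dual-variable step: lambda grows while cost exceeds the limit."""
    # Objective the actor would maximize: reward minus penalized cost.
    # The reward function itself is untouched; the cost enters only here.
    objective = ep_reward - lam * ep_cost
    # Dual ascent on the constraint violation, projected to keep lambda >= 0.
    lam = max(0.0, lam + lam_lr * (ep_cost - cost_limit))
    return objective, lam

lam = 0.0
cost_limit = 25.0
# Synthetic rollout statistics: episode cost shrinks as the penalty grows.
for ep_cost in [40.0, 35.0, 30.0, 26.0, 24.0]:
    objective, lam = lagrangian_update(ep_reward=100.0, ep_cost=ep_cost,
                                       lam=lam, cost_limit=cost_limit)
```

Note that there is only one policy: the constraint influences the policy update through the multiplier term in the objective, and a separate cost critic (omitted here) typically estimates the cost term, mirroring the reward critic.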
I want to know where, and in what form, you integrate the cost function of the safety constraints into the overall training loop.
If the constraints should not affect the reward function, how do you update the policy without being influenced by them, and where do they play a critical role?
Do you have two separate policies?
I would also appreciate guidance on the code you have written.
Thank you.