Can we ensure that the standard deviation can never be negative? A negative value throws an error at line 63 of reward_model.py.
This is an edge case: the reward should never be negative, but the optimization problem can sometimes produce a flow of -0.00, and we want to be robust to that.
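
One possible way to guard against this, as a rough sketch only: clamp tiny negative values (including -0.0) to zero before they reach the code at line 63. The helper name `clamp_non_negative` and the tolerance `tol` are hypothetical and not taken from reward_model.py; the right tolerance depends on the solver's precision.

```python
def clamp_non_negative(value: float, tol: float = 1e-9) -> float:
    """Return value with small negative noise (and -0.0) mapped to 0.0.

    Hypothetical helper: the name and the default tolerance are assumptions,
    not part of the existing code.
    """
    if value < -tol:
        # A clearly negative value points to a real bug, so fail loudly
        # instead of hiding it.
        raise ValueError(f"expected a non-negative value, got {value}")
    if value <= 0.0:
        # Covers -0.0 and rounding noise in [-tol, 0].
        return 0.0
    return value


# Example: a flow of -0.00 from the optimizer becomes a plain 0.0.
flow = clamp_non_negative(-0.0)
```

Raising on values below `-tol` keeps genuinely negative results visible, while the rounding artifacts described above are silently corrected.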