Example of using this lib for RLHF?

Just wondering if there are any examples of using this lib to implement RLHF (Reinforcement Learning from Human Feedback)?

Inspired by: https://openai.com/blog/chatgpt
![image](https://user-images.githubusercontent.com/6988036/229871768-341d1b74-a1ab-4ac8-815a-b47090e3f4e7.png)

Many thanks for any help! :)