Reinforcement learning from human preferences

This repo is an implementation of the paper "Deep Reinforcement Learning from Human Preferences" (https://arxiv.org/abs/1706.03741).