Reinforcement learning from human preferences

This repo is an implementation of "Deep Reinforcement Learning from Human Preferences" (Christiano et al., 2017): https://arxiv.org/abs/1706.03741

.gitignore
README.md
a2c.py
a2c_net.py
buffer.py
dueling_dqn.py
main_benchmark.py
neural_net.py
q_table.py
utils.py
videos.py
