A reinforcement process is one in which some aspects of the behavior of
a system are caused to become more (or less) prominent in the future
as a consequence of the application of a “reinforcement operator”.
in Steps Toward Artificial Intelligence, Marvin Minsky
本文是关于深度Q强化学习Deep Q-Learning的学习笔记
主要包括前置知识、策略梯度PG、近端策略优化PPO、深度Q网络DQN、深度确定性策略梯度DDPG和相关扩展知识