Proximal Policy Optimization
Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Source: https://arxiv.org/pdf/1707.06347.pdf
Problems Addressed
How can we take the biggest possible improvement step on a policy using the data we currently have without stepping so far that we accidentally cause performance collapse?
Key Ideas
Proximal Policy Optimization uses a clipped surrogate objective that is a pessimistic lower bound on the unclipped surrogate objective: the probability ratio between the new and old policy is clipped to a small interval around 1, so the objective gives no credit for moving the ratio beyond that interval, while changes that make the objective worse are still included.
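The clipped objective from the paper, L^CLIP = E[min(r_t·A_t, clip(r_t, 1−ε, 1+ε)·A_t)], can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code; ε = 0.2 matches the paper's default, and the input arrays are hypothetical.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """L^CLIP = mean over samples of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t).

    ratio:     pi_new(a|s) / pi_old(a|s) per sample
    advantage: advantage estimate A_t per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - epsilon, 1 + epsilon) * advantage
    # Taking the elementwise minimum makes this a pessimistic bound:
    # improvements from pushing the ratio past 1 +/- eps are ignored,
    # but degradations are never hidden.
    return np.minimum(unclipped, clipped).mean()
```

For example, with a positive advantage of 1.0 a ratio of 1.5 is clipped to 1.2, so the objective sees at most 1.2; with a negative advantage of −1.0 a ratio of 0.5 still contributes the worse (clipped) value of −0.8, so the bound never overstates performance.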
To optimize policies, alternate between sampling data with the current policy and performing several epochs of minibatch optimization on the sampled data.
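The sample-then-optimize loop can be sketched end to end on a toy problem. Everything here is an illustrative assumption, not from the paper: a 3-armed bandit with hand-picked arm means stands in for the environment, a softmax policy stands in for a neural network, advantages are rewards minus the batch mean, and the gradient of the clipped surrogate is taken by finite differences for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed bandit: arm 2 has the highest expected reward.
true_means = np.array([0.0, 0.5, 1.0])

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def surrogate(theta, actions, p_old, advantages, eps=0.2):
    # Clipped surrogate L^CLIP for a batch of sampled actions.
    probs = softmax(theta)
    ratio = probs[actions] / p_old[actions]
    return np.minimum(ratio * advantages,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantages).mean()

theta = np.zeros(3)
for iteration in range(30):
    # 1. Sample a batch of data from the current (old) policy.
    p_old = softmax(theta)
    actions = rng.choice(3, size=256, p=p_old)
    rewards = true_means[actions] + rng.normal(0.0, 0.1, size=256)
    advantages = rewards - rewards.mean()  # crude baseline

    # 2. Several epochs of ascent on the clipped surrogate, reusing the
    #    same batch; clipping keeps the new policy near the old one.
    for epoch in range(10):
        grad = np.zeros_like(theta)
        h = 1e-5
        for k in range(3):  # finite-difference gradient (sketch only)
            e = np.zeros(3)
            e[k] = h
            grad[k] = (surrogate(theta + e, actions, p_old, advantages)
                       - surrogate(theta - e, actions, p_old, advantages)) / (2 * h)
        theta += 0.5 * grad

# The policy should concentrate on the best arm (index 2).
final_probs = softmax(theta)
```

Note that the old-policy probabilities `p_old` are frozen while the inner epochs run; the clipping then bounds how far each outer iteration can move the policy, which is what lets the same batch be reused safely for multiple epochs.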