Self-Play Non-Convergence in a Markov Game

An Interesting Learning Curve

Learning in a Markov game that does not converge to a Nash equilibrium (NE) but instead oscillates between “locally” best strategies with respect to the opponent. Implemented using policy gradient with a baseline, \(\eta = 0.001\).
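The setup can be illustrated with a minimal sketch. It is not the original experiment: it assumes a single-state zero-sum game (matching pennies) as a stand-in for the Markov game, softmax policies, a running-average reward baseline, and that \(\eta\) is the learning rate. The names `PAYOFF`, `self_play`, and `baseline_decay` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumed stand-in game: matching pennies. Entries are player 1's payoff;
# player 2 receives the negative (zero-sum).
PAYOFF = np.array([[1.0, -1.0], [-1.0, 1.0]])


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def self_play(steps=20000, eta=0.001, baseline_decay=0.99, seed=0):
    """Two policy-gradient learners with baselines playing each other.

    Returns an array of shape (steps, 2) holding each player's
    probability of choosing action 0 at every step.
    """
    rng = np.random.default_rng(seed)
    # Start both players away from the mixed equilibrium (0.5, 0.5).
    logits = [np.array([0.5, 0.0]), np.array([0.0, 0.5])]
    baselines = [0.0, 0.0]
    history = np.empty((steps, 2))
    for t in range(steps):
        policies = [softmax(l) for l in logits]
        actions = [rng.choice(2, p=p) for p in policies]
        r1 = PAYOFF[actions[0], actions[1]]
        rewards = [r1, -r1]
        for i in range(2):
            # Gradient of log softmax w.r.t. logits: one_hot(action) - policy.
            grad_log = -policies[i]
            grad_log[actions[i]] += 1.0
            # REINFORCE step with the baseline subtracted from the reward.
            logits[i] += eta * (rewards[i] - baselines[i]) * grad_log
            # Baseline: exponential moving average of observed rewards.
            baselines[i] = (baseline_decay * baselines[i]
                            + (1.0 - baseline_decay) * rewards[i])
        history[t] = [policies[0][0], policies[1][0]]
    return history


if __name__ == "__main__":
    hist = self_play()
    # Print a coarse trace of both players' probability of action 0.
    for t in range(0, len(hist), 2000):
        print(t, hist[t].round(3))
```

Plotting the two columns of `history` against the step index is one way to inspect whether the policies settle near the mixed equilibrium or keep chasing each other's current best response. The behaviour depends on the game, the seed, and the step size, so this sketch should not be read as reproducing the curve described above.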

[Figure: learning curve; vertical axis: accuracy]