Existence of Optimal Policy in Markov Decision Process

In this blog, we will prove the following theorem.

Optimal Policy Existence Theorem: For any Markov Decision Process,

- there exists an optimal policy $\pi^*$ that is better than or equal to all other policies, $\pi^* \ge \pi, \ \forall \pi$;
- all optimal policies achieve the optimal value function, $V_{\pi^*}(s) = V^*(s)$;
- all optimal policies achieve the optimal action-value function, $Q_{\pi^*}(s,a) = Q^*(s,a)$.

Definition

To simplify the exposition, we first define some basic concepts. ...
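To make the statement of the theorem concrete, here is a minimal sketch that checks it numerically on a tiny MDP. Everything in it is an assumption made for illustration and does not come from the post: the two-state, two-action transition table `P`, the discount `GAMMA = 0.9`, and the helper names `q_value`, `evaluate`, `value_iteration`, and `dominates`. It enumerates only deterministic policies, so it illustrates the theorem rather than proving it.

```python
from itertools import product

GAMMA = 0.9          # assumed discount factor
STATES = [0, 1]
ACTIONS = [0, 1]

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 0.5)]},
}

def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: q_value(V, s, policy[s]) for s in STATES}
    return V

def value_iteration(sweeps=2000):
    """Approximate the optimal value function with Bellman optimality backups."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
    return V

def dominates(Va, Vb, tol=1e-9):
    """True if Va(s) >= Vb(s) in every state (the partial order on policies)."""
    return all(Va[s] >= Vb[s] - tol for s in STATES)

# Enumerate every deterministic policy and evaluate it.
policies = [dict(zip(STATES, acts)) for acts in product(ACTIONS, repeat=len(STATES))]
values = [evaluate(pi) for pi in policies]
V_star = value_iteration()

# A policy is optimal here if its value function dominates every other policy's.
optimal = [pi for pi, V in zip(policies, values)
           if all(dominates(V, other) for other in values)]

print("optimal deterministic policies:", optimal)
print("optimal value function:", V_star)
```

In this toy example at least one policy dominates all the others state by state, and its value function agrees with the one found by value iteration, which is what the first two claims of the theorem predict. The third claim can be checked the same way by comparing `q_value` under the policy's value function with `q_value` under `V_star`.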

September 12, 2022 · 8 min