Deep reinforcement learning (RL) heavily relies on value functions, which are typically trained through mean squared error regression to ensure alignment with bootstrapped target values. However, while cross-entropy classification loss effectively scales up supervised learning, regression-based value functions pose scalability challenges in deep RL.
In classical deep learning, large neural networks show proficiency at handling classification…