Apr 10, 2025
My failure log
It's hard to digest that I've failed. These are fragments from years past, a reflection that I hope turns into meaningful progress.
Apr 10, 2025
Policy gradient
Don't ever forget about policy gradient.
Apr 10, 2025
Function approximation and representation learning in RL
The discrepancy between linear RL and deep RL, and an intuitive understanding of representation in RL.
Apr 9, 2025
State vs Observation in RL
Apr 9, 2025
Can we trust math?
Apr 9, 2025
[Editing] From TRPO to PPO
Many people use PPO without understanding it, and that is exactly what makes it such a good algorithm.
Apr 9, 2025
[Editing] Overdetermined vs Overcomplete vs Overparameterized
Apr 9, 2025
[Editing] Most deep RL algorithms are TD