12. Learning DDPG, TD3 and SAC

  • 12.1. Deep Deterministic Policy Gradient
    • 12.1.1. An Overview of DDPG
  • 12.2. Components of DDPG
    • 12.2.1. Critic network
    • 12.2.2. Actor Network
  • 12.3. Putting it all Together
  • 12.4. Algorithm - DDPG
  • 12.5. Swinging Up the Pendulum using DDPG
  • 12.6. Twin Delayed DDPG
  • 12.7. Components of TD3
    • 12.7.1. Key Features of TD3
    • 12.7.2. Clipped Double Q Learning
    • 12.7.3. Delayed Policy Updates
    • 12.7.4. Target Policy Smoothing
  • 12.8. Putting it all Together
  • 12.9. Algorithm - TD3
  • 12.10. Soft Actor Critic
  • 12.11. Components of SAC
    • 12.11.1. Understanding Soft Actor Critic
    • 12.11.2. V and Q Function with the Entropy Term
    • 12.11.3. Critic Network
      • 12.11.3.1. Value Network
      • 12.11.3.2. Q Network
    • 12.11.4. Actor Network
  • 12.12. Putting it all Together
  • 12.13. Algorithm - SAC