I've been using ElegantRL for a while now and have also compared it to several other RL frameworks. For me, it is much easier to use than SB3 or RLlib, and judging from the advanced code-level optimizations in your PPO and SAC implementations, the agents' performance, stability, and sample efficiency should be state-of-the-art in most cases. However, I can't fully determine this without extensive experimentation and hyperparameter sweeping.
Therefore, I would highly appreciate an extensive benchmark of your agent implementations, much like CleanRL did for PPO. Some agents, such as AgentModSAC, seem to have undergone rigorous testing, but it would be good to know how much better it performs than your regular SAC implementation and than the implementations in other RL frameworks, and why the specific code-level optimizations were chosen.
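To make the benchmarking ask more concrete, here is a minimal sketch of the kind of multi-seed protocol I mean (CleanRL-style: same environments, several seeds, mean and standard deviation of the final return). The `train_and_evaluate` helper is purely a placeholder for whatever ElegantRL entry point launches a run, not the actual API:

```python
import numpy as np

# Placeholder: train one agent on one environment with one seed and return
# the final evaluation return. Replace with the real ElegantRL training call.
def train_and_evaluate(agent_name: str, env_name: str, seed: int) -> float:
    raise NotImplementedError

def benchmark(agent_names, env_names, seeds=(0, 1, 2, 3, 4)):
    """Compare agents across environments and seeds, CleanRL-style."""
    results = {}
    for agent in agent_names:
        for env in env_names:
            returns = [train_and_evaluate(agent, env, seed) for seed in seeds]
            results[(agent, env)] = (np.mean(returns), np.std(returns))
            print(f"{agent} on {env}: {np.mean(returns):.1f} +/- {np.std(returns):.1f}")
    return results

# Example: benchmark(["AgentSAC", "AgentModSAC"], ["Hopper-v4", "Walker2d-v4"])
```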
You might also like to adopt W&B for experiment management, as it would give you the option to track experiments, document hyperparameter sweeps, and flexibly log different types of animations/plots during training, which helps research a lot.
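As a rough sketch of what such an integration could look like inside the training loop (the metric names and dummy values are placeholders, not ElegantRL code):

```python
import numpy as np
import wandb

# Start a run and record the hyperparameters of this experiment.
run = wandb.init(project="elegantrl-benchmarks",
                 config={"agent": "AgentModSAC", "env": "Hopper-v4", "lr": 3e-4})

for step in range(1000):
    episode_return = float(np.random.rand())  # placeholder for the real rollout return
    critic_loss = float(np.random.rand())     # placeholder for the real critic loss
    wandb.log({"episode_return": episode_return, "critic_loss": critic_loss}, step=step)

# Rich media such as rollout videos can be logged the same way, e.g.:
# wandb.log({"rollout": wandb.Video(frames, fps=30)})  # frames: (T, C, H, W) uint8 array
run.finish()
```

Hyperparameter sweeps would then just be a matter of defining a sweep config and pointing `wandb.agent` at the training function.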
Other minor ideas would be agent implementations with LSTMs (which, from what I have read, generally perform better and are more sample-efficient in many RL settings), and perhaps also allowing regular hyperparameter optimization via an Optuna TPE sampler or grid search, in addition to the more compute-intensive population-based training (PBT).
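For the Optuna idea, a TPE-based sweep can be as small as wrapping a training run in an objective function. Again, `train_and_evaluate` below is a hypothetical stand-in for the actual ElegantRL call; here it just returns a dummy score so the sketch runs end to end:

```python
import optuna

def train_and_evaluate(learning_rate: float, gamma: float, net_dim: int) -> float:
    # Placeholder: replace with the real training/evaluation entry point.
    # The dummy score below only exists so this example is runnable.
    return -abs(learning_rate - 3e-4) * 1e4 + gamma

def objective(trial: optuna.Trial) -> float:
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.999)
    net_dim = trial.suggest_categorical("net_dim", [128, 256, 512])
    return train_and_evaluate(learning_rate, gamma, net_dim)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```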
HenningBeyer changed the title from "Benchmarking of Agent Implementations" to "Benchmarking of Agent Implementations and Other Ideas" on Nov 6, 2024.