Hey @YuxuanSong, thanks for bringing this up! The SQL implementation in this repo was migrated from https://github.com/haarnoja/softqlearning and I have actually not tested it thoroughly. I'll try to take a closer look at this soon and make sure it's implemented properly.
Hi Haarnoja,
Thanks a lot for maintaining the amazing repo!
I'm a little confused about the SVGD implementation in soft Q-learning.
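At line 281 of `softlearning/softlearning/algorithms/sql.py` (commit 05daa55), the log probs are computed as `log_probs = svgd_target_values + squash_correction`, which is the log density on the $u$ (raw action) space, where $a = \tanh(u)$.
However, the SVGD step that follows uses these $u$-space log probs to compute the update directions for $a$, which seems inconsistent.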
I think it should be `actions = self._policy.raw_actions(expanded_observations)` at line 235 of `softlearning/softlearning/algorithms/sql.py` (commit 05daa55).
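A minimal 1-D sketch of the mismatch follows. It is not the repository's code. It assumes a hypothetical target `log_p_a(a) = -(a - 0.5)^2 / 0.02` standing in for `svgd_target_values`, and it uses central finite differences in place of `tf.gradients`. Because `log_probs` already includes $\log(1 - a^2)$, it is a log density over $u$. By the chain rule, its score with respect to $u$ equals its derivative with respect to $a$ times $da/du = 1 - a^2$. Its derivative with respect to $a$ is also not the $a$-space score, because it contains the extra term $-2a/(1 - a^2)$.

```python
import math

EPS = 1e-6  # small constant inside the log, mirroring the squash correction

def log_p_a(a):
    # Hypothetical unnormalized target on the squashed space (stand-in for svgd_target_values).
    return -((a - 0.5) ** 2) / 0.02

def log_probs_fn(a):
    # Mirrors log_probs = svgd_target_values + squash_correction, where
    # squash_correction = log(1 - a^2 + EPS), approximately log|da/du| for a = tanh(u).
    return log_p_a(a) + math.log(1.0 - a ** 2 + EPS)

def log_p_u(u):
    # The same quantity written as a function of the raw action u.
    return log_probs_fn(math.tanh(u))

def num_grad(f, x, h=1e-5):
    # Central finite difference, used here instead of an autodiff library.
    return (f(x + h) - f(x - h)) / (2.0 * h)

u = 0.3
a = math.tanh(u)
grad_wrt_a = num_grad(log_probs_fn, a)  # gradient of log_probs taken w.r.t. the squashed action
score_u = num_grad(log_p_u, u)          # score of the u-space density
score_a = num_grad(log_p_a, a)          # score of the a-space density (no correction term)

# Chain rule: score_u == grad_wrt_a * (1 - a^2), and grad_wrt_a != score_a.
print(grad_wrt_a, score_u, score_a)
```

In several dimensions, the factor $1 - a_k^2$ applies per coordinate. It rescales each component differently, so the resulting direction can change, not just its length.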
Best,
Yuxuan
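The raw-space variant suggested in the issue keeps the particles, the kernel, and the score in $u$, and applies `tanh` only when the particles are used as actions. It can be sketched as a plain SVGD loop. This is a 1-D toy under stated assumptions, not the repository's implementation. The target is the same hypothetical `-(a - 0.5)^2 / 0.02` plus squash correction. The kernel is an RBF with a fixed bandwidth of 0.1, whereas the repository's kernel function and bandwidth choice may differ. Particles are also moved directly rather than through a policy network.

```python
import math

EPS = 1e-6

def log_p_u(u):
    # Hypothetical target log p_a(a) = -(a - 0.5)^2 / 0.02, moved to the raw space
    # via a = tanh(u) plus the squash correction log(1 - a^2 + EPS).
    a = math.tanh(u)
    return -((a - 0.5) ** 2) / 0.02 + math.log(1.0 - a ** 2 + EPS)

def score(u, h=1e-5):
    # d/du log p_u(u) by central finite differences.
    return (log_p_u(u + h) - log_p_u(u - h)) / (2.0 * h)

def svgd_direction(particles, bandwidth=0.1):
    # Stein variational gradient with RBF kernel k(x, y) = exp(-(x - y)^2 / bandwidth):
    # phi(u_i) = mean_j [ k(u_j, u_i) * score(u_j) + d/du_j k(u_j, u_i) ]
    scores = [score(u) for u in particles]
    n = len(particles)
    phi = []
    for u_i in particles:
        total = 0.0
        for u_j, s_j in zip(particles, scores):
            k = math.exp(-((u_j - u_i) ** 2) / bandwidth)
            grad_k = -2.0 * (u_j - u_i) / bandwidth * k  # repulsive term
            total += k * s_j + grad_k
        phi.append(total / n)
    return phi

particles = [-1.0 + 2.0 * i / 19 for i in range(20)]  # raw actions u in [-1, 1]
step_size = 0.005
for _ in range(1000):
    direction = svgd_direction(particles)
    particles = [u + step_size * d for u, d in zip(particles, direction)]

actions = [math.tanh(u) for u in particles]  # squash only after the update
print(sum(actions) / len(actions))
```

Because the score, the kernel gradient, and the update are all taken with respect to $u$, the Stein direction refers to the same density that `log_probs` describes.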