
Question on the soft q learning implementation #143

Open
YuxuanSong opened this issue May 19, 2020 · 1 comment
YuxuanSong commented May 19, 2020

Hi Haarnoja,

Thanks a lot for maintaining the amazing repo!
I'm a little confused about the SVGD implementation in soft Q-learning.
At

log_probs = svgd_target_values + squash_correction

the log-probs are computed in the $u$ (raw action) space, where $a = \tanh(u)$.
However, the subsequent SVGD step uses these $u$-space log-probs to compute update directions for $a$, which seems inconsistent.

I think the line

actions = self._policy.actions(expanded_observations)

should instead be

actions = self._policy.raw_actions(expanded_observations)

(the policy class could expose this property).
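For concreteness, here is a minimal NumPy sketch (illustrative only, not the repo's code) of why the space matters: the squash correction turns a $u$-space log-density into an $a$-space one, and the gradient SVGD needs differs by the Jacobian factor $1 - \tanh^2(u)$ depending on which variable the particles live in.

```python
import numpy as np

def log_p_u(u):
    # Standard normal log-density on the raw action u (illustrative choice).
    return -0.5 * u**2 - 0.5 * np.log(2.0 * np.pi)

def log_p_a_of_u(u):
    # Log-density of the squashed action a = tanh(u) via change of variables:
    # log p_a(a) = log p_u(u) - log(1 - tanh(u)^2)   <- the "squash correction"
    return log_p_u(u) - np.log(1.0 - np.tanh(u)**2)

# Finite-difference gradient of the squashed log-density w.r.t. u ...
u, eps = 0.7, 1e-5
grad_wrt_u = (log_p_a_of_u(u + eps) - log_p_a_of_u(u - eps)) / (2.0 * eps)

# ... versus the gradient w.r.t. a = tanh(u), via the chain rule
# d/da = (du/da) d/du = (1 / (1 - a^2)) d/du.
a = np.tanh(u)
grad_wrt_a = grad_wrt_u / (1.0 - a**2)

# SVGD must pair each gradient with particles in the matching space;
# the two gradients are not interchangeable.
print(grad_wrt_u, grad_wrt_a)
```

The point of the sketch: mixing a $u$-space gradient with $a$-space particles silently drops the $1/(1-a^2)$ Jacobian factor, so the SVGD update directions are wrong unless the particles are the raw actions.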

Best,
Yuxuan

@hartikainen
Member

Hey @YuxuanSong, thanks for bringing this up! The SQL implementation in this repo was migrated from https://github.com/haarnoja/softqlearning and I have actually not tested it thoroughly. I'll try to take a closer look at this soon and make sure it's implemented properly.
