Question on the soft q learning implementation #143

YuxuanSong · 2020-05-19T08:54:23Z

Hi Haarnoja,

Thanks a lot for maintaining the amazing repo!
I am a little confused about the implementation of SVGD in soft Q-learning.
At

softlearning/softlearning/algorithms/sql.py

Line 281 in 05daa55

log_probs = svgd_target_values + squash_correction

the log-probabilities are computed as log_probs = svgd_target_values + squash_correction, which gives log-probabilities in the $u$ (raw action) space, where $a = \tanh(u)$.
However, the SVGD step that follows uses these $u$-space log-probabilities to compute the update directions for $a$, which seems inconsistent.
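
For reference, the squash correction is the log-Jacobian term of the change of variables $a = \tanh(u)$, which is why adding it gives a density over $u$ rather than over $a$. Below is a minimal NumPy sketch of that relationship. It is not the repository's code: the function names, the `eps` value, and the uniform target density are assumptions chosen only for illustration.

```python
import numpy as np

def squash_correction(u, eps=1e-6):
    """Log-Jacobian of a = tanh(u), summed over action dimensions.

    eps is an assumed small constant for numerical stability.
    """
    return np.sum(np.log(1.0 - np.tanh(u) ** 2 + eps), axis=-1)

def log_density_u(u, log_density_a):
    """Log-density over raw actions u induced by a density over a = tanh(u)."""
    return log_density_a(np.tanh(u)) + squash_correction(u)

# Hypothetical target: uniform density on a in (-1, 1), i.e. p_a(a) = 0.5.
def uniform_log_density_a(a):
    return np.full(a.shape[:-1], np.log(0.5))

# Integrate the induced u-space density with the trapezoidal rule.
u_grid = np.linspace(-10.0, 10.0, 200001)[:, None]
p_u = np.exp(log_density_u(u_grid, uniform_log_density_a))
du = np.diff(u_grid[:, 0])
total_mass = np.sum(0.5 * (p_u[1:] + p_u[:-1]) * du)
```

With the correction included, `total_mass` comes out close to 1, so the corrected quantity is a proper density in $u$; without it, a constant 0.5 over an unbounded $u$ would not be normalizable.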

I think there should be actions = self._policy.raw_actions(expanded_observations) in

softlearning/softlearning/algorithms/sql.py

Line 235 in 05daa55

actions = self._policy.actions(expanded_observations)

(The policy class could add this property.)
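
To make the concern concrete, here is a small sketch, under stated assumptions, of how the gradient of the same log-probability expression differs depending on whether it is differentiated with respect to $u$ or $a$. The quadratic `q_fn` is a hypothetical stand-in for the Q-function, and none of these helpers come from softlearning.

```python
import numpy as np

def log_prob_from_u(u, q_fn):
    """Q-value plus squash correction, viewed as a function of raw actions u."""
    a = np.tanh(u)
    return q_fn(a) + np.sum(np.log(1.0 - a ** 2))

def log_prob_from_a(a, q_fn):
    """The same expression, viewed as a function of squashed actions a."""
    return q_fn(a) + np.sum(np.log(1.0 - a ** 2))

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

# Hypothetical stand-in for the Q-function (assumption, for illustration only).
def q_fn(a):
    return -np.sum((a - 0.3) ** 2)

u = np.array([0.5, -1.2])
a = np.tanh(u)

grad_u = numerical_grad(lambda x: log_prob_from_u(x, q_fn), u)
grad_a = numerical_grad(lambda x: log_prob_from_a(x, q_fn), a)
```

By the chain rule, `grad_u` equals `grad_a * (1 - a ** 2)`, so the two gradients generally differ. An SVGD update therefore depends on which space the particles, the kernel, and the gradient are all defined in.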

Best,
Yuxuan

hartikainen · 2020-05-19T09:43:48Z

Hey @YuxuanSong, thanks for bringing this up! The SQL implementation in this repo was migrated from https://github.com/haarnoja/softqlearning and I have actually not tested it thoroughly. I'll try to take a closer look at this soon and make sure it's implemented properly.
