Using CALM with a different humanoid.xml #16

Open
VineetTambe opened this issue Aug 2, 2024 · 18 comments

@VineetTambe

VineetTambe commented Aug 2, 2024

I am trying to set up training with a different mjcf/humanoid.xml file; however, I am running into a lot of dimension and observation-space issues if I just replace the asset_file.

What files do I need to change in order for the repo to work with a custom humanoid.xml and retargeted motions?

Edit 1:
I have a humanoid with 33 nodes in the SkeletonTree and a matching retargeted mocap.
However, I don't want to actuate all of the joints - only a subset, similar to that of the AMP humanoid.
Is there a way I can extend the current repo to support this?

Edit 2:
Can you also elaborate on how the following vars in the observation space are constructed?

self._dof_obs_size = 72
self._num_obs = 1 + 15 * (3 + 6 + 3 + 3) - 3
@tesslerc
Collaborator

tesslerc commented Aug 3, 2024

num_obs: height + num_bodies * (pos + rot + vel + ang_vel) - root_pos
Height is a single dim.
num_bodies is 15 for the AMP humanoid.
For each body part, the position is 3-dim, the rotation is in 6d, and the velocity and angular velocity are each 3-dim.
Finally, the root position is removed, so that's 3 dims.

For dof_obs_size you can look at the dof_to_obs function, or alternatively run the function and check the resulting dimensions.
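
For concreteness, a minimal sketch of the arithmetic (the 12-joint count used for dof_obs_size is my reading of dof_to_obs for the AMP humanoid, so treat it as an assumption and verify against the repo):

    # Per-body features of the maximal-coordinates observation
    # (rotation is in the 6d tangent/normal form).
    num_bodies = 15                       # rigid bodies of the AMP humanoid
    pos, rot, vel, ang_vel = 3, 6, 3, 3
    height, root_pos = 1, 3               # root height is kept, root position is dropped

    num_obs = height + num_bodies * (pos + rot + vel + ang_vel) - root_pos
    print(num_obs)                        # 223

    # dof_obs_size: dof_to_obs encodes each joint rotation as a 6d vector;
    # assuming 12 joints for the AMP humanoid this gives 72.
    dof_obs_size = 12 * 6
    print(dof_obs_size)                   # 72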

@tesslerc
Collaborator

tesslerc commented Aug 3, 2024

If you don't want to actuate a joint, I would try to set the corresponding entry in the action vector to 0.
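
For illustration, a minimal sketch of that idea (the index list and the helper are hypothetical placeholders, not something the repo provides):

    import torch

    # Placeholder indices of the action entries you do not want to drive.
    UNACTUATED_ACTION_IDS = [5, 6, 7]

    def mask_actions(actions: torch.Tensor) -> torch.Tensor:
        # Zero the selected entries before they are turned into PD targets,
        # e.g. at the start of the env's pre-physics step.
        actions = actions.clone()
        actions[:, UNACTUATED_ACTION_IDS] = 0.0
        return actions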

@VineetTambe
Author

VineetTambe commented Aug 6, 2024

A follow-up to the above - it turns out I might have been running it with the wrong config.

After running it with:

python calm/run.py --task HumanoidAMPGetup --cfg_env calm/data/cfg/humanoid_calm_sword_shield_getup.yaml --cfg_train calm/data/cfg/train/rlg/custom_calm_beta.yaml --motion_file calm/data/motions/beta_npy/beta_07_01_cmu4.npy --headless 

In the above command I have replaced the AMP humanoid .xml with my custom humanoid and replaced the motion with my custom retargeted data.

But I end up getting this error:

Traceback (most recent call last):
  File "calm/run.py", line 274, in <module>
    main()
  File "calm/run.py", line 268, in main
    runner.run(vargs)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/rl_games/torch_runner.py", line 139, in run
    self.run_train()
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/rl_games/torch_runner.py", line 125, in run_train
    agent.train()
  File "/home/vineet/1x/CALM/calm/learning/common_agent.py", line 120, in train
    train_info = self.train_epoch()
  File "/home/vineet/1x/CALM/calm/learning/calm_agent.py", line 200, in train_epoch
    batch_dict = self.play_steps()
  File "/home/vineet/1x/CALM/calm/learning/calm_agent.py", line 85, in play_steps
    res_dict = self.get_action_values(self.obs, self._calm_latents, self._rand_action_probs)
  File "/home/vineet/1x/CALM/calm/learning/calm_agent.py", line 164, in get_action_values
    res_dict = self.model(input_dict)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/1x/CALM/calm/learning/calm_models.py", line 50, in forward
    result = super().forward(input_dict)
  File "/home/vineet/1x/CALM/calm/learning/amp_models.py", line 51, in forward
    result = super().forward(input_dict)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/rl_games/algos_torch/models.py", line 229, in forward
    distr = torch.distributions.Normal(mu, sigma)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/torch/distributions/normal.py", line 56, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1024, 33)) of distribution Normal(loc: torch.Size([1024, 33]), scale: torch.Size([1024, 33])) to satisfy the constraint Real(), but found invalid values:
tensor([[-0.0145,  0.1211,  0.0497,  ..., -0.0845,  0.0583, -0.0713],
        [ 0.0640,  0.0314,  0.0291,  ...,  0.0107, -0.0104, -0.0126],
        [ 0.0036,  0.0790,  0.0056,  ...,  0.0328,  0.0317, -0.0004],
        ...,
        [ 0.0726,  0.0994,  0.0919,  ..., -0.0465,  0.0093, -0.0204],
        [-0.0061,  0.1919,  0.0032,  ..., -0.0424, -0.0283, -0.0463],
        [-0.0562,  0.0312,  0.0701,  ...,  0.0411, -0.0324, -0.0607]],
       device='cuda:0')

Any clue as to what might be the issue here? My sigmas are set to be non-trainable and constant.


Edit 1:
It seems that the way HumanoidAMPGetup computes the "fall state" might be inducing the instability.
If I understand the implementation correctly, the fall state is obtained by randomly initializing the humanoid and simulating 150 sim steps to obtain a final "fall configuration", to which actors are reset at random during training.
This might be the reason why Isaac Gym gives NaNs? [ref]

@tesslerc
Collaborator

tesslerc commented Aug 6, 2024

From my experience, that's usually when you have NaNs either in your model weights or in your inputs.

Does it work correctly without the added changes and with the default AMP humanoid?

@VineetTambe
Author

VineetTambe commented Aug 6, 2024

Here's what's weird to me:

The training runs without any issues if I use the following configs:

  1. default CALM configs with human sword shield with HumanoidAMPGetup env
  2. CALM configs with AMP humanoid and humanoid.yaml for environment and calm_humanoid.yaml as training config with HumanoidAMP
  3. CALM configs with my custom humanoid.xml and humanoid.yaml for environment and calm_humanoid.yaml as training config with HumanoidAMP

It crashes when I use:
CALM configs with my custom humanoid.xml and humanoid_calm_sword_shield_getup.yaml for environment and calm_humanoid.yaml as training config with HumanoidAMPGetup.

I have explicitly checked that the inputs, i.e. the observations, contain no NaN values before they are returned.

Edit 1:
After experimenting for a while:
It seems that the random fall initialization does not play well with Isaac Gym - some joints may be initialized to a completely invalid state, which is very finicky and causes a crash after a random number of iterations.
Specifically, one of the rows of self._rigid_body_pos ends up as NaNs.

Thanks again for all the help!

@tesslerc
Collaborator

tesslerc commented Aug 7, 2024

Are the tensors generated in _generate_fall_states() OK?
Are self._fall_root_states and self._fall_dof_pos free of NaNs?
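
A hedged sketch of the kind of check being suggested (buffer names follow this thread; run it right after the fall states are generated):

    import torch

    def report_nans(task, names=("_fall_root_states", "_fall_dof_pos", "_rigid_body_pos")):
        # Print which of the named buffers contain NaNs and in which rows,
        # to catch the bad state at its source rather than inside the Normal distribution.
        for name in names:
            t = getattr(task, name, None)
            if t is not None and torch.isnan(t).any():
                bad_rows = torch.isnan(t.reshape(t.shape[0], -1)).any(dim=-1)
                print(f"NaNs in {name} at rows {bad_rows.nonzero().flatten().tolist()}")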

@VineetTambe
Author

VineetTambe commented Aug 7, 2024

I managed to figure out the issue!
It was caused by incorrect stiffness and damping params - the CALM codebase expects them in the .xml itself, and my xml did not specify them; adding them seems to have solved the NaN issue.
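
In case it helps others hitting the same NaNs, a small hypothetical helper (not part of the repo) to list joints in an MJCF file that do not declare stiffness/damping explicitly. Note that MJCF can also supply these via a default class, which this naive check ignores:

    import xml.etree.ElementTree as ET

    def joints_missing_gains(mjcf_path):
        # Return the names of joints without an explicit stiffness or damping attribute.
        tree = ET.parse(mjcf_path)
        missing = []
        for joint in tree.getroot().iter("joint"):
            if "stiffness" not in joint.attrib or "damping" not in joint.attrib:
                missing.append(joint.get("name", "<unnamed>"))
        return missing

    print(joints_missing_gains("path/to/your_humanoid.xml"))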

@VineetTambe
Author

VineetTambe commented Aug 7, 2024

A follow-up question on this:

  1. Is there any signal (training curves / reward curves) that I should look for in order to get early signs of life during training? It is a bit too inefficient to wait 13-14 hours to see whether the training is succeeding.
  2. I was monitoring the mean episode length per iteration, but given that I am using only a single .npy file from the CMU open-source dataset, how long should I expect my episodes to be? The .npy file is a retargeted speed walk from 07_01.fbx, which is about 1.5 seconds long.

Currently the mean episode length I get is around 15-20 by iteration 5k.
I get a similar range if I run the default command:

python calm/run.py --task HumanoidAMP --cfg_env calm/data/cfg/humanoid_calm_sword_shield.yaml --cfg_train calm/data/cfg/train/rlg/amp_humanoid.yaml --motion_file calm/data/motions/amp_humanoid_walk.npy --headless  --track

The command I am testing on is:

python calm/run.py --task HumanoidAMPGetup --cfg_env calm/data/cfg/humanoid_calm_sword_shield.yaml --cfg_train calm/data/cfg/train/rlg/calm_custom_humanoid.yaml --motion_file calm/data/motions/custom_humanoid_walk.npy --headless  --track

However when I run the default llc training for sword shield humanoid:

python calm/run.py --task HumanoidAMPGetup --cfg_env calm/data/cfg/humanoid_calm_sword_shield_getup.yaml --cfg_train calm/data/cfg/train/rlg/calm_humanoid.yaml --motion_file calm/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless  --track

I get about 200 mean episode length after 2k iterations.

@tesslerc
Collaborator

tesslerc commented Aug 8, 2024

Episodes can last up to 300 frames and terminate early if the agent falls down.
It should relatively quickly learn to get up, and the episode length should spike upwards.
Over time it reaches ~290, which means it learns to execute commands relatively stably without falling very often.

Another metric for learning is the discriminator reward. However, since we are training with a discriminative objective, it is a bit tricky (as with the entire GAN line of work) to track performance.
What I find works well with these types of (discriminative-based) models is to periodically visualize the saved model. With the sword and shield agent -- you see a point where it stands up. Then it typically learns to turn and walk around. Then, over more time, it starts to learn the more complex skills such as sword attacks.

I am not sure what to expect with your model, as it is both a differently structured humanoid and you seem to have changed some control parameters.

@VineetTambe
Author

Okay - after fixing the XML and motion_lib, I am getting a mean episode length of about 250, which is still less than the expected ~290.
There still seems to be something that I might be doing incorrectly.
Is there anything else I could do, like varying training params such as task_reward_w and disc_reward_w? [ref]
Currently I have set the following values for my training (referring to amp_humanoid_task.yaml in the training configs):

    task_reward_w: 0.5
    disc_reward_w: 0.1
    conditional_disc_reward_w: 1.0

Could you shed light on the intuition behind tuning these values?

@tesslerc
Collaborator

It depends on what you're trying to solve.
The LLC policy in CALM (and also in ASE) is typically trained without a task reward. The task is an environment-given reward, for example a reward for following a provided path.

As you can see here:

the default parameters for CALM only use the conditional discriminator reward.

From my experience, combining the conditional and unconditional discriminators doesn't always work well. The unconditional discriminator attempts to push the controller towards the "average" data distribution, whereas the conditional one pushes it to match the state distribution of the currently conditioned motion. The two may be combined in a smart way, by providing discriminative rewards in the transition periods between motions, but we have not attempted this and it is mostly speculation.
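
As a rough sketch of how those three weights enter the per-step reward (my own paraphrase of the config semantics, not the repo's exact code; names are placeholders):

    def combine_rewards(task_r, disc_r, cond_disc_r,
                        task_reward_w=0.0, disc_reward_w=0.0, conditional_disc_reward_w=1.0):
        # Default CALM LLC training: only the conditional discriminator reward contributes.
        return (task_reward_w * task_r
                + disc_reward_w * disc_r
                + conditional_disc_reward_w * cond_disc_r)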

@VineetTambe
Author

VineetTambe commented Aug 14, 2024

Thanks for the insight.
I am now able to get an episode length of about 250-280 when training with my custom xml.
But weirdly enough, the policy learns to walk on its toes instead of keeping its feet flat.
At first I thought this might be due to some inaccuracies in the reference data, so I regenerated the reference trajectories after updating the root_height_offset in the retargeting script. However, changing that does not seem to have any effect on the behavior.

Do you have any clue as to why this might be happening? Is there a hard-coded reference height somewhere in the codebase?

Edit:
I played around with env params such as rootHeightObs and refined the data to ensure that the reference has flat feet - however, I still see that training results in the robot learning to walk on its toes.

@tesslerc
Collaborator

tesslerc commented Sep 1, 2024

I don't recall any offsets for the reference motions. For example, when training on the sword and shield dataset the character does not learn to tip-toe.

@VineetTambe
Author

Yes, maybe it's an issue with the way I have configured my xml.
Could you point me to where the actual "discriminator loss" is calculated? A workaround I would like to try is removing the motion-matching loss with respect to the ankle joints and the feet.
I hope doing so might mitigate the peculiar "tip-toe" behaviour.

@VineetTambe
Author

An update on this - it was due to the way I had configured my xml; fixing that helped me improve the results.

@tesslerc
Collaborator

tesslerc commented Oct 9, 2024

Would be cool to see some results once you can share! :)

@VineetTambe
Author

VineetTambe commented Oct 9, 2024

I had some questions regarding the way the observation space is structured:
  - self._num_amp_obs_per_step in humanoid_amp.py - does it represent the input space for the motion encoder?
  - self._num_obs in humanoid.py - does it represent the input to the motion decoder (which is the actual LLC policy)?

Questions:

  1. I found the Isaac Gym documentation about acquire_rigid_body_state_tensor to be lacking on the documentation page - is there any other documentation available that I might have missed? Could you direct me to it?

  2. If I understand it correctly, acquire_rigid_body_state_tensor is supposed to give you pos + rot + lin vel + ang vel -> which is 3+4+3+3 (with the rot being a quaternion), BUT the observation space described in this comment says the rotation is a 6-dim vector - could you explain how the rotation is decomposed?
    In addition, could you describe the ang vel, and whether it is in axis-angle or Euler form?

  3. In addition, these vectors seem to be in the global frame, which is not good if I want to use this policy somewhere I don't have access to a global frame. Could you confirm whether the observations used in CALM are in the local frame or the global frame?

  4. If they are in the global frame, I would like to modify the observations of the actor policy such that all the observations are in the local frame or frame-invariant - where do I need to make these modifications?
    a. calm_network_builder.py -> I assume the obs change would be reflected here if I modify self._num_obs in humanoid.py

Where do I change the actual observation being passed?
build_amp_observations or compute_humanoid_observations_max?

  5. Could you summarize the final network inputs/outputs of the 3 networks being used here?
    a. motion critic -> takes in observations from which function? What are the input/output shapes?
    b. actor -> the main policy -> takes in observations from which function? What are the input/output shapes?
    c. HLC -> takes in observations from which function? What are the input/output shapes?

Edit (Oct 9, 2024):
Ahh, I think I confused myself by overthinking the complexity of the problem. To answer my own questions, for anybody looking at the same problem in the future:

All of the observations are computed in the _compute_observation() function in humanoid.py.

  1. I still don't know

  2. Yes, the doc is poor - I have no fix for that

  3. The acquire_rigid_body_state_tensor API does give a 3+4+3+3 tensor, as described above, for all the bodies defined in the xml.
    For CALM, the rotation is first converted to a delta rotation by taking a quaternion difference against the heading rotation, and is then converted into an axis-angle form.

For the rotation:
I think what's happening is that the n×4 quaternion is converted into an n×3 tangent and an n×3 normal, which are stacked together to make an n×6 vector.

The question I have here is: why use tangent/normal? Why not convert it to Euler angles?
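
For reference, a self-contained sketch of that 6d (tangent/normal) representation (my re-derivation, assuming Isaac Gym's xyzw quaternion convention; the repo's own helper may differ in details):

    import torch

    def quat_rotate(q, v):
        # Rotate vectors v by unit quaternions q = (x, y, z, w).
        q_vec, q_w = q[..., :3], q[..., 3:4]
        return (v * (2.0 * q_w ** 2 - 1.0)
                + torch.cross(q_vec, v, dim=-1) * 2.0 * q_w
                + q_vec * (q_vec * v).sum(dim=-1, keepdim=True) * 2.0)

    def quat_to_tan_norm(q):
        # 6d rotation: where the local x (tangent) and z (normal) axes end up after rotating.
        tan = quat_rotate(q, torch.tensor([1.0, 0.0, 0.0]).expand_as(q[..., :3]))
        norm = quat_rotate(q, torch.tensor([0.0, 0.0, 1.0]).expand_as(q[..., :3]))
        return torch.cat([tan, norm], dim=-1)   # shape (..., 6)

One presumable reason for preferring this over Euler angles is that the 6d form is continuous and avoids gimbal-lock-style discontinuities, which tends to make learning easier.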

I am still confused by how body_ang_vel is handled, as I thought it is a 1x3 vector, but I see some quaternion operations being applied to it.

  4. Yes, the acquire_rigid_body_state_tensor API gives global data, but it is converted to local in _compute_observation() as described above.

  5. It is not in the global frame when passed to the model (aside from the root_height, which can be set to 0.0 or removed entirely).

  6. Still confused about question 5.

@tesslerc
Collaborator

tesslerc commented Oct 9, 2024

  1. See here (not the best location in hindsight...): https://github.com/NVlabs/CALM/blob/main/calm/data/cfg/humanoid_calm_sword_shield_getup.yaml - numAMPEncObsSteps: 60 means 60 frames are used as input to the encoder. self._num_obs is the number of features representing the pose of the humanoid and is provided to the policy (LLC and HLC).
  2. Sorry, I can't be of much help here. The official docs are the best I have.
  3. I think this type of representation has been shown to be very efficient for learning. It's very common in a wide range of prior works on physics-based animation (AMP, ASE, PHC, PULSE, and more).
  4. Yep. We always convert everything to the "ego view": translate relative to the pelvis and rotate relative to the pelvis.
  5. Yes, you're right, everything is in the local frame, unless you specifically configure it to provide the root position in the global frame, but that's not recommended.

critic:
    next_vals = self._eval_critic(self.obs, self._calm_latents)
    observes the pose and the latent

encoder:
    enc_amp_obs = self._preproc_amp_obs(input_dict['enc_amp_obs'])
    observes 60 poses

policy: like the critic -- current pose + latent

discriminator: we have two. One sees a sequence of 10 frames; the other sees 10 frames + the latent.
