dm_control cheetah run training stops suddenly #151

Open
letusfly85 opened this issue Sep 14, 2020 · 3 comments

@letusfly85

Hi, I'm trying to run the dm_control walker walk, walker run, and cheetah run tasks.

Walker walk and walker run both work fine, but cheetah run fails during training, as shown below.

Failure message

Number of errored trials: 1
+--------------------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name               |   # failures | error file                                                                                                                                                                                                                                              |
|--------------------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| id=43fbb_00000-seed=8373 |            4 | /home/acc12468eh/ray_results/dm_control/cheetah/run/2020-09-14T19-51-36-sl-sac/id=43fbb_00000-seed=8373_0_hidden_layer_sizes=(256, 256),preprocessors=({'pixels': {'class_name': 'convnet_preprocessor', 'config'_2020-09-14_19-51-38hsvhe5yt/error.txt |
+--------------------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Traceback (most recent call last):
  File "/home/acc12468eh/miniconda3/envs/softlearning/bin/softlearning", line 11, in <module>
    load_entry_point('softlearning', 'console_scripts', 'softlearning')()
  File "/home/acc12468eh/softlearning/softlearning/scripts/console_scripts.py", line 207, in main
    return cli()
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/acc12468eh/softlearning/softlearning/scripts/console_scripts.py", line 73, in run_example_local_cmd
    return run_example_local(example_module_name, example_argv)
  File "/home/acc12468eh/softlearning/examples/instrument.py", line 244, in run_example_local
    reuse_actors=True)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/tune.py", line 356, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [id=43fbb_00000-seed=8373])

I then tried to cat the error.txt, and this is what I got:

(base) [acc12468eh@es2 ~]$ cat /home/acc12468eh/ray_results/dm_control/cheetah/run/2020-09-14T19-51-36-sl-sac/id=43fbb_00000-seed=8373_0_hidden_layer_sizes=(256, 256),preprocessors=({'pixels': {'class_name': 'convnet_preprocessor', 'config'_2020-09-14_19-51-38hsvhe5yt/error.txt

Content of error.txt

-bash: unexpected token `('

Thank you.

@hartikainen
Member

I think what you're actually seeing is not the contents of error.txt but rather an error from bash. Can you wrap the cat argument in quotes? I.e.:

cat "/home/acc12468eh/ray_results/dm_control/cheetah/run/2020-09-14T19-51-36-sl-sac/id=43fbb_00000-seed=8373_0_hidden_layer_sizes=(256, 256),preprocessors=({'pixels': {'class_name': 'convnet_preprocessor', 'config'_2020-09-14_19-51-38hsvhe5yt/error.txt"
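As a minimal sketch of the quoting advice above: Python's standard `shlex.quote` can build a shell-safe version of a path like this one, which avoids escaping by hand. The path below is a hypothetical, shortened stand-in for the one in the issue (an assumption for illustration only); it keeps the parentheses, spaces, and single quotes that bash would otherwise try to interpret.

```python
import shlex

# Hypothetical path, shortened from the one in the issue. It contains
# parentheses, spaces, braces, and single quotes, all special to bash.
path = (
    "/tmp/ray_results/id=0_hidden_layer_sizes=(256, 256),"
    "preprocessors=({'pixels': {'class_name'/error.txt"
)

# shlex.quote returns one shell-safe token for the whole path.
quoted = shlex.quote(path)
command = "cat " + quoted
print(command)

# Splitting the command with POSIX shell rules gives back the original path,
# which shows the quoting did not alter it.
assert shlex.split(command) == ["cat", path]
```

The printed command can be pasted into bash as-is. Reading the file directly from Python (for example with `open(path)`) would sidestep shell quoting entirely.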

@letusfly85
Author

@hartikainen

Oh, sorry. Here is the actual content of error.txt.

(base) [acc12468eh@es2 ~]$ cat /home/acc12468eh/ray_results/dm_control/cheetah/run/2020-09-14T19-51-36-sl-sac/id\=43fbb_00000-seed\=8373_0_hidden_layer_sizes\=\(256\,\ 256\)\,preprocessors\=\(\{\'pixels\'\:\ \{\'class_name\'\:\ \'convnet_preprocessor\'\,\ \'config\'_2020-09-14_19-51-38hsvhe5yt/error.txt
Failure # 1 (occurred at 2020-09-14_19-51-58)
Traceback (most recent call last):
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/worker.py", line 1540, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

Failure # 2 (occurred at 2020-09-14_19-52-06)
Traceback (most recent call last):
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/worker.py", line 1540, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

Failure # 3 (occurred at 2020-09-14_19-52-14)
Traceback (most recent call last):
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/worker.py", line 1540, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

Failure # 4 (occurred at 2020-09-14_19-52-23)
Traceback (most recent call last):
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/acc12468eh/miniconda3/envs/softlearning/lib/python3.7/site-packages/ray/worker.py", line 1540, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
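The four failures above are identical apart from their timestamps. The sketch below shows one way to condense such a file into one line per failure. It assumes the `Failure # N (occurred at ...)` header format seen in this error.txt; `summarize_failures` is a hypothetical helper, not part of ray or softlearning, and the sample text is abbreviated from the log above.

```python
import re

# Header format as it appears in the error.txt above.
FAILURE_HEADER = re.compile(r"^Failure # (\d+) \(occurred at ([^)]+)\)$")


def summarize_failures(text):
    """Return (failure number, timestamp, last traceback line) tuples."""
    summaries = []
    current = None  # [number, timestamp, last non-empty line seen so far]
    for line in text.splitlines():
        stripped = line.strip()
        match = FAILURE_HEADER.match(stripped)
        if match:
            # A new failure starts; store the previous one, if any.
            if current is not None:
                summaries.append(tuple(current))
            current = [int(match.group(1)), match.group(2), ""]
        elif current is not None and stripped:
            # The last non-empty line of a traceback is the exception message.
            current[2] = stripped
    if current is not None:
        summaries.append(tuple(current))
    return summaries


# Sample abbreviated from the error.txt content in this issue.
sample = """Failure # 1 (occurred at 2020-09-14_19-51-58)
Traceback (most recent call last):
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

Failure # 2 (occurred at 2020-09-14_19-52-06)
Traceback (most recent call last):
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
"""

for number, timestamp, last_line in summarize_failures(sample):
    print(number, timestamp, last_line)
```

A summary like this only confirms that every retry ended with the same `RayActorError`; it does not show why the actor died, since the traceback in error.txt stops at the point where the driver fetches the result.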

@letusfly85 changed the title from "Unexpected token error only dm_control cheetah run" to "dm_control cheetah run training stops suddenly" on Sep 19, 2020
@h8907283

Yes, walker run is okay, but not cheetah run.
