Absolute State-wise Constrained Policy Optimization

Environment Installation

Install the environment:

conda create --name venv python=3.8
conda activate venv
pip install -r requirements.txt

Then, install rl_envs by:

cd rl_envs
pip install -e .
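
As a quick sanity check (assuming the package is importable under the name rl_envs; adjust if the package name in setup.py differs), confirm that it imports cleanly:

python -c "import rl_envs"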

After building the environment, please put any algorithm that you want to run into the safe_rl_lib folder of the GUARD repository.
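
For example, a minimal sketch of this step, assuming you clone GUARD next to this repository and want to run the ascpo algorithm provided here:

# clone GUARD and copy the ASCPO algorithm folder into its safe_rl_lib directory
git clone https://github.com/intelligent-control-lab/guard.git
cp -r ascpo guard/safe_rl_lib/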


Task Configuration

Please refer to https://github.com/intelligent-control-lab/guard for all tasks supported by this benchmark and for how to run them. Due to the algorithmic nature of ASCPO, please append the suffix '_noconti' to every task name to turn off continuous mode. Your final task name format should be: <Task>_<Robot>_<Num><Constraint_Type>_noconti
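
For example, a hypothetical task name following this format (assuming the Goal task, Point robot, and 8 Hazard constraints listed in GUARD) would be:

Goal_Point_8Hazards_noconti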

Parameters Tuning

Two critical hyperparameters need to be chosen carefully: omega1 and omega2. omega1 refers to $\|\mu^\top\|_\infty$ and omega2 refers to $K_{max}$. For further explanation, please check our original paper.

We tune these two parameters by grid search over the domain $\{0.001, 0.003, 0.005, 0.007, 0.01\} \times \{0.001, 0.003, 0.005, 0.007, 0.01\}$.
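
A minimal sketch of such a sweep is shown below; the --omega1 and --omega2 flag names are assumptions (check the argument parser in ascpo.py for the actual names):

# hypothetical grid search over omega1 and omega2 (flag names assumed)
for w1 in 0.001 0.003 0.005 0.007 0.01; do
  for w2 in 0.001 0.003 0.005 0.007 0.01; do
    python ascpo.py --task <Task>_<Robot>_<Num><Constraint_Type>_noconti --seed 0 --model_save --omega1 $w1 --omega2 $w2 --exp_name grid-w1-$w1-w2-$w2
  done
done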


Policy Training

Here is an example of how to run task-specific agent training with ASCPO using the default settings:

cd ascpo
python ascpo.py --task <Task>_<Robot>_<Num><Constraint_Type>_noconti --seed {seed} --model_save

If you want to use the downsampling technique, add the word 'sub' to the experiment name:

cd ascpo
python ascpo.py --task <Task>_<Robot>_<Num><Constraint_Type>_noconti --seed {seed} --model_save --exp_name <Name>-sub

<Name> is a name of your own choosing.

If you want to use the monotonic-descent technique, add the word 'delta' to the experiment name:

cd ascpo
python ascpo.py --task <Task>_<Robot>_<Num><Constraint_Type>_noconti --seed {seed} --model_save --exp_name <Name>-delta

You can use these two techniques at the same time.
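
For example, assuming the two suffixes can simply be combined in a single experiment name (an assumption; check how the experiment name is parsed in the code):

cd ascpo
python ascpo.py --task <Task>_<Robot>_<Num><Constraint_Type>_noconti --seed {seed} --model_save --exp_name <Name>-sub-delta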

For more hyperparameters, please check the code.

Visualization

To plot training statistics (e.g., reward, cost, and cost-rate performance), copy all desired log folders to comparison/ and then run the plot script as follows:

cd rl_lib
mkdir comparison
cp -r <algo>/logs/<exp name> comparison/
python utils/plot.py comparison/ --title <title name> --reward --cost

<title name> can be anything that describes the current comparison (e.g., "all end-to-end methods").

To test a trained RL agent on a task and save the video:

python ascpo_video.py --model_path logs/<exp name>/<exp name>_s<seed>/pyt_save/model.pt --task <env name> --video_name <video name> --max_epoch <max epoch>
