
How to Use Arg Files


Intro

Arg files set up and describe the type of simulation and/or learning method you want to run. Each active line sets a single option in the form -key= value, and lines beginning with // are comments. The annotated example below walks through a typical setup for training and evaluating a dog character with CACLA.
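To make the -key= value format concrete, the following is a minimal, illustrative parser for these files. It is only a sketch based on the format used in the example below, not the project's actual arg loader, and parse_arg_file is a made-up helper name.

def parse_arg_file(path):
    """Parse an arg file into a dict, skipping blank lines and // comments."""
    args = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("//"):
                continue  # comment or blank line
            if line.startswith("-") and "=" in line:
                key, _, value = line.partition("=")
                args[key.lstrip("-").strip()] = value.strip()
    return args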

// Start with the type of scenario to run
// train_<learning method> to train a character using a particular learning method
// track_motion to have the character track a particular motion
// poli_eval to evaluate a learned policy after training
// -scenario= train_cacla
// -scenario= track_motion
-scenario= poli_eval
// Where to output intermediate policies while training
-output_path= output/dog_cacla_model.h5

// The character file to use. Helps construct the rigid body links for the character
-character_file= data/characters/dog.txt
// The state file to use. This describes the initial state parameters for the controller
// It helps to ensure beforehand that these states correspond to stable gaits.
-state_file= data/states/dog_bound_state3.txt

// The number of parallel worker threads to use
-num_threads= 4

// The particular type of character class that should be used to simulate
-char_type= dog
// The particular controller to be used to control the above character
-char_ctrl= dog_cacla
// The specific terrain file that will be used to determine the type of generated terrain
-terrain_file= data/terrain/mixed.txt

// The number of PD update steps between frames
-num_update_steps= 20
// The number of physics simulation substeps between PD steps
-num_sim_substeps= 5
// A scaling factor that is supposed to help increase numerical stability in Bullet
-world_scale= 4

// These are input files to the learning method for the actor
-policy_solver= data/policies/dog/nets/dog_actor_solver.prototxt
-policy_net= data/policies/dog/nets/dog_actor_deploy.prototxt
// This needs to be supplied for policy evaluation
-policy_model= output/dog_cacla_model.h5

// These are input files to the learning method for the critic
-critic_solver= data/policies/dog/nets/dog_critic_solver.prototxt
-critic_net= data/policies/dog/nets/dog_critic_deploy.prototxt
// This needs to be supplied for policy evaluation
-critic_model= output/dog_cacla_model_critic.h5

// The number of training iterations over which to anneal the epsilon-greedy exploration parameters.
-trainer_num_anneal_iters= 50000
// The number of training iterations over which to anneal the probability of selecting actions from the initial action set.
-exp_base_anneal_iters= 50000

// Final exploration rates
-exp_rate= 0.2
-exp_temp= 0.1
// The probability of selecting actions from the initial action set
-exp_base_rate= 0.01
// Initial exploration rates
-init_exp_rate= 1
-init_exp_temp= 1
-init_exp_base_rate= 1

// Max number of training iterations to run
-trainer_max_iter= 1000000000
// The number of iterations to keep the target network frozen between updates
-trainer_freeze_target_iters= 0
// The number of training iterations between intermediate model saves
-trainer_int_iter= 2000
// Where to save intermediate models of the learned policy
-trainer_int_output= output/intermediate/trainer_int_model.h5
// The number of tuples that are used in batched updates
-tuple_buffer_size= 32
-trainer_num_steps_per_iters= 1
// The number of training iterations between saves of the current policy
-trainer_iters_per_output= 200
// Whether or not to calculate the initial policy input scaling
-trainer_init_input_offset_scale= true

-trainer_enable_async_mode= false
// The number of training samples to collect before initializing the policy input scaling and starting learning iterations
-trainer_num_init_samples= 50000
// Full size of replay memory that experience tuples are stored in.
-trainer_replay_mem_size= 500000

// Example alternative settings with asynchronous mode enabled
//-trainer_enable_async_mode= true
//-trainer_num_init_samples= 6250
//-trainer_replay_mem_size= 62500
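
As a usage sketch of the illustrative parser above, the parsed values can be sanity-checked before running the poli_eval scenario, since -policy_model and -critic_model need to be supplied for policy evaluation. The arg file path here is hypothetical.

args = parse_arg_file("args/poli_eval_dog_cacla.txt")  # hypothetical path
if args.get("scenario") == "poli_eval":
    for required in ("policy_model", "critic_model"):
        assert required in args, "poli_eval requires -%s=" % required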