v1.3 fine tuning duration too short #516
Comments
I think that's normal; it usually takes about 4 s per step for 93x352x640, and your videos are shorter.
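A quick sanity check of the timing (a minimal sketch, not from the thread; it assumes the reported 4-minute epoch covers exactly 93 optimizer steps, and that 33-frame clips run faster than the ~4 s quoted for 93-frame clips):

```python
# Implied per-step time from the reported epoch duration.
# Assumption: the 4-minute figure covers exactly 93 training steps (93 batches/epoch).
epoch_seconds = 4 * 60
steps_per_epoch = 93
per_step = epoch_seconds / steps_per_epoch
print(f"{per_step:.2f} s/step")  # roughly 2.6 s/step, plausible for 33-frame clips
```

At ~2.6 s/step for clips roughly a third the length of the 93-frame case, the 4-minute epoch is consistent with the ~4 s/step figure rather than suspiciously fast.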
@LinB203 thx for the reply! In my case, my question is: CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes=1 --master_port 29514 \
-m opensora.sample.sample \
--model_path path_to_check_point_model_ema \
--version v1_3 \
--num_frames 33 \
--height 352 \
--width 640 \
--cache_dir "../cache_dir" \
--text_encoder_name_1 "/storage/ongoing/new/Open-Sora-Plan/cache_dir/mt5-xxl" \
--text_prompt "examples/prompt.txt" \
--ae WFVAEModel_D8_4x8x8 \
--ae_path "/storage/lcm/WF-VAE/results/latent8" \
--save_img_path "./train_1_3_nomotion_fps18" \
--fps 16 \
--guidance_scale 7.5 \
--num_sampling_steps 100 \
--max_sequence_length 512 \
--sample_method EulerAncestralDiscrete \
--seed 1234 \
--num_samples_per_prompt 1 \
--rescale_betas_zero_snr \
--prediction_type "v_prediction"
You can see:
@LinB203 thx for the reply!
So you're saying one step in the training phase (the progress bar in the image I attached above) is basically one denoising step, just like a step in the sampling phase? UPDATE: oh, I think I got it. You're right. Also, according to the code (I think), there is one training step per batch, which makes 93 steps in total per epoch (93 batches in an epoch). One step here is a denoising step for a given timestep. So one step in the training phase is basically the same as a step in the sampling phase.
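The batches-per-epoch reasoning above can be sketched as follows (a hypothetical helper; the figure of 186 clips is inferred from 93 steps × 2 GPUs × batch size 1 and is not stated in the thread):

```python
import math

def steps_per_epoch(num_clips: int, per_gpu_batch_size: int, num_gpus: int) -> int:
    # One training (optimizer) step per global batch;
    # global batch size = per_gpu_batch_size * num_gpus when the
    # dataloader is sharded across GPUs.
    global_batch = per_gpu_batch_size * num_gpus
    return math.ceil(num_clips / global_batch)

print(steps_per_epoch(186, 1, 2))  # → 93
```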
Hi,
I'm fine-tuning v1.3 any93x640x640 (https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.3.0/tree/main/any93x640x640) with the videos of 352x640 (height, width), fps 16.
I see that 1 epoch (93 steps) takes only around 4 minutes. Is this expected? It seems like too short an amount of time.
I'm using 2 A100 GPUs, with a batch size of 1 per GPU.
Below is part of the JSON that contains the video data.
Below is part of the terminal output during training.
Below are the arguments I used: