Immediate stop at training progress 0% #24

willyawan16 · 2024-07-13T09:06:41Z

Has anyone else run into this problem?
I am trying to train on the provided "Tanks/Francis" dataset, but training stops immediately at 0% without printing any error.

(cf3dgs) D:\Research\CF3DGS\CF-3DGS>python run_cf3dgs.py -s data/Tanks/Francis --mode train
Downloading: "https://github.com/intel-isl/MiDaS/zipball/master" to C:\Users\PC21/.cache\torch\hub\master.zip
Rotation type : 6d
Reading camera 150/150
Loading Training Cameras
Loading Test Cameras
Number of points at initialisation :  19361
Train images:  131
['000401', '000403', '000405', '000407', '000411', '000413', '000415', '000417', '000419', '000421', '000423', '000427', '000429', '000431', '000433', '000435', '000437', '000439', '000443', '000445', '000447', '000449', '000451', '000453', '000455', '000459', '000461', '000463', '000465', '000467', '000469', '000471', '000475', '000477', '000479', '000481', '000483', '000485', '000487', '000491', '000493', '000495', '000497', '000499', '000501', '000503', '000507', '000509', '000511', '000513', '000515', '000517', '000519', '000523', '000525', '000527', '000529', '000531', '000533', '000535', '000539', '000541', '000543', '000545', '000547', '000549', '000551', '000555', '000557', '000559', '000561', '000563', '000565', '000567', '000571', '000573', '000575', '000577', '000579', '000581', '000583', '000587', '000589', '000591', '000593', '000595', '000597', '000599', '000603', '000605', '000607', '000609', '000611', '000613', '000615', '000619', '000621', '000623', '000625', '000627', '000629', '000631', '000635', '000637', '000639', '000641', '000643', '000645', '000647', '000651', '000653', '000655', '000657', '000659', '000661', '000663', '000667', '000669', '000671', '000673', '000675', '000677', '000679', '000683', '000685', '000687', '000689', '000691', '000693', '000695', '000699']
Using cache found in C:\Users\PC21/.cache\torch\hub\intel-isl_MiDaS_master
Using cache found in C:\Users\PC21/.cache\torch\hub\intel-isl_MiDaS_master
D:\Research\CF3DGS\CF-3DGS\trainer\trainer.py:493: DeprecationWarning: Since kornia 0.7.0 the `depth_to_3d` is deprecated in favor of `depth_to_3d_v2`. This function will be replaced with the `depth_to_3d_v2` behaviour, where the that does not require the creation of a meshgrid. The return shape can be not backward compatible between these implementations.
  pts = depth_to_3d(depth_tensor[None, None],
Number of points at initialisation :  272338
optimizing frame 000
Training progress:   0%|                                                                      | 0/1000 [00:00<?, ?it/s]

willyawan16 · 2024-07-13T14:45:07Z

After tracing the code, I found that it fails when retrieving the Tensor from self.P, which is a list[LieGroupParameter].
Every time the code accesses a LieGroupParameter, the process exits abruptly and no output is shown.
Is there any solution to this?

This happens inside the render function.
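
A process that disappears without a Python traceback usually points at a crash in native code rather than a Python exception. A minimal sketch for confirming that, assuming only the standard library: run the smallest failing statement in a child interpreter and inspect its exit code. The helper names `run_isolated` and `describe_exit` are hypothetical and are not part of CF-3DGS or lietorch.

```python
import os
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter so a native crash cannot kill this one."""
    # -X faulthandler asks the child to dump a Python traceback to stderr on a fatal error.
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


def describe_exit(returncode: int, windows: bool = os.name == "nt") -> str:
    """Turn a child exit code into a rough, human-readable diagnosis."""
    if returncode == 0:
        return "clean exit"
    if windows:
        # Windows reports crashes as NTSTATUS values; error-severity codes start at 0xC0000000.
        code = returncode & 0xFFFFFFFF
        if code >= 0xC0000000:
            return f"native crash (NTSTATUS 0x{code:08X})"
        return f"ordinary non-zero exit (code {code})"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by signal {-returncode}"
    return f"ordinary non-zero exit (code {returncode})"


if __name__ == "__main__":
    # Placeholder: replace with the smallest statement that triggers the stop.
    snippet = "print('replace me with the failing statement')"
    result = run_isolated(snippet)
    print(describe_exit(result.returncode))
    print(result.stderr)
```

If the result is a native crash or a signal, the stderr dump from faulthandler should show which Python line was executing when the extension failed.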

Wang-Chbo · 2024-07-15T02:28:34Z

Have you fixed it? I get the same bug.

OasisYang · 2024-07-15T05:21:07Z

Does this problem occur only with Tanks/Francis? Can you print self.P or check self.seq_len?

willyawan16 · 2024-07-15T05:59:18Z

It occurs with all datasets (including the provided Tanks and CO3D ones).
Printing self.seq_len is not a problem.
The problem lies in self.P: when I try to print self.P, the program stops immediately.
I print them in the function init_RT_seq(), which is called from the init_two_view() function.

OasisYang · 2024-07-15T06:12:59Z

I guess this error comes from the failed installation of Lietorch. You can check its official repo to see if you can run the provided simple examples.

willyawan16 · 2024-07-15T09:12:47Z

@OasisYang When I run the test examples provided by Lietorch, the immediate stop also happens at "Testing lietorch forward pass (GPU)". I guess it has something to do with a memory leak.
May I ask what computer specification you use to run this code? I would like to compare our specs.
Also, which OS do you use?
Thank you.
My computer spec is as follows:
OS Win10
CPU i7-9700
GPU RTX 2080
RAM 32 GB
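
For comparing setups across machines, a small report like the following can be pasted into a reply. This is a sketch, not part of the CF-3DGS repository: the function name `environment_report` is hypothetical, and the PyTorch fields are filled in only if `torch` imports cleanly.

```python
import platform


def environment_report() -> dict:
    """Collect OS, Python, and (if available) PyTorch/CUDA details for a bug report."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    try:
        import torch  # optional: the report is still useful without it
    except (ImportError, OSError):
        report["torch"] = None
        return report
    report["torch"] = str(torch.__version__)
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```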

iampalop · 2024-07-31T05:25:33Z

I get the same issue. Have you solved it yet?

JHXion9 · 2024-08-05T09:33:09Z

I reinstalled Lietorch and that solved the problem. The Eigen archive I used was eigen-824272cde8ca2541e8b67b0887f5ded92b128d1f.zip.
Besides, the run command I used is as follows:
python run_cf3dgs.py -s ./data/cat/ --mode train --data_type custom --depth_model_type depth_anything

RendongZhang · 2024-09-19T19:59:01Z

I get the same error, even though I installed lietorch successfully. Have you fixed it?

DuHao55 · 2024-09-26T03:49:50Z

> I get the same error, even though I installed lietorch successfully. Have you fixed it?

May I ask how you solved this? I successfully installed lietorch and passed the test, but it still reports this error.

willyawan16 · 2024-10-15T09:17:12Z

In the end I ran it in WSL instead.

The installation instructions say to install CUDA with this command:
conda install conda-forge::cudatoolkit-dev=11.7.0
But I replaced that step by manually installing CUDA 11.7 from this link instead:
https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux

Magically it works!
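
The thread does not establish why the manual CUDA install helped. One thing worth checking after such a change is whether the toolkit on PATH matches the CUDA version PyTorch was built with, since CUDA extensions are compiled with the local `nvcc`. A sketch under those assumptions; the helper names are hypothetical:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output: str):
    """Extract (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_release():
    """Return the CUDA toolkit release of the nvcc found on PATH, if any."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def torch_cuda_release():
    """Return the CUDA release PyTorch was built against, if PyTorch is importable."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    if torch.version.cuda is None:  # CPU-only build
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    print("nvcc on PATH :", toolkit_release())
    print("torch build  :", torch_cuda_release())
```

If the two releases differ, rebuilding the extension in an environment where they agree is a reasonable next step.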
