Immediate stop at training progress 0% #24

Open
willyawan16 opened this issue Jul 13, 2024 · 11 comments

@willyawan16

Has anyone else run into this problem?
I am trying to train on the provided "Tanks/Francis" dataset, but training stops immediately.

(cf3dgs) D:\Research\CF3DGS\CF-3DGS>python run_cf3dgs.py -s data/Tanks/Francis --mode train
Downloading: "https://github.com/intel-isl/MiDaS/zipball/master" to C:\Users\PC21/.cache\torch\hub\master.zip
Rotation type : 6d
Reading camera 150/150
Loading Training Cameras
Loading Test Cameras
Number of points at initialisation :  19361
Train images:  131
['000401', '000403', '000405', '000407', '000411', '000413', '000415', '000417', '000419', '000421', '000423', '000427', '000429', '000431', '000433', '000435', '000437', '000439', '000443', '000445', '000447', '000449', '000451', '000453', '000455', '000459', '000461', '000463', '000465', '000467', '000469', '000471', '000475', '000477', '000479', '000481', '000483', '000485', '000487', '000491', '000493', '000495', '000497', '000499', '000501', '000503', '000507', '000509', '000511', '000513', '000515', '000517', '000519', '000523', '000525', '000527', '000529', '000531', '000533', '000535', '000539', '000541', '000543', '000545', '000547', '000549', '000551', '000555', '000557', '000559', '000561', '000563', '000565', '000567', '000571', '000573', '000575', '000577', '000579', '000581', '000583', '000587', '000589', '000591', '000593', '000595', '000597', '000599', '000603', '000605', '000607', '000609', '000611', '000613', '000615', '000619', '000621', '000623', '000625', '000627', '000629', '000631', '000635', '000637', '000639', '000641', '000643', '000645', '000647', '000651', '000653', '000655', '000657', '000659', '000661', '000663', '000667', '000669', '000671', '000673', '000675', '000677', '000679', '000683', '000685', '000687', '000689', '000691', '000693', '000695', '000699']
Using cache found in C:\Users\PC21/.cache\torch\hub\intel-isl_MiDaS_master
Using cache found in C:\Users\PC21/.cache\torch\hub\intel-isl_MiDaS_master
D:\Research\CF3DGS\CF-3DGS\trainer\trainer.py:493: DeprecationWarning: Since kornia 0.7.0 the `depth_to_3d` is deprecated in favor of `depth_to_3d_v2`. This function will be replaced with the `depth_to_3d_v2` behaviour, where the that does not require the creation of a meshgrid. The return shape can be not backward compatible between these implementations.
  pts = depth_to_3d(depth_tensor[None, None],
Number of points at initialisation :  272338
optimizing frame 000
Training progress:   0%|                                                                      | 0/1000 [00:00<?, ?it/s]

@willyawan16
Author

willyawan16 commented Jul 13, 2024

After tracing the code, I found that it fails when retrieving the tensor from self.P, which is a list[LieGroupParameter].
Every time the code accesses a LieGroupParameter, the process exits abruptly and no output is shown.
Is there a solution for this?
[Images 1, 2, 3]
Inside the render function:
[Images 4, 5]

@Wang-Chbo

Have you fixed it? I get the same bug.

@OasisYang
Collaborator

Does this problem occur only with Tanks/Francis? Can you print self.P or check self.seq_len?

@willyawan16
Author

It occurs with all datasets (including the provided Tanks and CO3D ones).
Printing self.seq_len is not a problem.
The problem lies in self.P: when I try to print self.P, the program stops immediately.
Here I print them in the function init_RT_seq(), which is called from the init_two_view() function.

[Screenshot: Snipaste_2024-07-15_13-53-11]
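
When a process stops without any Python traceback, as described here, the standard-library `faulthandler` module can sometimes show where the interpreter was at the moment of a fatal native error. The following is a minimal sketch, not part of CF-3DGS; the helper names `enable_crash_diagnostics` and `checkpoint` are hypothetical, and where exactly to place the calls in the trainer is an assumption.

```python
import faulthandler
import sys


def enable_crash_diagnostics(stream=None):
    """Dump Python tracebacks to `stream` if the interpreter dies on a fatal native error."""
    faulthandler.enable(file=stream or sys.stderr, all_threads=True)
    return faulthandler.is_enabled()


def checkpoint(message):
    """Print a marker and flush at once, so it is not lost in a buffer on abrupt exit."""
    print(message, flush=True)
    return message


if __name__ == "__main__":
    enable_crash_diagnostics()
    checkpoint("before touching self.P")
    # Hypothetical placement: the suspect access, e.g. print(self.P), would go here.
    checkpoint("after touching self.P")
```

If the first marker appears but the second does not, the statement between them is the one that kills the process, and any traceback that `faulthandler` manages to write appears on stderr.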

@OasisYang
Collaborator

I guess this error comes from a failed installation of Lietorch. You can check its official repo to see whether you can run the simple examples provided there.
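
One way to test the installation without losing the whole session to a silent exit is to run a small snippet in a separate interpreter and look at its exit code. This is a rough sketch under assumptions: `run_isolated` and `describe` are hypothetical helpers, and the lietorch snippet mirrors the `SO3.exp(...)` usage from the lietorch README rather than anything in CF-3DGS.

```python
import subprocess
import sys

# Assumption: SO3.exp on a CUDA tensor, as in the lietorch README example.
LIETORCH_SNIPPET = """
import torch
from lietorch import SO3
phi = torch.randn(4, 3, device="cuda")
R = SO3.exp(phi)
print(R.log().shape)
"""


def run_isolated(code, timeout=120):
    """Run `code` in a fresh interpreter and return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe(returncode):
    """Give a coarse reading of an interpreter exit code."""
    if returncode == 0:
        return "ok"
    if returncode == 1:
        return "python exception (see stderr)"
    # Negative codes on POSIX mean a signal; on Windows, 3221225477
    # (0xC0000005) is an access violation.
    return f"abnormal exit, code {returncode}"


if __name__ == "__main__":
    rc, out, err = run_isolated(LIETORCH_SNIPPET)
    print(describe(rc))
    print(out or err)
```

An "abnormal exit" result with empty stderr would point at a crash inside native code (for example the compiled lietorch extension) rather than at a Python-level error.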

@willyawan16
Author

willyawan16 commented Jul 15, 2024

@OasisYang When I run the test examples provided by Lietorch, the same immediate stop happens at "Testing lietorch forward pass (GPU)". I guess it has something to do with a memory leak.
May I ask what computer specification you use to run this code? I would like to compare it with mine.
Also, which OS do you use?
Thank you.
My computer specs are as follows:
OS Win10
CPU i7-9700
GPU RTX 2080
RAM 32 GB

@iampalop

I get the same issue. Have you solved it yet?

@JHXion9

JHXion9 commented Aug 5, 2024

I reinstalled Lietorch and that solved the problem. The Eigen archive:
eigen-824272cde8ca2541e8b67b0887f5ded92b128d1f.zip
Besides, the run command I used is as follows:
python run_cf3dgs.py -s ./data/cat/ --mode train --data_type custom --depth_model_type depth_anything

@RendongZhang

I get the same error, even though I installed lietorch successfully. Have you fixed it?

@DuHao55

DuHao55 commented Sep 26, 2024

> I get the same ERROR. And I install lietorch successfully. Have you fixed it?

May I ask how you solved this? I successfully installed lietorch and passed the test, but it still reports this error.

@willyawan16
Author

In the end, I ran it in WSL instead.

The installation instructions say to install CUDA with this command:
conda install conda-forge::cudatoolkit-dev=11.7.0
But I replaced that step by manually installing CUDA 11.7 from this link instead:
https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux

Magically it works!
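
The thread does not establish why the manual CUDA install helped. One possible explanation, offered here only as an assumption, is that the toolkit used to compile the lietorch extension did not match the CUDA version PyTorch was built against. The sketch below compares the two; the helper names are hypothetical, `torch.version.cuda` is read only if PyTorch is importable, and the sample `nvcc --version` line in the usage note is illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the 'release X.Y' version from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def nvcc_release():
    """Return the CUDA release of the nvcc found on PATH, or None if there is none."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


def versions_match(toolkit, torch_cuda):
    """Compare major.minor of the two version strings; missing values never match."""
    if toolkit is None or torch_cuda is None:
        return False
    return toolkit.split(".")[:2] == torch_cuda.split(".")[:2]


if __name__ == "__main__":
    toolkit, built = nvcc_release(), torch_cuda_version()
    print(f"nvcc: {toolkit}, torch built with CUDA: {built}")
    print("match" if versions_match(toolkit, built) else "mismatch or unknown")
```

For a line such as `Cuda compilation tools, release 11.7, V11.7.64`, `parse_nvcc_release` returns `"11.7"`. A mismatch is only a hint worth checking before rebuilding lietorch, not a confirmed diagnosis of this issue.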
