This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

numpy.linalg.LinAlgError: SVD did not converge #896

Closed
Papageno2 opened this issue Jan 9, 2021 · 23 comments · Fixed by #902

Comments

@Papageno2

Thanks for this repo.

I want to use the TensorNetwork package to simulate the time evolution of the transverse-field Ising model with the iTEBD algorithm, but the SVD throws the error shown below. Can anyone offer an answer or any hints?

  • Does split_node_full_svd, used in apply_two_site_gate(), have a parameter like max_iter or something similar?
  • Is there demo code for using tn.InfiniteMPS or tn.FiniteMPS with the TEBD algorithm?

thanks~

$ python evolution.py 
1727.9739590611944
6.306919862739928e-15
0.014427307657060863
7.142939906200455e-15
0.0015813901699260678
7.156663235322543e-15
../anaconda3/envs/tf2/lib/python3.7/site-packages/tensornetwork/backends/numpy/numpy_backend.py:90: RuntimeWarning: invalid value encountered in sqrt
  return np.sqrt(tensor)

Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.

Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.
Traceback (most recent call last):
  File "evolution.py", line 83, in <module>
    itebd_ising(N=10)
  File "evolution.py", line 76, in itebd_ising
    imps_state.canonicalize()
  File "../anaconda3/envs/tf2/lib/python3.7/site-packages/tensornetwork/matrixproductstates/infinite_mps.py", line 276, in canonicalize
    relative=True)
  File "..anaconda3/envs/tf2/lib/python3.7/site-packages/tensornetwork/backends/numpy/numpy_backend.py", line 627, in svd
    relative=relative)
  File "../anaconda3/envs/tf2/lib/python3.7/site-packages/tensornetwork/backends/numpy/decompositions.py", line 36, in svd
    u, s, vh = np.linalg.svd(tensor, full_matrices=False)
  File "<__array_function__ internals>", line 6, in svd
  File "../anaconda3/envs/tf2/lib/python3.7/site-packages/numpy/linalg/linalg.py", line 1636, in svd
    u, s, vh = gufunc(a, signature=signature, extobj=extobj)
  File "../anaconda3/envs/tf2/lib/python3.7/site-packages/numpy/linalg/linalg.py", line 106, in _raise_linalgerror_svd_nonconvergence
    raise LinAlgError("SVD did not converge")
numpy.linalg.LinAlgError: SVD did not converge

The script is shown below:

import numpy as np
import tensornetwork as tn
# tn.set_default_backend('numpy')


def itebd_ising(N=1000):
    J = 1.0        # coupling strength
    g = 0.5        # transverse field
    chi = 50       # maximum bond dimension
    d = 2          # local (physical) dimension
    delta = 0.05   # imaginary-time step

    # two-site Hamiltonian
    H = np.array([[J, -g/2, -g/2, 0],
                  [-g/2, -J, 0, -g/2],
                  [-g/2, 0, -J, -g/2],
                  [0, -g/2, -g/2, J]])
    # H is real symmetric, so eigh returns real eigenvalues and
    # orthonormal eigenvectors (eig does not guarantee orthogonality)
    w, v = np.linalg.eigh(H)
    # imaginary-time gate exp(-delta*H), reshaped into a rank-4 two-site gate
    U = np.reshape(v @ np.diag(np.exp(-delta * w)) @ v.T, (2, 2, 2, 2))

    imps_state = tn.InfiniteMPS.random(d=[d, d], D=[chi, chi, chi],
                                       dtype=np.complex128)
    print(imps_state.check_canonical())
    imps_state.canonicalize()
    print(imps_state.check_canonical())
    truncation = []
    for step in range(N):
        # alternate between the gate and its site-swapped version
        if step % 2 == 0:
            gate = U
        else:
            gate = np.transpose(U, (1, 0, 3, 2))
        truncation_err = imps_state.apply_two_site_gate(
            gate, site1=0, site2=1, max_singular_values=chi)
        truncation.append(np.linalg.norm(truncation_err))
        imps_state.canonicalize()
        print(np.linalg.norm(truncation_err))
        print(imps_state.check_canonical())
    print("-" * 20)
    print(sum(truncation))


if __name__ == "__main__":
    itebd_ising(N=10)
@alewis
Contributor

alewis commented Jan 11, 2021 via email

@mganahl mganahl mentioned this issue Jan 12, 2021
@mganahl
Contributor

mganahl commented Jan 12, 2021

Thanks for raising the issue, @Papageno2. I submitted a fix for the error. I'm not sure whether the iTEBD actually converges to the right state (some quick tests indicate it always converges to a product state), so there could still be a bug somewhere. Let me know if there are any more issues.

@Papageno2
Author

Papageno2 commented Jan 12, 2021

I think there is a NaN or an Inf somewhere in imps_state

I think so too, but I don't know how to check for it or why it occurs. @alewis
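
A minimal sketch of one way to look for such values, assuming the MPS site tensors can be obtained as a list of NumPy-compatible arrays. The helper name `find_nonfinite` is hypothetical, and passing something like `imps_state.tensors` to it is an assumption that is not confirmed in this thread; the example below uses stand-in arrays instead.

```python
import numpy as np


def find_nonfinite(tensors):
    """Return the indices of tensors that contain a NaN or an Inf."""
    return [i for i, t in enumerate(tensors)
            if not np.all(np.isfinite(np.asarray(t)))]


# Stand-in tensors; the real MPS tensors would be passed here instead.
good = np.ones((2, 2, 2), dtype=np.complex128)
bad = good.copy()
bad[0, 0, 0] = np.nan

print(find_nonfinite([good, bad, good]))
```

Running such a check after every apply_two_site_gate() and canonicalize() call would show at which step the first bad value appears.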

@mganahl
Contributor

mganahl commented Jan 12, 2021

it was a precision issue

@sr33dhar

it was a precision issue

Hi @mganahl,
Sorry for the long absence again, but I believe this is related to issue #888, which I raised some time ago.
Has there been a resolution to this? Thanks again, Martin!

@mganahl
Contributor

mganahl commented Jan 14, 2021

I submitted a fix, hopefully we'll pull it in today!

@sr33dhar

Can you please elaborate on what the issue was?
Thanks!

@mganahl
Contributor

mganahl commented Jan 14, 2021

A precision problem. canonicalize() takes the square root of the eigenvalues of the left and right reduced density matrices. Some of those eigenvalues became very small (on the order of machine precision) and, because of finite-precision arithmetic, actually negative. np.sqrt of a negative float gives nan, which produced the issue.
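
The effect can be reproduced in isolation. The sketch below uses made-up eigenvalues and a hypothetical guard (clipping negatives to zero before the square root); it illustrates the mechanism only and is not necessarily what the submitted fix does.

```python
import numpy as np

# Eigenvalues of a reduced density matrix should be >= 0, but round-off
# can push a tiny one slightly below zero (values here are made up).
eigvals = np.array([0.7, 0.3, -1e-17])

with np.errstate(invalid="ignore"):   # silence the RuntimeWarning for the demo
    naive = np.sqrt(eigvals)          # the last entry becomes nan

# Hypothetical guard: clip round-off negatives to zero before the root.
safe = np.sqrt(np.clip(eigvals, 0.0, None))

print(naive)
print(safe)
```

Once a nan enters a tensor it propagates through every later contraction, which is consistent with the SVD failing further down the call chain.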

@mganahl
Contributor

mganahl commented Jan 14, 2021

@sr33dhar I can't really tell whether this will solve your problem, though. It's best to try it and let us know.

@sr33dhar

Hey @mganahl,

For smaller instances, this fixed the problem,
but now that I am trying out simulations of 20-qubit MPS states, the same SVD issue arises.

I am trying to create a parametrised state with two parameters.

When I do not limit the bond dimension in any way, I get SVD errors when using the apply_two_site_gate() function.

This error arises only when the bond dimension is high. It almost always breaks when the maximum bond dimension of 1024 (for 20 qubits) is used.

But repeating the same calculation with the same input parameters a few times seems to resolve the issue,
so the entire process looks a bit random.

I believe this attached file should reproduce the issue. (Attaching the .py code as a .txt file)
Can you please help me out here?

Test_code.txt

Thanks again, Martin!
Much appreciated

@mganahl
Contributor

mganahl commented Mar 24, 2021

Hi @sr33dhar I looked at the code, and there is indeed something going on in numpy. The issue I believe arises because the singular values of the matrices you are creating are highly degenerate. I think that if you were to enforce a canonical form on the MPS, this would fix your problem, but I haven't tried it. Instead, I submitted a PR (similar to your PR, which is still open (sorry!)) that defaults to using QR instead of SVD if no truncation is performed. This fixes the issue. Hopefully we'll have it merged soon.
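
To illustrate why QR is enough when nothing is truncated, here is a small NumPy-only sketch. It does not use the library's internals, and the rank-deficient test matrix is made up for the example: the matrix is split into an orthogonal factor and a remainder without ever computing singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-2 8x8 matrix: six of its eight singular values are numerically zero.
a = rng.standard_normal((8, 2))
m = a @ a.T

# Without truncation, an orthogonal factor times a remainder is all that
# is needed to split the matrix across a bond.
q, r = np.linalg.qr(m)

print(np.allclose(q @ r, m))             # the product reproduces m
print(np.allclose(q.T @ q, np.eye(8)))   # q has orthonormal columns
```

QR is a direct factorisation rather than an iterative one, so it does not have the convergence failure mode reported for the SVD here.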

@mganahl
Contributor

mganahl commented Mar 24, 2021

Apart from the above, I noticed that the code you sent uses inconsistent dtypes, i.e. you are setting some dtypes to complex64, while others are float64 or complex128. To avoid any surprises there, just set the dtypes of all arrays to be the same.
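
A minimal sketch of that advice, using a hypothetical helper `to_common_dtype` and stand-in arrays (neither comes from the attached code):

```python
import numpy as np


def to_common_dtype(arrays, dtype=np.complex128):
    """Cast every array to the same dtype (hypothetical helper)."""
    return [np.asarray(a).astype(dtype) for a in arrays]


# Stand-ins for a two-site gate and an MPS site tensor with mixed dtypes.
gate = np.eye(4, dtype=np.float64).reshape(2, 2, 2, 2)
site = np.ones((1, 2, 1), dtype=np.complex64)

gate_c, site_c = to_common_dtype([gate, site])
print(gate_c.dtype, site_c.dtype)
```

Casting everything up to complex128 keeps the precision of the widest input; casting down to complex64 would silently discard digits from the float64 and complex128 arrays.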

@sr33dhar

Thanks, @mganahl! :)

The code snippet I've provided is only part of what I'm trying to do. For the method I'm developing:

1.) I have to reduce the bond-dimension and make sure that all bond-dimensions do not cross a set threshold D_max

2.) After truncation, my states have to remain non-normalised.

Given these, are you suggesting using tn.FiniteMPS.canonicalize(normalize: bool = False) after each apply_two_site_gate() operation? I tried this just now, and it does not seem to work.

Another thing I tried is increasing the precision of everything to complex128, hoping that higher precision would resolve this, but that too is poorly implemented and didn't work.

And no worries about the previous QR suggestion. I'm sorry I did not follow up on that too! x|

@sr33dhar

Also, @mganahl, in a more complete version of the code, I am also doing implicit truncations to the MPS state using a self-defined truncate_mps() function that takes an MPS state and returns a truncated state with all bond-dimensions <= a set limit, D_max.

Attaching that code too here as a .txt file.
Test_code2.txt

The same error with SVD appears even for bond-dimensions = 64.
This is again for a 20-qubit instance where the maximum bond-dim = 1024.

P.S. I started using python only for this project, and so if there are any suggestions to make the code better/faster, please do state them.

Thanks again, Martin :)

@mganahl
Contributor

mganahl commented Mar 25, 2021

Hi @sr33dhar, had a very quick look at your code. The truncation there will in many cases lead to non-optimal truncations, something that you definitely want to avoid. I submitted a PR that adds truncation to the MPS class via the position function. You can use that for truncating your MPS

@sr33dhar

sr33dhar commented Apr 4, 2021

Hey @mganahl, are these changes accessible by just re-installing the latest version? Sorry, I'm still very new to Git.

@mganahl
Contributor

mganahl commented Apr 5, 2021

it's not released yet. You can clone #913 to your local machine and install it there

@sr33dhar

Hey @mganahl,

After the new update (version 0.4.5), the number of 'SVD did not converge' errors has definitely dropped, but they still occur. I've stopped using my self-defined truncation function and am now using the built-in truncation option of the apply_two_site_gate() function.

Yet for the code snippet attached, I am getting an 'SVD did not converge' error after ~30 iterations on average.

Test_code_III.txt

Also, if you spot anything, please do let me know if there are better ways to achieve what I am doing in the code.

Thanks again for all your help, Martin! I really appreciate it :)

@mganahl
Contributor

mganahl commented Apr 30, 2021

thanks for the message! Could you isolate parameter values for which the code breaks, so that I don't have to wait for 30 iterations to finish until the error appears? Thanks!

@mganahl mganahl reopened this Apr 30, 2021
@sr33dhar

sr33dhar commented May 1, 2021

Hey @mganahl,

Thanks again for your prompt response!

1.) Previously, for me, the SVD error was very random and running the same code with the same parameter values sometimes solved the issue. This is why in the code attached above [Test_code_III.txt], I was initialising the parameters with a random number each time.

2.) But now, the error looks to be more deterministic. From the few times I ran this code, it seemed that most errors occurred when the parameter corresponding to the two-qubit gates was in the interval [1.4, 1.6] radians. I'm not sure whether this is the only interval, though.

3.) In the code attached below, 10 sets of parameters where I got an SVD error are hard-coded into the Error_list array.
Test_code_IV.txt

4.) Sometimes, reducing the decimal precision seems to solve the issue.

Thanks again, Martin! :)
Cheers,
Rishi

@sr33dhar

sr33dhar commented May 1, 2021

2.) The error does not look to be so deterministic anymore.
Re-running the same code resulted in no error for the first 5 parameter sets. Not really sure why though.

@mganahl
Contributor

mganahl commented May 3, 2021

Hi @sr33dhar, I had a look at it. This is indeed a bug, but in the numpy package. The problem is caused by the matrix you are trying to decompose with the SVD. The matrix is essentially highly singular, i.e. it has very many singular values at or very close to machine epsilon. LAPACK (the underlying Fortran library that numpy uses) apparently has some trouble if the singular value spectrum has this kind of property. There are some ad-hoc workarounds that can fix this issue in some cases (e.g. doing a QR prior to the SVD).
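
A sketch of the QR-before-SVD workaround in plain NumPy. The helper name `svd_via_qr` is hypothetical, and, as noted above, this helps only in some cases, so it should not be read as a guaranteed cure.

```python
import numpy as np


def svd_via_qr(matrix):
    """SVD of `matrix` computed from its QR factors (hypothetical helper)."""
    q, r = np.linalg.qr(matrix)                          # matrix = q @ r
    u_r, s, vh = np.linalg.svd(r, full_matrices=False)   # r = u_r @ diag(s) @ vh
    return q @ u_r, s, vh                                # matrix = u @ diag(s) @ vh


rng = np.random.default_rng(1)
m = rng.standard_normal((6, 4))
u, s, vh = svd_via_qr(m)

print(np.allclose((u * s) @ vh, m))
```

Because q has orthonormal columns, the singular values of r are the same as those of the original matrix, so the result can be used as a drop-in replacement.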

From an MPS perspective, your code seems to be using a bond dimension that is much too large. I think you should try to add matrix truncation based on a truncation threshold rather than on an absolute bond dimension. This will 1) improve efficiency (at least during early runs) and 2) likely fix the SVD issue you are seeing.
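
As a rough illustration of threshold-based truncation, the sketch below discards singular values that fall below a cutoff relative to the largest one. The helper `truncate_by_threshold`, the cutoff value, and the example spectrum are all assumptions made for this example; the library's own truncation options may define the threshold differently.

```python
import numpy as np


def truncate_by_threshold(s, threshold=1e-10):
    """Split singular values into (kept, discarded) by a relative cutoff."""
    s = np.sort(np.asarray(s, dtype=float))[::-1]   # descending order
    # Keep values above threshold * largest value, but always at least one.
    keep = max(1, int(np.count_nonzero(s > threshold * s[0])))
    return s[:keep], s[keep:]


# Made-up spectrum with two values near machine precision.
s = np.array([1.0, 0.5, 1e-3, 1e-12, 1e-16])
kept, discarded = truncate_by_threshold(s)

print(len(kept), np.linalg.norm(discarded))
```

With a rule like this the bond dimension adapts to the state: it stays small while the state is weakly entangled and the near-zero singular values never enter the next decomposition.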

@gefux
Contributor

gefux commented May 10, 2022

Hi all!
I believe my recent PR #962 might be of help here. With the fallback to the _gesvd LAPACK routine all examples in the Test_code_IV.txt worked reliably, at least for me. I've run them all 5 times now. Is that also true for you, @sr33dhar?
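
For readers who want to try the idea outside the library, here is a hedged sketch of such a fallback using SciPy. `scipy.linalg.svd` accepts `lapack_driver="gesvd"`, while `np.linalg.svd` uses the gesdd driver; the wrapper `svd_with_fallback` is hypothetical and is not taken from the PR.

```python
import numpy as np
import scipy.linalg


def svd_with_fallback(matrix):
    """Try NumPy's SVD (gesdd) first, then fall back to LAPACK gesvd."""
    try:
        return np.linalg.svd(matrix, full_matrices=False)
    except np.linalg.LinAlgError:
        # gesvd is slower but is often more robust for such matrices.
        return scipy.linalg.svd(matrix, full_matrices=False,
                                lapack_driver="gesvd")


rng = np.random.default_rng(2)
m = rng.standard_normal((5, 5))
u, s, vh = svd_with_fallback(m)

# Both drivers should agree on a well-conditioned matrix.
s_gesvd = scipy.linalg.svd(m, compute_uv=False, lapack_driver="gesvd")
print(np.allclose(s, s_gesvd))
```

The try/except keeps the faster driver on the common path and pays the cost of gesvd only when the first attempt raises.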
