Performance issue: h5 writing of AtomDiag object very slow #844

Open
the-hampel opened this issue May 26, 2022 · 0 comments

Writing an AtomDiag object into an h5 archive is very slow because of its unfortunate data layout. In particular, for sparse Hamiltonians with many sub-blocks, on the order of a few 100k small datasets need to be written, which makes the h5 serialization very slow. The same problem shows up when broadcasting an AtomDiag object, because the broadcast uses h5 for serialization.
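
As a rough illustration of why the layout matters: instead of one dataset per sub-block, many small blocks could in principle be packed into a single flat buffer plus a shape table, so that only two datasets would need to be written. The sketch below uses plain Python lists and hypothetical helper names (`pack_blocks`, `unpack_blocks`). It is not how AtomDiag or h5 store data, and whether such a layout fits the actual class is an assumption.

```python
# Hypothetical illustration: none of these names come from TRIQS or h5.
from itertools import accumulate


def pack_blocks(blocks):
    """Flatten a list of small row-major matrices into one buffer plus shapes."""
    shapes = [(len(b), len(b[0]) if b else 0) for b in blocks]
    flat = [x for b in blocks for row in b for x in row]
    return flat, shapes


def unpack_blocks(flat, shapes):
    """Rebuild the list of matrices from the flat buffer and the shape table."""
    sizes = [r * c for r, c in shapes]
    offsets = [0] + list(accumulate(sizes))  # start index of each block
    blocks = []
    for (r, c), start in zip(shapes, offsets):
        blocks.append([flat[start + i * c:start + (i + 1) * c] for i in range(r)])
    return blocks


# Three small blocks stand in for the many sub-blocks of a sparse Hamiltonian.
blocks = [
    [[1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]],
]
flat, shapes = pack_blocks(blocks)      # two objects to store instead of three
restored = unpack_blocks(flat, shapes)  # round trip recovers the blocks
```

With such a layout the number of objects to write stays at two regardless of how many sub-blocks there are; in practice the buffers would be NumPy arrays rather than lists.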

Here is a minimal example for n orbitals (n_orb in the script) with a sparse structure. For n<=5 this works fine and finishes in a few seconds. However, for n=6 or even n=7, solving the problem is very fast, but writing or broadcasting the object takes very long:

from triqs.gf import *
from triqs.operators import *
from triqs.utility import mpi
from h5 import HDFArchive
from triqs.atom_diag import *
from itertools import product
import numpy as np
from triqs.operators.util.hamiltonians import h_int_kanamori
import timeit

spin_names = ('up','dn')
n_orb = 6
orb_names = list(range(n_orb))
fops = [(sn,on) for sn, on in product(spin_names,orb_names)]

H = n('up',0) * n('dn',0)

# Split the Hilbert space automatically
start_time = timeit.default_timer()
ad = AtomDiag(H, fops)
mpi.report('time for AD: {:.2f} s'.format(timeit.default_timer() - start_time))

# this part takes very long, especially with sparse Hamiltonians with many blocks
if mpi.is_master_node():
    start_time = timeit.default_timer()
    with HDFArchive('test.h5','w') as ar:
        ar['ad'] = ad
    mpi.report('time for h5 write: {:.2f} s'.format(timeit.default_timer() - start_time))

start_time = timeit.default_timer()
ad = mpi.bcast(ad)
mpi.report('time for h5 bcast: {:.2f} s'.format(timeit.default_timer() - start_time))

For example, for n=7 the output looks like:

time for AD: 0.56 s
time for h5 write: 48.08 s
time for h5 bcast: 30.18 s

We should think about how to serialize the AtomDiag object more efficiently. This is currently done via h5:
