Make Portilla-Simoncelli code more efficient #222

Open
billbrod opened this issue Sep 20, 2023 · 1 comment
Labels: enhancement (New feature or request)

@billbrod (Collaborator):
Because of the multiscale representation used in the Portilla-Simoncelli texture model, it's hard to write the code in a GPU-friendly way. I attempted to do so in this commit by using the non-downsampled pyramid, which allowed me to store the coefficients as a single tensor of shape (batch, channel, n_scales, n_orientations, height, width); that made a variety of things easier. It is probably worth comparing against the version here, which uses the downsampled pyramid, packs the coefficients into a list of length n_scales (entry s of which is a tensor of shape (batch, channel, n_orientations, height/2^s, width/2^s), where s is the scale / index in the list), and uses list comprehensions for much of the computation.
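
For concreteness, the two coefficient layouts look roughly like this (a minimal sketch; the shape values are illustrative, not plenoptic's actual defaults):

```python
import torch

batch, channel, n_scales, n_orientations = 1, 1, 4, 4
height, width = 256, 256

# Non-downsampled pyramid: every scale keeps full resolution, so all
# coefficients fit in one dense tensor that is easy to vectorize over.
coeffs_dense = torch.empty(batch, channel, n_scales, n_orientations, height, width)

# Downsampled pyramid: entry s has spatial size (height / 2**s, width / 2**s),
# so the scales cannot be stacked into a single tensor.
coeffs_list = [
    torch.empty(batch, channel, n_orientations, height // 2**s, width // 2**s)
    for s in range(n_scales)
]
```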

However, the version that uses the non-downsampled pyramid is significantly slower on the CPU (slightly faster on the GPU), much more memory-intensive, and really hard to compare against the earlier code -- in general, it just doesn't compute the same thing, because e.g. the autocorrelation of the coefficients at the coarsest scale gives a different answer depending on whether that scale is size (64, 64) or (256, 256). You can downsample the coefficient image before computing the autocorrelation or downsample the autocorrelation afterwards, but there's no way to do that efficiently on the GPU (vmap doesn't let you convert something being vmapped over to an int, nor does it support dynamically-sized inputs, one of which would be necessary to do the downsampling and center-cropping efficiently), and the result is still generally not the same value (matching only to something like allclose(rtol=1e-1, atol=1e-1)). And you must do the downsampling, or the definition of "autocorrelations up to 4 shifts in all directions" is completely different at the coarser scales.
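
To make the scale-dependence concrete, here is a rough sketch of the "up to 4 shifts" autocorrelation; the `autocorr_center` helper and the naive `[::4, ::4]` subsampling are hypothetical stand-ins, not the model's actual computation:

```python
import torch

def autocorr_center(x, n_shifts=4):
    # Circular autocorrelation via the Wiener-Khinchin theorem, center-cropped
    # to +/- n_shifts lags ("up to 4 shifts in all directions").
    spectrum = torch.fft.fft2(x)
    acf = torch.fft.ifft2(spectrum * spectrum.conj()).real
    acf = torch.fft.fftshift(acf, dim=(-2, -1))
    ctr_h, ctr_w = acf.shape[-2] // 2, acf.shape[-1] // 2
    return acf[..., ctr_h - n_shifts:ctr_h + n_shifts + 1,
               ctr_w - n_shifts:ctr_w + n_shifts + 1]

coarse_full = torch.randn(256, 256)   # coarsest scale, not downsampled
coarse_down = coarse_full[::4, ::4]   # crude stand-in for the (64, 64) version

# Both crops are 9x9, but a 4-pixel lag spans 4/64 of the small image and only
# 4/256 of the large one, so the two statistics measure different spatial
# extents and generally disagree.
a_full = autocorr_center(coarse_full)
a_down = autocorr_center(coarse_down)
```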

With the downsampled + list comprehension version of the code, I tried treating the lists as a pytree and using tree_map (from torch.utils._pytree, jax, or optree) and got no speedup on either GPU or CPU, probably because a list of 4d/5d tensors is not a difficult pytree to parse. And vmap doesn't accept ragged data (a list of variably-sized arrays) or lists of structs, so I couldn't figure out a way to make anything vmap-able.
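
For reference, the pytree experiment looked roughly like the following (a minimal sketch; the variance statistic is just a stand-in for the actual texture statistics):

```python
import torch
from torch.utils._pytree import tree_map

# The downsampled representation: a flat list of 5d tensors, one per scale,
# with per-scale spatial sizes.
coeffs_list = [torch.randn(1, 1, 4, 256 // 2**s, 256 // 2**s) for s in range(4)]

# tree_map just applies the function to each leaf; for a short, flat list of
# tensors this is no faster than a list comprehension, consistent with the
# lack of speedup described above.
variances = tree_map(lambda c: c.var(dim=(-2, -1)), coeffs_list)

# torch.vmap, by contrast, maps over a dimension of a single tensor, so a
# ragged list of differently-sized tensors cannot be fed to it at all.
```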

So I'm unsure how to improve this.

billbrod added the enhancement (New feature or request) label on Sep 20, 2023
@billbrod (Collaborator, Author):

A good chunk of time is also spent in the forward and recon_pyr methods of the steerable pyramid, so improving the efficiency of those would also help.
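
A quick way to see where that time goes is the PyTorch profiler. This is only a sketch: `forward` and `recon_pyr` come from this issue, but the `SteerablePyramidFreq` constructor arguments are assumptions about plenoptic's API, so check the docs before running:

```python
import torch
from torch.profiler import ProfilerActivity, profile

import plenoptic as po

img = torch.randn(1, 1, 256, 256)
# Constructor arguments are assumed for illustration only.
pyr = po.simul.SteerablePyramidFreq(img.shape[-2:], height=4, order=3)

# Profile the two methods named above and report the heaviest ops.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    coeffs = pyr.forward(img)
    recon = pyr.recon_pyr(coeffs)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```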
